TherapyGym: Evaluating and Aligning Clinical Fidelity and Safety in Therapy Chatbots

📄 English Summary

TherapyGym: Evaluating and Aligning Clinical Fidelity and Safety in Therapy Chatbots

As large language models (LLMs) are increasingly used for mental health support, existing evaluation methods such as fluency metrics, preference tests, and generic dialogue benchmarks fail to capture the clinically critical dimensions of psychotherapy. The TherapyGym framework is proposed to evaluate and improve therapy chatbots along two clinical pillars: fidelity and safety. Fidelity is assessed with the Cognitive Therapy Rating Scale (CTRS), implemented as an automated pipeline that scores adherence to cognitive behavioral therapy (CBT) techniques across multi-turn sessions. Safety is evaluated with a multi-label annotation scheme covering therapy-specific risks, such as failing to address harm or abuse. The work also takes steps to mitigate bias and unreliability in LLM-based judges.
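
To make the two pillars concrete, here is a minimal Python sketch of how one session's results might be recorded. It is an illustration under stated assumptions, not the TherapyGym implementation: the `SessionEvaluation` class, the snake_case item keys, and the safety label names are all hypothetical. The only details taken from outside the summary are the standard CTRS layout of 11 items, each rated 0 to 6, and the summary's own example of a risk (failing to address harm or abuse).

```python
from dataclasses import dataclass, field

# The 11 CTRS items; the snake_case keys are our own naming, not the paper's.
CTRS_ITEMS = (
    "agenda", "feedback", "understanding", "interpersonal_effectiveness",
    "collaboration", "pacing", "guided_discovery", "focus_on_key_cognitions",
    "strategy_for_change", "application_of_cbt_techniques", "homework",
)
MAX_ITEM_SCORE = 6  # each CTRS item is rated on a 0-6 scale

# Hypothetical safety labels; only the first mirrors the summary's example.
SAFETY_LABELS = frozenset({"unaddressed_harm_or_abuse", "other_risk"})


@dataclass
class SessionEvaluation:
    """Fidelity scores and safety flags for one multi-turn session."""
    ctrs_scores: dict[str, int]
    safety_flags: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Every CTRS item must be scored exactly once, within range.
        if set(self.ctrs_scores) != set(CTRS_ITEMS):
            raise ValueError("ctrs_scores must cover exactly the 11 CTRS items")
        for item, score in self.ctrs_scores.items():
            if not 0 <= score <= MAX_ITEM_SCORE:
                raise ValueError(f"{item}: score {score} is outside 0-6")
        # Multi-label: any subset of the known labels may apply.
        unknown = set(self.safety_flags) - SAFETY_LABELS
        if unknown:
            raise ValueError(f"unknown safety labels: {sorted(unknown)}")

    @property
    def fidelity_total(self) -> int:
        """Sum of item scores (0 to 66 for the full scale)."""
        return sum(self.ctrs_scores.values())

    @property
    def is_flagged(self) -> bool:
        """True if at least one safety label was assigned."""
        return bool(self.safety_flags)
```

Keeping fidelity as a numeric total and safety as a set of labels reflects the summary's framing of them as separate pillars: a session can score well on CBT adherence and still carry a safety flag.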
