When Consistency Becomes Bias: Interviewer Effects in Semi-Structured Clinical Interviews

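📄 Summary (translated from Chinese)

Automated depression detection from doctor-patient conversations has attracted growing attention, driven by the availability of public corpora and advances in language modeling. Model interpretability remains limited, however: strong performance is reported, but the factors driving the predictions often go unexplained. An analysis of three datasets (ANDROIDS, DAIC-WOZ, and E-DAIC) shows that interviewer prompts in semi-structured interviews carry a systematic bias. Models trained on interviewer turns exploit fixed prompts and their positions to separate depressed patients from controls, typically reaching high classification scores without using any participant language. Restricting models to participant utterances spreads decision evidence more widely and reflects genuine linguistic cues.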

📄 English Summary

When Consistency Becomes Bias: Interviewer Effects in Semi-Structured Clinical Interviews

Automatic depression detection from doctor-patient conversations has gained traction thanks to the availability of public corpora and advances in language modeling. The interpretability of these models remains limited, however: strong performance is often reported without showing what drives the predictions. An analysis of three datasets (ANDROIDS, DAIC-WOZ, and E-DAIC) reveals a systematic bias stemming from interviewer prompts in semi-structured interviews. Models trained on interviewer turns exploit fixed prompts and their positions in the interview to tell depressed subjects from controls, frequently achieving high classification scores without using participant language at all. Restricting models to participant utterances distributes decision evidence more broadly across the interview and makes it reflect genuine linguistic cues.
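The kind of leakage described above can be illustrated with a minimal sketch. It is not the authors' method: the sessions, speaker tags, prompts, and function names below are all hypothetical toy values, chosen only to show how interviewer turns can be separated from participant turns and how a prompt that appears under a single label would hand a classifier a shortcut.

```python
from collections import defaultdict

# Hypothetical toy sessions: (label, list of (speaker, text) turns).
# Speaker tags and prompts are invented for illustration only; they are
# not taken from ANDROIDS, DAIC-WOZ, or E-DAIC.
SESSIONS = [
    ("depressed", [
        ("interviewer", "how have you been sleeping"),
        ("participant", "not well lately"),
        ("interviewer", "have you been diagnosed with depression"),
        ("participant", "yes a few years ago"),
    ]),
    ("control", [
        ("interviewer", "how have you been sleeping"),
        ("participant", "fine mostly"),
        ("interviewer", "what do you do to relax"),
        ("participant", "i go hiking"),
    ]),
]


def turns_by(session_turns, speaker):
    """Keep only the utterances spoken by `speaker`."""
    return [text for who, text in session_turns if who == speaker]


def label_exclusive_prompts(sessions, speaker="interviewer"):
    """Map each utterance of `speaker` seen under exactly one label to that label."""
    labels_seen = defaultdict(set)
    for label, turns in sessions:
        for text in turns_by(turns, speaker):
            labels_seen[text].add(label)
    return {
        text: next(iter(labels))
        for text, labels in labels_seen.items()
        if len(labels) == 1
    }


# Prompts that, in this toy data, reveal the label without any participant language.
leaky = label_exclusive_prompts(SESSIONS)

# Participant-only view of the first session, the input a restricted model would see.
participant_only = turns_by(SESSIONS[0][1], "participant")
```

In this toy data the shared prompt about sleep carries no label information, while the two label-specific prompts do. A participant-only model would never see either of them.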
