MIMIC-RD: Can Large Language Models Differentially Diagnose Rare Diseases in Real-World Clinical Settings?

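📄 Chinese Summary (English translation)

Although rare diseases affect one in ten Americans, their differential diagnosis remains challenging. This study presents MIMIC-RD, a rare disease differential diagnosis benchmark built by mapping clinical text entities directly to the Orphanet database. The research team used an initial LLM-based mining process, followed by validation from four medical annotators, to confirm that the identified entities were indeed rare diseases. Evaluating on a dataset of 145 patients, the study found that current state-of-the-art LLMs perform poorly at rare disease differential diagnosis, highlighting a significant gap between existing capabilities and clinical needs. The study overcomes the limitations of existing evaluation methods that rely on idealized clinical case studies or ICD codes, and offers important insights and future directions for improving rare disease differential diagnosis.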

📄 English Summary

MIMIC-RD: Can LLMs differentially diagnose rare diseases in real-world clinical settings?

This study introduces MIMIC-RD, a benchmark for rare disease differential diagnosis. Rare diseases affect 1 in 10 Americans, yet their differential diagnosis remains challenging. Existing evaluations rely on either idealized case studies or ICD codes; MIMIC-RD instead maps clinical text entities directly to Orphanet. The construction process involved an initial LLM-based mining phase, followed by validation from four medical annotators who confirmed that the identified entities were indeed rare diseases. Evaluating various models on a dataset of 145 patients, the study found that current state-of-the-art LLMs perform poorly at rare disease differential diagnosis. This work highlights the substantial gap between existing AI capabilities and real clinical needs, and provides insights and directions for future improvements in rare disease diagnosis.
