📄 English Summary
How Well Do Multimodal Models Reason on ECG Signals?
Multimodal large language models offer a promising answer to the "black box" nature of health AI by generating interpretable reasoning traces. However, verifying the validity of these traces remains a critical challenge. Existing evaluation methods are either unscalable, relying on manual clinician review, or superficial, relying on proxy metrics (e.g., question answering) that fail to capture the semantic correctness of clinical logic. The paper introduces a reproducible framework for evaluating reasoning over ECG signals. Reasoning is decomposed into two distinct components: (i) Perception, accurately identifying patterns within the raw signal, and (ii) Deduction, logically applying domain knowledge to those patterns.
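The perception/deduction split described above could be operationalized by scoring each component of a reasoning trace separately against reference annotations. The sketch below is a minimal illustration under assumed names (`ReasoningTrace`, `component_scores`, and the set-overlap F1 scoring are all hypothetical, not the paper's actual method):

```python
from dataclasses import dataclass

@dataclass
class ReasoningTrace:
    """A model's explanation, split into the two components named above."""
    perception: set  # patterns claimed in the raw signal, e.g. "ST elevation"
    deduction: set   # clinical conclusions drawn from those patterns

def component_scores(trace, ref_perception, ref_deduction):
    """Score perception and deduction independently as set-overlap F1."""
    def f1(pred, ref):
        if not pred and not ref:
            return 1.0
        tp = len(pred & ref)  # claims that match the reference
        if tp == 0:
            return 0.0
        precision = tp / len(pred)
        recall = tp / len(ref)
        return 2 * precision * recall / (precision + recall)
    return {
        "perception": f1(trace.perception, ref_perception),
        "deduction": f1(trace.deduction, ref_deduction),
    }

# A model can perceive correctly yet deduce wrongly -- the decomposition
# makes that failure mode visible instead of folding it into one score.
trace = ReasoningTrace(
    perception={"ST elevation", "tall T waves"},
    deduction={"hyperkalemia"},
)
scores = component_scores(
    trace,
    ref_perception={"ST elevation", "tall T waves"},
    ref_deduction={"acute MI"},
)
print(scores)  # perception scores 1.0, deduction scores 0.0
```

Separating the scores distinguishes a model that misreads the signal from one that reads it correctly but applies the wrong clinical rule, which a single QA-accuracy number cannot do.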