Oxford's 32% Error Rate: How Safe Are Medical LLMs, Really?

📄 Summary

A study affiliated with Oxford University found that large language models produce clinically unsafe content or hallucinations in approximately 32% of medical summaries. This is not a trivial flaw: it means current systems are unsafe as autonomous clinical actors. For healthcare leaders, the key questions are how often LLMs fail, how they fail, and whether governance and technical controls can reduce the risk to an acceptable level. The study's framing is that a one-in-three chance of clinically problematic output rules out unsupervised bedside use, but the same error rate may be tolerable in tightly controlled assistive workflows where a clinician reviews every output.
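To make that distinction concrete, the sketch below turns the reported 32% rate into expected incident counts under different levels of human review. It is a hypothetical back-of-the-envelope model, not part of the study: the reviewer catch rates, the monthly summary volume, and the assumption that reviewers catch errors independently of which summaries are unsafe are all illustrative assumptions.

```python
# Hypothetical back-of-the-envelope model: how many unsafe LLM-generated
# summaries slip through review each month. Only the 0.32 base rate comes
# from the study discussed above; the catch rates and volume are assumed.

BASE_ERROR_RATE = 0.32        # study's reported rate of unsafe/hallucinated summaries
SUMMARIES_PER_MONTH = 10_000  # assumed volume for a mid-size health system

for catch_rate in (0.00, 0.80, 0.95, 0.99):  # 0.00 models unsupervised bedside use
    # Residual rate: unsafe outputs the reviewer fails to catch.
    residual_rate = BASE_ERROR_RATE * (1 - catch_rate)
    expected_incidents = residual_rate * SUMMARIES_PER_MONTH
    print(f"reviewer catch rate {catch_rate:4.0%}: "
          f"residual rate {residual_rate:6.2%}, "
          f"~{expected_incidents:,.0f} unsafe summaries/month slip through")
```

Even a 95% catch rate leaves roughly 160 problematic summaries per month at the assumed volume in this toy model, which is why the line between "assistive" and "autonomous" deployment carries so much weight.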


Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.