Spelling Correction in Healthcare Query-Answer Systems: Methods, Retrieval Impact, and Empirical Evaluation
📄 English Summary
Spelling Correction in Healthcare Query-Answer Systems: Methods, Retrieval Impact, and Empirical Evaluation
Healthcare question-answering (QA) systems face a persistent challenge: user-submitted queries contain spelling errors at a much higher rate than the professional documents they are searched against. This research presents the first controlled study of spelling correction as a retrieval preprocessing step in healthcare QA using real consumer queries. It begins with an error census across two public datasets: the TREC 2017 LiveQA Medical track (104 consumer health questions) and HealthSearchQA (4,436 health queries drawn from Google autocomplete). The census finds that 61.5% of real medical queries contain at least one spelling error, with a token-level error rate of 11.0%. Four correction methods, including conservative edit distance and standard edit distance, are then evaluated for their impact on retrieval effectiveness.
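To make the idea of edit-distance correction as a retrieval preprocessing step concrete, here is a minimal Python sketch. It is not the study's implementation: the summary does not define "conservative" or "standard" edit distance, so the sketch assumes that the standard variant accepts the nearest vocabulary word within distance 2, while the conservative variant only corrects tokens of at least five characters that have exactly one candidate at distance 1. The vocabulary `VOCAB` and the function names are hypothetical.

```python
from typing import Iterable

# Hypothetical toy vocabulary; a real system would use a medical lexicon.
VOCAB = {"diabetes", "symptoms", "of", "insulin", "treatment"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct_token(token: str, vocab: Iterable[str], max_dist: int = 2,
                  conservative: bool = False) -> str:
    """Return the closest vocabulary word, or the token itself if unsure."""
    words = list(vocab)
    if token in words:
        return token
    # Assumption: "conservative" means a tighter distance limit of 1.
    limit = 1 if conservative else max_dist
    scored = ((edit_distance(token, w), w) for w in words)
    candidates = sorted((d, w) for d, w in scored if d <= limit)
    if not candidates:
        return token
    best_dist = candidates[0][0]
    best = [w for d, w in candidates if d == best_dist]
    # Assumption: conservative mode skips short tokens and ambiguous matches.
    if conservative and (len(best) > 1 or len(token) < 5):
        return token
    return best[0]


def correct_query(query: str, vocab: Iterable[str],
                  conservative: bool = False) -> str:
    """Correct each whitespace-separated token before the query is retrieved."""
    words = list(vocab)
    return " ".join(correct_token(t, words, conservative=conservative)
                    for t in query.lower().split())


if __name__ == "__main__":
    print(correct_query("symtoms of diabtes", VOCAB))
    print(correct_query("diabtis", VOCAB, conservative=True))
```

Under these assumptions the two variants differ only on harder misspellings: a token two edits away from its target is corrected in standard mode but left untouched in conservative mode, which trades recall for a lower risk of rewriting a valid but out-of-vocabulary medical term.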