📄 English Summary
Multimodal Consistency-Guided Reference-Free Data Selection for ASR Accent Adaptation
Automatic speech recognition (ASR) systems often perform poorly on accented speech due to acoustic-phonetic and prosodic shifts that create mismatches with training data, making labeled accent adaptation costly. Common pseudo-label selection heuristics are primarily text-centric, such as perplexity (PPL) filtering, which may favor fluent yet acoustically mismatched hypotheses, leading to error amplification during fine-tuning. To address this issue, a multimodal consistency-guided, reference-free data selection pipeline is proposed for ASR accent adaptation under a transductive, label-free protocol. The pipeline begins with a target-aware preselection step based on submodular mutual information to enhance query relevance and reduce downstream computation. This approach significantly improves the efficiency and accuracy of accent adaptation.
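The target-aware preselection step can be illustrated with a facility-location-style submodular mutual information objective, maximized greedily. This is a minimal sketch under stated assumptions: it uses cosine similarity between candidate and target-accent utterance embeddings, and `greedy_preselect` and all other names are illustrative, not the paper's actual implementation (whose exact SMI instantiation and embedding model are not specified here).

```python
import numpy as np

def greedy_preselect(cand_emb, query_emb, k):
    """Greedy maximization of a facility-location-style submodular
    mutual information between the selected candidate set and a
    target-accent query set; greedy selection gives a (1 - 1/e)
    approximation for monotone submodular objectives."""
    # Cosine similarities between target queries and unlabeled candidates.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sim = q @ c.T                        # shape: (n_query, n_cand)

    selected, gains = [], []
    covered = np.zeros(sim.shape[0])     # best coverage of each query so far
    for _ in range(k):
        # Marginal gain of adding each candidate to the selected set.
        # (Clipping at current coverage keeps the objective monotone.)
        gain = np.maximum(sim, covered[:, None]).sum(axis=0) - covered.sum()
        if selected:
            gain[selected] = -np.inf     # never re-pick a chosen candidate
        best = int(np.argmax(gain))
        selected.append(best)
        gains.append(float(gain[best]))
        covered = np.maximum(covered, sim[:, best])
    return selected, gains
```

Given utterance embeddings from any speech encoder, `greedy_preselect(cand, query, k)` returns the `k` candidates that best cover the target-accent queries; the non-increasing `gains` sequence can double as a stopping criterion, which is how such a preselection step can cut downstream scoring cost.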