📄 Abstract (translated from Chinese)
Large language model (LLM)-driven automatic speech recognition (ASR) systems achieve strong performance under limited resources by connecting a frozen speech encoder to a pretrained LLM through a lightweight connector. Existing work typically trains an independent connector for each language, a practice that ignores the relatedness between languages. An efficient and novel connector-sharing strategy is proposed, based on language-family membership, with the goal of using only one connector per language family. By exploiting linguistic similarity, the method substantially reduces the number of connectors that must be trained, optimizing resource consumption and training efficiency. This strategy not only reduces the complexity of multilingual ASR systems but also enables knowledge sharing across languages; low-resource languages in particular are expected to benefit by sharing the connector parameters of higher-resource languages within the same family. The core of the method lies in designing a connector architecture that generalizes to multiple languages within a family, with its effectiveness and performance empirically validated across different language families. Experimental results show that this family-based connector-sharing strategy significantly reduces model parameter count and training cost while maintaining or even improving ASR performance, offering a new path toward more efficient and scalable multilingual ASR systems.
📄 English Summary
Language Family Matters: Evaluating LLM-Based ASR Across Linguistic Boundaries
Large Language Model (LLM)-powered Automatic Speech Recognition (ASR) systems achieve strong performance with limited resources by linking a frozen speech encoder to a pretrained LLM via a lightweight connector. Prior work trains a separate connector per language, overlooking linguistic relatedness. A novel and efficient connector-sharing strategy is proposed, based on linguistic family membership, enabling the use of a single connector per language family. This approach significantly reduces the number of connectors required for multilingual ASR systems, optimizing resource consumption and training efficiency by leveraging linguistic similarities. The strategy not only simplifies the complexity of multilingual ASR systems but also facilitates knowledge sharing across languages, particularly benefiting low-resource languages by sharing connector parameters with higher-resource languages within the same family. The core of this method lies in designing a connector architecture capable of generalizing across multiple languages within the same linguistic family, with its effectiveness and performance empirically validated across various language families. Experimental results demonstrate that this linguistic family-based connector-sharing strategy substantially reduces model parameters and training costs while maintaining or even improving ASR performance. This provides a new avenue for constructing more efficient and scalable multilingual ASR systems, showcasing the potential for cross-lingual transfer learning within a shared linguistic framework.
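The routing idea behind the strategy can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the language-to-family mapping, the `Connector` class, and the registry are all hypothetical stand-ins for the trainable projection between the frozen speech encoder and the LLM, shown only to make the "one connector per family" sharing concrete.

```python
# Hypothetical sketch of family-based connector sharing. All names here are
# illustrative assumptions, not taken from the paper.

LANG_TO_FAMILY = {  # illustrative subset; a real system would use a full taxonomy
    "es": "Romance", "it": "Romance", "pt": "Romance",
    "de": "Germanic", "nl": "Germanic", "sv": "Germanic",
    "hi": "Indo-Aryan", "bn": "Indo-Aryan",
}

class Connector:
    """Stand-in for a lightweight trainable projection that maps frozen
    speech-encoder features into the LLM's embedding space."""
    def __init__(self, family: str):
        self.family = family
        self.params = f"theta_{family}"  # placeholder for shared weights

class ConnectorRegistry:
    """Lazily creates exactly one shared connector per language family,
    instead of one connector per language."""
    def __init__(self):
        self._connectors: dict[str, Connector] = {}

    def get(self, lang: str) -> Connector:
        family = LANG_TO_FAMILY[lang]
        if family not in self._connectors:
            self._connectors[family] = Connector(family)
        return self._connectors[family]

registry = ConnectorRegistry()
# Spanish and Italian resolve to the same shared Romance connector object...
assert registry.get("es") is registry.get("it")
# ...while German routes to a separate Germanic connector.
assert registry.get("de") is not registry.get("es")
```

Under this scheme the number of trainable connectors scales with the number of families rather than the number of languages, which is where the claimed parameter and training-cost savings come from; a low-resource language like Portuguese would reuse the Romance connector trained largely on higher-resource siblings.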