📄 English Summary
The Irony of Language Models That Don't Speak Your Language
AI technology is now ubiquitous. A largely unaddressed problem, however, is that large language models (LLMs) are built predominantly around high-resource languages, English above all. Research indicates that more than 92% of the training tokens in leading LLMs are English. Roughly 7,000 languages are spoken worldwide, yet most LLMs offer meaningful support for only a handful of them, a disparity that stands out all the more given how widely AI is now applied.