📄 English Summary

Indic-TunedLens: Interpreting Multilingual Models in Indian Languages

Multilingual large language models (LLMs) are increasingly used in linguistically diverse regions such as India, yet most interpretability tools remain focused on English. Prior studies indicate that LLMs often operate in English-centric representation spaces, which makes cross-lingual interpretability a pressing problem. Indic-TunedLens is a new interpretability framework designed specifically for Indian languages; it learns shared affine transformations of the model's hidden states. The standard Logit Lens decodes intermediate activations directly. Indic-TunedLens instead adjusts the hidden states for each target language so that they align with the target output distribution, which allows the model's representations to be decoded more faithfully. The authors evaluate the framework on 10 Indian languages and report that it is effective.
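The contrast between the Logit Lens and a learned affine lens can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation.

It follows the generic tuned-lens recipe. An affine translator A h + b is applied before the unembedding matrix W_U. The translator is fitted by minimizing the KL divergence between the model's final-layer distribution and the lens distribution.

Several details are assumptions:

- The summary does not say whether Indic-TunedLens uses exactly this objective, or how its transformation is "shared".
- Layer normalization is omitted.
- The vocabulary, hidden states, and the hypothetical "drift" (R, c) that turns intermediate states into final ones are invented toy values.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    zmax = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - zmax) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Vector, q: Vector) -> float:
    """KL(p || q) in nats; p is the target (final-layer) distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def logit_lens(h: Vector, W_U: Matrix) -> Vector:
    # Logit Lens: unembed the intermediate hidden state directly.
    return softmax(matvec(W_U, h))


def tuned_lens(h: Vector, A: Matrix, b: Vector, W_U: Matrix) -> Vector:
    # Affine lens (assumed form): map h to A h + b, then unembed.
    h_t = [x + bi for x, bi in zip(matvec(A, h), b)]
    return softmax(matvec(W_U, h_t))


def train_translator(pairs: List[Tuple[Vector, Vector]], W_U: Matrix,
                     lr: float = 0.1, steps: int = 500) -> Tuple[Matrix, Vector]:
    """Fit ONE affine translator shared by all (hidden state, target) pairs,
    using full-batch gradient descent on the mean KL(target || lens)."""
    d, n = len(pairs[0][0]), len(pairs)
    A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # identity init
    b = [0.0] * d
    for _ in range(steps):
        gA = [[0.0] * d for _ in range(d)]
        gb = [0.0] * d
        for h, p in pairs:
            q = tuned_lens(h, A, b, W_U)
            dz = [qi - pi for qi, pi in zip(q, p)]  # dKL/dlogits = q - p
            # Backpropagate through the unembedding: dh = W_U^T dz
            dh = [sum(W_U[k][i] * dz[k] for k in range(len(dz))) for i in range(d)]
            for i in range(d):
                gb[i] += dh[i] / n
                for j in range(d):
                    gA[i][j] += dh[i] * h[j] / n
        for i in range(d):
            b[i] -= lr * gb[i]
            for j in range(d):
                A[i][j] -= lr * gA[i][j]
    return A, b


# Hypothetical toy setup: 3-token vocabulary, 2-dim hidden states.
W_U = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
R, c = [[0.0, -1.0], [1.0, 0.0]], [0.5, -0.5]  # invented mid-to-final drift


def final_hidden(h: Vector) -> Vector:
    return [x + ci for x, ci in zip(matvec(R, h), c)]


h_mids = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
pairs = [(h, softmax(matvec(W_U, final_hidden(h)))) for h in h_mids]

A, b = train_translator(pairs, W_U)
kl_logit = sum(kl(p, logit_lens(h, W_U)) for h, p in pairs) / len(pairs)
kl_tuned = sum(kl(p, tuned_lens(h, A, b, W_U)) for h, p in pairs) / len(pairs)
print(f"mean KL  logit lens: {kl_logit:.4f}  affine lens: {kl_tuned:.4f}")
```

The translator is initialized to the identity, so the affine lens starts out identical to the Logit Lens. With this small step size, gradient descent on the convex KL objective can only lower the mean divergence on these examples. In a real model, one would presumably fit a translator per layer on text in each target language. That detail is an assumption, not something the summary states.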
