一种语言,两种书写:探究大型语言模型概念表示中的书写不变性
📄 中文摘要
本研究探究稀疏自编码器(SAE)学习到的特征究竟代表抽象意义,还是与文本的书写方式绑定。研究以塞尔维亚语的双书写制为受控实验平台:塞尔维亚语可用拉丁字母和西里尔字母两种书写系统书写,且两者之间近乎完美的字符映射使得可以在保持意义不变的前提下改变正字法。值得注意的是,这两种书写系统的分词(tokenization)方式完全不同,不共享任何标记(token)。对Gemma模型系列(270M–27B参数)中SAE特征激活的分析显示,同一句子在两种塞尔维亚书写系统下激活的特征高度重叠,显著超过随机基线。值得一提的是,改变书写系统只带来较小的表示差异,表明SAE在捕捉抽象意义方面具有一定的鲁棒性。
📄 English Summary
One Language, Two Scripts: Probing Script-Invariance in LLM Concept Representations
This study investigates whether the features learned by Sparse Autoencoders (SAEs) represent abstract meaning or are tied to how text is written. It uses Serbian digraphia as a controlled testbed: Serbian can be written in both the Latin and Cyrillic scripts, and the near-perfect character mapping between them allows orthography to vary while meaning stays constant. Notably, the two scripts are tokenized completely differently, sharing no tokens. Analyzing SAE feature activations across the Gemma model family (270M–27B parameters) reveals that identical sentences in the two scripts activate highly overlapping features, far exceeding random baselines. Remarkably, switching scripts produces only small representational differences, indicating that SAEs are reasonably robust at capturing abstract meaning.
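A minimal sketch of the two ingredients the summary describes: the near-perfect Serbian Latin↔Cyrillic character mapping (which is why meaning can be held constant while orthography changes), and an overlap score between the sets of SAE features a sentence pair activates, compared against a random baseline. The mapping table is standard Serbian transliteration; the feature sets and the Jaccard metric are illustrative assumptions, not the paper's exact pipeline.

```python
# Illustrative sketch (assumptions, not the paper's code):
# (1) Serbian Latin -> Cyrillic transliteration; digraphs (Lj, Nj, Dž)
#     must be matched before single letters, hence the 2-char lookahead.
# (2) Jaccard overlap between sets of active SAE feature indices.
LATIN_TO_CYRILLIC = {
    "Lj": "Љ", "lj": "љ", "Nj": "Њ", "nj": "њ", "Dž": "Џ", "dž": "џ",
    "A": "А", "a": "а", "B": "Б", "b": "б", "C": "Ц", "c": "ц",
    "Č": "Ч", "č": "ч", "Ć": "Ћ", "ć": "ћ", "D": "Д", "d": "д",
    "Đ": "Ђ", "đ": "ђ", "E": "Е", "e": "е", "F": "Ф", "f": "ф",
    "G": "Г", "g": "г", "H": "Х", "h": "х", "I": "И", "i": "и",
    "J": "Ј", "j": "ј", "K": "К", "k": "к", "L": "Л", "l": "л",
    "M": "М", "m": "м", "N": "Н", "n": "н", "O": "О", "o": "о",
    "P": "П", "p": "п", "R": "Р", "r": "р", "S": "С", "s": "с",
    "Š": "Ш", "š": "ш", "T": "Т", "t": "т", "U": "У", "u": "у",
    "V": "В", "v": "в", "Z": "З", "z": "з", "Ž": "Ж", "ž": "ж",
}

def to_cyrillic(text: str) -> str:
    """Transliterate Serbian Latin text to Cyrillic, digraphs first."""
    out, i = [], 0
    while i < len(text):
        two = text[i:i + 2]
        if two in LATIN_TO_CYRILLIC:
            out.append(LATIN_TO_CYRILLIC[two])
            i += 2
        elif text[i] in LATIN_TO_CYRILLIC:
            out.append(LATIN_TO_CYRILLIC[text[i]])
            i += 1
        else:  # punctuation, digits, whitespace pass through unchanged
            out.append(text[i])
            i += 1
    return "".join(out)

def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of active SAE feature indices."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Toy stand-ins for the SAE features active on one sentence in each script.
feats_latin = {3, 17, 42, 99, 256}
feats_cyrillic = {3, 17, 42, 99, 512}   # same meaning: high overlap
feats_random = {7, 1001, 2048}          # random baseline: little overlap
```

The "near-perfect" hedge in the summary matters here: a handful of Latin letter sequences (e.g. `nj` inside loanwords like "injekcija") are not true digraphs, so a naive greedy mapping mistransliterates them, which is exactly why the mapping is near-perfect rather than perfect.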