📄 English Summary
Probing Memes in LLMs: A Paradigm for the Entangled Evaluation World
Current evaluation paradigms for large language models (LLMs) characterize models and datasets separately, leading to coarse-grained evaluations: items in datasets are treated as pre-labeled entries, while models are summarized by overall scores such as accuracy, ignoring how model behavior varies across items with different properties. To address this, LLMs are conceptualized as being composed of memes, a notion introduced by Dawkins for cultural genes that replicate knowledge and behavior. Building on this perspective, the Probing Memes paradigm reconceives evaluation as an entangled world of models and data, centered on a Perception Matrix that captures model-item interactions and thereby enables the exploration of Probe Properties.
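As an illustration of the idea behind a Perception Matrix, the sketch below records each model's per-item outcome rather than collapsing everything into one aggregate score. This is a minimal, hypothetical construction based only on the summary's description; the model names, item names, and data are invented, not taken from the paper.

```python
import numpy as np

# Hypothetical example: rows are models, columns are dataset items;
# 1 = the model answered the item correctly, 0 = incorrectly.
models = ["model_a", "model_b", "model_c"]
items = ["item_1", "item_2", "item_3", "item_4"]

perception = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
])

# The conventional paradigm keeps only a per-model aggregate score...
accuracy = perception.mean(axis=1)          # one number per model

# ...whereas the full matrix also exposes item-level structure, e.g.
# which items are hard across models (a simple stand-in for the
# richer "probe properties" the paradigm describes).
item_difficulty = 1 - perception.mean(axis=0)  # one number per item
```

The point of the contrast is that `accuracy` discards exactly the model-item interaction structure that `perception` retains.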