📄 English Summary
Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation
Accuracy-based evaluation fails to reliably distinguish genuine generalization from shortcuts such as memorization, leakage, or brittle heuristics, particularly in small-data regimes. A mechanism-aware evaluation approach is proposed that combines task-relevant symbolic rules with mechanistic interpretability, yielding algorithmic pass/fail scores that indicate where models generalize and where they merely exploit patterns. This is demonstrated on NL-to-SQL by training two identical architectures under different conditions: one without schema information (forcing memorization) and one with schema information (enabling grounding). Standard evaluation shows that the memorization model achieves 94% field-name accuracy on unseen data, falsely suggesting competence.
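The symbolic-rule side of this approach could be sketched as a grounding check: every field referenced in generated SQL must exist in the provided schema, rather than merely matching surface patterns from training. This is a minimal illustrative sketch, not the paper's implementation; the `schema_grounding_check` helper, its naive regex-based field extraction, and the example schema are all assumptions introduced here.

```python
import re

def schema_grounding_check(sql: str, schema_fields: set) -> dict:
    """Symbolic pass/fail rule (illustrative sketch): every field the
    generated SQL selects must be grounded in the provided schema."""
    # Naive field extraction: grab the comma-separated list between
    # SELECT and FROM. A real checker would use a proper SQL parser.
    m = re.search(r"select\s+(.*?)\s+from", sql, re.IGNORECASE | re.DOTALL)
    fields = [f.strip() for f in m.group(1).split(",")] if m else []
    ungrounded = [f for f in fields if f != "*" and f not in schema_fields]
    return {"pass": not ungrounded, "ungrounded_fields": ungrounded}

# Hypothetical schema for illustration.
schema = {"name", "age", "city"}
print(schema_grounding_check("SELECT name, age FROM users", schema))
print(schema_grounding_check("SELECT salary FROM users", schema))
```

A memorization model can score high on field-name accuracy while still emitting fields absent from the actual schema; a rule like this turns that failure mode into an explicit algorithmic fail rather than letting it hide inside an aggregate accuracy number.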