The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts
📄 Summary
The study shows that code generation tolerates aggressive prompt compression remarkably well, while chain-of-thought reasoning degrades steadily as compression increases. Validation across six code benchmarks (including HumanEval, MBPP, HumanEval+, and MultiPL-E) and four reasoning benchmarks (GSM8K, MATH, ARC-Challenge, MMLU-STEM) confirms that the compression threshold generalizes across programming languages and difficulty levels. A first per-token perplexity analysis then uncovers a "perplexity paradox": code syntax tokens are preserved despite high perplexity, whereas reasoning tokens are disproportionately harmed. These findings offer a new lens on why code and math respond so differently to prompt compression, and lay the groundwork for future adaptive compression algorithms.
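The per-token analysis rests on a simple metric: the perplexity of each token given its prefix, exp(−log p(tokenᵢ | prefix)). Below is a minimal sketch of that computation, assuming per-token log-probabilities have already been obtained from some language model; the function names and the example numbers are illustrative, not taken from the paper.

```python
import math

def per_token_perplexity(token_log_probs):
    """Perplexity of each token given its prefix: exp(-log p(t_i | t_<i))."""
    return [math.exp(-lp) for lp in token_log_probs]

def sequence_perplexity(token_log_probs):
    """Overall perplexity: exponential of the mean negative log-probability."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Illustrative log-probabilities for two hypothetical tokens: one the model
# assigns probability 0.9 (perplexity ~1.11), one probability 0.05
# (perplexity ~20). A perplexity-based compressor would rank these tokens
# very differently when deciding what to drop.
print(per_token_perplexity([math.log(0.9), math.log(0.05)]))
print(sequence_perplexity([math.log(0.9), math.log(0.05)]))
```

Perplexity-guided prompt compressors typically use exactly this kind of per-token score to decide which tokens to keep, which is why the token-level distribution matters for the paradox described above.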
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others