VeRA: Verified Reasoning Data Augmentation at Scale

📄 Summary

The main problem with most evaluation schemes today is their static nature: the same problems are reused repeatedly, leading to memorization, format exploitation, and eventual saturation. Measuring genuine AI progress requires evaluations that are robust by design rather than relying on post-hoc contamination detection. To this end, the VeRA (Verified Reasoning Data Augmentation) framework converts benchmark problems into executable specifications, each consisting of (i) a natural-language template with placeholder slots, (ii) a generator that samples valid configurations, and (iii) a deterministic verifier that validates the parameters and computes the correct answer for each configuration. From a single seed problem, VeRA can automatically generate many new evaluation problems, improving the diversity and validity of assessments.
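The template/generator/verifier triple can be illustrated with a minimal sketch. This is an assumption-laden toy, not VeRA's actual API: the names `generate`, `verify`, and `instantiate`, and the example seed problem, are all hypothetical.

```python
import random

# Hypothetical sketch of a VeRA-style executable specification built from one
# seed problem ("a train travels d km in t hours; find its average speed").
# All function names and the template are illustrative assumptions.

TEMPLATE = "A train travels {d} km in {t} hours. What is its average speed in km/h?"

def generate(rng: random.Random) -> dict:
    """Sample a valid parameter configuration for the template's slots."""
    t = rng.randint(2, 10)
    d = t * rng.randint(20, 120)  # construct d so the answer is an integer
    return {"d": d, "t": t}

def verify(params: dict) -> int:
    """Deterministically validate a configuration and compute its answer."""
    assert params["t"] > 0 and params["d"] > 0, "invalid configuration"
    assert params["d"] % params["t"] == 0, "answer must be an integer"
    return params["d"] // params["t"]

def instantiate(seed: int) -> tuple[str, int]:
    """Render one fresh evaluation item: (question text, gold answer)."""
    rng = random.Random(seed)
    params = generate(rng)
    return TEMPLATE.format(**params), verify(params)

# From a single seed problem, many distinct items can be produced:
for s in range(3):
    question, answer = instantiate(s)
    print(question, "->", answer)
```

Because the generator is seeded and the verifier is deterministic, the same seed always reproduces the same question/answer pair, while fresh seeds yield new surface forms of the underlying problem.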

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others