Achieving an 83.4% Fix Rate on SWE-bench Verified with Runtime Facts

Source: Achieving an 83.4% Fix Rate on SWE-bench Verified with Runtime Facts

Published: February 26, 2026

🏷️ Related Tags

#AIDebugging #RuntimeFacts #DynamicTracing #SWE-bench #FixRate

📄 English Summary

Achieving an 83.4% Fix Rate on SWE-bench Verified with Runtime Facts

Recent SWE-bench Verified results validate a new AI debugging paradigm: systematic debugging grounded in Runtime Facts. Adding a dynamic tracing mechanism to the Live-SWE-agent architecture, so that the model receives runtime context, yielded a theoretical combined fix rate of 83.4% with Google Gemini 3 Pro, reported as the highest known result on SWE-bench Verified to date. The same model scores 77.4% on the original Live-SWE-agent, so the runtime-fact approach adds 6.0 percentage points, fixing complex bugs that were previously unsolved by using Runtime Facts as the basis for its decisions.
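The summary does not describe how the dynamic tracing in Live-SWE-agent is implemented. As an illustration only, the sketch below assumes that "runtime facts" means the executed lines plus the local variable values at each step, and collects them with Python's standard `sys.settrace` hook. The names `RuntimeFact`, `collect_runtime_facts`, and the `mean` example are hypothetical and are not taken from the article.

```python
import sys
from dataclasses import dataclass, field
from types import FrameType
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class RuntimeFact:
    """One observed event (hypothetical format): where execution was and the locals there."""
    event: str
    function: str
    lineno: int
    local_vars: Dict[str, str] = field(default_factory=dict)


def collect_runtime_facts(
    func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Tuple[Any, Optional[BaseException], List[RuntimeFact]]:
    """Run func under sys.settrace and record line/return/exception events.

    Only frames from the same source file as func are traced, which is an
    assumed filter to keep the trace focused on the code under test.
    """
    facts: List[RuntimeFact] = []
    target_file = func.__code__.co_filename

    def tracer(frame: FrameType, event: str, arg: Any):
        if frame.f_code.co_filename != target_file:
            return None  # skip frames from other files (stdlib, libraries)
        if event in ("line", "return", "exception"):
            facts.append(RuntimeFact(
                event=event,
                function=frame.f_code.co_name,
                lineno=frame.f_lineno,
                # repr() snapshots values so later mutation does not change the record
                local_vars={k: repr(v) for k, v in frame.f_locals.items()},
            ))
        return tracer  # keep receiving local events for this frame

    result: Any = None
    error: Optional[BaseException] = None
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    except Exception as exc:  # capture the failure as a fact instead of crashing
        error = exc
    finally:
        sys.settrace(None)  # always uninstall the global trace function
    return result, error, facts


def mean(values):
    # Hypothetical buggy function: fails on an empty list.
    total = sum(values)
    return total / len(values)


if __name__ == "__main__":
    _, err, facts = collect_runtime_facts(mean, [])
    print(type(err).__name__)  # ZeroDivisionError
    for fact in facts:
        print(fact.event, fact.function, fact.lineno, fact.local_vars)
```

Here the trace shows that `values` was `[]` and `total` was `0` when the exception was raised: a concrete fact an agent could reason from, rather than guessing the failure cause from the source alone.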

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others
