GPT-5.4 Mini Matches Human-Level Computer Use at 10% of the Cost

Source: "GPT-5.4 Mini Matches Human-Level Computer Use at 10% the Cost: Full Benchmark Breakdown"

Published: March 20, 2026

🏷️ Related Tags

#GPT-5.4 #ArtificialIntelligence #ComputerUse #Benchmarks #CostEfficiency

📄 English Summary

OpenAI released GPT-5.4 mini and nano on March 17. On SWE-Bench Pro, GPT-5.4 mini scored 54.4% against 57.7% for the full GPT-5.4, a gap of 3.3 percentage points, down from 12 points in the previous generation. On OSWorld-Verified, the mini model scored 72.1%, 0.3 points below the human baseline of 72.4%. These results suggest that the small model performs computer-use tasks at roughly human level, which gives GPT-5.4 mini a strong position on both performance and cost-effectiveness.
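
The gaps quoted in the summary can be checked with a few lines of arithmetic. The sketch below uses only the scores reported above; the names `scores` and `gap_points` are illustrative and do not come from the source.

```python
# Benchmark scores (percent) exactly as quoted in the summary.
scores = {
    "SWE-Bench Pro": {"GPT-5.4 mini": 54.4, "GPT-5.4": 57.7},
    "OSWorld-Verified": {"GPT-5.4 mini": 72.1, "human baseline": 72.4},
}


def gap_points(benchmark, reference, candidate="GPT-5.4 mini"):
    """Return reference score minus candidate score, in percentage points."""
    row = scores[benchmark]
    # Round to one decimal place to hide floating-point noise.
    return round(row[reference] - row[candidate], 1)


swe_gap = gap_points("SWE-Bench Pro", "GPT-5.4")
osworld_gap = gap_points("OSWorld-Verified", "human baseline")

print(f"SWE-Bench Pro gap to full model: {swe_gap} points")
print(f"OSWorld-Verified gap to human baseline: {osworld_gap} points")
```

Both differences are in percentage points, not relative percentages, which is how the summary reports them.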

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others
