GPT-5.4 Mini Matches Human-Level Computer Use at 10% of the Cost
📄 Summary
GPT-5.4 Mini Matches Human-Level Computer Use at 10% of the Cost: Full Benchmark Breakdown
OpenAI released GPT-5.4 mini and nano on March 17. On SWE-Bench Pro, GPT-5.4 mini scored 54.4% against 57.7% for the full GPT-5.4, a gap of 3.3 percentage points, down from 12 points in the previous generation. On OSWorld-Verified, the mini version scored 72.1%, just below the human baseline of 72.4%, which puts a small model at roughly human-level performance on computer-operation tasks. Together, these results point to a strong performance-to-cost trade-off for GPT-5.4 mini.
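The gaps quoted in the summary are plain percentage-point differences. As a quick check of that arithmetic, here is a minimal Python sketch. The dictionary layout and the `gap` helper are illustrative assumptions, not from the source; only the four scores are taken from the summary.

```python
# Scores (in percent) exactly as quoted in the summary above.
# The dictionary keys and the gap() helper are hypothetical names
# chosen for this illustration.
scores = {
    "swe_bench_pro": {"gpt_5_4": 57.7, "gpt_5_4_mini": 54.4},
    "osworld_verified": {"human_baseline": 72.4, "gpt_5_4_mini": 72.1},
}


def gap(reference: float, candidate: float) -> float:
    """Return the percentage-point gap, rounded to one decimal place."""
    return round(reference - candidate, 1)


swe_gap = gap(
    scores["swe_bench_pro"]["gpt_5_4"],
    scores["swe_bench_pro"]["gpt_5_4_mini"],
)
osworld_gap = gap(
    scores["osworld_verified"]["human_baseline"],
    scores["osworld_verified"]["gpt_5_4_mini"],
)

print(f"SWE-Bench Pro gap (full vs. mini): {swe_gap} points")
print(f"OSWorld-Verified gap (human vs. mini): {osworld_gap} points")
```

The rounding step is there only because subtracting floating-point numbers can leave tiny residues. The differences are in percentage points, not relative percent.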
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others