Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

Source: Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

Published: March 13, 2026

🏷️ Related Tags

#ArtificialIntelligence #CyberAttacks #ModelEvaluation #CapabilityTrends #InferenceTimeCompute

📄 English Summary

The study evaluates the autonomous cyber-attack capabilities of cutting-edge AI models on two purpose-built cyber ranges: a 32-step corporate network attack and a 7-step industrial control system attack, which require chaining heterogeneous capabilities across extended action sequences. By comparing seven models released over an eighteen-month period (from August 2024 to February 2026) at varying inference-time compute budgets, two capability trends are observed. First, model performance scales log-linearly with inference-time compute, with no observed plateau; increasing from 10M to 100M tokens yields performance gains of up to 59%, requiring no specific technical sophistication from the operator. Second, each successive model generation outperforms its predecessor at fixed time budgets.
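To make "scales log-linearly with inference-time compute" concrete, here is a minimal sketch of a log-linear fit, score = a + b * log10(tokens), where every tenfold increase in the token budget adds the same increment b. The function names and all data points are hypothetical and chosen only for illustration; they are not figures from the study, and the sketch does not attempt to reproduce the reported 59% gain.

```python
import math


def fit_log_linear(tokens, scores):
    """Ordinary least-squares fit of score = a + b * log10(tokens)."""
    xs = [math.log10(t) for t in tokens]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx              # increment per tenfold increase in tokens
    a = mean_y - b * mean_x    # intercept
    return a, b


def predict(a, b, tokens):
    """Score predicted by the fitted log-linear model at a token budget."""
    return a + b * math.log10(tokens)


# Hypothetical (made-up) measurements: token budget -> average steps completed.
budgets = [1e6, 1e7, 1e8]
steps = [2.0, 5.0, 8.0]

a, b = fit_log_linear(budgets, steps)

# Under a log-linear model, going from 10M to 100M tokens adds exactly b.
gain_per_decade = predict(a, b, 1e8) - predict(a, b, 1e7)
```

If real measurements followed this form, the absence of a plateau would show up as the points continuing to track the fitted line at the largest budgets instead of flattening below it.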
