Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

Source: Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

Published: February 16, 2026

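📄 Summary (translated from Chinese)

This study proposes a scalable pipeline for automatically generating high-quality training data for web agents. The main challenge in identifying high-quality training instances is trajectory evaluation, that is, quantifying how much progress a trajectory makes toward task completion. The study introduces a novel constraint-based evaluation framework that provides a fine-grained assessment of that progress. This makes it possible to use partially successful trajectories, which significantly expands the amount of usable training data. The method is evaluated on BookingArena, a newly proposed benchmark of complex booking tasks across 20 popular websites, and the results show that the distilled student model outperforms open-source models.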

🏷️ Related Tags

#WebAgents #AutomaticDataGeneration #TrajectoryEvaluation #FineGrainedEvaluation #TrainingData

📄 English Summary

Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation

The paper presents a scalable pipeline for automatically generating high-quality training data for web agents. A major challenge in identifying high-quality training instances is trajectory evaluation, which quantifies the progress a trajectory makes towards task completion. To address it, the paper introduces a novel constraint-based evaluation framework that provides a fine-grained assessment of that progress. This makes it possible to use partially successful trajectories, significantly expanding the amount of usable training data. The method is evaluated on a new benchmark, BookingArena, which consists of complex booking tasks across 20 popular websites. The results show that the distilled student model outperforms open-source models.
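The summary does not spell out how the constraint-based evaluation works, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes that a booking task can be broken into independent constraints checked against the final state of a trajectory, that the progress score is the fraction of constraints satisfied, and that trajectories above a threshold are kept for training. The names `Constraint`, `Trajectory`, `progress_score`, `select_for_training`, and the `min_score` value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: the fields observed at the end of a trajectory.
State = Dict[str, str]


@dataclass
class Constraint:
    """One checkable requirement of a task (assumed structure)."""
    name: str
    check: Callable[[State], bool]


@dataclass
class Trajectory:
    """An agent run, reduced here to an id and the final state it reached."""
    task_id: str
    final_state: State


def progress_score(traj: Trajectory, constraints: List[Constraint]) -> float:
    """Return the fraction of constraints satisfied by the final state."""
    if not constraints:
        return 0.0
    satisfied = sum(1 for c in constraints if c.check(traj.final_state))
    return satisfied / len(constraints)


def select_for_training(
    trajs: List[Trajectory],
    constraints: List[Constraint],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Tuple[Trajectory, float]]:
    """Keep trajectories at or above the threshold, including partial successes."""
    scored = [(t, progress_score(t, constraints)) for t in trajs]
    return [(t, s) for t, s in scored if s >= min_score]


# Illustrative hotel-booking task with four made-up constraints.
constraints = [
    Constraint("destination", lambda s: s.get("destination") == "Paris"),
    Constraint("check_in", lambda s: s.get("check_in") == "2026-03-01"),
    Constraint("guests", lambda s: s.get("guests") == "2"),
    Constraint("room_type", lambda s: s.get("room_type") == "double"),
]

trajectories = [
    Trajectory("full", {"destination": "Paris", "check_in": "2026-03-01",
                        "guests": "2", "room_type": "double"}),
    Trajectory("partial", {"destination": "Paris", "check_in": "2026-03-01",
                           "guests": "2", "room_type": "single"}),
    Trajectory("failed", {"destination": "Rome"}),
]

kept = select_for_training(trajectories, constraints, min_score=0.5)
for traj, score in kept:
    print(traj.task_id, score)
```

Under a binary success check only the `full` trajectory would count. With a graded score, the `partial` trajectory (three of four constraints met) is also retained, which illustrates how partial successes can enlarge the usable training set.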
