📄 English Summary
AI Scientist via Synthetic Task Scaling
The emergence of AI agents has made automatic scientific discovery a feasible goal. Recent works have built agentic systems capable of conducting machine learning research, but they lack a principled approach to training these agents, and current large language models often produce ideas that sound plausible yet prove ineffective. To address this, the authors propose a novel synthetic environment generation pipeline aimed specifically at machine learning agents. The pipeline automatically synthesizes machine learning challenges compatible with the SWE-agent framework and consists of three stages: topic sampling, dataset proposal, and code generation. The resulting synthetic tasks are grounded in real machine learning datasets, which ensures that the proposed datasets are of practical use.
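The three stages plus the grounding requirement can be pictured as a simple generate-then-filter loop. Below is a minimal Python sketch under stated assumptions. It is not the authors' implementation. Every function name, the `KNOWN_DATASETS` registry, and the task file layout (`task.md`, `baseline.py`, `evaluate.py`) are hypothetical, and the LLM calls are replaced by deterministic stubs. The sketch only shows how a grounding check can discard proposed datasets that do not correspond to real ones.

```python
from dataclasses import dataclass, field
import random

# Hypothetical registry of real datasets used for grounding. A real pipeline
# would query an actual dataset hub instead of a hard-coded set.
KNOWN_DATASETS = {"mnist", "cifar10", "imdb", "ag_news"}


@dataclass
class SyntheticTask:
    topic: str
    dataset: str
    files: dict[str, str] = field(default_factory=dict)


def sample_topics(rng: random.Random, k: int) -> list[str]:
    # Stage 1 (stub): sample ML research topics; an LLM would generate these.
    pool = ["image classification", "sentiment analysis", "text classification"]
    return rng.sample(pool, k)


def propose_datasets(topic: str) -> list[str]:
    # Stage 2 (stub): propose candidate datasets for a topic. Some proposals
    # are deliberately fake to mimic hallucinated datasets.
    proposals = {
        "image classification": ["cifar10", "imagenet-xl-fake"],
        "sentiment analysis": ["imdb", "made_up_reviews"],
        "text classification": ["ag_news"],
    }
    return proposals.get(topic, [])


def generate_task_code(topic: str, dataset: str) -> dict[str, str]:
    # Stage 3 (stub): emit the files an agent would work on. This layout is
    # an illustrative assumption, not the SWE-agent task format.
    return {
        "task.md": f"Improve a baseline for {topic} on {dataset}.",
        "baseline.py": f"# baseline training script for {dataset}\n",
        "evaluate.py": "# scoring script\n",
    }


def build_tasks(seed: int = 0, k: int = 3) -> list[SyntheticTask]:
    rng = random.Random(seed)
    tasks = []
    for topic in sample_topics(rng, k):
        for dataset in propose_datasets(topic):
            # Grounding filter: skip proposals that do not name a known dataset.
            if dataset not in KNOWN_DATASETS:
                continue
            tasks.append(SyntheticTask(topic, dataset, generate_task_code(topic, dataset)))
    return tasks


if __name__ == "__main__":
    for task in build_tasks():
        print(task.topic, "->", task.dataset, sorted(task.files))
```

Running the script prints one line per surviving task. The fabricated names `imagenet-xl-fake` and `made_up_reviews` never reach code generation, which is the point of grounding proposals in real datasets before any task files are produced.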