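📄 Chinese Summary (translated)
As Earth science data accumulates rapidly, data repositories such as PANGAEA face significant scalability challenges. Although these repositories hold large numbers of datasets, citation metrics show that a considerable share of the data remains underutilized, which limits its reusability. PANGAEA-GPT is a hierarchical multi-agent framework intended to enable autonomous data discovery and analysis. The framework adopts a centralized Supervisor-Worker topology with strict data-type-aware routing, sandboxed deterministic code execution, and self-correction through execution feedback, which allows agents to diagnose and resolve runtime errors. Application scenarios covering physical oceanography and ecology demonstrate the framework's effectiveness.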
📄 English Summary
A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives
The rapid accumulation of Earth science data has created significant scalability challenges for repositories such as PANGAEA, which host vast collections of datasets. Yet citation metrics indicate that a substantial portion of these datasets remains underutilized, which limits data reuse. PANGAEA-GPT is a hierarchical multi-agent framework designed for autonomous data discovery and analysis. Unlike standard Large Language Model (LLM) wrappers, the architecture implements a centralized Supervisor-Worker topology with strict data-type-aware routing, sandboxed deterministic code execution, and self-correction through execution feedback, which lets agents diagnose and resolve runtime errors. Use-case scenarios spanning physical oceanography and ecology demonstrate the framework's effectiveness.
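The three mechanisms named in the summary (data-type-aware routing, isolated code execution, and self-correction from execution feedback) can be illustrated with a minimal Python sketch. This is not PANGAEA-GPT's implementation: the worker names, the `ROUTES` table, and the `toy_repair` function are hypothetical placeholders, the repair step stands in for what would presumably be an LLM call, and a plain subprocess is only a rough stand-in for a real sandbox.

```python
import subprocess
import sys
from typing import Callable, Dict, Tuple

# Hypothetical mapping from data type to worker agent (names are assumptions).
ROUTES: Dict[str, str] = {
    "netcdf": "oceanography_worker",
    "tabular": "ecology_worker",
}


def route(data_type: str) -> str:
    """Strict routing: an unknown data type is rejected rather than guessed."""
    if data_type not in ROUTES:
        raise ValueError(f"no worker registered for data type {data_type!r}")
    return ROUTES[data_type]


def run_isolated(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run code in a separate Python process and return (ok, stdout or stderr).

    The -I flag starts the interpreter in isolated mode. This is process
    isolation only, not a security sandbox.
    """
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return True, proc.stdout.strip()
    return False, proc.stderr.strip()


def solve_with_feedback(
    code: str,
    repair: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Execute code; on failure, pass the error text to `repair` and retry."""
    for attempt in range(1, max_attempts + 1):
        ok, output = run_isolated(code)
        if ok:
            return output, attempt
        code = repair(code, output)  # execution feedback drives the next try
    raise RuntimeError(f"still failing after {max_attempts} attempts")


def toy_repair(code: str, error: str) -> str:
    """Placeholder for a model-driven fix: handles one known error type."""
    if "NameError" in error:
        return "values = [1, 2, 3]\n" + code
    return code


if __name__ == "__main__":
    print(route("netcdf"))
    print(solve_with_feedback("print(sum(values))", toy_repair))
```

In this sketch the first execution fails with a `NameError`, the error text is handed to the repair function, and the patched code succeeds on the second attempt. Capping the loop with `max_attempts` keeps a worker from retrying indefinitely when the repair step cannot resolve the error.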