Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

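📄 Summary (translated from Chinese)

Large language models (LLMs) enable intelligent systems to reason, plan, and act on complex tasks, but how well they allocate resources under uncertainty remains unclear. Resource allocation differs from short-horizon reactive decision-making: it requires committing scarce resources over time while balancing competing objectives and preserving flexibility for future needs. The study proposes EnterpriseArena, the first benchmark for evaluating agents on long-horizon enterprise resource allocation. The benchmark simulates 132 months of enterprise decision-making, combining firm-level financial data, anonymized business documents, macroeconomic and industry signals, and expert-validated operating rules to capture CFO-style decision-making. The environment is partially observable and is intended to evaluate how LLM agents perform in dynamic settings.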

📄 English Summary

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

Large language models (LLMs) have enabled agentic systems capable of reasoning, planning, and acting across complex tasks, yet how effectively they allocate resources under uncertainty remains unclear. Unlike short-horizon reactive decisions, resource allocation requires committing scarce resources over time while balancing competing objectives and maintaining flexibility for future needs. This research introduces EnterpriseArena, the first benchmark for evaluating agents on long-horizon enterprise resource allocation. It simulates CFO-style decision-making over a 132-month period, integrating firm-level financial data, anonymized business documents, macroeconomic and industry signals, and expert-validated operating rules. The environment is partially observable and is designed to assess how LLM agents perform in dynamic enterprise contexts.
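
To make "partially observable, long-horizon allocation" concrete, here is a minimal toy sketch in Python. It is not the EnterpriseArena API or its operating rules: the class `ToyEnterpriseEnv`, the payoff formula, the fixed cost, and both policies are hypothetical and invented purely for illustration. Only the 132-month horizon comes from the summary. The agent sees its cash and a noisy demand signal, never the true demand, and each month decides what fraction of cash to commit versus hold in reserve.

```python
import random
from dataclasses import dataclass

HORIZON_MONTHS = 132   # horizon stated in the summary
FIXED_COST = 2.0       # hypothetical monthly operating cost


@dataclass
class HiddenState:
    cash: float     # visible to the agent
    demand: float   # latent driver; the agent only sees a noisy signal


class ToyEnterpriseEnv:
    """Hypothetical toy environment, not the benchmark's real interface."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.month = 0
        self.state = HiddenState(cash=100.0, demand=1.0)

    def observe(self) -> dict:
        # Partial observability: true demand is masked by Gaussian noise.
        signal = self.state.demand + self.rng.gauss(0.0, 0.1)
        return {"month": self.month, "cash": self.state.cash, "demand_signal": signal}

    def step(self, invest_fraction: float) -> bool:
        # Commit part of the cash now; the rest stays as a flexible reserve.
        fraction = min(max(invest_fraction, 0.0), 1.0)
        invested = self.state.cash * fraction
        reserve = self.state.cash - invested
        payoff = invested * (0.9 + 0.2 * self.state.demand)  # assumed payoff rule
        self.state.cash = reserve + payoff - FIXED_COST
        # Demand follows a clipped random walk that the agent cannot see.
        drift = self.rng.gauss(0.0, 0.05)
        self.state.demand = min(max(self.state.demand + drift, 0.2), 1.8)
        self.month += 1
        # Episode ends at the horizon or when the firm runs out of cash.
        return self.month >= HORIZON_MONTHS or self.state.cash <= 0.0


def signal_policy(obs: dict) -> float:
    # Hypothetical rule: commit more when the noisy signal looks strong.
    return 0.5 if obs["demand_signal"] >= 1.0 else 0.2


def run_episode(policy, seed: int = 0):
    env = ToyEnterpriseEnv(seed)
    done = False
    while not done:
        done = env.step(policy(env.observe()))
    return env.month, env.state.cash


if __name__ == "__main__":
    months, cash = run_episode(signal_policy, seed=1)
    print(f"survived {months} months, final cash {cash:.2f}")
```

In this sketch an LLM agent would take the place of `signal_policy`, mapping each monthly observation to an allocation. A policy that never invests simply burns the fixed cost until the cash is gone, which illustrates why holding everything in reserve is not a safe default over a long horizon.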
