FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation

📄 English Summary

FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation

FIRE is a comprehensive benchmark that evaluates both the theoretical financial knowledge of large language models (LLMs) and their ability to handle practical business scenarios. For the theoretical assessment, it curates a diverse set of questions from widely recognized financial qualification exams, testing how deeply LLMs understand and can apply financial knowledge. To assess the practical value of LLMs in real-world financial tasks, FIRE proposes a systematic evaluation matrix that categorizes complex financial domains and ensures coverage of essential subdomains and business activities. Based on this matrix, 3,000 financial scenario questions were collected, including closed-form decision questions, to comprehensively assess the financial intelligence and reasoning capabilities of LLMs.
