RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

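📄 Chinese Summary (translated)

RealChart2Code is a new large-scale benchmark containing more than 2,800 instances, built on real datasets with tasks that carry clear analytical intent. It is the first benchmark to systematically evaluate the ability to generate charts from large-scale raw data, and it also assesses iterative code refinement in a multi-turn conversational setting. A comprehensive evaluation of 14 leading vision-language models (VLMs) on RealChart2Code shows a significant performance drop compared with traditional methods. The study provides a new evaluation standard for chart generation and advances the development of chart generation technology.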

📄 English Summary

RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

RealChart2Code is a new large-scale benchmark comprising over 2,800 instances grounded in authentic datasets with tasks that have clear analytical intent. It is the first benchmark to systematically evaluate chart generation from large-scale raw data and to assess iterative code refinement in a multi-turn conversational setting. A comprehensive evaluation of 14 leading Vision-Language Models (VLMs) on RealChart2Code reveals significant performance degradation compared to traditional methods. This research establishes a new evaluation standard for the field of chart generation, advancing the development of chart generation technologies.
