ChartDiff: A Large-Scale Benchmark for Comparative Chart Understanding

Source: ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts

Published: April 1, 2026

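📄 Chinese Summary

ChartDiff is the first large-scale benchmark for cross-chart comparative summarization. It is designed to fill a gap in existing chart-understanding benchmarks, which lack coverage of comparative reasoning across multiple charts. The benchmark contains 8,541 chart pairs spanning diverse data sources, chart types, and visual styles. Each pair comes with an LLM-generated, human-verified summary describing differences in trends, fluctuations, and anomalies. ChartDiff was used to evaluate general-purpose models, chart-specialized models, and pipeline-based models. The results show that frontier general-purpose models achieve the best scores on GPT-based quality metrics, while the performance of specialized models was also validated.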

🏷️ Related Tags

#ChartUnderstanding #ComparativeSummarization #Benchmarking #DataVisualization #ArtificialIntelligence

📄 English Summary

ChartDiff: A Large-Scale Benchmark for Comprehending Pairs of Charts

ChartDiff introduces the first large-scale benchmark for cross-chart comparative summarization, addressing a gap in existing benchmarks, which focus primarily on single-chart interpretation. It consists of 8,541 chart pairs drawn from diverse data sources, chart types, and visual styles, each annotated with an LLM-generated, human-verified summary that describes differences in trends, fluctuations, and anomalies. The benchmark enables the evaluation of general-purpose, chart-specialized, and pipeline-based models. Results indicate that frontier general-purpose models achieve the highest scores on GPT-based quality metrics, while the performance of specialized models is also validated.
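The summaries above describe differences in three categories: trends, fluctuations, and anomalies. As a toy illustration of those categories only, the following Python sketch compares the numeric series behind two charts using simple heuristics: least-squares slope for trend, the standard deviation of step-to-step changes for fluctuation, and a z-score threshold for anomalies. ChartDiff's own summaries are LLM-generated and human-verified text, not the output of rules like these; `ChartPair`, `summarize`, and the 2.0 z-score threshold are hypothetical choices made for this example and are not part of the benchmark.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ChartPair:
    """Hypothetical container for the data behind one chart pair.

    Not the ChartDiff schema; each series needs at least two points.
    """
    series_a: list[float]
    series_b: list[float]


def slope(ys: list[float]) -> float:
    """Least-squares slope of ys against x = 0, 1, ..., n - 1 (the trend)."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def fluctuation(ys: list[float]) -> float:
    """Population std. dev. of step-to-step changes (a volatility proxy)."""
    steps = [b - a for a, b in zip(ys, ys[1:])]
    return pstdev(steps)


def anomalies(ys: list[float], z: float = 2.0) -> list[int]:
    """Indices whose z-score exceeds the threshold z (a crude outlier test)."""
    mu, sigma = mean(ys), pstdev(ys)
    if sigma == 0:
        return []
    return [i for i, y in enumerate(ys) if abs(y - mu) / sigma > z]


def direction(s: float) -> str:
    """Map a slope to a coarse trend label."""
    return "upward" if s > 0 else "downward" if s < 0 else "flat"


def summarize(pair: ChartPair) -> str:
    """Template-based comparative summary covering trend, fluctuation, anomalies."""
    sa, sb = slope(pair.series_a), slope(pair.series_b)
    fa, fb = fluctuation(pair.series_a), fluctuation(pair.series_b)
    if abs(fa - fb) < 1e-9:
        vol = "similar"
    else:
        vol = "A is more volatile" if fa > fb else "B is more volatile"
    return (
        f"Trend: A is {direction(sa)} (slope {sa:.2f}), "
        f"B is {direction(sb)} (slope {sb:.2f}). "
        f"Fluctuation: {vol} (A {fa:.2f}, B {fb:.2f}). "
        f"Anomalies: A at {anomalies(pair.series_a)}, "
        f"B at {anomalies(pair.series_b)}."
    )


if __name__ == "__main__":
    pair = ChartPair([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
    print(summarize(pair))
```

A fixed template like this can only report what its heuristics measure; a benchmark such as ChartDiff instead tests whether models can produce comparisons of this kind from the charts themselves.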

Sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others
