Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement
📄 Summary
Chart understanding is a quintessential information fusion task that requires the seamless integration of graphical and textual data to extract meaning. The emergence of Multimodal Large Language Models (MLLMs) has revolutionized this domain; however, the landscape of MLLM-based chart analysis remains fragmented and lacks systematic organization. This survey provides a comprehensive roadmap for this nascent frontier by structuring the core components of the domain. It begins by analyzing the fundamental challenges of fusing visual and linguistic information in charts. It then categorizes downstream tasks and datasets, introducing a novel taxonomy of canonical and non-canonical benchmarks to highlight the expanding scope of the field. Finally, it presents a comprehensive evaluation of existing research, identifying future research directions and potential cognitive-enhancement methods.