What Lies Beneath: A Call for Distribution-based Visual Question & Answer Datasets

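📄 Chinese Summary (English translation)

Visual Question Answering (VQA) has become an important benchmark for evaluating the image understanding of large multimodal models (LMMs). However, most existing VQA datasets focus on real-world images or simple chart analysis, and few concentrate on interpreting complex scientific charts. Many chart-analysis VQA datasets lack the raw data behind the charts, or assume a one-to-one correspondence between chart marks and the underlying data. This limitation makes it hard for models to understand data distributions, trends, outliers, and deeper statistical patterns, which are critical in science and data analysis. Existing methods usually treat charts as static images and ignore their nature as data visualization tools. Solving this requires new VQA datasets whose core feature is the complete underlying data distribution that each chart represents.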

📄 English Summary

What Lies Beneath: A Call for Distribution-based Visual Question & Answer Datasets

Visual Question Answering (VQA) has become an important benchmark for evaluating how well large multimodal models (LMMs) understand images. However, most existing VQA datasets focus on real-world images or simple chart analysis, and few target the interpretation of complex scientific charts. Many chart-focused VQA datasets either lack the underlying data that generated the charts or assume a one-to-one correspondence between chart marks and the raw data points. This limitation makes it hard for models to grasp data distributions, trends, outliers, and deeper statistical patterns, all of which are central to science and data analysis. Existing methods typically treat charts as static images and overlook their role as data visualizations.

Addressing this gap requires new VQA datasets that include the complete underlying data distribution each chart represents. Such datasets should cover a range of scientific chart types, including scatter plots, histograms, box plots, and heatmaps, each linked to its original numerical data. Questions should go beyond reading off individual values and should require understanding trends, correlations, outliers, statistical inference, and distribution characteristics (e.g., skewness, kurtosis, multimodality). For instance, a question could ask the model to identify the most common value range in a dataset, determine whether two variables are positively correlated, or pinpoint anomalies in a distribution.
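To make the example questions above concrete, here is a minimal Python sketch of how ground-truth answers could be computed once the underlying data is available alongside the chart. It is only an illustration, not the authors' pipeline: the function names, the fixed-width binning, the Pearson coefficient, the 1.5 × IQR outlier rule, and the moment-based skewness are all assumptions chosen for simplicity.

```python
import statistics
from collections import Counter


def most_common_bin(values, bin_width):
    """Return (low, high) for the fixed-width histogram bin with the most values."""
    counts = Counter(int(v // bin_width) for v in values)
    index, _ = counts.most_common(1)[0]
    return index * bin_width, (index + 1) * bin_width


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def skewness(values):
    """Population skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def build_qa_pairs(xs, ys, bin_width=1.0):
    """Hypothetical helper: derive question/answer pairs from raw (x, y) data."""
    low, high = most_common_bin(xs, bin_width)
    return [
        {"question": "Which range of x contains the most points?",
         "answer": (low, high)},
        {"question": "Are x and y positively correlated?",
         "answer": pearson_r(xs, ys) > 0},
        {"question": "Which x values are outliers?",
         "answer": iqr_outliers(xs)},
        {"question": "Is the distribution of x right-skewed?",
         "answer": skewness(xs) > 0},
    ]
```

None of these answers can be derived reliably from the rendered image alone when marks overlap or aggregate many points, which is the motivation for shipping the raw distribution with each chart.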
