📄 English Summary
Evaluating Prompting Strategies for Chart Question Answering with Large Language Models
Prompting strategies significantly affect the reasoning performance of large language models (LLMs), yet their role in chart-based question answering remains underexplored. A systematic evaluation of four widely used prompting paradigms (Zero-Shot, Few-Shot, Zero-Shot Chain-of-Thought, and Few-Shot Chain-of-Thought) is presented, conducted on the ChartQA dataset. The framework operates exclusively on structured chart data, isolating prompt structure as the only experimental variable, and evaluates performance using two metrics: Accuracy and Exact Match. Results from 1,200 diverse ChartQA samples indicate that Few-Shot Chain-of-Thought prompting consistently yields the highest accuracy (up to 78.2%), particularly on reasoning-intensive questions, while Few-Shot prompting chiefly improves answer formatting.
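The four paradigms above differ only in whether the prompt carries a worked exemplar (Few-Shot) and a step-by-step reasoning trigger (Chain-of-Thought). A minimal sketch of how such prompts and the Exact Match metric might be assembled is shown below; the exemplar text, function names, and prompt wording are illustrative assumptions, not the paper's actual code.

```python
# Illustrative sketch of the four prompting paradigms (Zero-Shot, Few-Shot,
# Zero-Shot CoT, Few-Shot CoT) and an Exact Match check.
# Exemplar content and wording are assumptions for demonstration only.

EXEMPLAR = (
    'Chart data: {"2019": 40, "2020": 55}\n'
    "Question: By how much did the value grow from 2019 to 2020?\n"
    "Reasoning: 55 - 40 = 15.\n"
    "Answer: 15"
)

def build_prompt(chart_data: str, question: str,
                 few_shot: bool = False, cot: bool = False) -> str:
    """Assemble a prompt: `few_shot` prepends a worked exemplar,
    `cot` appends a step-by-step reasoning trigger."""
    parts = []
    if few_shot:
        parts.append(EXEMPLAR)
    parts.append(f"Chart data: {chart_data}")
    parts.append(f"Question: {question}")
    if cot:
        parts.append("Let's think step by step.")
    return "\n".join(parts)

def exact_match(pred: str, gold: str) -> bool:
    """Exact Match after trivial whitespace/case normalization."""
    return pred.strip().lower() == gold.strip().lower()

# Few-Shot CoT: exemplar + reasoning trigger, the best-performing setup.
prompt = build_prompt('{"A": 10, "B": 30}', "Which category is larger?",
                      few_shot=True, cot=True)
print(prompt)
```

In this setup, switching paradigms is just toggling the two flags, which mirrors how the evaluation isolates prompt structure as the only experimental variable.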