📄 English Summary
Evaluating Prompting Strategies for Chart Question Answering with Large Language Models
Prompting strategies significantly affect the reasoning performance of large language models (LLMs), yet their role in chart-based question answering remains underexplored. A systematic evaluation of four widely used prompting paradigms (Zero-Shot, Few-Shot, Zero-Shot Chain-of-Thought, and Few-Shot Chain-of-Thought) is presented, conducted on the ChartQA dataset. The framework operates exclusively on structured chart data, isolating prompt structure as the only experimental variable, and evaluates performance using two metrics: Accuracy and Exact Match. Results from 1,200 diverse ChartQA samples indicate that Few-Shot Chain-of-Thought prompting consistently yields the highest accuracy (up to 78.2%), particularly on reasoning-intensive questions, while Few-Shot prompting chiefly improves answer formatting.
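The four paradigms above differ only in whether the prompt carries a worked exemplar (Few-Shot) and a step-by-step reasoning trigger (Chain-of-Thought). A minimal sketch of how such prompts and the Exact Match metric might be assembled is shown below; the exemplar text, function names, and prompt wording are illustrative assumptions, not the paper's actual code.

```python
# Illustrative sketch of the four prompting paradigms (Zero-Shot, Few-Shot,
# Zero-Shot CoT, Few-Shot CoT) and an Exact Match check.
# Exemplar content and wording are assumptions for demonstration only.

EXEMPLAR = (
    'Chart data: {"2019": 40, "2020": 55}\n'
    "Question: By how much did the value grow from 2019 to 2020?\n"
    "Reasoning: 55 - 40 = 15.\n"
    "Answer: 15"
)

def build_prompt(chart_data: str, question: str,
                 few_shot: bool = False, cot: bool = False) -> str:
    """Assemble a prompt: `few_shot` prepends a worked exemplar,
    `cot` appends a step-by-step reasoning trigger."""
    parts = []
    if few_shot:
        parts.append(EXEMPLAR)
    parts.append(f"Chart data: {chart_data}")
    parts.append(f"Question: {question}")
    if cot:
        parts.append("Let's think step by step.")
    return "\n".join(parts)

def exact_match(pred: str, gold: str) -> bool:
    """Exact Match after trivial whitespace/case normalization."""
    return pred.strip().lower() == gold.strip().lower()

# Few-Shot CoT: exemplar + reasoning trigger, the best-performing setup.
prompt = build_prompt('{"A": 10, "B": 30}', "Which category is larger?",
                      few_shot=True, cot=True)
print(prompt)
```

In this setup, switching paradigms is just toggling the two flags, which mirrors how the evaluation isolates prompt structure as the only experimental variable.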