Learning to Select Visual In-Context Demonstrations


📄 English Summary

Learning to Select Visual In-Context Demonstrations

Multimodal Large Language Models (MLLMs) adapt to visual tasks through in-context learning (ICL), which is heavily reliant on the quality of demonstrations. The prevalent demonstration selection strategy employs unsupervised k-Nearest Neighbor (kNN) search. While this approach is straightforward, it proves sub-optimal for complex factual regression tasks, as it tends to select redundant examples that do not adequately represent the full output range of the task. This research reframes the selection process as a sequential decision-making problem and introduces Learning to Select Demonstrations (LSD), which trains a Reinforcement Learning agent to construct optimal demonstration sets. Utilizing a Dueling DQN with a query-centric Transformer Decoder, the agent learns a policy that maximizes the downstream performance of MLLMs.
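The two selection strategies contrasted above can be sketched in pure Python. Everything here is an illustrative assumption rather than the paper's implementation: the embeddings are toy 2-D vectors, and `toy_value` is a hand-written stand-in for the learned value function (in LSD proper, a Dueling DQN over a query-centric Transformer decoder trained against downstream MLLM performance).

```python
# Hypothetical sketch of the two demonstration-selection strategies:
# similarity-first kNN vs. sequential selection under a value function.
# Embeddings, pool layout, and toy_value are illustrative assumptions.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knn_select(query_emb, pool, k):
    """Similarity-first baseline: the k nearest neighbours of the query.
    Tends to return near-duplicates when the pool clusters around the query."""
    ranked = sorted(pool, key=lambda d: cosine(query_emb, d["emb"]), reverse=True)
    return ranked[:k]

def toy_value(query_emb, chosen, cand):
    """Stand-in for the trained agent's value function: relevance to the
    query, discounted by redundancy with demonstrations already chosen."""
    rel = cosine(query_emb, cand["emb"])
    red = max((cosine(cand["emb"], d["emb"]) for d in chosen), default=0.0)
    return rel * (1.0 - red)

def sequential_select(query_emb, pool, k, value_fn):
    """Sequential decision-making view: greedily pick, at each step, the
    candidate the value function scores highest given the set so far."""
    chosen, remaining = [], list(pool)
    for _ in range(k):
        best = max(remaining, key=lambda d: value_fn(query_emb, chosen, d))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

On a toy pool where two candidates sit almost on top of the query and a third is moderately relevant but distinct, `knn_select` returns the two near-duplicates, while `sequential_select` with the redundancy-aware value trades the second duplicate for the distinct example — the failure mode and the remedy the summary describes.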

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others