Egocentric Bias in Vision-Language Models

Source: Egocentric Bias in Vision-Language Models

Published: February 19, 2026

📄 Summary

This study introduces FlipSet, a diagnostic benchmark for Level-2 visual perspective taking (L2 VPT) in vision-language models (VLMs). The task involves simulating 180-degree rotations of 2D character strings from another agent's perspective, isolating spatial transformations from the complexities of 3D scenes. Evaluation of 103 VLMs reveals a systematic egocentric bias: the majority perform below chance level, with approximately three-quarters of errors reproducing the camera viewpoint. Control experiments expose a compositional deficit—models achieve high theory-of-mind accuracy and above-chance mental rotation in isolation, yet fail catastrophically in integrated tasks.
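
The summary does not include FlipSet's stimuli or code, but the transformation it describes is easy to make concrete. The sketch below is a minimal Python illustration, not the benchmark itself: the grid format and the glyph mapping are assumptions, chosen as examples of characters whose shapes map onto other characters under a half-turn.

```python
# Minimal sketch of the spatial transformation L2 VPT asks for:
# a 180-degree in-plane rotation of a 2D character grid.
# NOTE: illustration only; the glyph mapping is a hypothetical
# example, not FlipSet's actual character set.
ROTATED_GLYPH = {
    "b": "q", "q": "b",
    "d": "p", "p": "d",
    "u": "n", "n": "u",
    "6": "9", "9": "6",
}

def rotate_180(grid: list[str]) -> list[str]:
    """Return the grid as seen after a 180-degree in-plane rotation.

    Reversing the row order and each row's character order handles
    position; ROTATED_GLYPH approximates the per-glyph flip (characters
    without a rotated counterpart are kept unchanged).
    """
    return [
        "".join(ROTATED_GLYPH.get(ch, ch) for ch in reversed(row))
        for row in reversed(grid)
    ]

# The camera sees "b6d"; an agent standing opposite the camera sees "p9q".
assert rotate_180(["b6d"]) == ["p9q"]
```

An egocentrically biased model answers with the camera view ("b6d") rather than the rotated view, which is exactly the error mode the study counts. The headline error statistic can likewise be read as a simple scoring rule; the following sketch assumes exact-match scoring, since the paper's precise metric is not given in this summary:

```python
def egocentric_error_share(predictions, camera_views, targets):
    """Fraction of a model's errors that reproduce the camera viewpoint."""
    errors = [(pred, cam) for pred, cam, tgt
              in zip(predictions, camera_views, targets) if pred != tgt]
    if not errors:
        return 0.0  # no errors, hence no egocentric errors
    return sum(pred == cam for pred, cam in errors) / len(errors)
```

Under this reading, a value near 0.75 would match the reported finding that roughly three-quarters of errors reproduce the camera viewpoint.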
