Proact-VL: A Proactive VideoLLM for Real-Time AI Companions
📄 Summary
Proactive, real-time interaction is crucial for human-like AI companions, yet it poses three main challenges: achieving low-latency inference under continuous streaming inputs, autonomously deciding when to respond, and controlling both the quality and quantity of generated content to meet real-time constraints. This research instantiates AI companions through two gaming roles, commentator and guide, chosen because they lend themselves to automatic evaluation. It introduces a Live Gaming Benchmark, a large-scale dataset covering three representative scenarios: solo commentary, co-commentary, and user guidance. It also presents Proact-VL, a general framework that adapts multimodal language models for proactive, real-time interaction.
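The respond/stay-silent decision described above can be pictured as a per-frame gating loop over the video stream. The sketch below is purely illustrative and not from the paper: `salience` stands in for whatever score the model's decision head would produce, and the `threshold`, `cooldown`, and `max_tokens` parameters are hypothetical knobs for controlling when and how much the companion speaks.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    timestamp: float  # seconds into the stream
    salience: float   # hypothetical "worth commenting on" score in [0, 1]

def proactive_loop(frames: List[Frame],
                   threshold: float = 0.7,
                   cooldown: float = 2.0,
                   max_tokens: int = 24) -> List[str]:
    """Toy stand-in for a proactive VideoLLM's control loop: for each
    incoming frame, autonomously decide whether to emit a (length-capped)
    utterance or stay silent."""
    outputs: List[str] = []
    last_spoken = float("-inf")
    for frame in frames:
        # Speak only when the stream is salient enough AND we are past the
        # cooldown window -- a crude form of real-time quantity control.
        if frame.salience >= threshold and frame.timestamp - last_spoken >= cooldown:
            outputs.append(
                f"[t={frame.timestamp:.1f}s] commentary (<= {max_tokens} tokens)"
            )
            last_spoken = frame.timestamp
    return outputs

stream = [Frame(0.0, 0.1), Frame(1.0, 0.9), Frame(1.5, 0.95), Frame(4.0, 0.8)]
print(proactive_loop(stream))
```

In this toy run, the frame at t=1.5s is salient but suppressed by the cooldown, so only the t=1.0s and t=4.0s frames trigger utterances; a real system would replace the threshold with a learned decision and the placeholder string with constrained generation.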