Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models

📄 Summary

Large Multimodal Models (LMMs) have achieved significant success in vision-language tasks, yet their extensive parameter counts are often underutilized during training and inference. This research proposes reusing model parameters through recursive refinement to extract stronger multimodal representations without increasing model size. The proposed RecursiveVLM is a recursive Transformer architecture designed for LMMs. Two key innovations enable effective recursion: first, a Recursive Connector aligns features across recursion steps by fusing intermediate-layer hidden states and applying modality-specific projections, respecting the distinct statistical structures of vision and language tokens; second, a novel mechanism enhances the model's expressive capability.
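The summary's core idea can be sketched in code: the same transformer weights are applied repeatedly, and between recursion steps a connector fuses intermediate-layer hidden states and routes vision and language tokens through separate projections. The paper's exact design is not given here, so this is a minimal numpy sketch under stated assumptions: `RecursiveConnector`, mean fusion over layers, and random matrices standing in for learned projections are all illustrative choices, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_proj(d_model):
    # Random linear map standing in for a learned projection (assumption).
    return rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

class RecursiveConnector:
    """Hypothetical connector: fuse intermediate hidden states and apply
    modality-specific projections before the next recursion step."""

    def __init__(self, d_model):
        self.w_vision = make_proj(d_model)  # projection for vision tokens
        self.w_text = make_proj(d_model)    # projection for language tokens

    def __call__(self, hidden_states, vision_mask):
        # hidden_states: list of (seq_len, d_model) arrays from intermediate layers.
        # vision_mask: boolean (seq_len,) array, True where the token is visual.
        fused = np.mean(hidden_states, axis=0)  # simple mean fusion (assumption)
        out = np.empty_like(fused)
        out[vision_mask] = fused[vision_mask] @ self.w_vision
        out[~vision_mask] = fused[~vision_mask] @ self.w_text
        return out

def recursive_forward(x, vision_mask, transformer, connector, steps=2):
    """Reuse the same transformer weights across `steps` recursion passes,
    realigning features with the connector between passes."""
    for _ in range(steps):
        intermediates = transformer(x)  # list of per-layer hidden states
        x = connector(intermediates, vision_mask)
    return x
```

A toy run with a stand-in "transformer" (any callable returning a list of per-layer hidden states) shows the loop preserves the sequence shape while refining representations in place, which is what lets recursion depth grow without adding parameters.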

