📄 English Summary
Understanding Pruning Regimes in Vision-Language Models Through Domain-Aware Layer Selection
This study reveals that transformer-based vision-language models (VLMs) exhibit significant depth redundancy, yet the effect of removing specific decoder layers remains poorly understood, particularly in domains that require a tight coupling between perception and multi-step reasoning. Structured decoder-layer pruning is investigated through the lens of domain-aware activation similarity, which measures how strongly each layer transforms its representations on math versus non-math inputs. This yields three simple ranking criteria (math-aware, non-math-aware, and mixed) that identify the layers whose input-output activations change the least within a target domain. Across two state-of-the-art VLMs and a broad suite of math and general multimodal benchmarks, a consistent three-regime structure emerges.
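The criterion described above can be sketched in a few lines: for each decoder layer, compute the cosine similarity between the hidden states entering and leaving that layer on inputs from a given domain, then rank layers so that those changing their representations the least are pruned first. This is a minimal illustration, not the paper's implementation; the function names, the mixed criterion as a plain average of the two domain scores, and the use of raw hidden-state arrays (rather than a specific VLM's outputs) are all assumptions for the sketch.

```python
import numpy as np

def layer_change_scores(hidden_states):
    """hidden_states: list of (tokens, dim) arrays, the residual-stream
    activations at each layer boundary for one domain's inputs.
    Returns per-layer mean cosine similarity between a layer's input and
    output; high similarity means the layer transforms representations little."""
    scores = []
    for h_in, h_out in zip(hidden_states[:-1], hidden_states[1:]):
        num = np.sum(h_in * h_out, axis=-1)
        den = np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1) + 1e-8
        scores.append(float(np.mean(num / den)))
    return scores

def prune_ranking(math_scores, nonmath_scores, mode="math"):
    """Rank layers for pruning under one of the three criteria.
    'mixed' is sketched here as a simple average of the two domains (assumption)."""
    if mode == "math":
        s = math_scores
    elif mode == "nonmath":
        s = nonmath_scores
    else:
        s = [(m + n) / 2 for m, n in zip(math_scores, nonmath_scores)]
    # Layers with the highest input-output similarity are pruned first.
    return sorted(range(len(s)), key=lambda i: s[i], reverse=True)
```

In practice the `hidden_states` list would come from a forward pass over a domain-specific calibration set (e.g. via a model's hidden-state outputs), averaged over examples before ranking.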