📄 Chinese Summary
Recent vision-language models (e.g., ColPali) enable fine-grained Visual Document Retrieval (VDR), but their index vectors incur prohibitive storage overhead. Training-free pruning schemes (e.g., EOS-attention-based methods) can reduce index vector size by roughly 60% without model adaptation. However, under extreme compression (above 80%), these methods often underperform random selection. Prior work (e.g., Light-ColPali) attributes this degradation at high compression ratios to a fundamental limitation of existing pruning strategies. This work proposes Structural Anchor Pruning (SAP), a novel method designed to address this problem.
📄 English Summary
Look in the Middle: Structural Anchor Pruning for Scalable Visual RAG Indexing
Recent Vision-Language Models (e.g., ColPali) enable fine-grained Visual Document Retrieval (VDR), yet they incur prohibitive index storage overhead. Training-free pruning methods, such as those based on EOS attention, can shrink the index by roughly 60% without model adaptation; however, under high compression (above 80%) they often underperform even random selection. Prior work (e.g., Light-ColPali) attributes this degradation to a fundamental limitation of existing pruning strategies. This work introduces Structural Anchor Pruning (SAP), a training-free method for scalable VDR indexing. SAP identifies and preserves highly informative, structurally representative 'anchor' features within visual documents, sharply reducing index size while maintaining retrieval performance. Rather than requiring additional training, SAP leverages the model's inherent attention mechanisms together with the document's structural information to guide pruning: it analyzes the importance distribution across document regions and prioritizes retaining feature vectors located at the visual center or closely associated with core content. This 'Look in the Middle' strategy captures both the global semantics and the crucial details of a document, sustaining high retrieval accuracy even under extreme compression. Evaluated on multiple visual document datasets, SAP outperforms existing training-free pruning methods and random-selection baselines when compressing the index by more than 80%. Unlike methods such as Light-ColPali that require model adaptation, SAP offers an efficient indexing optimization without sacrificing model generality.
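The pruning idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function `sap_prune`, the blend weight `alpha`, and the specific centrality term are all hypothetical assumptions used only to show how attention-based importance and spatial ("look in the middle") centrality might be combined to select anchor vectors under a target compression ratio.

```python
import numpy as np

def sap_prune(embeddings, attn_scores, grid_h, grid_w, keep_ratio=0.2, alpha=0.5):
    """Hypothetical sketch of anchor-style pruning (not the paper's exact method).

    embeddings:  (N, D) patch embedding matrix, N = grid_h * grid_w
    attn_scores: (N,) per-patch importance, e.g., attention mass from the model
    keep_ratio:  fraction of vectors to retain (0.2 => 80% compression)
    alpha:       blend between attention importance and spatial centrality
    """
    ys, xs = np.divmod(np.arange(grid_h * grid_w), grid_w)
    # Spatial centrality: 1 at the grid centre, decaying toward the borders,
    # encoding the "Look in the Middle" prior.
    cy, cx = (grid_h - 1) / 2, (grid_w - 1) / 2
    dist = np.sqrt(((ys - cy) / grid_h) ** 2 + ((xs - cx) / grid_w) ** 2)
    centrality = 1.0 - dist / dist.max()
    # Normalise attention scores to [0, 1] so the two terms are comparable.
    a = (attn_scores - attn_scores.min()) / (np.ptp(attn_scores) + 1e-9)
    score = alpha * a + (1 - alpha) * centrality
    # Keep the top-scoring patches as the retained index vectors.
    k = max(1, int(round(keep_ratio * len(score))))
    keep = np.sort(np.argsort(score)[-k:])
    return embeddings[keep], keep
```

A usage example: for a 16x16 patch grid with 8-dimensional embeddings, `keep_ratio=0.2` retains 51 of 256 vectors, i.e., roughly the 80% compression regime discussed above.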