OTPrune: Distribution-Aligned Visual Token Pruning via Optimal Transport

📄 English Summary

OTPrune: Distribution-Aligned Visual Token Pruning via Optimal Transport

OTPrune is a training-free framework that formulates visual token pruning as distribution alignment via optimal transport (OT). By minimizing the 2-Wasserstein distance between the full and pruned token distributions, OTPrune effectively reduces inference costs while preserving local diversity and global representativeness. Additionally, a tractable submodular objective is derived to enable efficient optimization, and theoretical proofs of its effectiveness are provided. This framework offers a novel approach for multi-modal large language models (MLLMs) in visual-language reasoning, particularly in addressing the issue of redundant visual tokens.
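The tractable submodular surrogate mentioned above can be illustrated with a standard construction. The sketch below is not the paper's exact objective: it greedily maximizes a facility-location function F(S) = Σᵢ maxⱼ∈S sim(i, j), a common submodular proxy for keeping a pruned token set distributionally close to the full set, for which greedy selection carries the classic (1 − 1/e) approximation guarantee. The function name and similarity choice (negative squared Euclidean distance) are illustrative assumptions.

```python
import numpy as np

def greedy_prune(tokens: np.ndarray, k: int) -> list[int]:
    """Pick k token indices by greedily maximizing a facility-location
    objective F(S) = sum_i max_{j in S} sim(i, j) -- an illustrative
    submodular surrogate for distribution-aligned pruning, not the
    paper's exact formulation."""
    n = tokens.shape[0]
    sq = (tokens ** 2).sum(axis=1)
    # Similarity = negative squared Euclidean distance between tokens.
    sim = -(sq[:, None] + sq[None, :] - 2.0 * tokens @ tokens.T)
    best = np.full(n, -np.inf)  # coverage of each token by the chosen set
    selected: list[int] = []
    for _ in range(k):
        # Marginal coverage if each candidate were added to the set.
        gains = np.maximum(best[:, None], sim).sum(axis=0)
        gains[selected] = -np.inf  # never pick the same token twice
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    return selected

# Example: keep 8 of 64 hypothetical 16-dim visual tokens.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(64, 16))
kept = greedy_prune(tokens, 8)
```

The returned indices select rows of the token matrix (`tokens[kept]`), so the pruned sequence can be fed to the LLM in place of the full one.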


Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others