📄 Abstract (translated from Chinese)
Multimodal pretraining is highly effective at building general-purpose representations, but in many real-world deployment scenarios the downstream fine-tuning stage relies mainly on a single modality. Conventional pretraining strategies treat all modalities equally, which can leave the modality most critical to the actual application under-optimized. This paper proposes Finetune-Informed Pretraining (FIP), a model-agnostic method that biases representation learning toward the modality primarily used in downstream tasks. The core idea of FIP is to introduce modality-aware losses or attention mechanisms during pretraining that selectively strengthen or weaken each modality's contribution according to how heavily the downstream task depends on it.
📄 English Summary
Finetune-Informed Pretraining Boosts Downstream Performance
Multimodal pretraining is highly effective for developing general-purpose representations. However, in many practical deployments, downstream fine-tuning predominantly relies on a single modality. Conventional pretraining strategies treat all modalities uniformly, which can lead to suboptimal representations for the modality that is critical to the actual application. This paper introduces Finetune-Informed Pretraining (FIP), a model-agnostic methodology designed to bias representation learning toward the primary modality used during downstream fine-tuning.

The core idea of FIP is to integrate modality-aware loss functions or attention mechanisms during the pretraining phase. These mechanisms selectively enhance or diminish the contribution of each modality based on its anticipated importance in downstream tasks. For instance, if a downstream task depends heavily on the text modality, FIP allocates more capacity during pretraining to learning textual features, or assigns higher weight to the text modality when fusing multimodal information. This biased learning can be achieved by introducing regularization terms into the pretraining objective or by adjusting the loss weights of tasks associated with different modalities.

To set these weights, FIP can leverage prior knowledge about downstream tasks, such as per-modality feature importance scores, or estimate modality importance during pretraining from a small subset of downstream task data. In this way, FIP ensures that the pretrained model produces more robust and discriminative representations for the modality that matters most downstream, significantly boosting performance on tasks where that modality is dominant.
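The weighting scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`estimate_modality_weights`, `modality_weighted_loss`) and the use of softmax over per-modality probe scores (e.g., linear-probe accuracy on a small downstream sample) are assumptions chosen to make the idea concrete.

```python
import math

def estimate_modality_weights(probe_scores, temperature=1.0):
    """Turn per-modality probe scores (hypothetically, linear-probe accuracy
    measured on a small downstream subset) into normalized importance weights
    via a softmax. A lower temperature sharpens the bias toward the modality
    the downstream task depends on most."""
    exps = {m: math.exp(s / temperature) for m, s in probe_scores.items()}
    z = sum(exps.values())
    return {m: e / z for m, e in exps.items()}

def modality_weighted_loss(losses, weights):
    """Combine per-modality pretraining losses (e.g., a text loss and an
    image loss) into a single objective, weighted by estimated importance."""
    assert set(losses) == set(weights), "losses and weights must cover the same modalities"
    total = sum(weights.values())
    return sum((weights[m] / total) * losses[m] for m in losses)
```

For example, if a text probe scores 0.9 and an image probe 0.3 on the downstream sample, `estimate_modality_weights({"text": 0.9, "image": 0.3}, temperature=0.2)` yields weights that favor text, so the text pretraining loss dominates the combined objective.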
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others