OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows
📄 Summary
OMG-Agent: Toward Robust Missing Modality Generation with Decoupled Coarse-to-Fine Agentic Workflows
Multimodal learning frequently encounters missing modality challenges in real-world applications. Existing approaches, often relying on joint training or modality imputation, struggle to maintain robustness in complex scenarios. This paper introduces OMG-Agent, a novel framework for robust missing modality generation, characterized by its decoupled coarse-to-fine agentic workflows. OMG-Agent comprises two collaborative agents: a coarse-grained agent and a fine-grained agent. The coarse-grained agent initially leverages available modality information to generate a preliminary representation of the missing modality, capturing its primary features and global structure. Subsequently, the fine-grained agent iteratively refines this initial representation, incorporating multimodal information to progressively enhance the details of the missing modality, ensuring high generation quality and consistency with existing modalities. This decoupled design enables the model to effectively handle varying degrees of modality absence and significantly improves the robustness of the generated outputs. Experimental results demonstrate that OMG-Agent substantially outperforms state-of-the-art methods across multiple benchmark datasets, particularly excelling in addressing intricate missing patterns and boosting downstream task performance when utilizing the generated modalities. This work presents a promising new solution for the missing modality problem in multimodal learning.
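The decoupled workflow described above can be sketched in minimal form. This is an illustrative toy, not the paper's implementation: the function names (`coarse_agent`, `fine_agent`, `generate_missing`), the embedding dimension, and the simple residual-refinement rule are all assumptions standing in for the actual agents. The coarse pass aggregates available modality embeddings into a global draft of the missing modality; the fine pass then iteratively nudges that draft toward consistency with each available modality.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # shared embedding dimension (illustrative choice)


def coarse_agent(available: dict) -> np.ndarray:
    """Coarse pass: estimate the missing modality's global structure
    as a simple aggregate of the available modality embeddings."""
    stacked = np.stack(list(available.values()))
    return stacked.mean(axis=0)


def fine_agent(draft: np.ndarray, available: dict,
               steps: int = 5, lr: float = 0.5) -> np.ndarray:
    """Fine pass: iteratively refine the draft toward consistency with
    each available modality (a toy stand-in for multimodal fusion)."""
    refined = draft.copy()
    for _ in range(steps):
        for emb in available.values():
            # Residual update pulling the draft toward this modality.
            refined += lr * (emb - refined) / len(available)
    return refined


def generate_missing(available: dict) -> np.ndarray:
    """Decoupled workflow: coarse draft first, then fine refinement."""
    return fine_agent(coarse_agent(available), available)


# Example: text and audio embeddings are present, the image modality is missing.
available = {"text": rng.normal(size=DIM), "audio": rng.normal(size=DIM)}
imputed_image = generate_missing(available)
```

The decoupling is what the two-function split models: `coarse_agent` can be swapped for any global-structure generator and `fine_agent` for any refinement loop, so each can be improved or replaced independently, mirroring how the framework handles varying degrees of modality absence.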