📄 Chinese Abstract (translated)
Federated learning (FL) faces the challenge of non-independent and identically distributed (non-IID) client data, which severely degrades global model performance, especially in multimodal perception settings. Conventional methods often fail to resolve the semantic discrepancies among clients, leaving multimedia systems with insufficient perception capability. To address this, the SemanticFL framework is proposed, which leverages the rich semantic representations of pre-trained diffusion models to provide privacy-preserving guidance for local training. The method draws multi-layer semantic representations from a pre-trained Stable Diffusion model, including VAE-encoded latents and the U-Net feature hierarchy, to strengthen the model under heterogeneous data distributions. In this way, SemanticFL effectively improves the accuracy and robustness of multimodal perception tasks.
📄 English Summary
Diffusion-Guided Semantic Consistency for Multimodal Heterogeneity
Federated learning (FL) faces significant challenges due to non-independent and identically distributed (non-IID) client data, which severely impacts global model performance, particularly in multimodal perception scenarios. Conventional approaches often fail to address the semantic discrepancies among clients, resulting in suboptimal performance for multimedia systems that require robust perception capabilities. To tackle this issue, SemanticFL is introduced as a novel framework that harnesses the rich semantic representations from pre-trained diffusion models to provide privacy-preserving guidance for local training. This approach utilizes multi-layer semantic representations from a pre-trained Stable Diffusion model, including VAE-encoded latents and U-Net hierarchical structures, thereby enhancing the model's performance in heterogeneous data environments. SemanticFL effectively improves the accuracy and robustness of multimodal perception tasks.
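The summary describes using diffusion-model representations as privacy-preserving guidance during local training. A minimal sketch of one plausible realization is an extra regularizer that pulls a client's feature vector toward a frozen "semantic anchor" (standing in for a pooled VAE-latent or U-Net feature from Stable Diffusion). The names `semantic_consistency_loss`, `anchor_feat`, and the weight `lam` are illustrative assumptions, not the paper's actual formulation, which this summary does not specify.

```python
import numpy as np

def semantic_consistency_loss(client_feat: np.ndarray,
                              anchor_feat: np.ndarray) -> float:
    """Cosine-distance penalty between a client's feature vector and a
    frozen semantic anchor (hypothetically, a pooled diffusion feature)."""
    c = client_feat / (np.linalg.norm(client_feat) + 1e-8)
    a = anchor_feat / (np.linalg.norm(anchor_feat) + 1e-8)
    return float(1.0 - np.dot(c, a))

def local_objective(task_loss: float,
                    client_feat: np.ndarray,
                    anchor_feat: np.ndarray,
                    lam: float = 0.1) -> float:
    """Regularized local objective: task loss plus the weighted
    semantic-consistency penalty (lam is an assumed hyperparameter)."""
    return task_loss + lam * semantic_consistency_loss(client_feat, anchor_feat)

# Aligned features incur no penalty; orthogonal ones are penalized.
aligned = semantic_consistency_loss(np.array([1.0, 0.0]), np.array([2.0, 0.0]))
orthogonal = semantic_consistency_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Because the anchor is computed from a frozen pre-trained model rather than shared across clients, only model updates leave each client, which is consistent with the privacy-preserving guidance the summary claims.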
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others