Relational graph-driven differential denoising and diffusion attention fusion for multimodal conversational emotion recognition

📄 Abstract

In real-world scenarios, audio and video signals are often degraded by environmental noise and limited acquisition conditions, so the extracted features carry excessive noise. In addition, data quality and information-carrying capacity are unevenly distributed across modalities. Together, these issues cause information distortion and weight bias that impair overall recognition performance. Most existing methods overlook the impact of noisy modalities and rely on implicit weighting to model modality importance, failing to account explicitly for the dominant contribution of the textual modality to emotion understanding. To address these challenges, a relation-aware denoising and diffusion attention fusion method is proposed to improve the accuracy of multimodal conversational emotion recognition.
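The abstract does not spell out the fusion mechanism, but the idea of explicitly privileging the textual modality (rather than relying on implicit weighting) can be sketched as a text-anchored attention over modality features. The following is a minimal illustrative sketch, not the paper's actual method; all function and variable names are hypothetical, and a simple dot-product attention stands in for the proposed diffusion attention:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def text_anchored_fusion(text, audio, video):
    """Fuse modality features with the text features as the attention query,
    so the textual signal explicitly dominates the fused representation.

    Each input has shape (seq_len, dim): one feature vector per utterance.
    Returns the fused features (seq_len, dim) and the per-utterance
    modality weights (3, seq_len).
    """
    d = text.shape[-1]
    modalities = np.stack([text, audio, video], axis=0)       # (3, seq_len, dim)
    # Scaled dot-product of the text query against each modality's features.
    scores = np.einsum('sd,msd->ms', text, modalities) / np.sqrt(d)
    weights = softmax(scores, axis=0)                         # sums to 1 over modalities
    # Weighted sum over modalities gives the fused representation.
    fused = np.einsum('ms,msd->sd', weights, modalities)
    return fused, weights
```

Because the text features serve as the query, the text modality always scores highly against itself, giving it a built-in advantage, while noisy audio or video utterances that diverge from the textual signal receive lower weight. The graph-based differential denoising step described in the title would precede such a fusion and is not modeled here.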

