Dynamic Fusion-Aware Graph Convolutional Neural Network for Multimodal Emotion Recognition in Conversations

📄 Abstract

Multimodal emotion recognition in conversations (MERC) aims to identify and understand the emotions that speakers express during utterance interactions, drawing on modalities such as text, audio, and images. Existing studies have demonstrated that Graph Convolutional Networks (GCNs) can enhance MERC performance by modeling dependencies between speakers. However, current methods typically process the multimodal features of different emotion types with fixed parameters, neglecting the dynamic nature of fusion among modalities. This forces the model to balance performance across multiple emotion categories, limiting its effectiveness on specific emotions. To address this, a Dynamic Fusion-Aware Graph Convolutional Neural Network (DF-GCN) is proposed to improve the robustness and accuracy of multimodal emotion recognition.
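To make the two ideas in the abstract concrete, here is a minimal pure-Python sketch of (a) input-dependent ("dynamic") fusion of per-modality features via softmax gating, and (b) one mean-aggregation graph convolution over the utterance graph. This is only an illustrative sketch, not the authors' DF-GCN implementation: `gate_weights` stands in for the scores a learned gating network would produce, and a trained model would add learned linear transforms and nonlinearities.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def dynamic_fusion(modal_feats, gate_weights):
    """Fuse per-utterance modality features (e.g. text/audio/image vectors
    of equal length) with input-dependent weights, instead of the fixed
    fusion parameters the abstract criticizes. `gate_weights` is a
    hypothetical stand-in for a learned gating network's scalar scores,
    one per modality."""
    alphas = softmax(gate_weights)
    d = len(modal_feats[0])
    return [sum(a * f[i] for a, f in zip(alphas, modal_feats)) for i in range(d)]

def gcn_layer(node_feats, adj):
    """One mean-aggregation graph convolution over utterance nodes,
    with an implicit self-loop. In the real model this would be followed
    by a learned weight matrix and a nonlinearity."""
    n = len(node_feats)
    d = len(node_feats[0])
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j] or j == i]
        out.append([sum(node_feats[j][k] for j in neigh) / len(neigh)
                    for k in range(d)])
    return out
```

In DF-GCN terms, the fused per-utterance vectors would become the node features of the conversation graph, and the GCN layer would then propagate them along speaker-dependency edges.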
