MPA: Multimodal Prototype Augmentation for Few-Shot Learning

📄 Summary

Few-shot learning (FSL) aims to recognize new classes from only a few labeled examples and has been widely applied in fields such as natural science, remote sensing, and medical imaging. However, most existing methods focus solely on the visual modality and compute prototypes directly from raw support images, so the prototypes lack comprehensive, rich multimodal information. To address these limitations, this work proposes MPA, a multimodal prototype-augmentation framework for FSL with three components: LLM-based Multi-Variant Semantic Enhancement (LMSE), Hierarchical Multi-View Augmentation (HMA), and an Adaptive Uncertain Class Absorber (AUCA). LMSE uses large language models to generate diverse paraphrased category descriptions, enriching the semantic information of the prototypes and improving few-shot recognition.
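To make the prototype idea concrete, here is a minimal sketch of prototype-based few-shot classification with a text-augmented prototype. This is an illustration of the general technique, not MPA's actual method: the function names and the mixing weight `alpha` are hypothetical, and the text embeddings stand in for encoded LLM-generated paraphrases.

```python
import numpy as np

def prototypes(support_feats, support_labels, n_classes):
    """Visual prototype per class: the mean of that class's support embeddings."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def fuse(visual_protos, text_feats_per_class, alpha=0.5):
    """Augment each visual prototype with the mean embedding of its
    paraphrased class descriptions (alpha is a hypothetical mixing weight)."""
    text_protos = np.stack([t.mean(axis=0) for t in text_feats_per_class])
    return alpha * visual_protos + (1.0 - alpha) * text_protos

def classify(query_feats, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(query_feats[:, None, :] - protos[None, :, :], axis=-1)
    return dists.argmin(axis=1)
```

In a 2-way 2-shot episode, `prototypes` averages the two support embeddings per class, `fuse` mixes in the mean paraphrase embedding, and `classify` labels queries by nearest fused prototype.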

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.