Replication Study: Federated Text-Driven Prompt Generation for Vision-Language Models

📄 Summary

Vision-language models such as CLIP have shown remarkable zero-shot capabilities; however, their adaptation to federated learning scenarios poses significant challenges, particularly in generalizing to unseen classes. The original FedTPG paper addresses this limitation by introducing a text-driven prompt generation network that dynamically creates prompts conditioned on class names, enhancing cross-class generalization in federated settings. A faithful replication study of FedTPG was conducted, evaluating the pre-trained model across six diverse vision datasets: Caltech101, Oxford Flowers, FGVC Aircraft, Oxford Pets, Food-101, and DTD. The evaluation achieved results within 0.2% of the original paper's findings.
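To make the mechanism concrete, below is a minimal PyTorch sketch of a text-driven prompt generator in the spirit of FedTPG: learnable query tokens cross-attend to CLIP text embeddings of a client's class names to produce soft prompt vectors, so prompts can be generated even for class names unseen during training. The module structure, dimensions (512-dim embeddings, four prompt tokens), and the federated-aggregation note are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (not the official FedTPG code): a prompt generator that maps
# CLIP text embeddings of class names to a set of soft prompt vectors.
# Module design, dimensions, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class TextDrivenPromptGenerator(nn.Module):
    """Generates prompt_len context vectors conditioned on the class-name
    embeddings available at a client, so unseen class names still yield prompts."""

    def __init__(self, embed_dim: int = 512, prompt_len: int = 4, n_heads: int = 8):
        super().__init__()
        # Learnable query tokens, one per prompt position.
        self.queries = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
        # Cross-attention from prompt queries to class-name embeddings.
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.proj = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, class_name_emb: torch.Tensor) -> torch.Tensor:
        # class_name_emb: (num_classes, embed_dim) CLIP text features of class names.
        q = self.queries.unsqueeze(0)        # (1, prompt_len, dim)
        kv = class_name_emb.unsqueeze(0)     # (1, num_classes, dim)
        ctx, _ = self.attn(q, kv, kv)        # condition prompts on the class names
        return self.proj(ctx).squeeze(0)     # (prompt_len, dim) soft prompt vectors


if __name__ == "__main__":
    # Example: 10 local class names represented by (hypothetical) CLIP text features.
    names = torch.randn(10, 512)
    gen = TextDrivenPromptGenerator()
    prompts = gen(names)
    print(prompts.shape)  # torch.Size([4, 512])
    # In federated training, only the generator's parameters would be averaged
    # across clients (e.g., FedAvg); the CLIP backbone stays frozen.
```

The prompt vectors produced this way would be prepended to the tokenized class names before the (frozen) CLIP text encoder, which is what allows the same shared generator to serve clients holding disjoint class sets.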
