Decoder-based Sense Knowledge Distillation

Source: Decoder-based Sense Knowledge Distillation

Published: February 27, 2026

📄 Abstract (translated from Chinese)

Large language models (LLMs) can learn contextual embeddings that capture rich semantic information, yet they often overlook structured lexical knowledge such as word senses and relations. Prior work has shown that incorporating sense dictionaries improves knowledge distillation for encoder models, but applying this approach to decoder-based generative models remains challenging. This work proposes a Decoder-based Sense Knowledge Distillation (DSKD) framework that integrates lexical resources into the training of decoder-style LLMs without requiring dictionary lookups at inference time. Extensive experiments across diverse benchmarks show that DSKD significantly improves knowledge distillation performance for decoders and enhances their generative capabilities.

📄 English Summary

Decoder-based Sense Knowledge Distillation

Large language models (LLMs) learn contextual embeddings that capture rich semantic information, yet they often overlook structured lexical knowledge such as word senses and relationships. Previous research has shown that incorporating sense dictionaries can enhance knowledge distillation for encoder models; however, applying this approach to decoder-based generative models remains challenging. The proposed Decoder-based Sense Knowledge Distillation (DSKD) framework integrates lexical resources into the training of decoder-style LLMs without requiring dictionary lookup at inference time. Extensive experiments across diverse benchmarks demonstrate that DSKD significantly improves knowledge distillation performance for decoders, thereby enhancing their generative capabilities.
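The key property described above is that the dictionary is consulted only during training, so inference needs no lookup. A minimal sketch of that idea (the function name, loss form, and sense table are illustrative assumptions, not the paper's actual implementation) is an auxiliary loss that pulls decoder hidden states toward sense embeddings from a lexical resource:

```python
# Hypothetical sketch of training-time sense distillation: each token's
# hidden state is pulled toward a sense vector looked up in a lexical
# resource. Once trained, the model carries this knowledge in its
# weights, so inference requires no dictionary lookup.
import numpy as np

def sense_distillation_loss(hidden, token_ids, sense_table, alpha=0.5):
    """Auxiliary loss: scaled mean squared distance between decoder
    hidden states and their tokens' dictionary sense embeddings."""
    targets = sense_table[token_ids]        # training-time lookup only
    return alpha * np.mean((hidden - targets) ** 2)

rng = np.random.default_rng(0)
vocab, dim, seq = 10, 4, 3
sense_table = rng.normal(size=(vocab, dim))  # stand-in sense dictionary
hidden = rng.normal(size=(seq, dim))         # decoder hidden states
token_ids = np.array([1, 4, 7])

loss = sense_distillation_loss(hidden, token_ids, sense_table)
print(float(loss) >= 0.0)
```

In practice this auxiliary term would be added to the standard language-modeling loss with a weighting factor (here `alpha`); the sketch only illustrates why no dictionary is needed at inference.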
