Decoder-based Sense Knowledge Distillation

Source: Decoder-based Sense Knowledge Distillation

Published: February 27, 2026

📄 Abstract (translated from Chinese)

Large language models (LLMs) can learn contextual embeddings that capture rich semantic information, yet they often overlook structured lexical knowledge such as word senses and relations. Prior work has shown that incorporating sense dictionaries improves knowledge distillation for encoder models, but applying this approach to decoder-based generative models remains challenging. This work proposes a Decoder-based Sense Knowledge Distillation (DSKD) framework that integrates lexical resources into the training of decoder-style LLMs without requiring dictionary lookups at inference time. Extensive experiments across diverse benchmarks show that DSKD significantly improves knowledge distillation performance for decoders and enhances their generative capabilities.

📄 English Summary

Decoder-based Sense Knowledge Distillation

Large language models (LLMs) learn contextual embeddings that capture rich semantic information, yet they often overlook structured lexical knowledge such as word senses and relationships. Previous research has shown that incorporating sense dictionaries can enhance knowledge distillation for encoder models; however, applying this approach to decoder-based generative models remains challenging. The proposed Decoder-based Sense Knowledge Distillation (DSKD) framework integrates lexical resources into the training of decoder-style LLMs without requiring dictionary lookup at inference time. Extensive experiments across diverse benchmarks demonstrate that DSKD significantly improves knowledge distillation performance for decoders, thereby enhancing their generative capabilities.
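The key property described above is that the dictionary is consulted only during training, so inference needs no lookup. A minimal sketch of that idea (the function name, loss form, and sense table are illustrative assumptions, not the paper's actual implementation) is an auxiliary loss that pulls decoder hidden states toward sense embeddings from a lexical resource:

```python
# Hypothetical sketch of training-time sense distillation: each token's
# hidden state is pulled toward a sense vector looked up in a lexical
# resource. Once trained, the model carries this knowledge in its
# weights, so inference requires no dictionary lookup.
import numpy as np

def sense_distillation_loss(hidden, token_ids, sense_table, alpha=0.5):
    """Auxiliary loss: scaled mean squared distance between decoder
    hidden states and their tokens' dictionary sense embeddings."""
    targets = sense_table[token_ids]        # training-time lookup only
    return alpha * np.mean((hidden - targets) ** 2)

rng = np.random.default_rng(0)
vocab, dim, seq = 10, 4, 3
sense_table = rng.normal(size=(vocab, dim))  # stand-in sense dictionary
hidden = rng.normal(size=(seq, dim))         # decoder hidden states
token_ids = np.array([1, 4, 7])

loss = sense_distillation_loss(hidden, token_ids, sense_table)
print(float(loss) >= 0.0)
```

In practice this auxiliary term would be added to the standard language-modeling loss with a weighting factor (here `alpha`); the sketch only illustrates why no dictionary is needed at inference.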
