📄 English Summary
Multi-task Code LLMs: Data Mix or Model Merge?
This work investigates effective strategies for deploying smaller, specialized code Large Language Models (LLMs) within agentic frameworks, balancing performance, deployment constraints, and cost. It compares two primary approaches for building small, multi-task code LLMs: data mixing and model merging. Extensive experiments were conducted across two model families, Qwen Coder and DeepSeek Coder, at two scales (2B and 7B parameters).

The findings indicate that data mixing, in which a single model is trained on data drawn from multiple tasks, effectively enables the model to learn general representations and shared knowledge across diverse tasks. This approach typically relies on carefully designed data sampling and task weighting strategies to ensure robust performance on all target tasks. Model merging, by contrast, integrates multiple expert models, each trained for a specific task, through methods such as weight averaging, knowledge distillation, or more sophisticated merging algorithms; the aim is to combine the experts' specialized knowledge into a single versatile model capable of handling multiple tasks.

The study analyzes both methods across a range of code-related tasks, including code generation, code completion, and code understanding, comparing their performance, training efficiency, and resource consumption at both model scales. The results reveal the respective advantages and disadvantages of each approach in different scenarios: data mixing may be better suited when the training tasks are highly correlated, while model merging can offer greater flexibility and stronger performance when tasks differ substantially.
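To make the two approaches concrete, the sketch below illustrates the core mechanic of each under simplifying assumptions: weighted task sampling for data mixing, and uniform parameter averaging for model merging. The function names (`mix_batches`, `average_weights`) are hypothetical, and parameters are represented as plain floats standing in for real tensors; the paper's actual training and merging pipelines are not shown here.

```python
import random

# --- Data mixing: build one training stream from several task datasets ---
def mix_batches(task_datasets, task_weights, n_samples, seed=0):
    """Sample a task in proportion to its weight, then draw an example
    from that task's dataset. Hypothetical sketch of weighted mixing."""
    rng = random.Random(seed)
    tasks = list(task_datasets)
    weights = [task_weights[t] for t in tasks]
    stream = []
    for _ in range(n_samples):
        task = rng.choices(tasks, weights=weights, k=1)[0]
        stream.append((task, rng.choice(task_datasets[task])))
    return stream

# --- Model merging: uniform weight averaging of per-task experts ---
def average_weights(expert_state_dicts):
    """Merge expert checkpoints by averaging each named parameter.
    Floats stand in for tensors; real merging averages elementwise."""
    n = len(expert_state_dicts)
    return {
        name: sum(sd[name] for sd in expert_state_dicts) / n
        for name in expert_state_dicts[0]
    }
```

For example, averaging two experts `{"w": 1.0}` and `{"w": 3.0}` yields `{"w": 2.0}`; more sophisticated merging algorithms replace this uniform average with learned or task-vector-based combinations.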