Speculative Decoding Scaling Laws (SDSL): Throughput Optimization Made Simple

📄 English Summary

Speculative decoding is a technique that accelerates inference by using multiple language models: a small draft model proposes tokens that a larger target model then verifies in parallel. Previous work has optimized the throughput of such inference pipelines empirically, which is costly because it requires training large language models (LLMs). This study proposes a theory that analytically links the key hyperparameters of pre-trained LLMs to the throughput efficiency of a downstream speculative-decoding inference system. The theory makes it possible to predict throughput-optimal hyperparameters for the components of an inference system before pre-training them, thereby simplifying the optimization process.
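For context on the draft-then-verify loop the summary refers to, here is a minimal toy sketch of standard speculative sampling. This is not code from the paper: the three-token distributions are made up and stand in for a real draft and target model; only the accept/reject logic follows the standard algorithm.

```python
import random

random.seed(0)

VOCAB = ["a", "b", "c"]

def draft_probs(context):
    # Toy stand-in for a small, fast draft model: a fixed distribution.
    return {"a": 0.6, "b": 0.3, "c": 0.1}

def target_probs(context):
    # Toy stand-in for the large target model whose output we must match.
    return {"a": 0.5, "b": 0.4, "c": 0.1}

def sample(probs):
    # Draw one token from a {token: probability} dict.
    r = random.random()
    acc = 0.0
    for tok, p in probs.items():
        acc += p
        if r < acc:
            return tok
    return tok

def speculative_step(context, k=4):
    """Draft k tokens cheaply, then accept/reject them against the target.

    A drafted token x is accepted with probability
    min(1, p_target(x) / p_draft(x)); on rejection, one token is resampled
    from the normalized residual max(0, p_target - p_draft), which keeps the
    overall output distribution identical to sampling from the target alone.
    """
    # Phase 1: the draft model proposes k tokens autoregressively.
    drafted = []
    ctx = list(context)
    for _ in range(k):
        tok = sample(draft_probs(ctx))
        drafted.append(tok)
        ctx.append(tok)

    # Phase 2: the target model verifies the proposals (in a real system,
    # all k positions are scored in a single parallel forward pass).
    accepted = []
    ctx = list(context)
    for tok in drafted:
        p_d = draft_probs(ctx)
        p_t = target_probs(ctx)
        if random.random() < min(1.0, p_t[tok] / p_d[tok]):
            accepted.append(tok)
            ctx.append(tok)
        else:
            residual = {t: max(0.0, p_t[t] - p_d[t]) for t in VOCAB}
            z = sum(residual.values())
            residual = {t: p / z for t, p in residual.items()}
            accepted.append(sample(residual))
            break
    return accepted
```

Each step thus emits between 1 and k tokens for a single verification pass of the large model, which is where the throughput gain comes from. (The full algorithm also samples one bonus token from the target when all k drafts are accepted; this sketch omits that detail.)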


Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.