MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
📄 Summary
Mixture-of-Experts (MoE) models offer scalable performance but encounter significant memory constraints on edge devices. Existing offloading strategies struggle with I/O bottlenecks due to the dynamic and low-information nature of autoregressive expert activation. This research proposes a novel framework, MoE-SpAc, which repurposes Speculative Decoding (SD) not only as a compute accelerator but also as an informative lookahead sensor for memory management. The framework integrates a Speculative Utility Estimator to track expert demand, a Heterogeneous Workload Balancer to dynamically partition computation via online integer optimization, and an Asynchronous Execution Engine to unify the execution process, thereby enhancing the inference efficiency of MoE models on edge devices.
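The core idea, using speculative draft tokens as a lookahead signal for expert prefetching, can be sketched in a few lines. The class name, the decay constant, and the scoring scheme below are illustrative assumptions for exposition, not the paper's actual API: each draft token's routing decisions add credit to the experts it would use, older observations decay, and the highest-scoring experts become prefetch candidates.

```python
# Hypothetical sketch of the "lookahead sensor" idea: draft tokens produced by
# speculative decoding reveal which experts upcoming tokens are likely to route
# to, so the memory manager can prefetch those experts before they are needed.
# Names and constants here are assumptions, not the MoE-SpAc implementation.
from collections import defaultdict

class SpeculativeUtilityEstimator:
    """Tracks per-expert demand observed from speculative (draft) router outputs."""

    def __init__(self, decay: float = 0.9):
        self.decay = decay               # older observations matter less (assumed policy)
        self.utility = defaultdict(float)

    def observe(self, routed_experts):
        """routed_experts: expert ids the router selected for one draft token."""
        for e in self.utility:           # age all existing utility scores
            self.utility[e] *= self.decay
        for e in routed_experts:         # credit experts the draft token needs
            self.utility[e] += 1.0

    def prefetch_candidates(self, k: int):
        """Top-k experts most worth loading into fast memory next."""
        return sorted(self.utility, key=self.utility.get, reverse=True)[:k]

# Simulated draft pass: four draft tokens, each routed to its top-2 experts.
est = SpeculativeUtilityEstimator()
for routed in [[3, 7], [3, 1], [7, 3], [2, 3]]:
    est.observe(routed)

print(est.prefetch_candidates(2))  # → [3, 7]: expert 3 dominates recent demand
```

In a real system this estimator would feed the workload balancer, which decides (e.g. via the online integer optimization the summary mentions) which of the candidate experts to keep on the accelerator and which to serve from host memory.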
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others