Streaming experts

Source: Streaming experts

Published: March 24, 2026


📄 English Summary


Dan Woods' experiments demonstrate streaming experts, a technique for running Mixture-of-Experts (MoE) models that are too large to fit in RAM: the necessary expert weights are streamed from SSD instead of the whole model being held in memory. Dan recently ran the Qwen3.5-397B-A17B model with 48GB of RAM. Five days later, another user reported running Kimi K2.5, a model with 1 trillion total parameters of which 32B are active at any given time, on a 96GB M2 Max MacBook Pro. Together, these results show the potential of streaming experts for running ultra-large models on machines with far less memory than the full weights would require.
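
The general idea can be illustrated with a toy sketch. This is not Dan Woods' implementation, and every name in it (`ExpertStore`, `moe_forward`, `write_demo_experts`, the flat file layout) is hypothetical. It assumes each expert is a fixed-size block of float32 values stored contiguously in one file, that a router has already produced one score per expert, and that a small LRU cache stands in for the RAM budget. Only the top-k experts chosen for a token are read from disk.

```python
import os
import tempfile
from array import array
from collections import OrderedDict


def write_demo_experts(path, num_experts, floats_per_expert):
    """Write a toy weight file: expert i is a block of floats all equal to i."""
    with open(path, "wb") as f:
        for i in range(num_experts):
            array("f", [float(i)] * floats_per_expert).tofile(f)


class ExpertStore:
    """Loads expert weight blocks from disk on demand and keeps a small LRU cache."""

    def __init__(self, path, floats_per_expert, cache_size):
        self.path = path
        self.floats_per_expert = floats_per_expert
        self.cache_size = cache_size      # stand-in for the RAM budget
        self.cache = OrderedDict()        # expert_id -> array of weights
        self.disk_reads = 0

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)   # mark as most recently used
            return self.cache[expert_id]
        weights = array("f")
        with open(self.path, "rb") as f:
            # Seek straight to this expert's block; nothing else is read.
            f.seek(expert_id * self.floats_per_expert * weights.itemsize)
            weights.fromfile(f, self.floats_per_expert)
        self.disk_reads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)      # evict least recently used
        return weights


def moe_forward(store, x, router_scores, k):
    """Toy MoE layer: each expert is a dot product; mix the top-k by score."""
    chosen = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)[:k]
    total = sum(router_scores[i] for i in chosen)
    out = 0.0
    for i in chosen:
        w = store.get(i)                        # may trigger a disk read
        out += (router_scores[i] / total) * sum(a * b for a, b in zip(w, x))
    return out


def demo():
    """Run one token through the toy layer with 8 experts and a 2-expert cache."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "experts.bin")
        write_demo_experts(path, num_experts=8, floats_per_expert=4)
        store = ExpertStore(path, floats_per_expert=4, cache_size=2)
        scores = [0.0, 0.0, 0.0, 0.5, 0.0, 0.5, 0.0, 0.0]
        return moe_forward(store, [1.0] * 4, scores, k=2), store.disk_reads
```

In this sketch the cache holds at most two of the eight experts, so memory use is bounded by the cache size rather than by the total model size. The cost is a disk read whenever the router picks an expert that is not cached, which is why fast SSD storage matters for this approach.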
