MoE Beats Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected
📄 English Summary
MoE Beats Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected
A benchmark compared three Qwen3.5 models on identical hardware, an RTX 4060 with 8GB of VRAM. Although VRAM consumption was similar across all three models (7.1-7.7GB), decode speed varied widely. Qwen3.5-9B was fastest at 33.0 t/s, Qwen3.5-27B lagged at 3.57 t/s, and Qwen3.5-35B-A3B reached 8.61 t/s, about 2.4x faster than the dense 27B model despite its larger total parameter count. The result highlights the advantage of MoE models, particularly in resource-constrained environments.
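The direction of this result follows from a simple memory-bandwidth argument: during decode, each generated token requires reading every *active* parameter once, so a 35B-A3B MoE model (roughly 3B active parameters per token) reads far fewer bytes per token than a dense 27B model. The sketch below illustrates that back-of-envelope estimate. All numbers in it (quantization width, effective bandwidth) are illustrative assumptions, not measurements from this benchmark, and the function name `est_tokens_per_s` is hypothetical:

```python
# Crude decode-throughput upper bound: tokens/s <= bandwidth / bytes read
# per token, where bytes per token = active params * bytes per param.

def est_tokens_per_s(active_params_b: float, bytes_per_param: float,
                     bandwidth_gb_s: float) -> float:
    """Upper-bound tokens/s from memory traffic alone (ignores compute,
    KV-cache reads, and MoE routing overhead)."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Assumed ~4-bit quantization (~0.56 bytes/param) and a blended
# GPU + system-RAM bandwidth of ~60 GB/s once layers spill past 8GB VRAM.
dense_27b  = est_tokens_per_s(27, 0.56, 60)  # all 27B params read per token
moe_a3b    = est_tokens_per_s(3,  0.56, 60)  # only ~3B active params read

print(f"dense 27B <= {dense_27b:.1f} t/s, 35B-A3B <= {moe_a3b:.1f} t/s")
```

Under these assumptions the bound gives roughly 4 t/s for the dense 27B model, close to the measured 3.57 t/s, while the MoE bound (about 36 t/s) is far above the measured 8.61 t/s. That gap is expected: the bound ignores routing overhead, the fact that different tokens activate different experts (hurting cache and offload locality), and KV-cache traffic, so it explains the ordering of the results rather than their exact magnitudes.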
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.