Claude Feels Slow. But Is Moving a Team to Open-Weight Models Actually the Fix?
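📄 Chinese Summary (translated)
For the team, Claude's speed problem shows up mainly in TTFT (time to first token) rather than in raw decoding speed. Measurements from real usage put TTFT p50 at 4.2s to 6.8s and p90 at 14.5s to 28.1s, while Claude Sonnet's p50 decoding speed is 176 tok/s. In other words, the delay before output starts is what makes the overall experience feel slow. That raises a question: should the team move to self-hosted open-weight models? A self-hosted setup may improve TTFT considerably and is worth exploring further.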
📄 English Summary
Claude Feels Slow. But Is Moving a Team to Open-Weight Models Actually the Fix?
The team's speed problem with Claude lies primarily in TTFT (time to first token), not in raw decoding speed. Measurements from actual usage show TTFT p50 between 4.2s and 6.8s and p90 between 14.5s and 28.1s, while Claude Sonnet's decoding speed is 176 tok/s at p50. Claude is therefore not particularly slow once it starts generating; the delay before the first token is what makes the whole experience feel sluggish. This raises the question of whether the team should move to self-hosted open-weight models, which could offer substantial improvements in TTFT.
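To make the TTFT versus decoding-speed distinction concrete, here is a minimal Python sketch of how the two numbers can be derived from client-side timestamps and how much of a response's wall-clock time TTFT accounts for. The `RequestTiming` fields, the nearest-rank `percentile` helper, the sample timings, and the 500-token response length are all illustrative assumptions; the source does not describe how its measurements were collected.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Client-side timestamps (seconds) for one streamed request.

    Hypothetical record layout; the source does not specify one.
    """
    sent_at: float          # when the request was sent
    first_token_at: float   # when the first output token arrived
    done_at: float          # when the last output token arrived
    output_tokens: int      # number of tokens generated

    @property
    def ttft_s(self) -> float:
        # Time to first token: waiting before any output appears.
        return self.first_token_at - self.sent_at

    @property
    def decode_tok_per_s(self) -> float:
        # Decoding speed: tokens per second once generation has started.
        return self.output_tokens / (self.done_at - self.first_token_at)


def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    # Ceiling of p * n / 100 using integer arithmetic.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def ttft_share(ttft_s, output_tokens, decode_tok_per_s):
    """Fraction of total response time spent waiting for the first token."""
    decode_s = output_tokens / decode_tok_per_s
    return ttft_s / (ttft_s + decode_s)


if __name__ == "__main__":
    # Made-up sample timings, purely to exercise the helpers.
    timings = [
        RequestTiming(0.0, 4.0, 6.0, 300),
        RequestTiming(0.0, 5.5, 8.5, 480),
        RequestTiming(0.0, 16.0, 18.0, 350),
    ]
    ttfts = [t.ttft_s for t in timings]
    print("TTFT p50:", percentile(ttfts, 50))
    print("TTFT p90:", percentile(ttfts, 90))

    # Lowest reported TTFT p50 (4.2s) and reported decode p50 (176 tok/s),
    # with an assumed 500-token response.
    print("TTFT share:", round(ttft_share(4.2, 500, 176), 2))
```

Under the assumed 500-token response, decoding at 176 tok/s takes under 3 seconds, so even the lowest reported TTFT p50 of 4.2s makes up more than half of the total wait. Any comparison with a self-hosted setup would need both numbers measured the same way.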