Taalas serves Llama 3.1 8B at 17,000 tokens per second

Source: Taalas serves Llama 3.1 8B at 17,000 tokens/second

Published: February 20, 2026

🏷️ Related Tags

#Taalas #Llama 3.1 #Hardware #Quantization #AI

📄 English Summary

A new Canadian hardware startup, Taalas, has announced its first product: a custom hardware implementation of the Llama 3.1 8B model that runs at 17,000 tokens per second. The company calls the hardware 'Silicon Llama' and describes it as aggressively quantized, combining 3-bit and 6-bit parameters. The next generation is expected to use 4-bit parameters. Because each model is implemented directly in hardware, supporting a new model presumably involves a long lead time. Users can try the technology at chatjimmy.ai, where output appears so quickly that a video of the demo looks like a screenshot.
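To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch. It assumes a nominal 8 billion parameters for "8B", a steady 17,000 tokens per second with no prompt-processing overhead, and raw weight storage with no quantization scales or metadata. The helper names `seconds_for` and `weight_gigabytes` are made up for this illustration, and the actual split between 3-bit and 6-bit parameters is not given in the summary, so the two widths are shown only as bounds.

```python
# Back-of-the-envelope arithmetic for the numbers quoted in the summary.
TOKENS_PER_SECOND = 17_000   # throughput quoted in the summary
PARAMS = 8_000_000_000       # assumption: nominal "8B" parameter count


def seconds_for(tokens: int, rate: float = TOKENS_PER_SECOND) -> float:
    """Time to emit `tokens` at a steady `rate` (ignores prompt processing)."""
    return tokens / rate


def weight_gigabytes(bits_per_param: float, params: int = PARAMS) -> float:
    """Raw weight storage in GB (10**9 bytes), ignoring scales and metadata."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"per token: {seconds_for(1) * 1e6:.1f} microseconds")
    print(f"500-token reply: {seconds_for(500) * 1e3:.1f} ms")
    # 3 and 6 bits bound the current mix; 4 bits is the stated next generation.
    for bits in (3, 4, 6):
        print(f"{bits}-bit weights: {weight_gigabytes(bits):.1f} GB")
```

Under these assumptions each token takes roughly 59 microseconds and a 500-token reply finishes in about 29 ms, which is consistent with a video of the demo looking like a screenshot. The weights would occupy somewhere between 3 GB and 6 GB depending on the unstated 3-bit/6-bit mix, and about 4 GB at a uniform 4 bits.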
