How to Run a 400B-Parameter LLM on a Phone (Yes, Really)

📄 Chinese Summary (translated)

A recent demo showed an iPhone 17 Pro running a 400B-parameter large language model locally, not through a cloud API or a proxy. The demo drew widespread skepticism, since a model that large would seem to need far more memory than any phone could offer. In fact, it comes down to clever engineering that solves a problem about to affect many people: how to run very large models when available memory falls short. A 400B-parameter model needs roughly 800 GB of memory even in FP16, and even at 4-bit quantization, the memory requirement remains a huge challenge.

📄 English Summary

How to Run a 400B Parameter LLM on a Phone (Yes, Really)

A recent demo showcased an iPhone 17 Pro running a 400B-parameter large language model locally, without relying on cloud API calls or clever proxies. This sparked skepticism about its feasibility, given the immense memory requirements of such a model. It turns out to be the result of ingenious engineering that addresses a pressing issue: how to run models that exceed available RAM. A 400B-parameter model in FP16 requires approximately 800 GB of memory, and even with 4-bit quantization the weights alone still take roughly 200 GB, far beyond any phone's RAM.
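The memory figures above follow directly from parameter count times bits per parameter. A minimal sketch of that back-of-envelope math (weights only, ignoring KV cache and activation memory, and using 1 GB = 10^9 bytes):

```python
# Back-of-envelope weight-memory math for a 400B-parameter model.
# Assumes dense weights; ignores KV cache, activations, and runtime overhead.
PARAMS = 400e9  # 400 billion parameters

def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

for name, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_memory_gb(PARAMS, bits):.0f} GB")
# FP16: 800 GB, INT8: 400 GB, 4-bit: 200 GB
```

Even the most aggressive common quantization leaves a two-orders-of-magnitude gap against a phone's RAM, which is why the demo requires engineering beyond quantization alone.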

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others