📄 English Summary
How We Cut LLM API Costs by 88%
An analysis of LLM API costs reveals that sending the same 'system prompt' with each call leads to unnecessary expenses. By implementing prompt caching, the same system prompt can be sent once and reused thereafter, significantly reducing input costs. Specifically, Claude's cache control can cut input costs by 90% on cache hits, while Gemini and OpenAI's caching mechanisms also provide varying degrees of savings. This approach can reduce overall input costs to one-fifth of the original.
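The caching approach described above can be sketched as a request payload. This is a minimal sketch, not the author's actual code: the `build_cached_request` helper and the model/prompt values are illustrative, though the `cache_control` field on a system block follows the shape of Anthropic's Messages API for prompt caching.

```python
# Sketch of a prompt-caching request body for a Claude-style Messages API.
# The large, repeated system prompt carries a cache_control marker so the
# server can reuse it across calls, billing cached input tokens at a
# steep discount (~90% off) on cache hits.

SYSTEM_PROMPT = "You are a helpful assistant with detailed domain rules..."

def build_cached_request(user_message: str) -> dict:
    """Build a request body whose system prompt is marked for caching.

    Hypothetical helper for illustration; only the system prompt is
    cached, while the per-call user message stays uncached.
    """
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                # "ephemeral" asks the API to cache this block; later
                # requests with an identical prefix hit the cache.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_cached_request("Summarize today's usage report.")
```

Only the invariant prefix (the system prompt) is marked for caching; the user message changes on every call and is billed normally, which is why the overall saving (roughly four-fifths) is smaller than the per-hit discount.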
Powered by Cloudflare Workers + Payload CMS + Claude 3.5
Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others