Why I Started Counting Tokens (And How It Changed My Dev Workflow)

📄 Chinese Summary

In everyday use of large language models (LLMs) such as Claude, GPT-4, and Gemini, many users may not realize how much money they are quietly spending. An audit of Claude API usage revealed that nearly every request was sending over 40% more context than needed. The main culprits were redundant system prompts, transmitting full file contents, and overly long conversation histories. The fix was not a complex optimization but simply a heightened awareness of token usage, which led to meaningful cost savings.

📄 English Summary

Why I Started Counting Tokens (And How It Changed My Dev Workflow)

Many users of large language models (LLMs) like Claude, GPT-4, and Gemini may be unknowingly spending excessive amounts of money. A quick audit of Claude API usage revealed that 40% more context was being sent than necessary in nearly every request. The issues stemmed from redundant system prompts, transmitting full file contents instead of snippets, and conversation histories that should have been truncated. The solution was not a complex optimization but rather an increased awareness of token usage, leading to significant cost savings.
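The trimming idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the author's actual tooling: it uses a rough chars-per-token heuristic (real counts require the provider's tokenizer, e.g. tiktoken for GPT models or Anthropic's token-counting endpoint for Claude) and keeps only the most recent messages that fit within a token budget.

```python
# Rough token-budget audit for chat requests (illustrative sketch).
# Assumption: ~4 characters per token, a common rough heuristic.

def estimate_tokens(text: str) -> int:
    """Cheap estimate: roughly one token per 4 characters."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages that fit within a token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "x" * 4000},  # an old, bulky message
    {"role": "user", "content": "Short recent question?"},
]
trimmed = trim_history(history, budget=100)
# The 4000-char message (~1000 estimated tokens) blows the budget,
# so only the short recent message survives.
```

In practice you would pin the system prompt separately rather than letting it fall out of the window, and replace the heuristic with the model's real tokenizer; the point is simply that measuring context size per request makes the waste visible.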

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others