I stopped calling GPT-4 for the same classification task 10,000 times
📄 English Summary
While building internal tools, the author kept calling an LLM API for the same classification task and ran into two problems: data sensitivity and cost. For teams handling contracts, patient records, or internal logs, sending that data to a third-party API is not always an option. And because billing is per token, total cost grows with every call, which makes the API an expensive way to do structured pattern matching. In response, the author built an open-source CLI tool that trains a small local text classifier from labeled examples. About 50 input/output pairs are enough to train it.
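The summary does not name the tool or say which model it trains, so the following is only a minimal sketch of the general idea: fitting a small classifier on labeled input/output pairs and running it entirely on the local machine. It assumes a multinomial naive Bayes model with add-one smoothing, written with the Python standard library only. The class name `NaiveBayesClassifier`, the `tokenize` helper, and the example pairs and labels are all hypothetical and are not taken from the tool.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Illustrative stand-in only; the tool in the article may use a
    different model entirely.
    """

    def fit(self, pairs):
        self.label_counts = Counter()            # examples seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()
        for text, label in pairs:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        # Tokens never seen in training carry no signal, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled input/output pairs; a real run would use about 50.
EXAMPLES = [
    ("payment failed with card declined", "billing"),
    ("invoice total does not match the order", "billing"),
    ("refund has not arrived on my card", "billing"),
    ("server returned 500 error on login", "outage"),
    ("database connection timeout in production", "outage"),
    ("api latency spike and timeout errors", "outage"),
]

model = NaiveBayesClassifier().fit(EXAMPLES)
print(model.predict("card was declined during payment"))
print(model.predict("production login shows timeout error"))
```

Once trained, each prediction is a handful of dictionary lookups on the local machine, so no text leaves the host and there is no per-token charge. Whether a model this simple is accurate enough depends on the task and would need to be checked against held-out examples.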