📄 English Summary

I built a validation pipeline that blocks AI-generated files from reaching disk if they fail schema checks

After I generated structured Markdown knowledge files with local LLMs, the knowledge base grew cluttered as the file count increased: incorrect field types, invalid enum values, improperly formatted dates, and domains missing from the taxonomy. These problems break data queries and render the graph useless. The root cause is the absence of a contract between "LLM output" and "file that reaches disk." To fix this, I built a validation gate between the LLM and the filesystem, so that only files that pass schema validation are ever written to disk.
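The gate described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual implementation: the schema fields (`title`, `status`, `created`, `domains`), the enum values, and the taxonomy set are all hypothetical stand-ins for the checks the post names (field types, enum membership, date format, taxonomy membership).

```python
# Hypothetical sketch of a validation gate between an LLM and the
# filesystem. All field names and allowed values below are illustrative.
from datetime import date
from pathlib import Path

ALLOWED_STATUS = {"draft", "reviewed", "published"}   # example enum
TAXONOMY = {"nlp", "vision", "infra", "agents"}       # example domain taxonomy


def validate(record: dict) -> list[str]:
    """Return a list of schema violations; an empty list means valid."""
    errors = []
    # Field-type check
    if not isinstance(record.get("title"), str):
        errors.append("title: expected string")
    # Enum-membership check
    if record.get("status") not in ALLOWED_STATUS:
        errors.append(f"status: {record.get('status')!r} not in {sorted(ALLOWED_STATUS)}")
    # Date-format check (ISO-8601, YYYY-MM-DD)
    try:
        date.fromisoformat(record.get("created", ""))
    except ValueError:
        errors.append("created: expected ISO-8601 date (YYYY-MM-DD)")
    # Taxonomy-membership check
    unknown = set(record.get("domains", [])) - TAXONOMY
    if unknown:
        errors.append(f"domains: {sorted(unknown)} not in taxonomy")
    return errors


def write_if_valid(record: dict, body: str, path: Path) -> bool:
    """The gate itself: only schema-valid records reach disk."""
    if validate(record):
        # Reject: surface the errors back to the LLM loop instead of writing.
        return False
    path.write_text(body, encoding="utf-8")
    return True
```

A rejected file never touches the filesystem; the violation list can be fed back into the generation loop as a retry prompt.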

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others