When Gemini Hallucinates Patent Numbers: Fixing the FTS5 + LLM Analysis Pipeline

📄 English Summary

This post describes a patent analysis pipeline that combines SQLite FTS5 full-text search with Gemini's analytical capabilities. The intended flow has three stages: first, Gemini receives a research topic and generates specific hypotheses along with suggested search keywords; second, those keywords are run as FTS5 MATCH queries against a database of 3.5 million patents; third, Gemini analyzes the rows that come back. The goal is to combine the precision of database search with the analytical power of an LLM. In practice, however, every hypothesis returned zero database hits, and Gemini, left with no real results to analyze, began inventing patent numbers.
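The summary does not show the author's code or the eventual fix, so the following is only a minimal sketch of stages two and three under stated assumptions. The table name `patents`, its columns `patent_number` and `title`, the `call_llm` callable, and the `US` plus 7 to 8 digits number format are all hypothetical. It illustrates two plausible guards: joining keywords with OR (FTS5 treats whitespace-separated terms as an implicit AND, which can easily yield zero hits), and refusing to call the model when the search returns nothing.

```python
import re
import sqlite3


def build_match_query(keywords):
    """Quote each keyword as an FTS5 string and join the strings with OR.

    Quoting keeps punctuation in a keyword from being parsed as FTS5 syntax;
    an embedded double quote is escaped by doubling it.
    """
    quoted = ['"' + kw.replace('"', '""') + '"' for kw in keywords if kw.strip()]
    return " OR ".join(quoted)


def search_patents(conn, keywords, limit=20):
    """Run the MATCH query against the (assumed) `patents` FTS5 table."""
    query = build_match_query(keywords)
    if not query:
        return []
    return conn.execute(
        "SELECT patent_number, title FROM patents "
        "WHERE patents MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


def unknown_patent_numbers(llm_text, rows):
    """Return patent numbers cited in llm_text that are absent from rows.

    The number pattern is an assumption made for this sketch.
    """
    known = {number for number, _title in rows}
    cited = set(re.findall(r"\bUS\d{7,8}\b", llm_text))
    return sorted(cited - known)


def analyze(conn, keywords, call_llm):
    """Call the model only when there are real rows, then check its citations."""
    rows = search_patents(conn, keywords)
    if not rows:
        # Zero hits: report that explicitly instead of asking the model
        # to analyze an empty result set.
        return {"status": "no_hits", "analysis": None, "unverified": []}
    context = "\n".join(f"{number}: {title}" for number, title in rows)
    text = call_llm(context)
    return {
        "status": "ok",
        "analysis": text,
        "unverified": unknown_patent_numbers(text, rows),
    }
```

Here `call_llm` stands in for whatever Gemini client call the pipeline uses; it takes the formatted search results and returns the model's text. Any patent number listed under `unverified` was cited by the model but never returned by the database, so it should be treated as a likely hallucination. The sketch also assumes the local SQLite build has FTS5 enabled.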
