Building a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling

📄 Chinese Summary

Built a local-first RAG research tool that runs entirely on a single GPU. The tool combines tool calling with RAG, which took some exploration. The stack: the Nemotron Nano 9B v2 Japanese model served on vLLM (FP16, RTX 5090), with a backend of FastAPI, SQLite FTS5, and Jinja2, all integrated into a single app.py file; NVIDIA's official parser plugins handle tool calling and reasoning. Given a question, the system first extracts bilingual keywords (English and Japanese) via the LLM, then runs an FTS5 search over local sources plus a DuckDuckGo web search.
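The serving setup described above (Nemotron on vLLM with NVIDIA's parser plugins) might be launched roughly as follows. The flags are standard vLLM options; the plugin path and the registered parser name `nemotron` are placeholders, not confirmed details from the source:

```shell
# Sketch: serve Nemotron Nano 9B v2 on vLLM in FP16 with tool calling
# and reasoning parsing enabled.
# Assumptions: the plugin file path and the parser name "nemotron" are
# placeholders for NVIDIA's official parser plugins.
vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
  --dtype float16 \
  --enable-auto-tool-choice \
  --tool-parser-plugin /path/to/nemotron_toolcall_parser.py \
  --tool-call-parser nemotron \
  --reasoning-parser nemotron
```

With FP16 weights, the 9B model fits comfortably in the 32 GB of an RTX 5090.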

📄 English Summary

Built a Local-First RAG Research Tool with Nemotron + vLLM + Tool Calling

A local-first RAG research tool has been built to run entirely on a single GPU. Combining tool calling with RAG required some exploration. The tech stack includes the Nemotron Nano 9B v2 Japanese model on vLLM (FP16, RTX 5090), with the backend built on FastAPI, SQLite FTS5, and Jinja2, all integrated into a single app.py file. NVIDIA's official parser plugins are used for tool calling and reasoning. When a question is asked, the system first extracts bilingual keywords (EN+JA) via the LLM, then performs an FTS5 search over local sources plus a DuckDuckGo web search.
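The local retrieval step can be sketched as follows: the bilingual keywords extracted by the LLM are OR-ed into a single SQLite FTS5 `MATCH` query and ranked with the built-in `bm25()` function. The table schema, helper name, and sample documents are illustrative assumptions, not the actual app.py code:

```python
# Minimal sketch of FTS5 keyword retrieval (assumes Python's sqlite3 is
# built with the FTS5 extension, as modern builds typically are).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("Attention survey", "transformer attention mechanisms"),
        ("RAG overview", "retrieval augmented generation with a local index"),
    ],
)

def fts5_search(keywords: list[str], limit: int = 5) -> list[tuple[str, float]]:
    """OR the quoted keywords into one MATCH query; bm25 is lower-is-better."""
    query = " OR ".join(f'"{k}"' for k in keywords)
    return conn.execute(
        "SELECT title, bm25(docs) FROM docs WHERE docs MATCH ? "
        "ORDER BY bm25(docs) LIMIT ?",
        (query, limit),
    ).fetchall()

# These keywords stand in for what the LLM would extract from a question.
hits = fts5_search(["retrieval", "検索"])
print(hits[0][0])  # → RAG overview
```

Quoting each keyword keeps non-ASCII terms like 検索 safe inside FTS5 query syntax, and OR-ing them lets a hit on either language's keyword surface the document.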

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.