How I Built an Offline AI Tutor That Actually Understands Textbooks (LFM2 + RAG)

📄 Summary

While studying from 'Murphy's Grammar in Use', the author found that AI grammar explanations were overly verbose and theatrical. The goal was something simpler: a tool that pulls exercises straight from the book and checks the answers, with no internet connection, subscription, or GPU. Standard RAG pipelines chunk text blindly, which breaks fill-in-the-blank exercises that run across a page boundary. To avoid this, the author wrote a regex parser that extracts the exercises directly from the PDF before any content reaches the LLM. Because each task is copied from the book rather than generated, the model has no opportunity to hallucinate the exercises themselves.
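The summary does not show the parser itself, so the following is only a minimal sketch of the idea, not the author's code. It assumes the PDF text has already been extracted into one string per page (with whatever PDF library is at hand) and that every exercise starts on its own line with a label such as "12.1". Both the label format and the function name `extract_exercises` are hypothetical.

```python
import re

# Assumption: an exercise begins on its own line with a label like "12.1".
# The real book layout may differ; adjust the pattern to match it.
EXERCISE_HEADER = re.compile(r"^(\d+\.\d+)\s+", re.MULTILINE)


def extract_exercises(pages):
    """Map exercise labels to their verbatim text, given one string per PDF page."""
    # Join the pages first so an exercise that spills onto the next page stays whole.
    text = "\n".join(page.strip() for page in pages)
    headers = list(EXERCISE_HEADER.finditer(text))
    exercises = {}
    for i, match in enumerate(headers):
        # An exercise runs from the end of its label to the start of the next label.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        exercises[match.group(1)] = text[match.end():end].strip()
    return exercises


# Made-up sample pages: exercise 12.1 continues across the page boundary.
pages = [
    "Unit 12\n12.1 Complete the sentences.\n1 She ___ (live) here since 2010.",
    "2 They ___ (know) each other for years.\n12.2 Write questions.\n1 how long / you / be / here?",
]
exercises = extract_exercises(pages)
print(exercises["12.1"])
```

Splitting on exercise labels rather than on a fixed chunk size is what keeps a cross-page exercise in one piece. The extracted text can then be handed to the model verbatim, so the model only has to check answers and never has to write the task.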
