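📄 Chinese Summary (translated)
A post has recently stirred up discussion in Taiwan's AI community: three inexpensive models (DeepSeek V3.2, Xiaomi MiMo-v2-pro, and MiniMax M2.7) beat Claude Sonnet 4.6 in educational assessments by using structured debate, scoring 88% accuracy against Claude's 76%. Calling these three models costs roughly 1/17 as much as calling Claude. MAGI, named after the supercomputers in Neon Genesis Evangelion, is an orchestrator pattern: a central engine sends the question to three LLM nodes with different personas (scientist, empath, pragmatist). The three nodes never talk to each other directly; the orchestrator mediates between them. The protocol, ICE (Iterative Consensus Ensemble), has three phases.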
📄 English Summary
When Three Cheap Models Beat Claude — Through Arguing, Not Voting
A post recently went viral in Taiwan's AI community reporting that three inexpensive models (DeepSeek V3.2, Xiaomi MiMo-v2-pro, and MiniMax M2.7) outperformed Claude Sonnet 4.6 in educational assessments by using structured debate: the three-model ensemble reached 88% accuracy against Claude's 76%. The cost per call for these models is approximately 1/17th that of Claude. MAGI, named after the supercomputers in Evangelion, is an orchestrator pattern in which a central engine sends each question to three LLM nodes, each with its own persona: scientist, empath, or pragmatist. The nodes never communicate with one another directly; the central engine, acting as orchestrator, mediates every exchange. The protocol, called ICE (Iterative Consensus Ensemble), operates in three phases.
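
The hub-and-spoke arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MAGI implementation: the summary does not spell out the three ICE phases, so the loop below is a generic "answer, relay peers' answers, revise" cycle. The names `Node`, `Orchestrator`, `stubborn`, and `persuadable` are hypothetical, and the nodes are deterministic stubs standing in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A node function maps (persona, question, peer_answers) -> answer.
# Hypothetical stand-in for an LLM call; no real API is used here.
NodeFn = Callable[[str, str, Dict[str, str]], str]


@dataclass
class Node:
    name: str
    persona: str
    ask: NodeFn


class Orchestrator:
    """Central engine: nodes never see each other, only what it relays."""

    def __init__(self, nodes: List[Node], max_rounds: int = 3):
        self.nodes = nodes
        self.max_rounds = max_rounds

    def run(self, question: str) -> Optional[str]:
        # Initial pass: every node answers independently (no peer answers).
        answers = {n.name: n.ask(n.persona, question, {}) for n in self.nodes}
        for _ in range(self.max_rounds):
            if len(set(answers.values())) == 1:
                break  # unanimous, stop early
            # Relay each node its peers' latest answers and ask it to revise.
            answers = {
                n.name: n.ask(
                    n.persona,
                    question,
                    {k: v for k, v in answers.items() if k != n.name},
                )
                for n in self.nodes
            }
        unanimous = len(set(answers.values())) == 1
        # Assumption: no consensus within max_rounds yields None.
        return next(iter(answers.values())) if unanimous else None


def stubborn(answer: str) -> NodeFn:
    """Stub node that never changes its answer."""
    return lambda persona, question, peers: answer


def persuadable(initial: str) -> NodeFn:
    """Stub node that adopts its peers' answer when they all agree."""
    def ask(persona: str, question: str, peers: Dict[str, str]) -> str:
        votes = list(peers.values())
        if votes and all(v == votes[0] for v in votes):
            return votes[0]
        return initial
    return ask


nodes = [
    Node("node_1", "scientist", stubborn("B")),
    Node("node_2", "empath", persuadable("A")),
    Node("node_3", "pragmatist", stubborn("B")),
]
result = Orchestrator(nodes).run("Which option is correct, A or B?")
```

In this toy run the empath node starts at "A", sees that both peers answered "B" once the orchestrator relays their answers, and revises, so `result` ends up as "B". Because every message passes through `Orchestrator.run`, the relay step is the single place where a real system could filter, summarize, or anonymize what each node sees.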