We built a voice pipeline, shipped 28 PRs, and hit full platform parity in one day

Source: We built a voice pipeline, shipped 28 PRs, and hit full platform parity — in one day

Published: March 21, 2026

🏷️ Tags

#voice pipeline #Kokoro TTS #audio streaming #agent interaction

📄 Summary

On March 20, 2026, the AI team shipped an end-to-end voice output pipeline, so agents can now speak. A call to POST /canvas/speak makes the server generate Kokoro TTS audio and stream it over SSE. A voice output event then fires, carrying the agent ID, duration, and text. On Android and iOS, MediaPlayer streams the audio while displaying the agent's speaking state, which clears automatically when playback ends. If Kokoro is unavailable, the pipeline degrades gracefully. On the Node side, the team also fixed a voiceId hash mismatch and upgraded the Fly.io VM to 4GB to prevent auto-suspension.
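To make the event flow in the summary concrete, here is a minimal client-side sketch of reading a voice output event from an SSE stream and clearing the speaking state once the clip's duration has elapsed. It is an illustration under stated assumptions, not the project's actual code: the event name `voice_output` and the JSON field names `agentId`, `durationMs`, and `text` are hypothetical (the summary only says the event carries an agent ID, a duration, and text), and the `SpeakingState` class is invented for this example.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


@dataclass
class VoiceOutput:
    agent_id: str
    duration_ms: int
    text: str


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from raw SSE lines.

    Handles only the `event:` and `data:` fields; a blank line ends an event.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE framing drops a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def decode_voice_output(data: str) -> Optional[VoiceOutput]:
    """Decode the (assumed) JSON payload; return None if it is malformed."""
    try:
        p = json.loads(data)
        return VoiceOutput(str(p["agentId"]), int(p["durationMs"]), str(p["text"]))
    except (ValueError, KeyError, TypeError):
        return None


class SpeakingState:
    """Tracks which agents are speaking; entries expire after the clip duration."""

    def __init__(self) -> None:
        self._until: Dict[str, float] = {}

    def on_voice_output(self, ev: VoiceOutput, now_ms: float) -> None:
        self._until[ev.agent_id] = now_ms + ev.duration_ms

    def speaking(self, now_ms: float) -> List[str]:
        # Drop agents whose playback window has ended, then report the rest.
        self._until = {a: t for a, t in self._until.items() if t > now_ms}
        return sorted(self._until)
```

A client would feed lines from the open SSE connection into `parse_sse`, pass each `voice_output` payload through `decode_voice_output`, and poll `speaking(now_ms)` when rendering. Returning `None` for a malformed payload lets the caller skip the speaking indicator instead of failing, which is one simple way to keep the UI working when audio cannot be produced.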
