We built a voice pipeline, shipped 28 PRs, and hit full platform parity in one day
On March 20, 2026, the AI team shipped a complete voice output pipeline, so agents can now speak. A call to the POST /canvas/speak endpoint makes the server generate Kokoro TTS audio and stream it over SSE. A voice output event fires, carrying the agent ID, duration, and text. On Android and iOS, MediaPlayer streams the audio while the UI shows the agent's speaking state, which clears automatically when playback ends. If Kokoro is unavailable, the pipeline degrades gracefully. On the Node side, a voiceId hash mismatch was fixed, and the Fly.io VM was upgraded to 4GB to prevent auto-suspension.
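To make the event flow above concrete, here is a minimal TypeScript sketch of how a client might read voice output events from the SSE stream and track the speaking state. The summary does not give the wire format, so the event name `voice_output`, the payload fields `agentId`, `durationMs`, and `text`, and the helpers `parseVoiceEvents` and `SpeakingState` are all assumptions for illustration. The sketch also uses the event's duration as a stand-in for the end of playback, which a real client would more likely take from its media player's completion callback.

```typescript
// Hypothetical shape of the voice output event; field names are assumptions.
interface VoiceOutputEvent {
  agentId: string;
  durationMs: number;
  text: string;
}

// Parse raw SSE text into voice output events.
// Assumes each frame carries "event: voice_output" and one JSON "data:" line,
// with frames separated by a blank line as in the SSE wire format.
function parseVoiceEvents(raw: string): VoiceOutputEvent[] {
  const events: VoiceOutputEvent[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    let eventName = "message"; // SSE default when no "event:" field is present
    const dataLines: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) eventName = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (eventName !== "voice_output" || dataLines.length === 0) continue;
    try {
      const payload = JSON.parse(dataLines.join("\n"));
      if (
        typeof payload.agentId === "string" &&
        typeof payload.durationMs === "number" &&
        typeof payload.text === "string"
      ) {
        events.push({
          agentId: payload.agentId,
          durationMs: payload.durationMs,
          text: payload.text,
        });
      }
    } catch {
      // Malformed JSON: skip the frame rather than interrupt the stream.
    }
  }
  return events;
}

// Tracks which agents are speaking; an entry expires at start + duration.
class SpeakingState {
  private until = new Map<string, number>();

  start(event: VoiceOutputEvent, nowMs: number): void {
    this.until.set(event.agentId, nowMs + event.durationMs);
  }

  isSpeaking(agentId: string, nowMs: number): boolean {
    const end = this.until.get(agentId);
    if (end === undefined) return false;
    if (nowMs >= end) {
      this.until.delete(agentId); // auto-clear once the duration has elapsed
      return false;
    }
    return true;
  }
}
```

Passing the clock in as `nowMs` keeps the state tracker deterministic and easy to test; in an app the caller would supply `Date.now()`.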