基于混合现实的机器人导航接口:结合空间指向和大型语言模型的语音交互

📄 Abstract (translated from Chinese)

As technology has advanced, robot navigation has become more intuitive, gradually shifting from traditional 2D displays to spatially aware Mixed Reality (MR) systems. However, existing MR interfaces typically rely on manual "air tap" gestures for goal placement, which beginners can find repetitive and physically tiring. This paper proposes a new Mixed Reality robot navigation interface (MRPoS) that replaces complex gestures with a natural multimodal interface combining spatial pointing and Large Language Model (LLM)-based speech interaction. By leveraging both modalities, the system translates spoken intent into navigation goals visualized through MR. Comprehensive experimental results show that this approach significantly improves the user's operating experience and navigation efficiency.

📄 English Summary

MRPoS: Mixed Reality-Based Robot Navigation Interface Using Spatial Pointing and Speech with Large Language Model

Recent advancements have made robot navigation more intuitive by transitioning from traditional 2D displays to spatially aware Mixed Reality (MR) systems. However, current MR interfaces often rely on manual 'air tap' gestures for goal placement, which can be repetitive and physically demanding, especially for beginners. A new Mixed Reality-Based Robot Navigation Interface using Spatial Pointing and Speech (MRPoS) is proposed, which replaces complex hand gestures with a natural, multimodal interface that combines spatial pointing with Large Language Model (LLM)-based speech interaction. By leveraging both modalities, the system translates verbal intent into navigation goals visualized by MR technology. Comprehensive experimental results indicate that this approach significantly enhances user experience and navigation efficiency.
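The pipeline the summary describes — a spatial pointing ray grounded to a floor position, gated by an LLM-interpreted speech command — can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the names (`PointingRay`, `fuse`, etc.) are invented for this sketch, and a simple keyword matcher stands in for the LLM-based intent parser.

```python
from dataclasses import dataclass

@dataclass
class PointingRay:
    """A pointing gesture as a ray in world coordinates (e.g. from the headset)."""
    origin: tuple      # (x, y, z) position of the hand or head
    direction: tuple   # (dx, dy, dz) pointing direction

def ray_floor_intersection(ray, floor_z=0.0):
    """Project the pointing ray onto the floor plane to get a candidate goal."""
    ox, oy, oz = ray.origin
    dx, dy, dz = ray.direction
    if dz >= 0:  # ray never reaches the floor
        return None
    t = (floor_z - oz) / dz
    return (ox + t * dx, oy + t * dy, floor_z)

def parse_speech_intent(utterance):
    """Stand-in for the LLM: in the real system, an LLM would turn free-form
    speech into a structured navigation intent."""
    text = utterance.lower()
    if "go" in text or "navigate" in text:
        return {"action": "navigate"}
    return {"action": "none"}

def fuse(utterance, ray):
    """Combine speech intent with the pointed-at location into a nav goal."""
    intent = parse_speech_intent(utterance)
    if intent["action"] != "navigate":
        return None
    return ray_floor_intersection(ray)

# Example: user points down-and-forward and says "Robot, go over there".
ray = PointingRay(origin=(0.0, 0.0, 1.5), direction=(0.6, 0.0, -0.8))
print(fuse("Robot, go over there", ray))  # a floor-plane goal position
```

In the actual system, the resulting goal would be rendered in MR for confirmation before being sent to the robot's navigation stack.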

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.