LatentAM: Real-Time, Large-Scale Latent Gaussian Attention Mapping via Online Dictionary Learning

📄 English Summary

LatentAM: Real-Time, Large-Scale Latent Gaussian Attention Mapping via Online Dictionary Learning

LatentAM is an online 3D Gaussian Splatting (3DGS) mapping framework that builds scalable latent feature maps from streaming RGB-D observations for open-vocabulary robotic perception. Instead of distilling high-dimensional Vision-Language Model (VLM) embeddings through a model-specific decoder, it uses online dictionary learning, which makes the approach model-agnostic and pretraining-free and allows plug-and-play integration with different VLMs at test time. Specifically, each Gaussian primitive is associated with a compact query vector, which an attention mechanism over a learnable dictionary converts into an approximate VLM embedding. The dictionary is initialized efficiently from streaming observations, which supports real-time processing.
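
To make the query-to-embedding step concrete, here is a minimal sketch of how a compact query vector might be decoded through attention over a dictionary. The summary does not give the exact formulation, so everything here is an assumption for illustration: the scaled dot-product form, the split of the dictionary into `keys` and `values`, the function name `decode_embedding`, and the toy dimensions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_embedding(query, keys, values):
    """Turn a compact per-Gaussian query into an approximate embedding.

    Assumed form (not confirmed by the source): scaled dot-product
    attention, with the dictionary split into keys and values.
    """
    scale = math.sqrt(len(query))
    # One similarity score per dictionary atom.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The embedding is the attention-weighted sum of the value vectors.
    dim = len(values[0])
    embedding = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(dim)
    ]
    return embedding, weights


# Hypothetical sizes: 4 dictionary atoms, 3-dim queries, 8-dim embeddings.
rng = random.Random(0)
NUM_ATOMS, QUERY_DIM, EMBED_DIM = 4, 3, 8
keys = [[rng.gauss(0, 1) for _ in range(QUERY_DIM)] for _ in range(NUM_ATOMS)]
values = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_ATOMS)]
query = [rng.gauss(0, 1) for _ in range(QUERY_DIM)]

embedding, weights = decode_embedding(query, keys, values)
```

In this sketch each Gaussian stores only the low-dimensional `query`, while the high-dimensional vectors live once in the shared dictionary. Under these assumptions, that is what keeps the per-primitive memory cost small, and swapping the VLM only requires a dictionary whose value vectors match the new embedding size.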
