LiteVLA-Edge: Quantized On-Device Multimodal Control for Embedded Robotics

📄 Summary

Vision-Language-Action (VLA) models offer a unified framework for perception, language conditioning, and action generation. However, many existing systems remain difficult to deploy in embedded robotic environments due to their computational demands and inference latency. LiteVLA-Edge is a deployment-oriented VLA pipeline designed for fully on-device inference on Jetson Orin-class hardware. The approach integrates supervised image-to-action fine-tuning in FP32 with post-training 4-bit GGUF quantization and GPU-accelerated inference via the llama.cpp runtime. Under the specified deployment configuration, LiteVLA-Edge achieves a mean end-to-end latency of 150.5 ms (approximately 6.6 Hz) while operating entirely offline.
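The reported control rate follows directly from the mean latency: 1000 ms / 150.5 ms ≈ 6.6 Hz. A minimal sketch of how such an end-to-end latency figure could be measured is shown below; `run_vla_inference` is a hypothetical placeholder for the quantized model call (in deployment this would invoke the llama.cpp runtime on the 4-bit GGUF model), not part of the released pipeline.

```python
import time
import statistics

def run_vla_inference(image, instruction):
    """Hypothetical stand-in for the quantized VLA forward pass.

    On the real device this would call the llama.cpp runtime on the
    4-bit GGUF model; here we simulate ~150 ms of on-device latency.
    """
    time.sleep(0.15)
    return [0.0] * 7  # e.g. a 7-DoF action vector (illustrative shape)

# Measure mean end-to-end latency over repeated trials.
latencies_ms = []
for _ in range(10):
    start = time.perf_counter()
    run_vla_inference(image=None, instruction="pick up the red block")
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

mean_ms = statistics.mean(latencies_ms)
print(f"mean end-to-end latency: {mean_ms:.1f} ms "
      f"({1000.0 / mean_ms:.1f} Hz control rate)")
```

Wall-clock timing around the full call (image in, action out) is what makes this an end-to-end figure rather than a model-only one.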
