LLaMo: Scaling Pretrained Language Models for Unified Motion Understanding and Generation with Continuous Autoregressive Tokens
📄 Summary
LLaMo is a unified framework designed to address challenges in motion-language generation and understanding. Existing approaches often fine-tune large language models (LLMs) on paired motion-text data, which can lead to catastrophic forgetting of linguistic capabilities because the available text-motion pairs are limited in scale. Furthermore, prior methods typically quantize motion into discrete representations for integration with language models, which introduces significant jitter artifacts. LLaMo overcomes these issues by introducing continuous autoregressive tokens, effectively integrating motion and language information and advancing multimodal generation and understanding.
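To make the jitter point concrete, here is a toy sketch (not the paper's implementation; the codebook size and trajectory are made-up values) showing why per-frame vector quantization of a smooth motion signal produces staircase jumps, whereas a continuous-valued token stream preserves smoothness:

```python
# Illustrative sketch: quantizing motion frames to a small discrete
# codebook (as VQ-based motion tokenizers do) vs. keeping continuous
# values. We compare a simple jitter proxy for both representations.
import numpy as np

# A smooth 1-D "motion" trajectory (e.g., one joint angle over time).
t = np.linspace(0, 2 * np.pi, 200)
motion = np.sin(t)

# Toy codebook: snap each frame to its nearest of K discrete codes.
K = 8
codebook = np.linspace(-1, 1, K)
discrete = codebook[np.abs(motion[:, None] - codebook[None, :]).argmin(axis=1)]

# Continuous tokens keep the real-valued frames unchanged.
continuous = motion.copy()

# Jitter proxy: mean squared second-order frame-to-frame difference
# (discrete staircase steps show up as large acceleration spikes).
def jitter(x):
    return float(np.mean(np.diff(x, n=2) ** 2))

print(f"jitter (discrete tokens):   {jitter(discrete):.2e}")
print(f"jitter (continuous tokens): {jitter(continuous):.2e}")
```

The staircase transitions in the quantized signal dominate the jitter metric, which is the artifact the continuous-token formulation is meant to avoid.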