LLaMo: Scaling Pretrained Language Models for Unified Motion Understanding and Generation with Continuous Autoregressive Tokens
📄 Summary
LLaMo is a unified framework designed to address challenges in motion-language generation and understanding. Existing approaches often fine-tune large language models (LLMs) on paired motion-text data, which can lead to catastrophic forgetting of linguistic capabilities because the available text-motion pairs are limited in scale. Furthermore, prior methods typically quantize motion into discrete representations for integration with language models, which introduces significant jitter artifacts. LLaMo overcomes these issues by introducing continuous autoregressive tokens, effectively integrating motion and language information and advancing multimodal generation and understanding.
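To make the jitter point concrete, here is a toy sketch (not the paper's implementation; the codebook size and trajectory are made-up values) showing why per-frame vector quantization of a smooth motion signal produces staircase jumps, whereas a continuous-valued token stream preserves smoothness:

```python
# Illustrative sketch: quantizing motion frames to a small discrete
# codebook (as VQ-based motion tokenizers do) vs. keeping continuous
# values. We compare a simple jitter proxy for both representations.
import numpy as np

# A smooth 1-D "motion" trajectory (e.g., one joint angle over time).
t = np.linspace(0, 2 * np.pi, 200)
motion = np.sin(t)

# Toy codebook: snap each frame to its nearest of K discrete codes.
K = 8
codebook = np.linspace(-1, 1, K)
discrete = codebook[np.abs(motion[:, None] - codebook[None, :]).argmin(axis=1)]

# Continuous tokens keep the real-valued frames unchanged.
continuous = motion.copy()

# Jitter proxy: mean squared second-order frame-to-frame difference
# (discrete staircase steps show up as large acceleration spikes).
def jitter(x):
    return float(np.mean(np.diff(x, n=2) ** 2))

print(f"jitter (discrete tokens):   {jitter(discrete):.2e}")
print(f"jitter (continuous tokens): {jitter(continuous):.2e}")
```

The staircase transitions in the quantized signal dominate the jitter metric, which is the artifact the continuous-token formulation is meant to avoid.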