CLUTCH: A Contextualized Language Model for Unlocking Text-Conditioned Hand Motion Modelling

Source: CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild

Published: February 23, 2026

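📄 Summary (translated from Chinese)

Hands play an important role in daily life, yet modeling natural hand motion remains underexplored. Existing methods for text-to-hand-motion generation or hand animation captioning rely on limited datasets captured in lab settings, which makes extending them to "in-the-wild" scenarios costly. In addition, current models and their training schemes struggle to achieve animation fidelity in text-motion alignment. To address these problems, the work introduces the '3D Hands in the Wild' (3D-HIW) dataset, containing 32K 3D hand-motion sequences with aligned text, and CLUTCH, an LLM-based hand animation system with two key innovations: (a) SHIFT, a novel VQ-VAE architecture for tokenizing hand motion; and (b) a geometric refinement stage.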

🏷️ Related Tags

#HandMotionModeling #TextConditionedGeneration #3DDataset #AnimationFidelity

📄 English Summary

Hands play a crucial role in daily life, yet the modeling of natural hand motions remains underexplored. Existing methods for text-to-hand-motion generation or hand animation captioning rely on limited, studio-captured datasets, making them costly to scale to 'in-the-wild' settings. Furthermore, contemporary models and their training schemes struggle with capturing animation fidelity in text-motion alignment. To address these challenges, the '3D Hands in the Wild' (3D-HIW) dataset is introduced, containing 32K 3D hand-motion sequences with aligned text. Additionally, CLUTCH, an LLM-based hand animation system, is proposed, featuring two critical innovations: (a) SHIFT, a novel VQ-VAE architecture for tokenizing hand motion, and (b) a geometric refinement stage to enhance animation quality.
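The summary says SHIFT tokenizes hand motion with a VQ-VAE but gives no architectural details. As a rough illustration of the generic vector-quantization step that any VQ-VAE tokenizer relies on (not SHIFT itself), the sketch below maps each latent frame to the index of its nearest codebook vector. The function names, the 2-D latents, and the 3-entry codebook are all hypothetical and chosen only for readability.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each latent frame to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


def dequantize(tokens: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look up the codebook vector for each token (what a decoder would consume)."""
    return [list(codebook[t]) for t in tokens]


# Hypothetical 2-D latents and a 3-entry codebook, for illustration only.
# In a real VQ-VAE the latents come from a learned encoder and the
# codebook is learned jointly with it.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
latents = [[0.1, -0.1], [0.9, 1.2], [1.8, 0.1], [1.1, 0.9]]

tokens = quantize(latents, codebook)          # discrete motion tokens
reconstructed = dequantize(tokens, codebook)  # quantized latents
```

The resulting integer sequence is the kind of discrete token stream a language model can consume or emit alongside text tokens, which is the general reason an LLM-based system would need a motion tokenizer.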
