SelfieAvatar: Real-time Head Avatar Reenactment from a Selfie Video

📄 English Summary

SelfieAvatar: Real-time Head Avatar Reenactment from a Selfie Video

Head avatar reenactment technology focuses on creating animatable personal avatars from monocular videos, serving as a foundational component for applications such as social signal understanding, gaming, human-machine interaction, and computer vision. Recent 3D Morphable Model (3DMM)-based facial reconstruction methods have made remarkable progress toward high-fidelity face estimation. However, these methods often struggle to capture the entire head, including non-facial regions such as hair, ears, and the neck, leaving the generated avatars lacking realism and detail in those areas. Furthermore, existing head avatar reenactment systems typically require complex setups, lengthy training, or substantial computational resources, limiting their practicality for real-time applications, especially on resource-constrained mobile devices.

To address these challenges, SelfieAvatar proposes a novel real-time head avatar reenactment framework. The framework first constructs a high-quality head avatar from a single selfie video by combining deep learning with traditional geometric methods: it leverages advanced neural rendering to capture facial detail and texture, complemented by explicit geometric modeling of the hair and neck regions, ensuring realism across the entire head.

SelfieAvatar also introduces a mechanism that decouples head pose from expression, allowing users to drive the avatar with simple inputs (e.g., another selfie video or a set of pose parameters) for fine-grained control. For real-time performance, the system employs a lightweight neural network architecture and an efficient rendering pipeline, achieving high-frame-rate rendering on standard consumer hardware without sacrificing visual quality. During training, SelfieAvatar uses a self-supervised learning paradigm, reducing reliance on large amounts of annotated data and thus lowering data acquisition and annotation costs.
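The pose/expression decoupling described above can be sketched as two independent parameter codes that are composed into a single driving signal. The abstract does not specify SelfieAvatar's actual parameterization, so the 6-DoF head pose and blendshape-style expression coefficients below are illustrative assumptions, not the paper's interface:

```python
import numpy as np

# Hypothetical sketch: a 6-DoF head pose (rotation + translation) and K
# blendshape-style expression coefficients are kept as separate codes.
# SelfieAvatar's real parameterization is not given in the abstract.

def compose_driving_signal(pose, expression):
    """Concatenate independent head-pose and expression codes.

    pose:       shape (6,), rotation (3) + translation (3)
    expression: shape (K,), coefficients clipped to [0, 1]
    """
    pose = np.asarray(pose, dtype=np.float32)
    expression = np.clip(np.asarray(expression, dtype=np.float32), 0.0, 1.0)
    assert pose.shape == (6,), "expected a 6-DoF head pose"
    return np.concatenate([pose, expression])

def retarget(source_signal, target_pose, k_expr):
    """Keep the source expression but override the head pose.

    Because the two codes never mix, the pose can come from one input
    (e.g., pose parameters) while the expression comes from another
    (e.g., a driving selfie video) -- the point of decoupling.
    """
    expression = source_signal[6:6 + k_expr]
    return compose_driving_signal(target_pose, expression)
```

A caller could, for instance, take the expression code from a driving video frame and substitute a user-supplied head pose, changing where the avatar looks without altering its facial expression.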
Experimental results demonstrate that SelfieAvatar outperforms existing techniques in generating realistic, controllable head avatars while reenacting at real-time frame rates, providing a powerful tool for virtual meetings, content creation, and augmented reality applications.
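The self-supervised training paradigm mentioned above typically compares a rendered avatar frame directly against the corresponding input video frame, so no manual labels are required. The abstract does not state SelfieAvatar's actual objective, so the simple masked L1 photometric loss below is only an illustrative stand-in:

```python
import numpy as np

def photometric_loss(rendered, target, mask=None):
    """Masked L1 photometric loss between a rendered frame and the input
    video frame (both H x W x C, values in [0, 1]).

    Illustrative only: the self-supervised signal is the input video
    itself, but SelfieAvatar's exact loss terms are not specified.
    """
    diff = np.abs(rendered.astype(np.float32) - target.astype(np.float32))
    if mask is not None:
        # Restrict supervision to valid head pixels (e.g., a segmentation
        # mask); mask is H x W with values in {0, 1}.
        diff = diff * mask[..., None]
        return diff.sum() / (mask.sum() * diff.shape[-1] + 1e-8)
    return diff.mean()
```

Minimizing this loss over the frames of a single selfie video is one way such a system could fit the avatar without any annotated data.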

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.