KPM-Bench: A Kinematic Parsing Motion Benchmark for Fine-grained Motion-centric Video Understanding

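📄 Chinese Summary (translated)

Although video captioning models have made notable progress in recent years, they still struggle to describe fine-grained motion details accurately and suffer from severe hallucination. These problems are especially pronounced when captioning motion-centric videos, where precise depiction of complex movements and limb dynamics is essential yet often overlooked. To address this, an automated annotation pipeline is proposed that combines kinematics-based motion computation with linguistic parsing, enabling detailed decomposition and description of complex human motions. Building on this pipeline, the Kinematic Parsing Motion Benchmark (KPM-Bench) is constructed and released as an open-source dataset intended to advance motion-centric video understanding.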

📄 English Summary

KPM-Bench: A Kinematic Parsing Motion Benchmark for Fine-grained Motion-centric Video Understanding

Despite recent advances, video captioning models still struggle to describe fine-grained motion details accurately and suffer from significant hallucination. These problems are particularly evident in motion-centric videos, where precise depiction of intricate movements and limb dynamics is essential but often overlooked. To bridge this gap, an automated annotation pipeline is proposed that integrates kinematics-based motion computation with linguistic parsing, enabling detailed decomposition and description of complex human motions. Based on this pipeline, the Kinematic Parsing Motion Benchmark (KPM-Bench) is constructed and released as a novel open-source dataset aimed at facilitating fine-grained motion-centric video understanding.
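The summary does not spell out how kinematic computation feeds into language, so the following is only an illustrative sketch, not the KPM-Bench pipeline itself. It assumes 2D keypoints for a shoulder, elbow, and wrist are already available per frame, computes the elbow angle over time, and maps the change to a short phrase. The function names (`joint_angle`, `describe_elbow`), the 15-degree threshold, and the phrase wording are all hypothetical choices made for this example.

```python
import math


def joint_angle(a, b, c):
    """Return the angle at point b, in degrees, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("adjacent keypoints must not coincide")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def describe_elbow(frames, fps, threshold_deg=15.0):
    """Summarize elbow motion over a clip.

    frames: list of (shoulder, elbow, wrist) tuples, each point an (x, y) pair.
    fps: frames per second of the clip.
    threshold_deg: hypothetical cutoff below which the joint counts as steady.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to measure motion")
    angles = [joint_angle(s, e, w) for s, e, w in frames]
    change = angles[-1] - angles[0]
    duration = (len(frames) - 1) / fps
    if abs(change) < threshold_deg:
        phrase = "holds the elbow steady"
    elif change < 0:
        phrase = "bends the elbow"
    else:
        phrase = "extends the elbow"
    return {
        "start_deg": angles[0],
        "end_deg": angles[-1],
        "mean_speed_deg_per_s": abs(change) / duration,
        "phrase": phrase,
    }


# Hypothetical two-frame clip: the arm starts straight, then the forearm folds upward.
clip = [
    ((0.0, 0.0), (1.0, 0.0), (2.0, 0.0)),
    ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)),
]
summary = describe_elbow(clip, fps=2)
print(summary["phrase"])
```

Grounding the phrase in a measured quantity (here an angle change and its mean speed) is one plausible way to limit hallucinated motion descriptions, since each word can be traced back to a number; a real pipeline would need 3D poses, many joints, and a far richer vocabulary.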
