SemanticMoments: Training-Free Motion Similarity via Third Moment Features

📄 Chinese Abstract

Retrieving videos based on semantic motion is a fundamental yet unsolved problem. Existing video representation methods rely too heavily on static appearance and scene context while neglecting motion dynamics, a bias that stems from their training data and objectives. Traditional motion-centric inputs such as optical flow lack the semantic grounding needed to understand high-level motion. To demonstrate this inherent bias, the SimMotion benchmarks are introduced, combining controlled synthetic data with a new human-annotated real-world dataset. The study shows that existing models perform poorly on these benchmarks, often failing to disentangle motion from appearance. To address this, SemanticMoments is proposed: a simple, training-free method for improving motion-similarity retrieval. It effectively captures motion information through third-moment features, offering a new perspective.

📄 English Summary

SemanticMoments: Training-Free Motion Similarity via Third Moment Features

Retrieving videos based on semantic motion is a fundamental yet unresolved challenge. Existing video representation methods overly depend on static appearance and scene context, neglecting motion dynamics, a bias inherited from their training data and objectives. Traditional motion-centric inputs, such as optical flow, lack the semantic grounding necessary for understanding high-level motion. To illustrate this inherent bias, the SimMotion benchmarks are introduced, combining controlled synthetic data with a new human-annotated real-world dataset. The results reveal that existing models perform poorly on these benchmarks, often failing to disentangle motion from appearance. To bridge this gap, SemanticMoments is proposed, a simple, training-free method that effectively captures motion information through third moment features, offering a new perspective on motion similarity retrieval.
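The summary does not specify how the third-moment features are computed. As a rough illustration only (the function names, the per-dimension computation over per-frame embeddings, and the cube-root normalization are all assumptions, not the paper's actual method), a temporal third-moment descriptor might look like:

```python
import numpy as np

def third_moment_descriptor(frame_feats: np.ndarray) -> np.ndarray:
    """Collapse a (T, D) sequence of per-frame embeddings into a single
    D-dimensional descriptor via the third central moment over time.

    Hypothetical sketch: subtracting the temporal mean discards the static
    component, so only how features change over time survives.
    """
    mu = frame_feats.mean(axis=0)            # static (appearance-like) component
    centered = frame_feats - mu              # temporal variation only
    m3 = (centered ** 3).mean(axis=0)        # third central moment per dimension
    # Cube root keeps the sign and brings the scale back to that of the features.
    return np.cbrt(m3)

def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Standard cosine similarity for comparing two descriptors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# Toy usage: 16 frames of 8-dimensional per-frame features.
rng = np.random.default_rng(0)
clip = rng.normal(size=(16, 8))
desc = third_moment_descriptor(clip)         # shape (8,)
```

One plausible reason to use an odd moment: the mean (first moment) is dominated by static appearance and the variance (second moment) is sign-blind, whereas the third central moment is zero for a static clip and sensitive to asymmetric temporal variation. Whether this matches the paper's motivation is an assumption here.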

