Gloss-Free Sign Language Translation: An Unbiased Evaluation of Progress in the Field

📄 Summary

Sign Language Translation (SLT) aims to automatically convert visual sign language videos into spoken language text and vice versa. Despite rapid advances in recent years, the true sources of performance improvements often remain ambiguous. Reported gains may stem from methodological innovations, but also from different backbones, training optimizations, hyperparameter tuning, or even variations in how evaluation metrics are computed. This study presents a comprehensive investigation of recent gloss-free SLT models by re-implementing key contributions within a unified codebase. Standardizing preprocessing, video encoders, and training setups across all methods ensures a fair comparison. The analysis reveals that many performance gains do not stem from the sources originally credited for them.
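To make the metric-computation point concrete: two evaluation scripts can score the *same* hypothesis against the *same* reference differently purely because of tokenization. The sketch below is an illustrative toy (plain Python, not the paper's evaluation code; real SLT papers typically rely on tools such as sacrebleu or NLTK) that compares clipped unigram precision, the building block of BLEU, under a naive whitespace tokenizer versus a punctuation-splitting one.

```python
from collections import Counter
import re

def ngram_counts(tokens, n):
    """Count all n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def unigram_precision(hyp_tokens, ref_tokens):
    """Clipped unigram precision, the building block of BLEU."""
    hyp, ref = ngram_counts(hyp_tokens, 1), ngram_counts(ref_tokens, 1)
    total = sum(hyp.values())
    return sum((hyp & ref).values()) / total if total else 0.0

# Two tokenizers that evaluation scripts may silently differ on.
def tok_whitespace(s):
    return s.lower().split()

def tok_punct(s):
    # Split punctuation into separate tokens, as sacrebleu-style tokenizers do.
    return re.findall(r"\w+|[^\w\s]", s.lower())

reference = "The weather is nice today."
hypothesis = "The weather is nice today ."   # same text, detached period

p_ws = unigram_precision(tok_whitespace(hypothesis), tok_whitespace(reference))
p_punct = unigram_precision(tok_punct(hypothesis), tok_punct(reference))
print(f"whitespace: {p_ws:.3f}, punct-split: {p_punct:.3f}")
# Whitespace scoring penalizes "today." vs "today ."; punctuation splitting
# makes the two strings identical, so the same output scores higher.
```

A score gap of this kind, compounded over 4-gram BLEU and a full test set, can be mistaken for a modeling improvement — which is exactly why the study standardizes metric computation across all re-implemented methods.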
