DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models

📄 Summary

DEAF (Diagnostic Evaluation of Acoustic Faithfulness) is a benchmark of over 2,700 conflict stimuli designed to systematically evaluate how faithfully Audio Multimodal Large Language Models (Audio MLLMs) process acoustic signals. The study focuses on three acoustic dimensions: emotional prosody, background sounds, and speaker identity. A controlled multi-level evaluation framework progressively increases textual influence, ranging from semantic conflicts within the spoken content to misleading prompts and their combination. This design effectively disentangles content-driven bias from prompt-induced bias, providing essential tools and methodology for understanding the acoustic processing capabilities of audio models.
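The multi-level design described above can be illustrated with a minimal sketch. Note that all names here (the `Stimulus` fields, the conflict-level taxonomy, and the faithfulness check) are hypothetical illustrations of the general idea, not the benchmark's actual schema or scoring code:

```python
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    """The three acoustic dimensions the benchmark targets."""
    EMOTIONAL_PROSODY = "emotional_prosody"
    BACKGROUND_SOUND = "background_sound"
    SPEAKER_IDENTITY = "speaker_identity"


class ConflictLevel(Enum):
    """Progressively stronger textual influence (illustrative labels)."""
    CONTENT_ONLY = 1        # semantic conflict inside the spoken content
    PROMPT_ONLY = 2         # misleading text prompt, content neutral
    CONTENT_AND_PROMPT = 3  # both sources of textual bias combined


@dataclass
class Stimulus:
    audio_path: str
    dimension: Dimension
    level: ConflictLevel
    acoustic_truth: str  # ground-truth label carried by the audio signal
    textual_claim: str   # conflicting cue carried by text


def is_acoustically_faithful(stimulus: Stimulus, model_answer: str) -> bool:
    """A response counts as faithful if it matches the acoustic ground
    truth rather than the conflicting textual claim (simplified check)."""
    return stimulus.acoustic_truth.lower() in model_answer.lower()


# Example: the audio sounds angry, but the text claims happiness.
s = Stimulus(
    audio_path="clip_001.wav",
    dimension=Dimension.EMOTIONAL_PROSODY,
    level=ConflictLevel.CONTENT_AND_PROMPT,
    acoustic_truth="angry",
    textual_claim="happy",
)
print(is_acoustically_faithful(s, "The speaker sounds angry."))  # True
```

Comparing faithfulness rates across the three conflict levels would then separate content-driven errors (level 1) from prompt-induced ones (level 2) and their interaction (level 3).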


Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, and others