FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures

📄 Chinese Summary (translated)

Vision-language models (VLMs) are prone to errors, and identifying where those errors occur is essential for ensuring the reliability and safety of AI systems. This research proposes a method that automatically generates questions designed to deliberately induce incorrect VLM responses, thereby exposing their vulnerabilities. The core of the method is fuzz testing combined with reinforcement fine-tuning: vision and language fuzzing transform a single input query into a large set of diverse variants. Guided by the fuzzing outcomes, the question generator is further trained with adversarial reinforcement fine-tuning to produce increasingly challenging queries that trigger model failures. This approach enables the continuous identification and analysis of latent VLM flaws.

📄 English Summary

FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures

Vision Language Models (VLMs) are susceptible to errors, making it crucial to identify where these errors occur to ensure the reliability and safety of AI systems. This research proposes an approach that automatically generates questions designed to deliberately induce incorrect responses from VLMs, thereby revealing their vulnerabilities. The core of this approach lies in fuzz testing and reinforcement fine-tuning: a single input query is transformed into a large set of diverse variants through vision and language fuzzing. Based on the outcomes of fuzz testing, the question generator is further guided by adversarial reinforcement fine-tuning to produce increasingly challenging queries that trigger model failures. This method allows for the consistent identification and analysis of potential flaws in VLMs.
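The fuzz-then-reward loop described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the fuzzing operators, the stub VLM, and all function names (`language_fuzz`, `vision_fuzz`, `failure_reward`, `fuzz_round`) are hypothetical, and the adversarial reward is simplified to a binary wrong/right signal that would feed the reinforcement fine-tuning step.

```python
import random

def language_fuzz(question, n_variants=4, rng=None):
    """Language fuzzing: produce paraphrased variants of a seed question (toy templates)."""
    rng = rng or random.Random(0)
    templates = [
        "{q}",
        "Could you tell me: {q}",
        "In the image, {q}",
        "Answer briefly: {q}",
        "{q} Explain your reasoning.",
    ]
    return [rng.choice(templates).format(q=question) for _ in range(n_variants)]

def vision_fuzz(image, n_variants=4, rng=None):
    """Vision fuzzing: perturb a toy image (list of pixel rows) with random noise."""
    rng = rng or random.Random(0)
    variants = []
    for _ in range(n_variants):
        noisy = [[max(0, min(255, px + rng.randint(-8, 8))) for px in row]
                 for row in image]
        variants.append(noisy)
    return variants

def failure_reward(vlm_answer, ground_truth):
    """Adversarial reward: 1 when the VLM answers incorrectly, 0 otherwise."""
    return 1.0 if vlm_answer.strip().lower() != ground_truth.strip().lower() else 0.0

def fuzz_round(vlm, image, question, ground_truth, rng=None):
    """One fuzzing round: query the VLM on every (question, image) variant and
    collect per-variant adversarial rewards for the RL fine-tuning step."""
    rewards = []
    for img in vision_fuzz(image, rng=rng):
        for q in language_fuzz(question, rng=rng):
            rewards.append(((q, img), failure_reward(vlm(img, q), ground_truth)))
    return rewards

# Stub VLM that ignores its inputs and always answers "cat".
stub_vlm = lambda image, question: "cat"

toy_image = [[120, 130], [140, 150]]
results = fuzz_round(stub_vlm, toy_image, "What animal is shown?", "dog")
print(sum(r for _, r in results))  # every variant fools the stub -> 16.0
```

In the full method, the variants that earn high failure rewards would be used to fine-tune the question generator, so that later rounds propose progressively harder queries.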

Powered by Cloudflare Workers + Payload CMS + Claude 3.5

Data sources: OpenAI, Google AI, DeepMind, AWS ML Blog, HuggingFace, etc.