📄 Abstract
In AI-assisted decision-making, adversarial attacks that influence human trust by manipulating explanations deserve close attention. Even when the underlying model's performance remains unchanged, an attacker can significantly alter users' trust in the AI, and thereby influence human decisions, by generating misleading explanations. Specifically, an attacker can exploit particular characteristics of an explanation, such as its complexity, consistency, or attribution pattern, to induce users to either over-trust or under-trust the AI's recommendations. Such attacks go beyond shaping the perception of model performance; they more deeply alter how people understand and accept the AI's decision process. Multiple attack strategies were validated as effective through user studies, revealing the vulnerability of current explainable AI (XAI) systems to this kind of manipulation. The findings underscore the importance of developing robust XAI methods that resist malicious manipulation, and call for the risk of explanation misuse to be fully considered in the design and deployment of AI systems, in order to preserve the reliability of human-AI collaboration.
📄 English Summary
When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making
This research investigates adversarial explanation attacks targeting human trust in AI-assisted decision-making. We demonstrate that attackers can significantly manipulate user trust in AI, and consequently influence human decisions, by generating misleading explanations, even when the underlying model's performance remains unchanged. Specifically, attackers can exploit specific characteristics of explanations, such as complexity, consistency, or attribution methods, to induce either over-trust or under-trust in AI recommendations. These attacks go beyond merely altering the perception of model performance; they fundamentally change how humans understand and accept AI decision processes. We designed and implemented various attack strategies, validating their effectiveness through user studies, which revealed the vulnerability of current explainable AI (XAI) systems to such manipulations. Our findings underscore the critical importance of developing robust XAI methods to defend against malicious manipulation. Furthermore, this study calls for careful consideration of the potential for explanation misuse during the design and deployment of AI systems, to maintain the reliability and security of human-AI collaboration. This work offers a novel perspective on understanding and mitigating potential security threats posed by AI explanations.
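To make the core idea concrete, the sketch below illustrates how an explanation can be manipulated without touching the model's prediction. It is a minimal, hypothetical example assuming a simple linear model with a gradient-times-input style attribution; the function names, the `boost` parameter, and the renormalization step are illustrative assumptions, not the attack strategies actually studied in the paper.

```python
import numpy as np

# Hypothetical setup: a fixed linear model whose prediction is never modified.
rng = np.random.default_rng(0)
w = rng.normal(size=5)   # model weights (the attacker does not change these)
x = rng.normal(size=5)   # one input instance shown to the user

def predict(x):
    """Model output: identical before and after the explanation attack."""
    return 1.0 / (1.0 + np.exp(-w @ x))

def honest_attribution(x):
    """Gradient * input attribution for the linear model (honest explanation)."""
    p = predict(x)
    return p * (1.0 - p) * w * x

def misleading_attribution(x, target_idx, boost=5.0):
    """Attacker-controlled explanation: shifts attribution mass onto a chosen
    feature, then rescales so the explanation keeps a plausible overall magnitude.
    Only the explanation shown to the user changes, not the prediction."""
    attr = honest_attribution(x).copy()
    attr[target_idx] *= boost
    attr *= np.abs(honest_attribution(x)).sum() / np.abs(attr).sum()
    return attr

print("prediction:", predict(x))                    # same in both cases
print("honest explanation:    ", honest_attribution(x))
print("misleading explanation:", misleading_attribution(x, target_idx=2))
```

The point of the sketch is that the attack surface is the explanation channel itself: because the manipulated attribution is presented alongside an unchanged prediction, users who calibrate their trust on the explanation can be steered toward over- or under-reliance without any detectable change in model behavior.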