📄 Chinese Summary
Counterfactual explanations (CFEs) are an important explainable AI (XAI) technique that helps users understand model decisions by providing minimal input perturbations that change a model's prediction. However, existing CFE methods lack a unified theoretical framework, making it difficult to assess and compare their quality. This paper proposes the first axiomatic foundations for CFEs, introducing formal axioms that characterize the desirable properties of ideal CFEs. These axioms cover key aspects including validity, sparsity, feasibility, stability, diversity, and fairness. The authors prove that the axioms are mutually independent and explore the relationships among them. The paper further proposes an axiom-based evaluation framework for CFEs and uses it to empirically analyze mainstream CFE methods. The results show that no single method satisfies all axioms, and that different methods exhibit trade-offs across the axioms. This work provides a rigorous theoretical basis and guidance for the design, evaluation, and comparison of CFEs, advancing the field of explainable AI and opening new directions for future research on CFE methods.
📄 English Summary
Axiomatic Foundations of Counterfactual Explanations
Counterfactual explanations (CFEs) are a prominent technique in explainable artificial intelligence (XAI), aiming to help users understand model decisions by identifying minimal input perturbations that alter a model's prediction. Despite their growing popularity, current CFE methods lack a unified theoretical framework, making it challenging to assess and compare their quality systematically. This paper addresses this gap by proposing the first axiomatic foundations for CFEs. The authors introduce a set of formal axioms that characterize desirable properties of ideal CFEs, encompassing key aspects such as validity, sparsity, feasibility, stability, diversity, and fairness. They demonstrate the independence of these axioms and explore their interrelationships. Furthermore, the paper proposes an axiom-based evaluation framework for CFEs and empirically analyzes several state-of-the-art CFE methods using this framework. The findings reveal that no single method perfectly satisfies all axioms, and different methods exhibit trade-offs across various axiomatic properties. This work provides a rigorous theoretical grounding and guidance for the design, evaluation, and comparison of CFEs, contributing significantly to the advancement of explainable AI. It also opens new avenues for future research in developing more robust and principled CFE methods that align with these foundational principles.
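To make the core idea concrete, the sketch below finds a counterfactual for a toy linear classifier by nudging the single most influential feature until the predicted label flips, then checks two of the axiom-style properties mentioned above (validity and sparsity). This is a minimal illustration, not the paper's method or any specific CFE algorithm; the classifier weights, step size, and the greedy one-feature search are assumptions made for the example.

```python
import numpy as np

# Toy linear classifier: predicts 1 iff w.x + b > 0 (illustrative only;
# the paper's axioms apply to arbitrary models and CFE methods).
w = np.array([1.0, -2.0, 0.5])
b = -0.25

def predict(x):
    return int(np.dot(w, x) + b > 0)

def counterfactual(x, step=0.05, max_iter=1000):
    """Greedy sketch: move only the most influential feature
    (which encourages sparsity) until the predicted label flips."""
    target = 1 - predict(x)
    cf = x.astype(float).copy()
    i = np.argmax(np.abs(w))  # feature with the largest weight magnitude
    direction = np.sign(w[i]) * (1 if target == 1 else -1)
    for _ in range(max_iter):
        if predict(cf) == target:
            break
        cf[i] += step * direction
    return cf

x = np.array([0.0, 0.0, 0.0])
cf = counterfactual(x)

# Axiom-style checks (hypothetical operationalizations):
valid = predict(cf) != predict(x)            # validity: label actually flips
sparsity = int(np.sum(~np.isclose(cf, x)))   # sparsity: features changed
print(valid, sparsity)                       # → True 1
```

The checks mirror how an axiom-based evaluation framework might score a candidate counterfactual: validity is a hard constraint (the prediction must change), while sparsity is a count to be minimized; other axioms such as feasibility or stability would require additional model and data assumptions.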