CX-ToM: Counterfactual explanations with theory-of-mind for enhancing human trust in image recognition models.

Akula, Arjun R; Wang, Keze; Liu, Changsong; Saba-Sadiya, Sari; Lu, Hongjing; Todorovic, Sinisa; Chai, Joyce; Zhu, Song-Chun

Affiliation

Akula AR; Department of Statistics, UCLA, Los Angeles, CA 90024, USA.
Wang K; Department of Statistics, UCLA, Los Angeles, CA 90024, USA.
Liu C; Department of Statistics, UCLA, Los Angeles, CA 90024, USA.
Saba-Sadiya S; Department of Computer Science, University of Michigan, Ann Arbor, MI 48109, USA.
Lu H; Department of Statistics, UCLA, Los Angeles, CA 90024, USA.
Todorovic S; Department of Computer Science, Oregon State University, Corvallis, OR 97331, USA.
Chai J; Department of Computer Science, University of Michigan, Ann Arbor, MI 48109, USA.
Zhu SC; Beijing Institute for General AI (BIGAI), Tsinghua University, Peking University, Beijing 100871, China.

iScience ; 25(1): 103581, 2022 Jan 21.

Article in English | MEDLINE | ID: mdl-35036861

ABSTRACT

We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to current XAI methods, which generate explanations as a single-shot response, we pose explanation as an iterative communication process, i.e., a dialogue between the machine and the human user. More concretely, our CX-ToM framework generates a sequence of explanations in a dialogue by mediating the differences between the minds of the machine and the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human's intention, the machine's mind as inferred by the human, and the human's mind as inferred by the machine. Moreover, most state-of-the-art XAI frameworks provide attention-based (or heat-map-based) explanations. In our work, we show that these attention-based explanations are not sufficient for increasing human trust in the underlying CNN model. In CX-ToM, we instead use counterfactual explanations called fault-lines, which we define as follows: given an input image I for which a CNN classification model M predicts class c_pred, a fault-line identifies the minimal semantic-level features (e.g., stripes on a zebra), referred to as explainable concepts, that need to be added to or deleted from I to alter the classification category of I by M to another specified class c_alt. Extensive experiments verify our hypotheses, demonstrating that CX-ToM significantly outperforms the state-of-the-art XAI models.
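
The fault-line definition can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a hypothetical linear scorer over named concepts in place of a CNN, and it finds the smallest set of concept additions or deletions by brute force. The names `predict`, `fault_line`, and the example weights are all invented for illustration.

```python
from itertools import combinations


def predict(weights, present):
    """Return the class with the highest score (toy stand-in for the model M).

    weights: {class_name: {concept: weight}} (hypothetical, hand-picked values)
    present: set of concepts currently found in the image I
    """
    scores = {cls: sum(w.get(c, 0.0) for c in present) for cls, w in weights.items()}
    return max(scores, key=scores.get)


def fault_line(weights, present, all_concepts, c_alt):
    """Find a smallest set of concept flips that changes the prediction to c_alt.

    A flip adds a concept if it is absent and deletes it if it is present.
    Returns (added, deleted) as sorted lists, or None if no edit reaches c_alt.
    """
    concepts = sorted(all_concepts)
    for size in range(len(concepts) + 1):  # try smaller edits first
        for flips in combinations(concepts, size):
            edited = set(present) ^ set(flips)
            if predict(weights, edited) == c_alt:
                added = sorted(set(flips) - set(present))
                deleted = sorted(set(flips) & set(present))
                return added, deleted
    return None


# Hypothetical concept weights for two classes.
weights = {
    "horse": {"four_legs": 1.0, "mane": 1.0, "stripes": -2.0},
    "zebra": {"four_legs": 1.0, "mane": 0.5, "stripes": 2.0},
}
all_concepts = {"four_legs", "mane", "stripes"}
image_concepts = {"four_legs", "mane"}

c_pred = predict(weights, image_concepts)
explanation = fault_line(weights, image_concepts, all_concepts, c_alt="zebra")
print(c_pred, explanation)
```

In this toy setting, the explanation for "why horse rather than zebra" is the single concept that must be added. The exhaustive search is exponential in the number of concepts and is shown only to make the notion of minimality concrete.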

Keywords

Artificial intelligence; Computer science; Human-computer interaction

Full text: 1 Collections: 01-international Database: MEDLINE Study type: Prognostic_studies Language: En Journal: IScience Publication year: 2022 Document type: Article Affiliation country: United States
