Testing domain knowledge and risk of bias of a large-scale general artificial intelligence model in mental health.

Heinz, Michael V; Bhattacharya, Sukanya; Trudeau, Brianna; Quist, Rachel; Song, Seo Ho; Lee, Camilla M; Jacobson, Nicholas C

Affiliation

Heinz MV; Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA.
Bhattacharya S; Department of Psychiatry, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA.
Trudeau B; Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA.
Quist R; Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA.
Song SH; Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA.
Lee CM; Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA.
Jacobson NC; Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College, Lebanon, NH, USA.

Digit Health; 9: 20552076231170499, 2023.

Article in En | MEDLINE | ID: mdl-37101589

ABSTRACT

Background:

With a rapidly expanding gap between the need for and availability of mental health care, artificial intelligence (AI) presents a promising, scalable solution to mental health assessment and treatment. Given the novelty and inscrutable nature of such systems, exploratory measures aimed at understanding domain knowledge and potential biases of such systems are necessary for ongoing translational development and future deployment in high-stakes healthcare settings.

Methods:

We investigated the domain knowledge and demographic bias of a generative AI model using contrived clinical vignettes with systematically varied demographic features. We used balanced accuracy (BAC) to quantify the model's performance and generalized linear mixed-effects models to quantify the relationship between demographic factors and model interpretation.
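
The abstract does not state how BAC was computed. As a minimal sketch, assuming the standard binary definition (the mean of sensitivity and specificity) applied per diagnosis, balanced accuracy could be calculated as follows. The function name and the example labels are hypothetical and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Assumption: 1 means the diagnosis is present, 0 means it is absent.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical vignette labels for one diagnosis (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
bac = balanced_accuracy(y_true, y_pred)
```

Unlike plain accuracy, this measure weights both classes equally, so a model that always predicts the majority class scores 0.5 rather than a misleadingly high value.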

Findings:

We found variable model performance across diagnoses; attention deficit hyperactivity disorder, posttraumatic stress disorder, alcohol use disorder, narcissistic personality disorder, binge eating disorder, and generalized anxiety disorder showed high BAC (0.70 ≤ BAC ≤ 0.82); bipolar disorder, bulimia nervosa, barbiturate use disorder, conduct disorder, somatic symptom disorder, benzodiazepine use disorder, LSD use disorder, histrionic personality disorder, and functional neurological symptom disorder showed low BAC (BAC ≤ 0.59).

Interpretation:

Our findings demonstrate initial promise in the domain knowledge of a large AI model, with performance variability perhaps due to the more salient hallmark symptoms, narrower differential diagnosis, and higher prevalence of some disorders. We found limited evidence of model demographic bias, although we observed some gender and racial differences in model outcomes that mirror real-world differential prevalence estimates.

Keywords

Digital health; artificial intelligence; bias in mental health; digital mental health; digital mental health assessment

Full text: 1 | Collections: 01-internacional | Health context: 8_ODS3_consumo_sustancias_psicoactivas | Database: MEDLINE | Study type: Etiology_studies / Prognostic_studies / Risk_factors_studies | Language: En | Journal: Digit Health | Publication year: 2023 | Document type: Article
