Assessing dimensions of thought disorder with large language models: The tradeoff of accuracy and consistency.

Pugh, Samuel L; Chandler, Chelsea; Cohen, Alex S; Diaz-Asper, Catherine; Elvevåg, Brita; Foltz, Peter W

Affiliation

Pugh SL; Department of Computer Science, University of Colorado Boulder, United States; Institute of Cognitive Science, University of Colorado Boulder, United States.
Chandler C; Institute of Cognitive Science, University of Colorado Boulder, United States.
Cohen AS; Department of Psychology, Louisiana State University, United States; Center for Computation and Technology, Louisiana State University, United States.
Diaz-Asper C; Department of Psychology, Marymount University, United States.
Elvevåg B; Department of Clinical Medicine, University of Tromsø-The Arctic University of Norway, Norway; Norwegian Center for Clinical Artificial Intelligence, University Hospital of North Norway, Norway. Electronic address: brita.elvevag@uit.no.
Foltz PW; Institute of Cognitive Science, University of Colorado Boulder, United States.

Psychiatry Res ; 341: 116119, 2024 Nov.

Article in English | MEDLINE | ID: mdl-39226873

ABSTRACT

Natural Language Processing (NLP) methods have shown promise for the assessment of formal thought disorder, a hallmark feature of schizophrenia in which disturbances to the structure, organization, or coherence of thought can manifest as disordered or incoherent speech. We investigated the suitability of modern Large Language Models (LLMs - e.g., GPT-3.5, GPT-4, and Llama 3) to predict expert-generated ratings for three dimensions of thought disorder (coherence, content, and tangentiality) assigned to speech samples collected from both patients with a diagnosis of schizophrenia (n = 26) and healthy control participants (n = 25). In addition to (1) evaluating the accuracy of LLM-generated ratings relative to human experts, we also (2) investigated the degree to which the LLMs produced consistent ratings across multiple trials, and we (3) sought to understand the factors that impacted the consistency of LLM-generated output. We found that machine-generated ratings of the level of thought disorder in speech matched favorably those of expert humans, and we identified a tradeoff between accuracy and consistency in LLM ratings. Unlike traditional NLP methods, LLMs were not always consistent in their predictions, but these inconsistencies could be mitigated with careful parameter selection and ensemble methods. We discuss implications for NLP-based assessment of thought disorder and provide recommendations of best practices for integrating these methods in the field of psychiatry.

Subjects

Natural Language Processing; Schizophrenia; Thinking; Humans; Female; Schizophrenia/diagnosis; Schizophrenia/physiopathology; Male; Adult; Thinking/physiology; Middle Aged; Schizophrenic Psychology

Keywords

Incoherence; LLM; Language; Natural language processing; Psychiatry; Schizophrenia; Speech

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Schizophrenia / Thinking / Natural Language Processing | Limits: Adult / Female / Humans / Male / Middle aged | Language: En | Journal: Psychiatry Res | Publication year: 2024 | Document type: Article | Affiliation country: United States
