Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data.

McElroy, Eoin; Wood, Thomas; Bond, Raymond; Mulvenna, Maurice; Shevlin, Mark; Ploubidis, George B; Hoffmann, Mauricio Scopel; Moltrecht, Bettina

Affiliations

McElroy E; School of Psychology, Ulster University, Coleraine, UK. e.mcelroy@ulster.ac.uk.
Wood T; Fast Data Science, London, UK.
Bond R; School of Computing, Ulster University, Belfast, UK.
Mulvenna M; School of Computing, Ulster University, Belfast, UK.
Shevlin M; School of Psychology, Ulster University, Coleraine, UK.
Ploubidis GB; Centre for Longitudinal Studies, University College London, London, UK.
Hoffmann MS; Department of Neuropsychiatry, Universidade Federal de Santa Maria (UFSM), Avenida Roraima 1000, Building 26, office 1353, Santa Maria, 97105-900, Brazil.
Moltrecht B; Graduate Program in Psychiatry and Behavioral Sciences, Universidade Federal Do Rio Grande Do Sul, Rua Ramiro Barcelos 2350, Porto Alegre, 90035-003, Brazil.

BMC Psychiatry ; 24(1): 530, 2024 Jul 24.

Article in English | MEDLINE | ID: mdl-39049010

ABSTRACT

BACKGROUND:

Pooling data from different sources will advance mental health research by providing larger sample sizes and allowing cross-study comparisons; however, the heterogeneity in how variables are measured across studies poses a challenge to this process.

METHODS:

This study explored the potential of using natural language processing (NLP) to harmonise different mental health questionnaires by matching individual questions based on their semantic content. Using the Sentence-BERT model, we calculated the semantic similarity (cosine index) between 741 pairs of questions from five questionnaires. Drawing on data from a representative UK sample of adults (N = 2,058), we calculated a Spearman rank correlation for each of the same pairs of items, and then estimated the correlation between the cosine values and Spearman coefficients. We also used network analysis to explore the model's ability to uncover structures within the data and metadata.
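The two indices described above can be illustrated with a minimal sketch. It assumes the item embeddings have already been produced by a sentence-embedding model such as Sentence-BERT (the embedding step itself is not shown), and it uses only the Python standard library. All function names, the toy vectors, and the toy responses are hypothetical and are not taken from the study. As a side note, 741 equals the number of unordered pairs among 39 items, which would be consistent with every item being compared with every other item; the abstract does not state the item count, so this is an inference.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: three items with 3-dimensional "embeddings"
# and Likert-style responses from six participants.
embeddings = {
    "item_a": [0.9, 0.1, 0.2],
    "item_b": [0.8, 0.2, 0.1],
    "item_c": [0.1, 0.9, 0.3],
}
responses = {
    "item_a": [0, 1, 2, 3, 1, 2],
    "item_b": [0, 1, 3, 3, 1, 1],
    "item_c": [3, 0, 1, 2, 2, 0],
}

cosines, spearmans = [], []
for first, second in combinations(embeddings, 2):
    cosines.append(cosine_similarity(embeddings[first], embeddings[second]))
    spearmans.append(spearman(responses[first], responses[second]))

# Correlation between the semantic index and the empirical index across pairs.
agreement = pearson(cosines, spearmans)
print(f"pairs: {len(cosines)}, agreement: {agreement:.3f}")
```

With real data, each entry of `embeddings` would hold the model's vector for one question text, and each entry of `responses` would hold the participants' answers to that question.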

RESULTS:

We found a moderate overall correlation (r = .48, p < .001) between the two indices. In a holdout sample, the cosine scores predicted the real-world correlations with a small degree of error (MAE = 0.05, MedAE = 0.04, RMSE = 0.064), suggesting the utility of NLP in identifying similar items for cross-study data pooling. Our NLP model could detect more complex patterns in our data; however, it required manual rules to decide which edges to include in the network.
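The three error metrics reported above (mean absolute error, median absolute error, and root mean squared error) can be computed as in the following sketch. The abstract does not state which predictive model mapped cosine scores to correlations, so the simple least-squares line used here is an assumption made purely for illustration; the function names and the toy numbers are likewise hypothetical.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def medae(y_true, y_pred):
    """Median absolute error, less sensitive to a few large misses."""
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error, which penalises large misses more heavily."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Hypothetical toy data: cosine scores and observed Spearman coefficients
# for item pairs, split into a training set and a holdout set.
train_cosine = [0.10, 0.25, 0.40, 0.55, 0.70]
train_rho = [0.20, 0.28, 0.33, 0.41, 0.52]
holdout_cosine = [0.15, 0.45, 0.65]
holdout_rho = [0.24, 0.35, 0.46]

slope, intercept = fit_line(train_cosine, train_rho)
predicted = [slope * c + intercept for c in holdout_cosine]

print(f"MAE={mae(holdout_rho, predicted):.3f}")
print(f"MedAE={medae(holdout_rho, predicted):.3f}")
print(f"RMSE={rmse(holdout_rho, predicted):.3f}")
```

RMSE is never smaller than MAE on the same errors, so a gap between the two, as in the reported values, indicates that some pairs were predicted noticeably worse than the typical pair.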

CONCLUSIONS:

This research shows that it is possible to quantify the semantic similarity between pairs of questionnaire items from their metadata, and that these similarity indices correlate with how participants would answer the same two items. This highlights the potential of NLP to facilitate cross-study data pooling in mental health research. Nevertheless, researchers are cautioned to verify the psychometric equivalence of matched items.

Subjects

Mental Health; Natural Language Processing; Humans; Surveys and Questionnaires/standards; Adult; Female; Male; Semantics; Middle Aged; United Kingdom

Keywords

Data pooling; Harmonisation; Meta-analysis; Retrospective data harmonisation

Full text: 1; Collections: 01-international; Database: MEDLINE; Main subject: Natural Language Processing / Mental Health; Limits: Adult / Female / Humans / Male / Middle aged; Country/Region as subject: Europe; Language: En; Journal: BMC Psychiatry; Journal subject: PSYCHIATRY; Year of publication: 2024; Document type: Article; Country of affiliation: United Kingdom; Country of publication: United Kingdom