Using natural language processing and machine learning to replace human content coders.

Wang, Yilei; Tian, Jingyuan; Yazar, Yagizhan; Ones, Deniz S; Landers, Richard N


Affiliation

Wang Y; Department of Psychology.
Tian J; Department of Psychology.
Yazar Y; Department of Psychology.
Ones DS; Department of Psychology.
Landers RN; Department of Psychology.

Psychol Methods; 2022 Aug 25.

Article in English | MEDLINE | ID: mdl-36006759

ABSTRACT

Content analysis is a common and flexible technique to quantify and make sense of qualitative data in psychological research. However, the practical implementation of content analysis is extremely labor-intensive and subject to human coder errors. Applying natural language processing (NLP) techniques can help address these limitations. We explain and illustrate these techniques for psychological researchers. For this purpose, we first present a study exploring the creation of psychometrically meaningful predictions of human content codes. Using an existing database of human content codes, we build an NLP algorithm that validly predicts those codes at generally acceptable standards. We then conduct a Monte Carlo simulation to model how four dataset characteristics (i.e., sample size, unlabeled proportion of cases, classification base rate, and human coder reliability) influence content classification performance. The simulation indicated that the influence of sample size and unlabeled proportion on model classification performance tended to be curvilinear. In addition, base rate and human coder reliability had a strong effect on classification performance. Finally, using these results, we offer psychologists practical recommendations on the dataset characteristics necessary to achieve valid prediction of content codes, to guide researchers on the use of NLP models to replace human coders in content analysis research. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
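The abstract describes a Monte Carlo simulation relating dataset characteristics to classification performance. The sketch below is not the authors' simulation design, which trained NLP models and also varied sample size and the unlabeled proportion of cases. It is a minimal, hypothetical illustration of two of the four factors, base rate and human coder reliability, assuming binary codes, a human coder who mislabels each case independently with probability `coder_error`, and a model whose predictions differ from the true code with probability `model_error`. The names `simulate`, `cohens_kappa`, `coder_error`, and `model_error` are illustrative only.

```python
import random
from dataclasses import dataclass


@dataclass
class SimResult:
    accuracy: float  # raw agreement between model predictions and human codes
    kappa: float     # Cohen's kappa between model predictions and human codes


def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two binary (0/1) code vectors of equal length."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    p_a, p_b = sum(a) / n, sum(b) / n                # marginal "1" rates
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)          # chance agreement
    if p_e == 1.0:
        return float("nan")                          # kappa undefined
    return (p_o - p_e) / (1 - p_e)


def simulate(n: int, base_rate: float, coder_error: float,
             model_error: float, seed: int = 0) -> SimResult:
    """One hypothetical Monte Carlo replication.

    The model is scored against noisy human codes, not the true codes,
    mirroring how human-coded data serve as the criterion in practice.
    """
    rng = random.Random(seed)
    truth = [int(rng.random() < base_rate) for _ in range(n)]
    # Each human code flips the true code with probability coder_error.
    human = [t ^ int(rng.random() < coder_error) for t in truth]
    # Each model prediction flips the true code with probability model_error.
    model = [t ^ int(rng.random() < model_error) for t in truth]
    acc = sum(h == m for h, m in zip(human, model)) / n
    return SimResult(accuracy=acc, kappa=cohens_kappa(model, human))


if __name__ == "__main__":
    # Hold model quality fixed and vary base rate and coder reliability.
    for base_rate in (0.5, 0.1):
        for coder_error in (0.0, 0.1, 0.2):
            r = simulate(20_000, base_rate, coder_error, model_error=0.1, seed=42)
            print(f"base_rate={base_rate:.2f} coder_error={coder_error:.2f} "
                  f"accuracy={r.accuracy:.3f} kappa={r.kappa:.3f}")
```

Under these assumptions, measured agreement falls as `coder_error` rises even though `model_error` is held fixed, because the criterion itself is noisy. Kappa also drops at low base rates, since chance agreement is higher when most cases fall in one category.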

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Study type: Guideline / Prognostic_studies / Qualitative_research | Language: English | Journal: Psychol Methods | Journal subject: Psychology | Year: 2022 | Document type: Article