Using Natural Language Processing to Evaluate the Quality of Supervisor Narrative Comments in Competency-Based Medical Education.

Spadafore, Maxwell; Yilmaz, Yusuf; Rally, Veronica; Chan, Teresa M; Russell, Mackenzie; Thoma, Brent; Singh, Sim; Monteiro, Sandra; Pardhan, Alim; Martin, Lynsey; Monrad, Seetha U; Woods, Rob

Acad Med; 99(5): 534-540, 2024 05 01.

Article in English | MEDLINE | ID: mdl-38232079

ABSTRACT

PURPOSE:

Learner development and promotion rely heavily on narrative assessment comments, but narrative assessment quality is rarely evaluated in medical education. Educators have developed tools such as the Quality of Assessment for Learning (QuAL) tool to evaluate the quality of narrative assessment comments; however, scoring the comments generated in medical education assessment programs is time intensive. The authors developed a natural language processing (NLP) model for applying the QuAL score to narrative supervisor comments.

METHOD:

Samples of 2,500 Entrustable Professional Activities assessments were randomly extracted and deidentified from the McMaster (1,250 comments) and Saskatchewan (1,250 comments) emergency medicine (EM) residency training programs during the 2019-2020 academic year. Comments were rated using the QuAL score by 25 EM faculty members and 25 EM residents. The results were used to develop and test an NLP model to predict the overall QuAL score and QuAL subscores.

RESULTS:

All 50 raters completed the rating exercise. Approximately 50% of the comments had perfect agreement on the QuAL score, with the remaining resolved by the study authors. Creating a meaningful suggestion for improvement was the key differentiator between high- and moderate-quality feedback. The overall QuAL model predicted the exact human-rated score or 1 point above or below it in 87% of instances. Overall model performance was excellent, especially regarding the subtasks on suggestions for improvement and the link between resident performance and improvement suggestions, which achieved 85% and 82% balanced accuracies, respectively.
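The two headline metrics above can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy score lists are hypothetical, and the sketch assumes that "balanced accuracy" has its usual meaning (the mean of per-class recall) and that the "within 1 point" figure counts predictions equal to the human score or off by exactly one point.

```python
from collections import defaultdict


def within_one_accuracy(human, predicted):
    """Fraction of predictions equal to the human score or off by exactly 1 point."""
    if len(human) != len(predicted) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(1 for h, p in zip(human, predicted) if abs(h - p) <= 1)
    return hits / len(human)


def balanced_accuracy(human, predicted):
    """Mean of per-class recall, so a rare class counts as much as a common one."""
    if len(human) != len(predicted) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = defaultdict(int)   # number of human labels per class
    correct = defaultdict(int)  # number of correct predictions per class
    for h, p in zip(human, predicted):
        totals[h] += 1
        if h == p:
            correct[h] += 1
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


# Hypothetical data for illustration only (not from the study).
human_overall = [0, 2, 3, 5, 4, 1, 3, 5]
model_overall = [1, 2, 5, 5, 3, 1, 0, 4]
overall_within_one = within_one_accuracy(human_overall, model_overall)

# Hypothetical binary subscore: 1 = comment contains a suggestion, 0 = it does not.
human_suggestion = [1, 1, 1, 0, 0, 0, 0, 0]
model_suggestion = [1, 1, 0, 0, 0, 0, 0, 1]
suggestion_balanced = balanced_accuracy(human_suggestion, model_suggestion)

print(f"within-one accuracy: {overall_within_one:.2f}")
print(f"balanced accuracy:   {suggestion_balanced:.2f}")
```

Balanced accuracy is a natural choice when one class is much rarer than the other, because a model that always predicts the majority class cannot score well on it.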

CONCLUSIONS:

This model could save considerable time for programs that want to rate the quality of supervisor comments, with the potential to automatically score a large volume of comments. This model could be used to provide faculty with real-time feedback or as a tool to quantify and track the quality of assessment comments at faculty, rotation, program, or institution levels.

Subject(s)

Competency-Based Education; Internship and Residency; Natural Language Processing; Humans; Competency-Based Education/methods; Internship and Residency/standards; Clinical Competence/standards; Narration; Educational Measurement/methods; Educational Measurement/standards; Emergency Medicine/education; Faculty, Medical/standards

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Natural Language Processing / Competency-Based Education / Internship and Residency Study type: Prognostic_studies Limit: Humans Language: En Journal: Acad Med Journal subject: EDUCATION Year: 2024 Document type: Article
