Automated Identification of Fall-related Injuries in Unstructured Clinical Notes.
Ge, Wendong; Coelho, Lilian M G; Donahue, Maria A; Rice, Hunter J; Blacker, Deborah; Hsu, John; Newhouse, Joseph P; Hernandez-Diaz, Sonia; Haneuse, Sebastien; Westover, M Brandon; Moura, Lidia M V R.
Affiliation
  • Ge W; Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts.
  • Coelho LMG; Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts.
  • Donahue MA; Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts.
  • Rice HJ; Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts.
  • Blacker D; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts.
  • Hsu J; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts.
  • Newhouse JP; Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts.
  • Hernandez-Diaz S; Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts.
  • Haneuse S; Mongan Institute, Massachusetts General Hospital, Boston, Massachusetts.
  • Westover MB; Department of Medicine, Harvard Medical School, Boston, Massachusetts.
  • Moura LMVR; Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts.
Am J Epidemiol ; 2024 Jul 26.
Article in En | MEDLINE | ID: mdl-39060160
ABSTRACT
Fall-related injuries (FRIs) are a major cause of hospitalization among older patients, but identifying them in unstructured clinical notes poses challenges for large-scale research. In this study, we developed and evaluated natural language processing (NLP) models to address this issue. We used all available clinical notes from Mass General Brigham for 2,100 older adults, identifying 154,949 paragraphs of interest by automatically scanning for FRI-related keywords. Two clinical experts directly labeled 5,000 paragraphs to generate benchmark-standard labels, while 3,689 validated patterns were annotated, indirectly labeling 93,157 paragraphs as validated-standard labels. Five NLP models (vanilla BERT, RoBERTa, Clinical-BERT, Distil-BERT, and a support vector machine [SVM]) were trained on 2,000 benchmark paragraphs and all validated paragraphs. The BERT-based models were trained in three stages: masked language modeling, general Boolean question answering (QA), and QA for FRI. Of the remaining benchmark paragraphs, 500 were used for validation and 2,500 for testing. Models were compared on precision, recall, F1 score, and area under the receiver operating characteristic (AUROC) and precision-recall (AUPR) curves; RoBERTa performed best, with precision of 0.90 [0.88-0.91], recall of [0.90-0.93], F1 score of 0.90 [0.89-0.92], and AUROC and AUPR of 0.96 [0.95-0.97]. These NLP models accurately identify FRIs from unstructured clinical notes and may improve the efficiency of research based on clinical notes.
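The first filtering step described above — scanning notes for FRI-related keywords to select candidate paragraphs — could be sketched as follows. This is a minimal illustration, not the authors' pipeline: the keyword list here is hypothetical (the abstract does not publish the actual lexicon), and `find_candidate_paragraphs` is an invented helper name.

```python
import re

# Hypothetical FRI keyword lexicon for illustration only; the study's
# actual keyword list is not given in the abstract.
FRI_KEYWORDS = ["fall", "fell", "falls", "fracture", "slipped", "tripped"]

# One word-boundary pattern over all keywords, case-insensitive,
# so e.g. "fall" does not match inside "fallacy".
PATTERN = re.compile(r"\b(" + "|".join(FRI_KEYWORDS) + r")\b", re.IGNORECASE)

def find_candidate_paragraphs(note_text):
    """Split a clinical note into paragraphs (blank-line delimited) and
    keep those mentioning any FRI-related keyword, mirroring the
    keyword-scanning step that selects paragraphs of interest."""
    paragraphs = [p.strip() for p in note_text.split("\n\n") if p.strip()]
    return [p for p in paragraphs if PATTERN.search(p)]

note = (
    "Patient presents for routine follow-up.\n\n"
    "Reports she fell at home last week and sustained a hip fracture.\n\n"
    "Medications reviewed, no changes."
)
print(find_candidate_paragraphs(note))
```

In the study, paragraphs selected this way were then passed to annotators and to the trained classifiers; a simple lexical filter like this trades recall for a manageable annotation workload, which is why the downstream NLP models are still needed to separate true FRIs from incidental keyword mentions.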
Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Am J Epidemiol Year: 2024 Document type: Article Country of publication: United States