Closing the loop: automatically identifying abnormal imaging results in scanned documents.

Kumar, Akshat; Goodrum, Heath; Kim, Ashley; Stender, Carly; Roberts, Kirk; Bernstam, Elmer V

Affiliation

Kumar A; School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
Goodrum H; McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
Kim A; School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
Stender C; McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
Roberts K; McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA.
Bernstam EV; School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA.

J Am Med Inform Assoc; 29(5): 831-840, 2022 04 13.

Article in En | MEDLINE | ID: mdl-35146510

ABSTRACT

OBJECTIVES: Scanned documents (SDs), while common in electronic health records and potentially rich in clinically relevant information, rarely fit well with clinician workflow. Here, we identify scanned imaging reports requiring follow-up with high recall and practically useful precision.

MATERIALS AND METHODS: We focused on identifying imaging findings for 3 common causes of malpractice claims: (1) potentially malignant breast lesions (mammography), (2) potentially malignant lung lesions (chest computed tomography [CT]), and (3) long-bone fractures (X-ray). We trained our ClinicalBERT-based pipeline on existing typed/dictated reports classified manually or using ICD-10 codes, evaluated it on a test set of manually classified SDs, and compared it against a string-matching baseline.

RESULTS: A total of 393 mammography, 305 chest CT, and 683 bone X-ray reports were manually reviewed. The string-matching approach had an F1 of 0.667. For mammograms, chest CTs, and bone X-rays, respectively, models trained on manually classified data and optimized for F1 reached F1 scores of 0.900, 0.905, and 0.817, while separate models optimized for recall achieved a recall of 1.000 with precisions of 0.727, 0.518, and 0.275. Models trained on ICD-10-labelled data and optimized for F1 achieved F1 scores of 0.647, 0.830, and 0.643, while those optimized for recall achieved a recall of 1.000 with precisions of 0.407, 0.683, and 0.358.

DISCUSSION: Our pipeline can identify abnormal reports with potentially useful performance and so decrease the manual effort required to screen for abnormal findings that require follow-up.

CONCLUSION: It is possible to automatically identify clinically significant abnormalities in SDs with high recall and practically useful precision in a generalizable and minimally laborious way.
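The abstract reports two operating points per model: one optimized for F1 and one optimized for recall. The record does not describe how those operating points were chosen, so the following is only a minimal sketch of one common way to do it, by sweeping a decision threshold over classifier scores. The labels, scores, and function names are hypothetical toy values, not data or code from the study.

```python
from typing import List, Tuple

def precision_recall_f1(labels: List[int], scores: List[float],
                        threshold: float) -> Tuple[float, float, float]:
    """Flag a report as abnormal (1) when its score is >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def best_f1_threshold(labels: List[int], scores: List[float]) -> float:
    """Candidate thresholds are the observed scores; keep the best-F1 one."""
    return max(sorted(set(scores)),
               key=lambda t: precision_recall_f1(labels, scores, t)[2])

def full_recall_threshold(labels: List[int], scores: List[float]) -> float:
    """Highest threshold that still flags every abnormal report:
    the lowest score assigned to any positive example."""
    return min(s for y, s in zip(labels, scores) if y == 1)

# Hypothetical toy data: 1 = abnormal report, 0 = normal report.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.95, 0.80, 0.60, 0.70, 0.40, 0.20, 0.10, 0.15]

t_f1 = best_f1_threshold(labels, scores)
t_recall = full_recall_threshold(labels, scores)
print("F1-optimized:", t_f1, precision_recall_f1(labels, scores, t_f1))
print("Recall-optimized:", t_recall, precision_recall_f1(labels, scores, t_recall))
```

On this toy data the recall-optimized threshold sits lower than the F1-optimized one, so it catches every abnormal report at the cost of more false positives. The same trade-off is visible in the reported figures: by the harmonic-mean formula, a recall of 1.000 with a precision of 0.727 corresponds to an F1 of about 0.842.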

Subject(s)

Electronic Health Records; Tomography, X-Ray Computed; Natural Language Processing; Research Report

Key words

classification; electronic health records; machine learning; natural language processing; radiology

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Tomography, X-Ray Computed / Electronic Health Records
Type of study: Guideline / Prognostic studies
Language: En
Journal: J Am Med Inform Assoc
Journal subject: Medical Informatics
Year: 2022
Document type: Article
Affiliation country: United States
Country of publication: United Kingdom
