Automating Stroke Data Extraction From Free-Text Radiology Reports Using Natural Language Processing: Instrument Validation Study.

Yu, Amy Y X; Liu, Zhongyu A; Pou-Prom, Chloe; Lopes, Kaitlyn; Kapral, Moira K; Aviv, Richard I; Mamdani, Muhammad

Affiliation

Yu AYX; Department of Medicine (Neurology), University of Toronto - Sunnybrook Health Sciences Centre, Toronto, ON, Canada.
Liu ZA; Department of Medicine (Neurology), University of Toronto - Sunnybrook Health Sciences Centre, Toronto, ON, Canada.
Pou-Prom C; Unity Health Toronto, Toronto, ON, Canada.
Lopes K; Department of Medicine (Neurology), University of Toronto - Sunnybrook Health Sciences Centre, Toronto, ON, Canada.
Kapral MK; Department of Medicine (General Internal Medicine), University of Toronto - University Health Network, Toronto, ON, Canada.
Aviv RI; Department of Radiology, Division of Neuroradiology, University of Ottawa, Ottawa, ON, Canada.
Mamdani M; Department of Medicine, Unity Health Toronto, University of Toronto, Toronto, ON, Canada.

JMIR Med Inform; 9(5): e24381, 2021 May 04.

Article in En | MEDLINE | ID: mdl-33944791

ABSTRACT

BACKGROUND:

Diagnostic neurovascular imaging data are important in stroke research, but obtaining these data typically requires laborious manual chart reviews.

OBJECTIVE:

We aimed to determine the accuracy of a natural language processing (NLP) approach to extract information on the presence and location of vascular occlusions as well as other stroke-related attributes based on free-text reports.

METHODS:

From the full reports of 1320 consecutive computed tomography (CT), CT angiography, and CT perfusion scans of the head and neck performed at a tertiary stroke center between October 2017 and January 2019, we manually extracted data on the presence of proximal large vessel occlusion (primary outcome), as well as distal vessel occlusion, ischemia, hemorrhage, Alberta Stroke Program Early CT Score (ASPECTS), and collateral status (secondary outcomes). Reports were randomly split into training (n=921) and validation (n=399) sets, and attributes were extracted using rule-based NLP. We reported the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy of the NLP approach relative to the manually extracted data.
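The abstract does not describe the rules themselves, so the following is only a minimal sketch of what a rule-based extractor for one attribute might look like. The vessel terms, occlusion terms, negation cues, and the function name `has_large_vessel_occlusion` are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical term lists; the study's actual rules are not given in the abstract.
VESSEL = re.compile(r"\b(internal carotid|ICA|M1|middle cerebral|MCA|basilar)\b", re.I)
OCCLUSION = re.compile(r"\b(occlu\w*|thrombus|cut-?off)\b", re.I)
NEGATION = re.compile(r"\b(no|without|negative for)\b", re.I)


def has_large_vessel_occlusion(report: str) -> bool:
    """Flag a report if any sentence names a proximal vessel together with
    an occlusion term and contains no negation cue."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", report)
    for sentence in sentences:
        if (
            VESSEL.search(sentence)
            and OCCLUSION.search(sentence)
            and not NEGATION.search(sentence)
        ):
            return True
    return False


if __name__ == "__main__":
    print(has_large_vessel_occlusion("There is an abrupt occlusion of the left M1 segment."))
    print(has_large_vessel_occlusion("No occlusion of the basilar artery."))
```

Sentence-level negation of this kind is deliberately crude: it would miss a true occlusion mentioned in the same sentence as an unrelated negated finding, which is one reason such rules are tuned on a training set and then checked on held-out reports.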

RESULTS:

The overall prevalence of large vessel occlusion was 12.2%. In the training sample, the NLP approach identified this attribute with an overall accuracy of 97.3% (95.5% sensitivity, 98.1% specificity, 84.1% PPV, and 99.4% NPV). In the validation set, the overall accuracy was 95.2% (90.0% sensitivity, 97.4% specificity, 76.3% PPV, and 98.5% NPV). The accuracy of identifying distal or basilar occlusion as well as hemorrhage was also high, but there were limitations in identifying cerebral ischemia, ASPECTS, and collateral status.
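For reference, the five measures reported above all derive from a 2x2 table comparing the NLP output with the manual reference standard. The sketch below shows the standard definitions; the counts in the usage example are invented for illustration and are not the study's data, and the function name `diagnostic_metrics` is an assumption.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard accuracy measures from 2x2 confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives of the NLP output versus manual extraction.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # detected among true occlusions
        "specificity": tn / (tn + fp),   # cleared among true non-occlusions
        "ppv": tp / (tp + fp),           # correct among NLP-positive reports
        "npv": tn / (tn + fn),           # correct among NLP-negative reports
        "accuracy": (tp + tn) / total,   # all correct calls
    }


if __name__ == "__main__":
    # Illustrative counts only (not from the study).
    for name, value in diagnostic_metrics(tp=45, fp=10, fn=5, tn=340).items():
        print(f"{name}: {value:.3f}")
```

Because PPV divides by all NLP-positive reports, a small number of false positives lowers it sharply when the attribute is uncommon, which is why PPV can sit well below sensitivity and specificity at a prevalence near 12%.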

CONCLUSIONS:

NLP may improve the efficiency of large-scale imaging data collection for stroke surveillance and research.

Key words

data extraction; diagnostic imaging; imaging; natural language processing; neurovascular; stroke; stroke surveillance; surveillance

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Type of study: Guideline / Prognostic_studies / Risk_factors_studies | Language: En | Journal: JMIR Med Inform | Year: 2021 | Document type: Article | Affiliation country: Canada