ABSTRACT
Background: Patterns of metastasis in cancer are increasingly relevant to prognostication and treatment planning but have historically been documented by means of autopsy series. Purpose: To show the feasibility of using natural language processing (NLP) to gather accurate data from radiology reports for assessing spatial and temporal patterns of metastatic spread in a large patient cohort. Materials and Methods: In this retrospective longitudinal study, consecutive patients who underwent CT from July 2009 to April 2019 and whose CT reports followed a departmental structured template were included. Three radiologists manually curated a sample of 2219 reports for the presence or absence of metastases across 13 organs; these manually curated reports were used to develop three NLP models with an 80%-20% split for training and test sets. A separate random sample of 448 manually curated reports was used for validation. Model performance was measured by accuracy, precision, and recall for each organ. The best-performing NLP model was used to generate a final database of metastatic disease across all patients. For each cancer type, descriptive statistics of metastatic disease frequency were reported at the report and patient levels. Results: In 91 665 patients (mean age ± standard deviation, 61 years ± 15; 46 939 women), 387 359 reports were labeled. The best-performing NLP model achieved accuracies from 90% to 99% across all organs. Metastases were most frequently reported in abdominopelvic (23.6% of all reports) and thoracic (17.6%) nodes, followed by lungs (14.7%), liver (13.7%), and bones (9.9%). Metastatic disease tropism was distinct among common cancers: the most common first site of metastasis was bone in prostate and breast cancers and liver in pancreatic and colorectal cancers. Conclusion: Natural language processing may be applied to the CT reports of patients with cancer to generate a large database of metastatic phenotypes. Such a database could be combined with genomic studies and used to explore prognostic imaging phenotypes with relevance to treatment planning. © RSNA, 2021. Online supplemental material is available for this article.
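As an illustration of the report-level labeling step, the following is a minimal sketch of training one binary classifier per organ with an 80%-20% train/test split and reporting accuracy, precision, and recall. The file name curated.csv, the column names, and the TF-IDF plus logistic regression pipeline are assumptions for illustration, not the models described above.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical curated sample: one row per report, a free-text column, and
# one binary label per organ site (a subset of the 13 labeled organs).
df = pd.read_csv("curated.csv")
organs = ["liver", "lung", "bone"]

X_train, X_test, y_train, y_test = train_test_split(
    df["report_text"], df[organs], test_size=0.2, random_state=0)

vec = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

for organ in organs:  # one independent binary classifier per organ
    clf = LogisticRegression(max_iter=1000).fit(Xtr, y_train[organ])
    pred = clf.predict(Xte)
    print(organ,
          f"accuracy={accuracy_score(y_test[organ], pred):.2f}",
          f"precision={precision_score(y_test[organ], pred):.2f}",
          f"recall={recall_score(y_test[organ], pred):.2f}")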
Subject(s)
Data Management/methods; Databases, Factual/statistics & numerical data; Electronic Health Records; Natural Language Processing; Neoplasms/epidemiology; Tomography, X-Ray Computed/methods; Feasibility Studies; Female; Humans; Longitudinal Studies; Male; Middle Aged; Neoplasm Metastasis; Reproducibility of Results; Retrospective Studies
ABSTRACT
PURPOSE: To assess the accuracy of a natural language processing (NLP) model in extracting splenomegaly described in structured computed tomography (CT) radiology reports of patients with cancer. METHODS: In this retrospective study, 387,359 consecutive structured radiology reports for CT scans of the chest, abdomen, and pelvis obtained between July 2009 and April 2019 from 91,665 patients spanning 30 types of cancer were included. A random sample of 2,022 reports from patients with colorectal cancer, hepatobiliary (HB) cancer, leukemia, Hodgkin lymphoma (HL), and non-Hodgkin lymphoma (non-HL) was manually annotated as positive or negative for splenomegaly. NLP model training and testing were performed on 1,617 and 405 reports, respectively, and a new validation set of 400 reports from all cancer subtypes was used to test NLP model accuracy, precision, and recall. Overall survival was compared between the patient groups (with and without splenomegaly) using Kaplan-Meier curves. RESULTS: The final cohort included 387,359 reports from 91,665 patients (mean age 60.8 years; 51.2% women). In the testing set, the model achieved accuracy of 92.1%, precision of 92.2%, and recall of 92.1% for splenomegaly. In the validation set, accuracy, precision, and recall were 93.8%, 92.9%, and 86.7%, respectively. In the entire cohort, splenomegaly was most frequent in patients with leukemia (32.5%), followed by HB cancer (17.4%), non-HL (9.1%), colorectal cancer (8.5%), and HL (5.6%). A splenomegaly label was associated with an increased risk of mortality in the entire cohort (hazard ratio 2.10; 95% CI, 1.98 to 2.22; P < .001). CONCLUSION: Automated splenomegaly labeling by NLP of radiology reports demonstrates good accuracy, precision, and recall. Splenomegaly is most frequently reported in patients with leukemia, followed by patients with HB cancer.
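A minimal sketch of the survival comparison between the NLP-labeled groups, assuming a hypothetical patient-level table patients.csv with follow-up time in years, a death indicator, and the binary splenomegaly label; the lifelines package is used here for illustration and is not necessarily what was used in the study.

import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

df = pd.read_csv("patients.csv")  # columns: followup_years, death, splenomegaly

# Kaplan-Meier curves for patients with and without a splenomegaly label.
kmf = KaplanMeierFitter()
ax = None
for label, grp in df.groupby("splenomegaly"):
    kmf.fit(grp["followup_years"], grp["death"], label=f"splenomegaly={label}")
    ax = kmf.plot_survival_function(ax=ax)

# Univariable Cox model; exp(coef) is the hazard ratio associated with splenomegaly.
cph = CoxPHFitter()
cph.fit(df[["followup_years", "death", "splenomegaly"]],
        duration_col="followup_years", event_col="death")
cph.print_summary()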
Subject(s)
Colorectal Neoplasms; Leukemia; Radiology; Female; Humans; Male; Middle Aged; Natural Language Processing; Retrospective Studies; Splenomegaly/diagnostic imaging; Splenomegaly/etiology
ABSTRACT
PURPOSE: Natural language processing (NLP) applied to radiology reports can help identify clinically relevant M1 subcategories of patients with colorectal cancer (CRC). The primary purpose was to compare the overall survival (OS) of patients with CRC according to American Joint Committee on Cancer (AJCC) TNM staging and to explore an alternative classification. The secondary objective was to estimate the frequency of metastasis for each organ. METHODS: This was a retrospective study of patients with CRC who underwent computed tomography (CT) of the chest, abdomen, and pelvis between July 1, 2009, and March 26, 2019, at a tertiary cancer center, with reports previously labeled for the presence or absence of metastasis by an NLP prediction model. Patients were classified as M0, M1a, M1b, or M1c (AJCC) or by an alternative classification based on the number of organs with metastases: M1, single organ; M2, two organs; M3, three or more organs. Cox regression models were used to estimate hazard ratios, and Kaplan-Meier curves were used to visualize survival for the two M1 subclassifications. RESULTS: Nine thousand nine hundred twenty-eight patients with a total of 48,408 CT reports of the chest, abdomen, and pelvis were included. On the basis of NLP prediction, the median OS of M1a, M1b, and M1c was 4.47, 1.72, and 1.52 years, respectively. The median OS of M1, M2, and M3 was 4.24, 2.05, and 1.04 years, respectively. Metastases occurred most often in the liver (35.8%), abdominopelvic lymph nodes (32.9%), lungs (29.3%), peritoneum (22.0%), thoracic nodes (19.9%), bones (9.2%), and pelvic organs (7.5%). Spleen and adrenal metastases occurred in < 5%. CONCLUSION: NLP applied to a large radiology report database can identify clinically relevant metastatic phenotypes and be used to investigate new M1 substaging for CRC. Patients with metastatic disease in three or more organs have the worst prognosis, with a median OS of approximately 1 year.
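The alternative organ-count substaging can be sketched as follows, assuming a hypothetical patient-level table crc_patients.csv with one binary organ column per NLP label plus follow-up time and a death indicator; the column names and the lifelines package are illustrative assumptions.

import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("crc_patients.csv")
organ_cols = ["liver", "abdominopelvic_nodes", "lung", "peritoneum",
              "thoracic_nodes", "bone", "pelvic_organs", "spleen", "adrenal"]

# Count the number of organs with metastases and map it to M0/M1/M2/M3.
n_organs = df[organ_cols].sum(axis=1).clip(upper=3)
df["m_class"] = n_organs.map({0: "M0", 1: "M1", 2: "M2", 3: "M3"})

# Cox model with M0 as the reference category.
design = pd.get_dummies(df["m_class"], drop_first=True).astype(float)
design[["followup_years", "death"]] = df[["followup_years", "death"]]
CoxPHFitter().fit(design, duration_col="followup_years",
                  event_col="death").print_summary()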
Subject(s)
Colorectal Neoplasms; Natural Language Processing; Colorectal Neoplasms/diagnostic imaging; Colorectal Neoplasms/pathology; Humans; Phenotype; Prognosis; Retrospective Studies; Tomography; Tomography, X-Ray Computed
ABSTRACT
The development of digital cancer twins relies on the capture of high-resolution representations of individual cancer patients throughout the course of their treatment. Our research aims to improve the detection of metastatic disease over time from structured radiology reports by exposing prediction models to historical information. We demonstrate that natural language processing (NLP) can generate better weak labels for semi-supervised classification of computed tomography (CT) reports when it is exposed to consecutive reports through a patient's treatment history. In total, 714,454 structured radiology reports from Memorial Sloan Kettering Cancer Center adhering to a standardized departmental structured template were used for model development, with a subset of the reports held out for validation. To develop the models, a subset of the reports was curated for ground truth: 7,732 reports in the lung metastases dataset from 867 individual patients; 2,777 reports in the liver metastases dataset from 315 patients; and 4,107 reports in the adrenal metastases dataset from 404 patients. We use NLP to extract and encode important features from the structured text reports, which are then used to develop, train, and validate models. Three models were developed to classify the type of metastatic disease and validated against the ground truth labels: a simple convolutional neural network (CNN), a CNN augmented with an attention layer, and a recurrent neural network (RNN). The models use features from consecutive structured text radiology reports of a patient to predict the presence of metastatic disease in the reports. A previously developed single-report model, which analyzes one report rather than multiple past reports, is included for comparison, and the results from all four models are compared on accuracy, precision, recall, and F1 score. The best model is used to label all 714,454 reports and generate metastasis maps. Our results suggest that NLP models can extract cancer progression patterns from multiple consecutive reports and predict the presence of metastatic disease in multiple organs with higher performance than single-report-based prediction. This demonstrates a promising automated approach to labeling large numbers of radiology reports in a time- and cost-effective manner without involving human experts, and it enables tracking of cancer progression over time.
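The multi-report formulation can be illustrated with a small sequence model over per-report feature vectors. The feature dimension, the GRU architecture, and the three organ outputs below are assumptions for the sketch and do not reproduce the CNN, attention-CNN, or RNN models described above.

import torch
import torch.nn as nn

class ReportHistoryRNN(nn.Module):
    """Predicts per-organ metastasis labels from a patient's consecutive report features."""
    def __init__(self, feat_dim=256, hidden=128, n_sites=3):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sites)  # one logit per site (e.g., lung, liver, adrenal)

    def forward(self, x):             # x: (batch, n_reports, feat_dim)
        _, h = self.rnn(x)            # h: (1, batch, hidden), state after the latest report
        return self.head(h.squeeze(0))

model = ReportHistoryRNN()
reports = torch.randn(4, 10, 256)     # 4 patients, 10 consecutive encoded reports each
logits = model(reports)               # (4, 3) logits for the most recent report
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (4, 3)).float())
loss.backward()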