Using machine learning to parse breast pathology reports.
Yala, Adam; Barzilay, Regina; Salama, Laura; Griffin, Molly; Sollender, Grace; Bardia, Aditya; Lehman, Constance; Buckley, Julliette M; Coopey, Suzanne B; Polubriaginof, Fernanda; Garber, Judy E; Smith, Barbara L; Gadd, Michele A; Specht, Michelle C; Gudewicz, Thomas M; Guidi, Anthony J; Taghian, Alphonse; Hughes, Kevin S.
Affiliation
  • Yala A; Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, USA.
  • Barzilay R; Department of Electrical Engineering and Computer Science, CSAIL, MIT, Cambridge, USA.
  • Salama L; Department of Radiation Oncology, MGH, Boston, USA.
  • Griffin M; Division of Surgical Oncology, MGH, Boston, USA. megriff@post.harvard.edu.
  • Sollender G; Geisel School of Medicine at Dartmouth, Hanover, USA.
  • Bardia A; Department of Medical Oncology, MGH, Boston, USA.
  • Lehman C; Department of Radiology, MGH, Boston, USA.
  • Buckley JM; Division of Surgical Oncology, MGH, Boston, USA.
  • Coopey SB; Division of Surgical Oncology, MGH, Boston, USA.
  • Polubriaginof F; Department of Biomedical Informatics, Columbia University, New York, USA.
  • Garber JE; Department of Medical Oncology, DFCI, Boston, USA.
  • Smith BL; Division of Surgical Oncology, MGH, Boston, USA.
  • Gadd MA; Division of Surgical Oncology, MGH, Boston, USA.
  • Specht MC; Division of Surgical Oncology, MGH, Boston, USA.
  • Gudewicz TM; Department of Pathology, MGH, Boston, USA.
  • Guidi AJ; Department of Pathology, NWH, Newton, USA.
  • Taghian A; Department of Radiation Oncology, MGH, Boston, USA.
  • Hughes KS; Division of Surgical Oncology, MGH, Boston, USA.
Breast Cancer Res Treat; 161(2): 203-211, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27826755
ABSTRACT

PURPOSE:

Extracting information from electronic medical records is a time-consuming and expensive process when done manually. Rule-based and machine learning techniques are two approaches to solving this problem. In this study, we trained a machine learning model on pathology reports to extract pertinent tumor characteristics, which enabled us to create a large database of attribute-searchable pathology reports. This database can be used to identify cohorts of patients with characteristics of interest.

METHODS:

We collected a total of 91,505 breast pathology reports from three Partners hospitals (Massachusetts General Hospital, Brigham and Women's Hospital, and Newton-Wellesley Hospital) covering the period from 1978 to 2016. We trained our system with annotations from two datasets, consisting of 6295 and 10,841 manually annotated reports, respectively. The system extracts 20 separate categories of information, including atypia types and various tumor characteristics such as receptor status. We also report a learning curve analysis to show how much annotated data our model needs to perform reasonably well.
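The abstract does not specify which model was trained or exactly how the learning curve was computed. As a minimal sketch of the general idea only, the code below fits a from-scratch multinomial naive Bayes classifier (a stand-in chosen for brevity, not the authors' method) for one hypothetical category, estrogen receptor (ER) status, on nested training subsets of increasing size, and scores every fit on the same held-out set. The function names, the labels, and the whitespace tokenization are illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization (assumption; real reports need more care)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Fit a multinomial naive Bayes model on (text, label) pairs.

    Stand-in classifier for illustration; not the model used in the paper.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def learning_curve(train, test, sizes, seed=0):
    """Train on nested subsets of `train` (each size >= 1); score on fixed `test`."""
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    curve = []
    for n in sizes:
        model = train_naive_bayes(shuffled[:n])
        correct = sum(predict(model, text) == label for text, label in test)
        curve.append((n, correct / len(test)))
    return curve
```

Plotting accuracy against training-set size shows where additional annotation stops paying off; in practice one would repeat this for each extracted category and average over several random seeds.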

RESULTS:

Model accuracy was evaluated on 500 reports that did not overlap with the training set. The model achieved an accuracy of 90% in correctly parsing all carcinoma and atypia categories for a given patient, and an average accuracy of 97% for individual categories. Using this classifier, we created a database of 91,505 parsed pathology reports.
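The two accuracy figures above measure different things. Under the straightforward reading assumed here, a report counts toward the stricter figure only if every category is parsed correctly, while the per-category figure scores each category independently and then averages. The sketch below computes both from aligned gold and predicted labels; the category names and label values are hypothetical, and the authors' exact scoring procedure may differ.

```python
def evaluate(gold, predicted):
    """Score aligned per-report label dicts (category name -> label).

    Returns (report_accuracy, mean_category_accuracy, per_category), where
    report_accuracy is the fraction of reports with every category correct.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and aligned")
    categories = sorted(gold[0])  # assumes every report has the same categories
    pairs = list(zip(gold, predicted))

    # Strict, report-level score: one wrong category makes the whole report wrong.
    all_correct = sum(all(g[c] == p.get(c) for c in categories) for g, p in pairs)

    # Lenient score: accuracy of each category on its own, then averaged.
    per_category = {
        c: sum(g[c] == p.get(c) for g, p in pairs) / len(pairs) for c in categories
    }
    mean_category = sum(per_category.values()) / len(per_category)
    return all_correct / len(pairs), mean_category, per_category
```

Because a report that is fully correct is also correct in every single category, the report-level score can never exceed the lowest per-category accuracy, and hence never exceeds their mean.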

CONCLUSIONS:

Our learning curve analysis shows that the model can achieve reasonable results even when trained on relatively few annotations. We developed a user-friendly interface to the database that allows physicians to easily identify patients with target characteristics and export the matching cohort. This model has the potential to reduce the effort required to analyze large amounts of data from medical records and to minimize the cost and time required to glean scientific insight from these data.
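The abstract does not describe how the database or its interface is implemented. As a hypothetical sketch, parsed reports could be stored one row per report with one column per extracted category, so that selecting a cohort becomes an attribute query followed by an export. The schema, column names, and the helpers `build_db`, `find_cohort`, and `export_csv` are illustrative assumptions; only Python's standard `sqlite3`, `csv`, and `io` modules are used.

```python
import csv
import io
import sqlite3

# Hypothetical extracted categories exposed as searchable columns.
ALLOWED_COLUMNS = {"carcinoma", "atypia", "er"}


def build_db(rows):
    """Load (report_id, patient_id, carcinoma, atypia, er) rows into in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports (report_id TEXT PRIMARY KEY, patient_id TEXT,"
        " carcinoma TEXT, atypia TEXT, er TEXT)"
    )
    conn.executemany("INSERT INTO reports VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def find_cohort(conn, **criteria):
    """Return sorted distinct patient_ids having a report that matches every criterion."""
    unknown = set(criteria) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    # Column names come from the whitelist; values are bound as ? parameters.
    where = " AND ".join(f"{col} = ?" for col in criteria) or "1"
    sql = f"SELECT DISTINCT patient_id FROM reports WHERE {where} ORDER BY patient_id"
    return [row[0] for row in conn.execute(sql, tuple(criteria.values()))]


def export_csv(patient_ids):
    """Serialize a cohort as CSV text with a single patient_id column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["patient_id"])
    writer.writerows([pid] for pid in patient_ids)
    return buf.getvalue()
```

Whitelisting column names before interpolating them into the SQL text, while passing values as `?` parameters, keeps arbitrary user input out of the query itself.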

Database: MEDLINE | Main subject: Breast Neoplasms / Electronic Health Records / Data Mining / Machine Learning | Type of study: Prognostic studies | Limits: Female / Humans | Language: English | Journal: Breast Cancer Res Treat | Year: 2017 | Document type: Article | Affiliation country: United States