Text mining applied to electronic cardiovascular procedure reports to identify patients with trileaflet aortic stenosis and coronary artery disease.
Small, Aeron M; Kiss, Daniel H; Zlatsin, Yevgeny; Birtwell, David L; Williams, Heather; Guerraty, Marie A; Han, Yuchi; Anwaruddin, Saif; Holmes, John H; Chirinos, Julio A; Wilensky, Robert L; Giri, Jay; Rader, Daniel J.
Affiliation
  • Small AM; Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA.
  • Kiss DH; Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA.
  • Zlatsin Y; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.
  • Birtwell DL; Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
  • Williams H; Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
  • Guerraty MA; Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA.
  • Han Y; Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA.
  • Anwaruddin S; Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA.
  • Holmes JH; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.
  • Chirinos JA; Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA.
  • Wilensky RL; Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA.
  • Giri J; Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA.
  • Rader DJ; Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA; Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Genetics, University of Pennsylvan
J Biomed Inform; 72: 77-84, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28624641
ABSTRACT

BACKGROUND:

Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest.

METHODS:

We adapted Endeca, an Oracle product that uses text mining to identify terms of interest in a NoSQL-like database, to search cardiovascular procedure reports, and named the resulting tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). The accuracy of patient identification by text mining in PennSeek was compared with that of ICD-9 billing codes.
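The abstract does not list the clinical criteria applied in PennSeek, and Endeca's query interface is not shown here. As a rough illustration of criteria-based text mining over free-text reports, the minimal Python sketch below flags echocardiography reports that mention a reduced aortic valve area, excludes reports that mention a bicuspid valve, and collapses report-level hits to patient-level IDs (the study counts reports and individuals separately). The regular expressions, the 1.5 cm² cutoff, and the names `Report`, `suggests_tas`, and `patients_matching` are hypothetical assumptions for illustration, not the study's criteria or PennSeek's implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Report:
    patient_id: str
    text: str


# Hypothetical criteria for illustration only; NOT the study's clinical criteria.
# Captures a number such as "0.8" that follows "aortic valve area" and precedes "cm".
AVA_PATTERN = re.compile(r"aortic valve area[^0-9]{0,20}(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE)
BICUSPID_PATTERN = re.compile(r"\bbicuspid\b", re.IGNORECASE)
AVA_THRESHOLD_CM2 = 1.5  # hypothetical cutoff


def suggests_tas(text: str) -> bool:
    """True if the report gives a reduced aortic valve area and never mentions 'bicuspid'.

    Plain keyword matching: negations such as "not bicuspid" are NOT handled.
    """
    if BICUSPID_PATTERN.search(text):
        return False
    match = AVA_PATTERN.search(text)
    return match is not None and float(match.group(1)) < AVA_THRESHOLD_CM2


def patients_matching(reports: list[Report]) -> set[str]:
    """Collapse report-level hits into a set of unique patient IDs."""
    return {r.patient_id for r in reports if suggests_tas(r.text)}


if __name__ == "__main__":
    reports = [
        Report("p1", "Trileaflet aortic valve. Aortic valve area: 0.8 cm2."),
        Report("p1", "Aortic valve area 0.7 cm2."),
        Report("p2", "Bicuspid aortic valve. Aortic valve area: 0.9 cm2."),
        Report("p3", "Normal aortic valve. Aortic valve area: 3.1 cm2."),
    ]
    print(patients_matching(reports))  # {'p1'}
```

Because several reports can belong to one patient, deduplicating by patient ID is what turns report counts into individual counts. A production tool would also need negation handling and unit normalization, which this sketch omits.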

RESULTS:

Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. Both approaches identified 4346 patients with TAS and 6024 patients with CAD. For each disease, a random sample of 200-250 patients identified only by text mining was compared with a random sample of 200-250 patients identified only by billing codes. Text mining was superior, with a positive predictive value (PPV) of 0.95 versus 0.53 for ICD-9 codes for TAS, and 0.97 versus 0.86 for CAD.
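The overlap counts above determine how many patients each method found on its own, and those method-unique groups are the ones that were sampled for PPV. The minimal Python sketch below derives the unique and combined counts from the reported totals and defines PPV as confirmed cases divided by cases reviewed. The function names are hypothetical, and because the abstract gives the review sample sizes only as 200-250, the `ppv(190, 200)` call uses hypothetical counts chosen to illustrate a PPV of 0.95, not the study's data.

```python
def overlap_summary(text_mining: int, icd9: int, both: int) -> dict[str, int]:
    """Derive method-unique and combined cohort sizes from two overlapping counts."""
    return {
        "text_mining_only": text_mining - both,
        "icd9_only": icd9 - both,
        "either": text_mining + icd9 - both,  # inclusion-exclusion
    }


def ppv(true_positives: int, reviewed: int) -> float:
    """Positive predictive value: confirmed cases / all flagged cases reviewed."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return true_positives / reviewed


if __name__ == "__main__":
    # Counts as reported in the abstract.
    print(overlap_summary(7115, 8272, 4346))  # TAS: {'text_mining_only': 2769, 'icd9_only': 3926, 'either': 11041}
    print(overlap_summary(9247, 6913, 6024))  # CAD: {'text_mining_only': 3223, 'icd9_only': 889, 'either': 10136}
    # Hypothetical chart-review counts, for illustration only.
    print(ppv(190, 200))  # 0.95
```

Note that the PPVs in the abstract describe only the method-unique groups (for TAS, roughly 2769 text-mining-only versus 3926 ICD-9-only patients), not the 4346 patients found by both approaches.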

CONCLUSION:

These results highlight the superiority of text mining algorithms applied to electronic cardiovascular procedure reports in the identification of phenotypes of interest for cardiovascular research.

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Aortic Valve Stenosis / Phenotype / Coronary Artery Disease / Data Mining | Study type: Prognostic studies | Limits: Humans | Language: English | Journal: J Biomed Inform | Journal subject: Medical Informatics | Year: 2017 | Document type: Article | Country of affiliation: United States