Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network.

Kashyap, Mehr; Seneviratne, Martin; Banda, Juan M; Falconer, Thomas; Ryu, Borim; Yoo, Sooyoung; Hripcsak, George; Shah, Nigam H

Affiliation

Kashyap M; Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA.
Seneviratne M; Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA.
Banda JM; Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA.
Falconer T; Department of Computer Science, Georgia State University, Atlanta, Georgia, USA.
Ryu B; Department of Biomedical Informatics, Columbia University, New York, New York, USA.
Yoo S; Office of eHealth and Business, Seoul National University Bundang Hospital, Gyeonggi-do, South Korea.
Hripcsak G; Office of eHealth and Business, Seoul National University Bundang Hospital, Gyeonggi-do, South Korea.
Shah NH; Department of Biomedical Informatics, Columbia University, New York, New York, USA.

J Am Med Inform Assoc; 27(6): 877-883, 2020 Jun 01.

Article in English | MEDLINE | ID: mdl-32374408

ABSTRACT

OBJECTIVE: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites within the Observational Health Data Sciences and Informatics (OHDSI) network.

MATERIALS AND METHODS: We constructed classifiers using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) R-package, an open-source framework for learning phenotype classifiers using datasets in the Observational Medical Outcomes Partnership Common Data Model. We labeled training data based on the presence of multiple mentions of disease-specific codes. Performance was evaluated on cohorts derived using rule-based definitions and real-world disease prevalence. Classifiers were developed and evaluated across 3 medical centers, including 1 international site.

RESULTS: Compared with the multiple mentions labeling heuristic, classifiers showed a mean recall boost of 0.43 with a mean precision loss of 0.17. Performance decreased slightly when classifiers were shared across medical centers, with mean recall and precision decreasing by 0.08 and 0.01, respectively, at a site within the USA, and by 0.18 and 0.10, respectively, at an international site.

DISCUSSION AND CONCLUSION: We demonstrate a high-throughput pipeline for constructing and sharing phenotype classifiers across sites within the OHDSI network using APHRODITE. Classifiers exhibit good portability between sites within the USA but limited portability internationally, indicating that classifier generalizability may have geographic limitations. Consequently, sharing the classifier-building recipe, rather than the pretrained classifiers, may be more useful for facilitating collaborative observational research.
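The multiple mentions labeling heuristic and the recall and precision comparison described in the abstract can be illustrated with a minimal Python sketch. This is not APHRODITE's API (APHRODITE is an R package) and not the authors' implementation; the function names, the threshold of 2 mentions, the example diagnosis codes, and the toy patient data are all assumptions made for illustration.

```python
from collections import Counter


def label_by_multiple_mentions(code_events, phenotype_codes, min_mentions=2):
    """Return patient ids with at least `min_mentions` phenotype-code events.

    `min_mentions=2` is an assumed threshold, not a value from the abstract.
    """
    counts = Counter(pid for pid, code in code_events if code in phenotype_codes)
    return {pid for pid, n in counts.items() if n >= min_mentions}


def precision_recall(predicted, actual):
    """Precision and recall of a predicted positive set against a reference set."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical (patient id, diagnosis code) events; codes are illustrative only.
events = [
    ("p1", "E11.9"), ("p1", "E11.9"),
    ("p2", "E11.9"),                    # single mention: the heuristic misses p2
    ("p3", "I10"),                      # unrelated code
    ("p4", "E11.9"), ("p4", "E11.65"),
]

noisy_positive = label_by_multiple_mentions(events, {"E11.9", "E11.65"})
reference_positive = {"p1", "p2", "p4"}  # hypothetical rule-based reference cohort
precision, recall = precision_recall(noisy_positive, reference_positive)
```

In this toy data the heuristic labels only patients with repeated codes, so it is precise but misses the single-mention patient. That is the kind of recall gap a classifier trained on such imperfect labels is meant to narrow, at some cost in precision.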

Subject(s)

Electronic Health Records/classification; Medical Informatics; Supervised Machine Learning; Classification/methods; Data Science; Humans; Observational Studies as Topic

Keywords

cohort identification; electronic health records; electronic phenotyping; machine learning; phenotype

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Medical Informatics / Electronic Health Records / Supervised Machine Learning Type of study: Observational_studies / Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Journal: J Am Med Inform Assoc Journal subject: MEDICAL INFORMATICS Year: 2020 Document type: Article Affiliation country: United States