Cohort design and natural language processing to reduce bias in electronic health records research.
Khurshid, Shaan; Reeder, Christopher; Harrington, Lia X; Singh, Pulkit; Sarma, Gopal; Friedman, Samuel F; Di Achille, Paolo; Diamant, Nathaniel; Cunningham, Jonathan W; Turner, Ashby C; Lau, Emily S; Haimovich, Julian S; Al-Alusi, Mostafa A; Wang, Xin; Klarqvist, Marcus D R; Ashburner, Jeffrey M; Diedrich, Christian; Ghadessi, Mercedeh; Mielke, Johanna; Eilken, Hanna M; McElhinney, Alice; Derix, Andrea; Atlas, Steven J; Ellinor, Patrick T; Philippakis, Anthony A; Anderson, Christopher D; Ho, Jennifer E; Batra, Puneet; Lubitz, Steven A.
Affiliation
  • Khurshid S; Division of Cardiology, Massachusetts General Hospital, Boston, MA, USA.
  • Reeder C; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.
  • Harrington LX; Cardiovascular Disease Initiative, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Singh P; Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Sarma G; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.
  • Friedman SF; Cardiovascular Disease Initiative, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Di Achille P; Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Diamant N; Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Cunningham JW; Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Turner AC; Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Lau ES; Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Haimovich JS; Cardiovascular Disease Initiative, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Al-Alusi MA; Division of Cardiology, Brigham and Women's Hospital, Boston, MA, USA.
  • Wang X; Department of Neurology, Massachusetts General Hospital, Boston, MA, USA.
  • Klarqvist MDR; Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, MA, USA.
  • Ashburner JM; Division of Cardiology, Massachusetts General Hospital, Boston, MA, USA.
  • Diedrich C; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.
  • Ghadessi M; Cardiovascular Disease Initiative, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Mielke J; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.
  • Eilken HM; Department of Medicine, Massachusetts General Hospital, Boston, MA, USA.
  • McElhinney A; Division of Cardiology, Massachusetts General Hospital, Boston, MA, USA.
  • Derix A; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.
  • Atlas SJ; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA.
  • Ellinor PT; Cardiovascular Disease Initiative, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Philippakis AA; Data Sciences Platform, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
  • Anderson CD; Harvard Medical School, Boston, MA, USA.
  • Ho JE; Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA.
  • Batra P; Bayer AG, Research and Development, Pharmaceuticals, Leverkusen, Germany.
  • Lubitz SA; Bayer AG, Research and Development, Pharmaceuticals, Leverkusen, Germany.
NPJ Digit Med ; 5(1): 47, 2022 Apr 08.
Article in En | MEDLINE | ID: mdl-35396454
ABSTRACT
Electronic health record (EHR) datasets are statistically powerful but are subject to ascertainment bias and missingness. Using the Mass General Brigham multi-institutional EHR, we approximated a community-based cohort by sampling patients receiving longitudinal primary care between 2001 and 2018 (Community Care Cohort Project [C3PO], n = 520,868). We utilized natural language processing (NLP) to recover vital signs from unstructured notes. We assessed the validity of C3PO by deploying established risk models for myocardial infarction/stroke and atrial fibrillation. We then compared C3PO to Convenience Samples including all individuals from the same EHR with complete data, but without a longitudinal primary care requirement. NLP reduced the missingness of vital signs by 31%. NLP-recovered vital signs were highly correlated with values derived from structured fields (Pearson r range 0.95-0.99). Atrial fibrillation and myocardial infarction/stroke incidence were lower, and risk models were better calibrated, in C3PO than in the Convenience Samples (calibration error range for myocardial infarction/stroke 0.012-0.030 in C3PO vs. 0.028-0.046 in Convenience Samples; calibration error for atrial fibrillation 0.028 in C3PO vs. 0.036 in Convenience Samples). Sampling patients receiving regular primary care and using NLP to recover missing data may reduce bias and maximize generalizability of EHR research.
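As an illustration of the two vital-sign checks described in the abstract (missingness before and after NLP recovery, and Pearson correlation between NLP-recovered and structured values), the following is a minimal sketch on toy data. It is not the authors' pipeline: the blood pressure values, the variable names, and the rule of filling a missing structured value with the NLP value are all assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def missing_fraction(values):
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


# Hypothetical systolic blood pressures (mmHg), one entry per patient.
# None marks a value that is absent from that source.
structured = [120, None, 135, None, 110, 142, None, 128]
nlp_recovered = [121, 118, 135, None, 109, 140, 131, None]

# Assumed fill rule: keep the structured value, fall back to the NLP value.
combined = [s if s is not None else n for s, n in zip(structured, nlp_recovered)]

# Agreement is measured only where both sources report a value.
pairs = [(s, n) for s, n in zip(structured, nlp_recovered)
         if s is not None and n is not None]
r = pearson_r([s for s, _ in pairs], [n for _, n in pairs])

before = missing_fraction(structured)
after = missing_fraction(combined)

print(f"missing before NLP: {before:.3f}")
print(f"missing after NLP:  {after:.3f}")
print(f"Pearson r on {len(pairs)} paired values: {r:.3f}")
```

Restricting the correlation to patients with both a structured and an NLP value mirrors the kind of comparison the abstract reports, since agreement can only be judged where both sources exist.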

Full text: 1 Collection: 01-internacional Database: MEDLINE Type of study: Risk_factors_studies Language: En Journal: NPJ Digit Med Year: 2022 Document type: Article Affiliation country: United States

...