ABSTRACT
OBJECTIVE: Phenotyping patients using electronic health record (EHR) data conventionally requires labeled cases and controls. Assigning labels requires manual medical chart review and is therefore labor intensive. For some phenotypes, identifying gold-standard controls is prohibitive. We developed an accurate EHR phenotyping approach that does not require labeled controls. MATERIALS AND METHODS: Our framework relies on a random subset of cases, which can be specified using an anchor variable that has excellent positive predictive value and a sensitivity that is independent of the predictors. We proposed a maximum likelihood approach that efficiently leverages data from the specified cases and unlabeled patients to develop logistic regression phenotyping models, and compared model performance with existing algorithms. RESULTS: Our method outperformed the existing algorithms on predictive accuracy in Monte Carlo simulation studies, in an application to identify hypertension patients with hypokalemia requiring oral supplementation using a simulated anchor, and in an application to identify primary aldosteronism patients using real-world cases and anchor variables. Our method additionally generated consistent estimates of 2 important parameters: phenotype prevalence and the proportion of true cases that are labeled. DISCUSSION: Upon identification of an anchor variable that is scalable and transferable to different practices, our approach should facilitate development of scalable, transferable, and practice-specific phenotyping models. CONCLUSIONS: Our proposed approach enables accurate semiautomated EHR phenotyping with minimal manual labeling and therefore should greatly facilitate EHR clinical decision support and research.
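The abstract does not give the likelihood itself, so the following is only a minimal sketch of one common positive-and-unlabeled formulation that matches the stated assumptions; it is not necessarily the authors' exact method. It assumes the true phenotype follows a logistic model, that every anchor-labeled patient is a true case, and that each true case is labeled with a constant probability `c` regardless of the predictors, so that P(labeled | x) = c · sigmoid(xᵀβ). The function names, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -np.logaddexp(0.0, -z)


def neg_log_likelihood(theta, X, s):
    """Mean negative log-likelihood of the labels s under
    P(s = 1 | x) = c * sigmoid(x @ beta), with c = sigmoid(gamma)."""
    beta, gamma = theta[:-1], theta[-1]
    log_cp = log_sigmoid(gamma) + log_sigmoid(X @ beta)
    log_cp = np.minimum(log_cp, -1e-12)  # keep log(1 - c*p) finite
    log_one_minus_cp = np.log1p(-np.exp(log_cp))
    return -np.mean(s * log_cp + (1.0 - s) * log_one_minus_cp)


def fit_pu_logistic(X, s):
    """Jointly estimate beta and c from labeled cases and unlabeled patients.

    Returns (beta_hat, c_hat, prevalence_hat), where prevalence_hat is the
    average fitted phenotype probability over all patients.
    """
    theta0 = np.zeros(X.shape[1] + 1)
    result = minimize(neg_log_likelihood, theta0, args=(X, s), method="BFGS")
    beta_hat = result.x[:-1]
    c_hat = float(np.exp(log_sigmoid(result.x[-1])))
    prevalence_hat = float(np.mean(np.exp(log_sigmoid(X @ beta_hat))))
    return beta_hat, c_hat, prevalence_hat


def simulate_anchor_data(n=40000, beta=(-1.0, 2.5), c=0.5, seed=0):
    """Synthetic data (hypothetical values): an intercept plus one predictor,
    a latent true phenotype y, and an anchor that labels a random
    fraction c of the true cases."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    p = np.exp(log_sigmoid(X @ np.asarray(beta)))
    y = rng.random(n) < p                        # true phenotype, unobserved
    s = (y & (rng.random(n) < c)).astype(float)  # observed anchor label
    return X, s, y


if __name__ == "__main__":
    X, s, y = simulate_anchor_data()
    beta_hat, c_hat, prevalence_hat = fit_pu_logistic(X, s)
    print("estimated labeled fraction of cases:", round(c_hat, 3))
    print("estimated prevalence:", round(prevalence_hat, 3))
    print("true prevalence in simulation:", round(float(y.mean()), 3))
```

In this formulation the two quantities highlighted in the abstract fall out of the fit directly: `c_hat` estimates the proportion of true cases that are labeled, and `prevalence_hat` estimates phenotype prevalence. Separating `c` from the intercept relies on the logistic form being correct, which is a modeling assumption of the sketch.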
Subjects
Algorithms, Electronic Health Records/classification, Likelihood Functions, Humans, Monte Carlo Method
ABSTRACT
Socioeconomic status (SES) has been associated with adverse outcomes after cardiac surgery, but is not included in commonly applied risk adjustment models. This study evaluates whether inclusion of SES improves aortic valve replacement (AVR) risk prediction models, as AVR was the most common elective operation performed at our institution during the study period. All patients who underwent AVR at a single institution from 2005 to 2015 were evaluated. SES measures included unemployment, poverty, household income, home value, educational attainment, housing density, and a validated SES index score. The risk scores for mortality, complications, and prolonged length of stay (PLOS) were generated using models published by the Society of Thoracic Surgeons. Univariate models were fitted for each SES covariate, and multivariable models were fitted for mortality, any complication, and PLOS. A total of 1,386 patients underwent AVR, with 2.7% mortality, a 15.1% complication rate, and 9.7% PLOS. In univariate models, higher education was associated with decreased mortality (odds ratio [OR] 0.96, p = 0.04) and complications (OR 0.97, p < 0.01). Poverty was associated with increased length of stay (OR 1.02, p = 0.02). In the multivariable models, the inclusion of SES covariates increased the area under the curve for mortality (0.735 to 0.750, p = 0.14), for any complication (0.663 to 0.680, p < 0.01), and for PLOS (0.749 to 0.751, p = 0.12). The inclusion of census-tract-level socioeconomic factors in the Society of Thoracic Surgeons risk prediction models is new and shows potential to improve risk prediction for outcomes after cardiac surgery. With the possibility of reimbursement and institutional ranking based on these outcomes, this study represents an improvement in risk prediction models.
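The model comparison described above can be illustrated with a small sketch that fits a baseline logistic model on an existing risk score, refits it with one added SES covariate, and compares the area under the curve (AUC) of the two. Everything here is an assumption for illustration: the data are synthetic, the variable names (`risk_score`, `ses`) and effect sizes are hypothetical, and the AUCs are computed in-sample. The abstract does not state which test produced its p-values, so no significance test is attempted.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity."""
    labels = np.asarray(labels, dtype=bool)
    ranks = rankdata(scores)  # ties receive average ranks
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def fit_logistic(X, y):
    """Maximum likelihood logistic regression; returns the coefficient vector."""
    def mean_nll(beta):
        z = X @ beta
        return np.mean(np.logaddexp(0.0, z) - y * z)
    return minimize(mean_nll, np.zeros(X.shape[1]), method="BFGS").x


def compare_models(n=5000, seed=0):
    """Simulate outcomes that depend on a baseline risk score and an SES
    covariate (hypothetical coefficients), then return the in-sample AUC
    of the model without and with the SES covariate."""
    rng = np.random.default_rng(seed)
    risk_score = rng.normal(size=n)  # stand-in for a published risk score
    ses = rng.normal(size=n)         # stand-in for a census-tract SES measure
    logit = -2.0 + 1.0 * risk_score + 0.8 * ses
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

    X_base = np.column_stack([np.ones(n), risk_score])
    X_full = np.column_stack([np.ones(n), risk_score, ses])
    auc_base = auc(X_base @ fit_logistic(X_base, y), y)
    auc_full = auc(X_full @ fit_logistic(X_full, y), y)
    return auc_base, auc_full


if __name__ == "__main__":
    auc_base, auc_full = compare_models()
    print("AUC without SES:", round(auc_base, 3))
    print("AUC with SES:   ", round(auc_full, 3))
```

Because in-sample AUC tends to favor the larger model, a real analysis would normally add cross-validation or a held-out set and a formal test for the difference between correlated AUCs.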