Results 1 - 15 of 15
1.
AMIA Jt Summits Transl Sci Proc ; 2024: 95-104, 2024.
Article in English | MEDLINE | ID: mdl-38827052

ABSTRACT

Access to real-world data streams like electronic medical records (EMRs) has accelerated the development of supervised machine learning (ML) models for clinical applications. However, few studies investigate the differential impact of particular features in the EMR on model performance under temporal dataset shift. To explain how features in the EMR impact models over time, this study aggregates features into feature groups by their source (e.g. medication orders, diagnosis codes and lab results) and feature categories based on their reflection of patient pathophysiology or healthcare processes. We adapt Shapley values to explain feature groups' and feature categories' marginal contribution to initial and sustained model performance. We investigate three standard clinical prediction tasks and find that while feature contributions to initial performance differ across tasks, pathophysiological features help mitigate temporal discrimination deterioration. These results provide interpretable insights on how specific feature groups contribute to model performance and robustness to temporal dataset shift.
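To illustrate the idea of attributing performance to feature groups, the following is a minimal sketch of an exact Shapley computation over groups. It is not the authors' code: the group names (`labs`, `meds`) and the AUROC values in `TOY_AUROC` are hypothetical, and the sketch assumes a model has already been trained and scored for every subset of groups.

```python
from itertools import combinations
from math import factorial


def shapley_values(groups, value):
    """Exact Shapley value of each group under a coalition value function.

    `value` maps a frozenset of group names to a performance number
    (for example, the AUROC of a model trained on those groups only).
    """
    n = len(groups)
    result = {}
    for g in groups:
        others = [x for x in groups if x != g]
        total = 0.0
        for k in range(n):
            # Weight of coalitions of size k that do not contain g.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of g when added to coalition s.
                total += weight * (value(s | {g}) - value(s))
        result[g] = total
    return result


# Hypothetical AUROC values for every subset of two feature groups.
TOY_AUROC = {
    frozenset(): 0.50,
    frozenset({"labs"}): 0.70,
    frozenset({"meds"}): 0.60,
    frozenset({"labs", "meds"}): 0.76,
}

phi = shapley_values(["labs", "meds"], TOY_AUROC.__getitem__)
```

The contributions sum to the gain of the full model over the empty one (0.76 - 0.50), which is the efficiency property that makes Shapley values convenient for splitting performance among groups. The cost grows exponentially with the number of groups, so this exact form only suits a handful of them.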

2.
AMIA Jt Summits Transl Sci Proc ; 2024: 182-189, 2024.
Article in English | MEDLINE | ID: mdl-38827068

ABSTRACT

This study explored the efficacy of electronic phenotyping in data labeling for machine learning with a focus on urinary tract infections (UTIs). We contrasted labels from electronic phenotyping against previously published labels such as urine culture positivity. In comparison, electronic phenotyping showed the potential to enhance specificity in UTI labeling while maintaining similar sensitivity and was easily scaled for application to a large dataset suitable for machine learning, which we used to train and validate a machine learning model. Electronic phenotyping offers a valuable method for machine learning label generation in healthcare, with potential benefits for patient care and antimicrobial stewardship. Further research will expand its application and optimize techniques for increased performance.

3.
J Am Med Inform Assoc ; 30(9): 1532-1542, 2023 08 18.
Article in English | MEDLINE | ID: mdl-37369008

ABSTRACT

OBJECTIVE: Healthcare institutions are establishing frameworks to govern and promote the implementation of accurate, actionable, and reliable machine learning models that integrate with clinical workflow. Such governance frameworks require an accompanying technical framework to deploy models in a resource-efficient, safe and high-quality manner. Here we present DEPLOYR, a technical framework for enabling real-time deployment and monitoring of researcher-created models into a widely used electronic medical record system. MATERIALS AND METHODS: We discuss core functionality and design decisions, including mechanisms to trigger inference based on actions within electronic medical record software, modules that collect real-time data to make inferences, mechanisms that close the loop by displaying inferences back to end-users within their workflow, monitoring modules that track performance of deployed models over time, silent deployment capabilities, and mechanisms to prospectively evaluate a deployed model's impact. RESULTS: We demonstrate the use of DEPLOYR by silently deploying and prospectively evaluating 12 machine learning models trained using electronic medical record data that predict laboratory diagnostic results, triggered by clinician button-clicks in Stanford Health Care's electronic medical record. DISCUSSION: Our study highlights the need for and feasibility of such silent deployment, because prospectively measured performance varies from retrospective estimates. When possible, we recommend using prospectively estimated performance measures during silent trials to make final go decisions for model deployment. CONCLUSION: Machine learning applications in healthcare are extensively researched, but successful translations to the bedside are rare. By describing DEPLOYR, we aim to inform machine learning deployment best practices and help bridge the model implementation gap.


Subjects
Electronic Health Records, Software, Retrospective Studies, Machine Learning
4.
Article in English | MEDLINE | ID: mdl-37350883

ABSTRACT

When evaluating the performance of clinical machine learning models, one must consider the deployment population. When the population of patients with observed labels is only a subset of the deployment population (label selection), standard model performance estimates on the observed population may be misleading. In this study we describe three classes of label selection and simulate five causally distinct scenarios to assess how particular selection mechanisms bias a suite of commonly reported binary machine learning model performance metrics. Simulations reveal that when selection is affected by observed features, naive estimates of model discrimination may be misleading. When selection is affected by labels, naive estimates of calibration fail to reflect reality. We borrow traditional weighting estimators from causal inference literature and find that when selection probabilities are properly specified, they recover full population estimates. We then tackle the real-world task of monitoring the performance of deployed machine learning models whose interactions with clinicians feed back and affect the selection mechanism of the labels. We train three machine learning models to flag low-yield laboratory diagnostics, and simulate their intended consequence of reducing wasteful laboratory utilization. We find that naive estimates of AUROC on the observed population undershoot actual performance by up to 20%. Such a disparity could be large enough to lead to the wrongful termination of a successful clinical decision support tool. We propose an altered deployment procedure, one that combines injected randomization with traditional weighted estimates, and find it recovers true model performance.
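As a rough illustration of how a weighting estimator can correct a discrimination metric under label selection, the sketch below computes an inverse-probability-weighted AUROC. The abstract does not say which weighting estimator was used, so inverse probability weighting is an assumption here, as are the function names and the toy data; the selection probabilities are taken as known.

```python
def weighted_auroc(scores, labels, weights):
    """AUROC as a weighted fraction of correctly ordered positive/negative pairs."""
    numerator = 0.0
    denominator = 0.0
    for s_pos, y_pos, w_pos in zip(scores, labels, weights):
        if y_pos != 1:
            continue
        for s_neg, y_neg, w_neg in zip(scores, labels, weights):
            if y_neg != 0:
                continue
            pair_weight = w_pos * w_neg
            denominator += pair_weight
            if s_pos > s_neg:
                numerator += pair_weight
            elif s_pos == s_neg:
                numerator += 0.5 * pair_weight  # ties count as half
    if denominator == 0.0:
        raise ValueError("need at least one positive and one negative label")
    return numerator / denominator


def ipw_auroc(scores, labels, selection_probs):
    """Weight each observed example by 1 / P(label observed)."""
    weights = [1.0 / p for p in selection_probs]
    return weighted_auroc(scores, labels, weights)


# Toy data (hypothetical): four patients whose labels were observed.
scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
selection_probs = [0.5, 0.25, 0.5, 1.0]

naive = weighted_auroc(scores, labels, [1.0] * len(scores))
corrected = ipw_auroc(scores, labels, selection_probs)
```

In this toy case the naive estimate is 0.75 while the weighted estimate is 10/18, because the low-scoring positive stands in for four unobserved patients like it. The pairwise loop is quadratic and meant for clarity, not scale.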

5.
AMIA Annu Symp Proc ; 2023: 1007-1016, 2023.
Article in English | MEDLINE | ID: mdl-38222438

ABSTRACT

Low-yield repetitive laboratory diagnostics burden patients and inflate cost of care. In this study, we assess whether stability in repeated laboratory diagnostic measurements is predictable with uncertainty estimates using electronic health record data available before the diagnostic is ordered. We use probabilistic regression to predict a distribution of plausible values, allowing use-time customization for various definitions of "stability" given dynamic ranges and clinical scenarios. After converting distributions into "stability" scores, the models achieve a sensitivity of 29% for white blood cells, 60% for hemoglobin, 100% for platelets, 54% for potassium, 99% for albumin and 35% for creatinine for predicting stability at 90% precision, suggesting those fractions of repetitive tests could be reduced with low risk of missing important changes. The findings demonstrate the feasibility of using electronic health record data to identify low-yield repetitive tests and offer personalized guidance for better usage of testing while ensuring high quality care.
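The phrase "sensitivity at 90% precision" can be made concrete with a short sketch: scan thresholds from the highest score downward and keep the best recall among thresholds whose precision meets the target. The function name and the toy scores are hypothetical, and how the stability scores were actually derived from the predicted distributions is not specified in the abstract.

```python
def recall_at_precision(scores, labels, min_precision):
    """Highest recall among score thresholds with precision >= min_precision."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    true_pos = 0
    i = 0
    while i < len(ranked):
        # Absorb all examples tied at this score so they share one threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            true_pos += ranked[j][1]
            j += 1
        precision = true_pos / j  # j examples are predicted positive
        if precision >= min_precision:
            best_recall = max(best_recall, true_pos / total_pos)
        i = j
    return best_recall


# Toy stability scores (hypothetical); label 1 means the repeat test was stable.
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]

sens_at_90 = recall_at_precision(toy_scores, toy_labels, 0.90)
```

Here four of the six stable results can be flagged before precision drops below 90%, giving a sensitivity of 4/6.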


Subjects
Clinical Laboratory Techniques, Hemoglobins, Humans
6.
AMIA Annu Symp Proc ; 2023: 1201-1208, 2023.
Article in English | MEDLINE | ID: mdl-38222372

ABSTRACT

In analyzing direct hospitalization cost and clinical data from an academic medical center, commonly used metrics such as diagnosis-related group (DRG) weight explain approximately 37% of cost variability, but a substantial amount of variation remains unaccounted for by case mix index (CMI) alone. Using CMI as a benchmark, we isolate and target individual DRGs with higher than expected average costs for specific quality improvement efforts. While DRGs summarize hospitalization care after discharge, a predictive model using only information known before admission explained up to 60% of cost variability for two DRGs with a high excess cost burden. This level of variability likely reflects underlying patient factors that are not modifiable (e.g., age and prior comorbidities) and therefore less useful for health systems to target for intervention. However, the remaining unexplained variation can be inspected in further studies to discover operational factors that health systems can target to improve quality and value for their patients. Since DRG weights represent the expected resource consumption for a specific hospitalization type relative to the average hospitalization, the data-driven approach we demonstrate can be utilized by any health institution to quantify excess costs and potential savings among DRGs.


Subjects
Diagnosis-Related Groups, Hospitalization, Humans, Costs and Cost Analysis, Patient Discharge, Academic Medical Centers
7.
Commun Med (Lond) ; 2: 38, 2022.
Article in English | MEDLINE | ID: mdl-35603264

ABSTRACT

Background: The Centers for Disease Control and Prevention identify antibiotic prescribing stewardship as the most important action to combat increasing antibiotic resistance. Clinicians balance broad empiric antibiotic coverage vs. precision coverage targeting only the most likely pathogens. We investigate the utility of machine learning-based clinical decision support for antibiotic prescribing stewardship. Methods: In this retrospective multi-site study, we developed machine learning models that predict antibiotic susceptibility patterns (personalized antibiograms) using electronic health record data of 8342 infections from Stanford emergency departments and 15,806 uncomplicated urinary tract infections from Massachusetts General Hospital and Brigham & Women's Hospital in Boston. We assessed the trade-off between broad-spectrum and precise antibiotic prescribing using linear programming. Results: We find in Stanford data that personalized antibiograms reallocate clinician antibiotic selections with a coverage rate (fraction of infections covered by treatment) of 85.9%; similar to clinician performance (84.3% p = 0.11). In the Boston dataset, the personalized antibiograms coverage rate is 90.4%; a significant improvement over clinicians (88.1% p < 0.0001). Personalized antibiograms achieve similar coverage to the clinician benchmark with narrower antibiotics. With Stanford data, personalized antibiograms maintain clinician coverage rates while narrowing 69% of empiric vancomycin+piperacillin/tazobactam prescriptions to piperacillin/tazobactam. In the Boston dataset, personalized antibiograms maintain clinician coverage rates while narrowing 48% of ciprofloxacin to trimethoprim/sulfamethoxazole. Conclusions: Precision empiric antibiotic prescribing with personalized antibiograms could improve patient safety and antibiotic stewardship by reducing unnecessary use of broad-spectrum antibiotics that breed a growing tide of resistant organisms.

8.
J Am Med Inform Assoc ; 28(11): 2423-2432, 2021 10 12.
Article in English | MEDLINE | ID: mdl-34402507

ABSTRACT

OBJECTIVE: To develop prediction models for intensive care unit (ICU) vs non-ICU level-of-care need within 24 hours of inpatient admission for emergency department (ED) patients using electronic health record data. MATERIALS AND METHODS: Using records of 41 654 ED visits to a tertiary academic center from 2015 to 2019, we tested 4 algorithms (feed-forward neural networks, regularized regression, random forests, and gradient-boosted trees) to predict ICU vs non-ICU level-of-care within 24 hours and at the 24th hour following admission. Simple-feature models included patient demographics, Emergency Severity Index (ESI), and vital sign summary. Complex-feature models added all vital signs, lab results, and counts of diagnosis, imaging, procedures, medications, and lab orders. RESULTS: The best-performing model, a gradient-boosted tree using a full feature set, achieved an AUROC of 0.88 (95%CI: 0.87-0.89) and AUPRC of 0.65 (95%CI: 0.63-0.68) for predicting ICU care need within 24 hours of admission. The logistic regression model using ESI achieved an AUROC of 0.67 (95%CI: 0.65-0.70) and AUPRC of 0.37 (95%CI: 0.35-0.40). Using a discrimination threshold, such as 0.6, the positive predictive value, negative predictive value, sensitivity, and specificity were 85%, 89%, 30%, and 99%, respectively. Vital signs were the most important predictors. DISCUSSION AND CONCLUSIONS: Undertriaging admitted ED patients who subsequently require ICU care is common and associated with poorer outcomes. Machine learning models using readily available electronic health record data predict subsequent need for ICU admission with good discrimination, substantially better than the benchmarking ESI system. The results could be used in a multitiered clinical decision-support system to improve ED triage.
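The four threshold-dependent metrics quoted above all come from one confusion matrix. A minimal sketch, assuming that a score at or above the threshold counts as a positive prediction (the abstract does not state the convention) and using hypothetical toy scores rather than study data:

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def threshold_metrics(scores, labels, threshold):
    """PPV, NPV, sensitivity and specificity at a given score threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold  # assumed convention
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return {
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Toy predicted probabilities of ICU need (hypothetical); label 1 = needed ICU.
icu_scores = [0.9, 0.7, 0.65, 0.5, 0.3, 0.2, 0.1, 0.05]
icu_labels = [1, 1, 0, 1, 0, 1, 0, 0]

metrics = threshold_metrics(icu_scores, icu_labels, 0.6)
```

Raising the threshold generally trades sensitivity for specificity and PPV, which is consistent with the reported pattern of low sensitivity (30%) alongside high specificity (99%) at 0.6.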


Subjects
Hospital Emergency Service, Triage, Hospitalization, Hospitals, Humans, Intensive Care Units, Machine Learning, Retrospective Studies
9.
J Biomed Inform ; 113: 103637, 2021 01.
Article in English | MEDLINE | ID: mdl-33290879

ABSTRACT

Widespread adoption of electronic health records (EHRs) has fueled the development of using machine learning to build prediction models for various clinical outcomes. However, this process is often constrained by having a relatively small number of patient records for training the model. We demonstrate that using patient representation schemes inspired from techniques in natural language processing can increase the accuracy of clinical prediction models by transferring information learned from the entire patient population to the task of training a specific model, where only a subset of the population is relevant. Such patient representation schemes enable a 3.5% mean improvement in AUROC on five prediction tasks compared to standard baselines, with the average improvement rising to 19% when only a small number of patient records are available for training the clinical prediction model.


Subjects
Electronic Health Records, Statistical Models, Humans, Machine Learning, Natural Language Processing, Prognosis
10.
BMC Cancer ; 20(1): 1103, 2020 Nov 13.
Article in English | MEDLINE | ID: mdl-33187484

ABSTRACT

BACKGROUND: Objectives were to build a machine learning algorithm to identify bloodstream infection (BSI) among pediatric patients with cancer and hematopoietic stem cell transplantation (HSCT) recipients, and to compare this approach with presence of neutropenia to identify BSI. METHODS: We included patients 0-18 years of age at cancer diagnosis or HSCT between January 2009 and November 2018. Eligible blood cultures were those with no previous blood culture (regardless of result) within 7 days. The primary outcome was BSI. Four machine learning algorithms were used: elastic net, support vector machine and two implementations of gradient boosting machine (GBM and XGBoost). Model training and evaluation were performed using temporally disjoint training (60%), validation (20%) and test (20%) sets. The best model was compared to neutropenia alone in the test set. RESULTS: Of 11,183 eligible blood cultures, 624 (5.6%) were positive. The best model in the validation set was GBM, which achieved an area-under-the-receiver-operator-curve (AUROC) of 0.74 in the test set. Among the 2236 in the test set, the number of false positives and specificity of GBM vs. neutropenia were 508 vs. 592 and 0.76 vs. 0.72 respectively. Among 139 test set BSIs, six (4.3%) non-neutropenic patients were identified by GBM. All received antibiotics prior to culture result availability. CONCLUSIONS: We developed a machine learning algorithm to classify BSI. GBM achieved an AUROC of 0.74 and identified 4.3% additional true cases in the test set. The machine learning algorithm did not perform substantially better than using presence of neutropenia alone to predict BSI.
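A temporally disjoint split of the kind described can be sketched as follows: sort records by time, then cut the ordered list at 60% and 80%. The record layout (`id`, `year`) and the function name are hypothetical, and the study's actual handling of boundary dates is not described in the abstract.

```python
def temporal_split(records, time_key, fractions=(0.6, 0.2, 0.2)):
    """Split records into train/validation/test sets ordered by time.

    Every training record precedes every validation record, which in turn
    precedes every test record. Records that share a timestamp at a cut
    point may land on either side, depending on their input order.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ordered = sorted(records, key=time_key)
    n = len(ordered)
    train_end = round(n * fractions[0])
    valid_end = train_end + round(n * fractions[1])
    return ordered[:train_end], ordered[train_end:valid_end], ordered[valid_end:]


# Toy blood-culture records (hypothetical), deliberately given out of order.
cultures = [{"id": i, "year": 2018 - i} for i in range(10)]

train, valid, test = temporal_split(cultures, time_key=lambda r: r["year"])
```

Splitting by time rather than at random means the test set measures performance on later patients than any seen in training, which is closer to how a deployed model would be used.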


Subjects
Bacteremia/diagnosis, Hematopoietic Stem Cell Transplantation/adverse effects, Machine Learning, Neoplasms/therapy, Neutropenia/diagnosis, Sepsis/diagnosis, Adolescent, Bacteremia/blood, Bacteremia/classification, Bacteremia/etiology, Child, Preschool Child, Female, Follow-Up Studies, Humans, Infant, Newborn Infant, Male, Neoplasms/pathology, Neutropenia/blood, Neutropenia/etiology, Prognosis, Retrospective Studies, Sepsis/blood, Sepsis/classification, Sepsis/etiology, Support Vector Machine
11.
NPJ Digit Med ; 3: 95, 2020.
Article in English | MEDLINE | ID: mdl-32695885

ABSTRACT

There is substantial interest in using presenting symptoms to prioritize testing for COVID-19 and establish symptom-based surveillance. However, little is currently known about the specificity of COVID-19 symptoms. To assess the feasibility of symptom-based screening for COVID-19, we used data from tests for common respiratory viruses and SARS-CoV-2 in our health system to measure the ability to correctly classify virus test results based on presenting symptoms. Based on these results, symptom-based screening may not be an effective strategy to identify individuals who should be tested for SARS-CoV-2 infection or to obtain a leading indicator of new COVID-19 cases.

12.
AMIA Jt Summits Transl Sci Proc ; 2020: 108-115, 2020.
Article in English | MEDLINE | ID: mdl-32477629

ABSTRACT

Up to 50% of antibiotic use in hospital settings is suboptimal. We build machine learning models trained on electronic health record data to minimize wasteful use of antibiotics. Our classifiers flag no-growth blood and urine microbial cultures with high precision. Further, we build models that predict the likelihood of bacterial susceptibility to sets of antibiotics. These models contain decision thresholds that separate subgroups of patients whose susceptibility rates to narrow-spectrum antibiotics equal overall susceptibility rates to broader-spectrum drugs. Retroactively analyzing these thresholds on our one-year test set, we find that 14% of patients infected with Escherichia coli and empirically treated with piperacillin/tazobactam could have been treated with ceftriaxone with coverage equal to the overall susceptibility rate of piperacillin/tazobactam. Similarly, 13% of the same cohort could have been treated with cefazolin, a first-generation cephalosporin.

13.
AMIA Annu Symp Proc ; 2020: 953-962, 2020.
Article in English | MEDLINE | ID: mdl-33936471

ABSTRACT

High quality patient care through timely, precise and efficacious management depends not only on the clinical presentation of a patient, but the context of the care environment to which they present. Understanding and improving factors that affect streamlined workflow, such as provider or department busyness or experience, are essential to improving these care processes, but have been difficult to measure with traditional approaches and clinical data sources. In this exploratory data analysis, we aim to determine whether such contextual factors can be captured for important clinical processes by taking advantage of non-traditional data sources like EHR audit logs which passively track the electronic behavior of clinical teams. Our results illustrate the potential of defining multiple measures of contextual factors and their correlation with key care processes. We illustrate this using thrombolytic (tPA) treatment for ischemic stroke as an example process, but the measurement approaches can be generalized to multiple scenarios.


Subjects
Stroke, Female, Humans, Information Storage and Retrieval, Male, Middle Aged, Patient Care, Stroke/therapy, Workflow
14.
Mol Psychiatry ; 25(11): 2818-2831, 2020 11.
Article in English | MEDLINE | ID: mdl-31358905

ABSTRACT

22q11.2 deletion syndrome (22q11DS), a neurodevelopmental condition caused by a hemizygous deletion on chromosome 22, is associated with an elevated risk of psychosis and other developmental brain disorders. Prior single-site diffusion magnetic resonance imaging (dMRI) studies have reported altered white matter (WM) microstructure in 22q11DS, but small samples and variable methods have led to contradictory results. Here we present the largest study ever conducted of dMRI-derived measures of WM microstructure in 22q11DS (334 22q11.2 deletion carriers and 260 healthy age- and sex-matched controls; age range 6-52 years). Using harmonization protocols developed by the ENIGMA-DTI working group, we identified widespread reductions in mean, axial and radial diffusivities in 22q11DS, most pronounced in regions with major cortico-cortical and cortico-thalamic fibers: the corona radiata, corpus callosum, superior longitudinal fasciculus, posterior thalamic radiations, and sagittal stratum (Cohen's d's ranging from -0.9 to -1.3). Only the posterior limb of the internal capsule (IC), composed primarily of corticofugal fibers, showed higher axial diffusivity in 22q11DS. 22q11DS patients showed higher mean fractional anisotropy (FA) in callosal and projection fibers (IC and corona radiata) relative to controls, but lower FA than controls in regions with predominantly association fibers. Psychotic illness in 22q11DS was associated with more substantial diffusivity reductions in multiple regions. Overall, these findings indicate large effects of the 22q11.2 deletion on WM microstructure, especially in major cortico-cortical connections. Taken together with findings from animal models, this pattern of abnormalities may reflect disrupted neurogenesis of projection neurons in outer cortical layers.


Subjects
DiGeorge Syndrome/diagnostic imaging, DiGeorge Syndrome/pathology, Diffusion Magnetic Resonance Imaging, White Matter/diagnostic imaging, White Matter/pathology, Adolescent, Adult, Anisotropy, Child, DiGeorge Syndrome/genetics, Female, Humans, Male, Middle Aged, Young Adult
15.
J Med Internet Res ; 21(4): e13822, 2019 04 24.
Article in English | MEDLINE | ID: mdl-31017583

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) is currently diagnosed using qualitative methods that measure between 20-100 behaviors, can span multiple appointments with trained clinicians, and take several hours to complete. In our previous work, we demonstrated the efficacy of machine learning classifiers to accelerate the process by collecting home videos of US-based children, identifying a reduced subset of behavioral features that are scored by untrained raters using a machine learning classifier to determine children's "risk scores" for autism. We achieved an accuracy of 92% (95% CI 88%-97%) on US videos using a classifier built on five features. OBJECTIVE: Using videos of Bangladeshi children collected from Dhaka Shishu Children's Hospital, we aim to scale our pipeline to another culture and other developmental delays, including speech and language conditions. METHODS: Although our previously published and validated pipeline and set of classifiers perform reasonably well on Bangladeshi videos (75% accuracy, 95% CI 71%-78%), this work improves on that accuracy through the development and application of a powerful new technique for adaptive aggregation of crowdsourced labels. We enhance both the utility and performance of our model by building two classification layers: The first layer distinguishes between typical and atypical behavior, and the second layer distinguishes between ASD and non-ASD. In each of the layers, we use a unique rater weighting scheme to aggregate classification scores from different raters based on their expertise. We also determine Shapley values for the most important features in the classifier to understand how the classifiers' process aligns with clinical intuition. RESULTS: Using these techniques, we achieved an accuracy (area under the curve [AUC]) of 76% (SD 3%) and sensitivity of 76% (SD 4%) for identifying atypical children from among developmentally delayed children, and an accuracy (AUC) of 85% (SD 5%) and sensitivity of 76% (SD 6%) for identifying children with ASD from those predicted to have other developmental delays. CONCLUSIONS: These results show promise for using a mobile video-based and machine learning-directed approach for early and remote detection of autism in Bangladeshi children. This strategy could provide important resources for developmental health in developing countries with few clinical resources for diagnosis, helping children get access to care at an early age. Future research aimed at extending the application of this approach to identify a range of other conditions and determine the population-level burden of developmental disabilities and impairments will be of high value.


Subjects
Autism Spectrum Disorder/diagnosis, Developmental Disabilities/diagnosis, Machine Learning/standards, Video Recording/methods, Bangladesh, Child, Preschool Child, Female, Humans, Male, Validation Studies as Topic