Results 1 - 20 of 22
1.
BMC Med Inform Decis Mak ; 24(1): 51, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38355486

ABSTRACT

BACKGROUND: Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for 7 selected outcomes at three institutions. Secondary objectives were to describe agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels. METHODS: This study included three cohorts: SickKids from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate and severe) based on test result and one diagnosis-based label. The proportion of admissions with a positive label was presented for each outcome, stratified by cohort. Using lab-based labels as the gold standard, agreement using Cohen's Kappa, sensitivity and specificity were calculated for each lab-based severity level. RESULTS: The numbers of admissions included were: SickKids (n = 59,298), StanfordPeds (n = 24,639) and StanfordAdults (n = 159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher for StanfordPeds compared to SickKids across all outcomes, with odds ratio (99.9% confidence interval) for abnormal diagnosis-based label ranging from 2.2 (1.7-2.7) for neutropenia to 18.4 (10.1-33.4) for hyperkalemia. Lab-based labels were more similar by institution. When using lab-based labels as the gold standard, Cohen's Kappa and sensitivity were lower at SickKids for all severity levels compared to StanfordPeds. CONCLUSIONS: Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.


Subjects
Hyperkalemia , Neutropenia , Humans , Delivery of Health Care , Machine Learning , Sensitivity and Specificity
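The agreement measures named in the abstract above (Cohen's Kappa, sensitivity, and specificity of a diagnosis-based label against a lab-based gold standard) can be sketched as follows. This is an illustration only, not the authors' code: the function name `agreement_metrics` and the toy label vectors are hypothetical, and the sketch assumes binary labels with both classes present in the gold standard.

```python
def agreement_metrics(gold, pred):
    """Compare binary predicted labels against binary gold-standard labels.

    Assumes both classes occur in `gold` and that agreement is not
    perfectly explained by chance (otherwise a division by zero occurs).
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    tn = sum(1 for g, p in pairs if not g and not p)
    n = tp + fn + fp + tn

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


# Hypothetical toy data: 1 = positive label, 0 = negative label.
lab_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]        # gold standard (lab-based)
diagnosis_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # diagnosis-based
metrics = agreement_metrics(lab_labels, diagnosis_labels)
print(metrics)
```

In the toy data the diagnosis-based label misses half of the lab-positive admissions, which lowers both sensitivity and Kappa while specificity stays high, the same direction of effect the abstract reports for SickKids relative to StanfordPeds.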
2.
BMC Med Res Methodol ; 23(1): 204, 2023 09 09.
Article in English | MEDLINE | ID: mdl-37689623

ABSTRACT

BACKGROUND: Non-experimental studies (also known as observational studies) are valuable for estimating the effects of various medical interventions, but are notoriously difficult to evaluate because the methods used in non-experimental studies require untestable assumptions. This lack of intrinsic verifiability makes it difficult both to compare different non-experimental study methods and to trust the results of any particular non-experimental study. METHODS: We introduce TrialProbe, a data resource and statistical framework for the evaluation of non-experimental methods. We first collect a dataset of pseudo "ground truths" about the relative effects of drugs by using empirical Bayesian techniques to analyze adverse events recorded in public clinical trial reports. We then develop a framework for evaluating non-experimental methods against that ground truth by measuring concordance between the non-experimental effect estimates and the estimates derived from clinical trials. As a demonstration of our approach, we also perform an example methods evaluation between propensity score matching, inverse propensity score weighting, and an unadjusted approach on a large national insurance claims dataset. RESULTS: From the 33,701 clinical trial records in our version of the ClinicalTrials.gov dataset, we are able to extract 12,967 unique drug/drug adverse event comparisons to form a ground truth set. During our corresponding methods evaluation, we are able to use that reference set to demonstrate that both propensity score matching and inverse propensity score weighting can produce estimates that have high concordance with clinical trial results and substantially outperform an unadjusted baseline. CONCLUSIONS: We find that TrialProbe is an effective approach for probing non-experimental study methods, being able to generate large ground truth sets that are able to distinguish how well non-experimental methods perform in real world observational data.


Subjects
Research Design , Humans , Bayes Theorem , Causality , Propensity Score
3.
J Biomed Inform ; 113: 103637, 2021 01.
Article in English | MEDLINE | ID: mdl-33290879

ABSTRACT

Widespread adoption of electronic health records (EHRs) has fueled the development of using machine learning to build prediction models for various clinical outcomes. However, this process is often constrained by having a relatively small number of patient records for training the model. We demonstrate that using patient representation schemes inspired from techniques in natural language processing can increase the accuracy of clinical prediction models by transferring information learned from the entire patient population to the task of training a specific model, where only a subset of the population is relevant. Such patient representation schemes enable a 3.5% mean improvement in AUROC on five prediction tasks compared to standard baselines, with the average improvement rising to 19% when only a small number of patient records are available for training the clinical prediction model.


Subjects
Electronic Health Records , Models, Statistical , Humans , Machine Learning , Natural Language Processing , Prognosis
4.
BMC Cancer ; 20(1): 1103, 2020 Nov 13.
Article in English | MEDLINE | ID: mdl-33187484

ABSTRACT

BACKGROUND: Objectives were to build a machine learning algorithm to identify bloodstream infection (BSI) among pediatric patients with cancer and hematopoietic stem cell transplantation (HSCT) recipients, and to compare this approach with presence of neutropenia to identify BSI. METHODS: We included patients 0-18 years of age at cancer diagnosis or HSCT between January 2009 and November 2018. Eligible blood cultures were those with no previous blood culture (regardless of result) within 7 days. The primary outcome was BSI. Four machine learning algorithms were used: elastic net, support vector machine and two implementations of gradient boosting machine (GBM and XGBoost). Model training and evaluation were performed using temporally disjoint training (60%), validation (20%) and test (20%) sets. The best model was compared to neutropenia alone in the test set. RESULTS: Of 11,183 eligible blood cultures, 624 (5.6%) were positive. The best model in the validation set was GBM, which achieved an area-under-the-receiver-operator-curve (AUROC) of 0.74 in the test set. Among the 2236 in the test set, the number of false positives and specificity of GBM vs. neutropenia were 508 vs. 592 and 0.76 vs. 0.72 respectively. Among 139 test set BSIs, six (4.3%) non-neutropenic patients were identified by GBM. All received antibiotics prior to culture result availability. CONCLUSIONS: We developed a machine learning algorithm to classify BSI. GBM achieved an AUROC of 0.74 and identified 4.3% additional true cases in the test set. The machine learning algorithm did not perform substantially better than using presence of neutropenia alone to predict BSI.


Subjects
Bacteremia/diagnosis , Hematopoietic Stem Cell Transplantation/adverse effects , Machine Learning , Neoplasms/therapy , Neutropenia/diagnosis , Sepsis/diagnosis , Adolescent , Bacteremia/blood , Bacteremia/classification , Bacteremia/etiology , Child , Child, Preschool , Female , Follow-Up Studies , Humans , Infant , Infant, Newborn , Male , Neoplasms/pathology , Neutropenia/blood , Neutropenia/etiology , Prognosis , Retrospective Studies , Sepsis/blood , Sepsis/classification , Sepsis/etiology , Support Vector Machine
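The specificities reported in the abstract above can be checked from the false-positive counts it gives. The sketch below assumes that the 2236 test-set items are blood cultures and that the 139 test-set BSIs are the positives among them, which is how the abstract reads but is not stated outright; the function name is hypothetical.

```python
def specificity_from_false_positives(n_total, n_positive, false_positives):
    """Specificity = true negatives / all negatives."""
    negatives = n_total - n_positive
    true_negatives = negatives - false_positives
    return true_negatives / negatives


# Counts taken from the abstract; their interpretation is an assumption.
TEST_SET_CULTURES = 2236
TEST_SET_BSI = 139

gbm_specificity = specificity_from_false_positives(TEST_SET_CULTURES, TEST_SET_BSI, 508)
neutropenia_specificity = specificity_from_false_positives(TEST_SET_CULTURES, TEST_SET_BSI, 592)

print(round(gbm_specificity, 2))          # 0.76
print(round(neutropenia_specificity, 2))  # 0.72
```

Under that reading, both values match the 0.76 vs. 0.72 quoted in the abstract.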
5.
Proc Natl Acad Sci U S A ; 112(1): 196-201, 2015 Jan 06.
Article in English | MEDLINE | ID: mdl-25512534

ABSTRACT

We report on a genome-wide scan for introgression between the house mouse (Mus musculus domesticus) and the Algerian mouse (Mus spretus), using samples from the ranges of sympatry and allopatry in Africa and Europe. Our analysis reveals wide variability in introgression signatures along the genomes, as well as across the samples. We find that fewer than half of the autosomes in each genome harbor all detectable introgression, whereas the X chromosome has none. Further, European mice carry more M. spretus alleles than the sympatric African ones. Using the length distribution and sharing patterns of introgressed genomic tracts across the samples, we infer, first, that at least three distinct hybridization events involving M. spretus have occurred, one of which is ancient, and the other two are recent (one presumably due to warfarin rodenticide selection). Second, several of the inferred introgressed tracts contain genes that are likely to confer adaptive advantage. Third, introgressed tracts might contain driver genes that determine the evolutionary fate of those tracts. Further, functional analysis revealed introgressed genes that are essential to fitness, including the Vkorc1 gene, which is implicated in rodenticide resistance, and olfactory receptor genes. Our findings highlight the extent and role of introgression in nature and call for careful analysis and interpretation of house mouse data in evolutionary and genetic studies.


Subjects
Crosses, Genetic , Genetic Variation , Genome/genetics , Animals , Female , Geography , Haploidy , Hybridization, Genetic , Male , Mice , Species Specificity
6.
NPJ Digit Med ; 7(1): 171, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38937550

ABSTRACT

Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components adaptable for various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels, and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across hospitals and their performance in local tasks. This multi-center study examined the adaptability of a publicly accessible structured EHR foundation model (FMSM), trained on 2.57 M patient records from Stanford Medicine. Experiments used EHR data from The Hospital for Sick Children (SickKids) and Medical Information Mart for Intensive Care (MIMIC-IV). We assessed both adaptability via continued pretraining on local data, and task adaptability compared to baselines of locally training models from scratch, including a local foundation model. Evaluations on 8 clinical prediction tasks showed that adapting the off-the-shelf FMSM matched the performance of gradient boosting machines (GBM) locally trained on all data while providing a 13% improvement in settings with few task-specific training labels. Continued pretraining on local data showed FMSM required fewer than 1% of training examples to match the fully trained GBM's performance, and was 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings demonstrate that adapting EHR foundation models across hospitals provides improved prediction performance at less cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.

7.
J Am Med Inform Assoc ; 30(5): 878-887, 2023 04 19.
Article in English | MEDLINE | ID: mdl-36795076

ABSTRACT

OBJECTIVE: There are over 363 customized risk models of the American College of Cardiology and the American Heart Association (ACC/AHA) pooled cohort equations (PCE) in the literature, but their gains in clinical utility are rarely evaluated. We build new risk models for patients with specific comorbidities and geographic locations and evaluate whether performance improvements translate to gains in clinical utility. MATERIALS AND METHODS: We retrain a baseline PCE using the ACC/AHA PCE variables and revise it to incorporate subject-level information of geographic location and 2 comorbidity conditions. We apply fixed effects, random effects, and extreme gradient boosting (XGB) models to handle the correlation and heterogeneity induced by locations. Models are trained using 2 464 522 claims records from Optum©'s Clinformatics® Data Mart and validated in the hold-out set (N = 1 056 224). We evaluate models' performance overall and across subgroups defined by the presence or absence of chronic kidney disease (CKD) or rheumatoid arthritis (RA) and geographic locations. We evaluate models' expected utility using net benefit and models' statistical properties using several discrimination and calibration metrics. RESULTS: The revised fixed effects and XGB models yielded improved discrimination, compared to baseline PCE, overall and in all comorbidity subgroups. XGB improved calibration for the subgroups with CKD or RA. However, the gains in net benefit are negligible, especially under low exchange rates. CONCLUSIONS: Common approaches to revising risk calculators incorporating extra information or applying flexible models may enhance statistical performance; however, such improvement does not necessarily translate to higher clinical utility. Thus, we recommend future works to quantify the consequences of using risk calculators to guide clinical decisions.


Subjects
Arthritis, Rheumatoid , Atherosclerosis , Renal Insufficiency, Chronic , Humans , Cardiovascular Diseases/epidemiology , Comorbidity , Risk Assessment , Risk Factors , United States , Atherosclerosis/epidemiology
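The abstract above evaluates expected utility with net benefit but does not give a formula. A minimal sketch of the standard decision-curve definition is shown below; whether the authors used exactly this form is an assumption, as is reading the "exchange rate" as the threshold odds. All counts and the 7.5% threshold are hypothetical illustration values.

```python
def net_benefit(true_positives, false_positives, n, threshold):
    """Standard decision-curve net benefit at a given risk threshold.

    net benefit = TP/n - (FP/n) * threshold / (1 - threshold)
    The threshold odds act as the exchange rate between false and true positives.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    exchange_rate = threshold / (1 - threshold)
    return true_positives / n - (false_positives / n) * exchange_rate


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of treating everyone, a common reference strategy."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical example: 1000 patients, 10% event rate, 7.5% risk threshold.
model_nb = net_benefit(true_positives=80, false_positives=200, n=1000, threshold=0.075)
treat_all_nb = net_benefit_treat_all(prevalence=0.10, threshold=0.075)
print(model_nb, treat_all_nb)
```

At a low threshold the exchange rate is small, so false positives are penalized only lightly; this is one reason a model with better discrimination can still show little gain in net benefit.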
8.
Sci Rep ; 13(1): 3767, 2023 03 07.
Article in English | MEDLINE | ID: mdl-36882576

ABSTRACT

Temporal distribution shift negatively impacts the performance of clinical prediction models over time. Pretraining foundation models using self-supervised learning on electronic health records (EHR) may be effective in acquiring informative global patterns that can improve the robustness of task-specific models. The objective was to evaluate the utility of EHR foundation models in improving the in-distribution (ID) and out-of-distribution (OOD) performance of clinical prediction models. Transformer- and gated recurrent unit-based foundation models were pretrained on EHR of up to 1.8 M patients (382 M coded events) collected within pre-determined year groups (e.g., 2009-2012) and were subsequently used to construct patient representations for patients admitted to inpatient units. These representations were used to train logistic regression models to predict hospital mortality, long length of stay, 30-day readmission, and ICU admission. We compared our EHR foundation models with baseline logistic regression models learned on count-based representations (count-LR) in ID and OOD year groups. Performance was measured using area-under-the-receiver-operating-characteristic curve (AUROC), area-under-the-precision-recall curve, and absolute calibration error. Both transformer and recurrent-based foundation models generally showed better ID and OOD discrimination relative to count-LR and often exhibited less decay in tasks where there is observable degradation of discrimination performance (average AUROC decay of 3% for transformer-based foundation model vs. 7% for count-LR after 5-9 years). In addition, the performance and robustness of transformer-based foundation models continued to improve as pretraining set size increased. These results suggest that pretraining EHR foundation models at scale is a useful approach for developing clinical prediction models that perform well in the presence of temporal distribution shift.


Subjects
Electric Power Supplies , Electronic Health Records , Humans , Hospital Mortality , Hospitalization
9.
NPJ Digit Med ; 6(1): 135, 2023 Jul 29.
Article in English | MEDLINE | ID: mdl-37516790

ABSTRACT

The success of foundation models such as ChatGPT and AlphaFold has spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. In this narrative review, we examine 84 foundation models trained on non-imaging EMR data (i.e., clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g., MIMIC-III) or broad, public biomedical corpora (e.g., PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. Considering these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded to metrics that matter in healthcare.

10.
Appl Clin Inform ; 14(3): 400-407, 2023 05.
Article in English | MEDLINE | ID: mdl-36898410

ABSTRACT

BACKGROUND: The 21st Century Cures Act mandates the immediate, electronic release of health information to patients. However, in the case of adolescents, special consideration is required to ensure that confidentiality is maintained. The detection of confidential content in clinical notes may support operational efforts to preserve adolescent confidentiality while implementing information sharing. OBJECTIVES: This study aimed to determine if a natural language processing (NLP) algorithm can identify confidential content in adolescent clinical progress notes. METHODS: A total of 1,200 outpatient adolescent progress notes written between 2016 and 2019 were manually annotated to identify confidential content. Labeled sentences from this corpus were featurized and used to train a two-part logistic regression model, which provides both sentence-level and note-level probability estimates that a given text contains confidential content. This model was prospectively validated on a set of 240 progress notes written in May 2022. It was subsequently deployed in a pilot intervention to augment an ongoing operational effort to identify confidential content in progress notes. Note-level probability estimates were used to triage notes for review and sentence-level probability estimates were used to highlight high-risk portions of those notes to aid the manual reviewer. RESULTS: The prevalence of notes containing confidential content was 21% (255/1,200) and 22% (53/240) in the train/test and validation cohorts, respectively. The ensemble logistic regression model achieved an area under the receiver operating characteristic curve of 90% and 88% in the test and validation cohorts, respectively. Its use in a pilot intervention identified outlier documentation practices and demonstrated efficiency gains over completely manual note review. CONCLUSION: An NLP algorithm can identify confidential content in progress notes with high accuracy. Its human-in-the-loop deployment in clinical operations augmented an ongoing operational effort to identify confidential content in adolescent progress notes. These findings suggest NLP may be used to support efforts to preserve adolescent confidentiality in the wake of the information blocking mandate.


Subjects
Confidentiality , Natural Language Processing , Humans , Adolescent , Language , Algorithms , Documentation , Electronic Health Records
11.
Appl Clin Inform ; 14(2): 337-344, 2023 03.
Article in English | MEDLINE | ID: mdl-37137339

ABSTRACT

BACKGROUND: The 21st Century Cures Act information blocking final rule mandated the immediate and electronic release of health care data in 2020. There is anecdotal concern that a significant amount of information is documented in notes that would breach adolescent confidentiality if released electronically to a guardian. OBJECTIVES: The purpose of this study was to quantify the prevalence of confidential information, based on California laws, within progress notes for adolescent patients that would be released electronically and assess differences in prevalence across patient demographics. METHODS: This is a single-center retrospective chart review of outpatient progress notes written between January 1, 2016, and December 31, 2019, at a large suburban academic pediatric network. Notes were labeled into one of three confidential domains by five expert reviewers trained on a rubric defining confidential information for adolescents derived from California state law. Participants included a random sampling of eligible patients aged 12 to 17 years old at the time of note creation. Secondary analysis included prevalence of confidentiality across age, gender, language spoken, and patient race. RESULTS: Of 1,200 manually reviewed notes, 255 notes (21.3%) (95% confidence interval: 19-24%) contained confidential information. There was a similar distribution among gender and age and a majority of English speaking (83.9%) and white or Caucasian patients (41.2%) in the cohort. Confidential information was more likely to be found in notes for females (p < 0.05) as well as for English-speaking patients (p < 0.05). Older patients had a higher probability of notes containing confidential information (p < 0.05). CONCLUSION: This study demonstrates that there is a significant risk to breach adolescent confidentiality if historical progress notes are released electronically to proxies without further review or redaction. With increased sharing of health care data, there is a need to protect the privacy of the adolescents and prevent potential breaches of confidentiality.


Subjects
Confidentiality , Privacy , Female , Humans , Adolescent , Child , Prevalence , Retrospective Studies , Health Facilities
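The prevalence and confidence interval reported in the abstract above (255 of 1,200 notes, 95% CI 19-24%) can be approximately reproduced with a normal-approximation (Wald) interval. The abstract does not say which interval method was used, so the choice of the Wald interval here is an assumption, and the function name is hypothetical.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Proportion with a normal-approximation (Wald) confidence interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts from the abstract: 255 of 1,200 notes contained confidential information.
prevalence, lower, upper = wald_interval(255, 1200)
print(f"{prevalence:.4f} ({lower:.4f}, {upper:.4f})")
```

Rounded to whole percentages, the bounds come out at 19% and 24%, consistent with the interval quoted in the abstract.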
12.
J Am Med Inform Assoc ; 30(12): 2004-2011, 2023 11 17.
Article in English | MEDLINE | ID: mdl-37639620

ABSTRACT

OBJECTIVE: Development of electronic health records (EHR)-based machine learning models for pediatric inpatients is challenged by limited training data. Self-supervised learning using adult data may be a promising approach to creating robust pediatric prediction models. The primary objective was to determine whether a self-supervised model trained in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients, for pediatric inpatient clinical prediction tasks. MATERIALS AND METHODS: This retrospective cohort study used EHR data and included patients with at least one admission to an inpatient unit. One admission per patient was randomly selected. Adult inpatients were 18 years or older while pediatric inpatients were more than 28 days and less than 18 years. Admissions were temporally split into training (January 1, 2008 to December 31, 2019), validation (January 1, 2020 to December 31, 2020), and test (January 1, 2021 to August 1, 2022) sets. Primary comparison was a self-supervised model trained in adult inpatients versus count-based logistic regression models trained in pediatric inpatients. Primary outcome was mean area-under-the-receiver-operating-characteristic-curve (AUROC) for 11 distinct clinical outcomes. Models were evaluated in pediatric inpatients. RESULTS: When evaluated in pediatric inpatients, mean AUROC of self-supervised model trained in adult inpatients (0.902) was noninferior to count-based logistic regression models trained in pediatric inpatients (0.868) (mean difference = 0.034, 95% CI=0.014-0.057; P < .001 for noninferiority and P = .006 for superiority). CONCLUSIONS: Self-supervised learning in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients. This finding suggests transferability of self-supervised models trained in adult patients to pediatric patients, without requiring costly model retraining.


Subjects
Inpatients , Machine Learning , Humans , Adult , Child , Retrospective Studies , Supervised Machine Learning , Electronic Health Records
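The temporal split described in the abstract above can be sketched as a date lookup. The date boundaries come from the abstract; the function name `assign_split` and the handling of out-of-range dates (returning `None`) are assumptions made for illustration, not the authors' pipeline.

```python
from datetime import date

# Inclusive date ranges as stated in the abstract.
SPLITS = [
    ("train", date(2008, 1, 1), date(2019, 12, 31)),
    ("validation", date(2020, 1, 1), date(2020, 12, 31)),
    ("test", date(2021, 1, 1), date(2022, 8, 1)),
]


def assign_split(admission_date):
    """Return the split an admission falls in, or None if outside all ranges."""
    for name, start, end in SPLITS:
        if start <= admission_date <= end:
            return name
    return None


print(assign_split(date(2019, 12, 31)))  # train
print(assign_split(date(2020, 6, 15)))   # validation
print(assign_split(date(2022, 8, 1)))    # test
```

Splitting by time rather than at random keeps every test admission later than every training admission, so the evaluation reflects how a model would perform on future patients.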
13.
AMIA Annu Symp Proc ; 2022: 221-230, 2022.
Article in English | MEDLINE | ID: mdl-37128416

ABSTRACT

Patients diagnosed with systemic lupus erythematosus (SLE) suffer from a decreased quality of life, an increased risk of medical complications, and an increased risk of death. In particular, approximately 50% of SLE patients progress to develop lupus nephritis, which oftentimes leads to life-threatening end stage renal disease (ESRD) and requires dialysis or kidney transplant. The challenge is that lupus nephritis is diagnosed via a kidney biopsy, which is typically performed only after noticeable decreased kidney function, leaving little room for proactive or preventative measures. The ability to predict which patients are most likely to develop lupus nephritis has the potential to shift lupus nephritis disease management from reactive to proactive. We present a clinically useful prediction model to predict which patients with newly diagnosed SLE will go on to develop lupus nephritis in the next five years.


Subjects
Lupus Erythematosus, Systemic , Lupus Nephritis , Preventive Medicine , Humans , Kidney Failure, Chronic/etiology , Kidney Failure, Chronic/prevention & control , Lupus Erythematosus, Systemic/complications , Lupus Erythematosus, Systemic/diagnosis , Lupus Nephritis/complications , Lupus Nephritis/diagnosis , Lupus Nephritis/prevention & control , Quality of Life , Renal Dialysis , Prognosis , Biopsy , Preventive Medicine/methods , Datasets as Topic , Electronic Health Records , California , Male , Female , Adult , Middle Aged , Cohort Studies , ROC Curve , Reproducibility of Results
14.
CPT Pharmacometrics Syst Pharmacol ; 11(11): 1527-1538, 2022 11.
Article in English | MEDLINE | ID: mdl-36204824

ABSTRACT

In some cases, drug combinations affect adverse outcome phenotypes by binding the same protein; however, drug-binding proteins are associated through protein-protein interaction (PPI) networks within the cell, suggesting that drug phenotypes may result from long-range network effects. We first used PPI network analysis to classify drugs based on proteins downstream of their targets and next predicted drug combination effects where drugs shared network proteins but had distinct binding proteins (e.g., targets, enzymes, or transporters). By classifying drugs using their downstream proteins, we had an 80.7% sensitivity for predicting rare drug combination effects documented in gold-standard datasets. We further measured the effect of predicted drug combinations on adverse outcome phenotypes using novel observational studies in the electronic health record. We tested predictions for 60 network-drug classes on seven adverse outcomes and measured changes in clinical outcomes for predicted combinations. These results demonstrate a novel paradigm for anticipating drug synergistic effects using proteins downstream of drug targets.


Subjects
Drug Delivery Systems , Proteins , Drug Combinations , Drug Interactions
15.
J Am Med Inform Assoc ; 28(10): 2258-2264, 2021 09 18.
Article in English | MEDLINE | ID: mdl-34350942

ABSTRACT

Using a risk stratification model to guide clinical practice often requires the choice of a cutoff, called the decision threshold, on the model's output to trigger a subsequent action such as an electronic alert. Choosing this cutoff is not always straightforward. We propose a flexible approach that leverages the collective information in treatment decisions made in real life to learn reference decision thresholds from physician practice. Using the example of prescribing a statin for primary prevention of cardiovascular disease based on 10-year risk calculated by the 2013 pooled cohort equations, we demonstrate the feasibility of using real-world data to learn the implicit decision threshold that reflects existing physician behavior. Learning a decision threshold in this manner allows for evaluation of a proposed operating point against the threshold reflective of the community standard of care. Furthermore, this approach can be used to monitor and audit model-guided clinical decision making following model deployment.


Subjects
Cardiovascular Diseases , Clinical Decision-Making , Humans , Risk Assessment
16.
Nat Commun ; 12(1): 2017, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33795682

ABSTRACT

In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time consuming and sharing labeled data is challenging due to privacy concerns. The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules. Our approach, unlike hand-labeled notes, is easy to share and modify, while offering performance comparable to learning from manually labeled training data. In this work, we validate our framework on six benchmark tasks and demonstrate Trove's ability to analyze the records of patients visiting the emergency department at Stanford Health Care for COVID-19 presenting symptoms and risk factors.


Subjects
COVID-19 , Data Curation/methods , Expert Systems , Machine Learning , Datasets as Topic , Electronic Health Records , Humans , Natural Language Processing , SARS-CoV-2
17.
J Am Med Inform Assoc ; 28(11): 2325-2335, 2021 10 12.
Article in English | MEDLINE | ID: mdl-34529084

ABSTRACT

OBJECTIVE: Ulcerative colitis (UC) is a chronic inflammatory disorder with limited effective therapeutic options for long-term treatment and disease maintenance. We hypothesized that a multi-cohort analysis of independent cohorts representing real-world heterogeneity of UC would identify a robust transcriptomic signature to improve identification of FDA-approved drugs that can be repurposed to treat patients with UC. MATERIALS AND METHODS: We performed a multi-cohort analysis of 272 colon biopsy transcriptome samples across 11 publicly available datasets to identify a robust UC disease gene signature. We compared the gene signature to in vitro transcriptomic profiles induced by 781 FDA-approved drugs to identify potential drug targets. We used a retrospective cohort study design modeled after a target trial to evaluate the protective effect of predicted drugs on colectomy risk in patients with UC from the Stanford Research Repository (STARR) database and Optum Clinformatics DataMart. RESULTS: Atorvastatin treatment had the highest inverse-correlation with the UC gene signature among non-oncolytic FDA-approved therapies. In both STARR (n = 827) and Optum (n = 7821), atorvastatin intake was significantly associated with a decreased risk of colectomy, a marker of treatment-refractory disease, compared to patients prescribed a comparator drug (STARR: HR = 0.47, P = .03; Optum: HR = 0.66, P = .03), irrespective of age and length of atorvastatin treatment. DISCUSSION & CONCLUSION: These findings suggest that atorvastatin may serve as a novel therapeutic option for ameliorating disease in patients with UC. Importantly, we provide a systematic framework for integrating publicly available heterogeneous molecular data with clinical data at a large scale to repurpose existing FDA-approved drugs for a wide range of human diseases.


Subjects
Colitis, Ulcerative , Atorvastatin/therapeutic use , Colectomy , Colitis, Ulcerative/drug therapy , Colitis, Ulcerative/genetics , Colitis, Ulcerative/surgery , Drug Repositioning , Humans , Retrospective Studies
18.
Elife ; 10, 2021 05 11.
Article in English | MEDLINE | ID: mdl-33973518

ABSTRACT

Metastasis suppression by high-dose, multi-drug targeting is unsuccessful due to network heterogeneity and compensatory network activation. Here, we show that targeting driver network signaling capacity by limited inhibition of core pathways is a more effective anti-metastatic strategy. This principle underlies the action of a physiological metastasis suppressor, Raf Kinase Inhibitory Protein (RKIP), that moderately decreases stress-regulated MAP kinase network activity, reducing output to transcription factors such as pro-metastatic BACH1 and motility-related target genes. We developed a low-dose four-drug mimic that blocks metastatic colonization in mouse breast cancer models and increases survival. Experiments and network flow modeling show limited inhibition of multiple pathways is required to overcome variation in MAPK network topology and suppress signaling output across heterogeneous tumor cells. Restricting inhibition of individual kinases dissipates surplus signal, preventing threshold activation of compensatory kinase networks. This low-dose multi-drug approach to decrease signaling capacity of driver networks represents a transformative, clinically relevant strategy for anti-metastatic treatment.


Subjects
Metabolic Networks and Pathways/drug effects , Neoplasm Metastasis/prevention & control , Phosphatidylethanolamine Binding Protein/genetics , Signal Transduction/drug effects , Animals , Breast Neoplasms/drug therapy , Cell Line, Tumor , Cell Movement , Drug Combinations , Female , Humans , MAP Kinase Signaling System , Mice , Mice, Inbred C57BL , Mice, Nude
19.
ArXiv ; 2020 Aug 05.
Article in English | MEDLINE | ID: mdl-32793768

ABSTRACT

In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g. the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time consuming and sharing labeled data is challenging due to privacy concerns. The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules. Our approach, unlike hand-labeled notes, is easy to share and modify, while offering performance comparable to learning from manually labeled training data. In this work, we validate our framework on six benchmark tasks and demonstrate Trove's ability to analyze the records of patients visiting the emergency department at Stanford Health Care for COVID-19 presenting symptoms and risk factors.

20.
NPJ Digit Med ; 3: 95, 2020.
Article in English | MEDLINE | ID: mdl-32695885

ABSTRACT

There is substantial interest in using presenting symptoms to prioritize testing for COVID-19 and establish symptom-based surveillance. However, little is currently known about the specificity of COVID-19 symptoms. To assess the feasibility of symptom-based screening for COVID-19, we used data from tests for common respiratory viruses and SARS-CoV-2 in our health system to measure the ability to correctly classify virus test results based on presenting symptoms. Based on these results, symptom-based screening may not be an effective strategy to identify individuals who should be tested for SARS-CoV-2 infection or to obtain a leading indicator of new COVID-19 cases.
