ABSTRACT
BACKGROUND: The ubiquity of electronic health records (EHR) offers an opportunity to observe trajectories of laboratory results and vital signs over long periods of time. This study assessed the value of risk factor trajectories available in the EHR for predicting incident type 2 diabetes. STUDY DESIGN AND METHODS: The analysis was based on a large 13-year retrospective cohort of 71,545 adult, non-diabetic patients with a baseline in 2005 and a median follow-up of 8 years. Trajectories of fasting plasma glucose, lipids, BMI, and blood pressure were computed over three time frames before baseline (2000-2001, 2002-2003, and 2004). A novel method, Cumulative Exposure (CE), was developed and evaluated using Cox proportional hazards regression to assess the risk of incident type 2 diabetes, with the Framingham Diabetes Risk Scoring (FDRS) Model as the comparator. RESULTS: The new model outperformed the FDRS Model (0.802 vs. 0.660; p < 2e-16). Cumulative exposure measured over different periods showed that even short episodes of hyperglycemia increase the risk of developing diabetes. Returning to normoglycemia moderates the risk but does not fully eliminate it. The longer an individual maintains glycemic control after a hyperglycemic episode, the lower the subsequent risk of diabetes. CONCLUSION: Incorporating risk factor trajectories substantially increases the ability of clinical decision support risk models to predict the onset of type 2 diabetes and provides information about how risk changes over time.
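The abstract does not define how Cumulative Exposure is computed. The Python sketch below is purely an illustration. It assumes CE is the duration-weighted fraction of the observed pre-baseline frames (2000-2001, 2002-2003, 2004) in which any fasting plasma glucose value reached a hyperglycemia threshold. The 100 mg/dL threshold and the names `TimeFrame` and `cumulative_exposure` are hypothetical and are not the authors' definitions.

```python
from dataclasses import dataclass, field

# Hypothetical hyperglycemia threshold (mg/dL); not taken from the study.
FPG_THRESHOLD_MG_DL = 100.0


@dataclass
class TimeFrame:
    """One pre-baseline window (hypothetical structure)."""
    label: str
    years: float                                    # window length in years
    fpg_values: list = field(default_factory=list)  # FPG readings (mg/dL)


def cumulative_exposure(frames, threshold=FPG_THRESHOLD_MG_DL):
    """Duration-weighted fraction of observed frames with any FPG >= threshold.

    Frames with no FPG readings are treated as unobserved and skipped.
    Returns 0.0 when no frame has readings.
    """
    observed = [f for f in frames if f.fpg_values]
    total_years = sum(f.years for f in observed)
    if total_years == 0:
        return 0.0
    exposed_years = sum(f.years for f in observed if max(f.fpg_values) >= threshold)
    return exposed_years / total_years
```

Consider one elevated reading in 2000-2001 followed by normal values. Under these assumptions that gives a CE of 0.4 (2 of 5 observed years), which could then enter a Cox model as a covariate.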
Subjects
Diabetes Mellitus, Type 2/diagnosis; Diabetes Mellitus, Type 2/prevention & control; Adult; Blood Glucose; Female; Humans; Male; Middle Aged; Prognosis; Proportional Hazards Models; Retrospective Studies; Risk Factors
ABSTRACT
OBJECTIVES: To determine when delays in carrying out specific 3-hour bundle recommendations of the Surviving Sepsis Campaign guidelines for severe sepsis or septic shock become harmful and affect mortality. DESIGN: Retrospective cohort study. SETTING: One health system, composed of six hospitals and 45 clinics in a Midwestern state, from January 1, 2011, to July 31, 2015. PATIENTS: All adult patients hospitalized with a billing diagnosis of severe sepsis or septic shock. INTERVENTIONS: Four 3-hour Surviving Sepsis Campaign guideline recommendations: 1) obtain a blood culture before antibiotics, 2) obtain a lactate level, 3) administer broad-spectrum antibiotics, and 4) administer 30 mL/kg of crystalloid fluid for hypotension (mean arterial pressure < 65 mm Hg) or lactate > 4 mmol/L. MEASUREMENTS AND MAIN RESULTS: To determine the effect of a delay of t minutes in carrying out each intervention, propensity score matching on baseline characteristics was used to compensate for differences in health status. The average treatment effect in the treated was computed as the average difference in outcomes between patients treated after a shorter versus a longer delay. To estimate the uncertainty of the average treatment effect in the treated and to construct 95% CIs, bootstrap estimation with 1,000 replications was performed. Of 5,072 patients with severe sepsis or septic shock, 1,412 (27.8%) died in hospital. The majority of patients had the four 3-hour bundle recommendations initiated within 3 hours. For each recommendation, the delay after which the increase in risk of death became statistically significant was: lactate, 20.0 minutes; blood culture, 50.0 minutes; crystalloids, 100.0 minutes; and antibiotic therapy, 125.0 minutes. CONCLUSIONS: For these guideline recommendations, shorter delays were associated with better outcomes. There was no evidence that 3 hours is safe; even very short delays adversely affect outcomes.
These findings demonstrate a new approach to incorporating time t when analyzing the impact on outcomes and provide new evidence for clinical practice and research.
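The matched-pair average treatment effect in the treated, with a bootstrap confidence interval, can be sketched in Python as follows. This is a minimal sketch with several assumptions:

- Propensity score matching has already produced 1:1 matched pairs.
- The outcome is binary in-hospital mortality.
- The interval is a percentile bootstrap. The abstract does not say which bootstrap interval was used, so this is an assumption.

The names `att` and `bootstrap_ci` are hypothetical.

```python
import random
import statistics


def att(pairs):
    """Average outcome difference (longer delay minus shorter delay) over matched pairs.

    Each pair is (outcome_shorter_delay, outcome_longer_delay), e.g. 1 = in-hospital death.
    """
    return statistics.fmean(longer - shorter for shorter, longer in pairs)


def bootstrap_ci(pairs, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for att(), resampling matched pairs with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        att([rng.choice(pairs) for _ in pairs]) for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

The sketch resamples whole matched pairs rather than individual patients, which keeps each matched comparison intact across replications.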
Subjects
Hospital Mortality/trends; Patient Care Bundles/statistics & numerical data; Sepsis/mortality; Sepsis/therapy; Time-to-Treatment/statistics & numerical data; Aged; Anti-Bacterial Agents/administration & dosage; Blood Culture; Crystalloid Solutions/administration & dosage; Female; Humans; Lactic Acid/blood; Male; Middle Aged; Practice Guidelines as Topic; Propensity Score; Retrospective Studies; Shock, Septic/mortality; Shock, Septic/therapy; Time Factors; Time-to-Treatment/standards
ABSTRACT
Pulmonary complications due to infection and idiopathic pneumonia syndrome (IPS), a noninfectious lung injury in hematopoietic stem cell transplant (HSCT) recipients, are frequent causes of transplantation-related mortality and morbidity. Our objective was to characterize global bronchoalveolar lavage fluid (BALF) protein expression in IPS to identify proteins and pathways that differentiate IPS from infectious lung injury after HSCT. We studied 30 BALF samples from patients who developed lung injury within 180 days of HSCT or cellular therapy transfusion (natural killer cell transfusion). Adult subjects were classified as having IPS or infectious lung injury according to the criteria outlined in the 2011 American Thoracic Society statement. BALF was depleted of hemoglobin and 14 high-abundance proteins, digested with trypsin, and labeled with isobaric tags for relative and absolute quantification (iTRAQ) 8-plex reagent. The labeled samples were analyzed by two-dimensional capillary liquid chromatography (LC) and data-dependent peptide tandem mass spectrometry (MS) on an Orbitrap Velos system in higher-energy collision-induced dissociation activation mode. Protein identification employed a target-decoy strategy using ProteinPilot within Galaxy-P. Relative protein abundance was determined with reference to a global internal standard consisting of pooled BALF from patients with respiratory failure and no history of HSCT. A variance-weighted t-test controlling the false discovery rate at ≤5% was used to identify proteins that were differentially expressed between IPS and infectious lung injury. The biological relevance of these proteins was determined using gene ontology enrichment analysis and Ingenuity Pathway Analysis. We characterized 12 IPS and 18 infectious lung injury BALF samples. In the five iTRAQ LC-MS/MS experiments, 845, 735, 532, 615, and 594 proteins were identified, for a total of 1125 unique proteins, 368 of which were common to all five experiments.
When comparing IPS with infectious lung injury, 96 proteins were differentially expressed. Gene ontology enrichment analysis showed that these proteins participate in biological processes involved in the development of lung injury after HSCT. These include acute phase response signaling, the complement system, the coagulation system, liver X receptor (LXR)/retinoid X receptor (RXR) modulation, and farnesoid X receptor (FXR)/RXR modulation. We identified two canonical pathways modulated by TNF-α: FXR/RXR activation and IL-2 signaling in macrophages. The proteins also mapped to blood coagulation, fibrinolysis, and wound healing, processes that participate in organ repair. Cell movement was significantly over-represented among proteins differentially expressed between IPS and infection. In conclusion, BALF protein expression in IPS differed significantly from that in infectious lung injury in HSCT recipients. These differences provide insights into mechanisms activated in lung injury in HSCT recipients and suggest potential therapeutic targets to augment lung repair.
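The abstract does not name the procedure used to hold the false discovery rate at ≤5%. The Benjamini-Hochberg step-up procedure is a common choice, and the Python sketch below assumes it. The sketch applies it to per-protein p-values; it does not reproduce the variance-weighted t-test that would produce them. The function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Finds the largest rank k (1-based, by ascending p-value) with p_(k) <= k/m * q
    and rejects the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank  # step-up: keep the largest passing rank
    return sorted(order[:k_max])
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected. This happens whenever a larger p-value further down the ranking meets its threshold.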
Subjects
Hematopoietic Stem Cell Transplantation/adverse effects; Lung Injury/etiology; Pneumonia/etiology; Proteome/analysis; Adult; Aged; Bronchoalveolar Lavage Fluid/chemistry; Gene Expression Profiling; Gene Ontology; Humans; Middle Aged; Proteomics/methods
ABSTRACT
In this manuscript, we present connectivity cluster analysis (CoCA), a novel computational framework that takes advantage of the structure of brain networks to magnify reproducible signals and suppress noise. Resting-state functional magnetic resonance imaging (fMRI) data used to estimate functional brain networks are often noisy, leading to reduced power and inconsistent findings across independent studies. Techniques are needed that can unearth signals in noisy datasets while addressing redundancy in the functional connections tested for association. CoCA is a data-driven approach that addresses redundancy and noise by first finding groups of region pairs that behave cohesively across subjects. These cohesive sets of functional connections are then tested for association with the disease. CoCA is applied to patients with schizophrenia, a disorder characterized as a disconnectivity syndrome. Our results suggest that CoCA can find reproducible sets of functional connections that behave cohesively. Applying this technique, we found that the connectivity clusters joining the thalamus to parietal, temporal, and visuoparietal regions are highly discriminative of schizophrenia patients. They are also reproducible using retest data and replicable in an independent confirmatory sample.
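The first CoCA step groups region pairs whose connectivity values move together across subjects; it can be approximated as shown below. This Python sketch is not the CoCA algorithm itself. It assumes a simple rule: two connections are linked when their Pearson correlation across subjects is at least a hypothetical threshold of 0.8, and clusters are the connected components of that graph. Each subject's mean strength within a cluster could then be tested for association with diagnosis. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cohesive_clusters(data, threshold=0.8):
    """Group connections whose values correlate positively across subjects.

    data: one row per subject, one column per region-pair connection.
    Connections i and j are linked when pearson(col_i, col_j) >= threshold;
    clusters are the connected components (union-find). Hypothetical rule.
    """
    n_conn = len(data[0])
    cols = [[row[j] for row in data] for j in range(n_conn)]
    parent = list(range(n_conn))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n_conn):
        for j in range(i + 1, n_conn):
            if pearson(cols[i], cols[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for j in range(n_conn):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


def cluster_scores(data, clusters):
    """Per-subject mean connection strength within each cluster."""
    return [[sum(row[j] for j in c) / len(c) for c in clusters] for row in data]
```

Averaging within a cluster only makes sense for positively correlated members. That is why this sketch links on the signed correlation rather than its absolute value.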
Subjects
Brain/physiopathology; Cluster Analysis; Magnetic Resonance Imaging/methods; Schizophrenia/physiopathology; Adult; Brain Mapping/methods; Chronic Disease; Computer Simulation; Female; Humans; Male; Neural Pathways/physiopathology; Reproducibility of Results
ABSTRACT
BACKGROUND: Mobility is critical for self-management. Understanding factors associated with improvement in mobility during home healthcare can help nurses tailor interventions to improve mobility outcomes and keep patients safely at home. OBJECTIVES: The aims were to (a) identify patient and support system factors associated with mobility improvement during home care, (b) evaluate the consistency of these factors across groups defined by mobility status at the start of home care, and (c) identify patterns of factors associated with improvement and no improvement in mobility within each group. METHODS: We used Outcome and Assessment Information Set (OASIS) data from a national convenience sample of 270,634 patient records. The records were collected from October 1, 2008, to December 31, 2009, from 581 Medicare-certified home healthcare agencies. Patients were placed into groups based on mobility scores at admission. Odds ratios were used to index associations of factors with improvement at discharge. Discriminative pattern mining was used to discover patterns associated with improvement of mobility. RESULTS: Overall, mobility improved for 49.4% of patients. Improvement occurred most frequently (80%) among patients who, at admission, were able to walk only with the supervision or assistance of another person at all times. Many factors associated with improvement in mobility were similar across groups, except for patients who were chairfast but able to wheel themselves independently; however, the number, strength, and direction of associations varied. In most groups, the data mining-discovered patterns of factors associated with the mobility outcome combined functional and cognitive status with the type and amount of help required at home. DISCUSSION: This study provides new data mining-based information about how factors associated with improvement in mobility group together and vary by mobility at admission.
These approaches have potential to provide new insights for clinicians to tailor interventions for improvement of mobility.
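Each odds ratio in the analysis summarizes a 2x2 table of factor presence versus mobility improvement at discharge. The Python sketch below computes one with a Woolf (log-scale) 95% confidence interval. The cell layout and the interval method are assumptions, since the abstract does not report how intervals were formed. The function name is hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for a 2x2 table (hypothetical layout):

                      improved   not improved
    factor present        a            b
    factor absent         c            d
    """
    if 0 in (a, b, c, d):
        raise ValueError("zero cell count; apply a continuity correction first")
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper
```

An odds ratio above 1 means improvement was more likely when the factor was present. An odds ratio below 1 means improvement was less likely.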
Subjects
Data Mining; Home Care Services; Mobility Limitation; Outcome Assessment, Health Care/statistics & numerical data; Walking/physiology; Activities of Daily Living; Adult; Age Factors; Aged; Aged, 80 and over; Cluster Analysis; Databases, Factual; Female; Humans; Male; Medicare; Middle Aged; Recovery of Function/physiology; Retrospective Studies; Risk Factors; United States; Young Adult
ABSTRACT
BACKGROUND: Personal genome assembly is a critical process when studying tumor genomes and other highly divergent sequences. The accuracy of downstream analyses, such as RNA-seq and ChIP-seq, can be greatly enhanced by using personal genomic sequences rather than standard references. Unfortunately, reads sequenced from these types of samples often contain a heterogeneous mix of subpopulations with different variants, making assembly extremely difficult with existing assembly tools. To address these challenges, we developed SHEAR (Sample Heterogeneity Estimation and Assembly by Reference; http://vk.cs.umn.edu/SHEAR). SHEAR is a tool that predicts structural variants (SVs), accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences for downstream analysis. RESULTS: By making use of SV detection algorithms, SHEAR handles difficult SV types better and is more computationally efficient. We compare it against the leading competing approach using a variety of simulated scenarios as well as real tumor cell line data with known heterogeneous variants. SHEAR is shown to successfully estimate heterogeneity percentages in both cases and demonstrates improved efficiency and a better ability to handle tandem duplications. CONCLUSION: SHEAR allows for accurate and efficient SV detection and personal genomic sequence generation. It is also able to account for heterogeneous sequencing samples, such as those from tumor tissue, by estimating the subpopulation percentage for each heterogeneous variant.
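SHEAR's own estimator is not described in the abstract. To illustrate what a "representative percentage" for a heterogeneous variant can mean, the Python sketch below takes the fraction of reads supporting the variant at a locus. It adds a normal-approximation confidence interval. The function name and the read-count inputs are hypothetical. Real SV evidence (split reads, discordant read pairs) is considerably more involved.

```python
import math


def variant_fraction(alt_reads, total_reads, z=1.96):
    """Fraction of reads supporting a variant, with a normal-approximation CI.

    Hypothetical helper: a crude stand-in for a subpopulation percentage,
    with the interval clipped to [0, 1]. Not SHEAR's estimator.
    """
    if total_reads <= 0 or not 0 <= alt_reads <= total_reads:
        raise ValueError("need 0 <= alt_reads <= total_reads and total_reads > 0")
    p = alt_reads / total_reads
    half_width = z * math.sqrt(p * (1 - p) / total_reads)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```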
Subjects
Software; Algorithms; Genomics; High-Throughput Nucleotide Sequencing; Humans; Internet; Sequence Analysis, RNA/standards; User-Computer Interface
ABSTRACT
Insertional mutagenesis screens in mice are used to identify individual genes that drive tumor formation. In these screens, candidate cancer genes are identified if their genomic location is proximal to a common insertion site (CIS), defined by high rates of transposon or retroviral insertions in a given genomic window. In this article, we describe a new method for defining CISs based on a Poisson distribution, the Poisson Regression Insertion Model, and show that this new method is an improvement over previously described methods. We also describe a modification of the method that can identify pairs and higher orders of co-occurring common insertion sites. We apply these methods to two data sets. The first was generated in a transposon-based screen for gastrointestinal tract cancer genes; the second is the set of retroviral insertions in the Retroviral Tagged Cancer Gene Database. We show that the new methods identify more relevant candidate genes and candidate gene pairs than previous methods do. Identification of the biologically relevant set of mutations that occur in a single cell and cause tumor progression will aid in the rational design of single and combinatorial therapies in the upcoming age of personalized cancer therapy.
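The basic idea behind a Poisson-based common insertion site can be illustrated with a simple tail test. This Python sketch is not the Poisson Regression Insertion Model. It assumes insertions fall uniformly across the genome, so the expected count in a window is proportional to the window's length, and asks how surprising the observed count is. The function names and the genome size used in the usage example are hypothetical. A real screen would also correct for testing many windows.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def window_p_value(insertions_in_window, total_insertions, window_bp, genome_bp):
    """Tail probability of the observed window count under a uniform-rate null."""
    expected = total_insertions * window_bp / genome_bp
    return poisson_sf(insertions_in_window, expected)
```

For very small tail probabilities, computing 1 minus the CDF loses precision. `scipy.stats.poisson.sf` is a more robust choice in practice.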
Subjects
Cell Transformation, Neoplastic/genetics; DNA Transposable Elements; Genes, Neoplasm; Mutagenesis, Insertional; Retroviridae/genetics; Animals; Gastrointestinal Neoplasms/genetics; Humans; Mice; Monte Carlo Method; Poisson Distribution; Regression Analysis
ABSTRACT
In rodent model systems, the sequential changes in lung morphology resulting from hyperoxic injury are well characterized and are similar to the changes in human acute respiratory distress syndrome. In the injured lung, alveolar type two (AT2) epithelial cells play a critical role in restoring the normal alveolar structure. Thus, characterizing the changes in AT2 cells will provide insights into the mechanisms underpinning recovery from lung injury. We applied an unbiased systems-level proteomics approach to elucidate molecular mechanisms contributing to lung repair in a rat hyperoxic lung injury model. AT2 cells were isolated from rat lungs at predetermined intervals during hyperoxic injury and recovery. Protein expression profiles were determined using iTRAQ with tandem mass spectrometry. Of the 959 distinct proteins identified, 183 changed significantly in abundance during the injury-recovery cycle. Gene ontology enrichment analysis identified cell cycle, cell differentiation, cell metabolism, ion homeostasis, programmed cell death, ubiquitination, and cell migration as significantly enriched among these proteins. Gene set enrichment analysis of data acquired during lung repair revealed differential expression of gene sets that control multicellular organismal development, systems development, organ development, and chemical homeostasis. More detailed analysis identified activity in two regulatory pathways, JNK and miR-374. A novel short time-series expression miner algorithm identified protein clusters with coherent changes during injury and repair. We concluded that coherent changes occur in the AT2 cell proteome in response to hyperoxic stress. These findings offer guidance regarding the specific molecular mechanisms governing repair of the injured lung.
Subjects
Acute Lung Injury/metabolism; Hyperoxia/metabolism; Oxidative Stress/physiology; Proteomics; Pulmonary Alveoli/metabolism; Respiratory Mucosa/metabolism; Acute Lung Injury/genetics; Algorithms; Animals; Cells, Cultured; Disease Models, Animal; Hyperoxia/genetics; Male; Oxygen/toxicity; Rats; Rats, Sprague-Dawley; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization; Transcriptome
ABSTRACT
Neuroimaging research indicates that human intellectual ability is related to brain structure, including the thickness of the cerebral cortex. Most studies indicate that general intelligence is positively associated with cortical thickness in areas of association cortex distributed throughout both brain hemispheres. In this study, we performed a cortical thickness mapping analysis on data from 182 healthy, typically developing males and females aged 9 to 24 years to identify correlates of general intelligence (g) scores. We wanted to determine whether these correlates also mediate associations of specific cognitive abilities with cortical thickness. To do so, we regressed specific cognitive test scores on g scores and analyzed the residuals with respect to cortical thickness. The effect of age on the association between cortical thickness and intelligence was also examined. We found a widely distributed pattern of positive associations between cortical thickness and g scores. The g scores were derived from the first unrotated principal factor of a factor analysis of Wechsler Abbreviated Scale of Intelligence (WASI) subtest scores. After WASI specific cognitive subtest scores were regressed on g factor scores, the residual score variances did not correlate significantly with cortical thickness in the full sample with age covaried. We then split participants at the median age. In the older group, g-residualized Block Design scores (a measure of visual-motor integrative processing) were significantly positively associated with cortical thickness. In the younger group, g-residualized Vocabulary scores were significantly negatively associated with cortical thickness.
These results regarding correlates of general intelligence are concordant with the existing literature, while the findings from younger versus older subgroups have implications for future research on brain structural correlates of specific cognitive abilities, as well as the cognitive domain specificity of behavioral performance correlates of normative gray matter thinning during adolescence.
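The residualization step regresses each specific subtest score on g and keeps the residuals. It can be sketched in Python as simple ordinary least squares with an intercept. The function name is hypothetical, and the sketch omits the age covariate used in the full-sample analysis.

```python
def residualize(y, x):
    """Residuals from an ordinary least squares fit of y on x with an intercept.

    Hypothetical helper: y = specific subtest scores, x = g factor scores.
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]
```

By construction, the residuals sum to zero and are uncorrelated with g. Any remaining association with cortical thickness therefore reflects variance not shared with general intelligence.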
ABSTRACT
OBJECTIVE: The association between body mass index (BMI) and all-cause mortality is controversial and is frequently referred to as a paradox. Whether it is caused by metabolic factors or by statistical biases remains unresolved. We assessed the association between BMI and all-cause mortality considering a wide range of comorbidities and baseline mortality risk. METHODS: This was a retrospective cohort study of Olmsted County residents. Residents were included if they had at least one BMI measurement between 2000 and 2005, clinical data in the electronic health record, and either a minimum 8-year follow-up or death within this time. The cohort was categorized based on baseline mortality risk: Low, Medium, Medium-high, High, and Very-high. All-cause mortality was assessed for BMI intervals of 5 and 0.5 kg/m². RESULTS: Of 39,739 subjects (average age 52.6, range 18-89; 38.1% male), 11.86% died during the 8-year follow-up. The 8-year all-cause mortality risk had a "U" shape with a flat nadir in all risk groups. Extreme BMI values showed higher risk (BMI < 15 = 36.4%, 15 to < 20 = 15.4%, and ≥ 45 = 13.7%), while intermediate BMI categories showed a plateau between 10.6% and 12.5%. The increase in risk attributable to baseline risk and comorbidities was more pronounced than the increase associated with higher BMI within the same risk group. CONCLUSIONS: The association between BMI and all-cause mortality is complex when comorbidities and baseline mortality risk are taken into account. In general, comorbidities are better predictors of mortality risk except at extreme BMIs. In patients with no or few comorbidities, BMI seems to better define mortality risk. Aggressive management of comorbidities may provide better survival outcomes for patients with body mass between normal and moderate obesity.
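The Python sketch below groups subjects into fixed-width BMI intervals and computes the proportion who died in each. Several parts are assumptions:

- The binning is fixed-width, 5 or 0.5 kg/m².
- Records use a `(bmi, died)` format.
- The function name is hypothetical.

The open-ended extreme categories in the abstract (BMI < 15 and ≥ 45) would need separate handling. The stratification by baseline mortality risk group is not shown.

```python
import math
from collections import defaultdict


def mortality_by_bmi_bin(records, width=5.0):
    """Proportion of deaths per fixed-width BMI bin.

    records: iterable of (bmi, died) pairs, bmi in kg/m^2, died a bool.
    Returns {lower bin edge: mortality proportion}, ordered by edge.
    """
    counts = defaultdict(lambda: [0, 0])  # lower edge -> [deaths, subjects]
    for bmi, died in records:
        edge = math.floor(bmi / width) * width
        counts[edge][0] += int(died)
        counts[edge][1] += 1
    return {edge: deaths / n for edge, (deaths, n) in sorted(counts.items())}
```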
Subjects
Body Mass Index; Comorbidity; Mortality; Adolescent; Adult; Aged; Aged, 80 and over; Electronic Health Records/statistics & numerical data; Female; Follow-Up Studies; Humans; Male; Middle Aged; Minnesota/epidemiology; Retrospective Studies; Risk Assessment/methods; Risk Assessment/statistics & numerical data; Risk Factors; Young Adult
ABSTRACT
OBJECTIVE: Hospital-acquired infections (HAIs) are associated with significant morbidity, mortality, and prolonged hospital length of stay. Risk prediction models based on pre- and intraoperative data have been proposed to assess the risk of HAIs at the end of surgery. However, these models perform worse than HAI detection models based on postoperative data. Postoperative data are more predictive than pre- or intraoperative data because they are closer to the outcomes in time, but they are unavailable when the risk models are applied (at the end of surgery). The objective is to study whether such data can be used to improve the performance of the risk model. We call these data temporally unavailable at prediction time (TUP), because they cannot directly enter the model. MATERIALS AND METHODS: An extensive array of 12 methods based on logistic/linear regression and deep learning was used to incorporate the TUP data through a variety of intermediate representations of the data. Because of the hierarchical structure of the different HAI outcomes, single-task and multi-task learning frameworks are also compared. RESULTS AND DISCUSSION: Using TUP data was always advantageous: baseline methods, which cannot utilize TUP data, never achieved the top performance. The relative performance of the different models varied across outcomes. Regarding the intermediate representation, we found that its complexity was key and that incorporating label information was helpful. CONCLUSIONS: Using TUP data significantly improved predictive performance irrespective of model complexity.
Subjects
Cross Infection; Cross Infection/epidemiology; Hospitals; Humans; Logistic Models; Morbidity
ABSTRACT
Diseases can show different courses of progression even when patients share the same risk factors. Recent studies have revealed that the use of trajectories, the order in which diseases manifest throughout life, can be predictive of the course of progression. In this study, we propose a novel computational method for learning disease trajectories from EHR data. The proposed method consists of three parts: first, we propose an algorithm for extracting trajectories from EHR data; second, three criteria for filtering trajectories; and third, a likelihood function for assessing the risk of developing a set of outcomes given a trajectory set. We applied our methods to extract a set of disease trajectories from Mayo Clinic EHR data and evaluated it internally based on log-likelihood, which can be interpreted as the trajectories' ability to explain the observed (partial) disease progressions. We then externally evaluated the trajectories on EHR data from an independent health system, M Health Fairview. The proposed algorithm extracted a comprehensive set of disease trajectories that can explain the observed outcomes substantially better than competing methods and the proposed filtering criteria selected a small subset of disease trajectories that are highly interpretable and suffered only a minimal (relative 5%) loss of the ability to explain disease progression in both the internal and external validation.
Subjects
Algorithms; Electronic Health Records; Humans
ABSTRACT
Importance: Clinical domain knowledge about diseases and their comorbidities, severity, treatment pathways, and outcomes can facilitate diagnosis, enhance preventive strategies, and help create smart evidence-based practice guidelines. Objective: To introduce a new representation of patient data called disease severity hierarchy that leverages domain knowledge in a nested fashion to create subpopulations that share increasing amounts of clinical details suitable for risk prediction. Design, Setting, and Participants: This retrospective cohort study included 51,969 patients aged 45 to 85 years, with 10,674 patients who received primary care at the Mayo Clinic between January 2004 and December 2015 in the training cohort and 41,295 patients who received primary care at Fairview Health Services from January 2010 to December 2017 in the validation cohort. Data were analyzed from May 2018 to December 2019. Main Outcomes and Measures: Several binary classification measures, including the area under the receiver operating characteristic curve (AUC), Gini score, sensitivity, and positive predictive value, were used to evaluate models predicting all-cause mortality and major cardiovascular events at ages 60, 65, 75, and 80 years. Results: The mean (SD) age and proportions of women and white individuals were 59.4 (10.8) years, 6324 (59.3%), and 9804 (91.9%), respectively, in the training cohort and 57.4 (7.9) years, 21,975 (53.1%), and 37,653 (91.2%), respectively, in the validation cohort. During follow-up, 945 patients (8.9%) in the training cohort died, while 787 (7.4%) had major cardiovascular events.
Models using the new representation achieved AUCs for predicting death in the training cohort at ages 60, 65, 75, and 80 years of 0.96 (95% CI, 0.94-0.97), 0.96 (95% CI, 0.95-0.98), 0.97 (95% CI, 0.96-0.98), and 0.98 (95% CI, 0.98-0.99), respectively, while standard methods achieved modest AUCs of 0.67 (95% CI, 0.55-0.80), 0.66 (95% CI, 0.56-0.79), 0.64 (95% CI, 0.57-0.71), and 0.63 (95% CI, 0.54-0.70), respectively. Conclusions and Relevance: In this study, the proposed patient data representation accurately predicted the age at which a patient was at risk of dying or developing major cardiovascular events substantially better than standard methods. The representation uses known relationships contained in electronic health records to capture disease severity in a natural and clinically meaningful way. Furthermore, it is expressive and interpretable. This novel patient representation can help to support critical decision-making, develop smart guidelines, and enhance health care and disease management by helping to identify patients with high risk.
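The AUC and Gini score reported above are related. AUC is the probability that a randomly chosen patient with the outcome receives a higher predicted risk than one without it. A common definition of the Gini score is 2 × AUC − 1. The Python sketch below assumes that definition, which the abstract does not state. It uses an O(n²) pairwise count for clarity rather than speed, and the function names are hypothetical.

```python
def auc(scores, labels):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini score under the assumed definition 2 * AUC - 1."""
    return 2 * auc(scores, labels) - 1
```

Under this definition, a model with no discriminative ability (AUC 0.5) has a Gini score of 0, and a perfect model has a Gini score of 1.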
Subjects
Cardiovascular Diseases; Risk Assessment/methods; Severity of Illness Index; Age Factors; Aged; Cardiovascular Diseases/diagnosis; Cardiovascular Diseases/mortality; Comorbidity; Evidence-Based Practice; Female; Humans; Male; Middle Aged; Practice Patterns, Physicians'; Predictive Value of Tests; Preventive Health Services/methods; Preventive Health Services/standards; Quality Improvement
ABSTRACT
Our aging population increasingly suffers from multiple chronic diseases simultaneously, necessitating the comprehensive treatment of these conditions. Finding the optimal set of drugs for a combination of diseases is a combinatorial pattern exploration problem. Association rule mining is a popular tool for such problems, but health care requires causal, rather than associative, patterns, which renders association rule mining unsuitable. To address this issue, we propose a novel framework based on the Rubin-Neyman causal model for extracting causal rules from observational data while correcting for a number of common biases. Specifically, given a set of interventions and a set of items that define subpopulations (e.g., diseases), we wish to find all subpopulations in which effective intervention combinations exist. Within each such subpopulation, we wish to find all intervention combinations such that dropping any intervention from the combination reduces the efficacy of the treatment. A key aspect of our framework is the concept of closed intervention sets, which extend the quantification of the effect of a single intervention to a set of concurrent interventions. Closed intervention sets also allow for a pruning strategy that is strictly more efficient than the traditional pruning strategy used by the Apriori algorithm. To implement our ideas, we introduce and compare five methods of estimating causal effects from observational data. We rigorously evaluate them on synthetic data and mathematically prove (when possible) why they work. We also evaluated our causal rule mining framework on the electronic health record (EHR) data of a large cohort of 152,000 patients from Mayo Clinic. The extracted patterns proved sufficiently rich to explain the controversial findings in the medical literature regarding the effect of a class of cholesterol drugs on type 2 diabetes mellitus (T2DM).
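The closed-intervention-set condition requires that dropping any single intervention from the combination lowers efficacy. The Python sketch below checks that condition. It assumes a hypothetical `effect` function that already returns an estimated causal effect for any set of interventions within one subpopulation, with the empty set as the untreated baseline. The sketch does not reproduce how that effect is estimated or the framework's more efficient pruning. It simply enumerates all combinations up to a size limit. The function names are hypothetical.

```python
from itertools import combinations


def is_closed(interventions, effect):
    """True if removing any single intervention lowers the estimated effect.

    `effect` maps a frozenset of interventions to an estimated causal effect
    (higher = better); frozenset() is the no-intervention baseline.
    """
    s = frozenset(interventions)
    return all(effect(s - {i}) < effect(s) for i in s)


def closed_sets(all_interventions, effect, max_size=3):
    """Brute-force enumeration of closed sets up to max_size (no pruning)."""
    result = []
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(all_interventions), k):
            if is_closed(combo, effect):
                result.append(frozenset(combo))
    return result
```

Suppose adding C to {A} leaves the effect unchanged. Then {A, C} is not closed, because C contributes nothing beyond A.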
ABSTRACT
In recent years, the emerging field of computational psychiatry has impelled the use of machine learning models as a means to further understand the pathogenesis of multiple clinical disorders. In this paper, we discuss how autism spectrum disorder (ASD) was and continues to be diagnosed in the context of its complex neurodevelopmental heterogeneity. We review machine learning approaches to streamline ASD's diagnostic methods, to discern similarities and differences from comorbid diagnoses, and to follow developmentally variable outcomes. Both supervised machine learning models for classification outcome and unsupervised approaches to identify new dimensions and subgroups are discussed. We provide an illustrative example of how computational analytic methods and a longitudinal design can improve our inferential ability to detect early dysfunctional behaviors that may or may not reach threshold levels for formal diagnoses. Specifically, an unsupervised machine learning approach of anomaly detection is used to illustrate how community samples may be utilized to investigate early autism risk, multidimensional features, and outcome variables. Because ASD symptoms and challenges are not static within individuals across development, computational approaches present a promising method to elucidate subgroups of etiological contributions to phenotype, alternative developmental courses, interactions with biomedical comorbidities, and to predict potential responses to therapeutic interventions.
Subjects
Autism Spectrum Disorder/diagnosis; Autism Spectrum Disorder/physiopathology; Comorbidity; Machine Learning; Models, Theoretical; Autism Spectrum Disorder/epidemiology; Humans
ABSTRACT
The ability to assess data quality is essential for secondary use of EHR data, and an automated Healthcare Data Quality Framework (HDQF) can be used as a tool to support a healthcare organization's data quality initiatives. A general-purpose HDQF provides a method to assess and visualize data quality and to quickly identify areas for improvement. The value of the approach is illustrated for two analytics use cases: 1) predictive models and 2) clinical quality measures. The results show that data quality issues can be efficiently identified and visualized. The automated HDQF is much less time-consuming than manual data quality assessment, and the framework can be rerun on additional datasets with little effort.
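The abstract does not list the individual checks the HDQF performs. Completeness (the share of records with a non-missing value) is one commonly used data quality dimension, and the Python sketch below computes it per field and flags fields that fall below a threshold. The record layout, field names, and the 0.9 threshold are hypothetical.

```python
from typing import Dict, List, Optional

def completeness(records: List[Dict[str, Optional[object]]], fields: List[str]) -> Dict[str, float]:
    """Fraction of records with a non-missing value for each field.

    A value counts as missing if the key is absent, None, or an empty string.
    """
    n = len(records)
    result = {}
    for f in fields:
        present = sum(1 for r in records if r.get(f) not in (None, ""))
        result[f] = present / n if n else 0.0
    return result

def flag_fields(scores: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Fields whose completeness is below `threshold`, sorted by name."""
    return sorted(f for f, s in scores.items() if s < threshold)

# Hypothetical extract of four patient records.
records = [
    {"mrn": "1", "glucose": 95, "bmi": None},
    {"mrn": "2", "glucose": None, "bmi": 27.1},
    {"mrn": "3", "glucose": 110, "bmi": ""},
    {"mrn": "4", "glucose": 101},
]
scores = completeness(records, ["mrn", "glucose", "bmi"])
print(scores, flag_fields(scores))
```

Because the check is a plain function over records, it can be rerun unchanged on each new extract, which is the property the abstract highlights for an automated framework.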
ABSTRACT
Different analytic techniques operate optimally with different types of data. As the use of EHR-based analytics expands to new tasks, data will have to be transformed into different representations so that each task can be solved optimally. We classified representations into broad categories based on their characteristics and proposed a new knowledge-driven representation for clinical data mining and trajectory mining, called Severity Encoding Variables (SEVs). Additionally, we studied which characteristics make representations most suitable for particular clinical analytics tasks, including trajectory mining. Our evaluation shows that, for regression, most data representations performed similarly, with SEV achieving a slight (albeit statistically significant) advantage. For patients at high risk of diabetes, SEV outperformed the competing representation by 20% (relative). For association mining, SEV achieved the highest performance; its ability to constrain the search space of patterns through clinical knowledge was key to its success.
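One way to read a knowledge-driven severity encoding is to replace a raw lab value with an ordinal level defined by clinical cut-points, which shrinks the number of distinct items an association miner must consider. The Python sketch below illustrates that idea for fasting plasma glucose using the American Diabetes Association categories. It is an illustration under assumptions: the function names are hypothetical, and the paper's actual SEV definitions may differ.

```python
from bisect import bisect_right

# American Diabetes Association fasting plasma glucose categories (mg/dL):
# below 100 normal, 100-125 impaired fasting glucose, 126 or above diabetic range.
FPG_CUTS = [100, 126]
FPG_LEVELS = ["normal", "impaired", "diabetic"]

def severity(value, cuts=FPG_CUTS, levels=FPG_LEVELS):
    """Map a raw measurement to an ordinal severity level using clinical cut-points."""
    return levels[bisect_right(cuts, value)]

def encode_trajectory(values, cuts=FPG_CUTS, levels=FPG_LEVELS):
    """Encode a time-ordered series of measurements as a sequence of severity levels."""
    return [severity(v, cuts, levels) for v in values]

def transitions(levels):
    """Keep only the changes in level, e.g. ('normal', 'impaired')."""
    return [(a, b) for a, b in zip(levels, levels[1:]) if a != b]

traj = encode_trajectory([92, 99, 104, 118, 131])
print(traj)
print(transitions(traj))
```

`bisect_right` places a value equal to a cut-point in the higher category, so 100 mg/dL maps to "impaired" and 126 mg/dL to "diabetic", matching the category boundaries.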
Subjects
Data Mining , Electronic Health Records
ABSTRACT
Genetic interactions have been reported to underlie phenotypes in a variety of systems, but the extent to which they contribute to complex disease in humans remains unclear. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions, but existing methods for identifying them from GWAS data tend to focus on testing individual locus pairs, which undermines statistical power. Importantly, a global genetic network mapped for a model eukaryotic organism revealed that genetic interactions often connect genes between compensatory functional modules in a highly coherent manner. Taking advantage of this expected structure, we developed a computational approach called BridGE that identifies pathways connected by genetic interactions from GWAS data. Applying BridGE broadly, we discover significant interactions in Parkinson's disease, schizophrenia, hypertension, prostate cancer, breast cancer, and type 2 diabetes. Our novel approach provides a general framework for mapping complex genetic networks underlying human disease from genome-wide genotype data.
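The key idea in the abstract is that genetic interactions tend to concentrate between pairs of functional modules rather than being scattered across individual locus pairs. The toy Python sketch below is not BridGE's actual statistic. It measures that concentration as the density of interacting SNP pairs bridging two pathways, compared with the background density over all SNP pairs. The SNP identifiers and the interaction set are hypothetical, and a real analysis would also need a significance test (for example, by permutation).

```python
from itertools import combinations, product

def bridge_density(path_a, path_b, interactions):
    """Fraction of cross-pathway SNP pairs that appear in `interactions`.

    path_a, path_b: iterables of SNP identifiers (hypothetical).
    interactions: set of frozenset({snp1, snp2}) pairs judged to interact.
    """
    pairs = [(a, b) for a, b in product(path_a, path_b) if a != b]
    if not pairs:
        return 0.0
    hits = sum(frozenset((a, b)) in interactions for a, b in pairs)
    return hits / len(pairs)

def background_density(snps, interactions):
    """Fraction of all SNP pairs in `snps` that appear in `interactions`."""
    pairs = list(combinations(snps, 2))
    if not pairs:
        return 0.0
    return sum(frozenset(p) in interactions for p in pairs) / len(pairs)

interactions = {frozenset(("s1", "s3")), frozenset(("s1", "s4")), frozenset(("s2", "s3"))}
between = bridge_density(["s1", "s2"], ["s3", "s4"], interactions)
overall = background_density(["s1", "s2", "s3", "s4", "s5"], interactions)
print(between, overall)
```

Aggregating over all pairs between two pathways pools many weak pairwise signals into one test, which is the intuition behind the power gain over testing individual locus pairs.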
Subjects
Gene Regulatory Networks/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/statistics & numerical data , Models, Genetic , Breast Neoplasms/genetics , Diabetes Mellitus, Type 2/genetics , Female , Humans , Hypertension/genetics , Male , Parkinsonian Disorders/genetics , Polymorphism, Single Nucleotide/genetics , Prostatic Neoplasms/genetics , Schizophrenia/genetics
ABSTRACT
BACKGROUND: We have engaged in an international program designated the Bank On A Cure, which has established DNA banks from multiple cooperative and institutional clinical trials and a platform for examining the association of genetic variations with disease risk and outcomes in multiple myeloma. We describe the development and content of a novel custom SNP panel that contains 3,404 SNPs in 983 genes, representing cellular functions and pathways that may influence disease severity at diagnosis, toxicity, progression, or other treatment outcomes. A systematic search of national databases was used to identify non-synonymous coding SNPs and SNPs within transcriptional regulatory regions. To explore SNP associations with progression-free survival (PFS), we compared the SNP profiles of short-term (less than 1 year, n = 70) versus long-term progression-free survivors (greater than 3 years, n = 73) in two phase III clinical trials. RESULTS: Quality controls were established, demonstrating an accurate and robust screening panel for genetic variations, and initial comparisons of allelic variation across racial groups were performed. A variety of analytical approaches, including machine learning tools for data mining and recursive partitioning analyses, demonstrated the predictive value of the SNP panel for survival. While the entire SNP panel showed a predictive association of genotype with PFS, some SNP subsets were identified within drug response, cellular signaling, and cell cycle genes. CONCLUSION: A targeted gene approach was undertaken to develop a SNP panel that can test for associations with clinical outcomes in myeloma. The initial analysis provided some predictive power, suggesting that genetic variations in the myeloma patient population may influence PFS.
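The abstract reports machine learning and recursive partitioning analyses. A simpler per-SNP screen that is often run first compares allele frequencies between the two survival groups. The Python sketch below shows that swapped-in technique, not the authors' analysis: a 2x2 allelic chi-square test whose p-value is computed from the complementary error function. The allele counts are hypothetical, and all expected cell counts are assumed to be nonzero.

```python
import math

def allelic_chi_square(group1, group2):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 allele table.

    group1, group2: (count of allele A, count of allele B) in each patient group.
    Returns (chi2, p_value).
    """
    table = [list(group1), list(group2)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical allele counts: 70 short-term survivors (140 alleles) and
# 73 long-term survivors (146 alleles).
chi2, p = allelic_chi_square((60, 80), (90, 56))
print(round(chi2, 2), p)
```

Because a panel of 3,404 SNPs implies thousands of such tests, per-SNP p-values need a multiple-testing correction; a Bonferroni threshold, for example, would be 0.05 / 3404, or about 1.5 × 10^-5.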
Subjects
Genetic Predisposition to Disease , Multiple Myeloma/diagnosis , Multiple Myeloma/genetics , Polymorphism, Single Nucleotide , Databases, Genetic , Disease-Free Survival , Genomics , Humans , Open Reading Frames , Predictive Value of Tests , Promoter Regions, Genetic
ABSTRACT
OBJECTIVE: To conduct an independent secondary analysis of a multifocal intervention for early detection of sepsis that included implementation of change management strategies, electronic surveillance for sepsis, and evidence-based point-of-care alerting using the POC Advisor™ application. METHODS: Propensity score matching was used to select subsets of the cohorts with balanced covariates. Bootstrapping was performed to build distributions of the measured differences in rates and means. The effect of the sepsis intervention was evaluated for all patients and for High and Low Risk subgroups defined by illness severity. A separate analysis was performed for patients on the intervention and non-intervention units (the latter without electronic surveillance). Sensitivity, specificity, and positive predictive value (PPV) were calculated to evaluate the accuracy of the alerting system for detecting sepsis or severe sepsis/septic shock. RESULTS: There was a positive effect on the intervention units with sepsis electronic surveillance, with an adjusted mortality rate difference of -6.6%. Mortality rates on non-intervention units also improved, but to a lesser degree (-2.9%). Additional outcomes improved for patients on both intervention and non-intervention units: home discharge (7.5% vs 1.1%), total length of hospital stay (-0.9% vs -0.3%), and 30-day readmissions (-6.6% vs -1.6%). Patients on the intervention units showed better outcomes than non-intervention unit patients, even more so for High Risk patients. The electronic surveillance alerts had a sensitivity of 95.2%, a specificity of 82.0%, and a PPV of 50.6%. CONCLUSION: There was improvement over time across the hospital for patients on both the intervention and non-intervention units, with greater improvement for sicker patients.
Patients on intervention units with electronic surveillance had better outcomes; however, because of differences in exclusion criteria and unit types, further study is needed to establish a direct relationship between the electronic surveillance system and outcomes.
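The alert accuracy measures and the bootstrap described in the METHODS can each be written in a few lines. In the Python sketch below, `alert_metrics` computes sensitivity, specificity, and PPV from confusion-matrix counts, and `bootstrap_diff_ci` builds a percentile confidence interval for a difference in event rates between two already-matched cohorts. Both function names are hypothetical, the propensity score matching step is not shown, and the counts and outcome lists are made up for illustration.

```python
import random

def alert_metrics(tp, fp, fn, tn):
    """Accuracy measures for an alerting system from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true sepsis cases that were alerted
        "specificity": tn / (tn + fp),  # share of non-sepsis patients without an alert
        "ppv": tp / (tp + fp),          # share of alerts that were true sepsis cases
    }

def bootstrap_diff_ci(outcomes_a, outcomes_b, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for rate(outcomes_a) - rate(outcomes_b).

    outcomes_a, outcomes_b: lists of 0/1 outcomes (e.g., in-hospital death)
    for two matched cohorts; each cohort is resampled with replacement.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        a = rng.choices(outcomes_a, k=len(outcomes_a))
        b = rng.choices(outcomes_b, k=len(outcomes_b))
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    diffs.sort()
    lo_idx = int(alpha / 2 * reps)
    return diffs[lo_idx], diffs[reps - lo_idx - 1]

print(alert_metrics(tp=90, fp=30, fn=10, tn=170))
intervention = [1] * 10 + [0] * 90  # hypothetical: 10% mortality
control = [1] * 20 + [0] * 80       # hypothetical: 20% mortality
print(bootstrap_diff_ci(intervention, control))
```

Unlike sensitivity and specificity, PPV depends on how common sepsis is among the monitored patients, which is one reason an alert with high sensitivity can still have a modest PPV such as the 50.6% reported for this system.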