ABSTRACT
BACKGROUND: Fundamentally defined by an imbalance between energy consumption and energy expenditure, obesity is a significant risk factor for several musculoskeletal conditions, including osteoarthritis (OA). High-fat diets and a sedentary lifestyle lead to increased adiposity, resulting in systemic inflammation because adipose tissue has endocrine properties and produces inflammatory cytokines and adipokines. We previously showed that serum levels of specific adipokines are associated with biomarkers of bone remodelling and cartilage volume loss in knee OA patients. More recently, we found that the metabolic consequence of obesity drives the enrichment of pro-inflammatory fibroblast subsets within joint synovial tissues in obese individuals compared with those of BMI-defined 'healthy weight'. The present study therefore identifies obesity-associated genes in OA joint tissues that are conserved across species and conditions. METHODS: The study utilised 6 publicly available bulk and single-cell transcriptomic datasets from human and mouse studies, downloaded from the Gene Expression Omnibus (GEO). Machine learning models were employed to model and statistically test datasets for conserved gene expression profiles. Identified genes were validated in OA tissues from obese and healthy-weight individuals using quantitative PCR (N = 38). Obese and healthy-weight patients were categorised by BMI > 30 and BMI between 18 and 24.9, respectively. Informed consent was obtained from all study participants, who were scheduled to undergo elective arthroplasty. RESULTS: Principal component analysis (PCA) was used to investigate the variation between classes of mouse and human data, and confirmed variation between obese and healthy populations. Differential gene expression analysis, filtered on adjusted p-values of p < 0.05, identified differentially expressed genes (DEGs) in mouse and human datasets. DEGs were analysed further using area under the curve (AUC), which identified 12 genes. Pathway enrichment analysis suggested these genes were involved in the biosynthesis and elongation of fatty acids and in the transport, oxidation, and catabolic processing of lipids. qPCR validation found that the majority of genes showed a tendency to be upregulated in joint tissues from obese participants. Three validated genes, IGFBP2 (p = 0.0363), DOK6 (p = 0.0451) and CASP1 (p = 0.0412), were significantly different in obese joint tissues compared with lean-weight joint tissues. CONCLUSIONS: The present study employed machine learning models across several published obesity datasets to identify obesity-associated genes, which were validated in joint tissues from OA patients. These results suggest obesity-associated genes are conserved across conditions and may be fundamental in accelerating disease in obese individuals. Whilst further validations and additional conditions remain to be tested in this model, identifying obesity-associated genes in this way may serve as a global aid for patient stratification, giving rise to the potential for targeted therapeutic interventions in such patient subpopulations.
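As a rough illustration of how an AUC can rank a single gene's ability to separate obese from healthy-weight samples, the sketch below computes the AUC as the probability that a randomly chosen obese sample has higher expression than a randomly chosen healthy-weight sample. The function name and the expression values are hypothetical, and this is not necessarily the procedure used in the study.

```python
def auc_from_groups(positives, negatives):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive value is larger; ties count as half a win."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical normalised expression values for one gene (illustrative only).
obese = [5.1, 6.3, 5.8, 7.0]
healthy = [4.2, 5.0, 5.8, 4.9]

auc = auc_from_groups(obese, healthy)
print(f"AUC for this gene: {auc:.3f}")
```

An AUC near 0.5 would indicate no separation between the groups, while values near 0 or 1 indicate strong separation in one direction or the other.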
Subjects
Obesity , Transcriptome , Humans , Obesity/genetics , Obesity/complications , Obesity/metabolism , Animals , Mice , Transcriptome/genetics , Species Specificity , Gene Expression Profiling , Principal Component Analysis , Machine Learning , Gene Expression Regulation , Male , Female
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pone.0263390.].
ABSTRACT
BACKGROUND: Numerous approaches have been proposed for the detection of epistatic interactions within GWAS datasets in order to better understand the drivers of disease and genetics. METHODS: A selection of state-of-the-art approaches was assessed. These included the statistical tests fast-epistasis, BOOST, logistic regression and wtest; the swarm intelligence methods AntEpiSeeker, epiACO and CINOEDV; and the data mining approaches MDR, GSS, SNPRuler and MPI3SNP. Data were simulated to provide randomly generated models with no individual main effects at different heritabilities (pure epistasis) as well as models based on penetrance tables with some main effects (impure epistasis). Detection of both two- and three-locus interactions was assessed across a total of 1,560 simulated datasets. The different methods were also applied to a section of the UK Biobank cohort for atrial fibrillation. RESULTS: For pure two-locus interactions, PLINK's implementation of BOOST recovered the highest number of correct interactions (53.9%), performing significantly better than the other methods (p = 4.52e-36). For impure two-locus interactions, MDR exhibited the best performance, recovering 62.2% of the most significant impure epistatic interactions (p = 6.31e-90 for all but one test). The assessment of three-locus interaction prediction revealed that wtest recovered the highest number (17.2%) of pure epistatic interactions (p = 8.49e-14). wtest also recovered the highest number of three-locus impure epistatic interactions (p = 6.76e-48), while AntEpiSeeker ranked the highest number of such interactions as the most significant (40.5%). Finally, the methods were applied to a real dataset for atrial fibrillation, most notably finding an interaction between SYNE2 and DTNB.
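To make the notion of "pure epistasis" concrete, the following sketch checks that a two-locus penetrance table has no individual main effects, meaning the marginal penetrance is identical for every genotype at each locus. The table, the allele frequency of 0.5 and the function name are illustrative assumptions and are not taken from the simulated datasets described above.

```python
# Genotype frequencies at each locus under Hardy-Weinberg equilibrium
# with an assumed allele frequency of 0.5: (AA, Aa, aa).
geno_freq = [0.25, 0.5, 0.25]

# Hypothetical penetrance table: rows are genotypes at locus A,
# columns are genotypes at locus B; entries are P(disease | genotypes).
penetrance = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]


def marginal_penetrance(table, freq, locus):
    """Average the penetrance over the other locus, weighted by its genotype frequencies."""
    if locus == "A":
        return [sum(f * p for f, p in zip(freq, row)) for row in table]
    return [
        sum(freq[i] * table[i][j] for i in range(len(table)))
        for j in range(len(table[0]))
    ]


marg_a = marginal_penetrance(penetrance, geno_freq, "A")
marg_b = marginal_penetrance(penetrance, geno_freq, "B")

# With no main effects, every genotype at a locus carries the same marginal risk.
is_pure = all(max(m) - min(m) < 1e-12 for m in (marg_a, marg_b))
print(marg_a, marg_b, is_pure)
```

Neither locus shows any association with disease when examined alone, yet the joint genotype determines risk, which is why single-marker tests cannot detect such models.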
Subjects
Atrial Fibrillation/genetics , Epistasis, Genetic , Genetic Loci , Models, Genetic , Penetrance , Algorithms , Alleles , Data Mining/methods , Dystrophin-Associated Proteins/genetics , Gene Frequency , Genome-Wide Association Study/methods , Genotype , Humans , Linear Models , Microfilament Proteins/genetics , Multifactor Dimensionality Reduction , Nerve Tissue Proteins/genetics , Neuropeptides/genetics , Polymorphism, Single Nucleotide , ROC Curve
ABSTRACT
Observational and experimental evidence has linked chronotype to both psychological and cardiometabolic traits. Recent Mendelian randomization (MR) studies have investigated direct links between chronotype and several of these traits, often in isolation from potential mediating or moderating traits. We mined the EpiGraphDB MR database for calculated chronotype-trait associations (p-value < 5 × 10⁻⁸). We then re-analyzed those relevant to metabolic or mental health and investigated them for statistical evidence of horizontal pleiotropy. Analyses passing multiple testing correction were then investigated for confounders, colliders, intermediates, and reverse intermediates using the EpiGraphDB database, creating multiple chronotype-trait interactions among each of the traits studied. We revealed 10 significant chronotype-exposure associations (false discovery rate < 0.05) exposed to 111 potential previously known confounders, 52 intermediates, 18 reverse intermediates, and 31 colliders. Chronotype-lipid causal associations collided with treatment and diabetes effects; chronotype-bipolar associations were mediated by breast cancer; and chronotype-alcohol intake associations were impacted by confounders and intermediate variables including known zeitgebers and molecular traits. We have reported the influence of chronotype on several cardiometabolic and behavioural traits, and identified potential confounding variables not reported on in previous studies while discovering new associations to drugs and disease.
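The abstract does not state which false discovery rate procedure was used. As one common possibility, the sketch below applies the Benjamini-Hochberg adjustment to a list of p-values and flags those with an adjusted value below 0.05. The p-values and the function name are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five association tests (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
print(q_values, significant)
```

In this example the third and fourth tests have raw p-values below 0.05 but do not survive the adjustment, which is the behaviour a multiple testing correction is meant to provide.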
Subjects
Bipolar Disorder/genetics , Circadian Rhythm/genetics , Phenotype , Alcohol Drinking , Alcohols , Databases, Genetic , Humans , Mendelian Randomization Analysis , Workflow
ABSTRACT
Multimorbidity, frequently associated with aging, can be operationally defined as the presence of two or more chronic conditions. Predicting the likelihood that a patient with multimorbidity will develop a further particular disease in the future is one of the key challenges in multimorbidity research. In this paper we use a network-based approach to analyze multimorbidity data and develop methods for predicting diseases that a patient is likely to develop. The multimorbidity data are represented using a temporal bipartite network whose nodes represent patients and diseases, and a link between these nodes indicates that the patient has been diagnosed with the disease. Disease prediction is then reduced to the problem of predicting those missing links in the network that are likely to appear in the future. We develop a novel link prediction method for static bipartite networks and validate its performance on benchmark datasets. Using a probabilistic framework, we then report on the development of a method for predicting future links in the network, where links are labelled with a time-stamp. We apply the proposed method to three different multimorbidity datasets and report its performance measured by different metrics, including AUC, Precision, Recall, and F-Score.
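The reduction of disease prediction to link prediction in a patient-disease bipartite network can be illustrated with a simple neighbourhood-overlap baseline. This is a swapped-in heuristic for illustration only, not the novel method the abstract refers to; the patient records, disease labels and function name are hypothetical.

```python
# Hypothetical static bipartite network: patient -> set of diagnosed diseases.
records = {
    "p1": {"hypertension", "diabetes"},
    "p2": {"hypertension", "diabetes", "ckd"},
    "p3": {"asthma"},
    "p4": {"diabetes", "ckd"},
}


def score_missing_links(records, patient):
    """Score each disease the patient does not yet have by summing, over other
    patients who have that disease, the number of diagnoses they share with
    the target patient. Returns (disease, score) pairs, best first."""
    own = records[patient]
    all_diseases = set().union(*records.values())
    scores = {}
    for disease in all_diseases - own:
        scores[disease] = sum(
            len(own & other_diseases)
            for other, other_diseases in records.items()
            if other != patient and disease in other_diseases
        )
    # Sort by descending score, then by name so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranked = score_missing_links(records, "p1")
print(ranked)
```

Here the highest-scoring missing link for patient p1 is the disease most often seen in patients with a similar diagnosis profile. A temporal method would additionally use the time-stamps on links, which this static sketch ignores.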
Subjects
Chronic Disease/trends , Multimorbidity/trends , Forecasting/methods , Humans , Probability
ABSTRACT
In this chapter we discuss the past, present and future of clinical biomarker development. We explore the advent of new technologies that are paving the way in which health, medicine and disease are understood. This review includes the identification of physicochemical assays, current regulations, the development and reproducibility of clinical trials, as well as the revolution of omics technologies and state-of-the-art integration and analysis approaches.
Subjects
Precision Medicine , Artificial Intelligence , Biomarkers/analysis , Humans
ABSTRACT
Recent advances in emergency medicine and the co-ordinated delivery of trauma care mean more critically injured patients now reach the hospital alive and survive life-saving operations. Indeed, between 2008 and 2017, the odds of surviving a major traumatic injury in the UK increased by nineteen percent. However, the improved survival rates of severely injured patients have placed an increased burden on the healthcare system, with major trauma a common cause of intensive care unit (ICU) admissions that last ≥10 days. Improved understanding of the factors influencing patient outcomes is now urgently needed. We investigated the serum metabolomic profile of fifty-five major trauma patients across three post-injury phases: acute (days 0-4), intermediate (days 5-14) and late (days 15-112). Using ICU length of stay (LOS) as a clinical outcome, we aimed to determine whether the serum metabolome measured at days 0-4 post-injury for patients with an extended (≥10 days) ICU LOS differed from that of patients with a short (<10 days) ICU LOS. In addition, we investigated whether combining metabolomic profiles with clinical scoring systems would generate a variable that would identify patients with an extended ICU LOS with a greater degree of accuracy than models built on either variable alone. The number of metabolites unique to and shared across each time segment varied across the acute, intermediate and late segments. A one-way ANOVA revealed that the greatest variation in metabolite levels across the different time-points was for lactate, glucose, anserine and 3-hydroxybutyrate. A total of eleven features were selected to differentiate between <10 days ICU LOS and ≥10 days ICU LOS. New Injury Severity Score (NISS), testosterone, and the metabolites cadaverine, urea, isoleucine, acetoacetate, dimethyl sulfone, syringate, creatinine, xylitol, and acetone form the integrated biomarker set. Using metabolic enrichment analysis, we found that valine, leucine and isoleucine biosynthesis, glutathione metabolism, and glycine, serine and threonine metabolism were the top three pathways differentiating ICU LOS, each with p < 0.05. A combined model of NISS, testosterone and all nine selected metabolites achieved an AUROC of 0.824. In the acute post-injury setting, differences exist in the serum metabolome of major trauma patients who subsequently experience a short or prolonged ICU LOS. Combining metabolomic data with anatomical scoring systems allowed us to discriminate between these two groups with a greater degree of accuracy than that of either variable alone.
ABSTRACT
Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important undertaking for gaining a better understanding of gene regulation. Key challenges include working with noisy data and dealing with a higher number of genes than samples. Although a number of different methods have been proposed to infer the structure of a GRN, there are large discrepancies among the inference algorithms they adopt, which makes meaningful comparison difficult. In this study, we used two methods, namely MIDER (Mutual Information Distance and Entropy Reduction) and PLSNET (partial least squares based feature selection), to infer the structure of a GRN directly from data, and computationally validated our results. Both methods were applied to different gene expression datasets from inflammatory bowel disease (IBD), pancreatic ductal adenocarcinoma (PDAC), and acute myeloid leukaemia (AML) studies. For each case, gene regulators were successfully identified. For example, in the IBD dataset the UGT1A family genes were identified as key regulators, while in the PDAC dataset the SULF1 and THBS2 genes were identified. We further demonstrate that an ensemble-based approach that combines the output of the MIDER and PLSNET algorithms can infer the structure of a GRN from data with higher accuracy. We have also estimated the number of samples required for potential future validation studies. Here, we present our proposed analysis framework, which provides not only predictions of candidate regulator genes for potential validation experiments but also an estimate of the number of samples required for these experiments.
Subjects
Carcinoma, Pancreatic Ductal/genetics , Computational Biology/methods , Gene Regulatory Networks , Inflammatory Bowel Diseases/genetics , Leukemia, Myeloid, Acute/genetics , Pancreatic Neoplasms/genetics , Algorithms , Gene Expression Profiling/methods , Gene Expression Regulation , Genetic Markers , Glucuronosyltransferase/genetics , Humans , Least-Squares Analysis , Sulfotransferases/genetics , Thrombospondins/genetics
ABSTRACT
Genome-wide data are used to stratify patients into classes for precision medicine using clustering algorithms. A common problem in this area is selecting the number of clusters (K). The Monti consensus clustering algorithm is a widely used method that uses stability selection to estimate K. However, the method is biased towards higher values of K and yields high numbers of false positives. As a solution, we developed Monte Carlo reference-based consensus clustering (M3C), which is based on this algorithm. M3C simulates null distributions of stability scores for a range of K values, thus enabling a comparison with real data to remove bias and statistically test for the presence of structure. M3C corrects the inherent bias of consensus clustering, as demonstrated on simulated and real expression data from The Cancer Genome Atlas (TCGA). For testing M3C, we developed clusterlab, a new method for simulating multivariate Gaussian clusters.
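The idea of comparing an observed stability score against a simulated null distribution can be sketched with a generic Monte Carlo p-value. This is a simplified illustration, not the M3C implementation: the score values are hypothetical, the null scores are drawn from a Gaussian purely as a stand-in for scores computed on reference data with no cluster structure, and higher scores are assumed to mean greater stability.

```python
import random


def monte_carlo_p_value(observed, null_scores):
    """Empirical p-value: the share of null scores at least as large as the
    observed score, with +1 in numerator and denominator so it is never zero."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Stand-in null distribution for one value of K (assumption: in practice these
# would come from clustering simulated data that contains no real clusters).
rng = random.Random(0)
null_scores = [rng.gauss(0.5, 0.05) for _ in range(999)]

observed_score = 0.9  # hypothetical stability score from the real data
p_value = monte_carlo_p_value(observed_score, null_scores)
print(f"Monte Carlo p-value: {p_value:.3f}")
```

Repeating this for each candidate K, each with its own null distribution, is what allows the comparison to account for the tendency of stability scores to favour larger K.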