ABSTRACT
BACKGROUND: While our understanding of the single-cell gene expression patterns underlying the transformation of vascular cell types during the progression of atherosclerosis is rapidly improving, the clinical and pathophysiological relevance of these changes remains poorly understood. METHODS: Single-cell RNA sequencing data generated with SmartSeq2 (≈8000 genes/cell) in 16 588 single cells isolated during atherosclerosis progression in Ldlr-/-Apob100/100 mice with human-like plasma lipoproteins and from humans with asymptomatic and symptomatic carotid plaques were clustered into multiple subtypes. For clinical and pathophysiological context, the advanced-stage and symptomatic subtype clusters were integrated with 135 tissue-specific (atherosclerotic aortic wall, mammary artery, liver, skeletal muscle, and visceral and subcutaneous fat) gene-regulatory networks (GRNs) inferred from 600 coronary artery disease patients in the STARNET (Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task) study. RESULTS: Advanced stages of atherosclerosis progression and symptomatic carotid plaques were largely characterized by 3 smooth muscle cell (SMC) and 3 macrophage subtype clusters with extracellular matrix organization/osteogenic (SMC) and M1-type proinflammatory/Trem2-high lipid-associated (macrophage) phenotypes. Integrative analysis of these 6 clusters with STARNET revealed significant enrichments of 3 arterial wall GRNs: GRN33 (macrophage), GRN39 (SMC), and GRN122 (macrophage), with major contributions to coronary artery disease heritability and strong associations with clinical scores of coronary atherosclerosis severity. The presence and pathophysiological relevance of GRN39 were verified in 5 independent RNAseq data sets obtained from human coronary and aortic arteries and primary SMCs, and by targeting its top key drivers, FRZB and ALCAM, in cultured human coronary artery SMCs.
CONCLUSIONS: By identifying and integrating the most gene-rich single-cell subclusters of atherosclerosis to date with a coronary artery disease framework of GRNs, GRN39 was identified and independently validated as being critical for the transformation of contractile SMCs into an osteogenic phenotype promoting advanced, symptomatic atherosclerosis.
Subject(s)
Atherosclerosis , Gene Regulatory Networks , Single-Cell Analysis , Humans , Animals , Atherosclerosis/genetics , Atherosclerosis/metabolism , Atherosclerosis/pathology , Mice , Myocytes, Smooth Muscle/metabolism , Myocytes, Smooth Muscle/pathology , Male , Plaque, Atherosclerotic , Disease Progression , Female , Macrophages/metabolism , Macrophages/pathology , Mice, Knockout , Receptors, LDL/genetics , Receptors, LDL/metabolism , Mice, Inbred C57BL , Muscle, Smooth, Vascular/metabolism , Muscle, Smooth, Vascular/pathology
ABSTRACT
BACKGROUND: Molecular interaction networks summarize complex biological processes as graphs, whose structure is informative of biological function at multiple scales. Simultaneously, omics technologies measure the variation or activity of genes, proteins, or metabolites across individuals or experimental conditions. Integrating the complementary viewpoints of biological networks and omics data is an important task in bioinformatics, but existing methods treat networks as discrete structures, which are intrinsically difficult to integrate with continuous node features or activity measures. Graph neural networks map graph nodes into a low-dimensional vector space representation, and can be trained to preserve both the local graph structure and the similarity between node features. RESULTS: We studied the representation of transcriptional, protein-protein and genetic interaction networks in E. coli and mouse using graph neural networks. We found that such representations explain a large proportion of variation in gene expression data, and that using gene expression data as node features improves the reconstruction of the graph from the embedding. We further proposed a new end-to-end Graph Feature Auto-Encoder framework for the prediction of node features utilizing the structure of the gene networks, which is trained on the feature prediction task, and showed that it performs better at predicting unobserved node features than regular MultiLayer Perceptrons. When applied to the problem of imputing missing data in single-cell RNAseq data, the Graph Feature Auto-Encoder utilizing our new graph convolution layer called FeatGraphConv outperformed a state-of-the-art imputation method that does not use protein interaction information, showing the benefit of integrating biological networks and omics data with our proposed approach. 
CONCLUSION: Our proposed Graph Feature Auto-Encoder framework is a powerful approach for integrating and exploiting the close relation between molecular interaction networks and functional genomics data.
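The abstract does not define the FeatGraphConv layer, so the following is only a minimal sketch of the general idea it builds on, assuming a standard GCN-style layer (self-loops plus symmetric degree normalization) rather than the authors' layer. Genes are embedded using both network structure and expression features, and a simple inner-product decoder reconstructs edge probabilities from the embedding. The toy graph, the features, and the untrained random weights are hypothetical.

```python
import numpy as np

def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """Add self-loops and symmetrically normalize: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm: np.ndarray, X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One propagation step: average neighbour features, project with W, apply ReLU."""
    return np.maximum(A_norm @ X @ W, 0.0)

def inner_product_decoder(Z: np.ndarray) -> np.ndarray:
    """Reconstruct edge probabilities from node embeddings as sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

# Hypothetical toy network: 4 genes on a path 0-1-2-3, 3 expression features per gene.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))   # hypothetical expression features (node features)
W = rng.normal(size=(3, 2))   # random, untrained weights; training is omitted here
Z = gcn_layer(normalized_adjacency(A), X, W)   # 2-dimensional gene embeddings
P = inner_product_decoder(Z)                   # reconstructed edge probabilities
```

In a trained model, W would be fitted so that P matches the observed adjacency (graph reconstruction) or so that a decoder predicts held-out node features (the feature prediction task described above).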
Subject(s)
Escherichia coli , Neural Networks, Computer , Animals , Computational Biology , Gene Regulatory Networks , Mice , Proteins
ABSTRACT
MOTIVATION: Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost, because they either require all tissues to be analyzed independently, ignoring dependencies and similarities between tissues, or require tissues to be merged into a single, monolithic dataset, ignoring individual characteristics of tissues. RESULTS: We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity, and which results in a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions compared to alternative approaches. We further demonstrate that revamp results in easily interpretable multi-tissue gene expression associations to key coronary artery disease processes and clinical phenotypes in the STAGE individuals. AVAILABILITY AND IMPLEMENTATION: Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Algorithms , Software , Bayes Theorem , Cluster Analysis , Gene Expression Profiling , Humans
ABSTRACT
The stress hormone cortisol modulates fuel metabolism, cardiovascular homoeostasis, mood, inflammation and cognition. The CORtisol NETwork (CORNET) consortium previously identified a single locus associated with morning plasma cortisol. Identifying additional genetic variants that explain more of the variance in cortisol could provide new insights into cortisol biology and provide statistical power to test the causative role of cortisol in common diseases. The CORNET consortium extended its genome-wide association meta-analysis for morning plasma cortisol from 12,597 to 25,314 subjects and from ~2.2 M to ~7 M SNPs, in 17 population-based cohorts of European ancestries. We confirmed the genetic association with SERPINA6/SERPINA1. This locus contains genes encoding corticosteroid binding globulin (CBG) and α1-antitrypsin. Expression quantitative trait loci (eQTL) analyses undertaken in the STARNET cohort of 600 individuals showed that specific genetic variants within the SERPINA6/SERPINA1 locus influence expression of SERPINA6 rather than SERPINA1 in the liver. Moreover, trans-eQTL analysis demonstrated effects on adipose tissue gene expression, suggesting that variations in CBG levels have an effect on delivery of cortisol to peripheral tissues. Two-sample Mendelian randomisation analyses provided evidence that each genetically-determined standard deviation (SD) increase in morning plasma cortisol was associated with increased odds of chronic ischaemic heart disease (0.32, 95% CI 0.06-0.59) and myocardial infarction (0.21, 95% CI 0.00-0.43) in UK Biobank and similarly in CARDIoGRAMplusC4D. These findings reveal a causative pathway for CBG in determining cortisol action in peripheral tissues and thereby contributing to the aetiology of cardiovascular disease.
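The abstract does not state which two-sample Mendelian randomisation estimator was used. As an illustrative sketch only, the standard fixed-effect inverse-variance weighted (IVW) estimator combines per-SNP Wald ratios from summary statistics, assuming independent instruments (no linkage disequilibrium) and first-order weights. The function name and the input numbers are hypothetical.

```python
import math

def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_exp: SNP-exposure effects (e.g. per-allele effect on morning cortisol)
    beta_out: SNP-outcome effects (e.g. log odds of disease)
    se_out:   standard errors of beta_out
    Assumes independent SNPs and weights 1 / se_out**2 (first-order weights).
    """
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_exp, se_out))
    beta = num / den
    se = math.sqrt(1.0 / den)
    return beta, (beta - 1.96 * se, beta + 1.96 * se)

# Hypothetical inputs: both SNPs imply the same Wald ratio (0.04/0.2 = 0.02/0.1 = 0.2),
# so the IVW estimate is 0.2 on the outcome scale per unit of exposure.
beta, ci = ivw_estimate([0.2, 0.1], [0.04, 0.02], [0.01, 0.01])
```

When the outcome is binary and beta_out is on the log-odds scale, beta is a log odds ratio per unit of exposure; exponentiating it gives an odds ratio.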
Subject(s)
Cardiovascular Diseases/genetics , Myocardial Infarction/genetics , Transcortin/genetics , alpha 1-Antitrypsin/genetics , Adrenal Cortex Hormones/blood , Adult , Biological Specimen Banks , Cardiovascular Diseases/blood , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/pathology , Female , Gene Expression Regulation , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Liver/metabolism , Liver/pathology , Male , Mendelian Randomization Analysis , Middle Aged , Myocardial Infarction/blood , Myocardial Infarction/pathology , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , United Kingdom
ABSTRACT
OBJECTIVE: The male-specific region of the Y chromosome (MSY) remains one of the most unexplored regions of the genome. We sought to examine how the genetic variants of the MSY influence male susceptibility to coronary artery disease (CAD) and atherosclerosis. APPROACH AND RESULTS: Analysis of 129 133 men from UK Biobank revealed that only one of 7 common MSY haplogroups (haplogroup I1) was associated with CAD: carriers of haplogroup I1 had an ≈11% increase in risk of CAD when compared with all other haplogroups combined (odds ratio, 1.11; 95% CI, 1.04-1.18; P=6.8×10⁻⁴). Targeted MSY sequencing uncovered 235 variants exclusive to this haplogroup. The haplogroup I1-specific variants showed 2.45- and 1.56-fold respective enrichment for promoter and enhancer chromatin states, in cells/tissues relevant to atherosclerosis, when compared with other MSY variants. Gene set enrichment analysis in CAD-relevant tissues showed that haplogroup I1 was associated with changes in pathways responsible for early and late stages of atherosclerosis development including defence against pathogens, immunity, oxidative phosphorylation, mitochondrial respiration, lipids, coagulation, and extracellular matrix remodeling. UTY was the only Y chromosome gene whose blood expression was associated with haplogroup I1. Experimental reduction of UTY expression in macrophages led to changes in expression of 59 pathways (28 of which overlapped with those associated with haplogroup I1) and a significant reduction in the immune costimulatory signal. CONCLUSIONS: Haplogroup I1 is enriched for regulatory chromatin variants in numerous cells of relevance to CAD and increases cardiovascular risk through proatherosclerotic reprogramming of the transcriptome, partly through UTY.
Subject(s)
Chromosomes, Human, Y , Coronary Artery Disease/genetics , Genetic Pleiotropy , Genetic Predisposition to Disease , Gene Expression , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Macrophages/metabolism , Male , Minor Histocompatibility Antigens/genetics , Nuclear Proteins/genetics , Phylogeny , Risk Factors , THP-1 Cells
ABSTRACT
Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis makes it possible to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr.
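The abstract does not spell out Findr's tests, which also account for hidden confounders and use cis-eQTL anchors. As a minimal sketch of one generic building block for likelihood-based tests between two expression traits, the maximized Gaussian log-likelihood ratio for "b depends linearly on a" versus "b is independent of a" reduces to a function of the Pearson correlation r: LLR = -(n/2) log(1 - r²). The function name and the simulated data are hypothetical; this is not Findr's full test.

```python
import numpy as np

def gaussian_llr(a: np.ndarray, b: np.ndarray) -> float:
    """Maximized log-likelihood ratio of a linear-Gaussian model b ~ a (with intercept)
    against the null model b ~ 1, computed from the Pearson correlation r.

    Uses RSS_alt / RSS_null = 1 - r^2, hence LLR = -(n/2) * log(1 - r^2).
    """
    n = len(a)
    r = np.corrcoef(a, b)[0, 1]
    return float(-0.5 * n * np.log1p(-r**2))

# Hypothetical traits: b is partly driven by a.
rng = np.random.default_rng(2)
a = rng.normal(size=50)
b = 0.5 * a + rng.normal(size=50)
llr = gaussian_llr(a, b)
```

Because only correlations are needed, the same quantity can be computed for all trait pairs at once from a correlation matrix, which is one reason correlation-based statistics scale well to genome-wide data.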
Subject(s)
Chromosome Mapping , High-Throughput Nucleotide Sequencing , Transcriptome/genetics , Algorithms , Chromosome Mapping/methods , Chromosome Mapping/standards , Databases, Genetic , Genetic Variation , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Models, Statistical
ABSTRACT
Transcription control plays a crucial role in establishing a unique gene expression signature for each of the hundreds of mammalian cell types. Though gene expression data have been widely used to infer cellular regulatory networks, existing methods mainly infer correlations rather than causality. We developed statistical models and likelihood-ratio tests to infer causal gene regulatory networks using enhancer RNA (eRNA) expression information as a causal anchor and applied the framework to eRNA and transcript expression data from the FANTOM Consortium. Predicted causal targets of transcription factors (TFs) in mouse embryonic stem cells, macrophages and erythroblastic leukaemia overlapped significantly with experimentally-validated targets from ChIP-seq and perturbation data. We further improved the model by taking into account that some TFs might act in a quantitative, dosage-dependent manner, whereas others might act predominantly in a binary on/off fashion. We predicted TF targets from concerted variation of eRNA and TF and target promoter expression levels within a single cell type, as well as across multiple cell types. Importantly, TFs with high-confidence predictions were largely different between these two analyses, demonstrating that variability within a cell type is highly relevant for target prediction of cell type-specific factors. Finally, we generated a compendium of high-confidence TF targets across diverse human cell and tissue types.
Subject(s)
Enhancer Elements, Genetic/genetics , Gene Regulatory Networks/genetics , Animals , Databases, Genetic , Embryonic Stem Cells/metabolism , Gene Expression Regulation , Humans , Mice , Models, Genetic , Phylogeny , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reproducibility of Results
ABSTRACT
Transcriptional silencing is a major cause of the inactivation of tumor suppressor genes; however, the underlying mechanisms are only poorly understood. The EPHB2 gene encodes a receptor tyrosine kinase that controls epithelial cell migration and allocation in intestinal crypts. Through its ability to restrict cell spreading, EPHB2 functions as a tumor suppressor in colorectal cancer whose expression is frequently lost as tumors progress to the carcinoma stage. Previously we reported that EPHB2 expression depends on a transcriptional enhancer whose activity is diminished in EPHB2 non-expressing cells. Here we investigated the mechanisms that lead to EPHB2 enhancer inactivation. We show that expression of EPHB2 and SNAIL1, an inducer of epithelial-mesenchymal transition (EMT), is anti-correlated in colorectal cancer cell lines and tumors. In a cellular model of Snail1-induced EMT, we observe that features of active chromatin at the EPHB2 enhancer are diminished upon expression of murine Snail1. We identify the transcription factors FOXA1, MYB, CDX2 and TCF7L2 as EPHB2 enhancer factors and demonstrate that Snail1 indirectly inactivates the EPHB2 enhancer by downregulation of FOXA1 and MYB. In addition, Snail1 induces the expression of Lymphoid enhancer factor 1 (LEF1), which competitively displaces TCF7L2 from the EPHB2 enhancer. In contrast to TCF7L2, however, LEF1 appears to repress the EPHB2 enhancer. Our findings underscore the importance of transcriptional enhancers for gene regulation under physiological and pathological conditions and show that SNAIL1 employs a combinatorial mechanism to inactivate the EPHB2 enhancer based on activator deprivation and competitive displacement of transcription factors.
Subject(s)
Down-Regulation , Enhancer Elements, Genetic , Epithelial-Mesenchymal Transition/genetics , Gene Silencing , Receptor, EphB2/genetics , Snail Family Transcription Factors/physiology , Trans-Activators/metabolism , Transcription Factor 7-Like 2 Protein/metabolism , Cell Line , Chromatin/metabolism , Humans
ABSTRACT
OBJECTIVE: The genetically modified mouse is the most commonly used animal model for studying the pathogenesis of atherosclerotic disease. We aimed to assess whether mouse atherosclerosis-related genes could be validated in human disease through examination of results from genome-wide association studies. APPROACH AND RESULTS: We performed a systematic review to identify atherosclerosis-causing genes in mice and carried out gene-based association tests of their human orthologs for an association with human coronary artery disease and human large artery ischemic stroke. Moreover, we investigated the association of these genes with human atherosclerotic plaque characteristics. In addition, we assessed the presence of tissue-specific cis-acting expression quantitative trait loci for these genes in humans. Finally, using pathway analyses, we show that the putative atherosclerosis-causing genes revealed few associations with human coronary artery disease, large artery ischemic stroke, or atherosclerotic plaque characteristics, despite the fact that the majority of these genes have cis-acting expression quantitative trait loci. CONCLUSIONS: The role in atherosclerotic lesion development observed in mice for these genes could scarcely be confirmed by studying associations of disease development with common human genetic variants. The value of murine atherosclerotic models for selection of therapeutic targets in human disease remains unclear.
Subject(s)
Coronary Artery Disease/genetics , Gene Expression Profiling , Intracranial Arteriosclerosis/genetics , Polymorphism, Single Nucleotide , Stroke/genetics , Animals , Computational Biology , Coronary Artery Disease/pathology , Databases, Genetic , Disease Models, Animal , Gene Expression Profiling/methods , Gene Expression Regulation , Gene Regulatory Networks , Genetic Markers , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Intracranial Arteriosclerosis/pathology , Mice , Phenotype , Plaque, Atherosclerotic , Quantitative Trait Loci , Reproducibility of Results , Risk Assessment , Risk Factors , Species Specificity , Stroke/pathology
ABSTRACT
The protein tyrosine kinase Ephrin type-B receptor 3 (EPHB3) counteracts tumor-cell dissemination by regulating intercellular adhesion and repulsion and acts as a tumor/invasion suppressor in colorectal cancer. This protective mechanism frequently collapses at the adenoma-carcinoma transition due to EPHB3 transcriptional silencing. Here, we identify a transcriptional enhancer at the EPHB3 gene that integrates input from the intestinal stem-cell regulator achaete-scute family basic helix-loop-helix transcription factor 2 (ASCL2), Wnt/β-catenin, MAP kinase, and Notch signaling. EPHB3 enhancer activity is highly variable in colorectal carcinoma cells and precisely reflects EPHB3 expression states, suggesting that enhancer dysfunction underlies EPHB3 silencing. Interestingly, low Notch activity parallels reduced EPHB3 expression in colorectal carcinoma cell lines and poorly differentiated tumor-tissue specimens. Restoring Notch activity reestablished enhancer function and EPHB3 expression. Although essential for intestinal stem-cell maintenance and adenoma formation, Notch activity seems dispensable in colorectal carcinomas. Notch activation even promoted growth arrest and apoptosis of colorectal carcinoma cells, attenuated their self-renewal capacity in vitro, and blocked tumor growth in vivo. Higher levels of Notch activity also correlated with longer disease-free survival of colorectal cancer patients. In summary, our results uncover enhancer decommissioning as a mechanism for transcriptional silencing of the EPHB3 tumor suppressor and argue for an antitumorigenic function of Notch signaling in advanced colorectal cancer.
Subject(s)
Colorectal Neoplasms/genetics , Enhancer Elements, Genetic/genetics , Gene Silencing , Receptor, EphB3/genetics , Transcription, Genetic , Animals , Apoptosis/genetics , Basic Helix-Loop-Helix Transcription Factors/metabolism , Cell Cycle Checkpoints/genetics , Cell Differentiation/genetics , Colorectal Neoplasms/enzymology , Colorectal Neoplasms/pathology , Gene Expression Regulation, Neoplastic , HT29 Cells , Humans , MAP Kinase Signaling System/genetics , Mice , Neoplastic Stem Cells/metabolism , Neoplastic Stem Cells/pathology , Receptor, EphB3/metabolism , Receptors, Notch/metabolism , Signal Transduction/genetics , Wnt Proteins/metabolism , beta Catenin/metabolism
ABSTRACT
Plasma cholesterol lowering (PCL) slows and sometimes prevents progression of atherosclerosis and may even lead to regression. Little is known about how molecular processes in the atherosclerotic arterial wall respond to PCL and modify responses to atherosclerosis regression. We studied atherosclerosis regression and global gene expression responses to PCL (≥80%) and to atherosclerosis regression itself in early, mature, and advanced lesions. In atherosclerotic aortic wall from Ldlr(-/-)Apob(100/100)Mttp(flox/flox)Mx1-Cre mice, atherosclerosis regressed after PCL regardless of lesion stage. However, near-complete regression was observed only in mice with early lesions; mice with mature and advanced lesions were left with regression-resistant, relatively unstable plaque remnants. Atherosclerosis genes responding to PCL before regression, unlike those responding to the regression itself, were enriched in inherited risk for coronary artery disease and myocardial infarction, indicating causality. Inference of transcription factor (TF) regulatory networks of these PCL-responsive gene sets revealed largely different networks in early, mature, and advanced lesions. In early lesions, PPARG was identified as a specific master regulator of the PCL-responsive atherosclerosis TF-regulatory network, whereas in mature and advanced lesions, the specific master regulators were MLL5 and SRSF10/XRN2, respectively. In a THP-1 foam cell model of atherosclerosis regression, siRNA targeting of these master regulators activated the time-point-specific TF-regulatory networks and altered the accumulation of cholesterol esters. We conclude that PCL leads to complete atherosclerosis regression only in mice with early lesions. Identified master regulators and related PCL-responsive TF-regulatory networks will be interesting targets to enhance PCL-mediated regression of mature and advanced atherosclerotic lesions.
Subject(s)
Aorta/metabolism , Atherosclerosis/blood , Cholesterol/blood , Receptors, LDL/genetics , Animals , Aorta/drug effects , Apolipoproteins B/genetics , Atherosclerosis/drug therapy , Atherosclerosis/pathology , Gene Expression Regulation/drug effects , Histone-Lysine N-Methyltransferase/biosynthesis , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/administration & dosage , Mice , Mice, Transgenic , Nuclear Proteins/biosynthesis , Ribonucleoproteins/biosynthesis , Serine-Arginine Splicing Factors
ABSTRACT
Gene expression profiling studies are usually performed on pooled samples grown under tightly controlled experimental conditions to suppress variability among individuals and increase experimental reproducibility. In addition, to mask unwanted residual effects, the samples are often subjected to relatively harsh treatments that are unrealistic in a natural context. Here, we show that expression variations among individual wild-type Arabidopsis thaliana plants grown under the same macroscopic growth conditions contain as much information on the underlying gene network structure as expression profiles of pooled plant samples under controlled experimental perturbations. We advocate the use of subtle uncontrolled variations in gene expression between individuals to uncover functional links between genes and unravel regulatory influences. As a case study, we use this approach to identify ILL6 as a new regulatory component of the jasmonate response pathway.
Subject(s)
Arabidopsis/genetics , Gene Expression Regulation, Plant , Genes, Plant/genetics , Arabidopsis/drug effects , Cyclopentanes/pharmacology , Gene Expression Regulation, Plant/drug effects , Gene Regulatory Networks/genetics , Molecular Sequence Annotation , Oxylipins/pharmacology , Signal Transduction/drug effects , Signal Transduction/genetics , Software
ABSTRACT
Module network inference is an established statistical method to reconstruct co-expression modules and their upstream regulatory programs from integrated multi-omics datasets measuring the activity levels of various cellular components across different individuals, experimental conditions or time points of a dynamic process. We have developed Lemon-Tree, an open-source, platform-independent, modular, extensible software package implementing state-of-the-art ensemble methods for module network inference. We benchmarked Lemon-Tree using large-scale tumor datasets and showed that Lemon-Tree algorithms compare favorably with state-of-the-art module network inference software. We also analyzed a large dataset of somatic copy-number alterations and gene expression levels measured in glioblastoma samples from The Cancer Genome Atlas and found that Lemon-Tree correctly identifies known glioblastoma oncogenes and tumor suppressors as master regulators in the inferred module network. Novel candidate driver genes predicted by Lemon-Tree were validated using tumor pathway and survival analyses. Lemon-Tree is available from http://lemon-tree.googlecode.com under the GNU General Public License version 2.0.
Subject(s)
Computational Biology/methods , Internet , Software , Cluster Analysis , Databases, Genetic , Gene Expression Profiling , Glioblastoma/genetics , Glioblastoma/metabolism , Glioblastoma/mortality , Humans , Kaplan-Meier Estimate , Signal Transduction/genetics
ABSTRACT
OBJECTIVE: Using a multi-tissue, genome-wide gene expression approach, we recently identified a gene module linked to the extent of human atherosclerosis. This atherosclerosis module was enriched with inherited risk for coronary and carotid artery disease (CAD) and overlapped with genes in the transendothelial migration of leukocyte (TEML) pathway. Among the atherosclerosis module genes, the transcription cofactor Lim domain binding 2 (LDB2) was the most connected in a CAD vascular wall regulatory gene network. Here, we used human genomics and atherosclerosis-prone mice to evaluate the possible role of LDB2 in TEML and atherosclerosis. APPROACH AND RESULTS: mRNA profiles generated from blood macrophages in patients with CAD were used to infer transcription factor regulatory gene networks; Ldlr(-/-)Apob(100/100) mice were used to study the effects of Ldb2 deficiency on TEML activity and atherogenesis. LDB2 was the most connected gene in a transcription factor regulatory network inferred from TEML and atherosclerosis module genes in CAD macrophages. In Ldlr(-/-)Apob(100/100) mice, loss of Ldb2 increased atherosclerotic lesion size ≈2-fold and decreased plaque stability. The exacerbated atherosclerosis was caused by increased TEML activity, as demonstrated in air-pouch and retinal vasculature models in vivo, by ex vivo perfusion of primary leukocytes, and by leukocyte migration in vitro. In THP1 cells, migration was increased by overexpression and decreased by small interfering RNA inhibition of LDB2. A functional LDB2 variant (rs10939673) was associated with the risk and extent of CAD across several cohorts. CONCLUSIONS: As a key driver of the TEML pathway in CAD macrophages, LDB2 is a novel candidate to target CAD by inhibiting the overall activity of TEML.
Subject(s)
Atherosclerosis/physiopathology , Carotid Artery Diseases/pathology , Chemotaxis, Leukocyte/physiology , Coronary Artery Disease/pathology , LIM Domain Proteins/physiology , Transcription Factors/physiology , Transendothelial and Transepithelial Migration/physiology , Animals , Apolipoprotein B-100/genetics , Carotid Artery Diseases/genetics , Cell Line, Tumor , Chemokine CCL2/pharmacology , Coronary Artery Disease/genetics , Gene Expression Profiling , Gene Expression Regulation , Genome-Wide Association Study , Humans , LIM Domain Proteins/deficiency , LIM Domain Proteins/genetics , Macrophages/metabolism , Mice , Mice, Knockout , RNA, Messenger/biosynthesis , Transcription Factors/deficiency , Transcription Factors/genetics , Transendothelial and Transepithelial Migration/genetics
ABSTRACT
BACKGROUND: The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive. RESULTS: We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. KruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. CONCLUSION: kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com.
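The core idea described above, computing Kruskal-Wallis statistics for many marker-trait pairs with matrix multiplications instead of one test at a time, can be sketched in NumPy as follows. This is an illustrative sketch, not kruX's own code: it assumes genotypes coded 0/1/2 with no missing values, continuous expression values without ties, and it omits the tie correction. The function name is hypothetical.

```python
import numpy as np

def kruskal_wallis_matrix(expr: np.ndarray, geno: np.ndarray, n_groups: int = 3) -> np.ndarray:
    """Kruskal-Wallis H for every (trait, marker) pair at once.

    expr: (n_traits, n_samples) expression values, assumed continuous (no ties).
    geno: (n_markers, n_samples) genotypes coded 0..n_groups-1, no missing values.
    Returns H with shape (n_traits, n_markers); no tie correction is applied.
    """
    n = expr.shape[1]
    # Rank each trait across samples (1..n); argsort of argsort gives ranks when there are no ties.
    ranks = expr.argsort(axis=1).argsort(axis=1) + 1.0
    total = np.zeros((expr.shape[0], geno.shape[0]))
    for g in range(n_groups):
        ind = (geno == g).astype(float)      # (n_markers, n_samples) group indicator
        n_g = ind.sum(axis=1)                # group size for each marker
        rank_sums = ranks @ ind.T            # (n_traits, n_markers) rank sum of group g
        # Empty genotype groups contribute nothing; suppress the 0/0 warning.
        with np.errstate(divide="ignore", invalid="ignore"):
            total += np.where(n_g > 0, rank_sums**2 / n_g, 0.0)
    # H = 12 / (N (N + 1)) * sum_g R_g^2 / n_g - 3 (N + 1)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

# Hypothetical data: 5 expression traits and 4 markers measured in 30 samples.
rng = np.random.default_rng(1)
expr = rng.normal(size=(5, 30))
geno = rng.integers(0, 3, size=(4, 30))
H = kruskal_wallis_matrix(expr, geno)
```

The design choice is that each genotype group costs one matrix product over all traits and markers, so the per-pair Python loop disappears. P values would then follow from a chi-squared distribution with (number of non-empty groups − 1) degrees of freedom.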
Subject(s)
Computational Biology/methods , Quantitative Trait Loci/genetics , Software , Algorithms , Genome/genetics , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Reproducibility of Results , Statistics, Nonparametric
ABSTRACT
Genome-scale metabolic models are powerful tools for understanding cellular physiology. Flux balance analysis (FBA), in particular, is an optimization-based approach widely employed for predicting metabolic phenotypes. In model microbes such as Escherichia coli, FBA has been successful at predicting essential genes, i.e. those genes that impair survival when deleted. A central assumption in this approach is that both wild type and deletion strains optimize the same fitness objective. Although the optimality assumption may hold for the wild type metabolic network, deletion strains are not subject to the same evolutionary pressures and knock-out mutants may steer their metabolism to meet other objectives for survival. Here, we present FlowGAT, a hybrid FBA-machine learning strategy for predicting essentiality directly from wild type metabolic phenotypes. The approach is based on graph-structured representation of metabolic fluxes predicted by FBA, where nodes correspond to enzymatic reactions and edges quantify the propagation of metabolite mass flow between a reaction and its neighbours. We integrate this information into a graph neural network that can be trained on knock-out fitness assay data. Comparisons across different model architectures reveal that FlowGAT predictions for E. coli are close to those of FBA for several growth conditions. This suggests that essentiality of enzymatic genes can be predicted by exploiting the inherent network structure of metabolism. Our approach demonstrates the benefits of combining the mechanistic insights afforded by genome-scale models with the ability of deep learning to infer patterns from complex datasets.
Subject(s)
Escherichia coli , Machine Learning , Escherichia coli/genetics , Neural Networks, Computer , Phenotype
ABSTRACT
Multivariate Mendelian randomization (MVMR) is a statistical technique that uses sets of genetic instruments to estimate the direct causal effects of multiple exposures on an outcome of interest. At genomic loci with pleiotropic gene regulatory effects, that is, loci where the same genetic variants are associated with multiple nearby genes, MVMR can potentially be used to predict candidate causal genes. However, consensus in the field dictates that the genetic instruments in MVMR must be independent (not in linkage disequilibrium), which is usually not possible when considering a group of candidate genes from the same locus. Here we used causal inference theory to show that MVMR with correlated instruments satisfies the instrumental set condition. This is a classical result by Brito and Pearl (2002) for structural equation models that guarantees the identifiability of individual causal effects in situations where multiple exposures collectively, but not individually, separate a set of instrumental variables from an outcome variable. Extensive simulations confirmed the validity and usefulness of these theoretical results. Importantly, the causal effect estimates remained unbiased and their variance small even when instruments were highly correlated, while bias introduced by horizontal pleiotropy or LD matrix sampling error was comparable to standard MR. We applied MVMR with correlated instrumental variable sets at genome-wide significant loci for coronary artery disease (CAD) risk using expression Quantitative Trait Loci (eQTL) data from seven vascular and metabolic tissues in the STARNET study. Our method predicts causal genes at twelve loci, each associated with multiple colocated genes in multiple tissues. We confirm causal roles for PHACTR1 and ADAMTS7 in arterial tissues, among others.
However, the extensive degree of regulatory pleiotropy across tissues and the limited number of causal variants in each locus still require that MVMR be run on a tissue-by-tissue basis, and testing all gene-tissue pairs with cis-eQTL associations at a given locus in a single model to predict causal gene-tissue combinations remains infeasible. Our results show that within tissues, MVMR with dependent, as opposed to independent, sets of instrumental variables significantly expands the scope for predicting causal genes in disease risk loci with pleiotropic regulatory effects. However, considering risk loci with regulatory pleiotropy that also spans tissues remains an unsolved problem.
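MVMR with correlated instruments can be sketched as generalized least squares on summary statistics: θ̂ = (B_xᵀ Ω⁻¹ B_x)⁻¹ B_xᵀ Ω⁻¹ b_y, with Ω = diag(se_y) R diag(se_y) and R the LD correlation matrix. This is a common correlated-instrument inverse-variance-weighted form, shown under our own assumptions. It may not be the exact estimator used in the study. It treats the instrument-exposure effects B_x as known without error, and the function name and simulated locus are hypothetical.

```python
import numpy as np


def mvmr_gls(bx, by, se_by, ld):
    """Multivariable IVW (generalized least squares) estimate with correlated instruments.

    bx:    (p, k) instrument-exposure effects (treated as known, no measurement error)
    by:    (p,)   instrument-outcome effects
    se_by: (p,)   standard errors of by
    ld:    (p, p) LD correlation matrix between instruments
    """
    omega = np.outer(se_by, se_by) * ld          # covariance of by induced by LD
    omega_inv_bx = np.linalg.solve(omega, bx)    # Omega^-1 Bx without forming an explicit inverse
    precision = bx.T @ omega_inv_bx              # Bx^T Omega^-1 Bx (k x k)
    theta = np.linalg.solve(precision, omega_inv_bx.T @ by)
    se = np.sqrt(np.diag(np.linalg.inv(precision)))
    return theta, se


# Simulated locus: two candidate genes (exposures) sharing ten correlated eQTL instruments.
rng = np.random.default_rng(1)
p = 10
theta_true = np.array([0.5, 0.0])                  # only gene 1 is causal
idx = np.arange(p)
ld = 0.9 ** np.abs(idx[:, None] - idx[None, :])    # strong LD between neighbouring variants
bx = ld @ rng.normal(0.0, 0.2, size=(p, 2))        # marginal eQTL effects = LD x joint effects
se_by = np.full(p, 0.01)
omega = np.outer(se_by, se_by) * ld
by = bx @ theta_true + rng.multivariate_normal(np.zeros(p), omega)
theta_hat, se = mvmr_gls(bx, by, se_by, ld)
```

Because the LD matrix enters through Ω, highly correlated instruments are down-weighted jointly rather than discarded. In practice, a near-singular R (for example, variants in almost perfect LD) makes the solve ill-conditioned, so such variants would typically need pruning or regularization.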
ABSTRACT
MOTIVATION: Transcriptional regulatory network inference methods have been studied for years. Most of them rely on complex mathematical and algorithmic concepts, making them hard to adapt, re-implement or integrate with other methods. To address this problem, we introduce a novel method based on a minimal statistical model for observing transcriptional regulatory interactions in noisy expression data; the method is conceptually simple, easy to implement and integrate in any statistical software environment, and performs as well as existing methods. RESULTS: We developed a method to infer regulatory interactions based on a model where transcription factors (TFs) and their targets are both differentially expressed in a gene-specific, critical sample contrast, as measured by repeated two-way t-tests. Benchmarking on standard Escherichia coli and yeast reference datasets showed that this method performs as well as the best existing methods. Analysis of the predicted interactions suggested that it works best for inferring context-specific TF-target interactions that co-express only locally. We confirmed this hypothesis on a dataset of >1000 normal human tissue samples, where we found that our method predicts highly tissue-specific and functionally relevant interactions, whereas a global co-expression method only associates general TFs with non-specific biological processes. AVAILABILITY: A software tool called TwixTrix is available from http://twixtrix.googlecode.com. SUPPLEMENTARY INFORMATION: Supplementary Material is available from http://www.roslin.ed.ac.uk/tom-michoel/supplementary-data. CONTACT: tom.michoel@roslin.ed.ac.uk.
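The idea of a gene-specific critical sample contrast can be illustrated with a simplified sketch. This is our own hypothetical reading, not TwixTrix's algorithm. For a target gene, samples are sorted by its expression, and the low/high split with the largest two-sample t-statistic is taken as the critical contrast. Each candidate TF is then scored by its own t-statistic on that same split. The choices of pooled-variance Student t-tests and a minimum group size of 5 are assumptions, as are all function names.

```python
import numpy as np


def t_stat(x, mask):
    """Two-sample Student t-statistic (pooled variance) for x[mask] versus x[~mask]."""
    a, b = x[mask], x[~mask]
    na, nb = a.size, b.size
    pooled = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled * (1.0 / na + 1.0 / nb))


def critical_split(target, min_group=5):
    """Boolean mask of 'high' samples for the split maximizing the target's t-statistic."""
    order = np.argsort(target)
    best_t, best_mask = -np.inf, None
    for cut in range(min_group, target.size - min_group + 1):
        mask = np.zeros(target.size, dtype=bool)
        mask[order[cut:]] = True                 # samples above the cut form the 'high' group
        t = t_stat(target, mask)
        if t > best_t:
            best_t, best_mask = t, mask
    return best_mask


def score_regulators(tf_expr, target, min_group=5):
    """Score each TF (row of tf_expr) by its t-statistic on the target's critical split."""
    mask = critical_split(target, min_group)
    return np.array([t_stat(tf, mask) for tf in tf_expr])


# Toy data: target and TF 1 are co-expressed only in a 15-sample context (e.g. one tissue).
rng = np.random.default_rng(2)
n = 60
context = np.zeros(n, dtype=bool)
context[:15] = True
target = rng.normal(size=n) + 3.0 * context
tf_expr = np.vstack([
    rng.normal(size=n) + 3.0 * context,   # TF co-expressed with the target only in this context
    rng.normal(size=n),                   # unrelated TF
])
scores = score_regulators(tf_expr, target)
```

A global correlation across all samples would dilute such a local signal, whereas scoring on the target's own critical contrast keeps it. This matches the abstract's observation about context-specific interactions.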
Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Models, Statistical , Algorithms , Data Interpretation, Statistical , Gene Expression Regulation , Humans , Software , Transcription Factors/metabolism
ABSTRACT
MOTIVATION: Probabilistic motif detection requires a multi-step approach, ranging from de novo regulatory motif finding to the tedious assessment of the predicted motifs. MotifSuite, a user-friendly web interface, streamlines this analysis flow. Its core consists of two post-processing procedures that allow the motif detection output to be prioritized. The tools offered by MotifSuite are built around the well-established motif detection tool MotifSampler and can also be used in combination with any other probabilistic motif detection tool. Elaborate guidelines on each of its applications are provided. AVAILABILITY: http://homes.esat.kuleuven.be/bioi_marchal/MotifSuite/Index.htm
Subject(s)
Protein Structure, Tertiary , Sequence Analysis, Protein/methods , Software , Algorithms , Computational Biology/methods , Internet
ABSTRACT
Post-transcriptional control of mRNA transcript processing by RNA binding proteins (RBPs) is an important step in the regulation of gene expression and protein production. The post-transcriptional regulatory network is similar in complexity to the transcriptional regulatory network and is thought to be organized in RNA regulons, coherent sets of functionally related mRNAs combinatorially regulated by common RBPs. We integrated genome-wide transcriptional and translational expression data in yeast with large-scale regulatory networks of transcription factor and RBP binding interactions to analyze the functional organization of post-transcriptional regulation and RNA regulons at a system level. We found that post-transcriptional feedback loops and mixed bifan motifs are overrepresented in the integrated regulatory network and control the coordinated translation of RNA regulons, manifested as clusters of functionally related mRNAs which are strongly coexpressed in the translatome data. These translatome clusters are more functionally coherent than transcriptome clusters and are expressed with higher mRNA and protein levels and less noise. Our results show how the post-transcriptional network is intertwined with the transcriptional network to regulate gene expression in a coordinated way, and that the integration of heterogeneous genome-wide datasets makes it possible to relate structure to function in regulatory networks at a system level.
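Motif overrepresentation of this kind is usually assessed by comparing the observed motif count with counts in randomized networks. The sketch below assumes that a "mixed bifan" is one TF and one RBP that both regulate the same two genes. It also assumes a null model that rewires only the RBP layer by degree-preserving edge swaps. Both are illustrative assumptions; the study's exact motif definitions and null model may differ, and all names are hypothetical.

```python
import random
from itertools import product
from math import comb


def count_mixed_bifans(tf_targets, rbp_targets):
    """Count bifans made of one TF and one RBP that share two targets.

    tf_targets, rbp_targets: dict mapping regulator name -> set of target genes.
    A (TF, RBP) pair sharing k targets contributes comb(k, 2) bifans.
    """
    return sum(comb(len(tf_set & rbp_set), 2)
               for tf_set, rbp_set in product(tf_targets.values(), rbp_targets.values()))


def to_dict(edges):
    """Convert a (regulator, target) edge list to regulator -> set of targets."""
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)
    return targets


def swap_randomize(edges, n_swaps, rng):
    """Rewire edges by swapping targets, preserving every regulator's and target's degree."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (r1, t1), (r2, t2) = edges[i], edges[j]
        if (r1, t2) in present or (r2, t1) in present:
            continue  # swap would duplicate an edge (this also skips r1 == r2 or t1 == t2)
        present -= {(r1, t1), (r2, t2)}
        present |= {(r1, t2), (r2, t1)}
        edges[i], edges[j] = (r1, t2), (r2, t1)
    return edges


def bifan_zscore(tf_targets, rbp_targets, n_random=200, seed=0):
    """Observed mixed-bifan count, null mean and z-score against rewired RBP layers."""
    rng = random.Random(seed)
    observed = count_mixed_bifans(tf_targets, rbp_targets)
    rbp_edges = [(r, t) for r in sorted(rbp_targets) for t in sorted(rbp_targets[r])]
    null = [count_mixed_bifans(tf_targets, to_dict(swap_randomize(rbp_edges, 10 * len(rbp_edges), rng)))
            for _ in range(n_random)]
    mean = sum(null) / n_random
    sd = (sum((x - mean) ** 2 for x in null) / (n_random - 1)) ** 0.5
    z = (observed - mean) / sd if sd > 0 else float("inf")
    return observed, mean, z


# Toy network: TF1 and RBP1 jointly regulate the same four mRNAs (a coherent regulon).
tf_targets = {"TF1": {"g1", "g2", "g3", "g4"}}
rbp_targets = {
    "RBP1": {"g1", "g2", "g3", "g4"},
    "RBP2": {"g5", "g6", "g7", "g8"},
    "RBP3": {"g9", "g10", "g11", "g12"},
}
observed, null_mean, z = bifan_zscore(tf_targets, rbp_targets)
```

Degree-preserving swaps keep each regulator's number of targets and each mRNA's number of regulators fixed. A high z-score therefore reflects how targets are shared, not just how many edges each node has.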