ABSTRACT
Acute physical activity leads to several changes in metabolic, cardiovascular, and immune pathways. Although studies have examined selected changes in these pathways, the system-wide molecular response to an acute bout of exercise has not been fully characterized. We performed longitudinal multi-omic profiling of plasma and peripheral blood mononuclear cells including metabolome, lipidome, immunome, proteome, and transcriptome from 36 well-characterized volunteers, before and after a controlled bout of symptom-limited exercise. Time-series analysis revealed thousands of molecular changes and an orchestrated choreography of biological processes involving energy metabolism, oxidative stress, inflammation, tissue repair, and growth factor response, as well as regulatory pathways. Most of these processes were dampened and some were reversed in insulin-resistant participants. Finally, we discovered biological pathways involved in cardiopulmonary exercise response and developed prediction models revealing potential resting blood-based biomarkers of peak oxygen consumption.
Subject(s)
Energy Metabolism/physiology , Exercise/physiology , Aged , Biomarkers/metabolism , Female , Humans , Insulin/metabolism , Insulin Resistance , Leukocytes, Mononuclear/metabolism , Longitudinal Studies , Male , Metabolome , Middle Aged , Oxygen/metabolism , Oxygen Consumption , Proteome , Transcriptome
ABSTRACT
Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.
Subject(s)
Genomic Structural Variation/genetics , Genomics/methods , Neoplasms/genetics , Chromosome Inversion/genetics , Chromothripsis , DNA Copy Number Variations/genetics , Gene Rearrangement/genetics , Genome, Human/genetics , Humans , Mutation/genetics , Whole Genome Sequencing/methods
ABSTRACT
It is a generally accepted model that environmental influences can exert their effects, at least in part, by changing the molecular regulators of transcription that are described as epigenetic. As there is biochemical evidence that some epigenetic regulators of transcription can maintain their states long term and through cell division, an epigenetic model encompasses the idea of maintenance of the effect of an exposure long after it is no longer present. The evidence supporting this model is mostly from the observation of alterations of molecular regulators of transcription following exposures. With the understanding that the interpretation of these associations is more complex than originally recognised, this model may be oversimplified; therefore, adopting novel perspectives and experimental approaches when examining how environmental exposures are linked to phenotypes may prove worthwhile. In this review, we have chosen to use the example of nonalcoholic fatty liver disease (NAFLD), a common, complex human disease with strong environmental and genetic influences. We describe how epigenomic approaches combined with emerging functional genetic and single-cell genomic techniques are poised to generate new insights into the pathogenesis of environmentally influenced human disease phenotypes exemplified by NAFLD.
Subject(s)
Non-alcoholic Fatty Liver Disease , Humans , Non-alcoholic Fatty Liver Disease/genetics , Epigenesis, Genetic , Epigenomics , Environmental Exposure/adverse effects , Phenotype
ABSTRACT
Modern population-scale biobanks contain simultaneous measurements of many phenotypes, providing unprecedented opportunity to study the relationship between biomarkers and disease. However, inferring causal effects from observational data is notoriously challenging. Mendelian randomization (MR) has recently received increased attention as a class of methods for estimating causal effects using genetic associations. However, standard methods result in pervasive false positives when two traits share a heritable, unobserved common cause. This is the problem of correlated pleiotropy. Here, we introduce a flexible framework for simulating traits with a common genetic confounder that generalizes recently proposed models, as well as a simple approach we call Welch-weighted Egger regression (WWER) for estimating causal effects. We show in comprehensive simulations that our method substantially reduces false positives due to correlated pleiotropy while being fast enough to apply to hundreds of phenotypes. We apply our method first to a subset of the UK Biobank consisting of blood traits and inflammatory disease, and then to a broader set of 411 heritable phenotypes. We detect many effects with strong literature support, as well as numerous behavioral effects that appear to stem from physician advice given to people at high risk for disease. We conclude that WWER is a powerful tool for exploratory data analysis in ever-growing databases of genotypes and phenotypes.
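The abstract does not spell out the Welch weighting, so the following is only a minimal Python sketch of the standard inverse-variance-weighted MR-Egger regression that the method's name refers to, not WWER itself. It assumes per-SNP exposure effects `beta_x`, outcome effects `beta_y`, and outcome standard errors `se_y` taken from GWAS summary statistics; the function name `mr_egger` is hypothetical.

```python
import numpy as np


def mr_egger(beta_x, beta_y, se_y):
    """Inverse-variance-weighted MR-Egger regression (standard form, not WWER).

    Regresses SNP-outcome effects on SNP-exposure effects with an intercept.
    Returns (intercept, slope); the slope is the causal-effect estimate and the
    intercept absorbs directional pleiotropy.
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    weights = 1.0 / np.asarray(se_y, dtype=float) ** 2
    # Orient each SNP so its exposure effect is positive (usual Egger convention).
    sign = np.where(beta_x < 0, -1.0, 1.0)
    bx, by = beta_x * sign, beta_y * sign
    design = np.column_stack([np.ones_like(bx), bx])
    # Weighted least squares via row scaling by sqrt(weight).
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], by * sw, rcond=None)
    return float(coef[0]), float(coef[1])
```

For example, `mr_egger([0.1, 0.2, 0.3, 0.4], [0.1, 0.15, 0.2, 0.25], [0.01] * 4)` recovers an intercept of 0.05 and a slope of 0.5. A nonzero intercept in this standard model captures uncorrelated directional pleiotropy only. The correlated pleiotropy described above is the case it does not handle, which is the stated motivation for WWER.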
Subject(s)
False Positive Reactions , Genetic Pleiotropy , Mendelian Randomization Analysis/methods , Models, Genetic , Regression Analysis , Computer Simulation , Female , Humans , Inflammation/blood , Inflammation/genetics , Male , Mendelian Randomization Analysis/standards , Phenotype , Polymorphism, Single Nucleotide
ABSTRACT
Complex traits and diseases can be influenced by both genetics and environment. However, given the large number of environmental stimuli and power challenges for gene-by-environment testing, it remains a critical challenge to identify and prioritize specific disease-relevant environmental exposures. We propose a framework for leveraging signals from transcriptional responses to environmental perturbations to identify disease-relevant perturbations that can modulate genetic risk for complex traits and inform the functions of genetic variants associated with complex traits. We perturbed human skeletal-muscle-, fat-, and liver-relevant cell lines with 21 perturbations affecting insulin resistance, glucose homeostasis, and metabolic regulation in humans and identified thousands of environmentally responsive genes. By combining these data with GWASs from 31 distinct polygenic traits, we show that the heritability of multiple traits is enriched in regions surrounding genes responsive to specific perturbations and, further, that environmentally responsive genes are enriched for associations with specific diseases and phenotypes from the GWAS Catalog. Overall, we demonstrate the advantages of large-scale characterization of transcriptional changes in diversely stimulated and pathologically relevant cells to identify disease-relevant perturbations.
Subject(s)
Gene-Environment Interaction , Genetic Predisposition to Disease , Genome-Wide Association Study , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Autoimmune Diseases/etiology , Autoimmune Diseases/pathology , Humans , Mental Disorders/etiology , Mental Disorders/pathology , Metabolic Diseases/etiology , Metabolic Diseases/pathology , Phenotype
ABSTRACT
MOTIVATION: Linkage disequilibrium (LD) matrices derived from large populations are widely used in population genetics in fine-mapping, LD score regression, and linear mixed models for Genome-wide Association Studies (GWAS). However, these matrices can reach large sizes when they are derived from millions of individuals; hence, moving, sharing and extracting granular information from this large amount of data can be cumbersome. RESULTS: We sought to address the need for compressing and easily querying large LD matrices by developing LDmat. LDmat is a standalone tool to compress large LD matrices in an HDF5 file format and query these compressed matrices. It can extract submatrices corresponding to a sub-region of the genome, a list of select loci, and loci within a minor allele frequency range. LDmat can also rebuild the original file formats from the compressed files. AVAILABILITY AND IMPLEMENTATION: LDmat is implemented in Python, and can be installed on Unix systems with the command 'pip install ldmat'. It can also be accessed through https://github.com/G2Lab/ldmat and https://pypi.org/project/ldmat/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
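As a rough illustration of the three query types described above (genomic sub-region, list of loci, minor allele frequency range), the sketch below filters an in-memory NumPy LD matrix. It does not use LDmat or HDF5. The helper `extract_ld_submatrix` and its arguments are hypothetical, and LDmat's own command-line interface and Python API may differ.

```python
import numpy as np


def extract_ld_submatrix(ld, positions, maf, region=None, loci=None, maf_range=None):
    """Hypothetical helper: keep rows/columns of a dense LD matrix that pass all filters.

    ld:        (n, n) symmetric LD matrix
    positions: length-n base-pair positions, in the same order as ld
    maf:       length-n minor allele frequencies
    region:    inclusive (start, end) window
    loci:      iterable of positions to keep
    maf_range: inclusive (low, high) MAF bounds
    """
    positions = np.asarray(positions)
    maf = np.asarray(maf, dtype=float)
    keep = np.ones(positions.size, dtype=bool)
    if region is not None:
        keep &= (positions >= region[0]) & (positions <= region[1])
    if loci is not None:
        keep &= np.isin(positions, list(loci))
    if maf_range is not None:
        keep &= (maf >= maf_range[0]) & (maf <= maf_range[1])
    idx = np.flatnonzero(keep)
    # np.ix_ selects the same index set on both axes, preserving symmetry.
    return positions[idx], np.asarray(ld)[np.ix_(idx, idx)]
```

Because the filters are combined with a logical AND, a region query and a MAF query can be applied in a single call.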
Subject(s)
Data Compression , Software , Humans , Linkage Disequilibrium , Genome-Wide Association Study , Genome
ABSTRACT
Multi-task deep learning (DL) models can accurately predict diverse genomic marks from sequence, but whether these models learn the causal relationships between genomic marks is unknown. Here, we describe Deep Mendelian Randomization (DeepMR), a method for estimating causal relationships between genomic marks learned by genomic DL models. By combining Mendelian randomization with in silico mutagenesis, DeepMR obtains local (locus specific) and global estimates of (an assumed) linear causal relationship between marks. In a simulation designed to test recovery of pairwise causal relations between transcription factors (TFs), DeepMR gives accurate and unbiased estimates of the 'true' global causal effect, but its coverage decays in the presence of sequence-dependent confounding. We then apply DeepMR to examine the global relationships learned by a state-of-the-art DL model, BPNet, between TFs involved in reprogramming. DeepMR's causal effect estimates validate previously hypothesized relationships between TFs and suggest new relationships for future investigation.
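The abstract does not give the estimators, so the following is only a minimal sketch, under stated assumptions, of how local and global estimates could be combined. For each locus, in silico mutations are assumed to give predicted changes `dx` in the exposure mark and `dy` in the outcome mark. The local estimate is the least-squares slope through the origin under the assumed linear relationship, and the global estimate is an inverse-variance-weighted average of local estimates. Function names and the pooling rule are illustrative and are not DeepMR's implementation.

```python
import numpy as np


def local_estimate(dx, dy, se_floor=1e-12):
    """Slope through the origin of dy on dx for one locus, with its standard error."""
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    if dx.size < 2:
        raise ValueError("need at least two in silico mutations per locus")
    sxx = dx @ dx
    theta = (dx @ dy) / sxx
    resid = dy - theta * dx
    se = np.sqrt((resid @ resid) / (dx.size - 1) / sxx)
    # Floor the standard error so a perfectly linear locus does not get infinite weight.
    return float(theta), max(float(se), se_floor)


def global_estimate(loci):
    """Inverse-variance-weighted average of local estimates over (dx, dy) pairs."""
    estimates = [local_estimate(dx, dy) for dx, dy in loci]
    theta = np.array([t for t, _ in estimates])
    weights = 1.0 / np.array([s for _, s in estimates]) ** 2
    return float(weights @ theta / weights.sum())
```

Sequence-dependent confounding, which the abstract notes degrades coverage, would show up here as correlated errors in `dx` and `dy` that this simple pooling does not model.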
Subject(s)
Deep Learning , Mendelian Randomization Analysis , Mendelian Randomization Analysis/methods , Causality , Research Design , Genomics
ABSTRACT
Amyotrophic lateral sclerosis (ALS) is a rapidly progressing neurodegenerative disease that is characterized by motor neuron loss and that leads to paralysis and death 2-5 years after disease onset. Nearly all patients with ALS have aggregates of the RNA-binding protein TDP-43 in their brains and spinal cords, and rare mutations in the gene encoding TDP-43 can cause ALS. There are no effective TDP-43-directed therapies for ALS or related TDP-43 proteinopathies, such as frontotemporal dementia. Antisense oligonucleotides (ASOs) and RNA-interference approaches are emerging as attractive therapeutic strategies in neurological diseases. Indeed, treatment of a rat model of inherited ALS (caused by a mutation in Sod1) with ASOs against Sod1 has been shown to substantially slow disease progression. However, as SOD1 mutations account for only around 2-5% of ALS cases, additional therapeutic strategies are needed. Silencing TDP-43 itself is probably not appropriate, given its critical cellular functions. Here we present a promising alternative therapeutic strategy for ALS that involves targeting ataxin-2. A decrease in ataxin-2 suppresses TDP-43 toxicity in yeast and flies, and intermediate-length polyglutamine expansions in the ataxin-2 gene increase risk of ALS. We used two independent approaches to test whether decreasing ataxin-2 levels could mitigate disease in a mouse model of TDP-43 proteinopathy. First, we crossed ataxin-2 knockout mice with TDP-43 (also known as TARDBP) transgenic mice. The decrease in ataxin-2 reduced aggregation of TDP-43, markedly increased survival and improved motor function. Second, in a more therapeutically applicable approach, we administered ASOs targeting ataxin-2 to the central nervous system of TDP-43 transgenic mice. This single treatment markedly extended survival. Because TDP-43 aggregation is a component of nearly all cases of ALS, targeting ataxin-2 could represent a broadly effective therapeutic strategy.
Subject(s)
Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/therapy , Ataxin-2/deficiency , DNA-Binding Proteins/metabolism , Longevity , Oligonucleotides, Antisense/therapeutic use , Protein Aggregation, Pathological/therapy , Amyotrophic Lateral Sclerosis/metabolism , Amyotrophic Lateral Sclerosis/physiopathology , Animals , Ataxin-2/genetics , Central Nervous System/metabolism , Cytoplasmic Granules/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Disease Progression , Female , Gene Knockdown Techniques , Humans , Male , Mice , Mice, Knockout , Mice, Transgenic , Motor Skills/physiology , Oligonucleotides, Antisense/administration & dosage , Oligonucleotides, Antisense/genetics , Protein Aggregation, Pathological/genetics , Stress, Physiological , Survival Analysis
ABSTRACT
Previous studies have prioritized trait-relevant cell types by looking for an enrichment of genome-wide association study (GWAS) signal within functional regions. However, these studies are limited in cell resolution by the lack of functional annotations from difficult-to-characterize or rare cell populations. Measurement of single-cell gene expression has become a popular method for characterizing novel cell types, and yet limited work has linked single-cell RNA sequencing (RNA-seq) to phenotypes of interest. To address this deficiency, we present RolyPoly, a regression-based polygenic model that can prioritize trait-relevant cell types and genes from GWAS summary statistics and gene expression data. RolyPoly is designed to use expression data from either bulk tissue or single-cell RNA-seq. In this study, we demonstrated RolyPoly's accuracy through simulation and validated previously known tissue-trait associations. We discovered a significant association between microglia and late-onset Alzheimer disease and an association between schizophrenia and oligodendrocytes and replicating fetal cortical cells. Additionally, RolyPoly computes a trait-relevance score for each gene to reflect the importance of expression specific to a cell type. We found that differentially expressed genes in the prefrontal cortex of individuals with Alzheimer disease were significantly enriched with genes ranked highly by RolyPoly gene scores. Overall, our method represents a powerful framework for understanding the effect of common variants on cell types contributing to complex traits.
Subject(s)
Alzheimer Disease/genetics , Microglia/metabolism , Oligodendroglia/metabolism , Schizophrenia/genetics , Single-Cell Analysis/statistics & numerical data , Software , Alzheimer Disease/diagnosis , Alzheimer Disease/pathology , Computer Simulation , Fetus , Genome-Wide Association Study , Humans , Microglia/pathology , Models, Genetic , Oligodendroglia/pathology , Prefrontal Cortex/metabolism , Prefrontal Cortex/pathology , Quantitative Trait Loci , Schizophrenia/diagnosis , Schizophrenia/pathology , Single-Cell Analysis/methods , Transcriptome
ABSTRACT
Identifying interactions between genetics and the environment (GxE) remains challenging. We have developed EAGLE, a hierarchical Bayesian model for identifying GxE interactions based on associations between environmental variables and allele-specific expression. Combining whole-blood RNA-seq with extensive environmental annotations collected from 922 human individuals, we identified 35 GxE interactions, compared with only four using standard GxE interaction testing. EAGLE provides new opportunities for researchers to identify GxE interactions using functional genomic data.
Subject(s)
Alleles , Epigenesis, Genetic , Gene Expression Regulation , Genetic Variation , Adult , Cohort Studies , Female , Humans , Male , Models, Genetic , Quantitative Trait Loci
ABSTRACT
Drug screening studies typically involve assaying the sensitivity of a range of cancer cell lines across an array of anti-cancer therapeutics. Alongside these sensitivity measurements, high-dimensional molecular characterizations of the cell lines are typically available, including gene expression, copy number variation and genomic mutations. We propose a sparse multitask regression model which learns discriminative latent characteristics that predict drug sensitivity and are associated with specific molecular features. We use ideas from Bayesian nonparametrics to automatically infer the appropriate number of these latent characteristics. The resulting analysis couples high predictive performance with interpretability since each latent characteristic involves a typically small set of drugs, cell lines and genomic features. Our model uncovers a number of drug-gene sensitivity associations missed by single gene analyses. We functionally validate one such novel association: that increased expression of the cell-cycle regulator C/EBPδ decreases sensitivity to the histone deacetylase (HDAC) inhibitor panobinostat.
Subject(s)
Forecasting/methods , Neoplasms/genetics , Antineoplastic Agents/pharmacology , Bayes Theorem , Biomarkers, Pharmacological , CCAAT-Enhancer-Binding Protein-delta/genetics , Cell Line, Tumor , DNA Copy Number Variations , Genome , Genomics , Histone Deacetylase Inhibitors/pharmacology , Humans , Neoplasms/drug therapy , Panobinostat/pharmacology , Regression Analysis , Statistics, Nonparametric
ABSTRACT
Methods for multiple-testing correction in local expression quantitative trait locus (cis-eQTL) studies are a trade-off between statistical power and computational efficiency. Bonferroni correction, though computationally trivial, is overly conservative and fails to account for linkage disequilibrium between variants. Permutation-based methods are more powerful, though computationally far more intensive. We present an alternative correction method called eigenMT, which runs over 500 times faster than permutations and has adjusted p values that closely approximate empirical ones. To achieve this speed while also maintaining the accuracy of permutation-based methods, we estimate the effective number of independent variants tested for association with a particular gene, termed Meff, by using the eigenvalue decomposition of the genotype correlation matrix. We employ a regularized estimator of the correlation matrix to ensure Meff is robust and yields adjusted p values that closely approximate p values from permutations. Finally, using a common genotype matrix, we show that eigenMT can be applied with even greater efficiency to studies across tissues or conditions. Our method provides a simpler, more efficient approach to multiple-testing correction than existing methods and fits within existing pipelines for eQTL discovery.
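The sketch below illustrates the idea of estimating Meff from the eigenvalues of a (optionally regularized) genotype correlation matrix and then applying a Bonferroni-style correction with Meff in place of the raw variant count. Two choices are assumptions. First, Meff is taken as the number of leading eigenvalues that explain 99% of the total variance. Second, the regularizer is a fixed shrinkage toward the identity matrix, which may not be the estimator eigenMT uses. Function names are hypothetical.

```python
import numpy as np


def effective_tests(genotypes, var_explained=0.99, shrinkage=0.0):
    """Estimate Meff for an (individuals x variants) genotype matrix.

    Assumes no monomorphic variants (constant columns give NaN correlations).
    shrinkage=0 uses the plain sample correlation; >0 shrinks toward identity.
    """
    g = np.asarray(genotypes, dtype=float)
    corr = np.atleast_2d(np.corrcoef(g, rowvar=False))
    corr = (1.0 - shrinkage) * corr + shrinkage * np.eye(corr.shape[0])
    # eigvalsh returns ascending eigenvalues; clip round-off negatives, then reverse.
    eigvals = np.clip(np.linalg.eigvalsh(corr), 0.0, None)[::-1]
    frac = np.cumsum(eigvals) / eigvals.sum()
    # Number of leading eigenvalues needed to reach the variance threshold.
    meff = int(np.searchsorted(frac, var_explained) + 1)
    return min(meff, eigvals.size)


def eigenmt_pvalue(min_p, meff):
    """Bonferroni-style adjustment of a gene's best p value using Meff."""
    return min(1.0, min_p * meff)
```

Perfectly correlated variants collapse to Meff = 1, and mutually uncorrelated variants give Meff equal to the number of variants. These are the two extremes between which plain Bonferroni (always the raw count) is overly conservative.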
Subject(s)
Linkage Disequilibrium , Quantitative Trait Loci , Humans
ABSTRACT
The X Chromosome, with its unique mode of inheritance, contributes to differences between the sexes at a molecular level, including sex-specific gene expression and sex-specific impact of genetic variation. Improving our understanding of these differences promises to elucidate the molecular mechanisms underlying sex-specific traits and diseases. However, to date, most studies have either ignored the X Chromosome or had insufficient power to test for the sex-specific impact of genetic variation. By analyzing whole blood transcriptomes of 922 individuals, we have conducted the first large-scale, genome-wide analysis of the impact of both sex and genetic variation on patterns of gene expression, including comparison between the X Chromosome and autosomes. We identified a depletion of expression quantitative trait loci (eQTL) on the X Chromosome, especially among genes under high selective constraint. In contrast, we discovered an enrichment of sex-specific regulatory variants on the X Chromosome. To resolve the molecular mechanisms underlying such effects, we generated chromatin accessibility data through ATAC-sequencing to connect sex-specific chromatin accessibility to sex-specific patterns of expression and regulatory variation. As sex-specific regulatory variants discovered in our study can inform sex differences in heritable disease prevalence, we integrated our data with genome-wide association study data for multiple immune traits, identifying several traits with significant sex biases in genetic susceptibilities. Together, our study provides genome-wide insight into how genetic variation, the X Chromosome, and sex shape human gene regulation and disease.
Subject(s)
Chromosomes, Human, X/genetics , Transcriptome , Female , Gene Expression Profiling , Gene Expression Regulation , Genetic Predisposition to Disease , Genome, Human , Humans , Male , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Sex Characteristics
ABSTRACT
Recent and rapid human population growth has led to an excess of rare genetic variants that are expected to contribute to an individual's genetic burden of disease risk. To date, much of the focus has been on rare protein-coding variants, for which potential impact can be estimated from the genetic code, but determining the impact of rare noncoding variants has been more challenging. To improve our understanding of such variants, we combined high-quality genome sequencing and RNA sequencing data from a 17-individual, three-generation family to contrast expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) within this family to eQTLs and sQTLs within a population sample. Using this design, we found that eQTLs and sQTLs with large effects in the family were enriched with rare regulatory and splicing variants (minor allele frequency < 0.01). They were also more likely to influence essential genes and genes involved in complex disease. In addition, we tested the capacity of diverse noncoding annotation to predict the impact of rare noncoding variants. We found that distance to the transcription start site, evolutionary constraint, and epigenetic annotation were considerably more informative for predicting the impact of rare variants than for predicting the impact of common variants. These results highlight that rare noncoding variants are important contributors to individual gene-expression profiles and further demonstrate a significant capability for genomic annotation to predict the impact of rare noncoding variants.
Subject(s)
Genome, Human , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci , RNA, Untranslated/genetics , Sequence Analysis, RNA , Transcriptome , Family , Haplotypes/genetics , High-Throughput Nucleotide Sequencing , Humans , Lymphocytes/metabolism , White People/genetics
ABSTRACT
Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression, a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression, exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.
Subject(s)
Alleles , Proteins/genetics , Exome , Humans , Polymerase Chain Reaction
ABSTRACT
Transcriptome engineering applications in living cells with RNA-targeting CRISPR effectors depend on accurate prediction of on-target activity and off-target avoidance. Here we design and test ~200,000 RfxCas13d guide RNAs targeting essential genes in human cells with systematically designed mismatches and insertions and deletions (indels). We find that mismatches and indels have a position- and context-dependent impact on Cas13d activity, and mismatches that result in G-U wobble pairings are better tolerated than other single-base mismatches. Using this large-scale dataset, we train a convolutional neural network that we term targeted inhibition of gene expression via gRNA design (TIGER) to predict efficacy from guide sequence and context. TIGER outperforms the existing models at predicting on-target and off-target activity on our dataset and published datasets. We show that TIGER scoring combined with specific mismatches yields the first general framework to modulate transcript expression, enabling the use of RNA-targeting CRISPRs to precisely control gene dosage.
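To make the G-U wobble distinction concrete, the following hypothetical helper labels each guide position as a Watson-Crick match, a G-U wobble, or another mismatch. It assumes the guide spacer and the target site are both given 5' to 3' as RNA, so the guide pairs antiparallel with the target. It does not model indels or the position- and context-dependent effects described above, and it is not part of TIGER.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pairs(guide: str, target: str) -> list[str]:
    """Label each guide position (5' to 3') by how it pairs with the target RNA.

    Both sequences are RNA written 5' to 3'; guide position i pairs with
    target position len(target) - 1 - i because the duplex is antiparallel.
    """
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length (no indels)")
    labels = []
    for i, g in enumerate(guide.upper()):
        t = target.upper()[len(target) - 1 - i]
        if (g, t) in WATSON_CRICK:
            labels.append("match")
        elif (g, t) in WOBBLE:
            labels.append("wobble")
        else:
            labels.append("mismatch")
    return labels
```

For example, `classify_pairs("GCAU", "AUGU")` labels the first guide position as `"wobble"` (guide G opposite target U) and the rest as `"match"`.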
Subject(s)
Deep Learning , RNA, Guide, CRISPR-Cas Systems , Humans , CRISPR-Cas Systems/genetics , Clustered Regularly Interspaced Short Palindromic Repeats , RNA , Gene Editing
ABSTRACT
Inference of directed biological networks is an important but notoriously challenging problem. We introduce inverse sparse regression (inspre), an approach to learning causal networks that leverages large-scale intervention-response data. Applied to 788 genes from the genome-wide perturb-seq dataset, inspre helps elucidate the network architecture of blood traits.
ABSTRACT
Dopamine neurotransmission in the striatum is central to many normal and disease functions. Ventral midbrain dopamine neurons exhibit ongoing tonic firing that produces low extrasynaptic levels of dopamine below the detection limit of conventional extrasynaptic cyclic voltammetry (∼10-20 nanomolar), with superimposed bursts that can saturate the dopamine uptake transporter and produce transient micromolar concentrations. The bursts are known to lead to marked presynaptic plasticity via multiple mechanisms, but analysis methods for these kinetic parameters are limited. To provide a deeper understanding of the mechanics of the modulation of dopamine neurotransmission by physiological, genetic, and pharmacological means, we present three computational models of dopamine release with different levels of spatiotemporal complexity to analyze in vivo fast-scan cyclic voltammetry recordings from the dorsal striatum of mice. The models accurately fit cyclic voltammetry data and provide estimates of presynaptic dopamine facilitation/depression kinetics and dopamine transporter reuptake kinetics, and we used the models to analyze the role of synuclein proteins in neurotransmission. The models' results support recent findings linking the presynaptic protein α-synuclein to the short-term facilitation and long-term depression of dopamine release, and they reveal a new role for β-synuclein and/or γ-synuclein in the long-term regulation of dopamine reuptake.
ABSTRACT
Characterizing cell-cell communication and tracking its variability over time is essential for understanding the coordination of biological processes mediating normal development, progression of disease, or responses to perturbations such as therapies. Existing tools lack the ability to capture time-dependent intercellular interactions, such as those influenced by therapy, and primarily rely on existing databases compiled from limited contexts. We present DIISCO, a Bayesian framework for characterizing the temporal dynamics of cellular interactions using single-cell RNA-sequencing data from multiple time points. Our method uses structured Gaussian process regression to unveil time-resolved interactions among diverse cell types according to their co-evolution and incorporates prior knowledge of receptor-ligand complexes. We show the interpretability of DIISCO in simulated data and new data collected from CAR-T cells co-cultured with lymphoma cells, demonstrating its potential to uncover dynamic cell-cell crosstalk.
ABSTRACT
Alternative splicing is an essential mechanism for diversifying proteins, in which mature RNA isoforms produce proteins with potentially distinct functions. Two major challenges in characterizing the cellular function of isoforms are the lack of experimental methods to specifically and efficiently modulate isoform expression and the lack of computational tools for complex experimental design. To address these gaps, we developed and methodically tested a strategy which pairs the RNA-targeting CRISPR/Cas13d system with guide RNAs that span exon-exon junctions in the mature RNA. We performed a high-throughput essentiality screen, quantitative RT-PCR assays, and PacBio long-read sequencing to affirm our ability to specifically target and robustly knock down individual RNA isoforms. In parallel, we provide computational tools for experimental design and screen analysis. Considering all possible splice junctions annotated in GENCODE for multi-isoform genes and our gRNA efficacy predictions, we estimate that our junction-centric strategy can uniquely target up to 89% of human RNA isoforms, including 50,066 protein-coding and 11,415 lncRNA isoforms. Importantly, this specificity spans all splicing and transcriptional events, including exon skipping and inclusion, alternative 5' and 3' splice sites, and alternative starts and ends.
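A minimal sketch of the junction-centric idea: take the last nucleotides of the upstream exon and the first nucleotides of the downstream exon so that the target window spans the splice junction. The function then returns the reverse complement of that window as the guide spacer, since the Cas13d guide base-pairs with the target RNA. The 23-nt spacer length, the default 11-nt upstream split, and the function name `junction_guide` are assumptions for illustration and are not taken from the authors' design tools.

```python
# Map each RNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def junction_guide(upstream_exon: str, downstream_exon: str,
                   spacer_len: int = 23, upstream_nt: int = 11) -> str:
    """Return a spacer (5' to 3') complementary to a window spanning an exon-exon junction."""
    # Accept DNA-style input by converting T to U.
    up = upstream_exon.upper().replace("T", "U")
    down = downstream_exon.upper().replace("T", "U")
    if not 0 < upstream_nt < spacer_len:
        raise ValueError("the window must include bases from both exons")
    downstream_nt = spacer_len - upstream_nt
    if len(up) < upstream_nt or len(down) < downstream_nt:
        raise ValueError("exon sequences are too short for the requested window")
    # Target window on the mature RNA, read 5' to 3' across the junction.
    target = up[-upstream_nt:] + down[:downstream_nt]
    # The guide is the reverse complement of the target window.
    return target.translate(COMPLEMENT)[::-1]
```

Because the window contains sequence from both exons, a guide built this way is only fully complementary to isoforms that join those two exons. That property is what lets the strategy distinguish isoforms sharing the same exons in other combinations.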