ABSTRACT
BACKGROUND: Modeling of gene regulatory networks (GRNs) is limited by the lack of direct measurements of genome-wide transcription factor activity (TFA), which makes it difficult to separate covariance from regulatory interactions. Inference of regulatory interactions and TFA requires aggregation of complementary evidence. Estimating TFA explicitly is problematic, as it disconnects GRN inference from TFA estimation and cannot account for, for example, contextual transcription factor-transcription factor interactions and other higher-order features. Deep learning offers a potential solution, as it can model complex interactions and higher-order latent features, although it does not provide interpretable models and latent features. RESULTS: We propose a novel autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), for modeling, and a metric, explained relative variance (ERV), for interpretation of GRNs. We evaluate SupirFactor with ERV in a wide set of contexts. Compared to current state-of-the-art GRN inference methods, SupirFactor performs favorably. We evaluate latent feature activity as an estimate of TFA and biological function in S. cerevisiae as well as in peripheral blood mononuclear cells (PBMC). CONCLUSION: Here we present SupirFactor, a framework for structure-primed inference and interpretation of GRNs, and demonstrate its interpretability using ERV in multiple biological and experimental settings. SupirFactor enables TFA estimation and pathway analysis using latent factor activity, demonstrated here on two large-scale single-cell datasets, modeling S. cerevisiae and PBMC. We find that the SupirFactor model facilitates biological analysis, yielding novel functional and regulatory insight.
Subject(s)
Gene Regulatory Networks , Saccharomyces cerevisiae , Saccharomyces cerevisiae/genetics , Algorithms , Leukocytes, Mononuclear , Transcription Factors/genetics
ABSTRACT
Cells respond to environmental and developmental stimuli by remodeling their transcriptomes through regulation of both mRNA transcription and mRNA decay. A central goal of biology is to identify the global set of regulatory relationships between the factors that control mRNA production and degradation and their target transcripts, and to construct a predictive model of gene expression. Regulatory relationships are typically identified using transcriptome measurements and causal inference algorithms. RNA kinetic parameters are determined experimentally by employing run-on or metabolic labeling (e.g., 4-thiouracil) methods that allow transcription and decay rates to be measured separately. Here, we develop a deep learning model, trained with single-cell RNA-seq data, that both infers causal regulatory relationships and estimates RNA kinetic parameters. The resulting in silico model predicts future gene expression states and can be perturbed to simulate the effect of transcription factor changes. We acquired model training data by sequencing the transcriptomes of 175,000 individual Saccharomyces cerevisiae cells that were subjected to an external perturbation and continuously sampled over a one-hour period. The rate of change for each transcript was calculated on a per-cell basis to estimate RNA velocity. We then trained a deep learning model with transcriptome and RNA velocity data to calculate time-dependent estimates of mRNA production and decay rates. By separating RNA velocity into transcription and decay rates, we show that rapamycin treatment causes existing ribosomal protein transcripts to be rapidly destabilized, while production of new transcripts gradually slows over the course of an hour. The neural network framework we present is designed to explicitly model causal regulatory relationships between transcription factors and their target genes, and it outperforms existing models in recovering known regulatory relationships.
We validated the predictive power of the model by perturbing transcription factors in silico and comparing transcriptome-wide effects with experimental data. Our study represents the first step in constructing a complete, predictive, biophysical model of gene expression regulation.
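Separating RNA velocity into production and decay terms can be illustrated with the simplest kinetic model, assumed here for illustration only: dx/dt = alpha - lam * x, where x is a transcript's abundance, alpha its production rate and lam its first-order decay rate. Given a velocity estimate and a decay rate, the production rate follows by rearrangement, and a forward-Euler step gives a short-horizon prediction of future expression. The constant-rate model and all function names are hypothetical; the model described above learns time-dependent rates with a neural network rather than using fixed constants.

```python
import math


def production_rate(velocity: float, expression: float, decay_rate: float) -> float:
    """Rearrange dx/dt = alpha - lam * x to recover alpha = dx/dt + lam * x."""
    return velocity + decay_rate * expression


def half_life(decay_rate: float) -> float:
    """Half-life under first-order decay, in the time unit of decay_rate."""
    return math.log(2) / decay_rate


def euler_step(expression: float, alpha: float, decay_rate: float, dt: float) -> float:
    """Predict expression after a small step dt, holding alpha and lam constant."""
    return expression + dt * (alpha - decay_rate * expression)


# Hypothetical transcript: abundance 10, falling at 2 units/min, decay rate 0.5/min.
# Velocity is negative even though production is positive, because decay dominates.
alpha = production_rate(velocity=-2.0, expression=10.0, decay_rate=0.5)
print(alpha)                                         # 3.0
print(round(half_life(0.5), 3))                      # 1.386
print(round(euler_step(10.0, alpha, 0.5, 0.1), 6))   # 9.8
```

The example shows why velocity alone is ambiguous: the same negative velocity could come from reduced production or from faster decay, which is why a separate decay estimate is needed to split the two.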
ABSTRACT
The modeling of gene regulatory networks (GRNs) is limited by the lack of direct measurements of regulatory features in genome-wide screens. Most GRN inference methods are therefore forced to model relationships between regulatory genes and their targets using expression as a proxy for the upstream independent features, complicating validation and the predictions produced by modeling frameworks. Separating covariance from regulatory influence requires aggregation of independent and complementary sets of evidence, such as transcription factor (TF) binding and target gene expression. However, the complete regulatory state of the system, e.g., TF activity (TFA), is unknown because it is not experimentally feasible to measure, making regulatory relations difficult to infer. Some methods attempt to account for this by modeling TFA as a latent feature, but these models often use linear frameworks that are unable to account for non-linearities such as saturation, TF-TF interactions, and other higher-order features. Deep learning frameworks may offer a solution, as they are capable of modeling complex interactions and capturing higher-order latent features. However, these methods often discard central concepts in biological systems modeling, such as sparsity and latent feature interpretability, in favor of increased model complexity. We propose a novel deep learning autoencoder-based framework, StrUcture Primed Inference of Regulation using latent Factor ACTivity (SupirFactor), that scales to single-cell genomic data and maintains interpretability to perform GRN inference and estimate TFA as a latent feature. We demonstrate that SupirFactor outperforms current leading GRN inference methods, predicts biologically relevant TFA, and elucidates functional regulatory pathways through aggregation of TFs.
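One generic way to keep a latent layer interpretable, in the spirit of the structure priming described above, is to multiply the encoder weights by a binary prior-network mask, so that each latent unit can only receive input from genes that a prior links to its TF and can therefore be read as that TF's activity. The sketch below assumes this masking scheme, a ReLU activation and a linear decoder purely for illustration; it is not SupirFactor's published architecture or training procedure, and all names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def relu(value: float) -> float:
    return max(0.0, value)


def encode_tfa(x: List[float], weights: Matrix, prior_mask: Matrix) -> List[float]:
    """Latent TF activity: one unit per TF, fed only by genes allowed by the prior.

    weights[t][g] is a learned weight; prior_mask[t][g] is 1 if the prior network
    links TF t and gene g, else 0, so masked-out weights cannot contribute.
    """
    return [
        relu(sum(w * m * xg for w, m, xg in zip(w_row, m_row, x)))
        for w_row, m_row in zip(weights, prior_mask)
    ]


def decode(tfa: List[float], out_weights: Matrix) -> List[float]:
    """Reconstruct each gene as a weighted sum of TF activities (out_weights[g][t])."""
    return [sum(w * a for w, a in zip(row, tfa)) for row in out_weights]


x = [1.0, 2.0, 3.0]                            # expression of three genes in one cell
weights = [[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]]  # hypothetical learned weights
prior = [[1, 0, 1], [1, 1, 1]]                 # TF 0 is not linked to gene 1
tfa = encode_tfa(x, weights, prior)
print(tfa)                                             # [4.0, 0.0]
print(decode(tfa, [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]))  # [4.0, 0.0, 2.0]
```

Because the mask zeroes every weight the prior does not support, a masked weight has no effect on the latent activity however large it becomes; training would typically keep the mask fixed and only update the unmasked weights.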
ABSTRACT
MOTIVATION: Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies on the horizon will increase this to tens of millions of cells and above. RESULTS: In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, as measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million cells) single-cell gene expression dataset with paired single-cell chromatin accessibility data. AVAILABILITY AND IMPLEMENTATION: The Inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as Python packages with associated documentation (https://inferelator.readthedocs.io/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
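Recovery of a known gold standard is commonly scored as the area under a precision-recall curve (AUPR) computed over the ranked list of predicted TF-gene edges. The sketch below uses a simple step-wise (average-precision-style) area; it is a generic illustration, not the Inferelator's own scoring code, and the function names and toy edges are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (transcription factor, target gene)


def precision_recall_points(ranked: Iterable[Edge], gold: Set[Edge]) -> List[Tuple[float, float]]:
    """Precision and recall after each prediction, walking down the ranked list."""
    points, true_pos = [], 0
    for rank, edge in enumerate(ranked, start=1):
        if edge in gold:
            true_pos += 1
        points.append((true_pos / rank, true_pos / len(gold)))
    return points


def aupr(ranked: Iterable[Edge], gold: Set[Edge]) -> float:
    """Step-wise area: add precision times each increase in recall."""
    area, prev_recall = 0.0, 0.0
    for precision, recall in precision_recall_points(ranked, gold):
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


gold = {("TF_A", "gene1"), ("TF_B", "gene2")}
ranked = [("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_B", "gene2")]
print(round(aupr(ranked, gold), 4))  # 0.8333
```

Ranking every gold-standard edge first gives an AUPR of 1.0; a false positive ranked above a true edge lowers the precision at which that true edge's recall is credited.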
Subject(s)
Gene Regulatory Networks , Software , Animals , Mice , Genomics , Genome , Chromatin
ABSTRACT
While measurements of RNA expression have dominated the world of single-cell analyses, new single-cell techniques increasingly allow collection of different data modalities, measuring different molecules, structural connections, and intermolecular interactions. Integrating the resulting multimodal single-cell datasets is a new bioinformatics challenge. Equally important, it is a new experimental design challenge for the bench scientist, who must choose from a myriad of techniques for each data modality. The ultimate goal is to design, execute, and analyze multimodal single-cell experiments that are more than just descriptive but enable the learning of new causal and mechanistic biology. This objective requires strict consideration of the goals behind the analysis, which might range from mapping the heterogeneity of a cellular population to assembling system-wide causal networks that can further our understanding of cellular functions and eventually lead to models of tissues and organs. We review steps and challenges toward this goal. Single-cell transcriptomics is now a mature technology, and methods to measure proteins, lipids, small-molecule metabolites, and other molecular phenotypes at the single-cell level are rapidly developing. Integrating these single-cell readouts so that each cell has measurements of multiple types of data, e.g., transcriptomes, proteomes, and metabolomes, is expected to allow identification of highly specific cellular subpopulations and to provide the basis for inferring causal biological mechanisms.
Subject(s)
Computational Biology , Research Design , Single-Cell Analysis , Systems Integration , Animals , Gene Expression Profiling , Humans , Metabolomics , Proteomics
ABSTRACT
The analysis of single-cell genomics data presents several statistical challenges, and extensive efforts have been made to produce methods for the analysis of this data that impute missing values, address sampling issues and quantify and correct for noise. In spite of such efforts, no consensus on best practices has been established and all current approaches vary substantially based on the available data and empirical tests. The k-Nearest Neighbor Graph (kNN-G) is often used to infer the identities of, and relationships between, cells and is the basis of many widely used dimensionality-reduction and projection methods. The kNN-G has also been the basis for imputation methods using, e.g., neighbor averaging and graph diffusion. However, due to the lack of an agreed-upon optimal objective function for choosing hyperparameters, these methods tend to oversmooth data, thereby resulting in a loss of information with regard to cell identity and the specific gene-to-gene patterns underlying regulatory mechanisms. In this paper, we investigate the tuning of kNN- and diffusion-based denoising methods with a novel non-stochastic method for optimally preserving biologically relevant informative variance in single-cell data. The framework, Denoising Expression data with a Weighted Affinity Kernel and Self-Supervision (DEWÄKSS), uses a self-supervised technique to tune its parameters. We demonstrate that denoising with optimal parameters selected by our objective function (i) is robust to preprocessing methods using data from established benchmarks, (ii) disentangles cellular identity and maintains robust clusters over dimension-reduction methods, (iii) maintains variance along several expression dimensions, unlike previous heuristic-based methods that tend to oversmooth data variance, and (iv) rarely involves diffusion but rather uses a fixed weighted kNN graph for denoising. Together, these findings provide a new understanding of kNN- and diffusion-based denoising methods. 
Code and example data for DEWÄKSS are available at https://gitlab.com/Xparx/dewakss/-/tree/Tjarnberg2020branch.
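The self-supervised idea behind tuning a kNN denoiser can be sketched as follows: predict each cell only from its neighbors (never from itself), measure the error against the cell's own observed profile, and keep the neighborhood size with the lowest error. This is a simplification under stated assumptions: it uses unweighted neighbor averaging, Euclidean distance and a single hyperparameter k, whereas DEWÄKSS uses a weighted affinity kernel and can also tune diffusion. The function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def knn_indices(cells: Matrix, k: int) -> List[List[int]]:
    """Indices of the k nearest other cells for every cell (Euclidean distance)."""
    neighbors = []
    for i, xi in enumerate(cells):
        ranked = sorted((math.dist(xi, xj), j) for j, xj in enumerate(cells) if j != i)
        neighbors.append([j for _, j in ranked[:k]])
    return neighbors


def denoise(cells: Matrix, k: int) -> Matrix:
    """Replace each cell by the unweighted mean of its k neighbors, excluding itself."""
    n_genes = len(cells[0])
    return [
        [sum(cells[j][g] for j in nbrs) / k for g in range(n_genes)]
        for nbrs in knn_indices(cells, k)
    ]


def self_supervised_mse(cells: Matrix, k: int) -> float:
    """Mean squared error between observed values and neighbor-only predictions."""
    predicted = denoise(cells, k)
    total = sum(
        (obs - pred) ** 2
        for row, prow in zip(cells, predicted)
        for obs, pred in zip(row, prow)
    )
    return total / (len(cells) * len(cells[0]))


def select_k(cells: Matrix, candidates: Sequence[int]) -> int:
    """Pick the k whose neighbor-only prediction best matches the observed data."""
    return min(candidates, key=lambda k: self_supervised_mse(cells, k))


# Two well-separated toy "cell types" with two genes each (hypothetical data).
cells = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(self_supervised_mse(cells, 1))  # 0.5
print(select_k(cells, [1, 2, 3]))     # 1
```

Excluding each cell from its own prediction is what makes the error usable as an objective: if the cell were included, the error would shrink toward zero as smoothing decreased, rather than penalizing neighborhoods that mix distinct cell types.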
Subject(s)
Algorithms , Genomics/methods , Single-Cell Analysis/methods , Supervised Machine Learning , Animals , Cell Line , Databases, Genetic , Humans , Mice , RNA-Seq , Saccharomyces cerevisiae
Subject(s)
Science , Social Media , Visually Impaired Persons , Animals , Chlorocebus aethiops , Humans , Vero Cells
ABSTRACT
Understanding how gene expression programs are controlled requires identifying regulatory relationships between transcription factors and target genes. Gene regulatory networks are typically constructed from gene expression data acquired following genetic perturbation or environmental stimulus. Single-cell RNA sequencing (scRNAseq) captures the gene expression state of thousands of individual cells in a single experiment, offering advantages in combinatorial experimental design, large numbers of independent measurements, and access to the interaction between the cell cycle and environmental responses that is hidden by population-level analysis of gene expression. To leverage these advantages, we developed a method for scRNAseq in budding yeast (Saccharomyces cerevisiae). We pooled diverse transcriptionally barcoded gene deletion mutants in 11 different environmental conditions and determined their expression state by sequencing 38,285 individual cells. We benchmarked a framework for learning gene regulatory networks from scRNAseq data that incorporates multitask learning and constructed a global gene regulatory network comprising 12,228 interactions.
Organisms switch their genes on and off to adapt to changing environments. This takes place thanks to complex networks of regulators that control which genes are actively 'read' by the cell to create the RNA molecules that are needed at the time. Piecing together these networks is key to fully understanding the inner workings of living organisms, and how to potentially modify or artificially create them. Single-cell RNA sequencing is a powerful new tool that can measure which genes are turned on (or 'expressed') in an individual cell. Datasets with millions of gene expression profiles for individual cells now exist for organisms such as mice or humans. Yet, it is difficult to use these data to reconstruct networks of regulators; this is partly because scientists are not sure if the computational methods normally used to build these networks also work for single-cell RNA sequencing data. One way to check if this is the case is to use the methods on single-cell datasets from organisms where the networks of regulators are already known, and check whether the computational tools help to reach the same conclusion. Unfortunately, the regulatory networks in the organisms for which scientists have a lot of single-cell RNA sequencing data are still poorly known. In some organisms, such as yeast, the networks are well characterised, but it has been difficult to do single-cell sequencing in them at the scale seen in other organisms. Jackson, Castro et al. first adapted a system for single-cell sequencing so that it would work in yeast. This generated a gene expression dataset of over 40,000 yeast cells. They then used a computational method (called the Inferelator) on these data to construct networks of regulators, and the results showed that the method performed well. This allowed Jackson, Castro et al. to start mapping how different networks connect, for example those that control the response to the environment and cell division.
This is one of the benefits of single-cell RNA methods: cell division, for example, is not a process that can be examined at the level of a population, since the cells may all be at different life stages. In the future, the dataset will also be useful to scientists to benchmark a variety of single-cell computational tools.
Subject(s)
DNA Barcoding, Taxonomic , Gene Regulatory Networks , Genotype , Saccharomyces cerevisiae/genetics , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Deletion , Gene Expression Regulation, Fungal , Genes, Fungal , Saccharomyces cerevisiae/metabolism , Transcription Factors/metabolism
ABSTRACT
Protein arginine methylation is an important means by which protein function can be regulated. In the budding yeast, this modification is catalyzed by the major protein arginine methyltransferase Hmt1. Here, we provide evidence that the Hmt1-mediated methylation of Rpc31, a subunit of RNA polymerase III, plays context-dependent roles in tRNA gene transcription: under conditions optimal for growth, it positively regulates tRNA gene transcription, and in the setting of stress, it promotes robust transcriptional repression. In the context of stress, methylation of Rpc31 allows for its optimal interaction with the RNA polymerase III global repressor Maf1. Interestingly, the mammalian Hmt1 homologue is able to methylate one of Rpc31's human homologues, RPC32β, but not its paralogue, RPC32α. Our data led us to propose an efficient model whereby protein arginine methylation facilitates metabolic economy and coordinates protein-synthetic capacity.
Subject(s)
Arginine/metabolism , RNA, Transfer , Stress, Physiological/genetics , Transcription, Genetic , Amino Acid Sequence , Gene Expression Regulation, Fungal , Methylation , Mutation , Protein Binding , Protein Subunits/metabolism , Protein-Arginine N-Methyltransferases/chemistry , Protein-Arginine N-Methyltransferases/genetics , Protein-Arginine N-Methyltransferases/metabolism , RNA Polymerase III/chemistry , RNA Polymerase III/genetics , RNA Polymerase III/metabolism , Repressor Proteins/chemistry , Repressor Proteins/genetics , Repressor Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
Protein arginine methylation occurs on spliceosomal components and spliceosome-associated proteins, but how this modification contributes to their function in pre-mRNA splicing remains poorly understood. Here we provide evidence that protein arginine methylation of the yeast SR-/hnRNP-like protein Npl3 plays a role in facilitating efficient splicing of the SUS1 intron that harbors a non-consensus 5' splice site and branch site. In yeast cells lacking the major protein arginine methyltransferase HMT1, we observed a change in the co-transcriptional recruitment of the U1 snRNP subunit Snp1 and Npl3 to pre-mRNAs harboring both consensus (ECM33 and ASC1) and non-consensus (SUS1) 5' splice sites and branch sites. Using an Npl3 mutant that phenocopies wild-type Npl3 when expressed in Δhmt1 cells, we showed that the arginine methylation of Npl3 is responsible for this change. Examination of pre-mRNA splicing efficiency in these mutants reveals the requirement of Npl3 methylation for the efficient splicing of SUS1 intron 1, but not of ECM33 or ASC1. Changing the 5' splice site and branch site in SUS1 intron 1 to the consensus form restored splicing efficiency in an Hmt1-independent manner. Results from biochemical studies show that methylation of Npl3 promotes its optimal association with the U1 snRNP through its association with the U1 snRNP subunit Mud1. Based on these data, we propose a model in which Hmt1, via arginine methylation of Npl3, facilitates U1 snRNP engagement with the pre-mRNA to promote usage of non-consensus splice sites by the splicing machinery.
Subject(s)
Introns , Nuclear Proteins/metabolism , RNA Splicing/physiology , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Adaptor Proteins, Signal Transducing/genetics , Adaptor Proteins, Signal Transducing/metabolism , Arginine/genetics , Arginine/metabolism , GTP-Binding Proteins/genetics , GTP-Binding Proteins/metabolism , Membrane Proteins/genetics , Membrane Proteins/metabolism , Methylation , Nuclear Proteins/genetics , Protein-Arginine N-Methyltransferases/genetics , Protein-Arginine N-Methyltransferases/metabolism , RNA-Binding Proteins/genetics , Repressor Proteins/genetics , Repressor Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics
ABSTRACT
Protein arginine methylation regulates diverse functions of eukaryotic cells, including gene expression, the DNA damage response, and circadian rhythms. We showed that arginine residues within the third intracellular loop of the human D2 dopamine receptor, which are conserved in the DOP-3 receptor in the nematode Caenorhabditis elegans, were methylated by protein arginine methyltransferase 5 (PRMT5). By mutating these arginine residues, we further showed that their methylation enhanced the D2 receptor-mediated inhibition of cyclic adenosine monophosphate (cAMP) signaling in cultured human embryonic kidney (HEK) 293T cells. Analysis of prmt-5-deficient worms indicated that methylation promoted the dopamine-mediated modulation of chemosensory and locomotory behaviors in C. elegans through the DOP-3 receptor. In addition to delineating a previously uncharacterized means of regulating GPCR (heterotrimeric guanine nucleotide-binding protein-coupled receptor) signaling, these findings may lead to the development of a new class of pharmacological therapies that modulate GPCR signaling by changing the methylation status of these key proteins.
Subject(s)
Protein-Arginine N-Methyltransferases/metabolism , Receptors, Dopamine D2/metabolism , Amino Acid Motifs , Amino Acid Sequence , Animals , Animals, Genetically Modified , Arginine/chemistry , Caenorhabditis elegans/drug effects , Caenorhabditis elegans/genetics , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/chemistry , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Computational Biology , Conserved Sequence , Dopamine/metabolism , Dopamine/pharmacology , HEK293 Cells , Humans , Locomotion/drug effects , Locomotion/genetics , Locomotion/physiology , Methylation , Molecular Sequence Data , Octanols/pharmacology , Odorants , Protein-Arginine N-Methyltransferases/deficiency , Protein-Arginine N-Methyltransferases/genetics , Receptors, Dopamine D2/chemistry , Receptors, Dopamine D2/genetics , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/genetics , Receptors, G-Protein-Coupled/metabolism , Sequence Homology, Amino Acid , Signal Transduction
ABSTRACT
Protein arginine methylation has emerged as an important regulator of cellular protein functions. Techniques that uncover the presence of methylarginines on a protein provide an important step towards understanding the functional role of arginine methylation. Here, we describe several common methods used to detect the presence of protein arginine methylation in Saccharomyces cerevisiae.
Subject(s)
Arginine/metabolism , Molecular Biology/methods , Repressor Proteins/isolation & purification , Amino Acid Sequence , Methylation , Repressor Proteins/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
ABSTRACT
Signaling levels within sensory neurons must be tightly regulated to allow cells to integrate information from multiple signaling inputs and to respond to new stimuli. Herein we report a new role for the cGMP-dependent protein kinase EGL-4 in the negative regulation of G protein-coupled nociceptive chemosensory signaling. C. elegans lacking EGL-4 function are hypersensitive in their behavioral response to low concentrations of the bitter tastant quinine and exhibit an elevated calcium flux in the ASH sensory neurons in response to quinine. We provide the first direct evidence for cGMP/PKG function in ASH and propose that ODR-1, GCY-27, GCY-33 and GCY-34 act in a non-cell-autonomous manner to provide cGMP for EGL-4 function in ASH. Our data suggest that activated EGL-4 dampens quinine sensitivity via phosphorylation and activation of the regulator of G protein signaling (RGS) proteins RGS-2 and RGS-3, which in turn downregulate Gα signaling and behavioral sensitivity.
Subject(s)
Behavior, Animal/physiology , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans/genetics , Cyclic GMP-Dependent Protein Kinases/genetics , Cyclic GMP/metabolism , Animals , Caenorhabditis elegans/physiology , Caenorhabditis elegans Proteins/metabolism , Cyclic GMP-Dependent Protein Kinases/metabolism , GTP-Binding Protein alpha Subunits, Gi-Go/genetics , GTP-Binding Protein alpha Subunits, Gi-Go/metabolism , Phosphorylation , RGS Proteins/genetics , RGS Proteins/metabolism , Sensory Receptor Cells/metabolism , Sensory Receptor Cells/physiology , Signal Transduction/genetics
ABSTRACT
Protein arginine methylation is a post-translational modification (PTM) catalyzed by an evolutionarily conserved family of enzymes called protein arginine methyltransferases (PRMTs), with PRMT1 being the most conserved member of this enzyme family. This modification has emerged as an important regulator of protein functions. To better understand the role of PRMTs in cellular pathways and functions, we have carried out a proteomic profiling experiment to comprehensively identify the physical interactors of Hmt1, the budding yeast homolog of human PRMT1. Using a dual-enzymatic digestion linear trap quadrupole/Orbitrap proteomic strategy, we identified a total of 108 proteins that specifically copurify with Hmt1 by tandem affinity purification. A reverse coimmunoprecipitation experiment was used to confirm Hmt1's physical association with Bre5, Mtr4, Snf2, Sum1, and Ssd1, five proteins that were identified as Hmt1-specific interactors in multiple biological replicates. To determine whether the identified Hmt1 interactors had the potential to act as Hmt1 substrates, we used published bioinformatics algorithms that predict the presence and location of potential methylarginines for each identified interactor. One of the top hits from this analysis, Snf2, was experimentally confirmed as a robust substrate of Hmt1 in vitro. Overall, our data provide a feasible proteomic approach that aids in better understanding PRMT1's roles within a cell.
Subject(s)
Protein Interaction Mapping/methods , Protein-Arginine N-Methyltransferases/metabolism , Proteome/metabolism , Proteomics/methods , Repressor Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Adenosine Triphosphatases/chemistry , Adenosine Triphosphatases/metabolism , Amino Acid Sequence , Arginine/chemistry , Arginine/metabolism , Computer Simulation , Methylation , Molecular Sequence Data , Protein-Arginine N-Methyltransferases/chemistry , Proteome/analysis , Proteome/chemistry , Repressor Proteins/chemistry , Saccharomyces cerevisiae Proteins/chemistry , Sequence Alignment , Transcription Factors/chemistry , Transcription Factors/metabolism
ABSTRACT
Cotranscriptional recruitment of pre-mRNA splicing factors to their genomic targets facilitates efficient and ordered assembly of a mature messenger ribonucleoprotein particle (mRNP). However, how the cotranscriptional recruitment of splicing factors is regulated remains largely unknown. Here, we demonstrate that protein arginine methylation plays a novel role in regulating this process in Saccharomyces cerevisiae. Our data show that Hmt1, the major type I arginine methyltransferase, methylates Snp1, a U1 small nuclear RNP (snRNP)-specific protein, and that the mammalian Snp1 homolog, U1-70K, is likewise arginine methylated. Genome-wide localization analysis reveals that deletion of the HMT1 gene deregulates the recruitment of U1 snRNP and its associated components to intron-containing genes (ICGs). In the same context, splicing factors acting downstream of U1 snRNP addition bind to a reduced number of ICGs. Quantitative measurement of the abundance of spliced target transcripts shows that these changes in recruitment result in an increase in the splicing efficiency of developmentally regulated mRNAs. We also show that in the absence of either Hmt1 or its catalytic activity, the association between Snp1 and the SR-like protein Npl3 is substantially increased. Together, these data support a model whereby arginine methylation modulates dynamic associations between an SR-like protein and a pre-mRNA splicing factor to promote target specificity in splicing.