ABSTRACT
Multi-omic single-cell technologies, which simultaneously measure the transcriptional and epigenomic state of the same cell, make it possible to study the epigenetic mechanisms of gene regulation. However, noisy and sparse data pose fundamental statistical challenges to extracting biological knowledge from such complex datasets. SHARE-Topic, a Bayesian generative model of multi-omic single-cell data based on topic models, aims to address these challenges. SHARE-Topic identifies common patterns of co-variation between different omic layers, providing interpretable explanations for the complexity of the data. Tested on data from different technological platforms, SHARE-Topic provides low-dimensional representations that recapitulate known biology and defines associations between genes and distal regulators in individual cells.
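To make the idea of topics shared across omic layers concrete, here is a minimal sketch, assuming an LDA-style generative process in which each cell has a single vector of topic proportions that drives both its RNA counts and its chromatin accessibility counts, while each topic has its own distribution over genes and over regions. The function names, parameters and toy values are hypothetical; this is not the SHARE-Topic implementation.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw topic proportions from a Dirichlet by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_cell(topic_gene, topic_region, alpha, n_rna, n_atac, rng):
    """Hypothetical LDA-style generative process for one multi-omic cell.

    topic_gene[k][g]: probability of gene g under topic k.
    topic_region[k][r]: probability of chromatin region r under topic k.
    One vector of topic proportions (theta) drives both layers, which is
    what couples RNA counts and accessibility counts in this sketch.
    """
    topics = range(len(alpha))
    theta = sample_dirichlet(alpha, rng)
    rna_counts = [0] * len(topic_gene[0])
    atac_counts = [0] * len(topic_region[0])
    for _ in range(n_rna):
        k = rng.choices(topics, weights=theta)[0]  # topic for this RNA read
        g = rng.choices(range(len(rna_counts)), weights=topic_gene[k])[0]
        rna_counts[g] += 1
    for _ in range(n_atac):
        k = rng.choices(topics, weights=theta)[0]  # topic for this ATAC fragment
        r = rng.choices(range(len(atac_counts)), weights=topic_region[k])[0]
        atac_counts[r] += 1
    return theta, rna_counts, atac_counts


# Toy example: two topics, three genes, four regions (all values made up).
rng = random.Random(0)
topic_gene = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
topic_region = [[0.6, 0.3, 0.1, 0.0], [0.0, 0.1, 0.3, 0.6]]
theta, rna, atac = simulate_cell(topic_gene, topic_region, [0.5, 0.5], 50, 80, rng)
```

Inference in a real model runs this process in reverse, estimating theta per cell and the topic distributions from observed counts; the forward simulation only shows why co-varying genes and regions end up in the same topic.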
Subjects
Epigenomics, Multiomics, Bayes Theorem, Genetic Epigenesis
ABSTRACT
BACKGROUND: The large-scale availability of whole-genome sequencing profiles from bulk DNA sequencing of cancer tissues is fueling the application of evolutionary theory to cancer. From a bulk biopsy, subclonal deconvolution methods are used to determine the composition of cancer subpopulations in the biopsy sample, a fundamental step to determine clonal expansions and their evolutionary trajectories. RESULTS: In a recent work we developed a new model-based approach to carry out subclonal deconvolution from the site frequency spectrum of somatic mutations. This method integrates, for the first time, an explicit model for the neutral evolutionary forces that participate in clonal expansions; in that work we also showed that our method substantially improves over competing data-driven methods. In this Software paper we present mobster, an open-source R package built around our new deconvolution approach, which provides functions to plot data, fit models, assess their confidence and compute further evolutionary analyses related to subclonal deconvolution. CONCLUSIONS: We present the mobster package for tumour subclonal deconvolution from bulk sequencing, the first approach to integrate Machine Learning and Population Genetics that can explicitly model co-existing neutral and positive selection in cancer. We showcase the analysis of two datasets, one simulated and one from a breast cancer patient, and give an overview of all package functionalities.
Subjects
Breast Neoplasms/genetics, Neoplasm DNA/genetics, Software, Whole Genome Sequencing, Cell Proliferation, Clone Cells, Data Analysis, Female, Population Genetics, Humans, Machine Learning, Genetic Models, Mutation/genetics
ABSTRACT
Most cancer genomic data are generated from bulk samples composed of mixtures of cancer subpopulations, as well as normal cells. Subclonal reconstruction methods based on machine learning aim to separate those subpopulations in a sample and infer their evolutionary history. However, current approaches are entirely data driven and agnostic to evolutionary theory. We demonstrate that systematic errors occur in the analysis if evolution is not accounted for, and this is exacerbated with multi-sampling of the same tumor. We present a novel approach for model-based tumor subclonal reconstruction, called MOBSTER, which combines machine learning with theoretical population genetics. Using public whole-genome sequencing data from 2,606 samples from different cohorts, new data and synthetic validation, we show that this method is more robust and accurate than current techniques in single-sample, multiregion and longitudinal data. This approach minimizes the confounding factors of nonevolutionary methods, thus leading to more accurate recovery of the evolutionary history of human cancers.
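The central modelling idea, a neutral tail mixed with clonal and subclonal clusters, can be illustrated with a small density sketch. Assuming, for illustration, a Pareto (power-law) component for the neutral tail of the variant allele frequency (VAF) spectrum and Beta components for the clusters, the code evaluates the mixture density and the posterior probability that a mutation belongs to each component. Function names and parameter values are hypothetical; this is not the MOBSTER code.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), computed in log space for stability."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def pareto_pdf(x, scale, shape):
    """Pareto type I density, shape * scale**shape / x**(shape + 1) for x >= scale.

    Untruncated for simplicity, so a little mass lies above VAF = 1.
    """
    if x < scale:
        return 0.0
    return shape * scale ** shape / x ** (shape + 1)


def mixture_pdf(x, weights, tail, clusters):
    """weights[0] is the tail weight; weights[1:] pair with clusters [(a, b), ...]."""
    dens = weights[0] * pareto_pdf(x, *tail)
    for w, (a, b) in zip(weights[1:], clusters):
        dens += w * beta_pdf(x, a, b)
    return dens


def responsibilities(x, weights, tail, clusters):
    """Posterior probability that a mutation at VAF x came from each component."""
    parts = [weights[0] * pareto_pdf(x, *tail)]
    parts += [w * beta_pdf(x, a, b) for w, (a, b) in zip(weights[1:], clusters)]
    total = sum(parts)
    return [p / total for p in parts] if total > 0 else parts


# Hypothetical parameters: shape 1.0 gives a density proportional to 1/x^2,
# a clonal cluster centred near VAF 0.5 and a subclone near VAF 0.2.
weights = [0.4, 0.4, 0.2]
tail = (0.05, 1.0)
clusters = [(50.0, 50.0), (20.0, 80.0)]
r_clonal = responsibilities(0.5, weights, tail, clusters)
r_low = responsibilities(0.08, weights, tail, clusters)
```

With these values a mutation at VAF 0.5 is assigned mostly to the clonal Beta cluster, while one at VAF 0.08 is assigned mostly to the neutral tail rather than to the nearby subclone, which is the kind of separation a purely data-driven clustering cannot make.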
Subjects
Neoplasms/genetics, Clonal Evolution/genetics, Population Genetics/methods, Genomics/methods, Humans, Machine Learning, Whole Genome Sequencing/methods
ABSTRACT
Recurrent successions of genomic changes, both within and between patients, reflect repeated evolutionary processes that are valuable for the anticipation of cancer progression. Multi-region sequencing allows the temporal order of some genomic changes in a tumor to be inferred, but the robust identification of repeated evolution across patients remains a challenge. We developed a machine-learning method based on transfer learning that allowed us to overcome the stochastic effects of cancer evolution and noise in data and identified hidden evolutionary patterns in cancer cohorts. When applied to multi-region sequencing datasets from lung, breast, renal, and colorectal cancer (768 samples from 178 patients), our method detected repeated evolutionary trajectories in subgroups of patients, which were reproduced in single-sample cohorts (n = 2,935). Our method provides a means of classifying patients on the basis of how their tumor evolved, with implications for the anticipation of disease progression.
Subjects
Molecular Evolution, Neoplasms/classification, Neoplasms/pathology, Tumor Cell Line, Cohort Studies, High-Throughput Nucleotide Sequencing, Humans, Machine Learning, Neoplasms/genetics, Reproducibility of Results, Stochastic Processes
ABSTRACT
The RNA-binding protein HuD promotes neurogenesis and favors recovery from peripheral axon injury. HuD interacts with many mRNAs, altering both their stability and their translation efficiency. We generated a nucleotide resolution map of the HuD RNA interactome in motor neuron-like cells, identifying HuD target sites in 1,304 mRNAs, almost exclusively in the 3' UTR. HuD binds many mRNAs encoding mTORC1-responsive ribosomal proteins and translation factors. Altered HuD expression correlates with the translation efficiency of these mRNAs and overall protein synthesis, in an mTORC1-independent fashion. The predominant HuD target is the abundant, small non-coding RNA Y3, amounting to 70% of the HuD interaction signal. Y3 functions as a molecular sponge for HuD, dynamically limiting its recruitment to polysomes and its activity as a translation and neuron differentiation enhancer. These findings uncover an alternative route to the mTORC1 pathway for translational control in motor neurons that is tunable by a small non-coding RNA.
Subjects
ELAV-Like Protein 4/genetics, Mechanistic Target of Rapamycin Complex 1/genetics, Motor Neurons/physiology, Small Untranslated RNA/genetics, 3' Untranslated Regions, ATP Binding Cassette Transporter Subfamily B Member 2, Animals, Cell Line, ELAV-Like Protein 4/metabolism, Humans, Mechanistic Target of Rapamycin Complex 1/metabolism, Mice, Motor Neurons/metabolism, Neurogenesis/genetics, Polyribosomes/metabolism, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism, Messenger RNA/genetics, Messenger RNA/metabolism, Small Untranslated RNA/metabolism
ABSTRACT
AIMS: Carbon monoxide is a respiratory poison and gaseous signaling molecule. Although CO-releasing molecules (CORMs) deliver CO with temporal and spatial specificity in mammals, and are proven antimicrobial agents, we do not understand the modes of CO toxicity. Our aim was to explore the impact of CO gas per se, without intervention of CORMs, on bacterial physiology and gene expression. RESULTS: We used tightly controlled chemostat conditions and integrated transcriptomic datasets with statistical modeling to reveal the global effects of CO. CO is known to inhibit bacterial respiration, and we found expression of genes encoding energy-transducing pathways to be significantly affected via the global regulators, Fnr, Arc, and PdhR. Aerobically, ArcA, the response regulator, is transiently phosphorylated and pyruvate accumulates, mimicking anaerobiosis. Genes implicated in iron acquisition, and the metabolism of sulfur amino acids and arginine, are all perturbed. The global iron-related changes, confirmed by modulation of activity of the transcription factor Fur, may underlie enhanced siderophore excretion, diminished intracellular iron pools, and the sensitivity of CO-challenged bacteria to metal chelators. Although CO gas (unlike H2S and NO) offers little protection from antibiotics, a ruthenium CORM is a potent adjuvant of antibiotic activity. INNOVATION: This is the first detailed exploration of global bacterial responses to CO, revealing unexpected targets with implications for employing CORMs therapeutically. CONCLUSION: This work reveals the complexity of bacterial responses to CO and provides a basis for understanding the impacts of CO from CORMs, heme oxygenase activity, or environmental sources. Antioxid. Redox Signal. 24, 1013-1028.
Subjects
Anti-Bacterial Agents/pharmacology, Carbon Monoxide/physiology, Escherichia coli/metabolism, Iron/metabolism, Aerobiosis, Amino Acids/biosynthesis, Anaerobiosis, Microbial Drug Resistance, Escherichia coli/drug effects, Escherichia coli/genetics, Escherichia coli/growth & development, Escherichia coli Proteins/genetics, Escherichia coli Proteins/metabolism, Bacterial Gene Expression Regulation, Bacterial Genes, Metabolic Networks and Pathways, Microbial Sensitivity Tests, Phosphorylation, Post-Translational Protein Processing, Siderophores/genetics, Siderophores/metabolism, Transcription Factors/genetics, Transcription Factors/metabolism, Transcriptome
ABSTRACT
The glutathione/cysteine exporter CydDC maintains redox balance in Escherichia coli. A cydD mutant strain was used to probe the influence of CydDC upon reduced thiol export, gene expression, metabolic perturbations, intracellular pH homoeostasis and tolerance to nitric oxide (NO). Loss of CydDC was found to decrease extracytoplasmic thiol levels, whereas overexpression diminished the cytoplasmic thiol content. Transcriptomic analysis revealed a dramatic up-regulation of protein chaperones, protein degradation (via phenylpropionate/phenylacetate catabolism), β-oxidation of fatty acids and genes involved in nitrate/nitrite reduction. ¹H NMR metabolomics revealed elevated methionine and betaine and diminished acetate and NAD⁺ in cydD cells, which was consistent with the transcriptomics-based metabolic model. The growth rate and ΔpH, however, were unaffected, although the cydD strain did exhibit sensitivity to the NO-releasing compound NOC-12. These observations are consistent with the hypothesis that the loss of CydDC-mediated reductant export promotes protein misfolding, adaptations to energy metabolism and sensitivity to NO. The addition of both glutathione and cysteine to the medium was found to complement the loss of bd-type cytochrome synthesis in a cydD strain (a key component of the pleiotropic cydDC phenotype), providing the first direct evidence that CydDC substrates are able to restore the correct assembly of this respiratory oxidase. These data provide an insight into the metabolic flexibility of E. coli, highlight the importance of bacterial redox homoeostasis during nitrosative stress, and report for the first time the ability of periplasmic low molecular weight thiols to restore haem incorporation into a cytochrome complex.
Subjects
ATP-Binding Cassette Transporters/metabolism, Energy Metabolism/physiology, Escherichia coli Proteins/metabolism, Escherichia coli/metabolism, Bacterial Gene Expression Regulation/physiology, ATP-Binding Cassette Transporters/genetics, Biological Transport, Escherichia coli/genetics, Escherichia coli Proteins/genetics, Gene Deletion, Biological Models, Nitrosation, Oxidation-Reduction, Physiological Stress, Genetic Transcription
ABSTRACT
BACKGROUND: Functional genomic and epigenomic research relies fundamentally on sequencing-based methods like ChIP-seq for the detection of DNA-protein interactions. These techniques return large, high-dimensional data sets with visually complex structures, such as multi-modal peaks extending over large genomic regions. Current tools for visualisation and data exploration represent and leverage these complex features only to a limited extent. RESULTS: We present DGW, an open-source software package for simultaneous alignment and clustering of multiple epigenomic marks. DGW uses Dynamic Time Warping to adaptively rescale and align genomic distances, which allows it to group regions of interest with similar shapes, thereby capturing the structure of epigenomic marks. We demonstrate the effectiveness of the approach in a simulation study and on a real epigenomic data set from the ENCODE project. CONCLUSIONS: Our results show that DGW automatically recognises and aligns important genomic features such as transcription start sites and splicing sites from histone marks. DGW is available as an open-source Python package.
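Dynamic Time Warping itself is a short dynamic program. The sketch below computes the classic DTW distance between two coverage profiles, assuming absolute difference as the local cost and no warping-window constraint; DGW's own implementation, its rescaling details and its clustering step may differ.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    Each step may advance in a, in b, or in both, so one profile can be
    locally stretched or compressed to match the other.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1] (stretch b)
                                     cost[i][j - 1],      # repeat a[i-1] (stretch a)
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# A peak and the same peak shifted by one bin: DTW absorbs the shift.
peak = [0, 1, 3, 1, 0]
shifted = [0, 0, 1, 3, 1, 0]
d = dtw_distance(peak, shifted)  # 0.0
```

Because DTW compares shapes rather than fixed positions, two regions whose peaks differ only by a local offset or width get a small distance, which is what makes shape-based grouping of epigenomic marks possible.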
Subjects
Computer Simulation, Epigenomics/methods, Human Genome, Histone Code, Software, Chromatin Immunoprecipitation, Cluster Analysis, DNA/metabolism, DNA-Binding Proteins/metabolism, Genetic Epigenesis, Humans, Leukemia/genetics
ABSTRACT
Fluctuations in mRNA levels only partly account for variations in mRNA availability for translation, which produces the well-known poor correlation between transcriptome and proteome data. Recent advances in microscopy now enable researchers to obtain high-resolution images of ribosomes on transcripts, providing valuable snapshots of translation in vivo. Here we propose RiboAbacus, a mathematical model that for the first time incorporates imaging data in a predictive model of transcript-specific ribosome densities and translational efficiencies. RiboAbacus uses a mechanistic model of ribosome dynamics, enabling the quantification of the relative importance of different features (such as codon usage and the 5' ramp effect) in determining the accuracy of predictions. The model has been optimized in the human HEK-293 cell line to fit thousands of images of human polysomes obtained by atomic force microscopy, from which we obtained a reference distribution of the number of ribosomes per mRNA with unmatched resolution. After validation, we applied RiboAbacus to three case studies of known transcriptome-proteome datasets to estimate translational efficiencies, which increased the correlation with the corresponding proteomes. RiboAbacus is an intuitive tool that allows an immediate estimation of crucial translation properties for entire transcriptomes, based on easily obtainable transcript expression levels.
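The link between initiation, codon-level elongation and ribosome load can be illustrated with a simple steady-state calculation. Assuming low ribosome density (no queuing), the mean number of ribosomes on a transcript equals the initiation rate times the time a ribosome needs to traverse the coding sequence (Little's law), and the transit time is the sum of per-codon dwell times. The function name, codon rates and initiation rate below are hypothetical placeholders; this is a much simpler model than RiboAbacus and ignores effects such as the 5' ramp.

```python
def expected_ribosomes(codons, init_rate, codon_rate, default_rate=5.0):
    """Mean ribosome load at steady state under a no-queuing assumption.

    codons: codon strings of the coding sequence, in order.
    init_rate: initiation events per second (hypothetical value).
    codon_rate: elongation rate (codons per second) for each codon;
        codons missing from the dict fall back to default_rate.
    Little's law: load = arrival rate * mean residence time.
    """
    transit_time = sum(1.0 / codon_rate.get(c, default_rate) for c in codons)
    return init_rate * transit_time


# Toy transcript: a start codon followed by 99 identical codons.
codons = ["AUG"] + ["GCU"] * 99
load = expected_ribosomes(codons, init_rate=0.1, codon_rate={"GCU": 10.0})
```

Here the transit time is 0.2 s for AUG (default rate) plus 9.9 s for the remaining codons, so the expected load is about 1.01 ribosomes; slower codons or faster initiation raise the load, which is the kind of transcript-specific density a mechanistic model predicts.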
Subjects
Biological Models, Polyribosomes/ultrastructure, Protein Biosynthesis, Transcriptome, Animals, HEK293 Cells, Humans, MCF-7 Cells, Atomic Force Microscopy, Proteomics, Rabbits, Reticulocytes/ultrastructure, Ribosomes/ultrastructure, Software
ABSTRACT
MOTIVATION: Reconstructing the topology of gene regulatory networks (GRNs) from time series of gene expression data remains an important open problem in computational systems biology. Existing GRN inference algorithms face one of two limitations. Model-free methods are scalable but lack interpretability and cannot in general be used for out-of-sample predictions. Model-based methods, on the other hand, focus on identifying a dynamical model of the system; these are clearly interpretable and can be used for predictions, but they rely on strong assumptions and are typically very demanding computationally. RESULTS: Here, we propose a new hybrid approach for GRN inference, called Jump3, exploiting time series of expression data. Jump3 is based on a formal on/off model of gene expression but uses a non-parametric procedure based on decision trees (called 'jump trees') to reconstruct the GRN topology, allowing the inference of networks of hundreds of genes. We show that Jump3 performs well on in silico and synthetic networks, and we apply the approach to identify regulatory interactions activated in the presence of interferon gamma.
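A minimal sketch of an on/off expression model shows the dynamics such a method looks for in a time series. It assumes the form dx/dt = a·mu(t) + b − lam·x, where mu(t) is a binary promoter state that flips at given times; between flips the equation has a closed-form exponential solution, which the code uses. Parameter names are illustrative, and the decision-tree ("jump tree") inference step of Jump3 is not shown.

```python
import math


def simulate_on_off(x0, a, b, lam, switch_times, mu0, t_eval):
    """Exact solution of dx/dt = a*mu(t) + b - lam*x for a piecewise-constant mu.

    switch_times: sorted times at which the binary promoter state flips.
    mu0: promoter state (0 or 1) before the first switch.
    t_eval: sorted, non-negative times at which to report x.
    """
    out = []
    t_prev, x_prev, mu = 0.0, x0, mu0
    switches = list(switch_times)
    for t in t_eval:
        # Carry the solution forward through every switch up to time t.
        while switches and switches[0] <= t:
            ts = switches.pop(0)
            x_star = (a * mu + b) / lam  # steady state of the current segment
            x_prev = x_star + (x_prev - x_star) * math.exp(-lam * (ts - t_prev))
            t_prev, mu = ts, 1 - mu
        x_star = (a * mu + b) / lam
        out.append(x_star + (x_prev - x_star) * math.exp(-lam * (t - t_prev)))
    return out


# Promoter on from t = 0, switched off at t = 5: rise towards a/lam, then decay.
xs = simulate_on_off(0.0, 2.0, 0.0, 1.0, [5.0], 1, [0.0, 5.0, 20.0])
```

Expression relaxes exponentially towards (a·mu + b)/lam after each switch, so an abrupt change in the slope of a measured trajectory hints at a change in promoter state.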
Subjects
Algorithms, Computational Biology/methods, Decision Trees, Gene Expression Regulation, Gene Regulatory Networks, Systems Biology/methods, Animals, Factual Databases, Macrophages/metabolism, Mice, Oligonucleotide Array Sequence Analysis, Saccharomyces cerevisiae/genetics, Transcription Factors/metabolism
ABSTRACT
Escherichia coli is a facultatively anaerobic bacterium. When it grows on glucose and no external electron acceptors are available, ATP is produced by substrate-level phosphorylation. The intracellular redox balance is maintained by mixed-acid fermentation, that is, the production and excretion of several organic acids. When oxygen is available, E. coli switches to aerobic respiration to achieve redox balance and optimal energy conservation by proton translocation linked to electron transfer. The switch between fermentative and aerobic respiratory growth is driven by extensive changes in gene expression and protein synthesis, resulting in global changes in metabolic fluxes and metabolite concentrations. This oxygen response is determined by the interaction of global and local genetic regulatory mechanisms, as well as by enzymatic regulation. The response is affected by basic physical constraints such as diffusion, thermodynamics and the requirement for a balance of carbon, electrons and energy (predominantly the proton motive force and the ATP pool). A comprehensive systems-level understanding of the oxygen response of E. coli requires the integrated interpretation of experimental data that are pertinent to the multiple levels of organization that mediate the response. In the pan-European venture Systems Biology of Microorganisms (SysMO), and specifically within the project Systems Understanding of Microbial Oxygen Metabolism (SUMO), regulator activities, gene expression, metabolite levels and metabolic flux datasets were obtained using a standardized and reproducible chemostat-based experimental system. These different types and qualities of data were integrated using mathematical models. The approach described here has revealed a much more detailed picture of the aerobic-anaerobic response, especially for the environmentally critical microaerobic range that lies between unlimited oxygen availability and anaerobiosis.
Subjects
Escherichia coli/metabolism, Oxygen/metabolism, Systems Biology, Adenosine Triphosphate/metabolism, Aerobiosis, Escherichia coli/genetics, Escherichia coli Proteins/genetics, Escherichia coli Proteins/metabolism, Bacterial Gene Expression Regulation
ABSTRACT
AIMS: Carbon monoxide (CO)-releasing molecules (CO-RMs) are being developed with the ultimate goal of safely utilizing the therapeutic potential of CO clinically. One such application is antimicrobial activity; therefore, we aimed to characterize and compare the effects of the CO-RM, CORM-3, and its inactivated counterpart, from which all labile CO has been removed, at the transcriptomic and cellular level. RESULTS: We found that both compounds are able to penetrate the cell, but the inactive form is not inhibitory to bacterial growth under conditions where CORM-3 is. Transcriptomic analyses revealed that the bacterial response to inactivated CORM-3 (iCORM-3) is much lower than to the active compound and that a wide range of processes appear to be affected by CORM-3 and to a lesser extent iCORM-3, including energy metabolism, membrane transport, motility, and the metabolism of many sulfur-containing species, including cysteine and methionine. INNOVATION: This work has demonstrated that both CORM-3 and its inactivated counterpart react with cellular functions to yield a complex response at the transcriptomic level. A full understanding of the actions of both compounds is vital to understand the toxic effects of CO-RMs. CONCLUSION: This work has furthered our understanding of how CORM-3 behaves at the cellular level and identifies the responses that occur when the host is exposed to the Ru compound as well as those that result from the released CO. This is a vital step in laying the groundwork for future development of optimized CO-RMs for eventual use in antimicrobial therapy.
Subjects
Anti-Bacterial Agents/pharmacology, Escherichia coli/drug effects, Bacterial Gene Expression Regulation/drug effects, Organometallic Compounds/pharmacology, Sulfhydryl Compounds/metabolism, Transcriptome/drug effects, ATP-Binding Cassette Transporters/genetics, ATP-Binding Cassette Transporters/metabolism, Anaerobiosis, Preclinical Drug Evaluation, Escherichia coli/genetics, Escherichia coli/metabolism, Escherichia coli Proteins/genetics, Escherichia coli Proteins/metabolism, Bacterial Genome, Metabolic Networks and Pathways/drug effects, Oxygen Consumption, Transcription Factors/genetics, Transcription Factors/metabolism
ABSTRACT
MOTIVATION: Knowledge of the dynamics of transcription factors is fundamental to understanding the mechanisms of transcriptional regulation. Measuring transcription factor activities in vivo remains experimentally challenging. Several methods have been developed to infer these activities from easily measurable quantities such as the mRNA expression of target genes. A limitation of these methods is that they rely on very simple single-layer structures, typically consisting of one or more transcription factors regulating a number of target genes. RESULTS: We present a novel statistical inference methodology to reverse engineer the dynamics of transcription factors in hierarchical network motifs such as feed-forward loops. The approach is based on a continuous-time representation of the system in which the high-level master transcription factor is represented as a two-state Markov jump process driving a system of differential equations. We solve the inference problem using an efficient variational approach and demonstrate our method on simulated data and two real datasets. The results on real data show that the predictions of our approach can capture biological behaviours more effectively than single-layer models of transcription, and can lead to novel biological insights. AVAILABILITY: http://homepages.inf.ed.ac.uk/gsanguin/software.html CONTACT: g.sanguinetti@ed.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
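The master regulator here is modelled as a two-state Markov jump process (a telegraph process). A minimal simulation sketch follows, assuming switching rates k_on (off to on) and k_off (on to off) with exponentially distributed waiting times; the rates and function names are hypothetical, and the paper's variational inference is not reproduced. Over a long horizon the fraction of time spent "on" should approach k_on / (k_on + k_off).

```python
import random


def simulate_telegraph(k_on, k_off, t_end, state0, rng):
    """Simulate a two-state Markov jump process on [0, t_end].

    Returns a list of (switch_time, new_state) pairs. State 0 is left at
    rate k_on and state 1 at rate k_off; waiting times are exponential.
    """
    t, state = 0.0, state0
    jumps = []
    while True:
        rate = k_on if state == 0 else k_off
        t += rng.expovariate(rate)
        if t >= t_end:
            return jumps
        state = 1 - state
        jumps.append((t, state))


def fraction_on(jumps, t_end, state0):
    """Fraction of [0, t_end] spent in state 1."""
    on_time, t_prev, state = 0.0, 0.0, state0
    for t, new_state in jumps:
        if state == 1:
            on_time += t - t_prev
        t_prev, state = t, new_state
    if state == 1:
        on_time += t_end - t_prev
    return on_time / t_end


rng = random.Random(1)
jumps = simulate_telegraph(2.0, 1.0, 5000.0, 0, rng)
f = fraction_on(jumps, 5000.0, 0)  # close to 2/3 for these rates
```

A trajectory like this, used as the input mu(t) of a linear differential equation for a target gene, gives the kind of hidden switching signal that the inference method has to recover from expression data.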
Subjects
Gene Regulatory Networks, Transcription Factors/metabolism, Tumor Cell Line, Gene Expression Regulation, Humans, Markov Chains
ABSTRACT
Peroxynitrite is formed in macrophages by the diffusion-limited reaction of superoxide and nitric oxide. This highly reactive species is thought to contribute to bacterial killing by interaction with diverse targets and nitration of protein tyrosines. This work presents for the first time a comprehensive analysis of transcriptional responses to peroxynitrite under tightly controlled chemostat growth conditions. Up-regulation of the cysteine biosynthesis pathway and an increase in S-nitrosothiol levels suggest S-nitrosylation to be a consequence of peroxynitrite exposure. Genes involved in the assembly/repair of iron-sulfur clusters also show enhanced transcription. Unexpectedly, arginine biosynthesis gene transcription levels were also elevated after treatment with peroxynitrite. Analysis of the negative regulator for these genes, ArgR, showed that post-translational nitration of tyrosine residues within this protein is responsible for its degradation in vitro. Further up-regulation was seen in oxidative stress response genes, including katG and ahpCF. However, genes known to be up-regulated by nitric oxide and nitrosating agents (e.g. hmp and norVW) were unaffected. Probabilistic modeling of the transcriptomic data identified five altered transcription factors in response to peroxynitrite exposure, including OxyR and ArgR. Hydrogen peroxide can be present as a contaminant in commercially available peroxynitrite preparations. Transcriptomic analysis of cells treated with hydrogen peroxide alone also revealed up-regulation of oxidative stress response genes but not of many other genes that are up-regulated by peroxynitrite. Thus, the cellular responses to peroxynitrite and hydrogen peroxide are distinct.