ABSTRACT
The quantification of the kinetic rates of RNA synthesis, processing, and degradation is largely based on the integrative analysis of total and nascent transcription, the latter being quantified through RNA metabolic labeling. We developed INSPEcT-, a computational method based on the mathematical modeling of premature and mature RNA expression that quantifies kinetic rates from steady-state or time-course total RNA-seq data without requiring any information on nascent transcripts. Our approach outperforms available solutions, closely recapitulates the kinetic rates obtained through RNA metabolic labeling, improves the ability to detect changes in transcript half-lives, reduces the cost and complexity of the experiments, and can be adopted to study experimental conditions in which nascent transcription cannot be readily profiled. Finally, we applied INSPEcT- to characterize post-transcriptional regulation landscapes in dozens of physiological and disease conditions. This approach was included in the INSPEcT Bioconductor package, which can now unveil RNA dynamics from steady-state or time-course data, with or without the profiling of nascent RNA.
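Why premature (intronic) and mature (exonic) RNA levels carry kinetic information can be illustrated with the two-stage model commonly written for this purpose: premature RNA P is synthesized at rate k1 and processed into mature RNA M at rate k2, and M is degraded at rate k3. This is a toy sketch under that assumption, not the INSPEcT- estimation procedure, and the names (`Rates`, `steady_state`, `simulate`) are hypothetical. At steady state P = k1/k2 and M = k1/k3, so in this model a single steady-state sample fixes the ratio k2/k3 = M/P but not the absolute rates: rescaling all three rates leaves P and M unchanged.

```python
from dataclasses import dataclass

# Toy two-stage model (assumption): synthesis -> P -> processing -> M -> degradation.
# Illustrative only; this is not INSPEcT- code.


@dataclass
class Rates:
    synthesis: float    # k1: premature RNA synthesis rate
    processing: float   # k2: premature -> mature processing rate
    degradation: float  # k3: mature RNA degradation rate


def steady_state(r: Rates) -> tuple[float, float]:
    """Fixed point of dP/dt = k1 - k2*P and dM/dt = k2*P - k3*M."""
    premature = r.synthesis / r.processing
    mature = r.synthesis / r.degradation
    return premature, mature


def processing_over_degradation(premature: float, mature: float) -> float:
    """At steady state M/P = k2/k3, so this ratio needs no nascent-RNA data."""
    return mature / premature


def simulate(r: Rates, t_end: float, dt: float = 1e-3) -> tuple[float, float]:
    """Forward-Euler integration from P = M = 0, to check the approach to steady state."""
    p = m = 0.0
    for _ in range(int(t_end / dt)):
        dp = r.synthesis - r.processing * p
        dm = r.processing * p - r.degradation * m
        p += dp * dt
        m += dm * dt
    return p, m


rates = Rates(synthesis=10.0, processing=2.0, degradation=0.5)
p_ss, m_ss = steady_state(rates)
print(p_ss, m_ss, processing_over_degradation(p_ss, m_ss))  # 5.0 20.0 4.0
```

Doubling every rate (`Rates(20.0, 4.0, 1.0)`) returns the same steady state (5.0, 20.0), which is the identifiability limit of a single snapshot in this toy model; in a time course, the relaxation dynamics additionally depend on the absolute rates.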
Subjects
RNA-Seq , RNA/metabolism , Computational Biology/methods , Disease/genetics , Gene Expression , Genome , Humans , Kinetics , RNA/biosynthesis , RNA Processing, Post-Transcriptional , RNA-Seq/methods , Thiouridine
ABSTRACT
Cell-to-cell variability in protein concentrations is strongly affected by extrinsic noise, especially for highly expressed genes. Extrinsic noise can be due to fluctuations of several possible cellular factors connected to cell physiology and to the level of key enzymes in the expression process. However, how to identify the predominant sources of extrinsic noise in a biological system is still an open question. This work considers a general stochastic model of gene expression with extrinsic noise represented as fluctuations of the different model rates, and focuses on the out-of-equilibrium expression dynamics. Combining analytical calculations with stochastic simulations, we characterize how extrinsic noise shapes protein variability during gene activation or inactivation, depending on the prevailing source of extrinsic variability and on its intensity and timescale. In particular, we show that qualitatively different noise profiles can be identified depending on which parameters fluctuate. This indicates an experimentally accessible way to pinpoint the dominant sources of extrinsic noise using time-course experiments.
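A simplified illustration of how the identity of the fluctuating parameter shapes noise during activation, under assumptions that are ours and not the paper's model: a single birth-death species (production rate k, degradation rate gamma) switched on at t = 0 from zero copies, with quenched extrinsic noise, i.e. each cell draws its own k and gamma from a gamma distribution and keeps them (the limit of very slow extrinsic fluctuations; the abstract also varies the timescale). For fixed rates, a birth-death process started at zero is Poisson, so the law of total variance splits CV² into an intrinsic part 1/⟨X⟩ and an extrinsic part Var(⟨X | k, gamma⟩)/⟨X⟩². The function names are hypothetical.

```python
import math
import random
import statistics

# Toy model (assumption): birth-death gene, quenched gamma-distributed extrinsic noise.


def draw_rate(rng: random.Random, mean: float, cv: float) -> float:
    """Gamma-distributed rate with the given mean and coefficient of variation."""
    if cv == 0:
        return mean
    shape = 1.0 / cv ** 2
    return rng.gammavariate(shape, mean / shape)


def activation_noise(t, k_mean=100.0, gamma_mean=1.0, cv_k=0.0, cv_gamma=0.0,
                     n_cells=20_000, seed=1):
    """Intrinsic and extrinsic CV^2 of the copy number X at time t after activation.

    Given (k, gamma), X(t) is Poisson with mean k/gamma * (1 - exp(-gamma*t)),
    so Var(X) = E[conditional mean] + Var(conditional mean).
    """
    rng = random.Random(seed)
    cond_means = []
    for _ in range(n_cells):
        k = draw_rate(rng, k_mean, cv_k)
        g = draw_rate(rng, gamma_mean, cv_gamma)
        cond_means.append(k / g * (1.0 - math.exp(-g * t)))
    mu = statistics.fmean(cond_means)
    intrinsic = 1.0 / mu  # Poisson part: E[Var(X | k, gamma)] / mu^2 = mu / mu^2
    extrinsic = statistics.pvariance(cond_means, mu) / mu ** 2
    return intrinsic, extrinsic


for label, kwargs in [("noisy k", {"cv_k": 0.3}), ("noisy gamma", {"cv_gamma": 0.3})]:
    for t in (0.1, 1.0, 5.0):
        intr, extr = activation_noise(t, **kwargs)
        print(f"{label:12s} t={t:4.1f} intrinsic={intr:.4f} extrinsic={extr:.4f}")
```

In this toy setting, noise on k gives an extrinsic term that is flat in time (the conditional mean is proportional to k at every t), whereas noise on gamma gives an extrinsic term that starts near zero, because early on the mean is close to k·t regardless of gamma, and then rises towards its steady-state value.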
Subjects
Cell Physiological Phenomena , Proteins , Gene Expression , Stochastic Processes , Models, Biological
ABSTRACT
This work studies the effects of the two rounds of Whole Genome Duplication (WGD) at the origin of the vertebrate lineage on the architecture of the human gene regulatory networks. We integrate information on transcriptional regulation, miRNA regulation, and protein-protein interactions to comparatively analyse the role of WGD and Small Scale Duplications (SSD) in the structural properties of the resulting multilayer network. We show that complex network motifs deriving from WGD events, such as combinations of feed-forward loops and bifan arrays, are specifically enriched in the network. Pairs of WGD-derived proteins display a strong tendency to interact both with each other and with common partners, and WGD-derived transcription factors play a prominent role in retaining strong regulatory redundancy. Combinatorial regulation and synergy between different regulatory layers are in general enhanced by duplication events, but the two types of duplications contribute in different ways. Overall, our findings suggest that the two WGD events played a substantial role in increasing the multilayer complexity of the vertebrate regulatory network by enhancing its combinatorial organization, with potential consequences on its overall robustness and ability to perform high-level functions like signal integration and noise control. Lastly, we discuss in detail the RAR/RXR pathway as an illustrative example of the evolutionary impact of WGD events in humans.
Subjects
Evolution, Molecular , Gene Duplication/genetics , Gene Regulatory Networks/genetics , Genome, Human/genetics , Animals , Genomics , Humans , Models, Genetic , Vertebrates/genetics
ABSTRACT
Heterogeneity is a fundamental feature of complex phenotypes. So far, genomic screenings have profiled thousands of samples, providing insights into the transcriptome of the cell. However, disentangling the heterogeneity of these transcriptomic Big Data to identify defective biological processes remains challenging. Here we present GSECA, a method exploiting the bimodal behavior of RNA-sequencing gene expression profiles to identify altered gene sets in heterogeneous patient cohorts. Using simulated and experimental RNA-sequencing data sets, we show that GSECA outperforms other available algorithms in detecting truly altered biological processes in large cohorts. Applied to 5941 samples from 14 different cancer types, GSECA correctly identified the alteration of the PI3K/AKT signaling pathway driven by the somatic loss of PTEN and verified the emerging role of PTEN in modulating immune-related processes. In particular, we show that, in prostate cancer, PTEN loss appears to establish an immunosuppressive tumor microenvironment through the activation of STAT3, and low PTEN expression levels have a detrimental impact on patient disease-free survival. GSECA is available at https://github.com/matteocereda/GSECA.
Subjects
Big Data , Exome Sequencing/statistics & numerical data , RNA/genetics , Transcriptome/genetics , Cell Line, Tumor , Disease-Free Survival , Gene Expression Regulation/genetics , Humans , Internet , PTEN Phosphohydrolase/genetics , STAT3 Transcription Factor/genetics , Sequence Analysis, RNA , Signal Transduction/genetics , Software , Tumor Microenvironment/genetics
ABSTRACT
MicroRNAs play important roles in many biological processes. Their aberrant expression can have oncogenic or tumor suppressor functions, directly participating in carcinogenesis, malignant transformation, invasiveness and metastasis. Indeed, miRNA profiles can not only distinguish between normal and cancerous tissue but also successfully classify different subtypes of a particular cancer. Here, we focus on a particular class of transcripts encoding polycistronic miRNA genes that yield multiple miRNA components. We describe 'clustered MiRNA Master Regulator Analysis (ClustMMRA)', a fully redesigned release of the MMRA computational pipeline (MiRNA Master Regulator Analysis), developed to search for clustered miRNAs potentially driving cancer molecular subtyping. Genomically clustered miRNAs are frequently co-expressed to target different components of pro-tumorigenic signaling pathways. By applying ClustMMRA to breast cancer patient data, we identified key miRNA clusters driving the phenotype of different tumor subgroups. The pipeline was applied to two independent breast cancer datasets, providing statistically concordant results between the two analyses. We validated in cell lines miR-199/miR-214 as a novel miRNA cluster promoting the triple negative breast cancer (TNBC) phenotype through its control of proliferation and EMT.
Subjects
Epithelial-Mesenchymal Transition/genetics , MicroRNAs/genetics , Multigene Family/genetics , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/pathology , Cell Line, Tumor , Cell Proliferation , Datasets as Topic , Gene Silencing , Humans , Neoplasm Invasiveness/genetics , Reproducibility of Results , Triple Negative Breast Neoplasms/classification
ABSTRACT
This preface introduces the content of the BioMed Central Bioinformatics journal Supplement related to the 15th annual meeting of the Bioinformatics Italian Society, BITS2018. The conference was held in Torino, Italy, from June 27th to 29th, 2018.
Subjects
Computational Biology , Algorithms , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/genetics , DNA Transposable Elements/genetics , Genomics , Humans , Italy , Software
ABSTRACT
Free-electron lasers (FELs) based on superconducting accelerator technology and storage ring facilities operate with bunch repetition rates in the MHz range, and the need arises for bunch-by-bunch electron and photon diagnostics. For photon-pulse-resolved measurements of spectral distributions, fast one-dimensional profile monitors are required. The linear array detector KALYPSO (KArlsruhe Linear arraY detector for MHz-rePetition rate SpectrOscopy) has been developed for electron bunch or photon pulse synchronous read-out with frame rates of up to 2.7 MHz. At the FLASH facility at DESY, a current version of KALYPSO with 256 pixels has been installed at a grating spectrometer as online diagnostics to monitor the pulse-resolved spectra of the high-repetition-rate FEL pulses. Application-specific front-end electronics based on the MicroTCA standard have been developed for data acquisition and processing. Continuous data read-out with low latency in the microsecond range enables integration into fast feedback applications. In this paper, pulse-resolved FEL spectra recorded at 1.0 MHz repetition rate for various operating conditions at FLASH are presented, and the first application of an adaptive feedback for accelerator control based on photon beam diagnostics is demonstrated.
Subjects
Refractometry/instrumentation , Electrons , Equipment Design , Lasers , Photons , Scattering, Radiation , Synchrotrons
ABSTRACT
Timing is essential for many cellular processes, from cellular responses to external stimuli to the cell cycle and circadian clocks. Many of these processes are based on gene expression. For example, an activated gene may be required to reach, at a precise time, a threshold level of expression that triggers a specific downstream process. However, gene expression is subject to stochastic fluctuations, naturally inducing an uncertainty in this threshold-crossing time with potential consequences on biological functions and phenotypes. Here, we consider such 'timing fluctuations' and ask how they can be controlled. Our analytical estimates and simulations show that, for an induced gene, timing variability is minimal if the threshold level of expression is approximately half of the steady-state level. Timing fluctuations can be reduced by increasing the transcription rate, while they are insensitive to the translation rate. In the presence of self-regulation, we show that self-repression reduces timing noise for threshold levels that have to be reached quickly, while self-activation is optimal at long times. These results lay a framework for understanding the stochasticity of endogenous systems such as the cell cycle, as well as for the design of synthetic trigger circuits.
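A back-of-the-envelope check of the half-steady-state result, under simplifying assumptions that are ours rather than the paper's: a single birth-death species (production k, degradation gamma, steady state N = k/gamma) with no separate mRNA and no translational bursting, so this sketch says nothing about the translation-rate result. In a small-noise approximation, timing noise is expression noise divided by the slope of the mean trajectory, which gives CV²(T) ≈ f / (N (1 - f)² ln²(1 - f)) for a threshold f·N. This is minimized near f ≈ 0.47 and scales as 1/N, so a higher production rate lowers it. A Gillespie simulation provides a stochastic check; the function names are hypothetical.

```python
import math
import random
import statistics

# Toy model (assumption): birth-death gene switched on at t = 0 from zero copies.


def timing_cv2_approx(f: float, n_ss: float) -> float:
    """Small-noise CV^2 of the time to reach f * n_ss (0 < f < 1).

    Mean crossing time t* = -ln(1 - f)/gamma; Var(T) ~ Var(X(t*)) / slope(t*)^2 with
    Var(X(t*)) = f * n_ss (Poisson) and slope = gamma * n_ss * (1 - f); gamma cancels.
    """
    return f / (n_ss * (1.0 - f) ** 2 * math.log(1.0 - f) ** 2)


def first_passage_times(k: float, gamma: float, threshold: int,
                        n_runs: int = 1000, seed: int = 0) -> list[float]:
    """Gillespie simulation: first time the copy number reaches `threshold`."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_runs):
        n, t = 0, 0.0
        while n < threshold:
            birth, death = k, gamma * n
            total = birth + death
            t += rng.expovariate(total)       # waiting time to the next event
            if rng.random() * total < birth:  # choose birth vs death
                n += 1
            else:
                n -= 1
        times.append(t)
    return times


def cv(values: list[float]) -> float:
    return statistics.stdev(values) / statistics.fmean(values)


best_f = min((i / 100 for i in range(1, 100)), key=lambda f: timing_cv2_approx(f, 100))
print(f"approximate optimum threshold: {best_f:.2f} of steady state")
for thr in (10, 50, 90):
    print(thr, round(cv(first_passage_times(k=100.0, gamma=1.0, threshold=thr)), 3))
```

The simulated coefficient of variation of the crossing time is lowest at the intermediate threshold and larger for both low and high thresholds, consistent with the U-shaped dependence described above.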
Subjects
Gene Expression Regulation , Cell Cycle , Circadian Clocks , Computer Simulation , Gene Regulatory Networks , Homeostasis , Models, Genetic , Stochastic Processes , Time Factors
ABSTRACT
Matrix factorization (MF) is an established paradigm for large-scale biological data analysis with tremendous potential in computational biology. Here, we test the ability of MF to depict the molecular bases of epidemiologically described disease-disease (DD) relationships. As a use case, we focus on the inverse comorbidity association between Alzheimer's disease (AD) and lung cancer (LC), described as a lower-than-expected probability of developing LC in AD patients. To this day, the molecular mechanisms underlying DD relationships remain poorly explained, and their better characterization might offer unprecedented clinical opportunities. To this end, we extend our previously designed MF-based framework for the molecular characterization of DD relationships. Considering AD-LC inverse comorbidity as a case study, we highlight multiple molecular mechanisms, among which we confirm the involvement of processes related to the immune system and mitochondrial metabolism. We then distinguish mechanisms specific to LC from those shared with other cancers through a pan-cancer analysis. Additionally, new candidate molecular players, such as estrogen receptor (ER), cadherin 1 (CDH1) and histone deacetylase (HDAC), are pinpointed as factors that might underlie the inverse relationship, opening the way to new investigations. Finally, some lung cancer subtype-specific factors are detected, suggesting heterogeneity across patients in the context of inverse comorbidity.
Subjects
Alzheimer Disease/epidemiology , Computational Biology , Lung Neoplasms/epidemiology , Models, Biological , Algorithms , Alzheimer Disease/complications , Alzheimer Disease/etiology , Comorbidity , Computational Biology/methods , Humans , Lung Neoplasms/complications , Lung Neoplasms/etiology
ABSTRACT
MicroRNAs have been found to be necessary for regulating genes implicated in almost all signaling pathways, and consequently their dysfunction influences many diseases, including cancer. Understanding of the complexity of the microRNA-mediated regulatory network has grown in terms of size, connectivity and dynamics with the development of computational and, more recently, experimental high-throughput approaches for microRNA target identification. Recent studies on recurrent microRNA-mediated circuits in regulatory networks, also known as network motifs, have substantially contributed to addressing this complexity, and therefore to understanding how microRNAs achieve their regulatory role. This review summarizes the state of the art and the perspectives of research efforts on microRNA-mediated regulatory motifs. We discuss the topological properties characterizing different types of circuits and the regulatory features theoretically enabled by such properties, with a special emphasis on examples of circuits typifying their biological significance in experimentally validated contexts. Finally, we consider possible future developments, in particular regarding microRNA-mediated circuits involving long non-coding RNAs and epigenetic regulators.
Subjects
Gene Regulatory Networks , MicroRNAs/genetics , Animals , Computational Biology , Genetic Therapy , Humans , Mice , MicroRNAs/metabolism , Neoplasms/genetics , Neoplasms/therapy , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Transcription Factors/metabolism
ABSTRACT
Estrogen receptor-α (ERα) has a central role in hormone-dependent breast cancer, and its ligand-induced functions have been extensively characterized. However, evidence exists that ERα has functions that are independent of ligands. In the present work, we investigated the binding of ERα to chromatin in the absence of ligands and its functions in gene regulation. We demonstrated that in MCF7 breast cancer cells unliganded ERα binds to more than 4,000 chromatin sites. Unexpectedly, although these sites are almost entirely comprised in the larger group of estrogen-induced binding sites, we found that, compared with estrogen-induced binding, unliganded-ERα binding is specifically linked to genes with developmental functions. Moreover, we found that siRNA-mediated down-regulation of ERα in the absence of estrogen is accompanied by changes in the expression levels of hundreds of coding and noncoding RNAs. Down-regulated mRNAs showed enrichment in genes related to epithelial cell growth and development. Stable ERα down-regulation using shRNA, which caused cell growth arrest, was accompanied by increased H3K27me3 at ERα binding sites. Finally, we found that FOXA1 and AP2γ binding to several sites is decreased upon ERα silencing, suggesting that unliganded ERα participates, together with other factors, in the maintenance of the luminal-specific cistrome in breast cancer cells.
Subjects
Breast Neoplasms/genetics , Estrogen Receptor alpha/metabolism , Genome, Human/genetics , Binding Sites , Breast Neoplasms/pathology , Cell Proliferation , Chromatin Immunoprecipitation , Female , Gene Ontology , Humans , Ligands , MCF-7 Cells , Polymerase Chain Reaction , RNA, Small Interfering/metabolism
ABSTRACT
Using arbitrary periodic pulse patterns, we show the enhancement of specific frequencies in a frequency comb. The envelope of a regular frequency comb originates from equally spaced, identical pulses and mimics the single-pulse spectrum. We investigated spectra originating from the periodic emission of pulse trains with gaps and individual pulse heights, which are commonly observed, for example, at high-repetition-rate free-electron lasers, high-power lasers, and synchrotrons. The ANKA synchrotron light source was filled with defined patterns of short electron bunches generating coherent synchrotron radiation in the terahertz range. We resolved the intensities of the frequency comb around 0.258 THz using heterodyne mixing spectroscopy with a resolution down to 1 Hz and provide a comprehensive theoretical description. By adjusting the electrons' revolution frequency, a gapless spectrum can be recorded, improving the resolution by up to 7 and 5 orders of magnitude compared to FTIR and recent heterodyne measurements, respectively. The results open avenues to optimize and increase the signal-to-noise ratio of specific frequencies in the emitted synchrotron radiation spectrum to enable novel ultrahigh-resolution spectroscopy and metrology applications from the terahertz to the x-ray region.
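How a fill pattern redistributes intensity among the comb lines can be illustrated with a discrete Fourier sum over the ring's RF buckets. This is our simplification, not the paper's theoretical description: M equally spaced buckets per revolution, identical bunch shapes (so the single-bunch spectrum only sets a common envelope and is omitted), and coherent emission in which the fields of the bunches add with their charges. The line at revolution harmonic n (frequency n·f_rev) then has intensity |Σ_b q_b exp(-2πi n b / M)|². The function name is hypothetical.

```python
import cmath

# Toy model (assumption): point-like identical bunches, form factor omitted.


def comb_line_intensities(fill_pattern: list[float], n_harmonics: int) -> list[float]:
    """Relative intensity of revolution harmonics n = 0 .. n_harmonics-1.

    fill_pattern[b] is the charge in RF bucket b (M = len(fill_pattern) buckets,
    equally spaced in one revolution). Each bunch contributes a field with phase
    2*pi*n*b/M at harmonic n, and the line intensity is the squared modulus of the sum.
    """
    m = len(fill_pattern)
    intensities = []
    for n in range(n_harmonics):
        amplitude = sum(q * cmath.exp(-2j * cmath.pi * n * b / m)
                        for b, q in enumerate(fill_pattern))
        intensities.append(abs(amplitude) ** 2)
    return intensities


uniform = comb_line_intensities([1.0] * 8, 17)                # all 8 buckets filled equally
with_gap = comb_line_intensities([1.0] * 6 + [0.0] * 2, 17)   # two empty buckets
print([round(x, 3) for x in uniform])
print([round(x, 3) for x in with_gap])
```

With all buckets filled equally, intensity survives only at multiples of M·f_rev (the bunch repetition frequency); gaps or unequal charges move intensity into the intermediate revolution harmonics, which is the handle a chosen fill pattern offers for enhancing specific frequencies.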
ABSTRACT
It is well known that, under suitable conditions, microRNAs are able to fine-tune the relative concentration of their targets to any desired value. We show that this function is particularly effective when one of the targets is a Transcription Factor (TF) that regulates the other targets. This combination defines a new class of feed-forward loops (FFLs) in which the microRNA plays the role of master regulator. Using both deterministic and stochastic equations, we show that these FFLs are able not only to fine-tune the TF/target ratio to any desired value as a function of the miRNA concentration but also, thanks to the peculiar topology of the circuit, to ensure the stability of this ratio against stochastic fluctuations. These two effects are due to the interplay between the direct transcriptional regulation and the indirect TF/target interaction arising from the competition of TF and target for miRNA binding (the so-called "sponge effect"). We then perform a genome-wide search for these FFLs in the human regulatory network and show that they are characterized by a very peculiar enrichment pattern. In particular, they are strongly enriched in all situations in which the TF and its target have to be precisely kept at the same concentration notwithstanding environmental noise. As an example, we discuss the FFL involving E2F1 as Transcription Factor, RB1 as target and the miR-17 family as master regulator. These FFLs ensure a tight control of the E2F/RB ratio, which in turn ensures the stability of the transition from the G0/G1 to the S phase in quiescent cells.
Subjects
Gene Regulatory Networks , MicroRNAs/genetics , Models, Genetic , Algorithms , Computational Biology , E2F1 Transcription Factor/metabolism , Genes, Retinoblastoma , Genomic Instability , Humans , MicroRNAs/metabolism , RNA Processing, Post-Transcriptional , Stochastic Processes , Transcription Factors/metabolism
ABSTRACT
Topic modeling is a popular technique in machine learning and natural language processing, where a corpus of text documents is classified into themes or topics using word frequency analysis. This approach has proven successful in various biological data analysis applications, such as predicting cancer subtypes with high accuracy and identifying genes, enhancers, and stable cell types simultaneously from sparse single-cell epigenomics data. The advantage of using a topic model is that it not only serves as a clustering algorithm but also explains clustering results by providing word probability distributions over topics. Our study proposes a novel topic modeling approach for clustering single cells and detecting topics (gene signatures) in single-cell datasets that measure multiple omics simultaneously. We applied this approach to examine the transcriptional heterogeneity of luminal and triple-negative breast cancer cells using patient-derived xenograft models with acquired resistance to chemotherapy and targeted therapy. Through this approach, we identified protein-coding genes and long non-coding RNAs (lncRNAs) that group thousands of cells into biologically similar clusters, accurately distinguishing drug-sensitive and drug-resistant breast cancer types. Compared with standard state-of-the-art clustering analyses, our approach simultaneously offers an optimal partitioning of genes into topics and of cells into clusters, producing easily interpretable clustering outcomes. Additionally, we demonstrate that an integrative clustering approach, which combines the information from mRNAs and lncRNAs treated as disjoint omics layers, enhances the accuracy of cell classification.
ABSTRACT
Chronic obstructive pulmonary disease (COPD) is an etiologically complex disease characterized by acute exacerbations and stable phases. We aimed to identify biological functions modulated in specific COPD conditions, using whole blood samples collected in the AERIS clinical study (NCT01360398). The conditions considered were exacerbation onset, severity of airway obstruction, and presence of respiratory pathogens in sputum samples. With an integrative multi-network gene community detection (MNGCD) approach, we analyzed expression profiles to identify communities of correlated genes. For each explored condition/subset of samples, the approach combined different layers of gene interactions: gene expression similarity, protein-protein interactions, and validated transcription factor and microRNA regulons. Heme metabolism, interferon-alpha, and interferon-gamma pathways were modulated in patients at both exacerbation and stable-state visits, but with the involvement of distinct sets of genes. An important gene community was enriched with G2M checkpoint, E2F targets, and mitotic spindle pathways during exacerbation. Targets of the TAL1 regulator and the hsa-let-7b-5p microRNA were modulated with increasing severity of airway obstruction. Bacterial infections with Moraxella catarrhalis and, particularly, Haemophilus influenzae triggered a specific cellular and inflammatory response in acute exacerbations, indicating an active reaction of the host to infections. In conclusion, COPD is a complex multifactorial disease that requires in-depth investigations of its causes and features during its evolution, and whole blood transcriptome profiling can contribute to capturing some relevant regulatory mechanisms associated with this disease. In this work, we explored multi-network modeling that integrated diverse layers of regulatory gene networks and enhanced our comprehension of the biological functions implicated in COPD pathogenesis.
Subjects
Gene Regulatory Networks , Pulmonary Disease, Chronic Obstructive , Aged , Female , Humans , Male , Middle Aged , Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , MicroRNAs/genetics , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Disease, Chronic Obstructive/microbiology
ABSTRACT
Large-scale data on single-cell gene expression have the potential to unravel the specific transcriptional programs of different cell types. The structure of these expression datasets suggests a similarity with several other complex systems that can be analogously described through the statistics of their basic building blocks. Transcriptomes of single cells are collections of messenger RNA abundances transcribed from a common set of genes, just as books are different collections of words from a shared vocabulary, genomes of different species are specific compositions of genes belonging to evolutionary families, and ecological niches can be described by their species abundances. Following this analogy, we identify several emergent statistical laws in single-cell transcriptomic data closely similar to regularities found in linguistics, ecology, or genomics. A simple mathematical framework can be used to analyze the relations between different laws and the possible mechanisms behind their ubiquity. Importantly, tractable statistical models can be useful tools in transcriptomics to disentangle the actual biological variability from general statistical effects present in most component systems and from the consequences of the sampling process inherent to the experimental technique.
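Two of the regularities alluded to above, a Zipf-like rank-frequency law and a Heaps-like sublinear growth in the number of detected components, can be checked on synthetic data to see how much of them sampling alone produces. This sketch assumes, for illustration only, a fixed Zipf distribution of gene abundances (exponent alpha = 1, 5,000 genes) and multinomial sampling of transcripts as a null model of the sequencing process; it uses no real single-cell data, does not reproduce the paper's framework, and its function names are hypothetical.

```python
import math
import random
from collections import Counter

# Null model (assumption): fixed Zipf abundances, multinomial sampling of transcripts.


def zipf_weights(n_genes: int, alpha: float) -> list[float]:
    """Unnormalised Zipf weights: gene of rank r has weight r**(-alpha)."""
    return [r ** -alpha for r in range(1, n_genes + 1)]


def sample_transcripts(weights: list[float], n_reads: int, seed: int = 0) -> list[int]:
    """Multinomial sampling of n_reads transcripts (gene indices) from fixed weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=n_reads)


def rank_frequency_slope(sample: list[int], max_rank: int) -> float:
    """Least-squares slope of log(count) vs log(rank) over the top max_rank genes."""
    ranked = sorted(Counter(sample).values(), reverse=True)[:max_rank]
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def heaps_curve(sample: list[int], checkpoints: list[int]) -> list[int]:
    """Number of distinct genes detected among the first n transcripts (Heaps' law)."""
    return [len(set(sample[:n])) for n in checkpoints]


reads = sample_transcripts(zipf_weights(5000, alpha=1.0), n_reads=200_000)
print("rank-frequency slope:", round(rank_frequency_slope(reads, max_rank=200), 3))
print("distinct genes:", heaps_curve(reads, [1_000, 10_000, 100_000]))
```

The fitted slope comes out close to -alpha, and the number of distinct genes grows sublinearly with the number of sampled transcripts even though the underlying abundances never change: a purely sampling-driven effect of the kind that has to be separated from biological variability.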
Subjects
Gene Expression Profiling , Transcriptome , Humans , Genomics/methods , Ecosystem , Ecology
ABSTRACT
The description of physical processes with many-particle systems is a key approach to the modeling of numerous physical systems. For example, in storage rings, where ultrarelativistic particles are agglomerated in dense bunches, the modeling and measurement of their phase-space distribution is of paramount importance: at any time, the phase-space distribution not only determines the complete space-time evolution but also provides fundamental performance characteristics for storage ring operation. Here, we demonstrate a non-destructive tomographic imaging technique for the 2D longitudinal phase-space distribution of ultrarelativistic electron bunches. For this purpose, we utilize a unique setup that streams turn-by-turn near-field measurements of bunch profiles at MHz repetition rates. To demonstrate the feasibility of our method, we induce a non-equilibrium state and show that the phase-space distribution microstructuring as well as the phase-space distribution dynamics can be observed in great detail. Our approach offers a pathway to control ultrashort bunches and supports, as one example, the development of compact accelerators with low energy footprints.
ABSTRACT
BACKGROUND: In the last few years, several studies have shown that Transposable Elements (TEs) in the human genome are significantly associated with Transcription Factor Binding Sites (TFBSs) and that in several cases their expansion within the genome led to a substantial rewiring of the regulatory network. Another important feature of the regulatory network that has been thoroughly studied is the combinatorial organization of transcriptional regulation. In this paper we combine these two observations and suggest that TEs, besides rewiring the network, also played a central role in the evolution of particular patterns of combinatorial gene regulation. RESULTS: To address this issue, we searched for TEs overlapping Estrogen Receptor α (ERα) binding peaks in two publicly available ChIP-seq datasets from the MCF7 cell line corresponding to different modalities of exposure to estrogen. We found a remarkable enrichment of a few specific classes of transposons. Among these, a prominent role was played by MIR (Mammalian Interspersed Repeats) transposons. These TEs underwent a dramatic expansion at the beginning of the mammalian radiation and then stabilized. We conjecture that the special affinity of ERα for the MIR class of TEs could be at the origin of the important role assumed by ERα in mammals. We then searched for TFBSs within the TEs overlapping ChIP-seq peaks. We found a strong enrichment of a few specific combinations of TFBSs. In several cases the corresponding Transcription Factors (TFs) were known cofactors of ERα, thus supporting the idea of a co-regulatory role of TFBSs within the same TE. Moreover, most of these correlations turned out to be strictly associated with specific classes of TEs, thus suggesting the presence of a well-defined "transposon code" within the regulatory network. CONCLUSIONS: In this work we tried to shed light on the role of Transposable Elements (TEs) in shaping the regulatory network of higher eukaryotes. 
To test this idea, we focused on a particular transcription factor, the Estrogen Receptor α (ERα), and found that ERα preferentially targets a well-defined set of TEs and that these TEs host combinations of transcriptional regulators involving several known co-regulators of ERα. Moreover, a significant number of these TEs turned out to be conserved between human and mouse and located in the vicinity of important estrogen-related genes, which makes them candidate regulators of these genes.
Subjects
DNA Transposable Elements , Estrogen Receptor alpha/genetics , Gene Expression Regulation , Transcription Factors/genetics , Base Sequence , Binding Sites , Genome, Human , Humans , MCF-7 Cells , Molecular Sequence Annotation , Molecular Sequence Data
ABSTRACT
MicroRNAs are endogenous non-coding RNAs that negatively regulate the expression of protein-coding genes in plants and animals. They are known to play an important role in several biological processes and, together with transcription factors, form a complex and highly interconnected regulatory network. Looking at the structure of this network, it is possible to recognize a few overrepresented motifs which are expected to perform important elementary regulatory functions. Among them, a special role is played by the microRNA-mediated feedforward loop, in which a master transcription factor regulates a microRNA and, together with it, a set of target genes. In this paper we show analytically and through simulations that the incoherent version of this motif can couple the fine-tuning of a target protein level with an efficient noise control, thus conferring precision and stability to the overall gene expression program, especially in the presence of fluctuations in upstream regulators. Among other results, a nontrivial prediction of our model is that the optimal attenuation of fluctuations coincides with a modest repression of the target expression. This feature is consistent with the expected fine-tuning function and in agreement with experimental observations of the actual impact of a wide class of microRNAs on the protein output of their targets. Finally, we describe how the cross-talk between microRNA targets, which naturally arises when the microRNA-mediated circuit is embedded in a larger network of regulations rather than considered in isolation, affects noise-buffering efficiency.
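In a deterministic toy version of the incoherent motif, the buffering of upstream fluctuations can be written in closed form. The assumptions and names here are ours and hypothetical: the TF level s sets target production k·s and, at quasi-steady state, miRNA abundance a·s, and the miRNA adds a degradation term kappa·miRNA to the target (basal degradation g). Then T = k s / (g + kappa a s), and the log-sensitivity d ln T / d ln s = g / (g + kappa a s) is exactly the inverse of the fold repression, so slow relative fluctuations of the TF reach the target attenuated by that factor. The sketch ignores the intrinsic noise contributed by the miRNA itself, so, unlike the stochastic analysis described above, it cannot produce an optimum at modest repression; it only shows where the upstream buffering comes from.

```python
import math

# Toy deterministic incoherent FFL (assumption): TF -> target, TF -> miRNA -| target.


def target_level(s: float, k: float = 1.0, g: float = 1.0,
                 kappa: float = 0.0, a: float = 1.0) -> float:
    """Steady-state target protein for TF level s.

    Target production is k * s; miRNA abundance is a * s; the miRNA adds a
    degradation term kappa * miRNA to the basal rate g. kappa = 0 removes the miRNA branch.
    """
    mirna = a * s
    return k * s / (g + kappa * mirna)


def log_sensitivity(s: float, h: float = 1e-6, **params: float) -> float:
    """Central-difference estimate of d ln(target) / d ln(s)."""
    up = target_level(s * (1 + h), **params)
    down = target_level(s * (1 - h), **params)
    return (math.log(up) - math.log(down)) / (math.log(1 + h) - math.log(1 - h))


for kappa in (0.0, 0.5, 1.0, 4.0):
    fold = target_level(1.0) / target_level(1.0, kappa=kappa)
    print(f"kappa={kappa}: fold repression={fold:.2f}, "
          f"log-sensitivity={log_sensitivity(1.0, kappa=kappa):.3f}")
```

Without the miRNA (kappa = 0) the target follows the TF one-to-one (log-sensitivity 1); each increase in repression lowers the sensitivity to the reciprocal of the fold repression.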
Subjects
Gene Regulatory Networks/genetics , MicroRNAs/metabolism , Computational Biology/methods , Gene Expression , MicroRNAs/genetics , Transcription Factors/genetics
ABSTRACT
The integration of transcriptional data with other layers of information, such as the post-transcriptional regulation mediated by microRNAs, can be crucial to identify the driver genes and the subtypes of complex and heterogeneous diseases such as cancer. This paper presents an approach based on topic modeling to accomplish this integration task. More specifically, we show how an algorithm based on a hierarchical version of stochastic block modeling can be naturally extended to integrate any combination of 'omics data. We test this approach on breast cancer samples from the TCGA database, integrating data on messenger RNA, microRNAs, and copy number variations. We show that the inclusion of the microRNA layer significantly improves the accuracy of subtype classification. Moreover, some of the hidden structures or "topics" that the algorithm extracts correspond to genes and microRNAs involved in breast cancer development and are associated with survival probability.