ABSTRACT
BACKGROUND: Many long non-coding RNAs (lncRNAs), known to be involved in transcriptional regulation, are enriched in the nucleus and interact with chromatin. However, their mechanisms of chromatin interaction and the cellular functions they serve are poorly understood. We sought to characterize the mechanisms of lncRNA nuclear retention by systematically mapping the sequence and chromatin features that distinguish lncRNA-interacting genomic segments. RESULTS: We found DNA 5-mer frequencies to be predictive of chromatin interactions for all lncRNAs, suggesting sequence specificity as a global theme in the interactome. Sequence features representing protein-DNA and protein-RNA binding motifs revealed potential mechanisms for specific lncRNAs. Complementary to these global themes, transcription-related features and DNA-RNA triplex formation potential were highly predictive for two mutually exclusive sets of lncRNAs. DNA methylation was also a significant predictor, but only when combined with other epigenomic features. CONCLUSIONS: Taken together, our statistical findings suggest that one group of lncRNAs interacts with transcriptionally inactive chromatin through triplex formation, whereas another group interacts with transcriptionally active regions and is involved in the DNA Damage Response (DDR) through formation of R-loops. Curiously, we observed a strong pattern of enrichment of 5-mers in four potentially interacting entities: lncRNA-bound DNA tiles, lncRNAs, miRNA seed sequences, and repeat elements. This finding points to a broad sequence-based network of interactions that may underlie regulation of fundamental cellular functions. Overall, this study reveals diverse sequence and chromatin features related to lncRNA-chromatin interactions, suggesting potential mechanisms of nuclear retention and regulatory function.
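As an illustration of the 5-mer frequency features mentioned in the results, the following is a minimal sketch of how such a feature vector could be computed for one DNA tile. It is not the study's pipeline: the function name `kmer_frequencies`, the example tile, and the choice to skip windows containing non-ACGT characters are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=5):
    """Return normalized frequencies for all 4**k DNA k-mers in seq.

    Hypothetical helper: windows containing characters other than
    A, C, G, T (for example N) are skipped, which is an assumption.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Every possible k-mer gets an entry so that vectors from
    # different tiles have the same length and ordering.
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Hypothetical DNA tile, used only to exercise the function.
tile = "ACGTACGTACGTTTGACA"
features = kmer_frequencies(tile, k=5)
```

A vector like `features` could then be passed, one row per tile, to any classifier that separates lncRNA-bound tiles from unbound ones; the choice of classifier is left open here.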
Subject(s)
Long Noncoding RNA , Long Noncoding RNA/metabolism , Chromatin/genetics , DNA/chemistry , Gene Expression Regulation
ABSTRACT
Neuronal networks are the standard heuristic model today for describing brain activity associated with animal behavior. Recent studies have revealed an extensive role for a completely distinct layer of networked activities in the brain, the gene regulatory network (GRN), which orchestrates expression levels of hundreds to thousands of genes in a behavior-related manner. We examine emerging insights into the relationships between these two types of networks and discuss their interplay in spatial as well as temporal dimensions, across multiple scales of organization. We discuss properties expected of behavior-related GRNs by drawing inspiration from the rich literature on GRNs related to animal development, comparing and contrasting these two broad classes of GRNs as they relate to their respective phenotypic manifestations. Developmental GRNs also represent a third layer of network biology, playing out over a third timescale, which is believed to play a crucial mediatory role between neuronal networks and behavioral GRNs. We end with a special emphasis on social behavior, discuss whether unique GRN organization and cis-regulatory architecture underlie this special class of behavior, and review literature that suggests an affirmative answer.
Subject(s)
Behavior , Brain/physiology , Gene Regulatory Networks , Animals , Brain/growth & development , Developmental Gene Expression Regulation , Humans
ABSTRACT
MOTIVATION: ASTRAL is the current leading method for species tree estimation from phylogenomic datasets (i.e. hundreds to thousands of genes) that addresses gene tree discord resulting from incomplete lineage sorting (ILS). ASTRAL is statistically consistent under the multi-locus coalescent model (MSC), runs in polynomial time, and is able to run on large datasets. Key to ASTRAL's algorithm is the use of dynamic programming to find an optimal solution to the MQSST (maximum quartet support supertree) problem within a constraint space that it computes from the input. Yet ASTRAL can fail to complete within reasonable timeframes on large datasets with many genes and species, because in these cases the constraint space it computes is too large. RESULTS: Here, we introduce FASTRAL, a phylogenomic estimation method. FASTRAL is based on ASTRAL, but uses a different technique for constructing the constraint space. The technique we use to define the constraint space maintains statistical consistency and is polynomial time; thus we prove that FASTRAL is a polynomial time algorithm that is statistically consistent under the MSC. Our performance study on both biological and simulated datasets demonstrates that FASTRAL matches or improves on ASTRAL with respect to species tree topology accuracy (and under high ILS conditions it is statistically significantly more accurate), while being dramatically faster, especially on datasets with large numbers of genes and high ILS, because it uses a significantly smaller constraint space. AVAILABILITY AND IMPLEMENTATION: FASTRAL is available in open-source form at https://github.com/PayamDiba/FASTRAL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
Estrogen Receptor α (ERα) is a major lineage-determining transcription factor (TF) in mammary gland development. Dysregulation of the ERα-mediated transcriptional program results in cancer. Transcriptomic and epigenomic profiling of breast cancer cell lines has revealed large numbers of enhancers involved in this regulatory program, but how these enhancers encode function in their sequence remains poorly understood. A subset of ERα-bound enhancers are transcribed into short bidirectional RNA (enhancer RNA or eRNA), and this property is believed to be a reliable marker of active enhancers. We therefore analyze thousands of ERα-bound enhancers and build quantitative, mechanism-aware models to discriminate eRNAs from non-transcribing enhancers based on their sequence. Our thermodynamics-based models provide insights into the roles of specific TFs in the ERα-mediated transcriptional program, many of which are supported by the literature. We use in silico perturbations to predict TF-enhancer regulatory relationships and integrate these findings with experimentally determined enhancer-promoter interactions to construct a gene regulatory network. We also demonstrate that the model can prioritize breast cancer-related sequence variants while providing mechanistic explanations for their function. Finally, we experimentally validate the model-proposed mechanisms underlying three such variants.
Subject(s)
Breast Neoplasms , Genetic Enhancer Elements , Estrogen Receptor alpha , Humans , Estrogen Receptor alpha/genetics , Estrogen Receptor alpha/metabolism , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Neoplastic Gene Expression Regulation , Genetic Transcription , Gene Regulatory Networks , MCF-7 Cells , Genetic Promoter Regions , Tumor Cell Line
ABSTRACT
Elementary modes (EMs) are steady-state metabolic flux vectors with a minimal set of active reactions. Each EM corresponds to a metabolic pathway. Therefore, studying EMs is helpful for analyzing the production of biotechnologically important metabolites. However, memory requirements for computing EMs may hamper their applicability: in most genome-scale metabolic models, no EM can be computed because the computation runs out of memory. In this study, we present a method for computing randomly sampled EMs. In this approach, a network reduction algorithm based on flux balance methods is used for EM computation. We show that this approach can be used to recover the EMs in medium- and genome-scale metabolic network models, while the EMs are sampled in an unbiased way. The applicability of such results is shown by computing "estimated" control-effective flux values in the Escherichia coli metabolic network.
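To make the definition in the first sentence concrete, the sketch below checks the two properties on a toy network: the steady-state condition (the stoichiometric matrix times the flux vector is zero) and the size of the set of active reactions. The four-reaction network, the matrix `S`, and the helper names are hypothetical and serve only to illustrate the definition; this is not the sampling method presented in the abstract.

```python
def is_steady_state(S, v, tol=1e-9):
    """True if S @ v is (numerically) zero for every metabolite row."""
    return all(
        abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v))) <= tol
        for row in S
    )


def support(v, tol=1e-9):
    """Indices of the active (non-zero flux) reactions in v."""
    return {j for j, flux in enumerate(v) if abs(flux) > tol}


# Hypothetical toy network with metabolites A and B and reactions
# R0: -> A,  R1: A -> B,  R2: A -> B (alternative route),  R3: B ->
S = [
    [1, -1, -1, 0],   # balance of A
    [0, 1, 1, -1],    # balance of B
]

route_1 = [1, 1, 0, 1]   # uses R0, R1, R3
route_2 = [1, 0, 1, 1]   # uses R0, R2, R3
mixture = [2, 1, 1, 2]   # sum of both routes

for v in (route_1, route_2, mixture):
    print(is_steady_state(S, v), sorted(support(v)))
```

All three vectors satisfy the steady-state condition, but the active set of `mixture` strictly contains that of `route_1` and of `route_2`, so in this toy case only the two routes have a minimal set of active reactions.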
Subject(s)
Metabolic Networks and Pathways , Biological Models , Systems Biology/methods , Escherichia coli/genetics , Escherichia coli/metabolism
ABSTRACT
Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, represents unseen data well. This assumption does not hold when samples are obtained from different experimental conditions and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or used in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of a test set from the training set and showed that this measure is predictive of the performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that the performance of different gene expression prediction methods can be better evaluated using this method.
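The contrast between RCV and CCV can be sketched as two ways of building folds. This is a minimal illustration rather than the study's procedure: it assumes cluster labels have already been computed by some clustering step, and the function names, the greedy balancing rule, and the example condition labels are all hypothetical.

```python
import random
from collections import defaultdict


def random_folds(n_samples, n_folds, seed=0):
    """RCV-style folds: samples are shuffled and dealt out at random."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[f::n_folds]) for f in range(n_folds)]


def clustered_folds(cluster_labels, n_folds):
    """CCV-style folds: every cluster is kept whole inside one fold.

    Assumes cluster_labels[i] is the precomputed cluster of sample i.
    Clusters are assigned greedily, largest first, to the currently
    smallest fold (a balancing choice made for this sketch).
    """
    clusters = defaultdict(list)
    for i, label in enumerate(cluster_labels):
        clusters[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return [sorted(fold) for fold in folds]


# Hypothetical samples grouped by experimental condition.
labels = ["heat", "heat", "cold", "cold", "drug", "drug", "ctrl", "ctrl"]
rcv = random_folds(len(labels), n_folds=2)
ccv = clustered_folds(labels, n_folds=2)
```

With `ccv`, no condition appears in both the training and the test side of a split, so the test fold is closer to the "unseen condition" setting described above; with `rcv`, replicates of one condition can land on both sides.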
Subject(s)
Gene Regulatory Networks , Genomics , Biological Models , Algorithms , Computational Biology/methods , Gene Expression Profiling , Genomics/methods , Humans , Neoplasms/genetics , Reproducibility of Results
ABSTRACT
Food-intake control is mediated by a heterogeneous network of different neural subtypes, distributed over various hypothalamic nuclei and other brain structures, in which each subtype can release more than one neurotransmitter or neurohormone. The complexity of the interactions of these subtypes poses a challenge to understanding their specific contributions to food-intake control, and apparent consistencies in the dataset can be contradicted by new findings. For example, the growing consensus that arcuate nucleus neurons expressing Agouti-related peptide (AgRP neurons) promote feeding, while those expressing pro-opiomelanocortin (POMC neurons) suppress feeding, is contradicted by findings that low AgRP neuron activity and high POMC neuron activity can be associated with high levels of food intake. Similarly, the growing consensus that GABAergic neurons in the lateral hypothalamus suppress feeding is contradicted by findings suggesting the opposite. Yet the complexity of the food-intake control network admits many different network behaviors. It is possible that anomalous associations between the responses of certain neural subtypes and feeding are actually consistent with known interactions, but their effect on feeding depends on the responses of the other neural subtypes in the network. We explored this possibility through computational analysis. We made a computer model of the interactions between the hypothalamic and other neural subtypes known to be involved in food-intake control, and optimized its parameters so that model behavior matched observed behavior over an extensive test battery. We then used specialized computational techniques to search the entire model state space, where each state represents a different configuration of the responses of the units (model neural subtypes) in the network. 
We found that the anomalous associations between the responses of certain hypothalamic neural subtypes and feeding are actually consistent with the known structure of the food-intake control network, and we could specify the ways in which the anomalous configurations differed from the expected ones. By analyzing the temporal relationships between different states we identified the conditions under which the anomalous associations can occur, and these stand as model predictions.