ABSTRACT
Recursive splicing (RS) starts by defining an "RS-exon," which is then spliced to the preceding exon, thus creating a recursive 5' splice site (RS-5ss). Previous studies focused on cryptic RS-exons, and now we find that the exon junction complex (EJC) represses RS of hundreds of annotated, mainly constitutive RS-exons. The core EJC factors, and the peripheral factors PNN and RNPS1, maintain RS-exon inclusion by repressing spliceosomal assembly on RS-5ss. The EJC also blocks 5ss located near exon-exon junctions, thus repressing inclusion of cryptic microexons. The prevalence of annotated RS-exons is high in deuterostomes, while the cryptic RS-exons are more prevalent in Drosophila, where EJC appears less capable of repressing RS. Notably, incomplete repression of RS also contributes to physiological alternative splicing of several human RS-exons. Finally, haploinsufficiency of the EJC factor Magoh in mice is associated with skipping of RS-exons in the brain, with relevance to the microcephaly phenotype and human diseases.
Subject(s)
Alternative Splicing/physiology , Exons/physiology , RNA Splice Sites/physiology , Animals , Cell Line , Cell Nucleus , Drosophila , HEK293 Cells , HeLa Cells , Humans , Introns , K562 Cells , Mice , Nuclear Proteins , RNA Precursors/physiology , RNA Splicing/physiology , RNA, Messenger/genetics , RNA-Binding Proteins , Ribonucleoproteins/physiology , Transcriptome/genetics
ABSTRACT
Understanding the regulatory interactions that control gene expression during the development of novel tissues is a key goal of evolutionary developmental biology. Here, we show that Mbnl3 has undergone a striking process of evolutionary specialization in eutherian mammals resulting in the emergence of a novel placental function for the gene. Mbnl3 belongs to a family of RNA-binding proteins whose members regulate multiple aspects of RNA metabolism. We find that, in eutherians, while both Mbnl3 and its paralog Mbnl2 are strongly expressed in placenta, Mbnl3 expression has been lost from nonplacental tissues in association with the evolution of a novel promoter. Moreover, Mbnl3 has undergone accelerated protein sequence evolution leading to changes in its RNA-binding specificities and cellular localization. While Mbnl2 and Mbnl3 share partially redundant roles in regulating alternative splicing, polyadenylation site usage and, in turn, placenta maturation, Mbnl3 has also acquired novel biological functions. Specifically, Mbnl3 knockout (M3KO) alone results in increased placental growth associated with higher Myc expression. Furthermore, Mbnl3 loss increases fetal resource allocation during limiting conditions, suggesting that location of Mbnl3 on the X chromosome has led to its role in limiting placental growth, favoring the maternal side of the parental genetic conflict.
Subject(s)
Placenta , RNA-Binding Proteins , Alternative Splicing/genetics , Animals , Eutheria/genetics , Female , Placenta/metabolism , Pregnancy , RNA/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
ABSTRACT
Although splicing occurs largely co-transcriptionally, the order by which introns are removed does not necessarily follow the order in which they are transcribed. Whereas several genomic features are known to influence whether or not an intron is spliced before its downstream neighbor, multiple questions related to adjacent introns' splicing order (AISO) remain unanswered. Here, we present Insplico, the first standalone software for quantifying AISO that works with both short and long read sequencing technologies. We first demonstrate its applicability and effectiveness using simulated reads and by recapitulating previously reported AISO patterns, which unveiled overlooked biases associated with long read sequencing. We next show that AISO around individual exons is remarkably constant across cell and tissue types and even upon major spliceosomal disruption, and it is evolutionarily conserved between human and mouse brains. We also establish a set of universal features associated with AISO patterns across various animal and plant species. Finally, we used Insplico to investigate AISO in the context of tissue-specific exons, particularly focusing on SRRM4-dependent microexons. We found that the majority of such microexons have non-canonical AISO, in which the downstream intron is spliced first, and we suggest two potential modes of SRRM4 regulation of microexons related to their AISO and various splicing-related features. Insplico is available on gitlab.com/aghr/insplico.
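The notion of quantifying splicing order for a pair of adjacent introns can be illustrated with a minimal sketch. It assumes that, for a single exon, reads have already been sorted into two hypothetical categories: reads in which the upstream intron is spliced while the downstream intron is still present, and the reverse. The function name, the count inputs, and the minimum-read cutoff are illustrative assumptions; they are not taken from Insplico's interface or scoring scheme.

```python
from typing import Optional


def upstream_first_fraction(up_first: int, down_first: int,
                            min_reads: int = 10) -> Optional[float]:
    """Fraction of informative reads in which the upstream intron was removed first.

    up_first:   reads with the upstream intron spliced and the downstream intron retained
    down_first: reads with the downstream intron spliced and the upstream intron retained
    min_reads:  hypothetical cutoff below which no estimate is reported
    """
    if up_first < 0 or down_first < 0:
        raise ValueError("read counts must be non-negative")
    total = up_first + down_first
    if total < min_reads:
        return None  # too few informative reads to estimate an order
    return up_first / total


if __name__ == "__main__":
    # Hypothetical counts for one exon: 6 of 24 informative reads are upstream-first.
    print(upstream_first_fraction(6, 18))
```

Under these assumptions, a value below 0.5 would indicate that the downstream intron tends to be spliced first, which is the pattern the abstract reports for most SRRM4-dependent microexons.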
Subject(s)
Genome , RNA Splicing , Animals , Mice , Humans , Introns/genetics , RNA-Seq , RNA Splicing/genetics , Spliceosomes/genetics , Alternative Splicing , Nerve Tissue Proteins/genetics
ABSTRACT
Pre-mRNA splicing is a critical step of gene expression in eukaryotes. Transcriptome-wide splicing patterns are complex and primarily regulated by a diverse set of recognition elements and associated RNA-binding proteins. The retention and splicing (RES) complex is formed by three different proteins (Bud13p, Pml1p and Snu17p) and is involved in splicing in yeast. However, the importance of the RES complex for vertebrate splicing, the intronic features associated with its activity, and its role in development are unknown. In this study, we have generated loss-of-function mutants for the three components of the RES complex in zebrafish and showed that they are required during early development. The mutants showed a marked neural phenotype with increased cell death in the brain and a decrease in differentiated neurons. Transcriptomic analysis of bud13, snip1 (pml1) and rbmx2 (snu17) mutants revealed a global defect in intron splicing, with strong mis-splicing of a subset of introns. We found these RES-dependent introns were short, rich in GC and flanked by GC depleted exons, all of which are features associated with intron definition. Using these features, we developed and validated a predictive model that classifies RES dependent introns. Altogether, our study uncovers the essential role of the RES complex during vertebrate development and provides new insights into its function during splicing.
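The intron features named in this abstract (intron length, intron GC content, and GC content of the flanking exons) can be computed directly from sequence, and a logistic model is one way to turn them into a class probability. The sketch below is only an illustration under stated assumptions: the feature definitions, function names, weights, and bias are hypothetical and are not the published model or its fitted parameters.

```python
import math
from typing import Dict


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def intron_features(up_exon: str, intron: str, down_exon: str) -> Dict[str, float]:
    """Compute intron length and GC features for one intron and its flanking exons."""
    if not intron:
        raise ValueError("intron sequence must not be empty")
    return {
        "log10_intron_length": math.log10(len(intron)),
        "intron_gc": gc_content(intron),
        # GC over both exons concatenated, i.e. a length-weighted average
        "flanking_exon_gc": gc_content(up_exon + down_exon),
    }


def logistic_score(features: Dict[str, float],
                   weights: Dict[str, float], bias: float) -> float:
    """Logistic function applied to a weighted sum of the features."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    feats = intron_features("ATATAT", "GTGCGCGCAG", "ATTA")
    # Hypothetical weights: shorter, GC-rich introns with GC-poor exons score higher.
    weights = {"log10_intron_length": -1.0, "intron_gc": 2.0, "flanking_exon_gc": -2.0}
    print(feats, logistic_score(feats, weights, bias=0.0))
```

In practice the weights would be fitted on labeled introns and evaluated on held-out data; the signs used here merely mirror the direction of the associations described in the abstract.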
Subject(s)
Carrier Proteins/metabolism , Introns/genetics , RNA Splicing/physiology , Zebrafish Proteins/metabolism , Zebrafish/embryology , Animals , Animals, Genetically Modified , Brain/embryology , Carrier Proteins/genetics , Embryo, Nonmammalian , Female , Gene Expression Regulation, Developmental , Logistic Models , Loss of Function Mutation , Male , Models, Genetic , Zebrafish Proteins/genetics
ABSTRACT
Alternative splicing (AS) generates remarkable regulatory and proteomic complexity in metazoans. However, the functions of most AS events are not known, and programs of regulated splicing remain to be identified. To address these challenges, we describe the Vertebrate Alternative Splicing and Transcription Database (VastDB), the largest resource of genome-wide, quantitative profiles of AS events assembled to date. VastDB provides readily accessible quantitative information on the inclusion levels and functional associations of AS events detected in RNA-seq data from diverse vertebrate cell and tissue types, as well as developmental stages. The VastDB profiles reveal extensive new intergenic and intragenic regulatory relationships among different classes of AS and previously unknown and conserved landscapes of tissue-regulated exons. Contrary to recent reports concluding that nearly all human genes express a single major isoform, VastDB provides evidence that at least 48% of multiexonic protein-coding genes express multiple splice variants that are highly regulated in a cell/tissue-specific manner, and that >18% of genes simultaneously express multiple major isoforms across diverse cell and tissue types. Isoforms encoded by the latter set of genes are generally coexpressed in the same cells and are often engaged by translating ribosomes. Moreover, they are encoded by genes that are significantly enriched in functions associated with transcriptional control, implying they may have an important and wide-ranging role in controlling cellular activities. VastDB thus provides an unprecedented resource for investigations of AS function and regulation.
Subject(s)
Alternative Splicing , Databases, Nucleic Acid , Exons , Gene Regulatory Networks , Protein Isoforms , Animals , Chickens , Humans , Mice , Protein Isoforms/biosynthesis , Protein Isoforms/genetics
ABSTRACT
Summary: Tracking thousands of alternative splicing (AS) events genome-wide makes their downstream analysis computationally challenging and laborious. Here, we present Matt, the first UNIX command-line toolkit with a focus on high-level AS analyses. With 50 commands it facilitates computational AS analyses by (i) expediting repetitive data-preparation tasks, (ii) offering routine high-level analyses, including the extraction of exon/intron features, discriminative feature detection, motif enrichment analysis, and the generation of motif RNA-maps, (iii) improving reproducibility by documenting all analysis steps and (iv) accelerating the implementation of custom analysis pipelines by letting users exploit its modular functionality. Availability and implementation: matt.crg.eu under GNU LGPLv3, together with comprehensive documentation and application examples. Matt is implemented in Perl and R, invokes pdfLaTeX and depends only on Perl Core modules and the R Base package, which simplifies its installation. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Alternative Splicing , Nucleotide Motifs , Software , Computational Biology , Exons , Genome , Reproducibility of Results
ABSTRACT
The spliceosome is the complex molecular machinery that sequentially assembles on eukaryotic messenger RNA precursors to remove introns (pre-mRNA splicing), a physiologically regulated process altered in numerous pathologies. We report transcriptome-wide analyses upon systematic knock down of 305 spliceosome components and regulators in human cancer cells and the reconstruction of functional splicing factor networks that govern different classes of alternative splicing decisions. The results disentangle intricate circuits of splicing factor cross-regulation, reveal that the precise architecture of late-assembling U4/U6.U5 tri-small nuclear ribonucleoprotein (snRNP) complexes regulates splice site pairing, and discover an unprecedented division of labor among protein components of U1 snRNP for regulating exon definition and alternative 5' splice site selection. Thus, we provide a resource to explore physiological and pathological mechanisms of splicing regulation.
Subject(s)
Alternative Splicing , Spliceosomes , Transcriptome , Spliceosomes/metabolism , Humans , RNA Splice Sites , Exons , RNA Splicing Factors/metabolism , RNA Splicing Factors/genetics , Ribonucleoprotein, U1 Small Nuclear/metabolism , Introns , Ribonucleoprotein, U4-U6 Small Nuclear/metabolism , Ribonucleoprotein, U4-U6 Small Nuclear/genetics , Cell Line, Tumor , Ribonucleoprotein, U5 Small Nuclear/metabolism , Gene Regulatory Networks , Gene Knockdown Techniques , RNA Splicing
ABSTRACT
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM).
Subject(s)
Arabidopsis/genetics , Comparative Genomic Hybridization/methods , Markov Chains , Genome, Human , Genome, Plant , Humans , Polymorphism, Genetic
ABSTRACT
Splicing factor 3B subunit 1 (SF3B1) is involved in pre-mRNA branch site recognition and is the target of antitumor-splicing inhibitors. Mutations in SF3B1 are observed in 15% of patients with chronic lymphocytic leukemia (CLL) and are associated with poor prognosis, but their pathogenic mechanisms remain poorly understood. Using deep RNA-sequencing data from 298 CLL tumor samples and isogenic SF3B1 WT and K700E-mutated CLL cell lines, we characterize targets and pre-mRNA sequence features associated with the selection of cryptic 3' splice sites upon SF3B1 mutation, including an event in the MAP3K7 gene relevant for activation of NF-κB signaling. Using the H3B-8800 splicing modulator, we show, for the first time in CLL, cytotoxic effects in vitro in primary CLL samples and in SF3B1-mutated isogenic CLL cell lines, accompanied by major splicing changes and delayed leukemic infiltration in a CLL xenotransplant mouse model. H3B-8800 displayed preferential lethality towards SF3B1-mutated cells and synergism with the BCL2 inhibitor venetoclax, supporting the potential use of SF3B1 inhibitors as a novel therapeutic strategy in CLL.
Subject(s)
Antineoplastic Agents , Leukemia, Lymphocytic, Chronic, B-Cell , Mice , Animals , Leukemia, Lymphocytic, Chronic, B-Cell/drug therapy , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/pathology , RNA Splicing Factors/genetics , RNA Precursors , Phosphoproteins/genetics , Mutation/genetics , RNA Splice Sites , Transcription Factors/genetics
ABSTRACT
Alternative splicing (AS) can vastly expand animal transcriptomes and proteomes. Two main open questions in the field are how AS is regulated across cell/tissue types and disease, and what roles different AS events play. To facilitate AS research, we have created the computational VastDB framework, which comprises a series of complementary software and resources that we describe in this chapter. The VastDB framework is especially designed to aid biomedical researchers without a strong computational background. It offers tools and resources to: (a) quantify AS and identify differentially spliced AS events using RNA-seq data (vast-tools), (b) perform multiple genomic and sequence analyses for investigating AS events (Matt), (c) identify AS events with genomic and regulatory conservation among species (ExOrthist), and (d) help with the biological interpretation of the results, and, ultimately, with the identification of interesting AS events to design wet-lab experiments (VastDB and PastDB).
Subject(s)
Alternative Splicing , Software , Animals , Computational Biology/methods , Exons , Genome , Genomics/methods
ABSTRACT
The persistence of latent HIV reservoirs allows for viral rebound upon antiretroviral therapy interruption, hindering effective HIV-1 cure. Emerging evidence suggests that modulation of innate immune stimulation could impact viral latency and contribute to the clearing of the HIV reservoir. Here, the latency reactivation capacity of a subclass of selective JAK2 inhibitors was characterized as a potential novel therapeutic strategy for HIV-1 cure. Notably, JAK2 inhibitors reversed HIV-1 latency in non-clonal lymphoid and myeloid in vitro models of HIV-1 latency and also ex vivo in CD4+ T cells from ART+ PWH, although this effect was not dependent on JAK2 expression. Immunophenotypic characterization and whole transcriptomic profiling supported reactivation data, showing common gene expression signatures between latency reactivating agents (LRA; JAK2i fedratinib and PMA) in contrast to other JAK inhibitors, but with significantly fewer affected gene sets in the pathway analysis. In-depth evaluation of differentially expressed genes identified a significant upregulation of IRF7 expression despite the blockade of the JAK-STAT pathway and downregulation of proinflammatory cytokines and chemokines. Moreover, IRF7 expression levels positively correlated with HIV latency reactivation capacity of JAK2 inhibitors and also other common LRAs. Collectively, these results represent a promising step towards HIV eradication by demonstrating the potential of innate immune modulation for reducing the viral reservoir through a novel pathway driven by IRF7.
Subject(s)
HIV Infections , HIV-1 , Janus Kinase Inhibitors , Cytokines/pharmacology , HIV Infections/drug therapy , Humans , Janus Kinase Inhibitors/therapeutic use , Janus Kinases , STAT Transcription Factors , Signal Transduction , Virus Activation , Virus Latency
ABSTRACT
Transition from maternal to embryonic transcriptional control is crucial for embryogenesis. However, alternative splicing regulation during this process remains understudied. Using transcriptomic data from human, mouse, and cow preimplantation development, we show that the stage of zygotic genome activation (ZGA) exhibits the highest levels of exon skipping diversity reported for any cell or tissue type. Much of this exon skipping is temporary, leads to disruptive noncanonical isoforms, and occurs in genes enriched for DNA damage response in the three species. Two core spliceosomal components, Snrpb and Snrpd2, regulate these patterns. These genes have low maternal expression at ZGA and increase sharply thereafter. Microinjection of Snrpb/d2 messenger RNA into mouse zygotes reduces the levels of exon skipping at ZGA and leads to increased p53-mediated DNA damage response. We propose that mammalian embryos undergo an evolutionarily conserved, developmentally programmed splicing failure at ZGA that contributes to the attenuation of cellular responses to DNA damage.
Subject(s)
Gene Expression Regulation, Developmental , Zygote , Animals , Cattle , DNA Damage , Embryo, Mammalian , Embryonic Development/genetics , Female , Genome , Mammals/genetics , Mice , Zygote/metabolism
ABSTRACT
Variable order Markov models and variable order Bayesian trees have been proposed for the recognition of cis-regulatory elements, and it has been demonstrated that they outperform traditional models such as position weight matrices, Markov models, and Bayesian trees for the recognition of binding sites in prokaryotes. Here, we study to which degree variable order models can improve the recognition of eukaryotic cis-regulatory elements. We find that variable order models can improve the recognition of binding sites of all the studied transcription factors. To ease a systematic evaluation of different model combinations based on problem-specific data sets and allow genomic scans of cis-regulatory elements based on fixed and variable order Markov models and Bayesian trees, we provide the VOMBATserver to the public community.
Subject(s)
Algorithms , Chromosome Mapping/methods , Models, Genetic , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, DNA/methods , Software , Transcription Factors/genetics , Bayes Theorem , Computer Simulation , Markov Chains , Models, Statistical , Pattern Recognition, Automated/methods
ABSTRACT
Several splicing-modulating compounds, including Sudemycins and Spliceostatin A, display anti-tumor properties. Combining transcriptome, bioinformatic and mutagenesis analyses, we delineate sequence determinants of the differential sensitivity of 3' splice sites to these drugs. Sequences 5' from the branch point (BP) region strongly influence drug sensitivity, with additional functional BPs reducing, and BP-like sequences allowing, drug responses. Drug-induced retained introns are typically shorter, displaying higher GC content and weaker polypyrimidine-tracts and BPs. Drug-induced exon skipping preferentially affects shorter alternatively spliced regions with weaker BPs. Remarkably, structurally similar drugs display both common and differential effects on splicing regulation, SSA generally displaying stronger effects on intron retention, and Sudemycins more acute effects on exon skipping. Collectively, our results illustrate how splicing modulation is exquisitely sensitive to the sequence context of 3' splice sites and to small structural differences between drugs.
Subject(s)
Antineoplastic Agents/pharmacology , RNA Precursors/genetics , RNA Splice Sites/genetics , Ribonucleoprotein, U2 Small Nuclear/genetics , Animals , Cell Survival/drug effects , Cell Survival/genetics , HeLa Cells , Humans , Mice , NIH 3T3 Cells , Pyrans/pharmacology , RNA Splicing/drug effects , RNA Splicing/genetics , Spiro Compounds/pharmacology , Spliceosomes/drug effects , Spliceosomes/genetics
ABSTRACT
Epithelial-mesenchymal interactions are crucial for the development of numerous animal structures. Thus, unraveling how molecular tools are recruited in different lineages to control interplays between these tissues is key to understanding morphogenetic evolution. Here, we study Esrp genes, which regulate extensive splicing programs and are essential for mammalian organogenesis. We find that Esrp homologs have been independently recruited for the development of multiple structures across deuterostomes. Although Esrp is involved in a wide variety of ontogenetic processes, our results suggest ancient roles in non-neural ectoderm and regulating specific mesenchymal-to-epithelial transitions in deuterostome ancestors. However, consistent with the extensive rewiring of Esrp-dependent splicing programs between phyla, most developmental defects observed in vertebrate mutants are related to other types of morphogenetic processes. This is likely connected to the origin of an event in Fgfr, which was recruited as an Esrp target in stem chordates and subsequently co-opted into the development of many novel traits in vertebrates.
Subject(s)
Embryonic Development/genetics , Epithelial-Mesenchymal Transition/physiology , RNA Splicing/physiology , RNA-Binding Proteins/physiology , Animals , Biological Evolution , CRISPR-Cas Systems , Exons/physiology , Female , Gene Expression Regulation, Developmental/physiology , Gene Knockdown Techniques , Lancelets , Male , Mutation , RNA-Binding Proteins/genetics , Sequence Homology, Amino Acid , Signal Transduction/genetics , Strongylocentrotus purpuratus , Urochordata , Zebrafish
ABSTRACT
Cellular responses to starvation are of ancient origin since nutrient limitation has always been a common challenge to the stability of living systems. Hence, signaling molecules involved in sensing or transducing information about limiting metabolites are highly conserved, whereas transcription factors and the genes they regulate have diverged. In eukaryotes the AMP-activated protein kinase (AMPK) functions as a central regulator of cellular energy homeostasis. The yeast AMPK ortholog SNF1 controls the transcriptional network that counteracts carbon starvation conditions by regulating a set of transcription factors. Among those Cat8 and Sip4 have overlapping DNA-binding specificity for so-called carbon source responsive elements and induce target genes upon SNF1 activation. To analyze the evolution of the Cat8-Sip4 controlled transcriptional network we have compared the response to carbon limitation of Saccharomyces cerevisiae to that of Kluyveromyces lactis. In high glucose, S. cerevisiae displays tumor cell-like aerobic fermentation and repression of respiration (Crabtree-positive) while K. lactis has a respiratory-fermentative life-style, respiration being regulated by oxygen availability (Crabtree-negative), which is typical for many yeasts and for differentiated higher cells. We demonstrate divergent evolution of the Cat8-Sip4 network and present evidence that a role of Sip4 in controlling anabolic metabolism has been lost in the Saccharomyces lineage. We find that in K. lactis, but not in S. cerevisiae, the Sip4 protein plays an essential role in C2 carbon assimilation including induction of the glyoxylate cycle and the carnitine shuttle genes. Induction of KlSIP4 gene expression by KlCat8 is essential under these growth conditions and a primary function of KlCat8. Both KlCat8 and KlSip4 are involved in the regulation of lactose metabolism in K. lactis. 
In chromatin-immunoprecipitation experiments we demonstrate binding of both KlSip4 and KlCat8 to selected CSREs and provide evidence that KlSip4 counteracts KlCat8-mediated transcription activation by competing for binding to some but not all CSREs. The finding that the hierarchical relationship of these transcription factors differs between K. lactis and S. cerevisiae and that the sets of target genes have diverged contributes to explaining the phenotypic differences in metabolic life-style.
Subject(s)
Basic-Leucine Zipper Transcription Factors/genetics , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Transcriptional Activation
ABSTRACT
The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algorithms for de novo motif discovery differ in their promoter models, learning approaches, and other aspects, but typically use the statistically simple position weight matrix model for the motif, which assumes statistical independence among all nucleotides. However, there is no clear justification for that assumption, leading to an ongoing debate about the importance of modeling dependencies between nucleotides within binding sites. In the past, modeling statistical dependencies within binding sites has been hampered by the problem of limited data. With the rise of high-throughput technologies such as ChIP-seq, this situation has now changed, making it possible to make use of statistical dependencies effectively. In this work, we investigate the presence of statistical dependencies in binding sites of the human enhancer-blocking insulator protein CTCF by using the recently developed model class of inhomogeneous parsimonious Markov models, which is capable of modeling complex dependencies while avoiding overfitting. These findings lead to a more detailed characterization of the CTCF binding motif, which is only poorly represented by independent nucleotide frequencies at several positions, predominantly at the 3' end.
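The independence assumption of the position weight matrix model can be made concrete with a short sketch. It estimates per-position nucleotide probabilities from aligned binding sites and scores a sequence as a sum of per-position log-probabilities. The example sites, the pseudocount value, and the function names are illustrative assumptions; this is the simple baseline model described in the abstract, not the inhomogeneous parsimonious Markov model used in the study.

```python
import math
from typing import Dict, List

ALPHABET = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Estimate per-position nucleotide probabilities from aligned, equal-length sites."""
    if not sites or len({len(site) for site in sites}) != 1:
        raise ValueError("sites must be non-empty and all of the same length")
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos].upper()] += 1.0
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in ALPHABET})
    return pwm


def log_likelihood(pwm: List[Dict[str, float]], seq: str) -> float:
    """Log-probability of seq under the PWM: one independent term per position."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the motif length")
    return sum(math.log(column[base]) for column, base in zip(pwm, seq.upper()))


if __name__ == "__main__":
    # Made-up aligned sites, used only to exercise the functions.
    sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
    pwm = build_pwm(sites)
    print(log_likelihood(pwm, "ACGT"), log_likelihood(pwm, "TTTT"))
```

Because the score is a plain sum over positions, this model cannot express that a base at one position makes a particular base at another position more or less likely, which is the limitation that dependency-aware models are meant to address.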
Subject(s)
Algorithms , DNA-Binding Proteins/genetics , Models, Genetic , Nucleotide Motifs/genetics , Repressor Proteins/genetics , Base Sequence , Binding Sites/genetics , CCCTC-Binding Factor , Cell Line , Cells, Cultured , DNA-Binding Proteins/metabolism , HeLa Cells , Hep G2 Cells , Humans , K562 Cells , MCF-7 Cells , Markov Chains , Protein Binding , Repressor Proteins/metabolism
ABSTRACT
DNA-binding proteins are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in target regions of genomic DNA. However, de-novo discovery of these binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not yet been solved satisfactorily. Here, we present a detailed description and analysis of the de-novo motif discovery tool Dispom, which has been developed for finding binding sites of DNA-binding proteins that are differentially abundant in a set of target regions compared to a set of control regions. Two additional features of Dispom are its capability of modeling positional preferences of binding sites and adjusting the length of the motif in the learning process. Dispom yields an increased prediction accuracy compared to existing tools for de-novo motif discovery, suggesting that the combination of searching for differentially abundant motifs, inferring their positional distributions, and adjusting the motif lengths is beneficial for de-novo motif discovery. When applying Dispom to promoters of auxin-responsive genes and those of ABI3 target genes from Arabidopsis thaliana, we identify relevant binding motifs with pronounced positional distributions. These results suggest that learning motifs, their positional distributions, and their lengths by a discriminative learning principle may aid motif discovery from ChIP-chip and gene expression data. We make Dispom freely available as part of Jstacs, an open-source Java library that is tailored to statistical sequence analysis. To facilitate extensions of Dispom, we describe its implementation using Jstacs in this manuscript. In addition, we provide a stand-alone application of Dispom at http://www.jstacs.de/index.php/Dispom for instant use.
Subject(s)
DNA-Binding Proteins/genetics , DNA/genetics , Software , Transcription Factors/genetics , Amino Acid Motifs , Binding Sites , Protein Binding
ABSTRACT
Many different computer programs for the prediction of transcription factor binding sites have been developed over the last decades. These programs differ from each other by pursuing different objectives and by taking into account different sources of information. For methods based on statistical approaches, these programs differ at an elementary level from each other by the statistical models used for individual binding sites and flanking sequences and by the learning principles employed for estimating the model parameters. According to our experience, both the models and the learning principles should be chosen with great care, depending on the specific task at hand, but many existing programs do not allow the user to choose them freely. Hence, we developed Jstacs, an object-oriented Java framework for sequence analysis, which allows the user to combine different statistical models and different learning principles in a modular manner with little effort. In this chapter we explain how Jstacs can be used for the recognition of transcription factor binding sites.
Subject(s)
Computational Biology/methods , Transcription Factors/metabolism , Base Sequence , Binding Sites , Humans , Likelihood Functions , Promoter Regions, Genetic/genetics , Receptors, Steroid/metabolism , Reproducibility of Results , Software
ABSTRACT
Multicomponent Passerini and Ugi reactions enable the fast and efficient synthesis of redox-active multifunctional selenium and tellurium compounds, of which some show considerable cytotoxicity against specific cancer cells.