1.
PLoS Comput Biol ; 17(4): e1008909, 2021 04.
Article in English | MEDLINE | ID: mdl-33861755

ABSTRACT

Long regulatory elements (LREs), such as CpG islands, poly(dA:dT) tracts or AU-rich elements, are thought to play key roles in gene regulation but, as opposed to conventional binding sites of transcription factors, few methods have been proposed to formally and automatically characterize them. We present here a computational approach named DExTER (Domain Exploration To Explain gene Regulation) dedicated to the identification of candidate LREs (cLREs) and apply it to the analysis of the genomes of P. falciparum and other eukaryotes. Our analyses show that all tested genomes contain several cLREs that are somewhat conserved across evolution, and that gene expression can be predicted with surprising accuracy on the basis of these long regions only. Regulation by cLREs exhibits very different behaviors depending on species and conditions. In P. falciparum and other Apicomplexan organisms as well as in Dictyostelium discoideum, the process appears highly dynamic, with different cLREs involved at different phases of the life cycle. For multicellular organisms, the same cLREs are involved in all tissues, but a dynamic behavior is observed along embryonic development stages. In P. falciparum, whose genome is known to be strongly depleted of transcription factors, cLREs are predictive of expression with an accuracy above 70%, and our analyses show that they are associated with both transcriptional and post-transcriptional regulation signals. Moreover, we assessed the biological relevance of one LRE discovered by DExTER in P. falciparum using an in vivo reporter assay. The source code (Python) of DExTER is available at https://gite.lirmm.fr/menichelli/DExTER.


Subject(s)
Genome, Protozoan , Plasmodium falciparum/genetics , Regulatory Sequences, Nucleic Acid , Eukaryota/genetics , Gene Expression Regulation , Gene Ontology , Genes, Reporter , Histones/metabolism , RNA Processing, Post-Transcriptional , RNA, Antisense/genetics , RNA, Messenger/genetics , Transcription, Genetic
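
Note on record 1: the abstract states that expression can be predicted from long sequence regions, but it does not describe how DExTER scores them. As a rough illustration only, and not the authors' implementation, the sketch below assumes a region can be summarized by the frequency of a short k-mer inside a fixed window, and that this frequency can be compared with expression through a Pearson correlation. The function names, the window coordinates and the choice of correlation are all assumptions made for this example.

```python
def kmer_frequency(sequence, kmer, start, end):
    """Fraction of start positions in sequence[start:end] where `kmer` occurs.

    Hypothetical helper: the window [start, end) stands in for a candidate
    long region; it is not taken from DExTER.
    """
    region = sequence[start:end].upper()
    positions = len(region) - len(kmer) + 1
    if positions <= 0:
        return 0.0
    hits = sum(1 for i in range(positions) if region[i:i + len(kmer)] == kmer)
    return hits / positions


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def score_region(genes, kmer, start, end):
    """Correlate the k-mer frequency in one window with expression across genes.

    `genes` maps a gene name to (sequence, expression value); toy input only.
    """
    freqs = [kmer_frequency(seq, kmer, start, end) for seq, _ in genes.values()]
    expr = [value for _, value in genes.values()]
    return pearson(freqs, expr)
```

Under these assumptions, `score_region(genes, "AT", 0, 200)` would return a value between -1 and 1 for the first 200 bases of each sequence; scanning many windows and k-mers in this way is one plausible reading of "domain exploration", not a description of the published procedure.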
2.
PLoS Comput Biol ; 14(1): e1005889, 2018 01.
Article in English | MEDLINE | ID: mdl-29293498

ABSTRACT

Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is to integrate additional contextual information into the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed a 14% increase in the number of significant BLAST hits and a 25% increase in the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties shows that they are close to those of standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence.


Subject(s)
Proteins/chemistry , Proteins/genetics , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Computational Biology , Databases, Protein , Plasmodium falciparum/chemistry , Plasmodium falciparum/genetics , Protein Domains , Protozoan Proteins/chemistry , Protozoan Proteins/genetics , Sequence Alignment/statistics & numerical data , Sequence Analysis, Protein/statistics & numerical data
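
Note on record 2: the abstract says that domain co-occurrence is used as contextual evidence for weak BLAST hits, without giving the scoring details. The sketch below is a simplified illustration, not the published method. It assumes that co-occurrence is recorded as a set of unordered domain pairs seen on the same annotated protein, and that a sub-threshold candidate is "supported" when it forms a known pair with a domain already annotated on the target protein. The function names and the domain labels are hypothetical.

```python
from itertools import combinations


def build_cooccurrence(annotated_proteins):
    """Collect unordered domain pairs that appear together on one protein.

    `annotated_proteins` maps a protein name to a list of domain labels.
    """
    pairs = set()
    for domains in annotated_proteins.values():
        for a, b in combinations(sorted(set(domains)), 2):
            pairs.add((a, b))
    return pairs


def supported_by_context(candidate, known_domains, pairs):
    """True if `candidate` has been seen with any domain already on the protein."""
    return any(
        tuple(sorted((candidate, domain))) in pairs
        for domain in known_domains
        if domain != candidate
    )


# Toy reference annotations (hypothetical labels, not Pfam entries).
reference = {
    "protein1": ["DomA", "DomB"],
    "protein2": ["DomB", "DomC"],
}
known_pairs = build_cooccurrence(reference)
```

With this toy reference, a weak hit for `DomA` on a protein that already carries `DomB` would be kept, while the same hit on a protein carrying only `DomC` would not, because that pair was never observed together.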
3.
Genome Biol ; 25(1): 187, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38987807

ABSTRACT

Characterizing the binding preferences of transcription factors (TFs) in different cell types and conditions is key to understanding how they orchestrate gene expression. Here, we develop TFscope, a machine learning approach that identifies sequence features explaining the binding differences observed between two ChIP-seq experiments targeting either the same TF in two conditions or two TFs with similar motifs (paralogous TFs). TFscope systematically investigates differences in the core motif, nucleotide environment and co-factor motifs, and provides the contribution of each key feature in the two experiments. TFscope was applied to > 305 ChIP-seq pairs, and several examples are discussed.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Machine Learning , Transcription Factors , Transcription Factors/metabolism , Binding Sites , Humans , Nucleotide Motifs , Protein Binding
4.
Nat Commun ; 12(1): 3297, 2021 06 02.
Article in English | MEDLINE | ID: mdl-34078885

ABSTRACT

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.


Subject(s)
Microsatellite Repeats , Neural Networks, Computer , Neurodegenerative Diseases/genetics , Transcription Initiation Site , Transcription Initiation, Genetic , A549 Cells , Animals , Base Sequence , Computational Biology/methods , Deep Learning , Enhancer Elements, Genetic , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Mice , Neurodegenerative Diseases/diagnosis , Neurodegenerative Diseases/metabolism , Polymorphism, Genetic , Promoter Regions, Genetic