Results 1 - 20 of 75
1.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38407414

ABSTRACT

MOTIVATION: Prediction and identification of core promoter elements and transcription factor binding sites is essential for understanding the mechanism of transcription initiation and deciphering the biological activity of a specific locus. Thus, there is a need for an up-to-date tool to detect and curate core promoter elements/motifs in any provided nucleotide sequences. RESULTS: Here, we introduce ElemeNT 2023, a new and enhanced version of the Elements Navigation Tool, which provides novel capabilities for assessing evolutionary conservation and for readily evaluating the quality of high-throughput transcription start site (TSS) datasets, leveraging preferential motif positioning. ElemeNT 2023 is accessible both as a fast web-based tool and via command line (no coding skills are required to run the tool). While this tool is focused on core promoter elements, it can also be used for searching any user-defined motif, including sequence-specific DNA binding sites. Furthermore, ElemeNT's CORE database, which contains predicted core promoter elements around annotated TSSs, is now expanded to cover 10 species, ranging from worms to human. In this applications note, we describe the new workflow and demonstrate a case study using ElemeNT 2023 for core promoter composition analysis of diverse species, revealing motif prevalence and highlighting evolutionary insights. We discuss how this tool facilitates the exploration of uncharted transcriptomic data, appraises TSS quality, and aids in designing synthetic promoters for gene expression optimization. Taken together, ElemeNT 2023 empowers researchers with comprehensive tools for meticulous analysis of sequence elements and gene expression strategies. AVAILABILITY AND IMPLEMENTATION: ElemeNT 2023 is freely available at https://www.juven-gershonlab.org/resources/element-v2023/. The source code and command line version of ElemeNT 2023 are available at https://github.com/OritAdato/ElemeNT.


Subject(s)
Software; Humans; Promoter Regions, Genetic; Protein Binding; Transcription Initiation Site
2.
PLoS Comput Biol ; 17(8): e1009256, 2021 08.
Article in English | MEDLINE | ID: mdl-34383743

ABSTRACT

Metazoan core promoters, which direct the initiation of transcription by RNA polymerase II (Pol II), may contain short sequence motifs termed core promoter elements/motifs (e.g. the TATA box, initiator (Inr) and downstream core promoter element (DPE)), which recruit Pol II via the general transcription machinery. The DPE was discovered and extensively characterized in Drosophila, where it is strictly dependent on both the presence of an Inr and the precise spacing from it. Since the Drosophila DPE is recognized by the human transcription machinery, it is most likely that some human promoters contain a downstream element that is similar, though not necessarily identical, to the Drosophila DPE. However, only a couple of human promoters were shown to contain a functional DPE, and attempts to computationally detect human DPE-containing promoters have mostly been unsuccessful. Using a newly-designed motif discovery strategy based on Expectation-Maximization probabilistic partitioning algorithms, we discovered preferred downstream positions (PDP) in human promoters that resemble the Drosophila DPE. Available chromatin accessibility footprints revealed that Drosophila and human Inr+DPE promoter classes are not only highly structured, but also similar to each other, particularly in the proximal downstream region. Clustering of the corresponding sequence motifs using a neighbor-joining algorithm strongly suggests that canonical Inr+DPE promoters could be common to metazoan species. Using reporter assays we demonstrate the contribution of the identified downstream positions to the function of multiple human promoters. Furthermore, we show that alteration of the spacing between the Inr and PDP by two nucleotides results in reduced promoter activity, suggesting a spacing dependency of the newly discovered human PDP on the Inr. 
Taken together, our strategy identified novel functional downstream positions within human core promoters, supporting the existence of DPE-like motifs in human promoters.


Subject(s)
Genome, Human; Promoter Regions, Genetic; Algorithms; Animals; Base Sequence; Computational Biology; Drosophila melanogaster/genetics; Drosophila melanogaster/metabolism; Gene Expression Regulation; HEK293 Cells; Humans; Models, Genetic; Models, Statistical; RNA Polymerase II/metabolism; Species Specificity; TATA Box; Transcription, Genetic
3.
Nucleic Acids Res ; 48(D1): D65-D69, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31680159

ABSTRACT

The Eukaryotic Promoter Database (EPD), available online at https://epd.epfl.ch, provides accurate transcription start site (TSS) information for promoters of 15 model organisms plus corresponding functional genomics data that can be viewed in a genome browser, queried or analyzed via web interfaces, or exported in standard formats (FASTA, BED, CSV) for subsequent analysis with other tools. Recent work has focused on the improvement of the EPD promoter viewers, which use the UCSC Genome Browser as visualization platform. Thousands of high-resolution tracks for CAGE, ChIP-seq and similar data have been generated and organized into public track hubs. Customized, reproducible promoter views, combining EPD-supplied tracks with native UCSC Genome Browser tracks, can be accessed from the organism summary pages or from individual promoter entries. Moreover, thanks to recent improvements and stabilization of ncRNA gene catalogs, we were able to release promoter collections for certain classes of ncRNAs from human and mouse. Furthermore, we developed automatic computational protocols to assign orphan TSS peaks to downstream genes based on paired-end (RAMPAGE) TSS mapping data, which enabled us to add nearly 9000 new entries to the human promoter collection. Since our last article in this journal, EPD was extended to five more model organisms: rhesus monkey, rat, dog, chicken and Plasmodium falciparum.


Subject(s)
Computational Biology/methods; Databases, Nucleic Acid; Eukaryotic Cells/metabolism; Genomics/methods; Promoter Regions, Genetic; RNA, Untranslated; Animals; Humans; Software; Web Browser
4.
Nat Methods ; 14(3): 316-322, 2017 03.
Article in English | MEDLINE | ID: mdl-28092692

ABSTRACT

Resolving the DNA-binding specificities of transcription factors (TFs) is of critical value for understanding gene regulation. Here, we present a novel, semiautomated protein-DNA interaction characterization technology, selective microfluidics-based ligand enrichment followed by sequencing (SMiLE-seq). SMiLE-seq is neither limited by DNA bait length nor biased toward strong affinity binders; it probes the DNA-binding properties of TFs over a wide affinity range in a fast and cost-effective fashion. We validated SMiLE-seq by analyzing 58 full-length human, mouse, and Drosophila TFs from distinct structural classes. All tested TFs yielded DNA-binding models with predictive power comparable to or greater than that of other in vitro assays. De novo motif discovery on all JUN-FOS heterodimers and several nuclear receptor-TF complexes provided novel insights into partner-specific heterodimer DNA-binding preferences. We also successfully analyzed the DNA-binding properties of uncharacterized human C2H2 zinc-finger proteins and validated several using ChIP-exo.


Subject(s)
CYS2-HIS2 Zinc Fingers/physiology; DNA-Binding Proteins/metabolism; DNA/metabolism; JNK Mitogen-Activated Protein Kinases/metabolism; Proto-Oncogene Proteins c-fos/metabolism; Transcription Factors/metabolism; Animals; Binding Sites/genetics; Computational Biology; Drosophila/genetics; Gene Expression Regulation; High-Throughput Nucleotide Sequencing/methods; Humans; JNK Mitogen-Activated Protein Kinases/genetics; Mice; Microfluidics/methods; Proto-Oncogene Proteins c-fos/genetics; Sequence Analysis, DNA/methods
5.
Bioinformatics ; 35(21): 4440-4441, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31116370

ABSTRACT

SUMMARY: We present SPar-K (Signal Partitioning with K-means), a method to search for archetypical chromatin architectures by partitioning a set of genomic regions characterized by chromatin signal profiles around ChIP-seq peaks and other kinds of functional sites. This method efficiently deals with problems of data heterogeneity, limited misalignment of anchor points and unknown orientation of asymmetric patterns. AVAILABILITY AND IMPLEMENTATION: SPar-K is a C++ program available on GitHub https://github.com/romaingroux/SPar-K and Docker Hub https://hub.docker.com/r/rgroux/spar-k. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software; Chromatin; Chromatin Immunoprecipitation; Genome; Genomics
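
The SPar-K abstract above describes partitioning signal profiles with K-means while tolerating unknown orientation of asymmetric patterns. The following is a minimal Python sketch of that idea only, not the authors' C++ implementation: it handles flips but not shifts, seeds the centroids with the first k profiles, and uses squared Euclidean distance. The function names and the toy profiles are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_orientation(profile, centroid):
    """Return (distance, oriented profile) for the closer of the two orientations."""
    forward = sq_dist(profile, centroid)
    flipped = profile[::-1]
    reverse = sq_dist(flipped, centroid)
    if reverse < forward:
        return reverse, flipped
    return forward, profile


def kmeans_with_flip(profiles, k, iterations=20):
    """K-means in which each profile may be reversed before assignment.

    Assumption: centroids are seeded with the first k profiles, which keeps
    the sketch deterministic; a real tool would use a better seeding scheme.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        oriented = []
        for i, profile in enumerate(profiles):
            candidates = [best_orientation(profile, c) for c in centroids]
            best = min(range(k), key=lambda j: candidates[j][0])
            labels[i] = best
            oriented.append(candidates[best][1])
        new_centroids = []
        for j in range(k):
            members = [o for o, lab in zip(oriented, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Keep an empty cluster's centroid unchanged.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Toy signal profiles (hypothetical): the third is the first one reversed.
TOY_PROFILES = [[4, 2, 1, 0], [0, 3, 3, 0], [0, 1, 2, 4], [0, 3, 4, 0]]
labels, centroids = kmeans_with_flip(TOY_PROFILES, k=2)
```

In this toy run the reversed profile lands in the same cluster as its forward counterpart, which is the behaviour the orientation handling is meant to provide.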
6.
Nucleic Acids Res ; 46(D1): D175-D180, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069466

ABSTRACT

The Mass Genome Annotation (MGA) repository is a resource designed to store published next generation sequencing data and other genome annotation data (such as gene start sites, SNPs, etc.) in a completely standardised format. Each sample has undergone local processing in order to meet the strict MGA format requirements. The original data source, the reformatting procedure and the biological characteristics of the samples are described in an accompanying documentation file manually edited by data curators. 10 model organisms are currently represented: Homo sapiens, Mus musculus, Danio rerio, Drosophila melanogaster, Apis mellifera, Caenorhabditis elegans, Arabidopsis thaliana, Zea mays, Saccharomyces cerevisiae and Schizosaccharomyces pombe. As of today, the resource contains over 24 000 samples. In conjunction with other tools developed by our group (the ChIP-Seq and SSA servers), it allows users to carry out a great variety of analysis tasks with MGA samples, such as making aggregation plots and heat maps for selected genomic regions, finding peak regions, generating custom tracks for visualizing genomic features in a UCSC genome browser window, or downloading chromatin data in a table format suitable for local processing with more advanced statistical analysis software such as R. Home page: http://ccg.vital-it.ch/mga/.


Subject(s)
Databases, Nucleic Acid; Animals; Chromatin Immunoprecipitation; Data Curation; High-Throughput Nucleotide Sequencing; Humans; Internet; Molecular Sequence Annotation; Search Engine
7.
Bioinformatics ; 34(14): 2483-2484, 2018 07 15.
Article in English | MEDLINE | ID: mdl-29514181

ABSTRACT

Summary: Transcription factors regulate gene expression by binding to specific short DNA sequences of 5-20 bp to regulate the rate of transcription of genetic information from DNA to messenger RNA. We present PWMScan, a fast web-based tool to scan server-resident genomes for matches to a user-supplied PWM or transcription factor binding site model from a public database. Availability and implementation: The web server and source code are available at http://ccg.vital-it.ch/pwmscan and https://sourceforge.net/projects/pwmscan, respectively. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics/methods; Position-Specific Scoring Matrices; Regulatory Sequences, Nucleic Acid; Software; Transcription Factors/metabolism; DNA/metabolism; Humans; Protein Binding
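
The PWMScan abstract above describes scanning sequences for matches to a position weight matrix (PWM). As a rough illustration of what such a scan involves, here is a minimal Python sketch. It is not PWMScan's code or interface: the count matrix, the pseudocount, the uniform background of 0.25 and the score threshold are all assumptions chosen for the example, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append(
            {
                b: math.log2((column[b] + pseudocount) / total / background)
                for b in BASES
            }
        )
    return pwm


def scan(pwm, sequence, threshold):
    """Return (start, window, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical count matrix for a 4 bp TATA-like motif (illustration only).
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]

toy_pwm = log_odds_pwm(TOY_COUNTS)
hits = scan(toy_pwm, "GGTATAGG", threshold=0.0)
```

A genome-scale scanner would also score the reverse complement and use a faster matching strategy; the sliding window here is only meant to make the scoring rule concrete.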
8.
Nucleic Acids Res ; 45(D1): D139-D144, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899579

ABSTRACT

SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/.


Subject(s)
Binding Sites; Computational Biology/methods; Databases, Nucleic Acid; Polymorphism, Single Nucleotide; Transcription Factors; Algorithms; Genome, Human; Genomics/methods; Humans; Protein Binding; Transcription Factors/metabolism; Web Browser
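
The SNP2TFBS abstract above states that a SNP's effect on TF binding is estimated from a PWM model, with sites predicted to be abolished, created or changed in affinity. One plausible way to express that comparison is sketched below in Python. This is an assumption-laden illustration rather than the SNP2TFBS procedure: the integer PWM, the threshold, the forward-strand-only scoring and the labelling rule are all hypothetical.

```python
def best_overlapping_score(pwm, sequence, snp_index):
    """Best PWM score among windows that cover the SNP position.

    Assumes the sequence is at least as long as the PWM.
    """
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(sequence) - width)
    scores = [
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(first, last + 1)
    ]
    return max(scores)


def snp_effect(pwm, ref_sequence, snp_index, alt_base, threshold):
    """Classify a SNP by comparing reference and alternate allele scores."""
    alt_sequence = (
        ref_sequence[:snp_index] + alt_base + ref_sequence[snp_index + 1:]
    )
    ref_score = best_overlapping_score(pwm, ref_sequence, snp_index)
    alt_score = best_overlapping_score(pwm, alt_sequence, snp_index)
    ref_hit = ref_score >= threshold
    alt_hit = alt_score >= threshold
    if ref_hit and not alt_hit:
        label = "abolished"
    elif alt_hit and not ref_hit:
        label = "created"
    elif ref_hit and alt_hit and ref_score != alt_score:
        label = "changed"
    else:
        label = "no predicted effect"
    return label, ref_score, alt_score


# Hypothetical integer log-odds matrix for a TATA-like motif (illustration only).
TOY_PWM = [
    {"A": -2, "C": -2, "G": -2, "T": 2},
    {"A": 2, "C": -2, "G": -2, "T": -2},
    {"A": -2, "C": -2, "G": -2, "T": 2},
    {"A": 2, "C": -2, "G": -2, "T": -2},
]

loss = snp_effect(TOY_PWM, "GGTATAGG", snp_index=3, alt_base="C", threshold=6)
gain = snp_effect(TOY_PWM, "GGTCTAGG", snp_index=3, alt_base="A", threshold=6)
```

Here `loss` describes an A-to-C change that breaks the toy motif and `gain` the opposite change that completes it.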
9.
Nucleic Acids Res ; 45(D1): D51-D55, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899657

ABSTRACT

We present an update of the Eukaryotic Promoter Database EPD (http://epd.vital-it.ch), more specifically on the EPDnew division, which contains comprehensive organism-specific transcription start site (TSS) collections automatically derived from next generation sequencing (NGS) data. Thanks to the abundant release of new high-throughput transcript mapping data (CAGE, TSS-seq, GRO-cap) the database could be extended to plant and fungal species. We further report on the expansion of the mass genome annotation (MGA) repository containing promoter-relevant chromatin profiling data and on improvements for the EPD entry viewers. Finally, we present a new data access tool, ChIP-Extract, which enables computational biologists to extract diverse types of promoter-associated data in numerical table formats that are readily imported into statistical analysis platforms such as R.


Subject(s)
Databases, Nucleic Acid; Promoter Regions, Genetic; Animals; Eukaryota/genetics; Fungi/genetics; Humans; Plants/genetics; Transcription Initiation Site
10.
PLoS Comput Biol ; 12(10): e1005144, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27716823

ABSTRACT

The recruitment of RNA-Pol-II to the transcription start site (TSS) is an important step in gene regulation in all organisms. Core promoter elements (CPE) are conserved sequence motifs that guide Pol-II to the TSS by interacting with specific transcription factors (TFs). However, only a minority of animal promoters contain CPEs. It is still unknown how Pol-II selects the TSS in their absence. Here we present a comparative analysis of promoters' sequence composition and chromatin architecture in five eukaryotic model organisms, which shows the presence of common and unique DNA-encoded features used to organize chromatin. Analysis of Pol-II initiation patterns uncovers that, in the absence of certain CPEs, there is a strong correlation between the spread of initiation and the intensity of the 10 bp periodic signal in the nearest downstream nucleosome. Moreover, promoters' primary and secondary initiation sites show a characteristic 10 bp periodicity in the absence of CPEs. We also show that DNA natural variants in the region immediately downstream of the TSS are able to affect both the nucleosome-DNA affinity and Pol-II initiation pattern. These findings support the notion that, in addition to CPE-mediated selection, sequence-induced nucleosome positioning could be a common and conserved mechanism of TSS selection in animals.


Subject(s)
DNA/genetics; Nucleosomes/genetics; Promoter Regions, Genetic/genetics; RNA Polymerase II/genetics; Transcription Initiation Site/physiology; Transcription, Genetic/genetics; Base Sequence; Binding Sites; Computer Simulation; Models, Genetic; Molecular Sequence Data; Transcriptional Activation/genetics
11.
Nucleic Acids Res ; 43(Database issue): D92-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25378343

ABSTRACT

We present an update of EPDnew (http://epd.vital-it.ch), a recently introduced new part of the Eukaryotic Promoter Database (EPD) which has been described in more detail in a previous NAR Database Issue. EPD is an old database of experimentally characterized eukaryotic POL II promoters, which are conceptually defined as transcription initiation sites or regions. EPDnew is a collection of automatically compiled, organism-specific promoter lists complementing the old corpus of manually compiled promoter entries of EPD. This new part is exclusively derived from next generation sequencing data from high-throughput promoter mapping experiments. We report on the recent growth of EPDnew, its extension to additional model organisms and its improved integration with other bioinformatics resources developed by our group, in particular the Signal Search Analysis and ChIP-Seq web servers.


Subject(s)
Databases, Nucleic Acid; Promoter Regions, Genetic; Animals; Humans; Internet; Mice; Software; Transcription Initiation Site
12.
BMC Bioinformatics ; 17 Suppl 1: 4, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26818008

ABSTRACT

BACKGROUND: Understanding the mechanisms by which transcription factors (TF) are recruited to their physiological target sites is crucial for understanding gene regulation. DNA sequence intrinsic features such as predicted binding affinity are often not very effective in predicting in vivo site occupancy and in any case could not explain cell-type specific binding events. Recent reports show that chromatin accessibility, nucleosome occupancy and specific histone post-translational modifications greatly influence TF site occupancy in vivo. In this work, we use machine-learning methods to build predictive models and assess the relative importance of different sequence-intrinsic and chromatin features in the TF-to-target-site recruitment process. METHODS: Our study primarily relies on recent data published by the ENCODE consortium. Five dissimilar TFs assayed in multiple cell-types were selected as examples: CTCF, JunD, REST, GABP and USF2. We used two types of candidate target sites: (a) predicted sites obtained by scanning the whole genome with a position weight matrix, and (b) cell-type specific peak lists provided by ENCODE. Quantitative in vivo occupancy levels in different cell-types were based on ChIP-seq data for the corresponding TFs. In parallel, we computed a number of associated sequence-intrinsic and experimental features (histone modification, DNase I hypersensitivity, etc.) for each site. Machine learning algorithms were then used in a binary classification and regression framework to predict site occupancy and binding strength, for the purpose of assessing the relative importance of different contextual features. RESULTS: We observed striking differences in the feature importance rankings between the five factors tested. PWM-scores were amongst the most important features only for CTCF and REST but of little value for JunD and USF2. Chromatin accessibility and active histone marks are potent predictors for all factors except REST. 
Structural DNA parameters, repressive and gene body associated histone marks are generally of little or no predictive value. CONCLUSIONS: We define a general and extensible computational framework for analyzing the importance of various DNA-intrinsic and chromatin-associated features in determining cell-type specific TF binding to target sites. The application of our methodology to ENCODE data has led to new insights on transcription regulatory processes and may serve as an example for future studies encompassing even larger datasets.


Subject(s)
Algorithms; Cell Lineage/genetics; Chromatin/metabolism; Gene Expression Regulation; Machine Learning; Transcription Factors/metabolism; Binding Sites/genetics; Chromatin/genetics; Chromatin Immunoprecipitation; DNA/genetics; Genome, Human; Histones/metabolism; Humans; Nucleosomes/genetics; Nucleosomes/metabolism; Protein Processing, Post-Translational
13.
BMC Genomics ; 17(1): 938, 2016 11 18.
Article in English | MEDLINE | ID: mdl-27863463

ABSTRACT

BACKGROUND: ChIP-seq and related high-throughput chromatin profiling assays generate ever increasing volumes of highly valuable biological data. To make sense out of it, biologists need versatile, efficient and user-friendly tools for access, visualization and integrative analysis of such data. RESULTS: Here we present the ChIP-Seq command line tools and web server, implementing basic algorithms for ChIP-seq data analysis starting with a read alignment file. The tools are optimized for memory-efficiency and speed thus allowing for processing of large data volumes on inexpensive hardware. The web interface provides access to a large database of public data. The ChIP-Seq tools have a modular and interoperable design in that the output from one application can serve as input to another one. Complex and innovative tasks can thus be achieved by running several tools in a cascade. CONCLUSIONS: The various ChIP-Seq command line tools and web services either complement or compare favorably to related bioinformatics resources in terms of computational efficiency, ease of access to public data and interoperability with other web-based tools. The ChIP-Seq server is accessible at http://ccg.vital-it.ch/chipseq/.


Subject(s)
Chromatin Immunoprecipitation; Computational Biology/methods; Genomics/methods; High-Throughput Nucleotide Sequencing; Software; Web Browser; Molecular Sequence Annotation; User-Computer Interface
14.
BMC Genomics ; 17 Suppl 1: 14, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26819094

ABSTRACT

BACKGROUND: In cell differentiation, a less specialized cell differentiates into a more specialized one, even though all cells in one organism have (almost) the same genome. Epigenetic factors such as histone modifications are known to play a significant role in cell differentiation. We previously introduced cell-type trees to represent the differentiation of cells into more specialized types, a representation that partakes of both ontogeny and phylogeny. RESULTS: We propose a maximum-likelihood (ML) approach to build cell-type trees and show that this ML approach outperforms our earlier distance-based and parsimony-based approaches. We then study the reconstruction of ancestral cell types; since both ancestral and derived cell types can coexist in adult organisms, we propose a lifting algorithm to infer internal nodes. We present results on our lifting algorithm obtained both through simulations and on real datasets. CONCLUSIONS: We show that our ML-based approach outperforms previously proposed techniques such as distance-based and parsimony-based methods. We show our lifting-based approach works well on both simulated and real data.


Subject(s)
Epigenomics; Algorithms; Cell Line; Chromatin Immunoprecipitation; Histones/classification; Histones/metabolism; Humans; Likelihood Functions; Phylogeny
15.
Nucleic Acids Res ; 42(Web Server issue): W436-41, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24792157

ABSTRACT

The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) was created in 1998 as an institution to foster excellence in bioinformatics. It is renowned worldwide for its databases and software tools, such as UniProtKB/Swiss-Prot, PROSITE, SWISS-MODEL, STRING, etc., that are all accessible on ExPASy.org, SIB's Bioinformatics Resource Portal. This article provides an overview of the scientific and training resources SIB has consistently been offering to the life science community for more than 15 years.


Subject(s)
Computational Biology; Databases, Chemical; Software; Biological Evolution; Biostatistics; Drug Design; Genomics; Humans; Internet; Protein Conformation; Proteomics; Systems Biology
16.
Bioinformatics ; 30(17): 2406-13, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-24812341

ABSTRACT

MOTIVATION: We have witnessed an enormous increase in ChIP-Seq data for histone modifications in the past few years. Discovering significant patterns in these data is an important problem for understanding biological mechanisms. RESULTS: We propose probabilistic partitioning methods to discover significant patterns in ChIP-Seq data. Our methods take into account signal magnitude, shape, strand orientation and shifts. We compare our methods with some current methods and demonstrate significant improvements, especially with sparse data. Besides pattern discovery and classification, probabilistic partitioning can serve other purposes in ChIP-Seq data analysis. Specifically, we exemplify its merits in the context of peak finding and partitioning of nucleosome positioning patterns in human promoters. AVAILABILITY AND IMPLEMENTATION: The software and code are available in the supplementary material. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromatin Immunoprecipitation/methods; High-Throughput Nucleotide Sequencing/methods; Histones/metabolism; Sequence Analysis, DNA/methods; Algorithms; Humans; Nucleosomes/metabolism; Probability; Promoter Regions, Genetic; Software
17.
Nucleic Acids Res ; 41(Database issue): D101-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193254

ABSTRACT

UCNEbase (http://ccg.vital-it.ch/UCNEbase) is a free, web-accessible information resource on the evolution and genomic organization of ultra-conserved non-coding elements (UCNEs). It currently covers 4351 such elements in 18 different species. The majority of UCNEs are supposed to be transcriptional regulators of key developmental genes. As most of them occur as clusters near potential target genes, the database is organized along two hierarchical levels: individual UCNEs and ultra-conserved genomic regulatory blocks (UGRBs). UCNEbase introduces a coherent nomenclature for UCNEs reflecting their respective associations with likely target genes. Orthologous and paralogous UCNEs share components of their names and are systematically cross-linked. Detailed synteny maps between the human and other genomes are provided for all UGRBs. UCNEbase is managed by a relational database system and can be accessed by a variety of web-based query pages. As it relies on the UCSC genome browser as visualization platform, a large part of its data content is also available as browser viewable custom track files. UCNEbase is potentially useful to any computational, experimental or evolutionary biologist interested in conserved non-coding DNA elements in vertebrates.


Subject(s)
DNA, Intergenic/chemistry; Databases, Nucleic Acid; Regulatory Elements, Transcriptional; Animals; Base Sequence; Computer Graphics; Conserved Sequence; Evolution, Molecular; Genome; Humans; Internet; Synteny; Terminology as Topic; User-Computer Interface; Vertebrates/genetics
18.
Nucleic Acids Res ; 41(Database issue): D157-64, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193273

ABSTRACT

The Eukaryotic Promoter Database (EPD), available online at http://epd.vital-it.ch, is a collection of experimentally defined eukaryotic POL II promoters which has been maintained for more than 25 years. A promoter is represented by a single position in the genome, typically the major transcription start site (TSS). EPD primarily serves biologists interested in analysing the motif content, chromatin structure or DNA methylation status of co-regulated promoter subsets. Initially, promoter evidence came from TSS mapping experiments targeted at single genes and published in journal articles. Today, the TSS positions provided by EPD are inferred from next-generation sequencing data distributed in electronic form. Traditionally, EPD has been a high-quality database with low coverage. The focus of recent efforts has been to reach complete gene coverage for important model organisms. To this end, we introduced a new section called EPDnew, which is automatically assembled from multiple, carefully selected input datasets. As another novelty, we started to use chromatin signatures in addition to mRNA 5'tags to locate promoters of weakly expressed genes. Regarding user interfaces, we introduced a new promoter viewer which enables users to explore promoter-defining experimental evidence in a UCSC genome browser window.


Subject(s)
Databases, Nucleic Acid; Promoter Regions, Genetic; Animals; Computer Graphics; High-Throughput Nucleotide Sequencing; Humans; Internet; Mice; Nucleotide Motifs; Transcription Initiation Site
19.
Genomics ; 104(2): 79-86, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25058025

ABSTRACT

Little work has been done on analyzing the composition of conserved non-coding elements (CNEs) that are identified by comparisons of two or more genomes and are found to exist in all metazoan genomes. Here we present the analysis of CNEs with a methodology that takes into account word occurrence at various length scales in the form of feature vector representation and rule-based classifiers. We implement our approach on both protein-coding exons and CNEs, originating from human, insect (Drosophila melanogaster) and worm (Caenorhabditis elegans) genomes, that are either identified in the present study or obtained from the literature. Alignment-free feature vector representation of sequences combined with rule-based classification methods leads to successful classification of the different CNE classes. Biologically meaningful results are derived by comparison with the genomic signatures approach, and classification rates for a variety of functional elements of the genomes along with surrogates are presented.


Subject(s)
Caenorhabditis elegans/genetics; DNA, Intergenic/genetics; Drosophila melanogaster/genetics; Sequence Analysis, DNA/methods; Animals; Conserved Sequence/genetics; Evolution, Molecular; Exons; Genomics; Humans; Sequence Alignment
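
The CNE abstract above builds alignment-free feature vectors from word occurrence at several length scales. A minimal Python sketch of one such representation follows; the abstract does not give the exact features or word lengths used, so the choice of relative k-mer frequencies for k = 1 to `max_k`, and the function names, are assumptions made for illustration. The rule-based classifier itself is not shown.

```python
from itertools import product


def word_frequencies(sequence, k):
    """Relative frequency of every DNA word of length k, in lexicographic order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in counts:  # words containing N or other symbols are ignored
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in words]


def feature_vector(sequence, max_k=3):
    """Concatenate word frequencies for k = 1..max_k into one vector."""
    seq = sequence.upper()
    vector = []
    for k in range(1, max_k + 1):
        vector.extend(word_frequencies(seq, k))
    return vector


# Toy example: a 4 bp sequence described by 1-mer and 2-mer frequencies.
toy_vector = feature_vector("acgt", max_k=2)
```

Vectors built this way have a fixed length (4 + 16 + ... + 4^max_k entries) regardless of sequence length, which is what lets sequences be compared or classified without alignment.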
20.
BMC Bioinformatics ; 15: 269, 2014 Aug 08.
Article in English | MEDLINE | ID: mdl-25104072

ABSTRACT

BACKGROUND: In cell differentiation, a cell of a less specialized type becomes one of a more specialized type, even though all cells have the same genome. Transcription factors and epigenetic marks like histone modifications can play a significant role in the differentiation process. RESULTS: In this paper, we present a simple analysis of cell types and differentiation paths using phylogenetic inference based on ChIP-Seq histone modification data. We precisely defined the notion of cell-type trees and provided a procedure for building such trees. We propose new data representation techniques and distance measures for ChIP-Seq data and use these together with standard phylogenetic inference methods to build biologically meaningful cell-type trees that indicate how diverse types of cells are related. We demonstrate our approach on various kinds of histone modifications for various cell types, also using the datasets to explore various issues surrounding replicate data, variability between cells of the same type, and robustness. We use the results to obtain some interesting biological findings, such as important patterns of histone modification changes during the cell differentiation process. CONCLUSIONS: We introduced and studied the novel problem of inferring cell-type trees from histone modification data. The promising results we obtain point the way to a new approach to the study of cell differentiation. We also discuss how cell-type trees can be used to study the evolution of cell types.


Subject(s)
Cell Differentiation/genetics; Epigenomics/methods; Histones/metabolism; Phylogeny; Chromatin Immunoprecipitation; Histones/genetics; Humans; Sequence Analysis, DNA; Transcription Factors/genetics; Transcription Factors/metabolism