Results 1 - 20 of 26
1.
Biology (Basel) ; 12(6)2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37372134

ABSTRACT

As the genome carries the historical information of a species' biotic and environmental interactions, analyzing changes in genome structure over time with powerful statistical physics methods (such as entropic segmentation algorithms, fluctuation analysis in DNA walks, or measures of compositional complexity) provides valuable insights into genome evolution. Nucleotide frequencies tend to vary along the DNA chain, resulting in a hierarchically patchy chromosome structure with heterogeneities at different length scales, ranging from a few nucleotides to tens of millions of them. Fluctuation analysis reveals that these compositional structures can be classified into three main categories: (1) short-range heterogeneities (below a few kilobase pairs (Kbp)), primarily attributed to the alternation of coding and noncoding regions, the density of interspersed or tandem repeats, etc.; (2) isochores, spanning tens to hundreds of tens of Kbp; and (3) superstructures, reaching sizes of tens of megabase pairs (Mbp) or even larger. The isochore and superstructure coordinates obtained for the first complete T2T human sequence are now shared in a public database, so that interested researchers can use the T2T isochore data, together with the annotations of different genome elements, to test specific hypotheses about genome structure. As at other levels of biological organization, a hierarchical compositional structure is prevalent in the genome. Once the compositional structure of a genome is identified, various measures can be derived to quantify its heterogeneity. The distribution of segment G+C content has recently been proposed as a new genome signature that has proven useful for comparing complete genomes. Another meaningful measure is the sequence compositional complexity (SCC), which has been used for genome structure comparisons. Lastly, we review recent genome comparisons in species of the ancient phylum Cyanobacteria, conducted by phylogenetic regression of SCC against time, which have revealed positive trends towards higher genome complexity. These findings provide the first evidence for driven progressive evolution of genome compositional structure.
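As a rough illustration of the sequence compositional complexity (SCC) measure reviewed above, the following Python sketch computes it for a segmentation that is already known, using the definition common in the entropic-segmentation literature: the Shannon entropy of the whole sequence minus the length-weighted Shannon entropies of its segments. The segmentation step itself is not shown, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the nucleotide composition of seq."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def sequence_compositional_complexity(seq: str, cuts: list[int]) -> float:
    """SCC = H(S) - sum_i (l_i / L) * H(S_i) for the segments delimited by cuts.

    cuts: sorted cut positions with 0 < cut < len(seq). In practice they would
    come from an entropic segmentation step, which this sketch does not perform.
    """
    L = len(seq)
    edges = [0, *cuts, L]
    weighted = sum(
        (end - start) / L * shannon_entropy(seq[start:end])
        for start, end in zip(edges, edges[1:])
    )
    return shannon_entropy(seq) - weighted


print(sequence_compositional_complexity("AAAATTTT", [4]))  # 1.0
```

A compositionally homogeneous sequence with no cuts has an SCC of zero; cuts that separate compositionally distinct segments raise it, up to the entropy of the whole sequence.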

2.
Int J Mol Sci ; 23(23)2022 Nov 26.
Article in English | MEDLINE | ID: mdl-36499112

ABSTRACT

The tropical common bean (Phaseolus vulgaris L.) is an obligatory short-day plant that requires relaxation of the photoperiod to induce flowering. As in other crops, photoperiod-induced floral initiation depends on the differentiation and maintenance of meristems. In this study, global changes in transcript expression profiles were analyzed in two meristematic tissues, corresponding to the vegetative and inflorescence meristems, of two genotypes with different sensitivities to photoperiod. A total of 3396 differentially expressed genes (DEGs) were identified: 1271 were up-regulated and 1533 down-regulated, whereas 592 showed discordant expression patterns between the two genotypes. Arabidopsis homologues of the DEGs were identified, and most of them had not previously been implicated in the Arabidopsis floral transition, suggesting an evolutionary divergence of the transcriptional regulatory networks controlling flowering in the two species. However, some genes belonging to the photoperiod and flower development pathways were found to have evolutionarily conserved transcriptional profiles. In addition, the floral meristem identity genes APETALA1 and LEAFY, as well as CONSTANS-LIKE 5, were identified as markers to distinguish between the vegetative and reproductive stages. Our data also indicate that down-regulation of photoperiod genes appears to be directly associated with promoting the floral transition under inductive short-day lengths. These findings provide valuable insight into the molecular factors that underlie meristematic development and contribute to understanding photoperiod adaptation in the common bean.


Subject(s)
Arabidopsis; Phaseolus; Arabidopsis/genetics; Phaseolus/genetics; Phaseolus/metabolism; Gene Expression Regulation, Plant; Genes, Plant; Transcriptome; Meristem; Flowers/metabolism; Inflorescence/genetics; Inflorescence/metabolism; Plant Proteins/genetics
3.
Hortic Res ; 2022 Jan 18.
Article in English | MEDLINE | ID: mdl-35039829

ABSTRACT

Trichomes are specialised epidermal cells that develop on the aerial surfaces of almost every terrestrial plant. These structures form physical barriers that, combined with their capacity to synthesise complex molecules, prevent pests from spreading and give trichomes a key role in the defence against herbivores. In this work, the tomato gene HAIRPLUS (HAP), which controls glandular trichome density in tomato plants, was characterised. HAP belongs to a group of proteins involved in histone tail modifications, although some of these proteins also bind methylated DNA. Loss of HAP function promotes epigenomic modifications in the tomato genome, reflected in numerous differentially methylated cytosines, and causes transcriptomic changes in hap mutant plants. Taken together, these findings demonstrate that HAP links epigenome remodelling with multicellular glandular trichome development and reveal that HAP is a valuable genomic tool for pest resistance in tomato breeding.

4.
Sci Rep ; 10(1): 19073, 2020 11 04.
Article in English | MEDLINE | ID: mdl-33149190

ABSTRACT

Progressive evolution, or the tendency towards increasing complexity, is a controversial issue in biology, whose resolution requires a proper measurement of complexity. Genomes are the best entities to address this challenge, as they encode the historical information of a species' biotic and environmental interactions. As a case study, we measured genome sequence complexity in the ancient phylum Cyanobacteria. To arrive at an appropriate measure of genome sequence complexity, we chose metrics that do not decipher biological functionality but that show a strong phylogenetic signal. Using a ridge regression of those metrics against root-to-tip distance, we detected positive trends towards higher complexity in three of them. Lastly, we applied three standard tests to determine whether progressive evolution is passive or driven: the minimum, ancestor-descendant, and sub-clade tests. These results provide evidence for driven progressive evolution at the genome level in the phylum Cyanobacteria.


Subject(s)
Cyanobacteria/genetics; Evolution, Molecular; Genome, Bacterial; Cyanobacteria/classification; Phylogeny
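The ridge regression of a complexity metric on root-to-tip distance described in the Cyanobacteria abstract can be sketched for a single predictor. This is plain ridge regression that treats taxa as independent; the phylogenetic regression used in the study would also account for shared ancestry, which this sketch omits. The data and names are hypothetical.

```python
def ridge_slope(x: list[float], y: list[float], lam: float = 1.0) -> tuple[float, float]:
    """Single-predictor ridge regression with an unpenalized intercept.

    After centering x and y, the penalized slope has the closed form
    sum(xc * yc) / (sum(xc ** 2) + lam); the intercept is then recovered
    from the means.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + lam)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical root-to-tip distances and complexity values for five taxa.
distance = [0.1, 0.2, 0.3, 0.4, 0.5]
complexity = [1.0, 1.2, 1.3, 1.5, 1.6]
slope, intercept = ridge_slope(distance, complexity, lam=0.01)
print(slope > 0)  # True: complexity increases with distance from the root
```

Increasing `lam` shrinks the slope towards zero; with `lam = 0` the estimate reduces to ordinary least squares.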
5.
Plants (Basel) ; 9(4)2020 Apr 22.
Article in English | MEDLINE | ID: mdl-32331491

ABSTRACT

Pod maturation in common bean relies on complex gene expression changes, which in turn are crucial for seed formation and dispersal. Hence, dissecting the transcriptional regulation of pod maturation would be of great significance for breeding programs. In this study, expression changes were comprehensively characterized in two common bean cultivars (ancient and modern) by analyzing the transcriptomes of five pod developmental stages, from fruit setting to maturation. RNA-seq analysis allowed the identification of key genes shared by both accessions, which were homologous to known Arabidopsis maturation genes and showed a similar expression pattern along the maturation process. Changes in gene expression suggested a role in promoting an accelerated breakdown of the photosynthetic and ribosomal machinery, associated with chlorophyll degradation and early activation of alpha-linolenic acid metabolism. A further study of transcription factors and their DNA-binding sites revealed three candidate genes that may play a dominant role in regulating pod maturation. Altogether, this research identifies the first maturation gene set reported in common bean to date and contributes to a better understanding of the dynamic mechanisms of pod maturation, providing potentially useful information for genomics-assisted breeding of common bean yield and pod quality attributes.

6.
Methods Mol Biol ; 1766: 31-47, 2018.
Article in English | MEDLINE | ID: mdl-29605846

ABSTRACT

The promoter regions of around 70% of all genes in the human genome are overlapped by a CpG island (CGI). CGIs have known functions in transcription initiation and distinctive compositional features, such as high G+C content and CpG ratios, compared with bulk DNA. We have previously shown that CGIs manifest as clusters of CpGs in mammalian genomes and can therefore be detected using clustering methods. These techniques have several advantages over sliding-window approaches, which apply compositional properties as thresholds. In this protocol, we show how to determine the local (CpG islands) and global (distance distribution) clustering properties of CG dinucleotides and how to generalize this analysis to any k-mer or combination of k-mers. In addition, we illustrate how to easily intersect the output of a CpG island prediction algorithm with our methylation database to detect differentially methylated CGIs. The analysis is given as a step-by-step protocol, and all necessary programs are provided in a virtual machine; alternatively, the software can be downloaded and easily installed.


Subject(s)
CpG Islands/genetics; DNA Methylation; Genome, Human/genetics; Animals; Base Composition; Base Sequence; DNA/chemistry; DNA/genetics; DNA/metabolism; Humans; Promoter Regions, Genetic/genetics; Software; Transcription Initiation, Genetic
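Generalizing the CpG clustering analysis to any k-mer starts with locating every copy of the word and collecting the distances between consecutive copies, the "distance distribution" mentioned in the protocol abstract. The sketch below assumes overlapping occurrences and start-to-start distances; these conventions, and the function names, are illustrative assumptions rather than the protocol's own.

```python
from collections import Counter


def kmer_positions(seq: str, word: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of word."""
    seq, word = seq.upper(), word.upper()
    k = len(word)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == word]


def distance_distribution(positions: list[int]) -> Counter:
    """Histogram of distances between consecutive copies, measured start to start."""
    return Counter(b - a for a, b in zip(positions, positions[1:]))


seq = "ACGCGTTACGAAAACGCG"
pos = kmer_positions(seq, "CG")
print(pos)                         # [1, 3, 8, 14, 16]
print(distance_distribution(pos))  # Counter({2: 2, 5: 1, 6: 1})
```

Passing a different `word` (e.g. "CAG") applies the same analysis to any k-mer; comparing the resulting histogram with that of a shuffled sequence is one way to judge whether short distances are over-represented.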
7.
Methods Mol Biol ; 1580: 149-174, 2017.
Article in English | MEDLINE | ID: mdl-28439833

ABSTRACT

High-throughput sequencing (HTS) data for small RNAs (noncoding RNA molecules that are 20-250 nucleotides in length) can now be routinely generated by minimally equipped wet laboratories; however, the bottleneck in HTS-based research has now shifted to the analysis of such huge amounts of data. One reason is that many analysis types require a Linux environment, but computers, system administrators, and bioinformaticians represent additional costs that often cannot be afforded by small to mid-sized groups or laboratories. Web servers are an alternative that can be used if the data are not subject to privacy issues (which is very often an important concern with medical data). In any case, however, they are less flexible than stand-alone programs, which limits the number of workflows and analysis types that can be carried out. We show in this protocol how virtual machines can be used to overcome those problems and limitations. sRNAtoolboxVM is a virtual machine that can be executed on all common operating systems through virtualization programs such as VirtualBox or VMware, providing the user with a large number of preinstalled programs, such as sRNAbench for small RNA analysis, without the need to maintain additional servers and/or operating systems.


Subject(s)
Genomics/methods; High-Throughput Nucleotide Sequencing/methods; RNA, Small Untranslated/genetics; Software; Animals; Databases, Genetic; Gene Expression Profiling/methods; Humans; RNA, Small Untranslated/analysis; User-Computer Interface
8.
Nucleic Acids Res ; 45(D1): D97-D103, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794041

ABSTRACT

The 2017 update of NGSmethDB stores whole-genome methylomes generated from short-read data sets obtained by whole-genome bisulfite sequencing (WGBS). To generate high-quality methylomes, stringent quality controls were integrated with third-party software, together with a two-step mapping process to exploit the advantages of the new genome assembly models. All samples were profiled under constant parameter settings, thus enabling comparative downstream analyses. Besides a significant increase in the number of samples, NGSmethDB now includes two additional data types, which are a valuable resource for the discovery of epigenetic methylation biomarkers: (i) differentially methylated single cytosines; and (ii) methylation segments (i.e. genome regions of homogeneous methylation). The NGSmethDB back-end is now based on MongoDB, a NoSQL hierarchical database using JSON-formatted documents and dynamic schemas, thus accelerating comparative sample analyses. Besides conventional database dumps, track hubs were implemented, which improved database access, visualization in genome browsers and comparative analyses with third-party annotations. In addition, the database can also be accessed through a RESTful API. Lastly, a Python client and a multiplatform virtual machine allow program-driven access from the user's desktop. This way, private methylation data can be compared to NGSmethDB without the need to upload them to public servers. Database website: http://bioinfo2.ugr.es/NGSmethDB.


Subject(s)
DNA Methylation; Databases, Nucleic Acid; Animals; Cytosine/metabolism; Genome; Humans
9.
Nucleic Acids Res ; 43(W1): W467-73, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-26019179

ABSTRACT

Small RNA research is a rapidly growing field. Apart from microRNAs, which are important regulators of gene expression, other types of functional small RNA molecules have been reported in animals and plants. MicroRNAs are important in host-microbe interactions, and parasite microRNAs might modulate the innate immunity of the host. Furthermore, small RNAs can be detected in bodily fluids, making them attractive non-invasive biomarker candidates. Given the broad interest in small RNAs, and in microRNAs in particular, the scientific community needs a large number of bioinformatics-aided analysis types. To facilitate integrated sRNA research, we developed sRNAtoolbox, a set of independent but interconnected tools for expression profiling from high-throughput sequencing data, consensus differential expression, target gene prediction, visual exploration in a genome context as a function of read length, gene list analysis and BLAST search of unmapped reads. All tools can be used independently or for the exploration and downstream analysis of sRNAbench results. Workflows such as the prediction of consensus target genes of parasite microRNAs in the host, followed by the detection of enriched pathways, can be easily established. The web interface interconnecting all these tools is available at http://bioinfo5.ugr.es/srnatoolbox.


Subject(s)
RNA, Small Untranslated/metabolism; Software; Gene Expression Profiling; Humans; Internet; MicroRNAs/metabolism
10.
Comput Biol Chem ; 53 Pt A: 71-8, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25182383

ABSTRACT

Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e. gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the resulting clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located their clusters in the human genome, quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, 27% of elements were found in clusters, although there is considerable variation among categories. Genes form the lowest number of clusters, but these are the longest, both in bp and in average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, compared with DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found that the clusters generated by GenomeCluster can in turn be clustered into high-level super-clusters. The observation of 'clusters-within-clusters' parallels the 'domains within domains' phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering.


Subject(s)
Algorithms; Chromosome Mapping/statistics & numerical data; Genome, Human; Multigene Family; Transcription Factors/genetics; Alu Elements; Binding Sites; Chromosome Mapping/methods; CpG Islands; Enhancer Elements, Genetic; Exons; Genes, Regulator; Humans; Introns; Long Interspersed Nucleotide Elements; Polymorphism, Single Nucleotide
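The clustering level defined in the GenomeCluster abstract (the proportion of elements that fall in clusters) can be illustrated with a simplified chaining rule: consecutive elements whose gap is at most a fixed distance are joined, and chains of two or more elements count as clusters. GenomeCluster assigns statistical significance to clusters instead, so the fixed `max_gap` cutoff and the function names here are hypothetical stand-ins.

```python
def cluster_elements(intervals: list[tuple[int, int]],
                     max_gap: int) -> list[list[tuple[int, int]]]:
    """Chain (start, end) elements whose gap to the previous element is <= max_gap.

    Only chains with at least two elements are returned as clusters.
    """
    groups: list[list[tuple[int, int]]] = []
    for start, end in sorted(intervals):
        if groups and start - groups[-1][-1][1] <= max_gap:
            groups[-1].append((start, end))
        else:
            groups.append([(start, end)])
    return [g for g in groups if len(g) >= 2]


def clustering_level(intervals: list[tuple[int, int]], max_gap: int) -> float:
    """Proportion of elements that belong to some cluster."""
    clustered = sum(len(c) for c in cluster_elements(intervals, max_gap))
    return clustered / len(intervals)


elements = [(0, 10), (15, 20), (100, 110), (500, 510), (515, 520), (522, 530)]
print(len(cluster_elements(elements, max_gap=10)))       # 2
print(round(clustering_level(elements, max_gap=10), 3))  # 0.833
```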
11.
Nucleic Acids Res ; 42(Database issue): D53-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24271385

ABSTRACT

The updated release of 'NGSmethDB' (http://bioinfo2.ugr.es/NGSmethDB) is a repository for single-base whole-genome methylome maps for the best-assembled eukaryotic genomes. Short-read data sets from NGS bisulfite-sequencing projects of cell lines, fresh and pathological tissues are first pre-processed and aligned to the corresponding reference genome, and then the cytosine methylation levels are profiled. One major improvement is the application of a single bioinformatics protocol to all data sets, thereby ensuring that all values are comparable with each other. We implemented stringent quality controls to minimize important error sources, such as sequencing errors, bisulfite failures, clonal reads or single nucleotide variants (SNVs). This leads to reliable, high-quality methylomes, all obtained under uniform settings. Another significant improvement is the parallel detection of SNVs, which might be crucial for many downstream analyses (e.g. SNVs and differential-methylation relationships). A next-generation methylation browser allows fast and smooth scrolling and zooming, thus speeding data download/upload while requiring fewer server resources. Several data mining tools allow the comparison/retrieval of methylation levels in different tissues or genome regions. NGSmethDB methylomes are also available as native tracks through a UCSC hub, which allows comparison with a wide range of third-party annotations, in particular phenotype or disease annotations.


Subject(s)
DNA Methylation; Databases, Nucleic Acid; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Animals; Cell Line; Cytosine/metabolism; Epigenesis, Genetic; Genetic Variation; Genome; Genomics; High-Throughput Nucleotide Sequencing/standards; Humans; Internet; Mice; Sequence Alignment; Sequence Analysis, DNA/standards
12.
Biomed Res Int ; 2013: 709042, 2013.
Article in English | MEDLINE | ID: mdl-24205506

ABSTRACT

Hypomethylated, CpG-rich DNA segments (CpG islands, CGIs) are epigenome markers involved in key biological processes. Aberrant methylation is implicated in several disorders, such as cancer, immunodeficiency, or centromere instability. Furthermore, methylation differences at promoter regions between human and chimpanzee are strongly associated with genes involved in neurological/psychological disorders and cancers. Therefore, evolutionary comparative analyses of CGIs can provide insights into the functional role of these epigenome markers in both health and disease. Given the lack of specific tools, we developed CpGislandEVO. Briefly, we first compiled a database of statistically significant CGIs for the best-assembled mammalian genome sequences available to date. Second, by means of a coupled browser front-end, we focused on the CGIs overlapping orthologous genes extracted from OrthoDB, thus ensuring that comparisons are made between CGIs located on truly homologous genome segments. This allows the main compositional features of homologous CGIs to be compared. Finally, to facilitate nucleotide comparisons, we lifted genome coordinates between assemblies from different species, which enables the analysis of sequence divergence by directly counting the nucleotide substitutions and indels occurring between homologous CGIs. The resulting CpGislandEVO database, linking together CGIs and single-cytosine DNA methylation data from several mammalian species, is freely available at our website.


Subject(s)
CpG Islands; Databases, Nucleic Acid; Evolution, Molecular; Genome, Human; Animals; Epigenomics/methods; Humans; Internet; Mice; Primates; Rats
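Counting substitutions and indels between homologous CGIs presupposes a pairwise alignment of the two segments. Assuming the alignment is given as two equal-length strings with '-' for gaps (a hypothetical input format), the counts can be obtained as below; each run of consecutive gap columns in the same sequence is counted as one indel event, which is one of several possible conventions.

```python
def count_differences(aln_a: str, aln_b: str) -> tuple[int, int]:
    """Return (substitutions, indel events) for two aligned sequences."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = indels = 0
    previous_gap = None  # which sequence ("a" or "b") had a gap in the last column
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: not informative, skip
        if a == "-" or b == "-":
            gap_in = "a" if a == "-" else "b"
            if gap_in != previous_gap:
                indels += 1  # a new indel event starts in this column
            previous_gap = gap_in
            continue
        previous_gap = None  # aligned bases close any open gap run
        if a != b:
            substitutions += 1
    return substitutions, indels


print(count_differences("ACG-TA", "ATGCT-"))  # (1, 2)
```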
13.
F1000Res ; 2: 217, 2013.
Article in English | MEDLINE | ID: mdl-24627790

ABSTRACT

Whole-genome methylation profiling at single-cytosine resolution is now feasible thanks to high-throughput sequencing techniques combined with bisulfite treatment of DNA. To obtain the methylation value of each individual cytosine, the bisulfite-treated sequence reads are first aligned to a reference genome, and the methylation levels are then profiled from the alignments. A huge effort has been made to align the reads quickly and correctly, and many different algorithms and programs have been created for this purpose. However, the second step is just as crucial and non-trivial, yet much less attention has been paid to the final inference of the methylation states. Important error sources exist, such as sequencing errors, bisulfite failure, clonal reads, and single nucleotide variants. We developed MethylExtract, a user-friendly tool to: i) generate high-quality, whole-genome methylation maps and ii) detect sequence variation within the same sample preparation. The program is implemented as a single script and takes into account all major error sources. MethylExtract detects variation (single nucleotide variants, SNVs) in a similar way to VarScan, a very sensitive method extensively used in SNV and genotype calling based on non-bisulfite-treated reads. The usefulness of MethylExtract is shown by means of extensive benchmarking based on artificial bisulfite-treated reads and a comparison to a recently published method, called Bis-SNP. MethylExtract is able to detect SNVs in high-throughput sequencing experiments of bisulfite-treated DNA while generating high-quality methylation maps. This simultaneous detection of DNA methylation and sequence variation is crucial for many downstream analyses, for example when deciphering the impact of SNVs on differential methylation. A feature exclusive to MethylExtract, in comparison with existing software, is the possibility of assessing bisulfite failure statistically. The source code, tutorial and artificial bisulfite datasets are available at http://bioinfo2.ugr.es/MethylExtract/ and http://sourceforge.net/projects/methylextract/, and are also permanently accessible from 10.5281/zenodo.7144.
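Per-cytosine methylation levels from bisulfite reads are commonly estimated as the fraction of reads that still show C (unconverted, methylated) among reads showing C or T at that position. The abstract says MethylExtract assesses bisulfite failure statistically but does not describe how; the sketch below assumes a one-sided binomial test of the C count against a hypothetical non-conversion rate and should not be read as the tool's actual procedure. Function names are hypothetical.

```python
from math import comb


def methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction of reads reporting methylation (C) at a cytosine position."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no informative reads")
    return c_reads / total


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def beyond_bisulfite_failure(c_reads: int, t_reads: int,
                             failure_rate: float, alpha: float = 0.05) -> bool:
    """True if the C count is unlikely to be explained by non-conversion alone.

    failure_rate is an assumed per-read probability that an unmethylated C
    escapes bisulfite conversion (e.g. estimated from a spike-in control).
    """
    n = c_reads + t_reads
    return binomial_upper_tail(c_reads, n, failure_rate) < alpha


print(methylation_level(9, 3))                # 0.75
print(beyond_bisulfite_failure(9, 3, 0.01))   # True
print(beyond_bisulfite_failure(0, 12, 0.01))  # False
```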

14.
J Theor Biol ; 297: 127-36, 2012 Mar 21.
Article in English | MEDLINE | ID: mdl-22226985

ABSTRACT

Relevant words in literary texts (key words) are known to be clustered, while common words are randomly distributed. Given the clustered distribution of many functional genome elements, we hypothesize that the biological text par excellence, the DNA sequence, might behave in the same way: k-length words (k-mers) with a clear function may be spatially clustered along the one-dimensional chromosome sequence, while less important, non-functional words may be randomly distributed. To explore this linguistic analogy, we calculate a clustering coefficient for each k-mer (k=2-9 bp) in human and mouse chromosome sequences and then check whether clustered words are enriched in the functional part of the genome. First, we found a general positive trend relating clustering level and word enrichment within exons and Transcription Factor Binding Sites (TFBSs), while a much weaker relation exists for repeats, and no relation at all exists for introns. Second, we found that 38.45% of the 200 top-clustered 8-mers, but only 7.70% of the non-clustered words, are represented in known motif databases. Third, enrichment/depletion experiments show that highly clustered words are significantly enriched in exons and TFBSs, while they are depleted in introns and repetitive DNA. Considering exons and TFBSs together, 1417 (72.26%) of the top-clustered 8-mers in human and 1385 (72.97%) in mouse showed a statistically significant association with either exons or TFBSs, thus strongly supporting the link between word clustering and biological function. Lastly, we identified a subset of clustered, diagnostic words that are enriched in exons but depleted in introns, and might therefore help to discriminate between these two gene regions. The clustering of DNA words thus appears to be a novel principle for detecting functionality in genome sequences. As evolutionary conservation is not a prerequisite, the proof of principle described here may open new ways to detect species-specific functional DNA sequences and to improve gene and promoter predictions, thus contributing to the quest for function in the genome.


Subject(s)
DNA/genetics; Models, Genetic; Algorithms; Animals; Base Sequence; Binding Sites/genetics; Cluster Analysis; Exons/genetics; Humans; Introns/genetics; Linguistics; Mice; Species Specificity; Transcription Factors/genetics
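The abstract does not define its clustering coefficient. As a stand-in, the sketch below uses a measure from the keyword-detection work on literary texts that the abstract alludes to: the standard deviation of the distances between consecutive occurrences divided by their mean. Evenly spaced words give 0, rare randomly placed words give values close to 1, and clustered words exceed 1. It illustrates the idea only and is not necessarily the coefficient computed in the study; the function name is hypothetical.

```python
from statistics import mean, pstdev


def sigma_clustering(positions: list[int]) -> float:
    """Std/mean of distances between consecutive occurrences of a word."""
    if len(positions) < 3:
        raise ValueError("need at least three occurrences")
    distances = [b - a for a, b in zip(positions, positions[1:])]
    return pstdev(distances) / mean(distances)


even = [0, 10, 20, 30, 40, 50]
clustered = [0, 1, 2, 100, 101, 102, 200, 201, 202]
print(sigma_clustering(even))           # 0.0
print(sigma_clustering(clustered) > 1)  # True
```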
15.
Algorithms Mol Biol ; 6: 2, 2011 Jan 24.
Article in English | MEDLINE | ID: mdl-21261981

ABSTRACT

BACKGROUND: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. RESULTS: We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method in a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. CONCLUSIONS: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method in a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php, including additional features such as the detection of co-localization with gene regions and the annotation enrichment tool for functional analysis of overlapped genes.

16.
Nucleic Acids Res ; 39(Database issue): D75-9, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20965971

ABSTRACT

Next-generation sequencing (NGS) together with bisulphite conversion allows the generation of whole-genome methylation maps at single-cytosine resolution. This makes it possible to study the absence of methylation in a particular genome region over a range of tissues, differential tissue methylation, or the changes occurring under pathological conditions. However, no existing database fully addresses these requirements. We propose here NGSmethDB (http://bioinfo2.ugr.es/NGSmethDB/gbrowse/) for the storage and retrieval of methylation data derived from NGS. Two cytosine methylation contexts (CpG and CAG/CTG) are considered. Through a browser interface coupled to a MySQL backend and several data mining tools, the user can search for methylation states in a set of tissues, retrieve methylation values for a set of tissues in a given chromosomal region, or display the methylation of promoters among different tissues. NGSmethDB is currently populated with human, mouse and Arabidopsis data, but other methylomes will be incorporated through an automatic pipeline as soon as new data become available. Dump downloads for three coverage levels (1, 5 or 10 reads) are available. NGSmethDB will be useful for experimental researchers, as well as for bioinformaticians, who might use the data as input for further research.


Subject(s)
Cytosine/analysis; DNA Methylation; Databases, Nucleic Acid; Animals; Chromosomes/chemistry; CpG Islands; Data Mining; Genomics; Humans; Mice; Promoter Regions, Genetic; Sequence Analysis, DNA; Software; User-Computer Interface
17.
BMC Genomics ; 11: 327, 2010 May 26.
Article in English | MEDLINE | ID: mdl-20500903

ABSTRACT

BACKGROUND: Unmethylated stretches of CpG dinucleotides (CpG islands) are an outstanding property of mammalian genomes. Conventionally, these regions are detected by sliding-window approaches using %G + C, the CpG observed/expected ratio and length thresholds as the main parameters. More recently, clustering methods have been used to detect clusters of CpG dinucleotides directly, as a statistical property of the genome sequence. RESULTS: We compare sliding-window and clustering (i.e. CpGcluster) predictions by applying new ways to detect the putative functionality of CpG islands. Analyzing the co-localization with several genomic regions as a function of window size vs. statistical significance (p-value), CpGcluster shows a higher overlap with promoter regions and highly conserved elements, while showing less overlap with Alu retrotransposons. The major difference in the predictions was found for short islands (CpG islets), often exclusively predicted by CpGcluster. Many of these islets seem to be functional, as they are unmethylated, highly conserved and/or located within the promoter region. Finally, we show that window-based islands can spuriously overlap several differentially regulated promoters as well as different methylation domains, which might indicate a wrong merge of several CpG islands into a single, very long island. The shorter CpGcluster islands seem to be much more specific with respect to the overlap with alternative transcription start sites or the detection of homogeneous methylation domains. CONCLUSIONS: The main difference between sliding-window approaches and clustering methods is the length of the predicted islands. Short islands, often differentially methylated, are almost exclusively predicted by CpGcluster. This suggests that CpGcluster may be the algorithm of choice to explore the function of these short, but putatively functional CpG islands.


Subject(s)
Algorithms; CpG Islands; Alu Elements/genetics; Cluster Analysis; Conserved Sequence/genetics; DNA Methylation/genetics; Evolution, Molecular; Humans; Promoter Regions, Genetic/genetics
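For comparison with CpGcluster, the conventional sliding-window criteria named in the abstract (%G+C, CpG observed/expected ratio, and length) can be written as a per-window test. The defaults below, 200 bp windows with G+C above 50% and observed/expected above 0.6, are the classic Gardiner-Garden and Frommer thresholds, taken here as assumptions; published CGI finders use a variety of parameter sets, and the function names are hypothetical.

```python
def window_stats(window: str) -> tuple[float, float]:
    """Return (G+C fraction, CpG observed/expected ratio) for one window.

    observed/expected = CpG count * N / (C count * G count)
    """
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cgi_window(window: str, min_len: int = 200,
                  min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Sliding-window CGI criteria with assumed default thresholds."""
    if len(window) < min_len:
        return False
    gc_fraction, obs_exp = window_stats(window)
    return gc_fraction > min_gc and obs_exp > min_obs_exp


def scan(seq: str, size: int = 200, step: int = 1) -> list[int]:
    """Start positions of the windows that satisfy the criteria."""
    return [i for i in range(0, len(seq) - size + 1, step)
            if is_cgi_window(seq[i:i + size], min_len=size)]


print(window_stats("CG" * 100))  # (1.0, 2.0)
```

Note how the fixed window size builds a minimum length into every prediction, which is the limitation the abstract contrasts with CpGcluster's short islands.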
18.
BMC Evol Biol ; 8: 107, 2008 Apr 11.
Article in English | MEDLINE | ID: mdl-18405379

ABSTRACT

BACKGROUND: The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences and recently developed computational methods that directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. RESULTS: The local variations in the scaling exponent of Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, shuffling experiments on selected genome regions identified computationally determined, isochore-like regions as the biological source of the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fish (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeast (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. CONCLUSION: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.


Subject(s)
Genome/genetics , Isochores/genetics , Phylogeny , Animals , Arabidopsis/genetics , Computational Biology , Dogs , Genome, Fungal/genetics , Genome, Human/genetics , Genome, Plant/genetics , Humans , Mice , Pan troglodytes/genetics , Rats , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA , Species Specificity
19.
BMC Bioinformatics ; 7: 446, 2006 Oct 12.
Article in English | MEDLINE | ID: mdl-17038168

ABSTRACT

BACKGROUND: Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by thresholds on length, CpG fraction and G+C content. RESULTS: Given the higher frequency of CpG dinucleotides at CGIs compared with bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented that is based on the physical distance between neighboring CpGs on the chromosome and directly predicts clusters of CpGs without depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy while showing the lowest rate of false-positive predictions. Since no minimum-length threshold is required, CpGcluster can find short but fully functional CGIs that other algorithms usually miss. The CGIs predicted by CpGcluster show the lowest degree of overlap with Alu retrotransposons and, at the same time, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs that overlap the Transcription Start Site (TSS) show the highest statistical significance compared with islands in other genome locations, which qualifies CpGcluster as a valuable tool for discriminating functional CGIs from the remaining islands in the bulk genome. CONCLUSION: CpGcluster uses only integer arithmetic, making it a fast and computationally efficient algorithm for predicting statistically significant clusters of CpG dinucleotides.
Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which is appropriate for a genomic feature whose functionality rests precisely on CpG dinucleotides. Unlike previous algorithms, CpGcluster has a single search parameter: the distance between two consecutive CpGs. Therefore, none of the main statistical properties of CpG islands (G+C content, CpG fraction or length threshold) is needed as a search parameter, which may lead to the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.
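The distance-based clustering described in this abstract can be illustrated in plain Python. This is a sketch under stated assumptions, not the published CpGcluster code: the median inter-CpG distance as a default threshold and the binomial tail as the cluster p-value are hypothetical stand-ins (the paper's own threshold rule and statistic may differ), and, unlike the integer-only tool described above, the p-value here uses floating-point arithmetic. All function names are hypothetical.

```python
import math
import statistics


def cpg_positions(seq: str) -> list[int]:
    """0-based start coordinates of every CpG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def median_distance(positions: list[int]) -> float:
    """Hypothetical default threshold: median distance between consecutive CpGs."""
    return statistics.median(b - a for a, b in zip(positions, positions[1:]))


def cpg_clusters(positions: list[int], max_dist: float) -> list[tuple[int, int, int]]:
    """Chain consecutive CpGs whose start-to-start distance is <= max_dist.

    Returns (start, end, n_cpg) with end exclusive, so every cluster begins
    with the C of its first CpG and ends with the G of its last CpG.
    """
    clusters = []
    if not positions:
        return clusters
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev <= max_dist:
            count += 1
        else:
            if count >= 2:                 # a lone CpG is not treated as a cluster
                clusters.append((start, prev + 2, count))
            start, count = pos, 1
        prev = pos
    if count >= 2:
        clusters.append((start, prev + 2, count))
    return clusters


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)


def cluster_pvalue(cluster: tuple[int, int, int], p_cpg: float) -> float:
    """Binomial stand-in: chance of >= n_cpg CpGs among the cluster's dinucleotide sites."""
    start, end, n_cpg = cluster
    n_sites = end - start - 1          # possible CpG start positions inside [start, end)
    return binomial_tail(n_cpg, n_sites, p_cpg)
```

In use, `p_cpg` could be the sequence-wide CpG frequency, `len(cpg_positions(seq)) / (len(seq) - 1)`, and clusters with the smallest p-values would be reported as candidate CGIs. Because clusters are delimited by CpG positions, each one starts and ends with a CpG, matching the property highlighted in the abstract.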


Subject(s)
Algorithms , CpG Islands/genetics , Animals , Genome/genetics , Humans , Mice
20.
Phys Rev E Stat Nonlin Soft Matter Phys ; 71(6 Pt 1): 061925, 2005 Jun.
Article in English | MEDLINE | ID: mdl-16089783

ABSTRACT

We report on an entropic edge detector based on the local calculation of the Jensen-Shannon divergence, applied to the search for CpG islands. CpG islands are genomic regions related to gene expression and cell differentiation, and thus to cancer formation. Searching for these CpG islands is a major task in genetics and bioinformatics. Several algorithms have been proposed in the literature based on moving statistics in a sliding window, but the window size may strongly influence the results. The local use of the Jensen-Shannon divergence is a completely different strategy: because the nucleotide composition inside the islands differs from that of their surroundings, a statistical distance (the Jensen-Shannon divergence) between the compositions of two adjacent windows can be used as a measure of their dissimilarity. Sliding this double window over the entire sequence allows us to segment it compositionally. These segments must then be merged into larger ones that satisfy certain identification criteria to obtain the final results. We find that the local use of the Jensen-Shannon divergence is very well suited to processing DNA sequences when searching for compositionally distinct structures such as CpG islands, compared with other algorithms in the literature.
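The double sliding window described in this abstract can be sketched as follows. This illustrates the general technique only and is not the authors' detector: windows are compared by their A/C/G/T mononucleotide composition with equal weights, and candidate edges are taken as local maxima of the divergence above a hypothetical threshold. Both choices are assumptions, the paper's exact alphabet and criteria may differ, and the final step of merging segments into islands is not implemented. Function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def composition(window: str) -> list[float]:
    """Relative A/C/G/T frequencies of a window (non-ACGT symbols are ignored)."""
    counts = Counter(b for b in window.upper() if b in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return [0.25] * 4              # hypothetical fallback for all-N windows
    return [counts[b] / total for b in ALPHABET]


def entropy(p: list[float]) -> float:
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon(p: list[float], q: list[float]) -> float:
    """Equal-weight Jensen-Shannon divergence: H((p+q)/2) - (H(p)+H(q))/2."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(mix) - (entropy(p) + entropy(q)) / 2


def jsd_profile(seq: str, w: int) -> list[tuple[int, float]]:
    """Slide two adjacent windows of width w; score every cut point i by their divergence."""
    return [
        (i, jensen_shannon(composition(seq[i - w:i]), composition(seq[i:i + w])))
        for i in range(w, len(seq) - w + 1)
    ]


def edges(profile: list[tuple[int, float]], threshold: float) -> list[int]:
    """Cut points where the divergence is a local maximum at or above threshold."""
    found = []
    for j in range(1, len(profile) - 1):
        pos, d = profile[j]
        if d >= threshold and d > profile[j - 1][1] and d >= profile[j + 1][1]:
            found.append(pos)
    return found


if __name__ == "__main__":
    toy = "A" * 50 + "CG" * 25 + "A" * 50    # CpG-rich block spanning positions 50-99
    print(edges(jsd_profile(toy, 20), 0.5))  # prints [50, 100]
```

With equal weights the divergence is bounded by 1 bit and reaches it when the two windows share no symbols, which is why the cut points exactly at the borders of the toy CpG-rich block stand out.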


Subject(s)
Algorithms , Chromosome Mapping/methods , CpG Islands/genetics , Databases, Nucleic Acid , Genome, Human , Models, Genetic , Sequence Analysis, DNA/methods , Base Composition/genetics , Base Sequence , Computer Simulation , DNA/analysis , DNA/chemistry , DNA/genetics , Humans , Models, Chemical , Molecular Sequence Data , Pattern Recognition, Automated/methods