Results 1 - 20 of 26
1.
Int J Mol Sci ; 23(23)2022 Nov 26.
Article in English | MEDLINE | ID: mdl-36499112

ABSTRACT

The tropical common bean (Phaseolus vulgaris L.) is an obligatory short-day plant that requires relaxation of the photoperiod to induce flowering. Similar to other crops, photoperiod-induced floral initiation depends on the differentiation and maintenance of meristems. In this study, the global changes in transcript expression profiles were analyzed in two meristematic tissues corresponding to the vegetative and inflorescence meristems of two genotypes with different sensitivities to photoperiods. A total of 3396 differentially expressed genes (DEGs) were identified, and 1271 and 1533 were found to be up-regulated and down-regulated, respectively, whereas 592 genes showed discordant expression patterns between both genotypes. Arabidopsis homologues of DEGs were identified, and most of them were not previously involved in Arabidopsis floral transition, suggesting an evolutionary divergence of the transcriptional regulatory networks of the flowering process of both species. However, some genes belonging to the photoperiod and flower development pathways with evolutionarily conserved transcriptional profiles have been found. In addition, the flower meristem identity genes APETALA1 and LEAFY, as well as CONSTANS-LIKE 5, were identified as markers to distinguish between the vegetative and reproductive stages. Our data also indicated that the down-regulation of the photoperiodic genes seems to be directly associated with promoting floral transition under inductive short-day lengths. These findings provide valuable insight into the molecular factors that underlie meristematic development and contribute to understanding the photoperiod adaptation in the common bean.


Subjects
Arabidopsis , Phaseolus , Arabidopsis/genetics , Phaseolus/genetics , Phaseolus/metabolism , Gene Expression Regulation, Plant , Genes, Plant , Transcriptome , Meristem , Flowers/metabolism , Inflorescence/genetics , Inflorescence/metabolism , Plant Proteins/genetics
2.
Nucleic Acids Res ; 45(D1): D97-D103, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794041

ABSTRACT

The 2017 update of NGSmethDB stores whole genome methylomes generated from short-read data sets obtained by bisulfite sequencing (WGBS) technology. To generate high-quality methylomes, stringent quality controls were integrated with third-party software, and a two-step mapping process was added to exploit the advantages of the new genome assembly models. The samples were all profiled under constant parameter settings, thus enabling comparative downstream analyses. Besides a significant increase in the number of samples, NGSmethDB now includes two additional data-types, which are a valuable resource for the discovery of methylation epigenetic biomarkers: (i) differentially methylated single-cytosines; and (ii) methylation segments (i.e. genome regions of homogeneous methylation). The NGSmethDB back-end is now based on MongoDB, a NoSQL hierarchical database using JSON-formatted documents and dynamic schemas, thus accelerating sample comparative analyses. Besides conventional database dumps, track hubs were implemented, which improved database access, visualization in genome browsers and comparative analyses to third-party annotations. In addition, the database can also be accessed through a RESTful API. Lastly, a Python client and a multiplatform virtual machine allow for program-driven access from the user's desktop. This way, private methylation data can be compared to NGSmethDB without the need to upload them to public servers. Database website: http://bioinfo2.ugr.es/NGSmethDB.


Subjects
DNA Methylation , Databases, Nucleic Acid , Animals , Cytosine/metabolism , Genome , Humans
3.
Nucleic Acids Res ; 43(W1): W467-73, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-26019179

ABSTRACT

Small RNA research is a rapidly growing field. Apart from microRNAs, which are important regulators of gene expression, other types of functional small RNA molecules have been reported in animals and plants. MicroRNAs are important in host-microbe interactions and parasite microRNAs might modulate the innate immunity of the host. Furthermore, small RNAs can be detected in bodily fluids making them attractive non-invasive biomarker candidates. Given the general broad interest in small RNAs, and in particular microRNAs, a large number of bioinformatics aided analysis types are needed by the scientific community. To facilitate integrated sRNA research, we developed sRNAtoolbox, a set of independent but interconnected tools for expression profiling from high-throughput sequencing data, consensus differential expression, target gene prediction, visual exploration in a genome context as a function of read length, gene list analysis and blast search of unmapped reads. All tools can be used independently or for the exploration and downstream analysis of sRNAbench results. Workflows like the prediction of consensus target genes of parasite microRNAs in the host followed by the detection of enriched pathways can be easily established. The web-interface interconnecting all these tools is available at http://bioinfo5.ugr.es/srnatoolbox.


Subjects
RNA, Small Untranslated/metabolism , Software , Gene Expression Profiling , Humans , Internet , MicroRNAs/metabolism
4.
Nucleic Acids Res ; 42(Database issue): D53-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24271385

ABSTRACT

The updated release of 'NGSmethDB' (http://bioinfo2.ugr.es/NGSmethDB) is a repository for single-base whole-genome methylome maps for the best-assembled eukaryotic genomes. Short-read data sets from NGS bisulfite-sequencing projects of cell lines, fresh and pathological tissues are first pre-processed and aligned to the corresponding reference genome, and then the cytosine methylation levels are profiled. One major improvement is the application of a unique bioinformatics protocol to all data sets, thereby assuring the comparability of all values with each other. We implemented stringent quality controls to minimize important error sources, such as sequencing errors, bisulfite failures, clonal reads or single nucleotide variants (SNVs). This leads to reliable and high-quality methylomes, all obtained under uniform settings. Another significant improvement is the detection in parallel of SNVs, which might be crucial for many downstream analyses (e.g. SNVs and differential-methylation relationships). A next-generation methylation browser allows fast and smooth scrolling and zooming, thus speeding data download/upload, at the same time requiring fewer server resources. Several data mining tools allow the comparison/retrieval of methylation levels in different tissues or genome regions. NGSmethDB methylomes are also available as native tracks through a UCSC hub, which allows comparison with a wide range of third-party annotations, in particular phenotype or disease annotations.


Subjects
DNA Methylation , Databases, Nucleic Acid , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Animals , Cell Line , Cytosine/metabolism , Epigenesis, Genetic , Genetic Variation , Genome , Genomics , High-Throughput Nucleotide Sequencing/standards , Humans , Internet , Mice , Sequence Alignment , Sequence Analysis, DNA/standards
5.
Nucleic Acids Res ; 39(Database issue): D75-9, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20965971

ABSTRACT

Next-generation sequencing (NGS) together with bisulphite conversion allows the generation of whole genome methylation maps at single-cytosine resolution. This allows studying the absence of methylation in a particular genome region over a range of tissues, the differential tissue methylation or the changes occurring along pathological conditions. However, no database exists fully addressing such requirements. We propose here NGSmethDB (http://bioinfo2.ugr.es/NGSmethDB/gbrowse/) for the storage and retrieval of methylation data derived from NGS. Two cytosine methylation contexts (CpG and CAG/CTG) are considered. Through a browser interface coupled to a MySQL backend and several data mining tools, the user can search for methylation states in a set of tissues, retrieve methylation values for a set of tissues in a given chromosomal region, or display the methylation of promoters among different tissues. NGSmethDB is currently populated with human, mouse and Arabidopsis data, but other methylomes will be incorporated through an automatic pipeline as soon as new data become available. Dump downloads for three coverage levels (1, 5 or 10 reads) are available. NGSmethDB will be useful for experimental researchers, as well as for bioinformaticians, who might use the data as input for further research.


Subjects
Cytosine/analysis , DNA Methylation , Databases, Nucleic Acid , Animals , Chromosomes/chemistry , CpG Islands , Data Mining , Genomics , Humans , Mice , Promoter Regions, Genetic , Sequence Analysis, DNA , Software , User-Computer Interface
6.
Biology (Basel) ; 12(6)2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37372134

ABSTRACT

As the genome carries the historical information of a species' biotic and environmental interactions, analyzing changes in genome structure over time by using powerful statistical physics methods (such as entropic segmentation algorithms, fluctuation analysis in DNA walks, or measures of compositional complexity) provides valuable insights into genome evolution. Nucleotide frequencies tend to vary along the DNA chain, resulting in a hierarchically patchy chromosome structure with heterogeneities at different length scales that range from a few nucleotides to tens of millions of them. Fluctuation analysis reveals that these compositional structures can be classified into three main categories: (1) short-range heterogeneities (below a few kilobase pairs (Kbp)) primarily attributed to the alternation of coding and noncoding regions, interspersed or tandem repeats densities, etc.; (2) isochores, spanning tens to hundreds of tens of Kbp; and (3) superstructures, reaching sizes of tens of megabase pairs (Mbp) or even larger. The obtained isochore and superstructure coordinates in the first complete T2T human sequence are now shared in a public database. In this way, interested researchers can use T2T isochore data, as well as the annotations for different genome elements, to check a specific hypothesis about genome structure. Similarly to other levels of biological organization, a hierarchical compositional structure is prevalent in the genome. Once the compositional structure of a genome is identified, various measures can be derived to quantify the heterogeneity of such structure. The distribution of segment G+C content has recently been proposed as a new genome signature that proves to be useful for comparing complete genomes. Another meaningful measure is the sequence compositional complexity (SCC), which has been used for genome structure comparisons. 
Lastly, we review the recent genome comparisons in species of the ancient phylum Cyanobacteria, conducted by phylogenetic regression of SCC against time, which have revealed positive trends towards higher genome complexity. These findings provide the first evidence for a driven progressive evolution of genome compositional structure.

7.
J Theor Biol ; 297: 127-36, 2012 Mar 21.
Article in English | MEDLINE | ID: mdl-22226985

ABSTRACT

Relevant words in literary texts (key words) are known to be clustered, while common words are randomly distributed. Given the clustered distribution of many functional genome elements, we hypothesize that the biological text par excellence, the DNA sequence, might behave in the same way: k-length words (k-mers) with a clear function may be spatially clustered along the one-dimensional chromosome sequence, while less-important, non-functional words may be randomly distributed. To explore this linguistic analogy, we calculate a clustering coefficient for each k-mer (k=2-9bp) in human and mouse chromosome sequences, and then check whether clustered words are enriched in the functional part of the genome. First, we found a positive general trend relating clustering level and word enrichment within exons and Transcription Factor Binding Sites (TFBSs), while a much weaker relation exists for repeats, and no relation at all exists for introns. Second, we found that 38.45% of the 200 top-clustered 8-mers, but only 7.70% of the non-clustered words, are represented in known motif databases. Third, enrichment/depletion experiments show that highly clustered words are significantly enriched in exons and TFBSs, while they are depleted in introns and repetitive DNA. Considering exons and TFBSs together, 1417 (or 72.26%) in human and 1385 (or 72.97%) in mouse of the top-clustered 8-mers showed a statistically significant association to either exons or TFBSs, thus strongly supporting the link between word clustering and biological function. Lastly, we identified a subset of clustered, diagnostic words that are enriched in exons but depleted in introns, and therefore might help to discriminate between these two gene regions. The clustering of DNA words thus appears as a novel principle to detect functionality in genome sequences.
As evolutionary conservation is not a prerequisite, the proof of principle described here may open new ways to detect species-specific functional DNA sequences and the improvement of gene and promoter predictions, thus contributing to the quest for function in the genome.


Subjects
DNA/genetics , Models, Genetic , Algorithms , Animals , Base Sequence , Binding Sites/genetics , Cluster Analysis , Exons/genetics , Humans , Introns/genetics , Linguistics , Mice , Species Specificity , Transcription Factors/genetics
8.
Hortic Res ; 2022 Jan 18.
Article in English | MEDLINE | ID: mdl-35039829

ABSTRACT

Trichomes are specialised epidermal cells developed in the aerial surface of almost every terrestrial plant. These structures form physical barriers, which combined with their capability of synthesis of complex molecules, prevent plagues from spreading and confer trichomes a key role in the defence against herbivores. In this work, the tomato gene HAIRPLUS (HAP) that controls glandular trichome density in tomato plants was characterised. HAP belongs to a group of proteins involved in histone tail modifications although some also bind methylated DNA. HAP loss of function promotes epigenomic modifications in the tomato genome reflected in numerous differentially methylated cytosines and causes transcriptomic changes in hap mutant plants. Taken together, these findings demonstrate that HAP links epigenome remodelling with multicellular glandular trichome development and reveal that HAP is a valuable genomic tool for pest resistance in tomato breeding.

9.
BMC Genomics ; 11: 327, 2010 May 26.
Article in English | MEDLINE | ID: mdl-20500903

ABSTRACT

BACKGROUND: Unmethylated stretches of CpG dinucleotides (CpG islands) are an outstanding property of mammal genomes. Conventionally, these regions are detected by sliding window approaches using %G + C, CpG observed/expected ratio and length thresholds as main parameters. Recently, clustering methods directly detect clusters of CpG dinucleotides as a statistical property of the genome sequence. RESULTS: We compare sliding-window to clustering (i.e. CpGcluster) predictions by applying new ways to detect putative functionality of CpG islands. Analyzing the co-localization with several genomic regions as a function of window size vs. statistical significance (p-value), CpGcluster shows a higher overlap with promoter regions and highly conserved elements, at the same time showing less overlap with Alu retrotransposons. The major difference in the prediction was found for short islands (CpG islets), often exclusively predicted by CpGcluster. Many of these islets seem to be functional, as they are unmethylated, highly conserved and/or located within the promoter region. Finally, we show that window-based islands can spuriously overlap several, differentially regulated promoters as well as different methylation domains, which might indicate a wrong merge of several CpG islands into a single, very long island. The shorter CpGcluster islands seem to be much more specific when concerning the overlap with alternative transcription start sites or the detection of homogenous methylation domains. CONCLUSIONS: The main difference between sliding-window approaches and clustering methods is the length of the predicted islands. Short islands, often differentially methylated, are almost exclusively predicted by CpGcluster. This suggests that CpGcluster may be the algorithm of choice to explore the function of these short, but putatively functional CpG islands.
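The sliding-window approach mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of the CpG observed/expected ratio, (number of CpG x window length) / (number of C x number of G); the function names and the threshold parameters `min_gc` and `min_oe` are hypothetical and are not taken from the paper. Merging of adjacent passing windows into islands and any length threshold are left out.

```python
def window_stats(window):
    """Return (G+C fraction, CpG observed/expected ratio) for one window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed o/e definition: observed CpGs over those expected from C and G counts.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def scan(seq, size, step, min_gc, min_oe):
    """Return (start, end) of every window passing both hypothetical thresholds."""
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        gc, oe = window_stats(seq[start:start + size])
        if gc >= min_gc and oe >= min_oe:
            hits.append((start, start + size))
    return hits
```

Because every result depends on `size`, `step` and both thresholds, the sketch also shows why such predictions are sensitive to the chosen parameter set.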


Subjects
Algorithms , CpG Islands , Alu Elements/genetics , Cluster Analysis , Conserved Sequence/genetics , DNA Methylation/genetics , Evolution, Molecular , Humans , Promoter Regions, Genetic/genetics
10.
Plants (Basel) ; 9(4)2020 Apr 22.
Article in English | MEDLINE | ID: mdl-32331491

ABSTRACT

Pod maturation of common bean relies upon complex gene expression changes, which in turn are crucial for seed formation and dispersal. Hence, dissecting the transcriptional regulation of pod maturation would be of great significance for breeding programs. In this study, a comprehensive characterization of expression changes has been performed in two common bean cultivars (ancient and modern) by analyzing the transcriptomes of five developmental pod stages, from fruit setting to maturation. RNA-seq analysis allowed for the identification of key genes shared by both accessions, which in turn were homologous to known Arabidopsis maturation genes and furthermore showed a similar expression pattern along the maturation process. Gene-expression changes suggested a role in promoting an accelerated breakdown of photosynthetic and ribosomal machinery associated with chlorophyll degradation and early activation of alpha-linolenic acid metabolism. A further study of transcription factors and their DNA binding sites revealed three candidate genes whose functions may play a dominant role in regulating pod maturation. Altogether, this research identifies the first maturation gene set reported in common bean so far and contributes to a better understanding of the dynamic mechanisms of pod maturation, providing potentially useful information for genomic-assisted breeding of common bean yield and pod quality attributes.

11.
Sci Rep ; 10(1): 19073, 2020 11 04.
Article in English | MEDLINE | ID: mdl-33149190

ABSTRACT

Progressive evolution, or the tendency towards increasing complexity, is a controversial issue in biology, whose resolution requires a proper measurement of complexity. Genomes are the best entities to address this challenge, as they encode the historical information of a species' biotic and environmental interactions. As a case study, we have measured genome sequence complexity in the ancient phylum Cyanobacteria. To arrive at an appropriate measure of genome sequence complexity, we have chosen metrics that do not decipher biological functionality but that show strong phylogenetic signal. Using a ridge regression of those metrics against root-to-tip distance, we detected positive trends towards higher complexity in three of them. Lastly, we applied three standard tests to detect if progressive evolution is passive or driven: the minimum, ancestor-descendant, and sub-clade tests. These results provide evidence for driven progressive evolution at the genome-level in the phylum Cyanobacteria.


Subjects
Cyanobacteria/genetics , Evolution, Molecular , Genome, Bacterial , Cyanobacteria/classification , Phylogeny
12.
BMC Evol Biol ; 8: 107, 2008 Apr 11.
Article in English | MEDLINE | ID: mdl-18405379

ABSTRACT

BACKGROUND: The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. RESULTS: The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. CONCLUSION: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.


Subjects
Genome/genetics , Isochores/genetics , Phylogeny , Animals , Arabidopsis/genetics , Computational Biology , Dogs , Genome, Fungal/genetics , Genome, Human/genetics , Genome, Plant/genetics , Humans , Mice , Pan troglodytes/genetics , Rats , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA , Species Specificity
13.
Methods Mol Biol ; 1766: 31-47, 2018.
Article in English | MEDLINE | ID: mdl-29605846

ABSTRACT

The promoter region of around 70% of all genes in the human genome is overlapped by a CpG island (CGI). CGIs have known functions in the transcription initiation and outstanding compositional features like high G+C content and CpG ratios when compared to the bulk DNA. We have shown before that CGIs manifest as clusters of CpGs in mammalian genomes and can therefore be detected using clustering methods. These techniques have several advantages over sliding window approaches which apply compositional properties as thresholds. In this protocol we show how to determine local (CpG islands) and global (distance distribution) clustering properties of CG dinucleotides and how to generalize this analysis to any k-mer or combinations of it. In addition, we illustrate how to easily cross the output of a CpG island prediction algorithm with our methylation database to detect differentially methylated CGIs. The analysis is given in a step-by-step protocol and all necessary programs are implemented into a virtual machine or, alternatively, the software can be downloaded and easily installed.


Subjects
CpG Islands/genetics , DNA Methylation , Genome, Human/genetics , Animals , Base Composition , Base Sequence , DNA/chemistry , DNA/genetics , DNA/metabolism , Humans , Promoter Regions, Genetic/genetics , Software , Transcription Initiation, Genetic
14.
Methods Mol Biol ; 1580: 149-174, 2017.
Article in English | MEDLINE | ID: mdl-28439833

ABSTRACT

High-throughput sequencing (HTS) data for small RNAs (noncoding RNA molecules that are 20-250 nucleotides in length) can now be routinely generated by minimally equipped wet laboratories; however, the bottleneck in HTS-based research has now shifted to the analysis of such huge amounts of data. One of the reasons is that many analysis types require a Linux environment, but computers, system administrators, and bioinformaticians entail additional costs that often cannot be afforded by small to mid-sized groups or laboratories. Web servers are an alternative that can be used if the data are not subject to privacy restrictions (which is very often an important issue with medical data). However, in any case they are less flexible than stand-alone programs, limiting the number of workflows and analysis types that can be carried out. We show in this protocol how virtual machines can be used to overcome those problems and limitations. sRNAtoolboxVM is a virtual machine that can be executed on all common operating systems through virtualization programs like VirtualBox or VMware, providing the user with a large number of preinstalled programs like sRNAbench for small RNA analysis without the need to maintain additional servers and/or operating systems.


Subjects
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , RNA, Small Untranslated/genetics , Software , Animals , Databases, Genetic , Gene Expression Profiling/methods , Humans , RNA, Small Untranslated/analysis , User-Computer Interface
15.
BMC Bioinformatics ; 7: 446, 2006 Oct 12.
Article in English | MEDLINE | ID: mdl-17038168

ABSTRACT

BACKGROUND: Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content. RESULTS: Given the higher frequency of CpG dinucleotides at CGIs, as compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to predict directly clusters of CpGs, while not depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders by using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values, while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping with the Transcription Start Site (TSS) show the highest statistical significance, as compared to the islands in other genome locations, thus qualifying CpGcluster as a valuable tool in discriminating functional CGIs from the remaining islands in the bulk genome. CONCLUSION: CpGcluster uses only integer arithmetic, thus being a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. 
Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. The only search parameter in CpGcluster is the distance between two consecutive CpGs, in contrast to previous algorithms. Therefore, none of the main statistical properties of CpG islands (neither G+C content, CpG fraction nor length threshold) are needed as search parameters, which may lead to the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.
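The distance-based idea described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the published CpGcluster implementation: the function names and the user-supplied `max_gap` threshold are hypothetical, and the p-value assignment that the abstract uses to select significant clusters is omitted. Only integer arithmetic is needed, and every reported cluster starts and ends with a CpG.

```python
import re


def cpg_positions(seq):
    """Return the 0-based start position of every CpG dinucleotide."""
    return [m.start() for m in re.finditer("CG", seq.upper())]


def cluster_cpgs(positions, max_gap):
    """Group consecutive CpGs whose start-to-start distance is <= max_gap.

    Returns (start, end) pairs with end exclusive, so that seq[start:end]
    begins and ends with "CG". Isolated CpGs are not reported.
    """
    clusters = []
    if not positions:
        return clusters
    first = last = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - last <= max_gap:
            last = pos          # extend the current cluster
            count += 1
        else:
            if count > 1:
                clusters.append((first, last + 2))
            first = last = pos  # start a new candidate cluster
            count = 1
    if count > 1:
        clusters.append((first, last + 2))
    return clusters
```

For example, `cluster_cpgs(cpg_positions(seq), max_gap)` returns candidate clusters that a later significance test could then filter.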


Subjects
Algorithms , CpG Islands/genetics , Animals , Genome/genetics , Humans , Mice
16.
Nucleic Acids Res ; 32(Web Server issue): W287-92, 2004 Jul 01.
Article in English | MEDLINE | ID: mdl-15215396

ABSTRACT

Isochores are long genome segments homogeneous in G+C. Here, we describe an algorithm (IsoFinder) running on the web (http://bioinfo2.ugr.es/IsoF/isofinder.html) able to predict isochores at the sequence level. We move a sliding pointer from left to right along the DNA sequence. At each position of the pointer, we compute the mean G+C values to the left and to the right of the pointer. We then determine the position of the pointer for which the difference between left and right mean values (as measured by the t-statistic) reaches its maximum. Next, we determine the statistical significance of this potential cutting point, after filtering out short-scale heterogeneities below 3 kb by applying a coarse-graining technique. Finally, the program checks whether this significance exceeds a probability threshold. If so, the sequence is cut at this point into two subsequences; otherwise, the sequence remains undivided. The procedure continues recursively for each of the two resulting subsequences created by each cut. This leads to the decomposition of a chromosome sequence into long homogeneous genome regions (LHGRs) with well-defined mean G+C contents, each significantly different from the G+C contents of the adjacent LHGRs. Most LHGRs can be identified with Bernardi's isochores, given their correlation with biological features such as gene density, SINE and LINE (short, long interspersed repetitive elements) densities, recombination rate or single nucleotide polymorphism variability. The resulting isochore maps are available at our web site (http://bioinfo2.ugr.es/isochores/), and also at the UCSC Genome Browser (http://genome.cse.ucsc.edu/).
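The recursive procedure described in this abstract can be sketched in a few lines. The sketch makes several assumptions that are not from the paper: the input is a list of G+C values for consecutive coarse-grained windows, the t-statistic uses a pooled variance, and a plain cutoff `t_threshold` stands in for the significance test that the real program applies. All names are hypothetical.

```python
from math import sqrt


def t_statistic(left, right):
    """Absolute Student's t (pooled variance) for the difference of two means."""
    n1, n2 = len(left), len(right)
    m1, m2 = sum(left) / n1, sum(right) / n2
    ss1 = sum((x - m1) ** 2 for x in left)
    ss2 = sum((x - m2) ** 2 for x in right)
    pooled = (ss1 + ss2) / (n1 + n2 - 2)
    if pooled == 0:
        return float("inf") if m1 != m2 else 0.0
    return abs(m1 - m2) / sqrt(pooled * (1 / n1 + 1 / n2))


def best_cut(values, min_size=2):
    """Slide a pointer along values and return (index, t) where t is largest."""
    best_i, best_t = None, 0.0
    for i in range(min_size, len(values) - min_size + 1):
        t = t_statistic(values[:i], values[i:])
        if t > best_t:
            best_i, best_t = i, t
    return best_i, best_t


def segment(values, t_threshold, offset=0, min_size=2):
    """Recursively cut values; return the sorted list of cut indices."""
    if len(values) < 2 * min_size:
        return []
    i, t = best_cut(values, min_size)
    if i is None or t < t_threshold:
        return []  # not significant under the stand-in cutoff: leave undivided
    return (segment(values[:i], t_threshold, offset, min_size)
            + [offset + i]
            + segment(values[i:], t_threshold, offset + i, min_size))
```

Each accepted cut splits the sequence in two, and the same search is repeated on both halves until no candidate cut passes the cutoff.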


Subjects
Computational Biology , Genomics , Isochores/chemistry , Software , Algorithms , Computer Graphics , Internet , Major Histocompatibility Complex , User-Computer Interface
17.
Phys Rev E Stat Nonlin Soft Matter Phys ; 71(6 Pt 1): 061925, 2005 Jun.
Article in English | MEDLINE | ID: mdl-16089783

ABSTRACT

We report on an entropic edge detector based on the local calculation of the Jensen-Shannon divergence with application to the search for CpG islands. CpG islands are pieces of the genome related to gene expression and cell differentiation, and thus to cancer formation. Searching for these CpG islands is a major task in genetics and bioinformatics. Some algorithms have been proposed in the literature, based on moving statistics in a sliding window, but the window size may greatly influence the results. The local use of Jensen-Shannon divergence is a completely different strategy: the nucleotide composition inside the islands is different from that in their environment, so a statistical distance, the Jensen-Shannon divergence, between the composition of two adjacent windows may be used as a measure of their dissimilarity. Sliding this double window over the entire sequence allows us to segment it compositionally. The fusion of those segments into greater ones that satisfy certain identification criteria must be achieved in order to obtain the definitive results. We find that the local use of Jensen-Shannon divergence is very suitable in processing DNA sequences for searching for compositionally different structures such as CpG islands, as compared to other algorithms in the literature.
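The double-window idea in this abstract can be illustrated with a minimal sketch. It assumes single-nucleotide composition over the alphabet ACGT and weights proportional to window length, neither of which is specified in the abstract; the function names are hypothetical, and the later fusion of segments into islands is not shown.

```python
from collections import Counter
from math import log2


def composition(window, alphabet="ACGT"):
    """Relative frequency of each nucleotide in the window."""
    counts = Counter(window)
    total = sum(counts[b] for b in alphabet)
    return [counts[b] / total if total else 0.0 for b in alphabet]


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in p if x > 0)


def js_divergence(left, right):
    """Jensen-Shannon divergence between the compositions of two windows."""
    n1, n2 = len(left), len(right)
    w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)  # assumed length-proportional weights
    p, q = composition(left), composition(right)
    mix = [w1 * a + w2 * b for a, b in zip(p, q)]
    return entropy(mix) - w1 * entropy(p) - w2 * entropy(q)


def js_profile(seq, window):
    """Divergence between the two adjacent windows at each interior position."""
    seq = seq.upper()
    return [js_divergence(seq[i - window:i], seq[i:i + window])
            for i in range(window, len(seq) - window + 1)]
```

Peaks in the profile returned by `js_profile` mark positions where the composition changes most sharply, which is where candidate segment boundaries would be placed.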


Subjects
Algorithms , Chromosome Mapping/methods , CpG Islands/genetics , Databases, Nucleic Acid , Genome, Human , Models, Genetic , Sequence Analysis, DNA/methods , Base Composition/genetics , Base Sequence , Computer Simulation , DNA/analysis , DNA/chemistry , DNA/genetics , Humans , Models, Chemical , Molecular Sequence Data , Pattern Recognition, Automated/methods
18.
Gene ; 333: 121-33, 2004 May 26.
Article in English | MEDLINE | ID: mdl-15177687

ABSTRACT

The sequencing of prokaryotic genomes covering a wide taxonomic range has sparked renewed interest in intrachromosomal compositional (GC) heterogeneity, largely in view of lateral transfers. We present here a brief overview of some methods for visualizing and quantifying GC variation in prokaryotes. We used these methods to examine heterogeneity levels in sequenced prokaryotes, for a range of scales or stringencies. Some species are consistently homogeneous, whereas others are markedly heterogeneous in comparison, in particular Aeropyrum pernix, Xylella fastidiosa, Mycoplasma genitalium, Enterococcus faecalis, Bacillus subtilis, Pyrobaculum aerophilum, Vibrio vulnificus chromosome I, Deinococcus radiodurans chromosome II and Halobacterium. As we discuss here, the wide range of heterogeneities calls for reexamination of an accepted belief, namely that the endogenous DNA of bacteria and archaea should typically exhibit low intrachromosomal GC contrasts. Supplementary results for all species analyzed are available at our website: http://bioinfo2.ugr.es/prok.
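The abstract does not name the specific methods it reviews. As a generic illustration only, one simple way to quantify intrachromosomal GC variation at several scales is the spread of GC content across non-overlapping windows. The function names and the choice of the population standard deviation are assumptions, not the authors' procedure.

```python
import statistics


def gc_fraction(segment):
    """Fraction of G and C among the A, C, G, T symbols of a segment."""
    segment = segment.upper()
    gc = sum(1 for base in segment if base in "GC")
    acgt = sum(1 for base in segment if base in "ACGT")
    return gc / acgt if acgt else 0.0


def windowed_gc(sequence, window):
    """GC fraction in consecutive non-overlapping windows of a fixed size."""
    return [gc_fraction(sequence[i:i + window])
            for i in range(0, len(sequence) - window + 1, window)]


def gc_heterogeneity(sequence, windows):
    """Standard deviation of windowed GC for each window size (scale).

    Window sizes that yield fewer than two windows are skipped.
    """
    result = {}
    for window in windows:
        values = windowed_gc(sequence, window)
        if len(values) >= 2:
            result[window] = statistics.pstdev(values)
    return result
```

A sequence whose windows all have the same GC fraction gives 0 at every scale, while one built from an AT-only half and a GC-only half gives a large value at every scale smaller than the halves.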


Subjects
Base Composition/genetics , DNA, Bacterial/genetics , Genome, Bacterial , Algorithms , Base Pairing/genetics , Centrifugation, Density Gradient , Cesium , Chlorides , Chromosomes, Archaeal/genetics , Chromosomes, Bacterial/genetics , Codon/genetics , DNA, Archaeal/chemistry , DNA, Archaeal/genetics , DNA, Bacterial/chemistry , Genome, Archaeal , Isochores/genetics
19.
Gene ; 300(1-2): 117-27, 2002 Oct 30.
Article in English | MEDLINE | ID: mdl-12468093

ABSTRACT

The human genome is a mosaic of isochores, which are long DNA segments (≫300 kbp) relatively homogeneous in G+C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and differ in important features, e.g. genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure of the human genome. The advantages of the isochore maps presented here are: (1) sequence heterogeneities at different scales are shown in the same plot; (2) pair-wise compositional differences between adjacent regions are all statistically significant; (3) isochore boundaries are accurately defined to single base pair resolution; and (4) both gradual and abrupt isochore boundaries are simultaneously revealed. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, mainly size distribution, G+C range, and proportions of the isochore classes. The relative density of genes, Alu and long interspersed nuclear element repeats and the different types of single nucleotide polymorphisms on LHGRs also coincide with expectations in true isochores. Potential applications of isochore maps range from the improvement of gene-finding algorithms to the prediction of linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates for the LHGRs identified in all the contigs longer than 2 Mb in the human genome sequence are available at the online resource on isochore mapping: http://bioinfo2.ugr.es/isochores.


Subjects
Genome, Human , Isochores/genetics , Alu Elements/genetics , Base Composition , Chromosome Mapping , Chromosomes, Human, Pair 21/genetics , Chromosomes, Human, Pair 22/genetics , DNA/chemistry , DNA/genetics , Genes/genetics , Humans , Long Interspersed Nucleotide Elements/genetics , Polymorphism, Single Nucleotide/genetics
20.
Comput Biol Chem ; 27(1): 5-10, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12798034

ABSTRACT

The isochore concept in the human genome sequence was challenged in an analysis by the International Human Genome Sequencing Consortium (IHGSC). We argue here that a statement in the IHGSC's analysis concerning the existence of isochores is misleading, because the homogeneity was not examined at a large enough length scale and consequently an inappropriate statistical test was applied. A test of the existence of isochores should be equivalent to a test of homogeneity or equality of windowed GC%. The statistical test applied in the IHGSC's analysis, the binomial test, is a test of whether individual bases are independent and identically-distributed (iid). For testing the existence of isochores, or homogeneity in windowed GC%, we propose to use another statistical test: the analysis of variance (ANOVA). It can be shown that DNA sequences that are rejected by the binomial test may not be rejected by the ANOVA test.
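The contrast drawn in this abstract can be sketched in Python under loudly stated assumptions. The dispersion ratio below is only a stand-in for a binomial test of base-level independence, and the one-way ANOVA F statistic is computed without a p-value. The function names, the window size, and the grouping of windows into two halves are hypothetical choices for illustration, not the procedure used in the paper.

```python
import statistics


def window_gc_counts(sequence, window):
    """GC counts in consecutive non-overlapping windows of a fixed size."""
    seq = sequence.upper()
    return [sum(1 for base in seq[i:i + window] if base in "GC")
            for i in range(0, len(seq) - window + 1, window)]


def binomial_dispersion(counts, window):
    """Observed variance of window GC counts over the binomial variance.

    If bases were iid, the ratio would be close to 1; values far above 1
    argue against iid. Assumes the overall GC fraction is not 0 or 1.
    """
    p = sum(counts) / (len(counts) * window)
    expected = window * p * (1 - p)
    return statistics.pvariance(counts) / expected


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of values.

    Assumes at least two groups and nonzero within-group variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [statistics.fmean(g) for g in groups]
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (k - 1)
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n - k)
    return between / within


def halves(values):
    """Split a list of window values into two candidate regions."""
    mid = len(values) // 2
    return [values[:mid], values[mid:]]
```

A sequence with strong short-range structure, such as alternating G-rich and A-rich blocks, gives a dispersion ratio well above 1, yet its two halves have the same mean windowed GC, so the F statistic is near zero. A sequence whose halves differ in mean GC gives a large F statistic instead.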


Subjects
Isochores/chemistry , Analysis of Variance , Base Composition , Binomial Distribution , Computational Biology/methods , Computational Biology/statistics & numerical data , CpG Islands , GC Rich Sequence , Genome, Human , Humans , Models, Statistical , Sequence Analysis, DNA/statistics & numerical data