Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 2 de 2
Filter
Add more filters










Database
Language
Publication year range
1.
BMC Bioinformatics ; 7: 446, 2006 Oct 12.
Article in English | MEDLINE | ID: mdl-17038168

ABSTRACT

BACKGROUND: Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content. RESULTS: Given the higher frequency of CpG dinucleotides at CGIs, as compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to predict directly clusters of CpGs, while not depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders by using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values, while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping with the Transcription Start Site (TSS) show the highest statistical significance, as compared to the islands in other genome locations, thus qualifying CpGcluster as a valuable tool in discriminating functional CGIs from the remaining islands in the bulk genome. CONCLUSION: CpGcluster uses only integer arithmetic, thus being a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. The only search parameter in CpGcluster is the distance between two consecutive CpGs, in contrast to previous algorithms. Therefore, none of the main statistical properties of CpG islands (neither G+C content, CpG fraction nor length threshold) are needed as search parameters, which may lead to the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.


Subject(s)
Algorithms , CpG Islands/genetics , Animals , Genome/genetics , Humans , Mice
2.
Phys Rev E Stat Nonlin Soft Matter Phys ; 71(6 Pt 1): 061925, 2005 Jun.
Article in English | MEDLINE | ID: mdl-16089783

ABSTRACT

We report on an entropic edge detector based on the local calculation of the Jensen-Shannon divergence with application to the search for CpG islands. CpG islands are pieces of the genome related to gene expression and cell differentiation, and thus to cancer formation. Searching for these CpG islands is a major task in genetics and bioinformatics. Some algorithms have been proposed in the literature, based on moving statistics in a sliding window, but its size may greatly influence the results. The local use of Jensen-Shannon divergence is a completely different strategy: the nucleotide composition inside the islands is different from that in their environment, so a statistical distance--the Jensen-Shannon divergence--between the composition of two adjacent windows may be used as a measure of their dissimilarity. Sliding this double window over the entire sequence allows us to segment it compositionally. The fusion of those segments into greater ones that satisfy certain identification criteria must be achieved in order to obtain the definitive results. We find that the local use of Jensen-Shannon divergence is very suitable in processing DNA sequences for searching for compositionally different structures such as CpG islands, as compared to other algorithms in literature.


Subject(s)
Algorithms , Chromosome Mapping/methods , CpG Islands/genetics , Databases, Nucleic Acid , Genome, Human , Models, Genetic , Sequence Analysis, DNA/methods , Base Composition/genetics , Base Sequence , Computer Simulation , DNA/analysis , DNA/chemistry , DNA/genetics , Humans , Models, Chemical , Molecular Sequence Data , Pattern Recognition, Automated/methods
SELECTION OF CITATIONS
SEARCH DETAIL