Results 1 - 2 of 2
1.
Bioinformatics ; 19(9): 1070-8, 2003 Jun 12.
Article in English | MEDLINE | ID: mdl-12801867

ABSTRACT

MOTIVATION: A major challenge in gene expression analysis is effective data organization and visualization. One of the most popular tools for this task is hierarchical clustering. Hierarchical clustering allows a user to view relationships in scales ranging from single genes to large sets of genes, while at the same time providing a global view of the expression data. However, hierarchical clustering is very sensitive to noise, it usually lacks a method to actually identify distinct clusters, and it produces a large number of possible leaf orderings of the hierarchical clustering tree. In this paper we propose a new hierarchical clustering algorithm which reduces susceptibility to noise, permits up to k siblings to be directly related, and provides a single optimal order for the resulting tree. RESULTS: We present an algorithm that efficiently constructs a k-ary tree, where each node can have up to k children, and then optimally orders the leaves of that tree. By combining k clusters at each step our algorithm becomes more robust against noise and missing values. By optimally ordering the leaves of the resulting tree we maintain the pairwise relationships that appear in the original method, without sacrificing the robustness. Our k-ary construction algorithm runs in O(n^3) regardless of k and our ordering algorithm runs in O(4^k n^3). We present several examples that show that our k-ary clustering algorithm achieves results that are superior to the binary tree results in both global presentation and cluster identification. AVAILABILITY: We have implemented the above algorithms in C++ on the Linux operating system.
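
To make the leaf-ordering idea concrete, the following is a minimal Python sketch, not the authors' C++ implementation and not their O(4^k n^3) algorithm. It assumes a tree given as nested tuples whose leaves are integer indices into a distance matrix, and it assumes the objective is the sum of distances between adjacent leaves. It finds the best order by brute force, trying every permutation of the children at every internal node, which is only feasible for very small trees. The names `leaf_orderings`, `ordering_cost`, and `best_leaf_order` are hypothetical.

```python
from itertools import permutations, product


def leaf_orderings(tree):
    """Yield every leaf order reachable by permuting children at each node.

    Assumption: a leaf is an int; an internal node is a tuple of subtrees
    (up to k of them for a k-ary tree).
    """
    if isinstance(tree, int):
        yield (tree,)
        return
    for perm in permutations(tree):
        # For this child order, combine every ordering of each subtree.
        child_orders = [list(leaf_orderings(child)) for child in perm]
        for parts in product(*child_orders):
            yield tuple(leaf for part in parts for leaf in part)


def ordering_cost(order, dist):
    """Sum of distances between neighbouring leaves (assumed objective)."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def best_leaf_order(tree, dist):
    """Brute-force search; exponential in tree size, for illustration only."""
    return min(leaf_orderings(tree), key=lambda o: ordering_cost(o, dist))


# Toy data: five "genes" placed on a line, distance = absolute difference.
points = [0, 1, 5, 6, 10]
dist = [[abs(p - q) for q in points] for p in points]
tree = ((0, 1), (3, 4, 2))  # a binary node next to a 3-ary node

best = best_leaf_order(tree, dist)
print(best, ordering_cost(best, dist))
```

The brute-force search visits every admissible ordering, so its cost grows exponentially with the number of internal nodes. That growth is what a polynomial-time ordering algorithm such as the one described in the abstract avoids.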


Subject(s)
Algorithms , Cluster Analysis , Decision Trees , Gene Expression Profiling/methods , Pattern Recognition, Automated , Sequence Analysis, DNA/methods , Animals , Bursa of Fabricius/pathology , Chickens , Gene Expression Regulation, Neoplastic/genetics , Genes, myc/genetics , Lymphoma, B-Cell/genetics , Oligonucleotide Array Sequence Analysis/methods , Quality Control , Sequence Alignment , Sequence Homology , Stochastic Processes , User-Computer Interface
2.
Proc Natl Acad Sci U S A ; 101(28): 10349-54, 2004 Jul 13.
Article in English | MEDLINE | ID: mdl-15240876

ABSTRACT

Mammalian genomes are densely populated with long duplicated sequences. In this paper, we demonstrate the existence of doublets, short duplications between 25 and 100 bp, distinct from previously described repeats. Each doublet is a pair of exact matches, separated by some distance. The distribution of these intermatch distances is strikingly nonrandom. An unexpectedly high number of doublets have matches either within 100 bp (adjacent) or at distances tightly concentrated approximately 1,000 bp apart (nearby). We focus our study on these proximate doublets. First, they tend to have both matches on the same strand. By comparing nearby doublets shared in human and chimpanzee, we can also see that these doublets seem to arise by an insertion event that produces a copy without markedly affecting the surrounding sequence. Most doublets in humans are shared with chimpanzee, but many new pairs arose after the divergence of the species. Doublets found in human but not chimpanzee are most often composed of almost tandem matches, whereas older doublets (found in both species) are more likely to have matches spaced by approximately 1 kb, indicating that the nearly tandem doublets may be more dynamic. The spacing of doublets is highly conserved. So far, we have found clearly recognizable doublets in the following genomes: Homo sapiens, Mus musculus, Arabidopsis thaliana, and Caenorhabditis elegans, indicating that the mechanism generating these doublets is widespread. A mechanism that generates short local duplications while conserving polarity could have a profound impact on the evolution of regulatory and protein-coding sequences.
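
The notion of a doublet as a pair of exact matches separated by some distance can be illustrated with a short Python sketch. This is not the authors' pipeline. It assumes a fixed match length, looks only at the forward strand, treats a doublet as a substring occurring exactly twice without overlap, and measures the intermatch distance from start to start. The function name `find_doublets` and the toy sequence are hypothetical, and the toy uses a length of 8 rather than the 25 to 100 bp range studied in the paper.

```python
from collections import defaultdict


def find_doublets(seq, length):
    """Return (match, first_start, second_start, distance) tuples.

    Assumptions: fixed match length, forward strand only, exactly two
    non-overlapping occurrences, distance measured start to start.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - length + 1):
        positions[seq[i:i + length]].append(i)

    doublets = []
    for match, starts in positions.items():
        if len(starts) == 2 and starts[1] - starts[0] >= length:
            doublets.append((match, starts[0], starts[1], starts[1] - starts[0]))
    return doublets


# Toy sequence: an 8-base unit, a 5-base spacer, the same unit, then 2 bases.
toy = "ACGTACGG" + "TTTTT" + "ACGTACGG" + "CC"
for match, first, second, distance in find_doublets(toy, 8):
    print(match, first, second, distance)
```

Collecting the `distance` values over a whole genome would give the kind of intermatch-distance distribution the abstract describes. A realistic analysis would also need to extend matches to their maximal length and mask previously described repeats.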


Subject(s)
Evolution, Molecular , Gene Duplication , Genome, Human , Animals , Arabidopsis , Base Sequence , Caenorhabditis elegans , DNA Transposable Elements/genetics , Humans , Mice , Molecular Sequence Data , Pan troglodytes