1.
Proc Natl Acad Sci U S A ; 108(41): 16916-21, 2011 Oct 11.
Article in English | MEDLINE | ID: mdl-21949369

ABSTRACT

The goal of dimensionality reduction is to embed high-dimensional data in a low-dimensional space while preserving structure in the data relevant to exploratory data analysis such as clusters. However, existing dimensionality reduction methods often either fail to separate clusters due to the crowding problem or can only separate clusters at a single resolution. We develop a new approach to dimensionality reduction: tree preserving embedding. Our approach uses the topological notion of connectedness to separate clusters at all resolutions. We provide a formal guarantee of cluster separation for our approach that holds for finite samples. Our approach requires no parameters and can handle general types of data, making it easy to use in practice and suggesting new strategies for robust data visualization.
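
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying idea of connectedness at a given resolution, not of tree preserving embedding itself. Two points are treated as connected at resolution `eps` when a chain of hops, each no longer than `eps`, links them (the same notion that underlies single-linkage clustering). The function name, the toy points, and the `eps` values are illustrative assumptions.

```python
import math

def count_clusters(points, eps):
    """Count connected components of the graph that links points within eps."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})

# Hypothetical toy data: three tight pairs, two of which sit close together.
points = [(0, 0), (0, 1), (3, 0), (3, 1), (20, 0), (20, 1)]

for eps in (1.5, 5, 25):
    print(eps, count_clusters(points, eps))
```

Sweeping `eps` from small to large shows the nested cluster structure: the pairs merge first, then the two nearby pairs, then everything. An embedding that separates clusters "at all resolutions" would need to keep this whole nesting intact, not just one level of it.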


Subject(s)
Data Interpretation, Statistical , Algorithms , Cluster Analysis , Handwriting , Models, Statistical , Radar , Sequence Analysis, Protein/statistics & numerical data
2.
Stat Appl Genet Mol Biol ; 8: Article 13, 2009.
Article in English | MEDLINE | ID: mdl-19222380

ABSTRACT

In this paper, we address the problem of detecting outlier samples with highly different expression patterns in microarray data. Although outliers are not common, they appear even in widely used benchmark data sets and can negatively affect microarray data analysis. It is important to identify outliers in order to explore underlying experimental or biological problems and remove erroneous data. We propose an outlier detection method based on principal component analysis (PCA) and robust estimation of Mahalanobis distances that is fully automatic. We demonstrate that our outlier detection method identifies biologically significant outliers with high accuracy and that outlier removal improves the prediction accuracy of classifiers. Our outlier detection method is closely related to existing robust PCA methods, so we compare our outlier detection method to a prominent robust PCA method.
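
The general recipe named in the abstract (project onto principal components, then score each sample with a robust distance) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: it uses plain power iteration for the PCA step, the median and MAD of each component score as the robust location and scale, and a chi-square cutoff. All function names, the toy data, and the cutoff are assumptions.

```python
import math
from statistics import median

def mad_scale(values):
    """Robust spread: 1.4826 * median absolute deviation."""
    m = median(values)
    return 1.4826 * median(abs(v - m) for v in values)

def principal_axes(rows, k, iters=500):
    """Top-k eigenvectors of the scatter matrix (power iteration + deflation)."""
    d = len(rows[0])
    scatter = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    axes = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # arbitrary fixed start vector
        for _ in range(iters):
            w = [sum(scatter[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(scatter[i][j] * v[j] for j in range(d)) for i in range(d))
        axes.append(v)
        # Deflate: remove the component just found before finding the next.
        scatter = [[scatter[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return axes

def robust_distances(rows, k=2):
    """Distance of each sample from the robust center, in PCA score space."""
    meds = [median(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, meds)] for r in rows]
    axes = principal_axes(centered, k)
    scores = [[sum(a * b for a, b in zip(r, ax)) for ax in axes] for r in centered]
    locs = [median(col) for col in zip(*scores)]
    scales = [mad_scale(col) for col in zip(*scores)]
    return [math.sqrt(sum(((s - l) / sc) ** 2 for s, l, sc in zip(row, locs, scales)))
            for row in scores]

# Hypothetical toy data: rows are samples, columns are expression values.
samples = [
    [1.0, 2.1, 0.9], [1.2, 1.9, 1.1], [0.9, 2.0, 1.0], [1.1, 2.2, 0.8],
    [1.0, 1.8, 1.2], [0.8, 2.0, 1.0], [1.1, 2.1, 0.9],
    [6.0, 9.0, -4.0],  # deliberately aberrant sample
]

distances = robust_distances(samples, k=2)
# With 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p); p = 0.975 here.
cutoff = math.sqrt(-2 * math.log(0.025))
flagged = [i for i, dist in enumerate(distances) if dist > cutoff]
print(flagged)
```

Using the median and MAD rather than the mean and standard deviation keeps the aberrant sample from inflating the very scale it is measured against. A full robust covariance estimator would be the more rigorous choice for real data.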


Subject(s)
Oligonucleotide Array Sequence Analysis/statistics & numerical data , Outliers, DRG/statistics & numerical data , Colonic Neoplasms/diagnosis , Colonic Neoplasms/genetics , Databases, Genetic , Humans , Principal Component Analysis
3.
BMC Genomics ; 6: 149, 2005 Oct 31.
Article in English | MEDLINE | ID: mdl-16262895

ABSTRACT

BACKGROUND: High throughput microarray-based single nucleotide polymorphism (SNP) genotyping has revolutionized the way genome-wide linkage scans and association analyses are performed. One of the key features of the array-based GeneChip Mapping 10K Array from Affymetrix is the automated SNP calling algorithm. The Affymetrix algorithm was trained on a database of ethnically diverse DNA samples to create SNP call zones that are used as static models to make genotype calls for experimental data. We describe here the implementation of clustering algorithms on large training datasets resulting in improved SNP call rates on the 10K GeneChip.

RESULTS: A database of 948 individuals genotyped on the GeneChip Mapping 10K 2.0 Array was used to identify 822 SNPs that were called consistently less than 75% of the time. These SNPs represent on average 8.25% of the total SNPs on each chromosome with chromosome 19, the most gene-rich chromosome, containing the highest proportion of poor performers (18.7%). To remedy this, we created SNiPer, a new application which uses two clustering algorithms to yield increased call rates and equivalent concordance to Affymetrix called genotypes. We include a training set for these algorithms based on individual genotypes for 705 samples. SNiPer has the capability to be retrained for lab-specific training sets. SNiPer is freely available for download at http://www.tgen.org/neurogenomics/data.

CONCLUSION: The correct calling of poor performing SNPs may prove to be key in future linkage studies performed on the 10K GeneChip. It would prove particularly valuable for those diseases that map to chromosome 19, known to contain a high proportion of poorly performing SNPs. Our results illustrate that SNiPer can be used to increase call rates on the 10K GeneChip without sacrificing accuracy, thereby increasing the amount of valid data generated.
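
The screening step described in the results (finding SNPs called less than 75% of the time) amounts to a per-SNP call-rate filter. The sketch below assumes that "call rate" means the fraction of samples with a genotype call for a given SNP. The data layout, the SNP identifiers, and the function names are hypothetical and are not taken from SNiPer.

```python
def call_rates(genotypes):
    """Fraction of samples with a genotype call, per SNP (None = no call)."""
    return {
        snp: sum(call is not None for call in calls) / len(calls)
        for snp, calls in genotypes.items()
    }

def poor_performers(genotypes, threshold=0.75):
    """SNP ids whose call rate falls strictly below the threshold."""
    rates = call_rates(genotypes)
    return sorted(snp for snp, rate in rates.items() if rate < threshold)

# Hypothetical toy data: one list of calls per SNP, one entry per sample.
example = {
    "snp_a": ["AA", "AB", "BB", "AA"],   # 4 of 4 called
    "snp_b": ["AA", None, None, "AB"],   # 2 of 4 called
    "snp_c": ["AB", "AB", None, "BB"],   # 3 of 4 called
}

print(call_rates(example))
print(poor_performers(example))
```

The comparison is strict, so a SNP called in exactly 75% of samples is not counted as a poor performer. Whether the boundary case should be included is a choice the abstract leaves open.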


Subject(s)
Chromosome Mapping/methods , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide , Algorithms , Chromosomes, Human, Pair 19/genetics , Cluster Analysis , DNA Mutational Analysis , Gene Expression Profiling , Gene Frequency , Genetic Linkage , Genotype , Humans , Models, Statistical , Polymerase Chain Reaction , Reproducibility of Results , Sequence Alignment/methods , Software