Evaluation of clustering algorithms for gene expression data using gene ontology annotations.
Chin Med J (Engl); 125(17): 3048-52, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22932178
BACKGROUND: Clustering is a useful exploratory technique for interpreting gene expression data to reveal groups of genes sharing common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for evaluating expression data clustering. METHODS: An external criterion utilizing annotation-based similarities between genes is proposed in this work. Gene ontology information is employed as the annotation source. Six widely used clustering algorithms were compared over various types of gene expression data sets using the proposed criterion. RESULTS: The ranking of these algorithms given by the criterion coincides with common knowledge. Single-linkage performs significantly worse, even worse than the random algorithm. Ward's method achieves the best performance in most cases. CONCLUSIONS: The proposed criterion has a strong ability to distinguish among clustering algorithms with different distance measures. It is also demonstrated that analyzing the main contributors to the criterion may offer some guidance in finding local compact clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis.
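The abstract does not give the formula for the criterion. As a hypothetical illustration of how an annotation-based external criterion can score a clustering, the sketch below assumes each gene's gene ontology annotations are a set of GO term IDs, measures gene-gene similarity with the Jaccard index of those sets, and reports the mean similarity of gene pairs placed in the same cluster minus the mean over all gene pairs. The function names `jaccard` and `annotation_score` and the toy data are invented for illustration; this is not the authors' method. Subtracting the all-pairs mean means a random partition should score roughly zero on average, which loosely mirrors the abstract's use of a random algorithm as a baseline.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of GO term IDs (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotation_score(clusters, go_terms):
    """Hypothetical external criterion (illustration only, not the paper's formula).

    clusters: dict mapping gene -> cluster label
    go_terms: dict mapping gene -> set of GO term IDs
    Returns mean within-cluster pair similarity minus mean all-pairs similarity.
    """
    # Only genes that have annotations can be scored.
    genes = [g for g in clusters if g in go_terms]
    within, overall = [], []
    for g1, g2 in combinations(genes, 2):
        s = jaccard(go_terms[g1], go_terms[g2])
        overall.append(s)
        if clusters[g1] == clusters[g2]:
            within.append(s)
    if not within or not overall:
        return 0.0
    return sum(within) / len(within) - sum(overall) / len(overall)


if __name__ == "__main__":
    # Toy annotations: g1/g2 share terms, g3/g4 share one term.
    go = {
        "g1": {"GO:0006412", "GO:0005840"},
        "g2": {"GO:0006412", "GO:0005840"},
        "g3": {"GO:0006096"},
        "g4": {"GO:0006096", "GO:0005737"},
    }
    good = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}  # groups co-annotated genes
    bad = {"g1": 0, "g2": 1, "g3": 0, "g4": 1}   # splits co-annotated genes
    print(annotation_score(good, go))  # 0.5
    print(annotation_score(bad, go))   # -0.25
```

With this toy data, the partition that keeps co-annotated genes together scores higher than the one that splits them. A real evaluation would use a semantic similarity measure over the GO graph rather than plain term-set overlap, and would compare the scores of partitions produced by algorithms such as single-linkage and Ward's method on the same genes.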
Database: MEDLINE
Main subject: Algorithms / Gene Expression Profiling / Molecular Sequence Annotation
Limits: Humans
Language: English
Publication year: 2012
Document type: Article