Integrating biological knowledge based on functional annotations for biclustering of gene expression data.

Nepomuceno, Juan A; Troncoso, Alicia; Nepomuceno-Chamorro, Isabel A; Aguilar-Ruiz, Jesús S

Affiliation

Nepomuceno JA; Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain. Electronic address: janepo@us.es.
Troncoso A; Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain.
Nepomuceno-Chamorro IA; Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain.
Aguilar-Ruiz JS; Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain.

Comput Methods Programs Biomed ; 119(3): 163-80, 2015 May.

Article in English | MEDLINE | ID: mdl-25843807

ABSTRACT

Gene expression data analysis is based on the assumption that co-expressed genes are co-regulated. This assumption is being reformulated because the co-expression of a group of genes may result from independent activation under the same experimental condition rather than from a shared regulatory regime. For this reason, traditional techniques have recently been improved by combining gene expression data with prior biological knowledge from open-access repositories. Biclustering is an unsupervised machine learning technique that searches for patterns in gene expression data matrices. This paper proposes a scatter search-based biclustering algorithm that integrates biological information. Besides the gene expression data matrix, the only input to the algorithm is a direct annotation file that relates each gene to a set of terms from a biological repository in which genes are annotated. Two biological measures, FracGO and SimNTO, are proposed to integrate this information by adding it to the fitness function optimized in the scatter search scheme. FracGO is based on biological enrichment, and SimNTO is based on the overlap among the GO annotations of pairs of genes. Experimental results on two datasets show that the algorithm performs better when biological knowledge is integrated. Moreover, the two biological measures are analyzed and compared, and it is concluded that the differences between them depend on both the data source and, when GO is used, how the annotation file has been built. It is also shown that the proposed algorithm obtains a greater number of enriched biclusters than other classical biclustering algorithms typically used as benchmarks, and an analysis of the overlap among biclusters reveals that the biclusters obtained overlap little.
The proposed methodology is a general-purpose algorithm which allows the integration of biological information from several sources and can be extended to other biclustering algorithms based on the optimization of a merit function.
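
The idea of adding a biological term to the optimized fitness function can be illustrated with a minimal sketch. The abstract does not give the formulas for FracGO or SimNTO, so the code below is not the authors' method: it assumes the Cheng and Church mean squared residue as the expression-based merit function and uses an average Jaccard overlap of annotation term sets as a hypothetical stand-in for an overlap-based measure. The function names, the `weight` parameter, and the toy data are all illustrative assumptions.

```python
from itertools import combinations


def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of a sub-matrix (lower = more coherent)."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_mean = [sum(r) / n_c for r in sub]
    col_mean = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    total_mean = sum(row_mean) / n_r
    return sum(
        (sub[i][j] - row_mean[i] - col_mean[j] + total_mean) ** 2
        for i in range(n_r)
        for j in range(n_c)
    ) / (n_r * n_c)


def jaccard(a, b):
    """Jaccard index of two term sets; defined as 0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_annotation_overlap(annotations, genes):
    """Average pairwise Jaccard overlap of the genes' annotation term sets.

    Hypothetical stand-in for an overlap-based biological measure;
    NOT the SimNTO definition from the paper.
    """
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    return sum(
        jaccard(annotations.get(g, set()), annotations.get(h, set()))
        for g, h in pairs
    ) / len(pairs)


def fitness(matrix, gene_ids, annotations, rows, cols, weight=1.0):
    """Fitness to minimize: expression residue minus weighted biological term."""
    genes = [gene_ids[i] for i in rows]
    return (mean_squared_residue(matrix, rows, cols)
            - weight * mean_annotation_overlap(annotations, genes))


# Toy data (invented for illustration): 3 genes x 3 conditions.
matrix = [[1, 2, 3], [2, 3, 4], [5, 1, 0]]
gene_ids = ["g1", "g2", "g3"]
# Direct annotation file as a mapping gene -> set of placeholder terms.
annotations = {"g1": {"T1", "T2"}, "g2": {"T1", "T2"}, "g3": {"T3"}}

coherent = fitness(matrix, gene_ids, annotations, rows=[0, 1], cols=[0, 1, 2])
incoherent = fitness(matrix, gene_ids, annotations, rows=[0, 2], cols=[0, 1, 2])
```

In this toy case the bicluster of `g1` and `g2` follows a perfect additive pattern and shares all annotation terms, so it scores lower (better) than the bicluster of `g1` and `g3`. Any search procedure that minimizes a merit function, scatter search included, could use such a combined score unchanged.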

Subjects

Algorithms; Gene Expression Profiling/statistics & numerical data; Molecular Sequence Annotation/statistics & numerical data; Unsupervised Machine Learning/statistics & numerical data; Cluster Analysis; Data Mining; Databases, Genetic/statistics & numerical data; Gene Ontology/statistics & numerical data; Genes, Fungal; Knowledge Bases; Yeasts/genetics

Keywords

Biclustering of gene expression data; Integration of biological knowledge; Scatter search

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Gene Expression Profiling / Molecular Sequence Annotation / Unsupervised Machine Learning Language: En Journal: Comput Methods Programs Biomed Journal subject: MEDICAL INFORMATICS Publication year: 2015 Document type: Article
