Improving automatic GO annotation with semantic similarity.

Sarker, Bishnu; Khare, Navya; Devignes, Marie-Dominique; Aridhi, Sabeur

Affiliation

Sarker B; CNRS, Inria, LORIA, University of Lorraine, 54000, Nancy, France.
Khare N; Khulna University of Engineering and Technology, Khulna, Bangladesh.
Devignes MD; School of Applied Computational Sciences, Meharry Medical College, Nashville, TN, USA.
Aridhi S; CNRS, Inria, LORIA, University of Lorraine, 54000, Nancy, France.

BMC Bioinformatics ; 23(Suppl 2): 433, 2022 Dec 12.

Article in English | MEDLINE | ID: mdl-36510133

ABSTRACT

BACKGROUND:

Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases such as UniProtKB makes manual functional annotation increasingly difficult. Manual annotation requires expert human curators to search for and read related research articles, interpret the results, and assign the annotations to the proteins, which makes it a time-consuming and expensive process. Designing computational tools that perform automatic annotation by leveraging the high-quality manual annotations already available in UniProtKB/SwissProt is therefore an important research problem.

RESULTS:

In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) method (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) for automatic annotation of proteins with gene ontology (GO) terms, and rename it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged hierarchically according to semantic parent-child relations. We therefore propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We report experimental results comparing the GrAPFI-GO method with and without common-neighbor similarity. We also test, on a benchmark of proteins, the performance of GrAPFI-GO and other GO annotation tools with and without the proposed pruning and post-processing procedure.
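The idea of label propagation over a protein similarity graph can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy graph, the edge weights, and the use of Jaccard similarity as the common-neighbor measure are all assumptions made for illustration. Each annotated neighbor votes for its GO terms with a weight equal to the edge similarity, and the votes are normalized by the total weight.

```python
from collections import defaultdict


def jaccard_common_neighbors(graph, u, v):
    """Jaccard similarity of the neighbor sets of u and v.

    This is one possible common-neighbor measure (an assumption here).
    """
    nu, nv = set(graph.get(u, {})), set(graph.get(v, {}))
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def propagate_labels(graph, annotations, query, top_k=3):
    """Score GO terms for `query` by weighted votes from annotated neighbors."""
    scores = defaultdict(float)
    total = 0.0
    for neighbor, weight in graph.get(query, {}).items():
        terms = annotations.get(neighbor)
        if not terms:
            continue  # unannotated neighbors cannot vote
        total += weight
        for term in terms:
            scores[term] += weight
    if total == 0:
        return []
    # Normalize and rank by descending score, then by term ID for stability.
    ranked = sorted(
        ((term, score / total) for term, score in scores.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:top_k]


# Toy graph: edge weights are hypothetical domain-sharing similarities.
graph = {
    "Q": {"P1": 0.8, "P2": 0.2},
    "P1": {"Q": 0.8, "P2": 0.5},
    "P2": {"Q": 0.2, "P1": 0.5},
}
# Hypothetical annotations for the two labeled proteins.
annotations = {"P1": {"GO:0003677", "GO:0005634"}, "P2": {"GO:0005634"}}

predictions = propagate_labels(graph, annotations, "Q")
```

In this toy setting the term carried by both neighbors ranks first for the query protein `Q`. A common-neighbor measure such as `jaccard_common_neighbors` could be used to reweight the edges before propagation.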

CONCLUSION:

Our results show that the proposed semantic hierarchical post-processing can improve the performance of GrAPFI-GO and of other annotation tools as well. GrAPFI-GO thus offers an original, efficient, and reusable procedure that exploits the semantic relations among GO terms to improve the automatic annotation of protein functions.
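One simple way to use the parent-child hierarchy in post-processing is to drop predicted terms that are already implied by a more specific predicted term. The sketch below shows only that idea under stated assumptions: the term identifiers and the `parents` map are hypothetical, and the procedure in the paper, which also integrates semantic similarity, may differ.

```python
def ancestors(term, parents):
    """Return all ancestors of `term` in a DAG given as child -> set of parents."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def prune_to_most_specific(predicted, parents):
    """Drop any predicted term that is an ancestor of another predicted term."""
    predicted = set(predicted)
    redundant = set()
    for term in predicted:
        redundant |= ancestors(term, parents) & predicted
    return sorted(predicted - redundant)


# Hypothetical three-level hierarchy (placeholder IDs, not real GO terms).
parents = {"T:leaf": {"T:mid"}, "T:mid": {"T:root"}, "T:other": {"T:root"}}

kept = prune_to_most_specific({"T:leaf", "T:mid", "T:other"}, parents)
```

Here `T:mid` is removed because `T:leaf` already implies it, while `T:other` is kept because no predicted term descends from it.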

Subjects

Computational Biology; Semantics; Humans; Gene Ontology; Molecular Sequence Annotation; Computational Biology/methods; Databases, Protein; Proteins/chemistry

Keywords

Domain similarity network; Gene ontology annotation; GrAPFI; K-nearest neighbor; Label propagation; Protein function annotation; Semantic similarity

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Semantics / Computational Biology Limit: Humans Language: En Year of publication: 2022 Document type: Article
