Assigning protein function from domain-function associations using DomFun.
Rojano, Elena; Jabato, Fernando M; Perkins, James R; Córdoba-Caballero, José; García-Criado, Federico; Sillitoe, Ian; Orengo, Christine; Ranea, Juan A G; Seoane-Zonjic, Pedro.
Affiliation
  • Rojano E; Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010, Malaga, Spain.
  • Jabato FM; Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010, Malaga, Spain.
  • Perkins JR; Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010, Malaga, Spain.
  • Córdoba-Caballero J; Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010, Malaga, Spain.
  • García-Criado F; Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010, Malaga, Spain. jimrperkins@gmail.com.
  • Sillitoe I; CIBER of Rare Diseases, Av. Monforte de Lemos, 3-5. Pabellon 11. Planta 0, 28029, Madrid, Spain. jimrperkins@gmail.com.
  • Orengo C; Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010, Malaga, Spain. jimrperkins@gmail.com.
  • Ranea JAG; Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010, Malaga, Spain.
  • Seoane-Zonjic P; Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010, Malaga, Spain.
BMC Bioinformatics; 23(1): 43, 2022 Jan 15.
Article in English | MEDLINE | ID: mdl-35033002
ABSTRACT

BACKGROUND:

Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated with multiple indices derived from tripartite network analysis. These domain-function associations are combined at the protein level to generate protein-function predictions.
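To show how a domain-function association can be scored from a tripartite network, the Python sketch below assumes that the network links domains to proteins and proteins to functions. It scores each domain-function pair with the Simpson (overlap) index of the protein sets the two nodes connect to, |P(d) ∩ P(f)| / min(|P(d)|, |P(f)|). This is an illustration under those assumptions, not DomFun's code. The function names and toy identifiers are hypothetical, and the sketch leaves out the other indices and any filtering DomFun applies.

```python
from itertools import product


def simpson_index(a: set, b: set) -> float:
    """Simpson (overlap) index: |A & B| / min(|A|, |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def invert(mapping: dict) -> dict:
    """Turn protein -> {items} into item -> {proteins}."""
    inverted = {}
    for protein, items in mapping.items():
        for item in items:
            inverted.setdefault(item, set()).add(protein)
    return inverted


def domain_function_associations(protein_domains: dict, protein_functions: dict) -> dict:
    """Score each (domain, function) pair that shares at least one protein.

    Proteins form the middle layer of the tripartite network
    domain - protein - function; each score is the Simpson index of the
    protein sets attached to the domain and to the function.
    """
    proteins_by_domain = invert(protein_domains)
    proteins_by_function = invert(protein_functions)
    scores = {}
    for (dom, p_dom), (fun, p_fun) in product(proteins_by_domain.items(),
                                              proteins_by_function.items()):
        if p_dom & p_fun:  # skip pairs with no protein in common
            scores[(dom, fun)] = simpson_index(p_dom, p_fun)
    return scores


# Hypothetical toy network: three proteins, two domains, two GO terms
protein_domains = {"P1": {"D1"}, "P2": {"D1", "D2"}, "P3": {"D2"}}
protein_functions = {"P1": {"GO:A"}, "P2": {"GO:A"}, "P3": {"GO:B"}}
for pair, score in sorted(domain_function_associations(protein_domains, protein_functions).items()):
    print(pair, round(score, 2))
```

In this toy network, D2 scores 0.5 for GO:A because only one of its two proteins carries that term. It scores 1.0 for GO:B because the single protein annotated with GO:B carries D2. Dividing by the smaller set is what lets a small, fully nested set reach the maximum score.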

RESULTS:

We analysed 16 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the three Gene Ontology (GO) sub-ontologies, KEGG, and Reactome. We validated the results using the CAFA 3 benchmark platform for GO annotation. Of the association metrics and domain datasets tested, the Simpson index for FunFam domain-function associations combined with Stouffer's method led to the best performance in almost all scenarios. We also found that using FunFams led to better performance than using superfamilies, and that results were better for GO molecular function terms than for GO biological process terms. DomFun performed as well as the highest-performing method in certain CAFA 3 evaluation procedures in terms of $F_{\max}$ and $S_{\min}$. We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotation sources, such as KEGG and Reactome. Using PPP, we obtained results for GO similar to those obtained with CAFA 3; moreover, we found good performance for the other annotation sources. As with CAFA 3, the Simpson index with Stouffer's method led to the top performance in almost all scenarios.
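Among its evaluation measures, CAFA reports the protein-centric maximum F-measure, $F_{\max}$. The Python sketch below computes a simplified version of it from predicted term scores. At each threshold τ, precision is averaged over benchmark proteins that have at least one prediction scoring ≥ τ, and recall is averaged over all benchmark proteins (CAFA's full evaluation mode). $F_{\max}$ is the largest harmonic mean of the two across thresholds. The sketch ignores parts of the official CAFA assessment, such as ontology propagation and benchmark categories, and the function name and data are hypothetical.

```python
def f_max(predictions: dict, truth: dict, thresholds=None) -> float:
    """Protein-centric F_max (simplified, CAFA-style full evaluation mode).

    predictions: protein -> {term: score in [0, 1]}
    truth: benchmark protein -> non-empty set of experimentally annotated terms
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]  # 0.01, 0.02, ..., 1.00
    best = 0.0
    for tau in thresholds:
        precisions = []   # only proteins with >= 1 prediction at this threshold
        recall_sum = 0.0  # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            if not true_terms:
                raise ValueError(f"{protein} has no true terms")
            predicted = {term for term, score in predictions.get(protein, {}).items()
                         if score >= tau}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # precision is undefined when nothing is predicted
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Hypothetical benchmark: two proteins and their predicted GO term scores
truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
predictions = {"P1": {"GO:A": 0.9, "GO:B": 0.4, "GO:X": 0.8}, "P2": {"GO:C": 0.6}}
print(round(f_max(predictions, truth), 3))
```

In this example, the best threshold keeps every prediction. Precision averages (2/3 + 1)/2 = 5/6 and recall is 1, so $F_{\max}$ = 10/11 ≈ 0.909.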

CONCLUSIONS:

DomFun performs competitively with other methods evaluated in CAFA 3 when predicting protein function with GO, although results vary depending on the evaluation procedure. Through our own benchmark procedure, PPP, we have shown that it can also make accurate predictions for KEGG and Reactome. It performs best when using FunFams and combining Simpson index-derived domain-function associations with Stouffer's method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources, amino acid k-mers and motifs. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun. Code is maintained at https://github.com/ElenaRojano/DomFun. Validation procedure scripts can be found at https://github.com/ElenaRojano/DomFun_project.
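Stouffer's method combines k independent one-sided tests. It converts each p-value to a standard normal score $z_i = \Phi^{-1}(1 - p_i)$ and forms $Z = \sum_i w_i z_i / \sqrt{\sum_i w_i^2}$, which reduces to $\sum_i z_i / \sqrt{k}$ with equal weights. The Python sketch below implements this textbook form. It assumes that each domain-function association for a protein has already been expressed as a p-value. The abstract does not say how DomFun maps association indices such as the Simpson index onto this scale, so that step is not reproduced here, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sigma 1


def stouffer_combine(p_values, weights=None):
    """Combine independent one-sided p-values with (weighted) Stouffer's Z method.

    Returns (Z, combined_p), where combined_p = 1 - Phi(Z).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(p_values)  # unweighted: Z = sum(z_i) / sqrt(k)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    # z_i = Phi^-1(1 - p_i), computed as -Phi^-1(p_i) to avoid rounding 1 - p_i to 1.0
    z_scores = [-STD_NORMAL.inv_cdf(p) for p in p_values]
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / sqrt(sum(w * w for w in weights))
    # 1 - Phi(Z) == Phi(-Z) by symmetry of the normal distribution
    return z, STD_NORMAL.cdf(-z)


# Hypothetical per-domain p-values supporting one GO term for one protein
z, p = stouffer_combine([0.05, 0.05])
print(f"Z = {z:.3f}, combined p = {p:.4f}")
```

Here two domains each support a term with p = 0.05, and their combination gives Z ≈ 2.33 and a combined p-value of about 0.01. This shows how consistent, individually weak evidence from several domains can strengthen a protein-level prediction.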

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Proteins / Computational Biology | Study type: Prognostic_studies / Risk_factors_studies | Language: English | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2022 | Document type: Article | Country of affiliation: Spain
