A model to predict the function of hypothetical proteins through a nine-point classification scoring schema.

Ijaq, Johny; Malik, Girik; Kumar, Anuj; Das, Partha Sarathi; Meena, Narendra; Bethi, Neeraja; Sundararajan, Vijayaraghava Seshadri; Suravajhala, Prashanth

Affiliation

Ijaq J; Department of Biotechnology, Osmania University, Hyderabad, 500007, India.
Malik G; Bioclues.org, Kukatpally, Hyderabad, 500072, India.
Kumar A; Department of Pediatrics, The Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, The Ohio State University, Columbus, OH, USA.
Das PS; Bioclues.org, Kukatpally, Hyderabad, 500072, India.
Meena N; Labrynthe, New Delhi, India.
Bethi N; Bioclues.org, Kukatpally, Hyderabad, 500072, India.
Sundararajan VS; Advanced Center for Computational and Applied Biotechnology, Uttarakhand Council for Biotechnology, Dehradun, 248007, India.
Suravajhala P; Bioclues.org, Kukatpally, Hyderabad, 500072, India.

BMC Bioinformatics ; 20(1): 14, 2019 Jan 08.

Article in English | MEDLINE | ID: mdl-30621574

ABSTRACT

BACKGROUND:

Hypothetical proteins [HPs] are proteins that are predicted to be expressed in an organism but for which no evidence of existence is known. In the recent past, annotation and curation efforts have helped overcome the challenge of understanding their diverse functions. Researchers have developed techniques to decipher the sequence-structure-function relationship, especially for functional modelling of HPs, but using these features as classifiers for HPs has not been attempted. Alongside the rise in the number of annotation strategies, next-generation sequencing methods have provided further understanding of the functions of HPs.

RESULTS:

In our previous work, we developed a six-point classification scoring schema with annotation pertaining to protein family scores, orthology, protein interaction/association studies, bidirectional best BLAST hits, sorting signals, and known databases and visualizers that were used to validate protein interactions. In this study, we introduced three more classifiers to our annotation system, viz. pseudogenes linked to HPs, homology modelling, and non-coding RNAs associated with HPs. We discuss the challenges and performance of these classifiers using machine learning heuristics; accuracy (%) improved for Perceptron (81.08 to 97.67), Naive Bayes (54.05 to 96.67), Decision tree J48 (67.57 to 97.00), and SMO_npolyk (59.46 to 96.67).
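One way to picture the nine classification features is as a fixed-length vector with one entry per feature. The sketch below is illustrative only: the identifier names paraphrase the abstract, and both the binary (0/1) encoding and the summed total are assumptions rather than the authors' published scoring rule.

```python
# Minimal sketch: encode annotation evidence for one hypothetical protein
# as a nine-element 0/1 vector. All identifiers here are hypothetical
# names chosen for illustration, not taken from the authors' code.

FEATURES = (
    # six features from the earlier schema, as listed in the abstract
    "protein_family",
    "orthology",
    "protein_interaction",
    "bidirectional_best_blast_hit",
    "sorting_signal",
    "database_and_visualizer_support",
    # three features added in this study
    "linked_pseudogene",
    "homology_model",
    "associated_ncrna",
)


def encode(evidence):
    """Return a 0/1 list with one entry per name in FEATURES.

    `evidence` maps feature names to truthy/falsy values; missing
    features are treated as 0 (no supporting evidence).
    """
    unknown = set(evidence) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return [1 if evidence.get(name, False) else 0 for name in FEATURES]


def total_score(vector):
    """Count how many of the nine features have supporting evidence."""
    return sum(vector)


# Made-up example record, for illustration only.
example = {"protein_family": True, "orthology": True, "homology_model": True}
vector = encode(example)
print(vector, total_score(vector))
```

A vector of this shape is the kind of input that classifiers such as a perceptron, Naive Bayes, or a decision tree would consume, with one labelled vector per protein.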

CONCLUSION:

With the introduction of three new classification features, the nine-point classification scoring schema functionally annotates HPs with improved accuracy.
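The size of the improvement can be read off the accuracies quoted in the Results. The short sketch below computes the gain per classifier, assuming the quoted values are percentages, so that each difference is in percentage points; the variable and function names are illustrative.

```python
# Accuracies quoted in the Results, as (before, after) pairs.
# Assumption: the values are percentages.
ACCURACY = {
    "Perceptron": (81.08, 97.67),
    "Naive Bayes": (54.05, 96.67),
    "Decision tree J48": (67.57, 97.00),
    "SMO_npolyk": (59.46, 96.67),
}


def gains(accuracy):
    """Return the accuracy gain per classifier, rounded to two decimals."""
    return {
        name: round(after - before, 2)
        for name, (before, after) in accuracy.items()
    }


for name, gain in gains(ACCURACY).items():
    print(f"{name}: +{gain:.2f} percentage points")
```

On these figures the largest gain is for Naive Bayes and the smallest for Perceptron, which started from the highest baseline.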

Subjects

Proteins/classification; Bayes Theorem; Humans

Keywords

Classification features; Functional genomics; Hypothetical proteins; Machine learning

Full text: 1 Database: MEDLINE Main subject: Proteins Type of study: Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Publication year: 2019 Document type: Article