Computational analysis and prediction of PE_PGRS proteins using machine learning.
Li, Fuyi; Guo, Xudong; Xiang, Dongxu; Pitt, Miranda E; Bainomugisa, Arnold; Coin, Lachlan J M.
Affiliation
  • Li F; Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, VIC 3000, Australia.
  • Guo X; School of Information Engineering, Ningxia University, Yinchuan, Ningxia 750021, China.
  • Xiang D; Faculty of Engineering and Information Technology, The University of Melbourne, VIC 3000, Australia.
  • Pitt ME; Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, VIC 3000, Australia.
  • Bainomugisa A; Queensland Mycobacterium Reference Laboratory, Brisbane, Australia.
  • Coin LJM; Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, VIC 3000, Australia.
Comput Struct Biotechnol J ; 20: 662-674, 2022.
Article in En | MEDLINE | ID: mdl-35140886
Approximately 10% of the Mycobacterium tuberculosis genome consists of two gene families that remain poorly characterised because of their high GC content and highly repetitive nature. The largest sub-group, the proline-glutamic acid polymorphic guanine-cytosine-rich sequence (PE_PGRS) family, is thought to be involved in host response and disease pathogenicity. Because of their high genetic variability and the complexity of analysing them, PE_PGRS genes are typically excluded from further research in genomic studies. Few online resources and homology-based computational tools can currently identify and analyse PE_PGRS proteins, and those that exist are computationally intensive, time-consuming, and lacking in sensitivity. Computational methods that can rapidly and accurately identify PE_PGRS proteins are therefore valuable for the functional elucidation of this family. In this study, we developed the first machine learning-based bioinformatics approach, termed PEPPER, to allow users to identify PE_PGRS proteins rapidly and accurately. PEPPER was built upon a comprehensive evaluation of 13 popular machine learning algorithms with various sequence and physicochemical features. Empirical studies demonstrated that PEPPER achieved significantly better performance than the alignment-based approaches BLASTP and PHMMER in both prediction accuracy and speed. PEPPER is anticipated to facilitate community-wide efforts to conduct high-throughput identification and analysis of PE_PGRS proteins.
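As an illustration of what a sequence feature can look like, the sketch below computes amino acid composition, a widely used encoding that turns a protein sequence into a fixed-length numeric vector suitable for a machine learning classifier. It is a generic example under stated assumptions, not PEPPER's actual feature set or code: the abstract does not specify which features were used, and the function name `amino_acid_composition` and the toy sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every sequence
# maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard characters (e.g. 'X' or gaps) are ignored.
    Hypothetical helper for illustration; not taken from PEPPER.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy glycine-rich fragment, made up for this example (not a real PE_PGRS protein).
toy = "GGAGGNGG"
features = amino_acid_composition(toy)
glycine_fraction = features[AMINO_ACIDS.index("G")]
```

A vector like `features` could then be passed to any standard classifier; which algorithm and which features perform best is the kind of question the study's comparison of 13 algorithms addresses.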
Full text: 1 Collection: 01-internacional Database: MEDLINE Type of study: Prognostic_studies / Risk_factors_studies Language: En Journal: Comput Struct Biotechnol J Year: 2022 Document type: Article
