ABSTRACT
BACKGROUND: Most predictive methods currently available for identifying protein secretion mechanisms have focused on classically secreted proteins. In fact, only two methods have been reported for predicting non-classically secreted proteins of Gram-positive bacteria. This study describes the implementation of a sequence-based classifier, denoted NClassG+, for identifying non-classically secreted Gram-positive bacterial proteins. RESULTS: Several feature-based classifiers were trained using different sequence transformation vectors (frequencies, dipeptides, physicochemical factors and PSSM) and Support Vector Machines (SVMs) with linear, polynomial and Gaussian kernel functions. Nested k-fold cross-validation (CV) was applied to select the best models, using the inner CV loop to tune the model parameters and the outer CV loop to compute the error. The parameters, the kernel functions and all possible combinations of feature vectors were optimized using grid search. CONCLUSIONS: The final model was tested against an independent set not previously seen by the model and achieved better predictive performance than SecretomeP V2.0 and SecretP V2.0 for the identification of non-classically secreted proteins. NClassG+ is freely available on the web at http://www.biolisi.unal.edu.co/web-servers/nclassgpositive/.
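The nested cross-validation scheme described above can be sketched as follows. This is a minimal illustration and not the NClassG+ implementation: the function names (`k_folds`, `nested_cv`, `fit_threshold`, `error_rate`) are hypothetical, and a toy one-dimensional threshold classifier stands in for the SVM so that the example runs with the Python standard library alone. The inner loop picks a parameter from the grid using only the training portion of each outer fold, and the outer loop reports the error on data never used for tuning.

```python
import random

def k_folds(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def mean(values):
    return sum(values) / len(values)

def nested_cv(X, y, grid, fit, error, outer_k=5, inner_k=3, seed=0):
    """Mean outer-fold error; parameters are tuned on inner folds only."""
    outer = k_folds(len(X), outer_k, seed)
    outer_errors = []
    for i, test_idx in enumerate(outer):
        # Everything outside the current outer fold is available for tuning.
        train_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
        # Inner folds hold positions into train_idx, not raw sample indices.
        inner = k_folds(len(train_idx), inner_k, seed + 1)

        def inner_error(param):
            errs = []
            for v, val_pos in enumerate(inner):
                val = [train_idx[p] for p in val_pos]
                sub = [train_idx[p]
                       for f, fold in enumerate(inner) if f != v for p in fold]
                model = fit([X[j] for j in sub], [y[j] for j in sub], param)
                errs.append(error(model, [X[j] for j in val], [y[j] for j in val]))
            return mean(errs)

        # Grid search: keep the parameter with the lowest inner CV error
        # (ties go to the earliest grid entry).
        best = min(grid, key=inner_error)
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx], best)
        outer_errors.append(
            error(model, [X[j] for j in test_idx], [y[j] for j in test_idx]))
    return mean(outer_errors)

# Toy stand-in for an SVM (assumption): the "model" is just a threshold.
def fit_threshold(X_train, y_train, threshold):
    return threshold

def error_rate(threshold, X_eval, y_eval):
    wrong = sum((x > threshold) != label for x, label in zip(X_eval, y_eval))
    return wrong / len(y_eval)

X = list(range(20))
y = [x >= 10 for x in X]
cv_error = nested_cv(X, y, grid=[9.5, 5, 15], fit=fit_threshold, error=error_rate)
print(cv_error)
```

In a real setting, `fit` would train an SVM with a given kernel and parameter set, and `grid` would enumerate kernel and parameter combinations; the looping structure would stay the same.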
Subjects
Bacterial Proteins/classification, Computational Biology/methods, Gram-Positive Bacteria/metabolism, Software, Artificial Intelligence, Bacterial Proteins/metabolism, Protein Databases, Theoretical Models
ABSTRACT
The mycobacterial cell envelope has been implicated in the pathogenicity of tuberculosis and has therefore been a prime target for the identification and characterization of surface proteins with potential application in drug and vaccine development. In this study, the genome of Mycobacterium tuberculosis H37Rv was screened using machine learning tools, including feature-based predictors, general localizers and transmembrane topology predictors, to identify proteins that are potentially secreted to the surface of M. tuberculosis or to the extracellular milieu through different secretory pathways. To determine the reliability of the proposed computational methodology, the subcellular localization of a set of 8 hypothetically secreted/surface candidate proteins was experimentally assessed by cellular fractionation and immunoelectron microscopy (IEM), using 4 secreted/surface proteins with experimental confirmation as positive controls and 2 cytoplasmic proteins as negative controls. Subcellular fractionation and IEM studies provided evidence that the candidate proteins Rv0403c, Rv3630, Rv1022, Rv0835, Rv0361 and Rv0178 are secreted either to the mycobacterial surface or to the extracellular milieu. Surface localization was also confirmed for the positive controls, whereas the negative controls were located in the cytoplasm. Based on statistical learning methods, we obtained computational subcellular localization predictions that were experimentally assessed, yielding an experimentally supported computational protocol that identified a new set of secreted/surface proteins as potential vaccine candidates.
Subjects
Bacterial Outer Membrane Proteins/metabolism, Computational Biology/methods, Mycobacterium tuberculosis/metabolism, Animals, Bacterial Antibodies/chemistry, Bacterial Antibodies/metabolism, Artificial Intelligence, Bacterial Outer Membrane Proteins/chemistry, Cell Fractionation, Polyacrylamide Gel Electrophoresis, B-Lymphocyte Epitopes/immunology, B-Lymphocyte Epitopes/metabolism, Escherichia coli/metabolism, Immunoblotting, Immunoelectron Microscopy, Statistical Models, Mycobacterium smegmatis/metabolism, Mycobacterium tuberculosis/chemistry, Peptides/immunology, Peptides/metabolism, Rabbits, Sonication, Subcellular Fractions/metabolism
ABSTRACT
BACKGROUND: The computational prediction of mycobacterial proteins' subcellular localization is of key importance for proteome annotation and for the identification of new drug targets and vaccine candidates. Several subcellular localization classifiers have been developed over the past few years, comprising both general localization and feature-based classifiers. Here, we have validated the ability of different bioinformatics approaches, through the use of SignalP 2.0, TatP 1.0, LipoP 1.0, Phobius, PA-SUB 2.5, PSORTb v.2.0.4 and Gpos-PLoc, to predict secreted bacterial proteins. These computational tools were compared in terms of sensitivity, specificity and Matthews correlation coefficient (MCC) using a set of mycobacterial proteins having less than 40% identity, none of which are included in the training data sets of the validated tools and whose subcellular localization has been experimentally confirmed. These proteins belong to the training data set of TBpred, a computational tool specifically designed to predict the subcellular localization of mycobacterial proteins. RESULTS: A final validation set of 272 mycobacterial proteins was obtained from the initial set of 852 mycobacterial proteins. According to the validation metrics, all tools presented specificity above 0.90, while sensitivity and MCC values were more dispersed, all above 0.22. PA-SUB 2.5 presented the highest values; however, these results might be biased due to the methodology used by this tool. PSORTb v.2.0.4 left 56 proteins out of the classification, while Gpos-PLoc left just one protein out. CONCLUSION: Both subcellular localization approaches (general localization and feature-based) showed high predictive specificity, that is, high recognition of true negatives, for the tested data set. Among those tools whose predictions are not based on homology searches against SWISS-PROT, Gpos-PLoc was the general localization tool with the best predictive performance, while SignalP 2.0 was the best tool among the ones using a feature-based approach. Even though PA-SUB 2.5 presented the highest metrics, it should be taken into account that this tool was trained using all proteins reported in SWISS-PROT, which include the protein set tested in this study, either through a BLAST search or as part of a training model.
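The three validation metrics named above follow directly from the counts of true/false positives and negatives. The sketch below shows the standard formulas; the function names and the example counts are illustrative assumptions and are not taken from the study.

```python
import math

def count_outcomes(predicted, actual):
    """Count (tp, tn, fp, fn) from parallel lists of booleans (True = secreted)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    return tp, tn, fp, fn

def confusion_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, MCC) for a binary classifier."""
    # Sensitivity: fraction of true positives that were recognized.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were recognized.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC ranges from -1 to 1; it is set to 0 here when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical counts, for illustration only.
sens, spec, mcc = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
print(round(sens, 2), round(spec, 2), round(mcc, 3))
```

Because MCC uses all four cells of the confusion matrix, a tool can combine high specificity with a low MCC when it misses many true positives, which is consistent with the spread of sensitivity and MCC values reported above.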
Subjects
Bacterial Proteins/analysis, Computational Biology/methods, Mycobacterium/chemistry, Software, Algorithms, Bacterial Proteins/chemistry, Protein Databases, Mycobacterium/metabolism
ABSTRACT
BACKGROUND: This study describes a bioinformatics approach designed to identify Plasmodium vivax proteins potentially involved in reticulocyte invasion. Specifically, different protein training sets were built and tuned based on different biological parameters, such as experimental evidence of secretion and/or involvement in invasion-related processes. A profile-based sequence method supported by hidden Markov models (HMMs) was then used to build classifiers to search for biologically-related proteins. The transcriptional profile of the P. vivax intra-erythrocyte developmental cycle was then screened using these classifiers. RESULTS: A bioinformatics methodology for identifying potentially secreted P. vivax proteins was designed using sequence redundancy reduction and probabilistic profiles. This methodology led to identifying a set of 45 proteins that are potentially secreted during the P. vivax intra-erythrocyte development cycle and could be involved in cell invasion. Thirteen of the 45 proteins have already been described as vaccine candidates; there is experimental evidence of protein expression for 7 of the 32 remaining ones, while no previous studies of expression, function or immunology have been carried out for the additional 25. CONCLUSIONS: The results support the idea that probabilistic techniques like profile HMMs improve similarity searches. Also, different adjustments such as sequence redundancy reduction using Pisces or Cd-Hit allowed data clustering based on rational reproducible measurements. This kind of approach for selecting proteins with specific functions is highly important for supporting large-scale analyses that could aid in the identification of genes encoding potential new target antigens for vaccine development and drug design. The present study has led to targeting 32 proteins for further testing regarding their ability to induce protective immune responses against P. vivax malaria.
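The sequence redundancy reduction step mentioned above can be illustrated with a greedy, longest-first clustering sketch. This is not the Pisces or Cd-Hit algorithm: the function names are hypothetical, the example sequences are made up, and `difflib.SequenceMatcher.ratio()` is used as a crude stand-in for alignment-based sequence identity so that the example needs only the standard library.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def reduce_redundancy(sequences, threshold=0.9):
    """Greedy redundancy reduction.

    sequences: dict mapping a sequence id to its amino acid string.
    Returns the ids kept as cluster representatives.
    """
    # Process the longest sequences first so they become representatives.
    ordered = sorted(sequences, key=lambda sid: len(sequences[sid]), reverse=True)
    representatives = []
    for sid in ordered:
        # Keep a sequence only if it is dissimilar to every representative so far.
        if all(similarity(sequences[sid], sequences[rep]) < threshold
               for rep in representatives):
            representatives.append(sid)
    return representatives

# Made-up sequences, for illustration only.
training_set = {
    "p1": "MKTAYIAKQRQ",
    "p2": "MKTAYIAKQR",   # near-duplicate of p1, expected to be dropped
    "p3": "GGGGSSSSPP",
}
kept = reduce_redundancy(training_set, threshold=0.9)
print(kept)
```

Reducing redundancy before building profile HMMs keeps a few closely related sequences from dominating the profile; the threshold value here is arbitrary and would need to be chosen for the data at hand.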