ABSTRACT
Vaccination is one of the most significant inventions in medicine. Reverse vaccinology (RV) is a state-of-the-art technique for predicting vaccine candidates from a pathogen's genome(s). To promote vaccine development, we developed Vaxign2, an update of Vaxign and the first web-based vaccine design program to use reverse vaccinology with machine learning. Vaxign2 is a comprehensive web server for rational vaccine design, consisting of predictive and computational workflow components. The predictive component includes the original filtering-based Vaxign method and a new machine learning-based method, Vaxign-ML. Benchmarking on a validation dataset showed that Vaxign-ML outperformed other RV tools. Beyond prediction, Vaxign2 implements various post-prediction analyses that substantially enhance users' ability to refine prediction results according to different vaccine design rationales and considerably reduce the time needed to analyze Vaxign/Vaxign-ML prediction results. Users provide proteome sequences as input, select candidates based on Vaxign outputs and Vaxign-ML scores, and then perform post-prediction analyses. Vaxign2 also includes precomputed results for approximately 1 million proteins in 398 proteomes of 36 pathogens. As a demonstration, Vaxign2 was used to effectively analyze SARS-CoV-2, the coronavirus that causes COVID-19. This comprehensive framework can support better, more rational vaccine design. Vaxign2 is publicly accessible at http://www.violinet.org/vaxign2.
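As a rough illustration of how a filtering step can be combined with a machine-learning score, the sketch below keeps proteins that pass a few Vaxign-style criteria and ranks the survivors by score. The record fields, the set of surface-exposed locations, and the thresholds (at most one transmembrane helix, adhesin probability of at least 0.51) are assumptions chosen for this example, not the documented parameters or output schema of Vaxign2 or Vaxign-ML.

```python
from dataclasses import dataclass

# Hypothetical per-protein record; field names are illustrative, not Vaxign2's schema.
@dataclass
class ProteinAnnotation:
    protein_id: str
    localization: str           # predicted subcellular localization
    tm_helices: int             # number of predicted transmembrane alpha-helices
    adhesin_probability: float  # predicted probability of being an adhesin (0-1)
    similar_to_human: bool      # True if the protein has a close human homolog
    ml_score: float             # ML antigenicity score (higher = more promising)

# Assumed filter settings, used only for this sketch.
EXPOSED_LOCATIONS = {"Extracellular", "OuterMembrane", "CellWall"}
MAX_TM_HELICES = 1
MIN_ADHESIN_PROBABILITY = 0.51

def passes_filters(p: ProteinAnnotation) -> bool:
    """Return True if a protein satisfies every (assumed) filtering criterion."""
    return (
        p.localization in EXPOSED_LOCATIONS
        and p.tm_helices <= MAX_TM_HELICES
        and p.adhesin_probability >= MIN_ADHESIN_PROBABILITY
        and not p.similar_to_human
    )

def rank_candidates(proteins: list[ProteinAnnotation]) -> list[str]:
    """Apply the filters, then order the surviving proteins by descending ML score."""
    kept = [p for p in proteins if passes_filters(p)]
    kept.sort(key=lambda p: p.ml_score, reverse=True)
    return [p.protein_id for p in kept]

proteins = [
    ProteinAnnotation("P1", "Extracellular", 0, 0.80, False, 92.0),
    ProteinAnnotation("P2", "Cytoplasmic", 0, 0.90, False, 95.0),    # fails localization
    ProteinAnnotation("P3", "OuterMembrane", 1, 0.60, False, 97.0),
    ProteinAnnotation("P4", "Extracellular", 3, 0.70, False, 99.0),  # too many TM helices
]
print(rank_candidates(proteins))  # ['P3', 'P1']
```

Filtering first and ranking second mirrors the two-part candidate selection described above; an interactive tool might instead expose each criterion separately so users can relax it.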
Subjects
Drug Design, Internet, Machine Learning, Software, Vaccines, Vaccinology/methods, Antigens, Viral/chemistry, Antigens, Viral/immunology, COVID-19/virology, COVID-19 Vaccines/chemistry, COVID-19 Vaccines/immunology, Epitopes/chemistry, Epitopes/immunology, Humans, Proteome, SARS-CoV-2/chemistry, SARS-CoV-2/immunology, SARS-CoV-2/metabolism, Spike Glycoprotein, Coronavirus/chemistry, Spike Glycoprotein, Coronavirus/immunology, Vaccines/chemistry, Vaccines/immunology, Workflow
ABSTRACT
MOTIVATION: Reverse vaccinology (RV) is a milestone in rational vaccine design, and machine learning (ML) has been applied to improve the accuracy of RV prediction. However, ML-based RV still faces challenges in prediction accuracy and program accessibility. RESULTS: This study presents Vaxign-ML, a supervised ML classification approach for predicting bacterial protective antigens (BPAgs). To identify the best ML method and optimize its conditions, five ML methods were tested using biological and physicochemical features extracted from well-defined training data. Nested 5-fold cross-validation and leave-one-pathogen-out validation were used to ensure an unbiased performance assessment and to evaluate the capability to predict vaccine candidates for a newly emerging pathogen. The best-performing model (eXtreme Gradient Boosting) was compared with three publicly available programs (Vaxign, VaxiJen, and Antigenic), one SVM-based method, and one epitope-based method using a high-quality benchmark dataset. Vaxign-ML showed superior performance in predicting BPAgs. Vaxign-ML is hosted on a publicly accessible web server, and a standalone version is also available. AVAILABILITY AND IMPLEMENTATION: The Vaxign-ML website is at http://www.violinet.org/vaxign/vaxign-ml, the Docker standalone Vaxign-ML is available at https://hub.docker.com/r/e4ong1031/vaxign-ml, and the source code is available at https://github.com/VIOLINet/Vaxign-ML-docker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
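The two validation schemes named above can be sketched as index generators. In this sketch, nested cross-validation uses an outer 5-fold loop for performance estimation and an inner 5-fold loop, built only from each outer training set, for hyperparameter tuning; leave-one-pathogen-out holds out every protein of one pathogen at a time. Folds are assigned round-robin without shuffling or class stratification, which is a simplifying assumption; the study's actual fold assignment is not described in the abstract.

```python
from collections import defaultdict

def kfold_indices(indices, k):
    """Split `indices` into k round-robin folds and yield (train, test) pairs."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test

def nested_cv_splits(n_samples, k_outer=5, k_inner=5):
    """Yield (outer_train, outer_test, inner_splits); inner splits use only outer_train."""
    for outer_train, outer_test in kfold_indices(list(range(n_samples)), k_outer):
        inner_splits = list(kfold_indices(outer_train, k_inner))
        yield outer_train, outer_test, inner_splits

def leave_one_pathogen_out(pathogen_labels):
    """Yield (held_out_pathogen, train_idx, test_idx), one split per distinct pathogen."""
    by_pathogen = defaultdict(list)
    for idx, name in enumerate(pathogen_labels):
        by_pathogen[name].append(idx)
    for held_out in sorted(by_pathogen):
        test_idx = by_pathogen[held_out]
        train_idx = [i for i, name in enumerate(pathogen_labels) if name != held_out]
        yield held_out, train_idx, test_idx

labels = ["A", "A", "B", "C", "B"]  # toy pathogen label for each protein
for split in leave_one_pathogen_out(labels):
    print(split)
# ('A', [2, 3, 4], [0, 1])
# ('B', [0, 1, 3], [2, 4])
# ('C', [0, 1, 2, 4], [3])
```

Leave-one-pathogen-out approximates the emerging-pathogen scenario because the model never sees a protein from the held-out organism during training.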
Subjects
Antigens, Bacterial, Vaccinology, Computational Biology, Machine Learning, Software, Supervised Machine Learning
ABSTRACT
To ultimately combat the emerging COVID-19 pandemic, an effective and safe vaccine is needed against this highly contagious disease caused by the SARS-CoV-2 coronavirus. Our survey of the literature and clinical trials showed that the whole virus, as well as the spike (S), nucleocapsid (N), and membrane (M) proteins, have been tested for vaccine development against SARS and MERS. However, these vaccine candidates may fail to induce complete protection and raise safety concerns. We therefore applied the Vaxign and the newly developed machine learning-based Vaxign-ML reverse vaccinology tools to predict COVID-19 vaccine candidates. Our Vaxign analysis found that the SARS-CoV-2 N protein sequence is conserved with SARS-CoV and MERS-CoV but not with the other four human coronaviruses that cause mild symptoms. Across the entire SARS-CoV-2 proteome, six proteins, namely the S protein and five non-structural proteins (nsp3, 3CL-pro, and nsp8-10), were predicted to be adhesins, which are crucial for viral adherence and host invasion. The S, nsp3, and nsp8 proteins were also predicted by Vaxign-ML to induce high protective antigenicity. Unlike the commonly used S protein, nsp3 has not been tested in any coronavirus vaccine study, so it was selected for further investigation. The nsp3 protein was found to be more conserved among SARS-CoV-2, SARS-CoV, and MERS-CoV than among 15 coronaviruses infecting humans and other animals. The protein was also predicted to contain promiscuous MHC-I and MHC-II T-cell epitopes, and its predicted linear B-cell epitopes were found to be localized on the protein surface. Our predicted vaccine targets have the potential to support effective and safe COVID-19 vaccine development. We also propose that an "Sp/Nsp cocktail vaccine" containing one or more structural proteins (Sp) and one or more non-structural proteins (Nsp) would stimulate effective, complementary immune responses.
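A T-cell epitope is typically called promiscuous when it is predicted to bind several MHC alleles. The sketch below enumerates candidate peptides with a sliding window and flags those whose predicted percentile rank passes a cutoff for a minimum number of alleles. The binding predictions are taken as input, as if produced by an external MHC-binding predictor; the 9-residue window, the 1.0 percentile cutoff, the three-allele minimum, and the toy peptides are assumptions for illustration, not the settings or results of the study.

```python
def sliding_peptides(sequence, length=9):
    """Return every contiguous peptide of `length` residues in `sequence`."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]

def promiscuous_epitopes(predictions, rank_cutoff=1.0, min_alleles=3):
    """
    predictions: {peptide: {allele: percentile_rank}}, where a lower rank means
    stronger predicted binding. Returns {peptide: sorted binding alleles} for
    peptides predicted to bind at least `min_alleles` alleles.
    """
    result = {}
    for peptide, ranks in predictions.items():
        binders = sorted(allele for allele, rank in ranks.items() if rank <= rank_cutoff)
        if len(binders) >= min_alleles:
            result[peptide] = binders
    return result

# Toy input: invented placeholder peptides and ranks, not nsp3 predictions.
toy = {
    "PEPTIDEAA": {"HLA-A*02:01": 0.4, "HLA-A*24:02": 0.8, "HLA-B*07:02": 0.9},
    "PEPTIDEBB": {"HLA-A*02:01": 0.2, "HLA-A*24:02": 5.0, "HLA-B*07:02": 12.0},
}
print(sliding_peptides("MKTAYIAKQR"))  # ['MKTAYIAKQ', 'KTAYIAKQR']
print(promiscuous_epitopes(toy))       # {'PEPTIDEAA': ['HLA-A*02:01', 'HLA-A*24:02', 'HLA-B*07:02']}
```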
Subjects
Betacoronavirus, Coronavirus Infections, Machine Learning, Pandemics, Pneumonia, Viral, Viral Vaccines, Animals, Betacoronavirus/genetics, Betacoronavirus/immunology, COVID-19, COVID-19 Vaccines, Coronavirus Infections/epidemiology, Coronavirus Infections/genetics, Coronavirus Infections/immunology, Coronavirus Infections/prevention & control, Epitopes, B-Lymphocyte/genetics, Epitopes, B-Lymphocyte/immunology, Humans, Immunogenicity, Vaccine, Middle East Respiratory Syndrome Coronavirus/genetics, Middle East Respiratory Syndrome Coronavirus/immunology, Pandemics/prevention & control, Pneumonia, Viral/epidemiology, Pneumonia, Viral/genetics, Pneumonia, Viral/immunology, Pneumonia, Viral/prevention & control, SARS-CoV-2, Viral Proteins/genetics, Viral Proteins/immunology, Viral Vaccines/genetics, Viral Vaccines/immunology
ABSTRACT
To ultimately combat the emerging COVID-19 pandemic, an effective and safe vaccine is needed against this highly contagious disease caused by the SARS-CoV-2 coronavirus. Our survey of the literature and clinical trials showed that the whole virus, as well as the spike (S), nucleocapsid (N), and membrane proteins, have been tested for vaccine development against SARS and MERS. We further used the Vaxign reverse vaccinology tool and the newly developed Vaxign-ML machine learning tool to predict COVID-19 vaccine candidates. The N protein was found to be conserved in the more pathogenic strains (SARS/MERS/COVID-19) but not in the other human coronaviruses, which mostly cause mild symptoms. Across the entire SARS-CoV-2 proteome, six proteins, namely the S protein and five non-structural proteins (nsp3, 3CL-pro, and nsp8-10), were predicted to be adhesins, which are crucial for viral adherence and host invasion. The S, nsp3, and nsp8 proteins were also predicted by Vaxign-ML to induce high protective antigenicity. Unlike the commonly used S protein, nsp3 has not been tested in any coronavirus vaccine study, so it was selected for further investigation. The nsp3 protein was found to be more conserved among SARS-CoV-2, SARS-CoV, and MERS-CoV than among 15 coronaviruses infecting humans and other animals. The protein was also predicted to contain promiscuous MHC-I and MHC-II T-cell epitopes, as well as linear B-cell epitopes localized in specific regions and functional domains of the protein. Our predicted vaccine targets provide new strategies for effective and safe COVID-19 vaccine development.
ABSTRACT
With many protective vaccine antigens reported in the literature and verified experimentally, it remains unclear how the knowledge mined from these antigens can be used to support rational vaccine design and to study the underlying design mechanisms. To address this problem, a systematic bioinformatics analysis was performed on 291 Gram-positive and Gram-negative bacterial protective antigens with experimental evidence, manually curated in the Protegen database. The analyses evaluated subcellular localization, adhesin probability, peptide signaling, transmembrane α-helices and β-barrels, conserved domains, Clusters of Orthologous Groups, and Gene Ontology functional annotations. We showed the critical role of adhesins, along with subcellular localization and peptide signaling, in predicting secreted extracellular or surface-exposed protective antigens, with mechanistic explanations supported by functional analysis. We also found a significant negative correlation between transmembrane α-helices and antigen protectiveness in both Gram-positive and Gram-negative pathogens, whereas transmembrane β-barrels were positively correlated with protectiveness in Gram-negative pathogens. Cytoplasmic and cytoplasmic membrane proteins, which are less commonly considered, could potentially be predicted with the help of other selection criteria such as adhesin probability and functional analysis. These findings can support rational vaccine design and enhance our understanding of vaccine design mechanisms.
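One common way to test whether a feature such as a transmembrane α-helix is associated with protectiveness is to build a 2×2 contingency table and apply Fisher's exact test. The abstract does not state which statistical test the study used, so this is an assumed approach, and the table layout (rows: protective antigens vs. comparison proteins; columns: with vs. without the feature) is likewise illustrative. If SciPy is available, `scipy.stats.fisher_exact` also provides a two-sided Fisher's exact test.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """
    Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Returns (odds_ratio, p_value). The p-value sums the probabilities of all
    tables with the same margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        table_prob(x) for x in range(lo, hi + 1)
        if table_prob(x) <= p_obs * (1 + 1e-7)  # small tolerance for float rounding
    )
    if b * c == 0:
        odds_ratio = float("nan") if a * d == 0 else float("inf")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(p_value, 1.0)

# Classic textbook table, not data from the study.
odds, p = fisher_exact_two_sided(3, 1, 1, 3)
print(odds, round(p, 4))  # 9.0 0.4857
```

With this layout, an odds ratio below 1 together with a small p-value would correspond to the kind of negative association reported for transmembrane α-helices.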
ABSTRACT
A critical issue in the use of cancer drugs is their association with various adverse events (AEs) in some, but not all, patients. The National Cancer Institute (NCI) Common Terminology Criteria for Adverse Events (CTCAE) is a controlled terminology for AE classification and analysis in cancer clinical trials. The Ontology of Adverse Events (OAE) is a community-based ontology in the AE domain. In this study, OAE was first updated to include AE severity grading and OAE-CTCAE mappings. An OAE subset containing CTCAE-related terms and their associated OAE terms was generated to facilitate term usage. A use case based on a published cancer drug clinical trial demonstrates that OAE provides a better hierarchical representation, includes semantic relations, and supports automated reasoning. As demonstrated with a single-patient analysis, the OAE framework supports precision informatics for representing AEs and related genetic and clinical conditions in individual patients treated with cancer drugs.
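To illustrate how a term hierarchy combined with severity grading supports automated queries, the sketch below attaches CTCAE grades (1 mild, 2 moderate, 3 severe, 4 life-threatening, 5 death related to the AE) to patient AE records and retrieves patients whose AE falls under a given parent term. The small is_a hierarchy and the term labels are hypothetical stand-ins, not actual OAE classes or identifiers; a real system would typically query the ontology itself with an OWL reasoner or SPARQL rather than a Python dictionary.

```python
# CTCAE severity grades (general definitions).
CTCAE_GRADES = {
    1: "mild",
    2: "moderate",
    3: "severe or medically significant",
    4: "life-threatening",
    5: "death related to AE",
}

# Hypothetical child -> parent (is_a) links; NOT the real OAE class hierarchy.
IS_A = {
    "neutropenia AE": "hematologic AE",
    "anemia AE": "hematologic AE",
    "hematologic AE": "adverse event",
    "nausea AE": "gastrointestinal AE",
    "gastrointestinal AE": "adverse event",
}

def ancestors(term):
    """Walk is_a links upward and return all ancestor terms, nearest first."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found

def patients_with(records, category, min_grade=1):
    """
    records: iterable of (patient_id, ae_term, ctcae_grade).
    Returns sorted patient IDs with an AE equal to or subsumed by `category`
    at or above `min_grade`.
    """
    return sorted({
        pid for pid, term, grade in records
        if grade >= min_grade and (term == category or category in ancestors(term))
    })

records = [
    ("pt1", "neutropenia AE", 3),
    ("pt2", "anemia AE", 2),
    ("pt3", "nausea AE", 3),
]
print(patients_with(records, "hematologic AE"))               # ['pt1', 'pt2']
print(patients_with(records, "hematologic AE", min_grade=3))  # ['pt1']
print(CTCAE_GRADES[3])                                         # severe or medically significant
```

The query on a parent term finds patients recorded only with child terms, which is the kind of inference a flat list of AE labels cannot provide without extra bookkeeping.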