DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy.
Xie, Ruopeng; Li, Jiahui; Wang, Jiawei; Dai, Wei; Leier, André; Marquez-Lago, Tatiana T; Akutsu, Tatsuya; Lithgow, Trevor; Song, Jiangning; Zhang, Yanju.
Affiliation
  • Xie R; Bioinformatics Lab at Guilin University of Electronic Technology.
  • Li J; Bioinformatics Lab at Guilin University of Electronic Technology.
  • Wang J; Biomedicine Discovery Institute and the Department of Microbiology at Monash University, Australia.
  • Dai W; School of Computer Science and Information Security, Guilin University of Electronic Technology, China.
  • Leier A; Department of Genetics and the Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham (UAB) School of Medicine, USA.
  • Marquez-Lago TT; Department of Genetics and the Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham (UAB) School of Medicine, USA.
  • Akutsu T; University of Tokyo, Japan.
  • Lithgow T; Biomedicine Discovery Institute and the Director of the Centre to Impact AMR at Monash University, Australia.
  • Song J; Group Leader in the Biomedicine Discovery Institute and the Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia.
  • Zhang Y; Leiden Institute of Advanced Computer Science, Leiden University.
Brief Bioinform; 22(3), 2021 May 20.
Article in En | MEDLINE | ID: mdl-32599617
ABSTRACT
Virulence factors (VFs) enable pathogens to infect their hosts. A wealth of individual, disease-focused studies has identified a wide variety of VFs, and the growing mass of bacterial genome sequence data provides an opportunity for computational methods aimed at predicting VFs. Despite their advantages and performance improvements, existing methods have limitations. Firstly, as the characteristics and mechanisms of VFs continually evolve with the emergence of antibiotic resistance, it is increasingly difficult to identify novel VFs using existing tools that were developed on outdated data sets. Secondly, few systematic feature engineering efforts have examined the utility of different types of features for model performance, as the majority of tools extract only very few types of features. Addressing these issues can likely improve the accuracy of VF predictors significantly, which, in turn, would be particularly useful for genome-wide predictions of VFs. In this work, we present a deep learning (DL)-based hybrid framework (termed DeepVF) that uses the stacking strategy to achieve more accurate identification of VFs. Using an enlarged, up-to-date dataset, DeepVF comprehensively explores a wide range of heterogeneous features with popular machine learning algorithms. Specifically, four classical algorithms (random forest, support vector machines, extreme gradient boosting and multilayer perceptron) and three DL algorithms (convolutional neural networks, long short-term memory networks and deep neural networks) are employed to train 62 baseline models using these features. To integrate their individual strengths, DeepVF combines these baseline models to construct the final meta model using the stacking strategy. Extensive benchmarking experiments demonstrate the effectiveness of DeepVF: it achieves a more accurate and stable performance than the baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test. Using the proposed hybrid ensemble model, a user-friendly online predictor of DeepVF (http://deepvf.erc.monash.edu/) is implemented. Furthermore, its utility, from the user's viewpoint, is compared with that of existing toolkits. We believe that DeepVF will be exploited as a useful tool for screening and identifying potential VFs from protein-coding gene sequences in bacterial genomes.
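The stacking strategy described in the abstract can be illustrated with a minimal sketch. This is not the DeepVF implementation: the toy nearest-centroid baselines stand in for the random forest, SVM, XGBoost, MLP, CNN, LSTM and DNN models, the feature subsets stand in for the heterogeneous feature encodings, and the out-of-fold scheme and logistic-regression meta model are common choices assumed here for illustration. All function names and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_centroid(X, y, cols):
    """Toy baseline model: class centroids computed on one feature subset."""
    def mean(rows):
        return [sum(r[c] for r in rows) / len(rows) for c in cols]
    pos = mean([x for x, t in zip(X, y) if t == 1])
    neg = mean([x for x, t in zip(X, y) if t == 0])

    def predict(x):
        d_pos = sum((x[c] - p) ** 2 for c, p in zip(cols, pos))
        d_neg = sum((x[c] - n) ** 2 for c, n in zip(cols, neg))
        # Closer to the positive centroid gives a score nearer 1.
        return sigmoid(d_neg - d_pos)
    return predict


def out_of_fold_scores(X, y, feature_subsets, k=5):
    """One row of baseline scores per sample, each from a model that never saw it."""
    n = len(X)
    meta_X = [[0.0] * len(feature_subsets) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, cols in enumerate(feature_subsets):
            model = fit_centroid(X_tr, y_tr, cols)
            for i in held_out:
                meta_X[i][j] = model(X[i])
    return meta_X


def fit_logistic(M, y, lr=0.5, epochs=300):
    """Meta model: logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(M[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, t in zip(M, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - t
            for j, xi in enumerate(row):
                gw[j] += err * xi
            gb += err
        w = [wi - lr * g / len(M) for wi, g in zip(w, gw)]
        b -= lr * gb / len(M)
    return w, b


def fit_stack(X, y, feature_subsets):
    # Level 1: out-of-fold baseline scores become the meta model's features.
    w, b = fit_logistic(out_of_fold_scores(X, y, feature_subsets), y)
    # Refit every baseline on all training data for use at prediction time.
    bases = [fit_centroid(X, y, cols) for cols in feature_subsets]

    def predict_proba(x):
        scores = [m(x) for m in bases]
        return sigmoid(sum(wi * s for wi, s in zip(w, scores)) + b)
    return predict_proba


def make_toy_data(n=200, seed=0):
    # Synthetic two-class data; NOT virulence factor features.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 1.5 if label else -1.5
        X.append([rng.gauss(shift, 1.0) for _ in range(4)])
        y.append(label)
    return X, y


X, y = make_toy_data()
stack = fit_stack(X, y, feature_subsets=[[0, 1], [2, 3], [0, 1, 2, 3]])
X_new, y_new = make_toy_data(n=100, seed=1)
accuracy = sum((stack(x) >= 0.5) == bool(t) for x, t in zip(X_new, y_new)) / len(y_new)
print(f"held-out accuracy of the stacked model: {accuracy:.2f}")
```

Training the meta model on out-of-fold scores, and not on scores from baselines that have already seen the same samples, keeps it from simply trusting whichever baseline overfits the most.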

Full text: 1 Databases: MEDLINE Main subject: Bacteria / Bacterial Proteins / Genome, Bacterial / Databases, Protein / Virulence Factors / Deep Learning Type of study: Prognostic_studies Language: En Journal: Brief Bioinform Journal subject: BIOLOGY / MEDICAL INFORMATICS Year of publication: 2021 Document type: Article