Highly accurate classification and discovery of microbial protein-coding gene functions using FunGeneTyper: an extensible deep learning framework.
Zhang, Guoqing; Wang, Hui; Zhang, Zhiguo; Zhang, Lu; Guo, Guibing; Yang, Jian; Yuan, Fajie; Ju, Feng.
  • Zhang G; College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.
  • Wang H; Key Laboratory of Coastal Environment and Resources of Zhejiang Province, School of Engineering, Westlake University, Hangzhou, Zhejiang 310030, China.
  • Zhang Z; Center of Synthetic Biology and Integrated Bioengineering, Westlake University, Hangzhou, Zhejiang 310030, China.
  • Zhang L; Representation Learning Laboratory, School of Engineering, Westlake University, Hangzhou, Zhejiang 310030, China.
  • Guo G; Key Laboratory of Coastal Environment and Resources of Zhejiang Province, School of Engineering, Westlake University, Hangzhou, Zhejiang 310030, China.
  • Yang J; Key Laboratory of Coastal Environment and Resources of Zhejiang Province, School of Engineering, Westlake University, Hangzhou, Zhejiang 310030, China.
  • Yuan F; Software College, Northeastern University, Shenyang, Liaoning 110169, China.
  • Ju F; Westlake Laboratory of Life Sciences and Biomedicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China.
Brief Bioinform; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-39007592
ABSTRACT
High-throughput DNA sequencing technologies decode vast numbers of microbial protein-coding gene sequences. However, accurately assigning protein functions to novel gene sequences remains a challenge. To address this, we developed FunGeneTyper, an extensible framework comprising two new deep learning models (FunTrans and FunRep), structured databases, and supporting resources, which achieves highly accurate (accuracy > 0.99, F1-score > 0.97) and fine-grained classification of antibiotic resistance genes (ARGs) and virulence factor genes. Using as the test set an experimentally confirmed dataset of ARGs that includes remote homologous sequences, our framework achieves by far the best performance in discovering new ARGs from human gut (F1-score 0.6948), wastewater (0.6072), and soil (0.5445) microbiomes, outperforming state-of-the-art bioinformatics tools as well as sequence alignment-based (F1-score 0.0556-0.5065) and domain-based (F1-score 0.2630-0.5224) annotation approaches. Furthermore, the framework is implemented as a lightweight, privacy-preserving, plug-and-play neural network module, which makes it versatile and accessible to developers and users worldwide. We anticipate widespread use of FunGeneTyper (https://github.com/emblab-westlake/FunGeneTyper) for precise classification of protein-coding gene functions and for the discovery of numerous valuable enzymes, with significant impact on fields including microbiome research, biotechnology, metagenomics, and bioinformatics.
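The accuracy and F1-scores quoted above summarize how well predicted labels agree with reference labels. As a minimal sketch, assuming a simple binary setting in which each test sequence is labelled either as a confirmed ARG (1) or not (0), both metrics can be computed as follows. The function name `binary_metrics` and the toy labels are hypothetical illustrations, not part of FunGeneTyper, and the paper's exact evaluation protocol (for example, averaging over multiple ARG classes) may differ.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for binary labels, where 1 = positive.

    Hypothetical helper for illustration only; not part of FunGeneTyper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when there are no positive predictions or labels.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical toy labels: 3 true ARGs among 8 sequences.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

The sketch also shows why accuracy can stay high while F1 is lower: true negatives raise accuracy but do not enter the F1 computation, so on test sets dominated by non-target sequences F1 is the more demanding measure.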

Full text: 1 | Database: MEDLINE | Main subject: Deep Learning | Limits: Humans | Language: English | Year: 2024 | Document type: Article