A novel antibacterial peptide recognition algorithm based on BERT.
Zhang, Yue; Lin, Jianyuan; Zhao, Lianmin; Zeng, Xiangxiang; Liu, Xiangrong.
Affiliation
  • Zhang Y; Xiamen University, Xiamen 361005, China.
  • Lin J; Xiamen University, Xiamen 361005, China.
  • Zhao L; Xiamen University, Xiamen 361005, China.
  • Zeng X; Hunan University, Xiamen 361005, China.
  • Liu X; Xiamen University, Xiamen 361005, China.
Brief Bioinform; 22(6), 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34037687
ABSTRACT
As the most promising substitutes for antibiotics, antimicrobial peptides (AMPs) are of great research significance. Because experimental identification of AMPs is costly and difficult, a growing body of work focuses on computational methods for this problem. Most existing methods can identify AMPs from the sequence alone, but there is still room to improve recognition accuracy, and the resulting models often fail to generalize across datasets. Pre-training strategies have been applied to many natural language processing (NLP) tasks with encouraging results, and they hold great promise for AMP recognition and prediction. In this paper, we apply a pre-training strategy to the training of an AMP classifier and propose a novel recognition algorithm. Our model is built on BERT, pre-trained on protein data from UniProt, and then fine-tuned and evaluated on six markedly different AMP datasets. It outperforms existing methods and achieves accurate identification even on datasets with small sample sizes. We try different word segmentation methods for peptide chains and demonstrate the influence of pre-training steps and dataset balancing on recognition performance. We find that pre-training on a large amount of diverse AMP data, followed by fine-tuning on new data, helps the model capture both the new data's specific features and the features common to AMP sequences. Finally, we construct a new AMP dataset, on which we train a general AMP recognition model.
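The abstract mentions comparing different word segmentation methods for peptide chains before feeding them to the BERT-style model. A minimal sketch of two common segmentation schemes, overlapping k-mer "words" versus single-residue tokens, is shown below; the function names and the example sequence are illustrative, not taken from the paper.

```python
# Hypothetical sketch of peptide-chain segmentation strategies of the kind
# the abstract compares. Names and parameters here are illustrative only.

def kmer_tokenize(sequence: str, k: int = 3, stride: int = 1) -> list:
    """Split an amino-acid sequence into overlapping k-mer 'words'."""
    sequence = sequence.upper()
    if len(sequence) < k:
        return [sequence]
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]

def char_tokenize(sequence: str) -> list:
    """Treat each residue as its own token (character-level segmentation)."""
    return list(sequence.upper())

# Example with an arbitrary peptide fragment
peptide = "GIGKFLHSAK"
print(kmer_tokenize(peptide, k=3))
# ['GIG', 'IGK', 'GKF', 'KFL', 'FLH', 'LHS', 'HSA', 'SAK']
print(char_tokenize(peptide))
# ['G', 'I', 'G', 'K', 'F', 'L', 'H', 'S', 'A', 'K']
```

The resulting token lists would then be mapped to vocabulary IDs and passed to the transformer; k-mer segmentation yields a larger vocabulary but shorter input sequences than character-level tokenization.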
Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Algorithms / Natural Language Processing / Software / Computational Biology / Antimicrobial Peptides Study type: Prognostic_studies Language: En Journal: Brief Bioinform Journal subject: BIOLOGY / MEDICAL INFORMATICS Year: 2021 Document type: Article Country of affiliation: China