MHCRoBERTa: pan-specific peptide-MHC class I binding prediction through transfer learning with label-agnostic protein sequences.

Wang, Fuxu; Wang, Haoyan; Wang, Lizhuang; Lu, Haoyu; Qiu, Shizheng; Zang, Tianyi; Zhang, Xinjun; Hu, Yang

Affiliation

Wang F; Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
Wang H; Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
Wang L; General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China.
Lu H; Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
Qiu S; Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
Zang T; Cisco Research, NLP team, California, United States.
Zhang X; Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
Hu Y; Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.

Brief Bioinform; 23(3), 2022 May 13.

Article in English | MEDLINE | ID: mdl-35443027

ABSTRACT

Predicting the binding between peptides and the major histocompatibility complex (MHC) plays a vital role in cancer immunotherapy. The success of AlphaFold in applying natural language processing (NLP) algorithms to protein secondary structure prediction has inspired us to explore the potential of NLP methods for predicting peptide-MHC class I binding. With this motivation, we propose MHCRoBERTa, a method based on the RoBERTa pre-training approach, for predicting the binding affinity between MHC class I molecules and peptides. Analysis of the results on a benchmark dataset demonstrates that MHCRoBERTa outperforms other state-of-the-art prediction methods, with an increase in the Spearman rank correlation coefficient (SRCC). Notably, our model yielded a significant improvement on IC50 values. Our method achieved an SRCC of 0.785 and an AUC of 0.817. Our SRCC is 14.3% higher than that of NetMHCpan3.0 (the second-highest SRCC among pan-specific methods) and 3% higher than that of MHCflurry (the second-highest SRCC among all methods). The AUC is also better than that of any other pan-specific method. Moreover, we visualize the multi-head self-attention for the token representations across the layers and heads of the model. By analyzing the representation in each layer and head, we can show whether the model has learned the syntax and semantics necessary to perform the prediction task well. All these results demonstrate that our model can accurately predict peptide-MHC class I binding affinity and that MHCRoBERTa is a powerful tool for screening potential neoantigens for cancer immunotherapy. MHCRoBERTa is available as open-source software on GitHub (https://github.com/FuxuWang/MHCRoBERTa).
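The two metrics reported in the abstract, SRCC and AUC, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the IC50 values below are made-up toy numbers, the helper names are hypothetical, and the 500 nM cutoff used to label binders is an assumed convention rather than a figure taken from the abstract.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney U statistic; higher score means 'more positive'."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy data (hypothetical): measured and predicted IC50 values in nM.
measured_ic50 = [20.0, 150.0, 480.0, 900.0, 5000.0, 30000.0]
predicted_ic50 = [35.0, 100.0, 700.0, 600.0, 8000.0, 20000.0]

# Assumed convention: peptides with measured IC50 below 500 nM count as binders.
BINDER_THRESHOLD_NM = 500.0
labels = [1 if v < BINDER_THRESHOLD_NM else 0 for v in measured_ic50]

srcc = spearman(measured_ic50, predicted_ic50)
# A lower predicted IC50 means stronger binding, so negate it to get a score.
auc = roc_auc(labels, [-v for v in predicted_ic50])

print(f"SRCC = {srcc:.3f}, AUC = {auc:.3f}")
```

SRCC measures how well the predicted ordering of affinities matches the measured ordering, while AUC measures how well the predictions separate binders from non-binders, which is why the two are reported side by side.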

Subjects

Histocompatibility Antigens Class I; Peptides; Algorithms; Amino Acid Sequence; Histocompatibility Antigens Class I/metabolism; Machine Learning; Peptides/metabolism; Protein Binding

Keywords

major histocompatibility complex (MHC); multi-head self-attention; natural language processing (NLP); peptide

Full text: 1 | Database: MEDLINE | Main subject: Peptides / Histocompatibility Antigens Class I | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year of publication: 2022 | Document type: Article | Country of affiliation: China