PhosBERT: A self-supervised learning model for identifying phosphorylation sites in SARS-CoV-2-infected human cells.
Li, Yong; Gao, Ru; Liu, Shan; Zhang, Hongqi; Lv, Hao; Lai, Hongyan.
Affiliation
  • Li Y; Sichuan Vocational College of Health and Rehabilitation, Zigong 643000, Sichuan, China.
  • Gao R; The People's Hospital of Ya'an, Ya'an 625000, Sichuan, China; The People's Hospital of Wenjiang Chengdu, Chengdu 611130, Sichuan, China.
  • Liu S; The People's Hospital of Wenjiang Chengdu, Chengdu 611130, Sichuan, China.
  • Zhang H; Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China.
  • Lv H; Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China. Electronic address: hao.lyu@uestc.edu.cn.
  • Lai H; Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China. Electronic address: laihy@cqupt.edu.cn.
Methods; 230: 140-146, 2024 Aug 22.
Article in En | MEDLINE | ID: mdl-39179191
ABSTRACT
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single-stranded RNA virus that mainly causes respiratory and enteric diseases and is responsible for the outbreak of coronavirus disease 2019 (COVID-19). Numerous studies have demonstrated that SARS-CoV-2 infection leads to significant dysregulation of the protein post-translational modification profile in human cells. Accurate recognition of phosphorylation sites in host cells will contribute to a deeper understanding of the pathogenic mechanisms of SARS-CoV-2 and help to screen drugs and compounds with antiviral potential. Therefore, there is a need to develop cost-effective and high-precision computational strategies for specifically identifying phosphorylation sites in SARS-CoV-2-infected cells. In this work, we first implemented a custom neural network model (named PhosBERT) on the basis of the pre-trained protein language model ProtBert, a self-supervised learning approach built on the Bidirectional Encoder Representations from Transformers (BERT) architecture. PhosBERT was then trained and validated with 5-fold cross-validation on a serine (S) and threonine (T) phosphorylation dataset and on a tyrosine (Y) phosphorylation dataset, respectively. Independent validation results showed that PhosBERT could identify S/T phosphorylation sites with an accuracy of 81.9% and an AUC (area under the receiver operating characteristic curve) of 0.896. For Y phosphorylation sites, the prediction accuracy and AUC reached 87.1% and 0.902. These results indicate that the proposed model has good predictive ability and stability and could provide a new approach for studying phosphorylation sites in SARS-CoV-2-infected cells.
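The evaluation protocol named in the abstract (5-fold cross-validation, accuracy, AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores below are assumptions made purely for illustration, and the model itself (PhosBERT/ProtBert) is not reproduced here.

```python
import random
from typing import List, Sequence


def stratified_k_fold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, keeping the class ratio roughly equal."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = phosphorylation site, 0 = non-site.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2, 0.55, 0.45]

folds = stratified_k_fold(toy_labels, k=5)
for fold_id, held_out in enumerate(folds):
    # In a real run, a model would be trained on the other four folds here.
    print(f"fold {fold_id}: held-out indices {sorted(held_out)}")

print("accuracy:", accuracy(toy_labels, toy_scores))
print("AUC:", roc_auc(toy_labels, toy_scores))
```

The pairwise AUC above is quadratic in the number of samples, which is fine for a sketch. A rank-based formulation would be the usual choice for large datasets.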

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Methods Journal subject: BIOQUIMICA Year: 2024 Type: Article Affiliation country: China