Fast and accurate protein intrinsic disorder prediction by using a pretrained language model.
Song, Yidong; Yuan, Qianmu; Chen, Sheng; Chen, Ken; Zhou, Yaoqi; Yang, Yuedong.
Affiliation
  • Song Y; School of Computer Science and Engineering at Sun Yat-sen University, Guangzhou 510000, China.
  • Yuan Q; School of Computer Science and Engineering at Sun Yat-sen University, Guangzhou 510000, China.
  • Chen S; School of Computer Science and Engineering at Sun Yat-sen University, Guangzhou 510000, China.
  • Chen K; School of Computer Science and Engineering at Sun Yat-sen University, Guangzhou 510000, China.
  • Zhou Y; Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, China.
  • Yang Y; School of Computer Science and Engineering at Sun Yat-sen University, Guangzhou 510000, China.
Brief Bioinform; 24(4), 2023 Jul 20.
Article in En | MEDLINE | ID: mdl-37204193
ABSTRACT
Determining intrinsically disordered regions of proteins is essential for elucidating protein biological functions and the mechanisms of their associated diseases. As the gap between the number of experimentally determined protein structures and the number of protein sequences continues to grow exponentially, there is a need for an accurate and computationally efficient disorder predictor. However, current single-sequence-based methods are of low accuracy, while evolutionary profile-based methods are computationally intensive. Here, we proposed LMDisorder, a fast and accurate protein disorder predictor that employed embeddings generated by unsupervised pretrained language models as features. We showed that LMDisorder performs best among all single-sequence-based methods and is comparable to or better than another language-model-based technique on four independent test sets. Furthermore, LMDisorder showed equivalent or even better performance than the state-of-the-art profile-based technique SPOT-Disorder2. In addition, the high computational efficiency of LMDisorder enabled a proteome-scale analysis of the human proteome, showing that proteins with high predicted disorder content were associated with specific biological functions. The datasets, the source code, and the trained model are available at https://github.com/biomed-AI/LMDisorder.
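The abstract refers to a protein's "predicted disorder content" without defining it. A common reading is the fraction of residues predicted to be disordered. The sketch below illustrates that reading only; it is not taken from the LMDisorder code. The function name `disorder_content`, the 0.5 probability cutoff, and the example scores are all assumptions made for illustration.

```python
from typing import Sequence


def disorder_content(probs: Sequence[float], threshold: float = 0.5) -> float:
    """Return the fraction of residues with disorder probability >= threshold.

    `probs` holds one predicted disorder probability per residue.
    The 0.5 default cutoff is an illustrative assumption, not a value
    reported in the abstract.
    """
    if len(probs) == 0:
        raise ValueError("probs must contain at least one residue score")
    disordered = sum(1 for p in probs if p >= threshold)
    return disordered / len(probs)


# Hypothetical per-residue scores for a five-residue peptide.
example_scores = [0.9, 0.8, 0.2, 0.1, 0.7]
content = disorder_content(example_scores)
print(f"disorder content: {content:.2f}")
```

With per-residue scores from any predictor, the same function could be applied to every protein in a proteome to rank proteins by disorder content, which is the kind of analysis the abstract describes.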
Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Software / Proteome Type of study: Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Journal: Brief Bioinform Journal subject: BIOLOGY / MEDICAL INFORMATICS Year of publication: 2023 Document type: Article Country of affiliation: China