PLMC: Language Model of Protein Sequences Enhances Protein Crystallization Prediction.
Xiong, Dapeng; U, Kaicheng; Sun, Jianfeng; Cribbs, Adam P.
Affiliation
  • Xiong D; Department of Computational Biology, Cornell University, Ithaca, 14853, USA. dx38@cornell.edu.
  • U K; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, USA. dx38@cornell.edu.
  • Sun J; Department of Computational Biology, Cornell University, Ithaca, 14853, USA.
  • Cribbs AP; Botnar Research Centre, University of Oxford, Oxford, OX3 7LD, UK. jianfeng.sun@ndorms.ox.ac.uk.
Interdiscip Sci; 2024 Aug 19.
Article in En | MEDLINE | ID: mdl-39155325
ABSTRACT
X-ray diffraction crystallography is the most widely used method for determining protein three-dimensional (3D) structures, and it requires that the target protein be crystallizable. Protein crystallization, however, involves several successive procedures, including protein material production, purification, and crystal production, each of which affects the outcome. Because this multi-stage process is expensive and laborious, various computational tools have been developed to predict protein crystallization propensity and thereby guide experimental determination. In this study, we present a novel deep learning framework, PLMC, which improves multi-stage protein crystallization propensity prediction by leveraging a pre-trained protein language model. To train PLMC effectively, two groups of features for each protein were integrated into a more comprehensive representation: protein language embeddings derived from a large-scale protein sequence database, and a handcrafted feature set consisting of physicochemical, sequence-based, and disorder-related information. The two groups were embedded separately for refinement and then concatenated for the final prediction. Our extensive benchmarking tests show that PLMC greatly outperforms other state-of-the-art methods, achieving AUC scores of 0.773, 0.893, and 0.913 at the three aforementioned individual stages, respectively, and 0.982 at the final crystallization stage. PLMC also performs better at predicting the crystallization of both globular and membrane proteins, with an AUC score of 0.991 for the latter. These results suggest that PLMC has significant potential to assist researchers with the experimental design of crystallizable protein variants.
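The abstract describes two feature groups that are embedded separately and then concatenated before the final prediction. The sketch below illustrates that general two-branch pattern in plain Python. It is not the PLMC implementation: the layer sizes, the single ReLU layer per branch, the sigmoid output, the random weights, and every name (`dense`, `init_layer`, `predict_propensity`, `PLM_DIM`, `HAND_DIM`, `HIDDEN`) are assumptions made only for illustration.

```python
import math
import random


def dense(x, weights, bias):
    """One fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_propensity(plm_embedding, handcrafted, params):
    """Embed each feature group in its own branch, concatenate, then score."""
    h_plm = dense(plm_embedding, *params["plm"])    # language-model branch
    h_hand = dense(handcrafted, *params["hand"])    # handcrafted-feature branch
    fused = h_plm + h_hand                          # list concatenation
    w_out, b_out = params["out"]
    logit = sum(w * h for w, h in zip(w_out, fused)) + b_out
    return sigmoid(logit)                           # value in (0, 1)


# Hypothetical dimensions; real language-model embeddings are far larger.
PLM_DIM, HAND_DIM, HIDDEN = 16, 8, 4

rng = random.Random(0)
params = {
    "plm": init_layer(rng, PLM_DIM, HIDDEN),
    "hand": init_layer(rng, HAND_DIM, HIDDEN),
    "out": ([rng.uniform(-0.1, 0.1) for _ in range(2 * HIDDEN)], 0.0),
}

# Random stand-ins for one protein's two feature vectors.
plm_embedding = [rng.gauss(0.0, 1.0) for _ in range(PLM_DIM)]
handcrafted = [rng.gauss(0.0, 1.0) for _ in range(HAND_DIM)]

score = predict_propensity(plm_embedding, handcrafted, params)
print(f"illustrative propensity score: {score:.3f}")
```

Keeping the branches separate until the concatenation step lets each feature group be projected to a comparable size before fusion, so that a high-dimensional embedding does not simply swamp a short handcrafted vector. Because the weights here are random and untrained, the printed score carries no biological meaning.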

Full text: 1 Databases: MEDLINE Language: En Journal: Interdiscip Sci Journal subject: BIOLOGY Year of publication: 2024 Document type: Article Country of affiliation: United States
