Democratizing Protein Language Models with Parameter-Efficient Fine-Tuning.
Sledzieski, Samuel; Kshirsagar, Meghana; Baek, Minkyung; Berger, Bonnie; Dodhia, Rahul; Ferres, Juan Lavista.
Affiliations
  • Sledzieski S; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge MA 02139, USA.
  • Kshirsagar M; AI for Good Research Lab, Microsoft Corporation, Redmond WA 98052, USA.
  • Baek M; Department of Biological Sciences, Seoul National University, Seoul 08826, South Korea.
  • Berger B; Department of Mathematics, Massachusetts Institute of Technology, Cambridge MA 02139, USA.
  • Dodhia R; AI for Good Research Lab, Microsoft Corporation, Redmond WA 98052, USA.
  • Ferres JL; AI for Good Research Lab, Microsoft Corporation, Redmond WA 98052, USA.
bioRxiv ; 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37986761
Proteomics has been revolutionized by large pre-trained protein language models, which learn unsupervised representations from large corpora of sequences. The parameters of these models are then fine-tuned in a supervised setting to tailor the model to a specific downstream task. However, as model size increases, the computational and memory footprint of fine-tuning becomes a barrier for many research groups. In the field of natural language processing, which has seen a similar explosion in model size, these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we bring parameter-efficient fine-tuning methods to proteomics. Using the parameter-efficient method LoRA, we train new models for two important proteomic tasks: predicting protein-protein interactions (PPI) and predicting the symmetry of homooligomers. We show that for homooligomer symmetry prediction, these approaches achieve performance competitive with traditional fine-tuning while requiring reduced memory and using three orders of magnitude fewer parameters. On the PPI prediction task, we surprisingly find that PEFT models actually outperform traditional fine-tuning while using two orders of magnitude fewer parameters. Here, we go even further to show that freezing the parameters of the language model and training only a classification head also outperforms fine-tuning, using five orders of magnitude fewer parameters, and that both of these models outperform state-of-the-art PPI prediction methods with substantially reduced compute. We also demonstrate that PEFT is robust to variations in training hyper-parameters, and elucidate where best practices for PEFT in proteomics differ from those in natural language processing. Thus, we provide a blueprint to democratize the power of protein language model tuning to groups with limited computational resources.
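For readers unfamiliar with the approach described in the abstract, the sketch below shows how LoRA adapters might be attached to a pre-trained protein language model for a sequence-classification task. It is a minimal illustration, not the authors' code: the ESM-2 checkpoint name, the Hugging Face transformers/peft APIs, and the hyper-parameter values (rank, alpha, dropout, target modules) are assumptions chosen for demonstration only.

    # Minimal LoRA fine-tuning sketch for a protein language model (assumptions:
    # Hugging Face `transformers` + `peft`, public ESM-2 checkpoint).
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    from peft import LoraConfig, get_peft_model, TaskType

    model_name = "facebook/esm2_t12_35M_UR50D"  # small ESM-2 checkpoint (assumption)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # LoRA injects trainable low-rank matrices into the attention projections;
    # the base model weights stay frozen, so only the adapters and the
    # classification head are updated during training.
    lora_config = LoraConfig(
        task_type=TaskType.SEQ_CLS,
        r=8,                                # low-rank dimension (illustrative value)
        lora_alpha=16,
        lora_dropout=0.1,
        target_modules=["query", "value"],  # ESM attention projection names (assumption)
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()      # trainable parameters are a small fraction of the total

    # Example forward pass on a toy amino-acid sequence
    inputs = tokenizer("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", return_tensors="pt")
    logits = model(**inputs).logits

Training only a classification head on top of a frozen language model, as also discussed in the abstract, corresponds to skipping the LoRA step entirely and freezing every parameter except the classifier before training.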

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: bioRxiv Year: 2023 Document type: Article Country of affiliation: United States
