Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning.
Zhou, Ziyi; Zhang, Liang; Yu, Yuanxi; Wu, Banghao; Li, Mingchen; Hong, Liang; Tan, Pan.
Affiliations
  • Zhou Z; School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China.
  • Zhang L; Shanghai National Center for Applied Mathematics (SJTU Center) & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China.
  • Yu Y; School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China.
  • Wu B; School of Physics and Astronomy, Shanghai Jiao Tong University, Shanghai, 200240, China.
  • Li M; School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
  • Hong L; Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China.
  • Tan P; School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China.
Nat Commun; 15(1): 5566, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956442
ABSTRACT
Accurately modeling the protein fitness landscapes holds great importance for protein engineering. Pre-trained protein language models have achieved state-of-the-art performance in predicting protein fitness without wet-lab experimental data, but their accuracy and interpretability remain limited. On the other hand, traditional supervised deep learning models require abundant labeled training examples for performance improvements, posing a practical barrier. In this work, we introduce FSFP, a training strategy that can effectively optimize protein language models under extreme data scarcity for fitness prediction. By combining meta-transfer learning, learning to rank, and parameter-efficient fine-tuning, FSFP can significantly boost the performance of various protein language models using merely tens of labeled single-site mutants from the target protein. In silico benchmarks across 87 deep mutational scanning datasets demonstrate FSFP's superiority over both unsupervised and supervised baselines. Furthermore, we successfully apply FSFP to engineer the Phi29 DNA polymerase through wet-lab experiments, achieving a 25% increase in the positive rate. These results underscore the potential of our approach in aiding AI-guided protein engineering.
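The learning-to-rank component named in the abstract can be illustrated with a small sketch. The abstract does not say which ranking loss FSFP uses, so the listwise ListMLE loss below is an assumption chosen for illustration, and the function name `listmle_loss` is hypothetical. The idea is to train on the order of mutant fitness values rather than on their absolute magnitudes: the loss is the negative log-likelihood, under a Plackett-Luce model, of the ordering given by the measured labels.

```python
import math


def listmle_loss(scores, labels):
    """ListMLE loss for one list of items (illustrative, not the paper's code).

    scores: model outputs, one per mutant (higher = predicted fitter).
    labels: measured fitness values used only to define the target order.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Arrange the predicted scores in the order of measured fitness, best first.
    order = sorted(range(len(labels)), key=lambda i: labels[i], reverse=True)
    ordered = [scores[i] for i in order]

    loss = 0.0
    for i in range(len(ordered)):
        tail = ordered[i:]
        # Numerically stable log-sum-exp over the items not yet placed.
        m = max(tail)
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        # Negative log-probability that item i is picked first among the tail.
        loss += log_norm - ordered[i]
    return loss


if __name__ == "__main__":
    fitness = [0.9, 0.5, 0.1]  # made-up labels for three mutants
    concordant = listmle_loss([3.0, 2.0, 1.0], fitness)
    reversed_order = listmle_loss([1.0, 2.0, 3.0], fitness)
    print(concordant < reversed_order)
```

Scores that rank the mutants in the same order as the measured fitness give a lower loss than scores that rank them in reverse, which is the property a ranking objective needs when only tens of labeled mutants are available.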
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Protein Engineering Language: En Publication year: 2024 Document type: Article
