Evolutionary Probability and Stacked Regressions Enable Data-Driven Protein Engineering with Minimized Experimental Effort.

Illig, Alexander-Maurice; Siedhoff, Niklas E; Davari, Mehdi D; Schwaneberg, Ulrich

Affiliation

Illig AM; Institute of Biotechnology, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany.
Siedhoff NE; Institute of Biotechnology, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany.
Davari MD; Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany.
Schwaneberg U; Institute of Biotechnology, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany.

J Chem Inf Model; 64(16): 6350-6360, 2024 Aug 26.

Article in English | MEDLINE | ID: mdl-39088689

ABSTRACT

Protein engineering through directed evolution and (semi)rational approaches is routinely applied to optimize protein properties for a broad range of applications in industry and academia. The multitude of possible variants, combined with limited screening throughput, hampers efficient protein engineering. Data-driven strategies have emerged as a powerful tool for modeling the protein fitness landscape, which can then be explored in silico, significantly accelerating protein engineering campaigns. However, generating a reliable model of the fitness landscape requires a certain amount of data, which often cannot be provided. Here, we introduce MERGE, a method that combines direct coupling analysis (DCA) and machine learning (ML). MERGE enables data-driven protein engineering when only limited training data are available, typically 50 to 500 labeled sequences. Our method demonstrates remarkable performance in predicting a protein's fitness value and rank from its sequence across diverse proteins and properties. Notably, MERGE outperforms state-of-the-art methods when only small data sets are available for modeling while requiring fewer computational resources, which makes it particularly promising for protein engineers who have access to limited amounts of data.
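The abstract describes MERGE only at a high level, as a combination of DCA-derived evolutionary information with supervised ML trained on roughly 50 to 500 labeled sequences. The pure-Python sketch below illustrates the general idea of such a stacked (hybrid) model under loudly stated assumptions; it is not the authors' implementation. A site-independent MSA profile stands in for DCA (real DCA additionally fits pairwise couplings), ridge regression on one-hot encoded sequences stands in for the ML component, and the two scores are blended with a single weight `beta` chosen by grid search to maximize Spearman's rank correlation on a validation split. All function names (`site_log_probs`, `fit_ridge`, `fit_beta`, `hybrid_predict`, and so on) are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_log_probs(msa, pseudocount=1.0):
    # Per-position log-frequencies of the 20 amino acids in an MSA.
    # ASSUMPTION: site-independent stand-in for DCA (real DCA also fits pairwise couplings).
    total = len(msa) + pseudocount * len(AMINO_ACIDS)
    tables = []
    for i in range(len(msa[0])):
        counts = Counter(s[i] for s in msa)
        tables.append({aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS})
    return tables


def evolutionary_score(seq, tables):
    # Unsupervised score: higher means more "natural-like" under the MSA profile.
    return sum(tables[i][aa] for i, aa in enumerate(seq))


def identity_kernel(a, b):
    # Dot product of two one-hot encodings = number of identical positions.
    return sum(x == y for x, y in zip(a, b))


def solve(matrix, rhs):
    # Gauss-Jordan elimination with partial pivoting (small dense systems only).
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def fit_ridge(train_seqs, y, alpha=1.0):
    # ASSUMPTION: ridge regression on one-hot features as the supervised component,
    # solved in dual form: coef = (K + alpha*I)^-1 (y - mean(y)).
    y_mean = mean(y)
    gram = [[identity_kernel(a, b) + (alpha if i == j else 0.0) for j, b in enumerate(train_seqs)]
            for i, a in enumerate(train_seqs)]
    coef = solve(gram, [v - y_mean for v in y])
    # The returned predictor computes mean(y) + sum_i coef_i * K(x, x_i).
    return lambda seqs: [y_mean + sum(c * identity_kernel(s, t) for c, t in zip(coef, train_seqs))
                         for s in seqs]


def ranks(values):
    # 1-based ranks; tied values share their average rank.
    order = sorted(range(len(values)), key=values.__getitem__)
    out, i = [0.0] * len(values), 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    # Spearman's rho = Pearson correlation computed on the ranks.
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def zscore(values, ref):
    # Standardize values using the mean and population std of a reference list.
    mu, sd = mean(ref), pstdev(ref) or 1.0
    return [(v - mu) / sd for v in values]


def fit_beta(dca_val, ml_val, y_val, steps=100):
    # Stacking step (ASSUMPTION): choose beta in [0, 1] so that
    # beta*DCA + (1 - beta)*ML, both z-scored, maximizes Spearman's rho on validation data.
    d, m = zscore(dca_val, dca_val), zscore(ml_val, ml_val)
    betas = [k / steps for k in range(steps + 1)]
    return max(betas, key=lambda b: spearman([b * p + (1 - b) * q for p, q in zip(d, m)], y_val))


def hybrid_predict(beta, dca_new, ml_new, dca_val, ml_val):
    # Score new variants with the learned beta, reusing the validation-set z-score statistics.
    d, m = zscore(dca_new, dca_val), zscore(ml_new, ml_val)
    return [beta * p + (1 - beta) * q for p, q in zip(d, m)]
```

In a workflow of this kind, one would fit the ridge component on the labeled training variants, score a separate validation split with both components to choose `beta` via `fit_beta`, and then rank unlabeled candidates with `hybrid_predict`. Both components are z-scored so that a single weight on a coarse grid can balance scores that live on very different scales.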

Subjects

Machine Learning; Protein Engineering; Proteins; Protein Engineering/methods; Proteins/chemistry; Proteins/metabolism; Probability; Models, Molecular

Full text: 1 | Database: MEDLINE | Main subject: Protein Engineering / Proteins / Machine Learning | Language: English | Publication year: 2024 | Document type: Article