Search | BVS Aleitamento Materno

Neural networks to learn protein sequence-function relationships from deep mutational scanning data.

Gelman, Sam; Fahlberg, Sarah A; Heinzelman, Pete; Romero, Philip A; Gitter, Anthony.

Proc Natl Acad Sci U S A ; 118(48), 2021 11 30.

Article in English | MEDLINE | ID: mdl-34815338

ABSTRACT

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
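The supervised setup described in this abstract can be illustrated with a much simpler stand-in. The sketch below is not the authors' framework: it one-hot encodes variants and fits a purely additive linear model by stochastic gradient descent, which is the kind of baseline that the nonlinear, parameter-sharing networks in the abstract are said to improve on. The sequences, scores, and function names are hypothetical and chosen only for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(seq):
    """Return a flat one-hot vector of length len(seq) * 20."""
    vec = [0.0] * (len(seq) * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def predict(weights, bias, features):
    """Additive model: bias plus one weight per (position, residue) pair."""
    return bias + sum(w * x for w, x in zip(weights, features))


def fit_linear(X, y, lr=0.05, epochs=500):
    """Fit the additive model with plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(X, y):
            error = predict(weights, bias, features) - target
            bias -= lr * error
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights, bias


# Hypothetical toy data: made-up 4-residue variants with made-up scores.
variants = ["ACDE", "GCDE", "ACDW", "GCDW", "ACKE"]
scores = [0.0, 1.0, -0.5, 0.5, 0.2]

X = [one_hot_encode(v) for v in variants]
weights, bias = fit_linear(X, scores)

# Unseen double mutant: an additive model can only sum the single-mutation
# effects, so this lands near 1.0 + 0.2 - 0.0 = 1.2.
prediction = predict(weights, bias, one_hot_encode("GCKE"))
```

Because the model is additive, it cannot represent interactions between positions; capturing those is the role the abstract assigns to the nonlinear architectures.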

Subjects

Amino Acid Sequence/genetics, Sequence Analysis, Protein/methods, Algorithms, Amino Acid Sequence/physiology, Biochemical Phenomena, Deep Learning, Machine Learning, Mutation, Neural Networks, Computer, Proteins/metabolism, Structure-Activity Relationship

Neural network extrapolation to distant regions of the protein fitness landscape.

Fahlberg, Sarah A; Freschlin, Chase R; Heinzelman, Pete; Romero, Philip A.

bioRxiv ; 2023 Nov 09.

Article in English | MEDLINE | ID: mdl-37987009

ABSTRACT

Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks' capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models' extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. Our findings highlight how each architecture's inductive biases prime them to learn different aspects of the protein fitness landscape.
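One simple way to make "local" versus "distant" extrapolation concrete is to group designs by how many positions differ from the wild type. The sketch below assumes Hamming distance as that measure, which the abstract does not state; the sequences and function names are hypothetical.

```python
def hamming_distance(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def bin_by_distance(wild_type, designs):
    """Group designs by their number of mutations relative to wild type."""
    bins = {}
    for design in designs:
        bins.setdefault(hamming_distance(wild_type, design), []).append(design)
    return bins


# Hypothetical sequences, used only to show the grouping.
wild_type = "ACDEFG"
designs = ["ACDEFG", "ACDEFA", "GCDEFA", "GGGEFA"]
binned = bin_by_distance(wild_type, designs)
```

Model accuracy or measured function could then be summarized per bin to see how performance changes with distance from the training region.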

Machine learning to navigate fitness landscapes for protein engineering.

Freschlin, Chase R; Fahlberg, Sarah A; Romero, Philip A.

Curr Opin Biotechnol ; 75: 102713, 2022 06.

Article in English | MEDLINE | ID: mdl-35413604

ABSTRACT

Machine learning (ML) is revolutionizing our ability to understand and predict the complex relationships between protein sequence, structure, and function. Predictive sequence-function models are enabling protein engineers to efficiently search the sequence space for useful proteins with broad applications in biotechnology. In this review, we highlight the recent advances in applying ML to protein engineering. We discuss supervised learning methods that infer the sequence-function mapping from experimental data and new sequence representation strategies for data-efficient modeling. We then describe the various ways in which ML can be incorporated into protein engineering workflows, including purely in silico searches, ML-assisted directed evolution, and generative models that can learn the underlying distribution of the protein function in a sequence space. ML-driven protein engineering will become increasingly powerful with continued advances in high-throughput data generation, data science, and deep learning.

Subjects

Machine Learning, Protein Engineering, Amino Acid Sequence, Biotechnology, Protein Engineering/methods, Proteins/chemistry

Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production.

Greenhalgh, Jonathan C; Fahlberg, Sarah A; Pfleger, Brian F; Romero, Philip A.

Nat Commun ; 12(1): 5825, 2021 10 05.

Article in English | MEDLINE | ID: mdl-34611172

ABSTRACT

Alcohol-forming fatty acyl reductases (FARs) catalyze the reduction of thioesters to alcohols and are key enzymes for microbial production of fatty alcohols. Many metabolic engineering strategies utilize FARs to produce fatty alcohols from intracellular acyl-CoA and acyl-ACP pools; however, enzyme activity, especially on acyl-ACPs, remains a significant bottleneck to high-flux production. Here, we engineer FARs with enhanced activity on acyl-ACP substrates by implementing a machine learning (ML)-driven approach to iteratively search the protein fitness landscape. Over the course of ten design-test-learn rounds, we engineer enzymes that produce over twofold more fatty alcohols than the starting natural sequences. We characterize the top sequence and show that it has an enhanced catalytic rate on palmitoyl-ACP. Finally, we analyze the sequence-function data to identify features, like the net charge near the substrate-binding site, that correlate with in vivo activity. This work demonstrates the power of ML to navigate the fitness landscape of traditionally difficult-to-engineer proteins.

Subjects

Aldehyde Oxidoreductases/metabolism, Fatty Alcohols/metabolism, Machine Learning, Aldehyde Oxidoreductases/genetics, Metabolic Engineering/methods
