Search | Brasil - Virtual Health Library

Neural network extrapolation to distant regions of the protein fitness landscape.

Freschlin, Chase R; Fahlberg, Sarah A; Heinzelman, Pete; Romero, Philip A.

Nat Commun; 15(1): 6405, 2024 Jul 30.

Article in English | MEDLINE | ID: mdl-39080282

ABSTRACT

Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks' capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models' extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. We also find that implementing a simple ensemble of convolutional neural networks enables robust design of high-performing variants in the local landscape. Our findings highlight how each architecture's inductive biases prime them to learn different aspects of the protein fitness landscape and how a simple ensembling approach makes protein engineering more robust.

Subject(s)

Immunoglobulin G; Neural Networks, Computer; Protein Engineering; Protein Engineering/methods; Immunoglobulin G/metabolism; Immunoglobulin G/chemistry; Machine Learning; Protein Binding; Bacterial Proteins/metabolism; Bacterial Proteins/genetics; Bacterial Proteins/chemistry; Models, Molecular

Neural network extrapolation to distant regions of the protein fitness landscape.

Fahlberg, Sarah A; Freschlin, Chase R; Heinzelman, Pete; Romero, Philip A.

bioRxiv; 2023 Nov 09.

Article in English | MEDLINE | ID: mdl-37987009

ABSTRACT

Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks' capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models' extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. Our findings highlight how each architecture's inductive biases prime them to learn different aspects of the protein fitness landscape.

Inferring protein fitness landscapes from laboratory evolution experiments.

D'Costa, Sameer; Hinds, Emily C; Freschlin, Chase R; Song, Hyebin; Romero, Philip A.

PLoS Comput Biol; 19(3): e1010956, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36857380

ABSTRACT

Directed laboratory evolution applies iterative rounds of mutation and selection to explore the protein fitness landscape and provides rich information regarding the underlying relationships between protein sequence, structure, and function. Laboratory evolution data consist of protein sequences sampled from evolving populations over multiple generations and this data type does not fit into established supervised and unsupervised machine learning approaches. We develop a statistical learning framework that models the evolutionary process and can infer the protein fitness landscape from multiple snapshots along an evolutionary trajectory. We apply our modeling approach to dihydrofolate reductase (DHFR) laboratory evolution data and the resulting landscape parameters capture important aspects of DHFR structure and function. We use the resulting model to understand the structure of the fitness landscape and find numerous examples of epistasis but an overall global peak that is evolutionarily accessible from most starting sequences. Finally, we use the model to perform an in silico extrapolation of the DHFR laboratory evolution trajectory and computationally design proteins from future evolutionary rounds.

Subject(s)

Genetic Fitness; Proteins; Genetic Fitness/genetics; Proteins/genetics; Proteins/metabolism; Mutation/genetics; Tetrahydrofolate Dehydrogenase/genetics; Tetrahydrofolate Dehydrogenase/metabolism; Amino Acid Sequence; Evolution, Molecular; Models, Genetic; Epistasis, Genetic

Activation of MAP2K signaling by genetic engineering or HF-rTMS promotes corticospinal axon sprouting and functional regeneration.

Boato, Francesco; Guan, Xiaofei; Zhu, Yanjie; Ryu, Youngjae; Voutounou, Mariel; Rynne, Christopher; Freschlin, Chase R; Zumbo, Paul; Betel, Doron; Matho, Katie; Makarov, Sergey N; Wu, Zhuhao; Son, Young-Jin; Nummenmaa, Aapo; Huang, Josh Z; Edwards, Dylan J; Zhong, Jian.

Sci Transl Med; 15(677): eabq6885, 2023 Jan 04.

Article in English | MEDLINE | ID: mdl-36599003

ABSTRACT

Facilitating axon regeneration in the injured central nervous system remains a challenging task. RAF-MAP2K signaling plays a key role in axon elongation during nervous system development. Here, we show that conditional expression of a constitutively kinase-activated BRAF in mature corticospinal neurons elicited the expression of a set of transcription factors previously implicated in the regeneration of zebrafish retinal ganglion cell axons and promoted regeneration and sprouting of corticospinal tract (CST) axons after spinal cord injury in mice. Newly sprouting axon collaterals formed synaptic connections with spinal interneurons, resulting in improved recovery of motor function. Noninvasive suprathreshold high-frequency repetitive transcranial magnetic stimulation (HF-rTMS) activated the BRAF canonical downstream effectors MAP2K1/2 and modulated the expression of a set of regeneration-related transcription factors in a pattern consistent with that induced by BRAF activation. HF-rTMS enabled CST axon regeneration and sprouting, which was abolished in MAP2K1/2 conditional null mice. These data collectively demonstrate a central role of MAP2K signaling in augmenting the growth capacity of mature corticospinal neurons and suggest that HF-rTMS might have potential for treating spinal cord injury by modulating MAP2K signaling.

Subject(s)

Axons; Spinal Cord Injuries; Animals; Mice; Axons/physiology; Genetic Engineering; Nerve Regeneration/physiology; Proto-Oncogene Proteins B-raf/metabolism; Pyramidal Tracts/metabolism; Recovery of Function/physiology; Spinal Cord Injuries/genetics; Spinal Cord Injuries/therapy; Spinal Cord Injuries/metabolism; Transcranial Magnetic Stimulation; Transcription Factors/metabolism; Zebrafish

Machine learning to navigate fitness landscapes for protein engineering.

Freschlin, Chase R; Fahlberg, Sarah A; Romero, Philip A.

Curr Opin Biotechnol; 75: 102713, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35413604

ABSTRACT

Machine learning (ML) is revolutionizing our ability to understand and predict the complex relationships between protein sequence, structure, and function. Predictive sequence-function models are enabling protein engineers to efficiently search the sequence space for useful proteins with broad applications in biotechnology. In this review, we highlight the recent advances in applying ML to protein engineering. We discuss supervised learning methods that infer the sequence-function mapping from experimental data and new sequence representation strategies for data-efficient modeling. We then describe the various ways in which ML can be incorporated into protein engineering workflows, including purely in silico searches, ML-assisted directed evolution, and generative models that can learn the underlying distribution of the protein function in a sequence space. ML-driven protein engineering will become increasingly powerful with continued advances in high-throughput data generation, data science, and deep learning.

Subject(s)

Machine Learning; Protein Engineering; Amino Acid Sequence; Biotechnology; Protein Engineering/methods; Proteins/chemistry
