Results 1 - 8 of 8
1.
Nat Biotechnol ; 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39322764

ABSTRACT

Protein denoising diffusion probabilistic models are used for the de novo generation of protein backbones but are limited in their ability to guide generation of proteins with sequence-specific attributes and functional properties. To overcome this limitation, we developed ProteinGenerator (PG), a sequence space diffusion model based on RoseTTAFold that simultaneously generates protein sequences and structures. Beginning from a noised sequence representation, PG generates sequence and structure pairs by iterative denoising, guided by desired sequence and structural protein attributes. We designed thermostable proteins with varying amino acid compositions and internal sequence repeats and cage bioactive peptides, such as melittin. By averaging sequence logits between diffusion trajectories with distinct structural constraints, we designed multistate parent-child protein triples in which the same sequence folds to different supersecondary structures when intact in the parent versus split into two child domains. PG design trajectories can be guided by experimental sequence-activity data, providing a general approach for integrated computational and experimental optimization of protein function.
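The abstract describes designing multistate proteins by averaging sequence logits between diffusion trajectories that carry different structural constraints. The minimal Python sketch below shows only the averaging-and-decoding arithmetic, assuming each trajectory emits a matrix of per-position logits over the 20 amino acids at a given step; the helper names (`average_logits`, `argmax_sequence`, `peaked_row`), the 0.5 weighting, and the toy logits are illustrative assumptions, not ProteinGenerator's implementation.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Logits = List[List[float]]  # one row of 20 logits per sequence position


def average_logits(traj_a: Logits, traj_b: Logits, weight: float = 0.5) -> Logits:
    """Per-position weighted average of two trajectories' logits (hypothetical helper)."""
    if len(traj_a) != len(traj_b):
        raise ValueError("both trajectories must cover the same sequence length")
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(traj_a, traj_b)
    ]


def argmax_sequence(logits: Logits) -> str:
    """Decode one shared sequence by taking the top-scoring residue at each position."""
    return "".join(
        AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)] for row in logits
    )


def peaked_row(aa: str, height: float = 4.0) -> List[float]:
    """Toy logit row that favours a single residue (illustration only)."""
    return [height if x == aa else 0.0 for x in AMINO_ACIDS]


# Trajectory A (e.g. parent constraint) and trajectory B (e.g. child constraint).
traj_a = [peaked_row("L"), peaked_row("K", 2.0), peaked_row("E")]
traj_b = [peaked_row("L"), peaked_row("E", 3.0), peaked_row("E")]
print(argmax_sequence(average_logits(traj_a, traj_b)))  # LEE
```

At the second position the trajectories disagree (K vs. E); the averaged logits settle on the residue with the larger combined support, which is the sense in which a single sequence is pushed toward satisfying both constraints.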

2.
ArXiv ; 2023 May 26.
Article in English | MEDLINE | ID: mdl-37292483

ABSTRACT

Directed evolution of proteins has been the most effective method for protein engineering. However, a new paradigm is emerging, fusing the library generation and screening approaches of traditional directed evolution with computation through the training of machine learning models on protein sequence fitness data. This chapter highlights successful applications of machine learning to protein engineering and directed evolution, organized by the improvements that have been made with respect to each step of the directed evolution cycle. Additionally, we provide an outlook for the future based on the current direction of the field, namely in the development of calibrated models and in incorporating other modalities, such as protein structure.

3.
ACS Synth Biol ; 11(3): 1313-1324, 2022 Mar 18.
Article in English | MEDLINE | ID: mdl-35172576

ABSTRACT

Widespread availability of protein sequence-fitness data would revolutionize both our biochemical understanding of proteins and our ability to engineer them. Unfortunately, even though thousands of protein variants are generated and evaluated for fitness during a typical protein engineering campaign, most are never sequenced, leaving a wealth of potential sequence-fitness information untapped. Primarily, this is because sequencing is unnecessary for many protein engineering strategies; the added cost and effort of sequencing are thus unjustified. It also results from the fact that, even though many lower-cost sequencing strategies have been developed, they often require at least some access to and experience with sequencing or computational resources, both of which can be barriers to access. Here, we present every variant sequencing (evSeq), a method and collection of tools/standardized components for sequencing a variable region within every variant gene produced during a protein engineering campaign at a cost of cents per variant. evSeq was designed to democratize low-cost sequencing for protein engineers and, indeed, anyone interested in engineering biological systems. Execution of its wet-lab component is simple, requires no sequencing experience to perform, relies only on resources and services typically available to biology labs, and slots neatly into existing protein engineering workflows. Analysis of evSeq data is likewise made simple by its accompanying software (found at github.com/fhalab/evSeq, documentation at fhalab.github.io/evSeq), which can be run on a personal laptop and was designed to be accessible to users with no computational experience. Low-cost and easy-to-use, evSeq makes the collection of extensive protein variant sequence-fitness data practical.


Subjects
High-Throughput Nucleotide Sequencing , Software , Computational Biology/methods , Cost-Benefit Analysis , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Workflow
4.
Cell Syst ; 12(11): 1026-1045.e7, 2021 Nov 17.
Article in English | MEDLINE | ID: mdl-34416172

ABSTRACT

Directed evolution of proteins often involves a greedy optimization in which the mutation in the highest-fitness variant identified in each round of single-site mutagenesis is fixed. The efficiency of such a single-step greedy walk depends on the order in which beneficial mutations are identified; that is, the process is path dependent. Here, we investigate and optimize a path-independent machine learning-assisted directed evolution (MLDE) protocol that allows in silico screening of full combinatorial libraries. In particular, we evaluate the importance of different protein encoding strategies, training procedures, models, and training set design strategies on MLDE outcome, finding the most important consideration to be the implementation of strategies that reduce inclusion of minimally informative "holes" (protein variants with zero or extremely low fitness) in training data. When applied to an epistatic, hole-filled, four-site combinatorial fitness landscape, our optimized protocol achieved the global fitness maximum up to 81-fold more frequently than single-step greedy optimization. A record of this paper's transparent peer review process is included in the supplemental information.


Subjects
Machine Learning , Proteins , Mutagenesis , Mutation/genetics , Proteins/genetics
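Entry 4 contrasts a path-dependent single-step greedy walk with path-independent in silico screening of a full combinatorial library. The toy Python comparison below illustrates that difference on an invented two-site, three-letter landscape with reciprocal sign epistasis. The landscape values and function names are hypothetical, and the exhaustive screen reads true fitness directly, standing in for the trained model's predictions that MLDE would use.

```python
from itertools import product
from typing import Dict, Tuple

Landscape = Dict[str, float]

# Invented landscape: C at either site alone is deleterious, but CC is the global optimum.
TOY_FITNESS: Landscape = {
    "AA": 1.0, "AB": 1.5, "AC": 0.2,
    "BA": 2.0, "BB": 1.8, "BC": 2.1,
    "CA": 0.5, "CB": 0.3, "CC": 5.0,
}


def greedy_walk(fitness: Landscape, parent: str, alphabet: str) -> Tuple[str, float]:
    """Each round, screen all single mutants at unfixed sites and fix the best mutation."""
    current = parent
    open_sites = set(range(len(parent)))
    while open_sites:
        best_variant, best_site = current, None
        for site in sorted(open_sites):
            for aa in alphabet:
                variant = current[:site] + aa + current[site + 1:]
                if fitness[variant] > fitness[best_variant]:
                    best_variant, best_site = variant, site
        if best_site is None:
            break  # no single mutant improves on the current variant
        current = best_variant
        open_sites.remove(best_site)
    return current, fitness[current]


def exhaustive_screen(fitness: Landscape, n_sites: int, alphabet: str) -> Tuple[str, float]:
    """Path-independent: score every combination in the library and return the best."""
    library = ("".join(combo) for combo in product(alphabet, repeat=n_sites))
    best = max(library, key=fitness.__getitem__)
    return best, fitness[best]


print(greedy_walk(TOY_FITNESS, "AA", "ABC"))     # ('BC', 2.1)
print(exhaustive_screen(TOY_FITNESS, 2, "ABC"))  # ('CC', 5.0)
```

The greedy walk fixes B at the first site in round one because BA is the best single mutant, after which CC can no longer be reached; scoring the whole library avoids that ordering effect. A four-site library over 20 amino acids already has 20^4 = 160,000 variants, which is why such screening relies on model predictions rather than measurements.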
5.
Curr Opin Struct Biol ; 69: 11-18, 2021 Aug.
Article in English | MEDLINE | ID: mdl-33647531

ABSTRACT

Machine learning (ML) can expedite directed evolution by allowing researchers to move expensive experimental screens in silico. Gathering sequence-function data for training ML models, however, can still be costly. In contrast, raw protein sequence data is widely available. Recent advances in ML approaches use protein sequences to augment limited sequence-function data for directed evolution. We highlight contributions in a growing effort to use sequences to reduce or eliminate the amount of sequence-function data needed for effective in silico screening. We also highlight approaches that use ML models trained on sequences to generate new functional sequence diversity, focusing on strategies that use these generative models to efficiently explore vast regions of protein space.


Subjects
Machine Learning , Proteins , Amino Acid Sequence , Computer Simulation , Proteins/genetics
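Entry 5 reviews methods that use unlabeled protein sequences to reduce or replace sequence-function data. As a deliberately simple stand-in for the models it covers, the Python sketch below builds a site-independent profile from a toy alignment of homologs and scores variants by their log-odds relative to wild type, using no fitness labels at all. The alignment, the pseudocount, and the function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Profile = List[Dict[str, float]]


def build_profile(alignment: List[str], pseudocount: float = 1.0) -> Profile:
    """Per-column log-probabilities with additive pseudocounts; gap characters are ignored."""
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS}
        )
    return profile


def zero_shot_score(profile: Profile, wild_type: str, variant: str) -> float:
    """Sum over mutated sites of log p(variant residue) - log p(wild-type residue)."""
    return sum(
        profile[i][v] - profile[i][w]
        for i, (w, v) in enumerate(zip(wild_type, variant))
        if v != w
    )


# Toy "family" of aligned homologs (invented for illustration).
alignment = ["MKLV", "MKIV", "MRLV", "MKLV", "MKIV"]
profile = build_profile(alignment)
candidates = ["MRLV", "MPLV", "MKIV"]
ranked = sorted(candidates, key=lambda v: zero_shot_score(profile, "MKLV", v), reverse=True)
print(ranked)  # ['MKIV', 'MRLV', 'MPLV']
```

A substitution that is common in the family at its site (I at the third position) is penalized less than a rarer one (R), and a residue never observed at that site (P) is penalized most. Site-independent profiles ignore interactions between sites, a limitation that more expressive sequence models aim to address.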
6.
ACS Catal ; 10(13): 7112-7116, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-33282460

ABSTRACT

While biocatalysis is increasingly incorporated into drug development pipelines, it is less commonly used in the early stages of drug discovery. When a protein is engineered to produce a chiral motif bearing a derivatizable functional handle, the resulting biocatalyst can help generate diverse building blocks for drug discovery. Here we engineered two variants of Rhodothermus marinus nitric oxide dioxygenase (RmaNOD) to catalyze formation of the cis and trans diastereomers, respectively, of a pinacolboronate-substituted cyclopropane, which can be readily derivatized to generate diverse stereopure cyclopropane building blocks.

7.
Proc Natl Acad Sci U S A ; 116(18): 8852-8858, 2019 Apr 30.
Article in English | MEDLINE | ID: mdl-30979809

ABSTRACT

To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine-learning models trained on tested variants provide a fast method for testing sequence space computationally. We validated this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (i.e., stereodivergence) of a new-to-nature carbene Si-H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee (enantiomeric excess). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.


Subjects
Combinatorial Chemistry Techniques/methods , Directed Molecular Evolution , Machine Learning , Oxygenases/genetics , Rhodothermus/enzymology , Small Molecule Libraries , Amino Acid Sequence , Humans , Models, Molecular , Oxygenases/metabolism , Protein Conformation
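Entry 7 describes training machine-learning models on tested variants and using them to screen the combinatorial sequence space in silico, then choosing what to test next. A minimal stdlib-only Python sketch of one such round follows. It uses a k-nearest-neighbour regressor on Hamming distance as a stand-in for the models in the paper; the measurements, the two-site, three-letter library, and the function names are invented for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: Dict[str, float], query: str, k: int = 2) -> float:
    """Mean measured fitness of the k training variants nearest in Hamming distance.

    Ties in distance are broken alphabetically, which is arbitrary but deterministic.
    """
    neighbours = sorted(train, key=lambda seq: (hamming(seq, query), seq))[:k]
    return sum(train[s] for s in neighbours) / len(neighbours)


def mlde_rank(
    train: Dict[str, float], alphabet: str, n_sites: int, k: int = 2, top_n: int = 3
) -> List[Tuple[str, float]]:
    """Predict every untested combination in the library and return the top_n to test next."""
    library = ("".join(combo) for combo in product(alphabet, repeat=n_sites))
    predictions = [(v, knn_predict(train, v, k)) for v in library if v not in train]
    predictions.sort(key=lambda item: (-item[1], item[0]))
    return predictions[:top_n]


# Invented measurements for 5 of the 9 variants in a 2-site, 3-letter library.
measured = {"AA": 1.0, "AB": 1.2, "BA": 0.9, "CB": 2.5, "CC": 3.0}
for variant, predicted in mlde_rank(measured, "ABC", n_sites=2):
    print(variant, round(predicted, 2))
```

In an iterative campaign, the variants selected this way would be measured and added to the training data before the next round, so the model only ever needs labels for a small fraction of the library.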
8.
Photosynth Res ; 129(2): 171-82, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27276888

ABSTRACT

Acaryochloris is a genus of cyanobacteria that uses chlorophyll (chl) d as its primary chlorophyll during oxygenic photosynthesis. Chl d allows Acaryochloris to harvest red-shifted light, enabling these cyanobacteria to live in filtered-light environments depleted in visible light. Although the genomes of multiple Acaryochloris species have been sequenced, their analysis has not revealed how chl d is synthesized. Here, we demonstrate that Acaryochloris sp. CCMEE 5410 cells undergo chlorosis upon nitrogen depletion and robustly regenerate chl d upon nitrogen repletion. We performed a time-course RNA-Seq experiment to quantify global transcriptomic changes during chlorophyll recovery. We observed upregulation of numerous known chl biosynthesis genes and also identified an oxygenase gene whose transcriptional profile resembles those of these chl biosynthesis genes, suggesting its possible involvement in chl d biosynthesis. Moreover, our data suggest that multiple prochlorophyte chlorophyll-binding homologs are important during chlorophyll recovery and that, at the transcriptional level, light-independent chl synthesis genes are more dominant than the light-dependent gene. Transcriptomic characterization of this organism provides crucial clues toward the mechanistic elucidation of chl d biosynthesis.


Subjects
Chlorophyll/metabolism , Cyanobacteria/genetics , Nitrogen/metabolism , Oxygenases/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Cyanobacteria/metabolism , Gene Expression Profiling , Gene Expression Regulation, Bacterial , High-Throughput Nucleotide Sequencing , Light , Nitrogen/deficiency , Oxygen/metabolism , Oxygenases/genetics , Photosynthesis , Sequence Analysis, RNA
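Entry 8 nominates a candidate oxygenase because its transcriptional profile resembles those of known chl biosynthesis genes. The abstract does not state how similarity was measured; the Python sketch below assumes Pearson correlation of each gene's time course against the mean profile of a reference gene set, with hypothetical gene IDs and invented expression values.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Expression = Dict[str, List[float]]  # gene ID -> expression at each time point


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant profile to avoid division by zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def rank_by_coexpression(expr: Expression, reference_genes: List[str]) -> List[Tuple[str, float]]:
    """Rank non-reference genes by correlation with the mean reference profile."""
    n_points = len(expr[reference_genes[0]])
    ref_profile = [
        sum(expr[g][t] for g in reference_genes) / len(reference_genes)
        for t in range(n_points)
    ]
    scores = {
        gene: pearson(profile, ref_profile)
        for gene, profile in expr.items()
        if gene not in reference_genes
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical time course after nitrogen repletion (invented values).
expression: Expression = {
    "ref_chl_1": [1.0, 4.0, 9.0, 12.0, 10.0],
    "ref_chl_2": [2.0, 5.0, 8.0, 11.0, 9.0],
    "cand_oxygenase": [1.0, 3.0, 8.0, 10.0, 9.0],
    "cand_flat": [5.0, 5.0, 6.0, 5.0, 5.0],
    "cand_down": [10.0, 8.0, 4.0, 2.0, 3.0],
}
for gene, r in rank_by_coexpression(expression, ["ref_chl_1", "ref_chl_2"]):
    print(f"{gene}\t{r:.2f}")
```

Real RNA-Seq counts would normally be normalized and log-transformed before correlation, and a correlation ranking only nominates candidates; it does not establish a gene's function.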