Results 1 - 20 of 22
1.
bioRxiv ; 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38496615

ABSTRACT

De novo design of complex protein folds using solely computational means remains a significant challenge. Here, we use a robust deep learning pipeline to design complex folds and soluble analogues of integral membrane proteins. Unique membrane topologies, such as those from GPCRs, are not found in the soluble proteome and we demonstrate that their structural features can be recapitulated in solution. Biophysical analyses reveal high thermal stability of the designs and experimental structures show remarkable design accuracy. The soluble analogues were functionalized with native structural motifs, standing as a proof-of-concept for bringing membrane protein functions to the soluble proteome, potentially enabling new approaches in drug discovery. In summary, we designed complex protein topologies and enriched them with functionalities from membrane proteins, with high experimental success rates, leading to a de facto expansion of the functional soluble fold space.

2.
Proc Natl Acad Sci U S A ; 121(13): e2314646121, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38502697

ABSTRACT

The design of protein-protein interfaces using physics-based design methods such as Rosetta requires substantial computational resources and manual refinement by expert structural biologists. Deep learning methods promise to simplify protein-protein interface design and enable its application to a wide variety of problems by researchers from various scientific disciplines. Here, we test the ability of a deep learning method for protein sequence design, ProteinMPNN, to design two-component tetrahedral protein nanomaterials and benchmark its performance against Rosetta. ProteinMPNN had a similar success rate to Rosetta, yielding 13 new experimentally confirmed assemblies, but required orders of magnitude less computation and no manual refinement. The interfaces designed by ProteinMPNN were substantially more polar than those designed by Rosetta, which facilitated in vitro assembly of the designed nanomaterials from independently purified components. Crystal structures of several of the assemblies confirmed the accuracy of the design method at high resolution. Our results showcase the potential of deep learning-based methods to unlock the widespread application of designed protein-protein interfaces and self-assembling protein nanomaterials in biotechnology.


Subject(s)
Nanostructures, Proteins, Molecular Models, Proteins/chemistry, Amino Acid Sequence, Biotechnology, Protein Conformation
3.
Nature ; 627(8005): 898-904, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38480887

ABSTRACT

A wooden house frame consists of many different lumber pieces, but because of the regularity of these building blocks, the structure can be designed using straightforward geometrical principles. The design of multicomponent protein assemblies, in comparison, has been much more complex, largely owing to the irregular shapes of protein structures [1]. Here we describe extendable linear, curved and angled protein building blocks, as well as inter-block interactions, that conform to specified geometric standards; assemblies designed using these blocks inherit their extendability and regular interaction surfaces, enabling them to be expanded or contracted by varying the number of modules, and reinforced with secondary struts. Using X-ray crystallography and electron microscopy, we validate nanomaterial designs ranging from simple polygonal and circular oligomers that can be concentrically nested, up to large polyhedral nanocages and unbounded straight 'train track' assemblies with reconfigurable sizes and geometries that can be readily blueprinted. Because of the complexity of protein structures and sequence-structure relationships, it has not previously been possible to build up large protein assemblies by deliberate placement of protein backbones onto a blank three-dimensional canvas; the simplicity and geometric regularity of our design platform now enables construction of protein nanomaterials according to 'back of an envelope' architectural blueprints.


Subject(s)
Nanostructures, Proteins, X-Ray Crystallography, Nanostructures/chemistry, Proteins/chemistry, Proteins/metabolism, Electron Microscopy, Reproducibility of Results
4.
J Am Chem Soc ; 146(3): 2054-2061, 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38194293

ABSTRACT

Natural proteins are highly optimized for function but are often difficult to produce at a scale suitable for biotechnological applications due to poor expression in heterologous systems, limited solubility, and sensitivity to temperature. Thus, a general method that improves the physical properties of native proteins while maintaining function could have wide utility for protein-based technologies. Here, we show that the deep neural network ProteinMPNN, together with evolutionary and structural information, provides a route to increasing protein expression, stability, and function. For both myoglobin and tobacco etch virus (TEV) protease, we generated designs with improved expression, elevated melting temperatures, and improved function. For TEV protease, we identified multiple designs with improved catalytic activity as compared to the parent sequence and previously reported TEV variants. Our approach should be broadly useful for improving the expression, stability, and function of biotechnologically important proteins.


Subject(s)
Endopeptidases, Temperature, Endopeptidases/metabolism, Recombinant Fusion Proteins
5.
bioRxiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37961294

ABSTRACT

Despite transformative advances in protein design with deep learning, the design of small-molecule-binding proteins and sensors for arbitrary ligands remains a grand challenge. Here we combine deep learning and physics-based methods to generate a family of proteins with diverse and designable pocket geometries, which we employ to computationally design binders for six chemically and structurally distinct small-molecule targets. Biophysical characterization of the designed binders revealed nanomolar to low micromolar binding affinities and atomic-level design accuracy. The bound ligands are exposed at one edge of the binding pocket, enabling the de novo design of chemically induced dimerization (CID) systems; we take advantage of this to create a biosensor with nanomolar sensitivity for cortisol. Our approach provides a general method to design proteins that bind and sense small molecules for a wide range of analytical, environmental, and biomedical applications.

6.
Nat Struct Mol Biol ; 30(11): 1755-1760, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37770718

ABSTRACT

In pseudocyclic proteins, such as TIM barrels, β barrels, and some helical transmembrane channels, a single subunit is repeated in a cyclic pattern, giving rise to a central cavity that can serve as a pocket for ligand binding or enzymatic activity. Inspired by these proteins, we devised a deep-learning-based approach to broadly exploring the space of closed repeat proteins starting from only a specification of the repeat number and length. Biophysical data for 38 structurally diverse pseudocyclic designs produced in Escherichia coli are consistent with the design models, and the three crystal structures we were able to obtain are very close to the designed structures. Docking studies suggest the diversity of folds and central pockets provide effective starting points for designing small-molecule binders and enzymes.


Subject(s)
Hallucinations, Proteins, Humans, Proteins/chemistry
7.
Science ; 381(6659): 754-760, 2023 08 18.
Article in English | MEDLINE | ID: mdl-37590357

ABSTRACT

In nature, proteins that switch between two conformations in response to environmental stimuli structurally transduce biochemical information in a manner analogous to how transistors control information flow in computing devices. Designing proteins with two distinct but fully structured conformations is a challenge for protein design as it requires sculpting an energy landscape with two distinct minima. Here we describe the design of "hinge" proteins that populate one designed state in the absence of ligand and a second designed state in the presence of ligand. X-ray crystallography, electron microscopy, double electron-electron resonance spectroscopy, and binding measurements demonstrate that despite the significant structural differences the two states are designed with atomic level accuracy and that the conformational and binding equilibria are closely coupled.


Subject(s)
Protein Engineering, X-Ray Crystallography, Ligands, Protein Engineering/methods, Protein Conformation
8.
bioRxiv ; 2023 Aug 04.
Article in English | MEDLINE | ID: mdl-37577478

ABSTRACT

The design of novel protein-protein interfaces using physics-based design methods such as Rosetta requires substantial computational resources and manual refinement by expert structural biologists. A new generation of deep learning methods promises to simplify protein-protein interface design and enable its application to a wide variety of problems by researchers from various scientific disciplines. Here we test the ability of a deep learning method for protein sequence design, ProteinMPNN, to design two-component tetrahedral protein nanomaterials and benchmark its performance against Rosetta. ProteinMPNN had a similar success rate to Rosetta, yielding 13 new experimentally confirmed assemblies, but required orders of magnitude less computation and no manual refinement. The interfaces designed by ProteinMPNN were substantially more polar than those designed by Rosetta, which facilitated in vitro assembly of the designed nanomaterials from independently purified components. Crystal structures of several of the assemblies confirmed the accuracy of the design method at high resolution. Our results showcase the potential of deep learning-based methods to unlock the widespread application of designed protein-protein interfaces and self-assembling protein nanomaterials in biotechnology.

9.
Nature ; 620(7973): 434-444, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37468638

ABSTRACT

Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale [1]. However, the energetics driving folding are invisible in these structures and remain largely unknown [2]. The hidden thermodynamics of folding can drive disease [3,4], shape protein evolution [5-7] and guide protein engineering [8-10], and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40-72 amino acids in length. Using this extensive dataset, we quantified (1) environmental factors influencing amino acid fitness, (2) thermodynamic couplings (including unexpected interactions) between protein sites, and (3) the global divergence between evolutionary amino acid usage and protein folding stability. We also examined how our approach could identify stability determinants in designed proteins and evaluate design methods. The cDNA display proteolysis method is fast, accurate and uniquely scalable, and promises to reveal the quantitative rules for how amino acid sequences encode folding stability.


Subject(s)
Biology, Protein Engineering, Protein Folding, Proteins, Amino Acids/genetics, Amino Acids/metabolism, Biology/methods, Complementary DNA/genetics, Protein Stability, Proteins/chemistry, Proteins/genetics, Proteins/metabolism, Thermodynamics, Proteolysis, Protein Engineering/methods, Protein Domains/genetics, Mutation
10.
bioRxiv ; 2023 Jun 09.
Article in English | MEDLINE | ID: mdl-37333359

ABSTRACT

A wooden house frame consists of many different lumber pieces, but because of the regularity of these building blocks, the structure can be designed using straightforward geometrical principles. The design of multicomponent protein assemblies in comparison has been much more complex, largely due to the irregular shapes of protein structures [1]. Here we describe extendable linear, curved, and angled protein building blocks, as well as inter-block interactions that conform to specified geometric standards; assemblies designed using these blocks inherit their extendability and regular interaction surfaces, enabling them to be expanded or contracted by varying the number of modules, and reinforced with secondary struts. Using X-ray crystallography and electron microscopy, we validate nanomaterial designs ranging from simple polygonal and circular oligomers that can be concentrically nested, up to large polyhedral nanocages and unbounded straight "train track" assemblies with reconfigurable sizes and geometries that can be readily blueprinted. Because of the complexity of protein structures and sequence-structure relationships, it has not been previously possible to build up large protein assemblies by deliberate placement of protein backbones onto a blank 3D canvas; the simplicity and geometric regularity of our design platform now enables construction of protein nanomaterials according to "back of an envelope" architectural blueprints.

11.
Nat Commun ; 14(1): 2625, 2023 05 06.
Article in English | MEDLINE | ID: mdl-37149653

ABSTRACT

Recently it has become possible to de novo design high affinity protein binding proteins from target structural information alone. There is, however, considerable room for improvement as the overall design success rate is low. Here, we explore the augmentation of energy-based protein binder design using deep learning. We find that using AlphaFold2 or RoseTTAFold to assess the probability that a designed sequence adopts the designed monomer structure, and the probability that this structure binds the target as designed, increases design success rates nearly 10-fold. We find further that sequence design using ProteinMPNN rather than Rosetta considerably increases computational efficiency.


Subject(s)
Deep Learning, Protein Engineering, Proteins/metabolism, Protein Binding
12.
Nature ; 614(7949): 774-780, 2023 02.
Article in English | MEDLINE | ID: mdl-36813896

ABSTRACT

De novo enzyme design has sought to introduce active sites and substrate-binding pockets that are predicted to catalyse a reaction of interest into geometrically compatible native scaffolds [1,2], but has been limited by a lack of suitable protein structures and the complexity of native protein sequence-structure relationships. Here we describe a deep-learning-based 'family-wide hallucination' approach that generates large numbers of idealized protein structures containing diverse pocket shapes and designed sequences that encode them. We use these scaffolds to design artificial luciferases that selectively catalyse the oxidative chemiluminescence of the synthetic luciferin substrates diphenylterazine [3] and 2-deoxycoelenterazine. The designed active sites position an arginine guanidinium group adjacent to an anion that develops during the reaction in a binding pocket with high shape complementarity. For both luciferin substrates, we obtain designed luciferases with high selectivity; the most active of these is a small (13.9 kDa) and thermostable (with a melting temperature higher than 95 °C) enzyme that has a catalytic efficiency on diphenylterazine (kcat/Km = 10^6 M^-1 s^-1) comparable to that of native luciferases, but a much higher substrate specificity. The creation of highly active and specific biocatalysts from scratch with broad applications in biomedicine is a key milestone for computational enzyme design, and our approach should enable generation of a wide range of luciferases and other enzymes.


Subject(s)
Deep Learning, Luciferases, Biocatalysis, Catalytic Domain, Enzyme Stability, Hot Temperature, Luciferases/chemistry, Luciferases/metabolism, Luciferins/metabolism, Luminescence, Oxidation-Reduction, Substrate Specificity
13.
Proc Natl Acad Sci U S A ; 120(9): e2216697120, 2023 02 28.
Article in English | MEDLINE | ID: mdl-36802421

ABSTRACT

Peptide-binding proteins play key roles in biology, and predicting their binding specificity is a long-standing challenge. While considerable protein structural information is available, the most successful current methods use sequence information alone, in part because it has been a challenge to model the subtle structural changes accompanying sequence substitutions. Protein structure prediction networks such as AlphaFold model sequence-structure relationships very accurately, and we reasoned that if it were possible to specifically train such networks on binding data, more generalizable models could be created. We show that placing a classifier on top of the AlphaFold network and fine-tuning the combined network parameters for both classification and structure prediction accuracy leads to a model with strong generalizable performance on a wide range of Class I and Class II peptide-MHC interactions that approaches the overall performance of the state-of-the-art NetMHCpan sequence-based method. The peptide-MHC optimized model shows excellent performance in distinguishing binding and non-binding peptides to SH3 and PDZ domains. This ability to generalize well beyond the training set far exceeds that of sequence-only models and should be particularly powerful for systems where less experimental data are available.


Subject(s)
Histocompatibility Antigens Class II, Peptides, Protein Binding, Peptides/chemistry, Histocompatibility Antigens Class II/metabolism, MHC Class II Genes, PDZ Domains
14.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36355460

ABSTRACT

MOTIVATION: Multiple sequence alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for. RESULTS: Here, we implement a smooth and differentiable version of the Smith-Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion. To demonstrate its utility, we introduce SMURF (Smooth Markov Unaligned Random Field), a new method that jointly learns an alignment and the parameters of a Markov Random Field for unsupervised contact prediction. We find that SMURF learns MSAs that mildly improve contact prediction on a diverse set of protein and RNA families. As a proof of concept, we demonstrate that by connecting our differentiable alignment module to AlphaFold2 and maximizing predicted confidence, we can learn MSAs that improve structure predictions over the initial MSAs. Interestingly, the alignments that improve AlphaFold predictions are self-inconsistent and can be viewed as adversarial. This work highlights the potential of differentiable dynamic programming to improve neural network pipelines that rely on an alignment and the potential dangers of optimizing predictions of protein sequences with methods that are not fully understood. AVAILABILITY AND IMPLEMENTATION: Our code and examples are available at: https://github.com/spetti/SMURF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
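The idea of a smooth Smith-Waterman recurrence can be illustrated with a minimal sketch. It is not the SMURF implementation: it simply replaces every `max` in the textbook local-alignment recurrence with a temperature-scaled log-sum-exp, and it assumes a linear gap penalty and a +2/-1 match/mismatch score. All function names and parameters below are hypothetical, chosen for illustration only.

```python
import numpy as np


def soft_max(values, temperature):
    """Smooth maximum: T * log(sum(exp(v / T))), computed stably."""
    v = np.asarray(values, dtype=float) / temperature
    peak = v.max()
    return temperature * (peak + np.log(np.exp(v - peak).sum()))


def similarity_matrix(a, b, match=2.0, mismatch=-1.0):
    """Pairwise score matrix for two sequences (illustrative scoring)."""
    return np.array([[match if x == y else mismatch for y in b] for x in a])


def hard_smith_waterman(sim, gap=1.0):
    """Textbook Smith-Waterman score with a linear gap penalty."""
    n, m = sim.shape
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i, j] = max(
                0.0,
                H[i - 1, j - 1] + sim[i - 1, j - 1],  # align i with j
                H[i - 1, j] - gap,                    # gap in second sequence
                H[i, j - 1] - gap,                    # gap in first sequence
            )
    return float(H.max())


def smooth_smith_waterman(sim, gap=1.0, temperature=1.0):
    """Same recurrence with every max replaced by a smooth maximum."""
    n, m = sim.shape
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i, j] = soft_max(
                [
                    0.0,
                    H[i - 1, j - 1] + sim[i - 1, j - 1],
                    H[i - 1, j] - gap,
                    H[i, j - 1] - gap,
                ],
                temperature,
            )
    # Smooth maximum over all cells replaces the final hard max.
    return float(soft_max(H[1:, 1:].ravel(), temperature))


if __name__ == "__main__":
    sim = similarity_matrix("GATTACA", "GATCACA")
    print("hard score:  ", hard_smith_waterman(sim))
    print("smooth, T=1: ", smooth_smith_waterman(sim, temperature=1.0))
    print("smooth, T→0: ", smooth_smith_waterman(sim, temperature=1e-3))
```

As the temperature approaches zero the smooth score approaches the classic one, while at higher temperatures the score varies smoothly with every entry of the similarity matrix. Written in an automatic-differentiation framework instead of plain NumPy, the same recurrence would yield gradients with respect to the similarity matrix, which is what allows an alignment to be learned jointly with a downstream model.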


Subject(s)
Algorithms, Proteins, Humans, Sequence Alignment, Proteins/chemistry, Neural Networks (Computer), Amino Acid Sequence
15.
bioRxiv ; 2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38187589

ABSTRACT

A general method for designing proteins to bind and sense any small molecule of interest would be widely useful. Due to the small number of atoms to interact with, binding to small molecules with high affinity requires highly shape complementary pockets, and transducing binding events into signals is challenging. Here we describe an integrated deep learning and energy based approach for designing high shape complementarity binders to small molecules that are poised for downstream sensing applications. We employ deep learning generated pseudocycles with repeating structural units surrounding central pockets; depending on the geometry of the structural unit and repeat number, these pockets span wide ranges of sizes and shapes. For a small molecule target of interest, we extensively sample high shape complementarity pseudocycles to generate large numbers of customized potential binding pockets; the ligand binding poses and the interacting interfaces are then optimized for high affinity binding. We computationally design binders to four diverse molecules, including for the first time polar flexible molecules such as methotrexate and thyroxine, which are expressed at high levels and have nanomolar affinities straight out of the computer. Co-crystal structures are nearly identical to the design models. Taking advantage of the modular repeating structure of pseudocycles and central location of the binding pockets, we constructed low noise nanopore sensors and chemically induced dimerization systems by splitting the binders into domains which assemble into the original pseudocycle pocket upon target molecule addition.

16.
Science ; 377(6604): 387-394, 2022 07 22.
Article in English | MEDLINE | ID: mdl-35862514

ABSTRACT

The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, "constrained hallucination," optimizes sequences such that their predicted structures contain the desired functional site. The second approach, "inpainting," starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests.


Subject(s)
Deep Learning, Protein Engineering, Proteins, Binding Sites, Catalysis, Protein Binding, Protein Engineering/methods, Protein Folding, Protein Secondary Structure, Proteins/chemistry
17.
Pac Symp Biocomput ; 27: 34-45, 2022.
Article in English | MEDLINE | ID: mdl-34890134

ABSTRACT

The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases not captured by single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases.
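For orientation, the Potts model mentioned in this abstract is commonly written in the following standard form. The notation is an assumption here, not taken from the abstract: $x = (x_1, \dots, x_L)$ is an aligned sequence of length $L$, $h_i$ are per-position fields, $J_{ij}$ are pairwise couplings, and $Z$ is the normalizing constant.

```latex
p(x) = \frac{1}{Z} \exp\!\Big( \sum_{i=1}^{L} h_i(x_i) + \sum_{1 \le i < j \le L} J_{ij}(x_i, x_j) \Big)
```

In this formulation, contacts are typically predicted by ranking position pairs $(i, j)$ by a norm of the fitted coupling matrix $J_{ij}$.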


Subject(s)
Computational Biology, Proteins, Attention, Humans, Proteins/genetics, Sequence Alignment
18.
Science ; 373(6557): 871-876, 2021 08 20.
Article in English | MEDLINE | ID: mdl-34282049

ABSTRACT

DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of Structure Prediction (CASP14) conference. We explored network architectures that incorporate related ideas and obtained the best performance with a three-track network in which information at the one-dimensional (1D) sequence level, the 2D distance map level, and the 3D coordinate level is successively transformed and integrated. The three-track network produces structure predictions with accuracies approaching those of DeepMind in CASP14, enables the rapid solution of challenging x-ray crystallography and cryo-electron microscopy structure modeling problems, and provides insights into the functions of proteins of currently unknown structure. The network also enables rapid generation of accurate protein-protein complex models from sequence information alone, short-circuiting traditional approaches that require modeling of individual subunits followed by docking. We make the method available to the scientific community to speed biological research.


Subject(s)
Deep Learning, Protein Conformation, Protein Folding, Proteins/chemistry, ADAM Proteins/chemistry, Amino Acid Sequence, Computer Simulation, Cryoelectron Microscopy, X-Ray Crystallography, Protein Databases, Membrane Proteins/chemistry, Molecular Models, Multiprotein Complexes/chemistry, Neural Networks (Computer), Protein Subunits/chemistry, Proteins/physiology, G-Protein-Coupled Receptors/chemistry, Sphingosine N-Acyltransferase/chemistry
19.
Proteins ; 89(12): 1722-1733, 2021 12.
Article in English | MEDLINE | ID: mdl-34331359

ABSTRACT

The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template-free and template utilizing versions of trRosetta guided by the DeepAccNet accuracy predictor. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta, and it is faster and requires less computing resources, completing the entire modeling process in a median < 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high-resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest in part due to missing inter-domain or inter-chain contacts.


Subject(s)
Computational Biology/methods, Deep Learning, Protein Tertiary Structure, Proteins, Software, Humans, Metagenome/genetics, Proteins/chemistry, Proteins/genetics, Proteins/metabolism, Protein Sequence Analysis
20.
Nat Commun ; 12(1): 1340, 2021 02 26.
Article in English | MEDLINE | ID: mdl-33637700

ABSTRACT

We develop a deep learning framework (DeepAccNet) that estimates per-residue accuracy and residue-residue distance signed error in protein models and uses these predictions to guide Rosetta protein structure refinement. The network uses 3D convolutions to evaluate local atomic environments followed by 2D convolutions to provide their global contexts and outperforms other methods that similarly predict the accuracy of protein structure models. Overall accuracy predictions for X-ray and cryoEM structures in the PDB correlate with their resolution, and the network should be broadly useful for assessing the accuracy of both predicted structure models and experimentally determined structures and identifying specific regions likely to be in error. Incorporation of the accuracy predictions at multiple stages in the Rosetta refinement protocol considerably increased the accuracy of the resulting protein structure models, illustrating how deep learning can improve search for global energy minima of biomolecules.


Subject(s)
Computational Biology/methods, Deep Learning, Proteins/chemistry, Algorithms, Caspases/chemistry, Biological Models, Molecular Models, Protein Conformation, Software