Results 1 - 20 of 39
1.
Res Sq ; 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38746169

ABSTRACT

The majority of proteins must form higher-order assemblies to perform their biological functions. Despite the importance of protein quaternary structure, there are few machine learning models that can accurately and rapidly predict the symmetry of assemblies involving multiple copies of the same protein chain. Here, we address this gap by training several classes of protein foundation models, including ESM-MSA, ESM2, and RoseTTAFold2, to predict homo-oligomer symmetry. Our best model, Seq2Symm, which uses ESM2, outperforms existing template-based and deep learning methods. It achieves an average PR-AUC of 0.48 and 0.44 across homo-oligomer symmetries on two different held-out test sets, compared to 0.32 and 0.23 for the template-based method. Because Seq2Symm can rapidly predict homo-oligomer symmetries using a single sequence as input (~80,000 proteins/hour), we have applied it to 5 entire proteomes and ~3.5 million unlabeled protein sequences to identify patterns in protein assembly complexity across biological kingdoms and species.
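
The "average PR-AUC across homo-oligomer symmetries" reported above can be illustrated with a minimal sketch. It assumes average precision as the PR-AUC estimator and an unweighted mean over symmetry classes; the abstract does not state the exact procedure, and the function names and the class labels (`C2`, `C3`) are hypothetical.

```python
def average_precision(labels, scores):
    """Average precision for one class: the mean of precision values
    taken at the rank of each true positive (one common PR-AUC estimator)."""
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")  # undefined when the class has no positives
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def macro_pr_auc(per_class):
    """Unweighted mean of per-class average precision.

    per_class maps a (hypothetical) symmetry label to (labels, scores)."""
    values = [average_precision(y, s) for y, s in per_class.values()]
    values = [v for v in values if v == v]  # drop NaN (classes with no positives)
    return sum(values) / len(values)


# Toy data, not taken from the paper.
toy = {
    "C2": ([1, 0], [0.9, 0.1]),
    "C3": ([0, 1], [0.9, 0.1]),
}
macro = macro_pr_auc(toy)
```

A macro average weights rare symmetry classes as heavily as common ones, which is one reason PR-AUC values on imbalanced multi-class problems can look low in absolute terms.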

2.
bioRxiv ; 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38645026

ABSTRACT

Identification of bacterial protein-protein interactions and predicting the structures of the complexes could aid in the understanding of pathogenicity mechanisms and developing treatments for infectious diseases. Here, we developed a deep learning-based pipeline that leverages residue-residue coevolution and protein structure prediction to systematically identify and structurally characterize protein-protein interactions at the proteome-wide scale. Using this pipeline, we searched through 78 million pairs of proteins across 19 human bacterial pathogens and identified 1923 confidently predicted complexes involving essential genes and 256 involving virulence factors. Many of these complexes were not previously known; we experimentally tested 12 such predictions, and half of them were validated. The predicted interactions span core metabolic and virulence pathways ranging from post-transcriptional modification to acid neutralization to outer membrane machinery and should contribute to our understanding of the biology of these important pathogens and the design of drugs to combat them.

3.
J Chem Theory Comput ; 20(7): 2689-2695, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38547871

ABSTRACT

Mapping the ensemble of protein conformations that contribute to function and can be targeted by small molecule drugs remains an outstanding challenge. Here, we explore the use of variational autoencoders for reducing the challenge of dimensionality in the protein structure ensemble generation problem. We convert high-dimensional protein structural data into a continuous, low-dimensional representation, carry out a search in this space guided by a structure quality metric, and then use RoseTTAFold guided by the sampled structural information to generate 3D structures. We use this approach to generate ensembles for the cancer relevant protein K-Ras, train the VAE on a subset of the available K-Ras crystal structures and MD simulation snapshots, and assess the extent of sampling close to crystal structures withheld from training. We find that our latent space sampling procedure rapidly generates ensembles with high structural quality and is able to sample within 1 Å of held-out crystal structures, with a consistency higher than that of MD simulation or AlphaFold2 prediction. The sampled structures sufficiently recapitulate the cryptic pockets in the held-out K-Ras structures to allow for small molecule docking.


Subject(s)
Proteins, Proteins/chemistry, Protein Conformation, Computer Simulation
4.
Science ; 384(6693): eadl2528, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38452047

ABSTRACT

Deep-learning methods have revolutionized protein structure prediction and design but are presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which combines a residue-based representation of amino acids and DNA bases with an atomic representation of all other groups to model assemblies that contain proteins, nucleic acids, small molecules, metals, and covalent modifications, given their sequences and chemical structures. By fine-tuning on denoising tasks, we developed RFdiffusion All-Atom (RFdiffusionAA), which builds protein structures around small molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we designed and experimentally validated, through crystallography and binding measurements, proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and the light-harvesting molecule bilin.


Subject(s)
Amino Acids, Proteins, Proteins/chemistry, DNA/chemistry, Crystallography
5.
Nat Methods ; 21(1): 117-121, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37996753

ABSTRACT

Protein-RNA and protein-DNA complexes play critical roles in biology. Despite considerable recent advances in protein structure prediction, the prediction of the structures of protein-nucleic acid complexes without homology to known complexes is a largely unsolved problem. Here we extend the RoseTTAFold machine learning protein-structure-prediction approach to additionally predict nucleic acid and protein-nucleic acid complexes. We develop a single trained network, RoseTTAFoldNA, that rapidly produces three-dimensional structure models with confidence estimates for protein-DNA and protein-RNA complexes. Here we show that confident predictions have considerably higher accuracy than current state-of-the-art methods. RoseTTAFoldNA should be broadly useful for modeling the structure of naturally occurring protein-nucleic acid complexes, and for designing sequence-specific RNA and DNA-binding proteins.


Subject(s)
Nucleic Acids, RNA/chemistry, DNA-Binding Proteins/chemistry, DNA/chemistry
6.
bioRxiv ; 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37986761

ABSTRACT

Proteomics has been revolutionized by large pre-trained protein language models, which learn unsupervised representations from large corpora of sequences. The parameters of these models are then fine-tuned in a supervised setting to tailor the model to a specific downstream task. However, as model size increases, the computational and memory footprint of fine-tuning becomes a barrier for many research groups. In the field of natural language processing, which has seen a similar explosion in the size of models, these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we newly bring parameter-efficient fine-tuning methods to proteomics. Using the parameter-efficient method LoRA, we train new models for two important proteomic tasks: predicting protein-protein interactions (PPI) and predicting the symmetry of homooligomers. We show that for homooligomer symmetry prediction, these approaches achieve performance competitive with traditional fine-tuning while requiring reduced memory and using three orders of magnitude fewer parameters. On the PPI prediction task, we surprisingly find that PEFT models actually outperform traditional fine-tuning while using two orders of magnitude fewer parameters. Here, we go even further to show that freezing the parameters of the language model and training only a classification head also outperforms fine-tuning, using five orders of magnitude fewer parameters, and that both of these models outperform state-of-the-art PPI prediction methods with substantially reduced compute. We also demonstrate that PEFT is robust to variations in training hyper-parameters, and elucidate where best practices for PEFT in proteomics differ from in natural language processing. Thus, we provide a blueprint to democratize the power of protein language model tuning to groups which have limited computational resources.
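
The parameter savings described above follow from how LoRA is constructed: the pre-trained weight matrix is frozen and only a low-rank update (the product of two small matrices) is trained. The sketch below counts trainable parameters for a single weight matrix; the dimensions and rank are illustrative assumptions, not values reported in the abstract, and `lora_param_counts` is a hypothetical helper.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for one weight matrix.

    Full fine-tuning updates every entry of the d_out x d_in matrix W.
    LoRA freezes W and trains B (d_out x rank) and A (rank x d_in),
    so the effective weight is W + B @ A.
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# Illustrative sizes only (assumed): a square 1280 x 1280 projection, rank 8.
full, lora = lora_param_counts(d_in=1280, d_out=1280, rank=8)
reduction = full / lora  # how many times fewer trainable parameters
```

The per-matrix reduction here is about 80-fold. Larger overall reductions, such as the orders of magnitude cited in the abstract, would additionally depend on how many of the model's matrices receive adapters, which this sketch does not model.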

8.
Protein Sci ; 32(11): e4780, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37695922

ABSTRACT

Predicting the effects of mutations on protein function and stability is an outstanding challenge. Here, we assess the performance of a variant of RoseTTAFold jointly trained for sequence and structure recovery, RFjoint, for mutation effect prediction. Without any further training, we achieve comparable accuracy in predicting mutation effects for a diverse set of protein families using RFjoint to both another zero-shot model (MSA Transformer) and a model that requires specific training on a particular protein family for mutation effect prediction (DeepSequence). Thus, although the architecture of RFjoint was developed to address the protein design problem of scaffolding functional motifs, RFjoint acquired an understanding of the mutational landscapes of proteins during model training that is equivalent to that of recently developed large protein language models. The ability to simultaneously reason over protein structure and sequence could enable even more precise mutation effect predictions following supervised training on the task. These results suggest that RFjoint has a quite broad understanding of protein sequence-structure landscapes, and can be viewed as a joint model for protein sequence and structure which could be broadly useful for protein modeling.


Subject(s)
Proteins, Proteins/genetics, Proteins/chemistry, Mutation, Amino Acid Sequence, Protein Stability
9.
Nature ; 620(7976): 1089-1100, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37433327

ABSTRACT

There has been considerable recent progress in designing new proteins using deep-learning methods [1-9]. Despite this progress, a general deep-learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher-order symmetric architectures, has yet to be described. Diffusion models [10,11] have had considerable success in image and language generative modelling but limited success when applied to protein modelling, probably due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine-tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of designed symmetric assemblies, metal-binding proteins and protein binders. The accuracy of RFdiffusion is confirmed by the cryogenic electron microscopy structure of a designed binder in complex with influenza haemagglutinin that is nearly identical to the design model. In a manner analogous to networks that produce images from user-specified inputs, RFdiffusion enables the design of diverse functional proteins from simple molecular specifications.


Subject(s)
Deep Learning, Proteins, Catalytic Domain, Cryoelectron Microscopy, Influenza Virus Hemagglutinin Glycoproteins/chemistry, Influenza Virus Hemagglutinin Glycoproteins/metabolism, Influenza Virus Hemagglutinin Glycoproteins/ultrastructure, Protein Binding, Proteins/chemistry, Proteins/metabolism, Proteins/ultrastructure
10.
Nat Commun ; 14(1): 2625, 2023 May 06.
Article in English | MEDLINE | ID: mdl-37149653

ABSTRACT

Recently it has become possible to de novo design high-affinity protein-binding proteins from target structural information alone. There is, however, considerable room for improvement as the overall design success rate is low. Here, we explore the augmentation of energy-based protein binder design using deep learning. We find that using AlphaFold2 or RoseTTAFold to assess the probability that a designed sequence adopts the designed monomer structure, and the probability that this structure binds the target as designed, increases design success rates nearly 10-fold. We find further that sequence design using ProteinMPNN rather than Rosetta considerably increases computational efficiency.


Subject(s)
Deep Learning, Protein Engineering, Proteins/metabolism, Protein Binding
11.
Science ; 380(6642): 266-273, 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-37079676

ABSTRACT

As a result of evolutionary selection, the subunits of naturally occurring protein assemblies often fit together with substantial shape complementarity to generate architectures optimal for function in a manner not achievable by current design approaches. We describe a "top-down" reinforcement learning-based design approach that solves this problem using Monte Carlo tree search to sample protein conformers in the context of an overall architecture and specified functional constraints. Cryo-electron microscopy structures of the designed disk-shaped nanopores and ultracompact icosahedra are very close to the computational models. The icosahedra enable very-high-density display of immunogens and signaling molecules, which potentiates vaccine response and angiogenesis induction. Our approach enables the top-down design of complex protein nanomaterials with desired system properties and demonstrates the power of reinforcement learning in protein design.


Subject(s)
Machine Learning, Nanostructures, Protein Engineering, Proteins, Cryoelectron Microscopy, Proteins/chemistry
12.
J Biol Chem ; 299(6): 104744, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37100290

ABSTRACT

The outer membrane (OM) of Gram-negative bacteria is an asymmetric bilayer that protects the cell from external stressors, such as antibiotics. The Mla transport system is implicated in the Maintenance of OM Lipid Asymmetry by mediating retrograde phospholipid transport across the cell envelope. Mla uses a shuttle-like mechanism to move lipids between the MlaFEDB inner membrane complex and the MlaA-OmpF/C OM complex, via a periplasmic lipid-binding protein, MlaC. MlaC binds to MlaD and MlaA, but the underlying protein-protein interactions that facilitate lipid transfer are not well understood. Here, we take an unbiased deep mutational scanning approach to map the fitness landscape of MlaC from Escherichia coli, which provides insights into important functional sites. Combining this analysis with AlphaFold2 structure predictions and binding experiments, we map the MlaC-MlaA and MlaC-MlaD protein-protein interfaces. Our results suggest that the MlaD and MlaA binding surfaces on MlaC overlap to a large extent, leading to a model in which MlaC can only bind one of these proteins at a time. Low-resolution cryo-electron microscopy (cryo-EM) maps of MlaC bound to MlaFEDB suggest that at least two MlaC molecules can bind to MlaD at once, in a conformation consistent with AlphaFold2 predictions. These data lead us to a model for MlaC interaction with its binding partners and insights into lipid transfer steps that underlie phospholipid transport between the bacterial inner and OMs.


Subject(s)
Escherichia coli Proteins, Lipid Metabolism, Membrane Transport Proteins, Bacterial Outer Membrane Proteins/genetics, Bacterial Outer Membrane Proteins/metabolism, Biological Transport, Cell Membrane/metabolism, Cryoelectron Microscopy, Escherichia coli/metabolism, Escherichia coli Proteins/chemistry, Membrane Lipids/metabolism, Phospholipids/metabolism, Membrane Transport Proteins/chemistry, Membrane Transport Proteins/metabolism
13.
Proc Natl Acad Sci U S A ; 120(9): e2216697120, 2023 Feb 28.
Article in English | MEDLINE | ID: mdl-36802421

ABSTRACT

Peptide-binding proteins play key roles in biology, and predicting their binding specificity is a long-standing challenge. While considerable protein structural information is available, the most successful current methods use sequence information alone, in part because it has been a challenge to model the subtle structural changes accompanying sequence substitutions. Protein structure prediction networks such as AlphaFold model sequence-structure relationships very accurately, and we reasoned that if it were possible to specifically train such networks on binding data, more generalizable models could be created. We show that placing a classifier on top of the AlphaFold network and fine-tuning the combined network parameters for both classification and structure prediction accuracy leads to a model with strong generalizable performance on a wide range of Class I and Class II peptide-MHC interactions that approaches the overall performance of the state-of-the-art NetMHCpan sequence-based method. The peptide-MHC optimized model shows excellent performance in distinguishing binding and non-binding peptides to SH3 and PDZ domains. This ability to generalize well beyond the training set far exceeds that of sequence-only models and should be particularly powerful for systems where less experimental data are available.


Subject(s)
Histocompatibility Antigens Class II, Peptides, Protein Binding, Peptides/chemistry, Histocompatibility Antigens Class II/metabolism, MHC Class II Genes, PDZ Domains
14.
Nat Commun ; 14(1): 927, 2023 Feb 18.
Article in English | MEDLINE | ID: mdl-36807264

ABSTRACT

To protect themselves from host attack, numerous jumbo bacteriophages establish a phage nucleus, a micron-scale, proteinaceous structure encompassing the replicating phage DNA. Bacteriophage and host proteins associated with replication and transcription are concentrated inside the phage nucleus while other phage and host proteins are excluded, including CRISPR-Cas and restriction endonuclease host defense systems. Here, we show that nucleus fragments isolated from ϕPA3-infected Pseudomonas aeruginosa form a 2-dimensional lattice, having p2 or p4 symmetry. We further demonstrate that recombinantly purified primary Phage Nuclear Enclosure (PhuN) protein spontaneously assembles into similar 2D sheets with p2 and p4 symmetry. We resolve the dominant p2 symmetric state to 3.9 Å by cryo-EM. Our structure reveals a two-domain core, organized into quasi-symmetric tetramers. Flexible loops and termini mediate adaptable inter-tetramer contacts that drive subunit assembly into a lattice and enable the adoption of different symmetric states. While the interfaces between subunits are mostly well packed, two are open, forming channels that likely have functional implications for the transport of proteins, mRNA, and small molecules.


Subject(s)
Bacteriophages, Bacteriophages/genetics, Viral Proteins/metabolism, CRISPR-Cas Systems
15.
Science ; 377(6604): 387-394, 2022 Jul 22.
Article in English | MEDLINE | ID: mdl-35862514

ABSTRACT

The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, "constrained hallucination," optimizes sequences such that their predicted structures contain the desired functional site. The second approach, "inpainting," starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests.


Subject(s)
Deep Learning, Protein Engineering, Proteins, Binding Sites, Catalysis, Protein Binding, Protein Engineering/methods, Protein Folding, Protein Secondary Structure, Proteins/chemistry
17.
Science ; 374(6573): eabm4805, 2021 Dec 10.
Article in English | MEDLINE | ID: mdl-34762488

ABSTRACT

Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take advantage of advances in proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling to systematically identify and build accurate models of core eukaryotic protein complexes within the Saccharomyces cerevisiae proteome. We use a combination of RoseTTAFold and AlphaFold to screen through paired multiple sequence alignments for 8.3 million pairs of yeast proteins, identify 1505 likely to interact, and build structure models for 106 previously unidentified assemblies and 806 that have not been structurally characterized. These complexes, which have as many as five subunits, play roles in almost all key processes in eukaryotic cells and provide broad insights into biological function.


Subject(s)
Deep Learning, Multiprotein Complexes/chemistry, Multiprotein Complexes/metabolism, Protein Interaction Mapping, Proteome/chemistry, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/metabolism, Acyltransferases/chemistry, Acyltransferases/metabolism, Chromosome Segregation, Computational Biology, Computer Simulation, DNA Repair, Molecular Evolution, Homologous Recombination, Ligases/chemistry, Ligases/metabolism, Membrane Proteins/chemistry, Membrane Proteins/metabolism, Molecular Models, Protein Biosynthesis, Protein Conformation, Protein Interaction Maps, Proteome/metabolism, Ribosomes/metabolism, Saccharomyces cerevisiae/chemistry, Ubiquitin/chemistry, Ubiquitin/metabolism
18.
Sci Adv ; 7(35), 2021 Aug.
Article in English | MEDLINE | ID: mdl-34452907

ABSTRACT

The class IB phosphoinositide 3-kinase (PI3K), PI3Kγ, is a master regulator of immune cell function and a promising drug target for both cancer and inflammatory diseases. Critical to PI3Kγ function is the association of the p110γ catalytic subunit to either a p101 or p84 regulatory subunit, which mediates activation by G protein-coupled receptors. Here, we report the cryo-electron microscopy structure of a heterodimeric PI3Kγ complex, p110γ-p101. This structure reveals a unique assembly of catalytic and regulatory subunits that is distinct from other class I PI3K complexes. p101 mediates activation through its Gβγ-binding domain, recruiting the heterodimer to the membrane and allowing for engagement of a secondary Gβγ-binding site in p110γ. Mutations at the p110γ-p101 and p110γ-adaptor binding domain interfaces enhanced Gβγ activation. A nanobody that specifically binds to the p101-Gβγ interface blocks activation, providing a novel tool to study and target p110γ-p101-specific signaling events in vivo.

19.
Proteins ; 89(12): 1824-1833, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34324224

ABSTRACT

For CASP14, we developed deep learning-based methods for predicting homo-oligomeric and hetero-oligomeric contacts and used them for oligomer modeling. To build structure models, we developed an oligomer structure generation method that utilizes predicted interchain contacts to guide iterative restrained minimization from random backbone structures. We supplemented this gradient-based fold-and-dock method with template-based and ab initio docking approaches using deep learning-based subunit predictions on 29 assembly targets. These methods produced oligomer models with summed Z-scores 5.5 units higher than the next best group, with the fold-and-dock method having the best relative performance. Over the eight targets for which this method was used, the best of the five submitted models had average oligomer TM-score of 0.71 (average oligomer TM-score of the next best group: 0.64), and explicit modeling of inter-subunit interactions improved modeling of six out of 40 individual domains (ΔGDT-TS > 2.0).


Subject(s)
Molecular Models, Protein Conformation, Proteins, Software, Computational Biology, Protein Databases, Deep Learning, Protein Binding, Protein Subunits/chemistry, Protein Subunits/metabolism, Proteins/chemistry, Proteins/metabolism, Protein Sequence Analysis
20.
Proteins ; 89(12): 1722-1733, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34331359

ABSTRACT

The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template-free and template utilizing versions of trRosetta guided by the DeepAccNet accuracy predictor. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta, and it is faster and requires less computing resources, completing the entire modeling process in a median < 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high-resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest in part due to missing inter-domain or inter-chain contacts.


Subject(s)
Computational Biology/methods, Deep Learning, Protein Tertiary Structure, Proteins, Software, Humans, Metagenome/genetics, Proteins/chemistry, Proteins/genetics, Proteins/metabolism, Protein Sequence Analysis