Results 1 - 14 of 14
1.
Proc Natl Acad Sci U S A ; 114(34): 9122-9127, 2017 Aug 22.
Article in English | MEDLINE | ID: mdl-28784799

ABSTRACT

Residue pairs that directly coevolve in protein families are generally close in protein 3D structures. Here we study the exceptions to this general trend (directly coevolving residue pairs that are distant in protein structures) to determine the origins of evolutionary pressure on spatially distant residues and to understand the sources of error in contact-based structure prediction. Over a set of 4,000 protein families, we find that 25% of directly coevolving residue pairs are separated by more than 5 Å in protein structures and 3% by more than 15 Å. The majority (91%) of directly coevolving residue pairs in the 5-15 Å range are found to be in contact in at least one homologous structure; these exceptions arise from structural variation in the family in the region containing the residues. Thirty-five percent of the exceptions greater than 15 Å are at homo-oligomeric interfaces, 19% arise from family structural variation, and 27% are in repeat proteins, likely reflecting alignment errors. Of the remaining long-range exceptions (<1% of the total number of coupled pairs), many can be attributed to close interactions in an oligomeric state. Overall, the results suggest that directly coevolving residue pairs not in repeat proteins are spatially proximal in at least one biologically relevant protein conformation within the family; we find little evidence for direct coupling between residues at spatially separated allosteric and functional sites or for increased direct coupling between residue pairs on putative allosteric pathways connecting them.


Subjects
Amino Acids/chemistry , Evolution, Molecular , Protein Conformation , Proteins/chemistry , Amino Acids/genetics , Amino Acids/metabolism , Binding Sites , Crystallography, X-Ray , Databases, Protein , Models, Molecular , Protein Binding , Protein Domains , Protein Multimerization , Proteins/genetics , Proteins/metabolism
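
To make the distance ranges quoted in the abstract above concrete, the following is a minimal sketch that bins a residue pair by its separation, using the 5 Å and 15 Å cutoffs from the abstract. The function name, the returned labels, and the use of one representative coordinate per residue are assumptions made for illustration; the abstract does not say how inter-residue distances were measured.

```python
import math

# Cutoffs in angstroms, taken from the abstract (5 Å and 15 Å).
CONTACT_CUTOFF = 5.0
LONG_RANGE_CUTOFF = 15.0


def distance_bin(coord_a, coord_b):
    """Bin a residue pair by Euclidean distance between two 3D points.

    Assumption: each residue is represented by a single (x, y, z) point.
    """
    d = math.dist(coord_a, coord_b)  # available in Python 3.8+
    if d > LONG_RANGE_CUTOFF:
        return ">15"   # long-range exception
    if d > CONTACT_CUTOFF:
        return "5-15"  # mid-range exception
    return "<=5"       # consistent with a contact


if __name__ == "__main__":
    print(distance_bin((0.0, 0.0, 0.0), (0.0, 0.0, 10.0)))
```

Pairs landing in the "5-15" or ">15" bins would be the ones to check against homologous structures or oligomeric states, in the spirit of the analysis the abstract describes.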
2.
J Biol Chem ; 290(29): 17796-17805, 2015 Jul 17.
Article in English | MEDLINE | ID: mdl-25971965

ABSTRACT

Members of the Zrt and Irt protein (ZIP) family are central participants in transition metal homeostasis, as they function to increase the cytosolic concentration of zinc and/or iron. However, the lack of a crystal structure hinders elucidation of the molecular mechanism of ZIP proteins. Here, we employed GREMLIN, a co-evolution-based contact prediction approach, in conjunction with the Rosetta structure prediction program to construct a structural model of the human (h) ZIP4 transporter. The predicted contact data are best fit by modeling hZIP4 as a dimer. Mutagenesis of residues that comprise a central putative hZIP4 transmembrane transition metal coordination site in the structural model alters the kinetics and specificity of hZIP4. Comparison of the hZIP4 dimer model to all known membrane protein structures identifies the 12-transmembrane monomeric Piriformospora indica phosphate transporter (PiPT), a member of the major facilitator superfamily (MFS), as a likely structural homolog.


Subjects
Cation Transport Proteins/chemistry , Cation Transport Proteins/metabolism , Zinc/metabolism , Animals , Cations, Divalent/metabolism , Cells, Cultured , Crystallography, X-Ray , Humans , Iron/metabolism , Models, Molecular , Protein Multimerization , Xenopus
3.
Proc Natl Acad Sci U S A ; 110(39): 15674-9, 2013 Sep 24.
Article in English | MEDLINE | ID: mdl-24009338

ABSTRACT

Recently developed methods have shown considerable promise in predicting residue-residue contacts in protein 3D structures using evolutionary covariance information. However, these methods require large numbers of evolutionarily related sequences to robustly assess the extent of residue covariation, and the larger the protein family, the more likely that contact information is unnecessary because a reasonable model can be built based on the structure of a homolog. Here we describe a method that integrates sequence coevolution and structural context information using a pseudolikelihood approach, allowing more accurate contact predictions from fewer homologous sequences. We rigorously assess the utility of predicted contacts for protein structure prediction using large and representative sequence and structure databases from recent structure prediction experiments. We find that contact predictions are likely to be accurate when the number of aligned sequences (with sequence redundancy reduced to 90%) is greater than five times the length of the protein, and that accurate predictions are likely to be useful for structure modeling if the aligned sequences are more similar to the protein of interest than to the closest homolog of known structure. These conditions are currently met by 422 of the protein families collected in the Pfam database.


Subjects
Amino Acids/chemistry , Computational Biology/methods , Evolution, Molecular , Proteins/chemistry , Proteins/genetics , Algorithms , Amino Acid Sequence , Databases, Protein , Models, Molecular
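
The sequence-depth rule of thumb stated in the abstract above (more aligned sequences than five times the protein length) can be written as a one-line check. This is only a sketch: the function and parameter names are hypothetical, and it assumes the sequence count has already been reduced to 90% redundancy by some upstream tool, which is not shown here.

```python
def has_enough_sequences(num_nonredundant_seqs, protein_length, factor=5.0):
    """Return True when alignment depth exceeds `factor` times the length.

    Assumptions: `num_nonredundant_seqs` is the number of aligned sequences
    after redundancy reduction (90% in the abstract); `factor` defaults to
    the value of five quoted there.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return num_nonredundant_seqs > factor * protein_length


if __name__ == "__main__":
    # A 200-residue protein needs more than 1000 sequences under this rule.
    print(has_enough_sequences(1500, 200))
```

The comparison is strict because the abstract says "greater than"; a family sitting exactly at five times the length does not pass.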
4.
BMC Genomics ; 13 Suppl 1: S5, 2012.
Article in English | MEDLINE | ID: mdl-22369071

ABSTRACT

We introduce three algorithms for learning generative models of molecular structures from molecular dynamics simulations. The first algorithm learns a Bayesian-optimal undirected probabilistic model over user-specified covariates (e.g., fluctuations, distances, angles). L1 regularization is used to ensure sparse models and thus reduce the risk of over-fitting the data. The topology of the resulting model reveals important couplings between different parts of the protein, thus aiding in the analysis of molecular motions. The generative nature of the model makes it well-suited to making predictions about the global effects of local structural changes (e.g., the binding of an allosteric regulator). Additionally, the model can be used to sample new conformations. The second algorithm learns a time-varying graphical model where the topology and parameters change smoothly along the trajectory, revealing the conformational sub-states. The last algorithm learns a Markov chain over undirected graphical models, which can be used to study and simulate kinetics. We demonstrate our algorithms on multiple molecular dynamics trajectories.


Subjects
Molecular Dynamics Simulation , Proteins/chemistry , Algorithms , Computer Simulation , Markov Chains , Protein Conformation
5.
Proteins ; 79(2): 444-62, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21120864

ABSTRACT

Protein-protein interactions are governed by the change in free energy upon binding, ΔG = ΔH - TΔS. These interactions are often marginally stable, so one must examine the balance between the change in enthalpy, ΔH, and the change in entropy, ΔS, when investigating known complexes, characterizing the effects of mutations, or designing optimized variants. To perform a large-scale study into the contribution of conformational entropy to binding free energy, we developed a technique called GOBLIN (Graphical mOdel for BiomoLecular INteractions) that performs physics-based free energy calculations for protein-protein complexes under both side-chain and backbone flexibility. GOBLIN uses a probabilistic graphical model that exploits conditional independencies in the Boltzmann distribution and employs variational inference techniques that approximate the free energy of binding in only a few minutes. We examined the role of conformational entropy on a benchmark set of more than 700 mutants in eight large, well-studied complexes. Our findings suggest that conformational entropy is important in protein-protein interactions: the root mean square error (RMSE) between calculated and experimentally measured ΔΔGs decreases by 12% when explicit entropic contributions are incorporated. GOBLIN models all atoms of the protein complex and detects changes to the binding entropy along the interface as well as at positions distal to the binding interface. Our results also suggest that a variational approach to entropy calculations may be quantitatively more accurate than the knowledge-based approaches used by the well-known programs FOLDX and Rosetta: GOBLIN's RMSEs are 10% and 36% lower than those of these programs, respectively.


Subjects
Proteins/chemistry , Algorithms , Amino Acids/chemistry , Animals , Computer Simulation , Entropy , Humans , Markov Chains , Models, Molecular , Molecular Dynamics Simulation , Mutation , Protein Binding , Protein Interaction Domains and Motifs , Protein Structure, Quaternary , Proteins/genetics , Software
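
The two quantities the abstract above relies on, the relation ΔG = ΔH - TΔS and the RMSE between calculated and measured ΔΔG values, can be sketched in a few lines. This is not GOBLIN's implementation; the function names and the units in the comments are assumptions chosen for illustration.

```python
import math


def binding_free_energy(delta_h, delta_s, temperature):
    """Compute ΔG = ΔH - T·ΔS.

    Assumed units: ΔH in kcal/mol, ΔS in kcal/(mol·K), T in kelvin.
    """
    return delta_h - temperature * delta_s


def rmse(calculated, measured):
    """Root mean square error between two equal-length sequences of values."""
    if len(calculated) != len(measured) or len(calculated) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(c - m) ** 2 for c, m in zip(calculated, measured)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper.
    print(binding_free_energy(-10.0, -0.02, 300.0))
    print(rmse([1.0, 2.0], [1.0, 4.0]))
```

An unfavorable entropy change (negative ΔS) makes ΔG less negative than ΔH alone, which is why leaving the entropic term out can skew predicted ΔΔG values.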
6.
Proteins ; 79(4): 1061-78, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21268112

ABSTRACT

We introduce a new approach to learning statistical models from multiple sequence alignments (MSA) of proteins. Our method, called GREMLIN (Generative REgularized ModeLs of proteINs), learns an undirected probabilistic graphical model of the amino acid composition within the MSA. The resulting model encodes both the position-specific conservation statistics and the correlated mutation statistics between sequential and long-range pairs of residues. Existing techniques for learning graphical models from MSA either make strong, and often inappropriate assumptions about the conditional independencies within the MSA (e.g., Hidden Markov Models), or else use suboptimal algorithms to learn the parameters of the model. In contrast, GREMLIN makes no a priori assumptions about the conditional independencies within the MSA. We formulate and solve a convex optimization problem, thus guaranteeing that we find a globally optimal model at convergence. The resulting model is also generative, allowing for the design of new protein sequences that have the same statistical properties as those in the MSA. We perform a detailed analysis of covariation statistics on the extensively studied WW and PDZ domains and show that our method out-performs an existing algorithm for learning undirected probabilistic graphical models from MSA. We then apply our approach to 71 additional families from the PFAM database and demonstrate that the resulting models significantly out-perform Hidden Markov Models in terms of predictive accuracy.


Subjects
Models, Chemical , Protein Folding , Proteins/chemistry , Sequence Alignment/methods , Amino Acid Sequence , Area Under Curve , Computational Biology , Computer Graphics , Computer Simulation , Markov Chains , Models, Molecular , Models, Statistical , PDZ Domains , Sequence Analysis, Protein , Structure-Activity Relationship
7.
Science ; 355(6322): 294-298, 2017 Jan 20.
Article in English | MEDLINE | ID: mdl-28104891

ABSTRACT

Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost.


Subjects
Computational Biology/methods , Metagenome , Proteins/chemistry , Algorithms , Amino Acid Sequence , Crystallography, X-Ray , Databases, Protein , Evolution, Molecular , Models, Molecular , Protein Conformation , Protein Folding , Proteins/genetics , Sequence Analysis, Protein , Software
8.
J Comput Biol ; 22(6): 474-86, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25973864

ABSTRACT

In studying the strength and specificity of interaction between members of two protein families, key questions center on which pairs of possible partners actually interact, how well they interact, and why they interact while others do not. The advent of large-scale experimental studies of interactions between members of a target family and a diverse set of possible interaction partners offers the opportunity to address these questions. We develop here a method, DgSpi (data-driven graphical models of specificity in protein:protein interactions), for learning and using graphical models that explicitly represent the amino acid basis for interaction specificity (why) and extend earlier classification-oriented approaches (which) to predict the ΔG of binding (how well). We demonstrate the effectiveness of our approach in analyzing and predicting interactions between a set of 82 PDZ recognition modules against a panel of 217 possible peptide partners, based on data from MacBeath and colleagues. Our predicted ΔG values are highly predictive of the experimentally measured ones, reaching correlation coefficients of 0.69 in 10-fold cross-validation and 0.63 in leave-one-PDZ-out cross-validation. Furthermore, the model serves as a compact representation of amino acid constraints underlying the interactions, enabling protein-level ΔG predictions to be naturally understood in terms of residue-level constraints. Finally, the model DgSpi readily enables the design of new interacting partners, and we demonstrate that designed ligands are novel and diverse.


Subjects
Protein Binding/genetics , Proteins/genetics , Amino Acid Sequence , Amino Acids/genetics , Ligands , Models, Molecular , Sensitivity and Specificity
9.
Elife ; 4: e09248, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-26335199

ABSTRACT

The prediction of the structures of proteins without detectable sequence similarity to any protein of known structure remains an outstanding scientific challenge. Here we report significant progress in this area. We first describe de novo blind structure predictions of unprecedented accuracy we made for two proteins in large families in the recent CASP11 blind test of protein structure prediction methods by incorporating residue-residue co-evolution information in the Rosetta structure prediction program. We then describe the use of this method to generate structure models for 58 of the 121 large protein families in prokaryotes for which three-dimensional structures are not available. These models, which are posted online for public access, provide structural information for the over 400,000 proteins belonging to the 58 families and suggest hypotheses about mechanism for the subset for which the function is known, and hypotheses about function for the remainder.


Subjects
Bacterial Proteins/chemistry , Computational Biology/methods , Evolution, Molecular , Bacterial Proteins/genetics , Models, Molecular , Protein Conformation
10.
Elife ; 3: e02030, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24842992

ABSTRACT

Do the amino acid sequence identities of residues that make contact across protein interfaces covary during evolution? If so, such covariance could be used to predict contacts across interfaces and assemble models of biological complexes. We find that residue pairs identified using a pseudo-likelihood-based method to covary across protein-protein interfaces in the 50S ribosomal unit and 28 additional bacterial protein complexes with known structure are almost always in contact in the complex, provided that the number of aligned sequences is greater than the average length of the two proteins. We use this method to make subunit contact predictions for an additional 36 protein complexes with unknown structures, and present models based on these predictions for the tripartite ATP-independent periplasmic (TRAP) transporter, the tripartite efflux system, the pyruvate formate lyase-activating enzyme complex, and the methionine ABC transporter. DOI: http://dx.doi.org/10.7554/eLife.02030.001.


Subjects
Biological Evolution , Proteins/metabolism , Models, Molecular , Protein Binding , Protein Conformation , Proteins/chemistry
11.
Res Comput Mol Biol ; 8394: 129-143, 2014.
Article in English | MEDLINE | ID: mdl-25414914

ABSTRACT

In studying the strength and specificity of interaction between members of two protein families, key questions center on which pairs of possible partners actually interact, how well they interact, and why they interact while others do not. The advent of large-scale experimental studies of interactions between members of a target family and a diverse set of possible interaction partners offers the opportunity to address these questions. We develop here a method, DgSpi (Data-driven Graphical models of Specificity in Protein:protein Interactions), for learning and using graphical models that explicitly represent the amino acid basis for interaction specificity (why) and extend earlier classification-oriented approaches (which) to predict the ΔG of binding (how well). We demonstrate the effectiveness of our approach in analyzing and predicting interactions between a set of 82 PDZ recognition modules, against a panel of 217 possible peptide partners, based on data from MacBeath and colleagues. Our predicted ΔG values are highly predictive of the experimentally measured ones, reaching correlation coefficients of 0.69 in 10-fold cross-validation and 0.63 in leave-one-PDZ-out cross-validation. Furthermore, the model serves as a compact representation of amino acid constraints underlying the interactions, enabling protein-level ΔG predictions to be naturally understood in terms of residue-level constraints. Finally, as a generative model, DgSpi readily enables the design of new interacting partners, and we demonstrate that designed ligands are novel and diverse.

12.
Nat Biotechnol ; 30(6): 543-8, 2012 May 27.
Article in English | MEDLINE | ID: mdl-22634563

ABSTRACT

We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.


Subjects
Antiviral Agents/chemistry , Antiviral Agents/pharmacology , Drug Discovery/methods , Hemagglutinin Glycoproteins, Influenza Virus/chemistry , Hemagglutinin Glycoproteins, Influenza Virus/metabolism , Influenza A Virus, H1N1 Subtype/drug effects , Animals , Cell Survival/drug effects , Computational Biology , Dogs , High-Throughput Nucleotide Sequencing , Influenza A Virus, H1N1 Subtype/metabolism , Madin Darby Canine Kidney Cells , Models, Molecular , Neutralization Tests , Protein Binding , Static Electricity , Thermodynamics
13.
J Comput Biol ; 15(7): 755-66, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18662103

ABSTRACT

We present a technique for approximating the free energy of protein structures using generalized belief propagation (GBP). The accuracy and utility of these estimates are then demonstrated in two different application domains. First, we show that the entropy component of our free energy estimates can be useful in distinguishing native protein structures from decoys, that is, structures with similar internal energy to that of the native structure, but otherwise incorrect. Our method is able to correctly identify the native fold from among a set of decoys with 87.5% accuracy over a total of 48 different immunoglobulin folds. The remaining 12.5% of native structures are ranked among the top four of all structures. Second, we show that our estimates of ΔΔG upon mutation for three different data sets have linear correlations of 0.63-0.70 with experimental measurements and statistically significant p-values. Together, these results suggest that GBP is an effective means for computing free energy in all-atom models of protein structures. GBP is also efficient, taking a few minutes to run on a typically sized protein, further suggesting that GBP may be an attractive alternative to more costly molecular dynamics simulations for some tasks.


Subjects
Algorithms , Models, Theoretical , Protein Conformation , Proteins/chemistry , Computer Simulation , Markov Chains , Models, Molecular , Protein Folding , Thermodynamics
14.
Bioinformatics ; 22(2): 172-80, 2006 Jan 15.
Article in English | MEDLINE | ID: mdl-16287932

ABSTRACT

MOTIVATION: Backbone resonance assignment is a critical bottleneck in studies of protein structure, dynamics and interactions by nuclear magnetic resonance (NMR) spectroscopy. A minimalist approach to assignment, which we call 'contact-based', seeks to dramatically reduce experimental time and expense by replacing the standard suite of through-bond experiments with the through-space (nuclear Overhauser enhancement spectroscopy, NOESY) experiment. In the contact-based approach, spectral data are represented in a graph with vertices for putative residues (of unknown relation to the primary sequence) and edges for hypothesized NOESY interactions, such that observed spectral peaks could be explained if the residues were 'close enough'. Due to experimental ambiguity, several incorrect edges can be hypothesized for each spectral peak. An assignment is derived by identifying consistent patterns of edges (e.g. for alpha-helices and beta-sheets) within a graph and by mapping the vertices to the primary sequence. The key algorithmic challenge is to be able to uncover these patterns even when they are obscured by significant noise. RESULTS: This paper develops, analyzes and applies a novel algorithm for the identification of polytopes representing consistent patterns of edges in a corrupted NOESY graph. Our randomized algorithm aggregates simplices into polytopes and fixes inconsistencies with simple local modifications, called rotations, that maintain most of the structure already uncovered. In characterizing the effects of experimental noise, we employ an NMR-specific random graph model in proving that our algorithm gives optimal performance in expected polynomial time, even when the input graph is significantly corrupted. We confirm this analysis in simulation studies with graphs corrupted by up to 500% noise. Finally, we demonstrate the practical application of the algorithm on several experimental beta-sheet datasets. Our approach is able to eliminate a large majority of noise edges and to uncover large consistent sets of interactions. AVAILABILITY: Our algorithm has been implemented in platform-independent Python code. The software can be freely obtained for academic use by request from the authors.


Subjects
Algorithms , Magnetic Resonance Spectroscopy/methods , Pattern Recognition, Automated/methods , Proteins/analysis , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acid Sequence , Binding Sites , Models, Chemical , Models, Molecular , Models, Statistical , Molecular Sequence Data , Protein Binding , Protein Structure, Secondary , Proteins/classification