Search | Portal Regional da BVS

1.

Deep generative modeling of the human proteome reveals over a hundred novel genes involved in rare genetic disorders.

Orenbuch, Rose; Kollasch, Aaron W; Spinner, Hansen D; Shearer, Courtney A; Hopf, Thomas A; Franceschi, Dinko; Dias, Mafalda; Frazer, Jonathan; Marks, Debora S.

Res Sq ; 2024 Jan 04.

Article in English | MEDLINE | ID: mdl-38260496

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome [1-6]. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data [7] and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders [8] from potentially healthy individuals [9]. popEVE identifies 442 genes in patients in this developmental disorder cohort, including evidence of 123 novel genetic disorders, many without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. A majority of these variants are close to interacting partners in 3D complexes. Preliminary analyses on child exomes indicate that popEVE can identify candidate variants without the need for inheritance labels. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable.

2.

Deep generative modeling of the human proteome reveals over a hundred novel genes involved in rare genetic disorders.

Orenbuch, Rose; Kollasch, Aaron W; Spinner, Hansen D; Shearer, Courtney A; Hopf, Thomas A; Franceschi, Dinko; Dias, Mafalda; Frazer, Jonathan; Marks, Debora S.

medRxiv ; 2023 Nov 28.

Article in English | MEDLINE | ID: mdl-38076790

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders from potentially healthy individuals. popEVE identifies 442 genes in a cohort of developmental disorder cases, including evidence of 119 novel genetic disorders without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable. Interactive web viewer and downloads available at pop.evemodel.org.

3.

The EVcouplings Python framework for coevolutionary sequence analysis.

Hopf, Thomas A; Green, Anna G; Schubert, Benjamin; Mersmann, Sophia; Schärfe, Charlotta P I; Ingraham, John B; Toth-Petroczy, Agnes; Brock, Kelly; Riesselman, Adam J; Palmedo, Perry; Kang, Chan; Sheridan, Robert; Draizen, Eli J; Dallago, Christian; Sander, Chris; Marks, Debora S.

Bioinformatics ; 35(9): 1582-1584, 2019 05 01.

Article in English | MEDLINE | ID: mdl-30304492

ABSTRACT

SUMMARY: Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments, calculation and evaluation of evolutionary couplings (ECs), and de novo prediction of structure and mutation effects. The combination of an easy to use, flexible command line interface and an underlying modular Python package makes the full power of coevolutionary analyses available to entry-level and advanced users. AVAILABILITY AND IMPLEMENTATION: https://github.com/debbiemarkslab/evcouplings.

Subjects

Sequence Analysis, Software, Proteins, RNA, Sequence Alignment

4.

A Systematic p53 Mutation Library Links Differential Functional Impact to Cancer Mutation Pattern and Evolutionary Conservation.

Kotler, Eran; Shani, Odem; Goldfeld, Guy; Lotan-Pompan, Maya; Tarcic, Ohad; Gershoni, Anat; Hopf, Thomas A; Marks, Debora S; Oren, Moshe; Segal, Eran.

Mol Cell ; 71(5): 873, 2018 09 06.

Article in English | MEDLINE | ID: mdl-30193102

5.

A Systematic p53 Mutation Library Links Differential Functional Impact to Cancer Mutation Pattern and Evolutionary Conservation.

Kotler, Eran; Shani, Odem; Goldfeld, Guy; Lotan-Pompan, Maya; Tarcic, Ohad; Gershoni, Anat; Hopf, Thomas A; Marks, Debora S; Oren, Moshe; Segal, Eran.

Mol Cell ; 71(1): 178-190.e8, 2018 07 05.

Article in English | MEDLINE | ID: mdl-29979965

ABSTRACT

The TP53 gene is frequently mutated in human cancer. Research has focused predominantly on six major "hotspot" codons, which account for only ~30% of cancer-associated p53 mutations. To comprehensively characterize the consequences of the p53 mutation spectrum, we created a synthetically designed library and measured the functional impact of ~10,000 DNA-binding domain (DBD) p53 variants in human cells in culture and in vivo. Our results highlight the differential outcome of distinct p53 mutations in human patients and elucidate the selective pressure driving p53 conservation throughout evolution. Furthermore, while loss of anti-proliferative functionality largely correlates with the occurrence of cancer-associated p53 mutations, we observe that selective gain-of-function may further favor particular mutants in vivo. Finally, when combined with additional acquired p53 mutations, seemingly neutral TP53 SNPs may modulate phenotypic outcome and, presumably, tumor progression.

Subjects

Molecular Evolution, Gene Library, Mutation, Neoplasms/genetics, Tumor Suppressor Protein p53/genetics, Animals, HEK293 Cells, Humans, Mice, Nude Mice, Neoplasms/metabolism, Single Nucleotide Polymorphism, Protein Domains, Tumor Suppressor Protein p53/metabolism

6.

Evolutionary couplings and sequence variation effect predict protein binding sites.

Schelling, Maria; Hopf, Thomas A; Rost, Burkhard.

Proteins ; 86(10): 1064-1074, 2018 10.

Article in English | MEDLINE | ID: mdl-30020551

ABSTRACT

Binding small ligands such as ions or macromolecules such as DNA, RNA, and other proteins is one important aspect of the molecular function of proteins. Many binding sites remain without experimental annotations. Predicting binding sites on a per-residue level is challenging, but if 3D structures are known, information about coevolving residue pairs (evolutionary couplings) can predict catalytic residues through mutual information. Here, we predicted protein binding sites from evolutionary couplings derived from a global statistical model using maximum entropy. Additionally, we included information from sequence variation. A simple method using a weighted sum over eight scores substantially outperformed random (F1 = 19.3% ± 0.7% vs F1 = 2% for random). Training a neural network on these eight scores (along with predicted solvent accessibility and conservation in protein families) improved substantially (F1 = 26.2% ±0.8%). Although the machine learning was limited by the small data set and possibly wrong annotations of binding sites, the predicted binding sites formed spatial clusters in the protein. The source code of the binding site predictions is available through GitHub: https://github.com/Rostlab/bindPredict.

Subjects

Molecular Evolution, Proteins/chemistry, Binding Sites, Biological Evolution, DNA/metabolism, DNA-Binding Proteins/chemistry, DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Protein Databases, Entropy, Genetic Variation, Humans, Machine Learning, Biological Models, Molecular Models, Neural Networks (Computer), Protein Binding, Proteins/genetics, Proteins/metabolism
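
The abstract of record 6 reports its results as per-residue F1 scores. As a reminder of what that measure combines, the following minimal sketch computes precision, recall and F1 for a single protein. The residue positions are made-up toy values, and how the authors aggregate scores across proteins is not stated in the abstract, so nothing here should be read as their evaluation code.

```python
def f1_score(predicted, annotated):
    """Return (precision, recall, F1) for one protein.

    predicted, annotated: sets of residue positions flagged as binding.
    """
    true_pos = len(predicted & annotated)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: 4 predicted residues, 5 annotated, 2 shared
predicted = {10, 11, 42, 57}
annotated = {11, 42, 43, 44, 90}
precision, recall, f1 = f1_score(predicted, annotated)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why values in the 20 to 30% range can still be far above a random baseline when binding residues are rare.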

7.

Structure of the peptidoglycan polymerase RodA resolved by evolutionary coupling analysis.

Sjodt, Megan; Brock, Kelly; Dobihal, Genevieve; Rohs, Patricia D A; Green, Anna G; Hopf, Thomas A; Meeske, Alexander J; Srisuknimit, Veerasak; Kahne, Daniel; Walker, Suzanne; Marks, Debora S; Bernhardt, Thomas G; Rudner, David Z; Kruse, Andrew C.

Nature ; 556(7699): 118-121, 2018 04 05.

Article in English | MEDLINE | ID: mdl-29590088

ABSTRACT

The shape, elongation, division and sporulation (SEDS) proteins are a large family of ubiquitous and essential transmembrane enzymes with critical roles in bacterial cell wall biology. The exact function of SEDS proteins was for a long time poorly understood, but recent work has revealed that the prototypical SEDS family member RodA is a peptidoglycan polymerase, a role previously attributed exclusively to members of the penicillin-binding protein family. This discovery has made RodA and other SEDS proteins promising targets for the development of next-generation antibiotics. However, little is known regarding the molecular basis of SEDS activity, and no structural data are available for RodA or any homologue thereof. Here we report the crystal structure of Thermus thermophilus RodA at a resolution of 2.9 Å, determined using evolutionary covariance-based fold prediction to enable molecular replacement. The structure reveals a ten-pass transmembrane fold with large extracellular loops, one of which is partially disordered. The protein contains a highly conserved cavity in the transmembrane domain, reminiscent of ligand-binding sites in transmembrane receptors. Mutagenesis experiments in Bacillus subtilis and Escherichia coli show that perturbation of this cavity abolishes RodA function both in vitro and in vivo, indicating that this cavity is catalytically essential. These results provide a framework for understanding bacterial cell wall synthesis and SEDS protein function.

Subjects

X-Ray Crystallography/methods, Nucleotidyltransferases/chemistry, Peptidoglycan/metabolism, Thermus thermophilus/enzymology, Bacillus subtilis/genetics, Biocatalysis, Cell Wall/enzymology, Cell Wall/metabolism, Escherichia coli/genetics, Molecular Models, Nucleotidyltransferases/metabolism, Protein Domains, Protein Folding, Structure-Activity Relationship, Thermus thermophilus/genetics

8.

Mutation effects predicted from sequence co-variation.

Hopf, Thomas A; Ingraham, John B; Poelwijk, Frank J; Schärfe, Charlotta P I; Springer, Michael; Sander, Chris; Marks, Debora S.

Nat Biotechnol ; 35(2): 128-135, 2017 02.

Article in English | MEDLINE | ID: mdl-28092658

ABSTRACT

Many high-throughput experimental technologies have been developed to assess the effects of large numbers of mutations (variation) on phenotypes. However, designing functional assays for these methods is challenging, and systematic testing of all combinations is impossible, so robust methods to predict the effects of genetic variation are needed. Most prediction methods exploit evolutionary sequence conservation but do not consider the interdependencies of residues or bases. We present EVmutation, an unsupervised statistical method for predicting the effects of mutations that explicitly captures residue dependencies between positions. We validate EVmutation by comparing its predictions with outcomes of high-throughput mutagenesis experiments and measurements of human disease mutations and show that it outperforms methods that do not account for epistasis. EVmutation can be used to assess the quantitative effects of mutations in genes of any organism. We provide pre-computed predictions for ~7,000 human proteins at http://evmutation.org/.

Subjects

Conserved Sequence/genetics, DNA Mutational Analysis/methods, Genetic Epistasis/genetics, Genetic Variation/genetics, High-Throughput Nucleotide Sequencing/methods, Proteome/genetics, Amino Acid Sequence/genetics, Molecular Evolution, Humans, Molecular Sequence Data, Mutation/genetics, Proteome/chemistry
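
The abstract of record 8 contrasts conservation-only predictors with a method that captures dependencies between positions. One way to picture the difference is a pairwise statistical model in which a sequence is scored by per-site terms plus pair terms, and a mutation is scored by the change in that total. The sketch below assumes such a model purely for illustration: the function names and every parameter value are hypothetical, and it is not the EVmutation implementation.

```python
def statistical_energy(seq, fields, couplings):
    """Sum of site terms h_i(a_i) and pair terms J_ij(a_i, a_j) for i < j."""
    energy = sum(fields[i].get(a, 0.0) for i, a in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            pair_terms = couplings.get((i, j), {})
            energy += pair_terms.get((seq[i], seq[j]), 0.0)
    return energy


def mutation_effect(wild_type, pos, new_residue, fields, couplings):
    """Score a single substitution as E(mutant) - E(wild type)."""
    mutant = wild_type[:pos] + new_residue + wild_type[pos + 1:]
    return (statistical_energy(mutant, fields, couplings)
            - statistical_energy(wild_type, fields, couplings))


# Hypothetical toy parameters for a three-residue "protein"
fields = [{"A": 0.5, "G": 0.0}, {"K": 0.2, "R": 0.1}, {"L": 1.0}]
couplings = {(0, 1): {("A", "K"): 1.0, ("G", "R"): 1.0}}
wild_type = "AKL"

delta = mutation_effect(wild_type, 0, "G", fields, couplings)
site_only = mutation_effect(wild_type, 0, "G", fields, {})
print(f"with couplings: {delta:.2f}, site terms only: {site_only:.2f}")
```

In this toy setting the substitution of A by G at the first position scores about -1.5 with the pair term and about -0.5 with site terms alone: the extra penalty comes from breaking the favourable A-K pairing, which a site-independent model cannot see.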

9.

Structured States of Disordered Proteins from Genomic Sequences.

Toth-Petroczy, Agnes; Palmedo, Perry; Ingraham, John; Hopf, Thomas A; Berger, Bonnie; Sander, Chris; Marks, Debora S.

Cell ; 167(1): 158-170.e12, 2016 Sep 22.

Article in English | MEDLINE | ID: mdl-27662088

ABSTRACT

Protein flexibility ranges from simple hinge movements to functional disorder. Around half of all human proteins contain apparently disordered regions with little 3D or functional information, and many of these proteins are associated with disease. Building on the evolutionary couplings approach previously successful in predicting 3D states of ordered proteins and RNA, we developed a method to predict the potential for ordered states for all apparently disordered proteins with sufficiently rich evolutionary information. The approach is highly accurate (79%) for residue interactions as tested in more than 60 known disordered regions captured in a bound or specific condition. Assessing the potential for structure of more than 1,000 apparently disordered regions of human proteins reveals a continuum of structural order with at least 50% with clear propensity for three- or two-dimensional states. Co-evolutionary constraints reveal hitherto unseen structures of functional importance in apparently disordered proteins.

Subjects

Intrinsically Disordered Proteins/chemistry, Directed Molecular Evolution/methods, Genomics, Humans, Intrinsically Disordered Proteins/genetics, Protein Secondary Structure, Protein Tertiary Structure, Proteome/chemistry, Proteome/genetics

10.

Protein structure determination by combining sparse NMR data with evolutionary couplings.

Tang, Yuefeng; Huang, Yuanpeng Janet; Hopf, Thomas A; Sander, Chris; Marks, Debora S; Montelione, Gaetano T.

Nat Methods ; 12(8): 751-4, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26121406

ABSTRACT

Accurate determination of protein structure by NMR spectroscopy is challenging for larger proteins, for which experimental data are often incomplete and ambiguous. Evolutionary sequence information together with advances in maximum entropy statistical methods provide a rich complementary source of structural constraints. We have developed a hybrid approach (evolutionary coupling-NMR spectroscopy; EC-NMR) combining sparse NMR data with evolutionary residue-residue couplings and demonstrate accurate structure determination for several proteins 6-41 kDa in size.

Subjects

Computational Biology/methods, Magnetic Resonance Spectroscopy/methods, Proteins/chemistry, Algorithms, X-Ray Crystallography, Molecular Evolution, Humans, Hydrodynamics, Three-Dimensional Imaging, Statistical Models, Molecular Conformation, Protein Conformation, Proto-Oncogene Proteins/chemistry, Proto-Oncogene Proteins p21(ras), ras Proteins/chemistry

11.

Amino acid coevolution reveals three-dimensional structure and functional domains of insect odorant receptors.

Hopf, Thomas A; Morinaga, Satoshi; Ihara, Sayoko; Touhara, Kazushige; Marks, Debora S; Benton, Richard.

Nat Commun ; 6: 6077, 2015 Jan 13.

Article in English | MEDLINE | ID: mdl-25584517

ABSTRACT

Insect odorant receptors (ORs) comprise an enormous protein family that translates environmental chemical signals into neuronal electrical activity. These heptahelical receptors are proposed to function as ligand-gated ion channels and/or to act metabotropically as G protein-coupled receptors (GPCRs). Resolving their signalling mechanism has been hampered by the lack of tertiary structural information and primary sequence similarity to other proteins. We use amino acid evolutionary covariation across these ORs to define restraints on structural proximity of residue pairs, which permit de novo generation of three-dimensional models. The validity of our analysis is supported by the location of functionally important residues in highly constrained regions of the protein. Importantly, insect OR models exhibit a distinct transmembrane domain packing arrangement to that of canonical GPCRs, establishing the structural unrelatedness of these receptor families. The evolutionary couplings and models predict odour binding and ion conduction domains, and provide a template for rational structure-activity dissection.

Subjects

Amino Acids/chemistry, Molecular Evolution, Insects/metabolism, Odorant Receptors/chemistry, Amino Acids/genetics, Animals, Odorant Receptors/genetics, Xenopus

12.

Sequence co-evolution gives 3D contacts and structures of protein complexes.

Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S.

Elife ; 3, 2014 Sep 25.

Article in English | MEDLINE | ID: mdl-25255213

ABSTRACT

Protein-protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein-protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein-protein interaction networks and used for interaction predictions at residue resolution.

Subjects

Escherichia coli Proteins/chemistry, Escherichia coli/genetics, Bacterial Genome, Protein Interaction Mapping, Protein Databases, Escherichia coli/metabolism, Escherichia coli Proteins/genetics, Escherichia coli Proteins/metabolism, Molecular Evolution, Gene Expression, Gene Regulatory Networks, Molecular Models, Protein Binding, Protein Conformation

13.

FreeContact: fast and free software for protein contact prediction from residue co-evolution.

Kaján, László; Hopf, Thomas A; Kalas, Matús; Marks, Debora S; Rost, Burkhard.

BMC Bioinformatics ; 15: 85, 2014 Mar 26.

Article in English | MEDLINE | ID: mdl-24669753

ABSTRACT

BACKGROUND: 20 years of improved technology and growing sequences now render residue-residue contact constraints in large protein families through correlated mutations accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software. RESULTS: Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library "libfreecontact", complete with command line tool "freecontact", as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability. CONCLUSIONS: FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).

Subjects

Computational Biology/methods, Proteins/chemistry, Protein Sequence Analysis/methods, Software, Algorithms, Protein Conformation, Proteins/genetics

14.

Homology-based inference sets the bar high for protein function prediction.

Hamp, Tobias; Kassner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Boehm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas A; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Rost, Burkhard.

BMC Bioinformatics ; 14 Suppl 3: S7, 2013.

Article in English | MEDLINE | ID: mdl-23514582

ABSTRACT

BACKGROUND: Any method that de novo predicts protein function should do better than random. More challenging, it also ought to outperform simple homology-based inference. METHODS: Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar or lower limit for future improvements. RESULTS AND CONCLUSIONS: During the development of these methods, we faced two surprises. Firstly, our most successful implementation for the baseline ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Secondly, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods span from top to bottom performers at CAFA, but also the reasons for these differences were unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.

Subjects

Proteins/physiology, Amino Acid Sequence Homology, Algorithms, Proteins/genetics

15.

Protein structure prediction from sequence variation.

Marks, Debora S; Hopf, Thomas A; Sander, Chris.

Nat Biotechnol ; 30(11): 1072-80, 2012 Nov.

Article in English | MEDLINE | ID: mdl-23138306

ABSTRACT

Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics.

Subjects

Genetic Variation/genetics, Chemical Models, Genetic Models, Molecular Models, Proteins/chemistry, Proteins/genetics, Protein Sequence Analysis/methods, Amino Acid Sequence, Computer Simulation, Molecular Sequence Data, Protein Conformation

16.

Three-dimensional structures of membrane proteins from genomic sequencing.

Hopf, Thomas A; Colwell, Lucy J; Sheridan, Robert; Rost, Burkhard; Sander, Chris; Marks, Debora S.

Cell ; 149(7): 1607-21, 2012 Jun 22.

Article in English | MEDLINE | ID: mdl-22579045

ABSTRACT

We show that amino acid covariation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary covariation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modeling by this method.

Subjects

Algorithms, Membrane Proteins/chemistry, Membrane Proteins/genetics, Amino Acid Sequence, Animals, Conserved Sequence, Molecular Evolution, Humans, Molecular Models, Protein Conformation, Protein Secondary Structure, Sequence Alignment, Protein Structural Homology

17.

Protein 3D structure computed from evolutionary sequence variation.

Marks, Debora S; Colwell, Lucy J; Sheridan, Robert; Hopf, Thomas A; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris.

PLoS One ; 6(12): e28766, 2011.

Article in English | MEDLINE | ID: mdl-22163331

ABSTRACT

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å C(α)-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.

Subjects

Proteins/chemistry, Animals, Catalytic Domain, Computational Biology/methods, Drug Design, Entropy, Molecular Evolution, Genetic Variation, Genome, Humans, Three-Dimensional Imaging, Statistical Models, Protein Conformation, Protein Tertiary Structure, Reproducibility of Results, Rhodopsin/chemistry, Trypsin/chemistry
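
The abstract of record 17 speaks of separating true co-evolution couplings from "the noisy set of observed correlations". A common way to measure such raw pairwise correlation between two alignment columns is mutual information. The sketch below computes it on a made-up four-sequence alignment. It illustrates only the local, observed-correlation side of that contrast and is not the maximum entropy model the abstract describes; the alignment and function names are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    f_i = Counter(col_i)                 # single-column counts
    f_j = Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))    # pair counts
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy_alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]

mi_covarying = mutual_information(toy_alignment, 0, 1)
mi_independent = mutual_information(toy_alignment, 0, 3)
print(f"MI(0,1)={mi_covarying:.3f}  MI(0,3)={mi_independent:.3f}")
```

A pairwise score like this cannot tell whether two columns covary because the residues interact directly or because both are coupled to a third position, which is the kind of ambiguity a global model over all positions is meant to address.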
