Results 1 - 17 of 17
1.
Cell ; 167(1): 158-170.e12, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-27662088

ABSTRACT

Protein flexibility ranges from simple hinge movements to functional disorder. Around half of all human proteins contain apparently disordered regions with little 3D or functional information, and many of these proteins are associated with disease. Building on the evolutionary couplings approach previously successful in predicting 3D states of ordered proteins and RNA, we developed a method to predict the potential for ordered states for all apparently disordered proteins with sufficiently rich evolutionary information. The approach is highly accurate (79%) for residue interactions as tested in more than 60 known disordered regions captured in a bound or specific condition. Assessing the potential for structure of more than 1,000 apparently disordered regions of human proteins reveals a continuum of structural order with at least 50% with clear propensity for three- or two-dimensional states. Co-evolutionary constraints reveal hitherto unseen structures of functional importance in apparently disordered proteins.


Subject(s)
Intrinsically Disordered Proteins/chemistry , Directed Molecular Evolution/methods , Genomics , Humans , Intrinsically Disordered Proteins/genetics , Protein Secondary Structure , Protein Tertiary Structure , Proteome/chemistry , Proteome/genetics
2.
Cell ; 149(7): 1607-21, 2012 Jun 22.
Article in English | MEDLINE | ID: mdl-22579045

ABSTRACT

We show that amino acid covariation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary covariation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modeling by this method.


Subject(s)
Algorithms , Membrane Proteins/chemistry , Membrane Proteins/genetics , Amino Acid Sequence , Animals , Conserved Sequence , Molecular Evolution , Humans , Molecular Models , Protein Conformation , Protein Secondary Structure , Sequence Alignment , Protein Structural Homology
3.
Mol Cell ; 71(1): 178-190.e8, 2018 Jul 05.
Article in English | MEDLINE | ID: mdl-29979965

ABSTRACT

The TP53 gene is frequently mutated in human cancer. Research has focused predominantly on six major "hotspot" codons, which account for only ∼30% of cancer-associated p53 mutations. To comprehensively characterize the consequences of the p53 mutation spectrum, we created a synthetically designed library and measured the functional impact of ∼10,000 DNA-binding domain (DBD) p53 variants in human cells in culture and in vivo. Our results highlight the differential outcome of distinct p53 mutations in human patients and elucidate the selective pressure driving p53 conservation throughout evolution. Furthermore, while loss of anti-proliferative functionality largely correlates with the occurrence of cancer-associated p53 mutations, we observe that selective gain-of-function may further favor particular mutants in vivo. Finally, when combined with additional acquired p53 mutations, seemingly neutral TP53 SNPs may modulate phenotypic outcome and, presumably, tumor progression.


Subject(s)
Molecular Evolution , Gene Library , Mutation , Neoplasms/genetics , Tumor Suppressor Protein p53/genetics , Animals , HEK293 Cells , Humans , Mice , Nude Mice , Neoplasms/metabolism , Single Nucleotide Polymorphism , Protein Domains , Tumor Suppressor Protein p53/metabolism
4.
Nature ; 556(7699): 118-121, 2018 Apr 05.
Article in English | MEDLINE | ID: mdl-29590088

ABSTRACT

The shape, elongation, division and sporulation (SEDS) proteins are a large family of ubiquitous and essential transmembrane enzymes with critical roles in bacterial cell wall biology. The exact function of SEDS proteins was for a long time poorly understood, but recent work has revealed that the prototypical SEDS family member RodA is a peptidoglycan polymerase, a role previously attributed exclusively to members of the penicillin-binding protein family. This discovery has made RodA and other SEDS proteins promising targets for the development of next-generation antibiotics. However, little is known regarding the molecular basis of SEDS activity, and no structural data are available for RodA or any homologue thereof. Here we report the crystal structure of Thermus thermophilus RodA at a resolution of 2.9 Å, determined using evolutionary covariance-based fold prediction to enable molecular replacement. The structure reveals a ten-pass transmembrane fold with large extracellular loops, one of which is partially disordered. The protein contains a highly conserved cavity in the transmembrane domain, reminiscent of ligand-binding sites in transmembrane receptors. Mutagenesis experiments in Bacillus subtilis and Escherichia coli show that perturbation of this cavity abolishes RodA function both in vitro and in vivo, indicating that this cavity is catalytically essential. These results provide a framework for understanding bacterial cell wall synthesis and SEDS protein function.


Subject(s)
X-Ray Crystallography/methods , Nucleotidyltransferases/chemistry , Peptidoglycan/metabolism , Thermus thermophilus/enzymology , Bacillus subtilis/genetics , Biocatalysis , Cell Wall/enzymology , Cell Wall/metabolism , Escherichia coli/genetics , Molecular Models , Nucleotidyltransferases/metabolism , Protein Domains , Protein Folding , Structure-Activity Relationship , Thermus thermophilus/genetics
6.
Bioinformatics ; 35(9): 1582-1584, 2019 May 01.
Article in English | MEDLINE | ID: mdl-30304492

ABSTRACT

SUMMARY: Coevolutionary sequence analysis has become a commonly used technique for de novo prediction of the structure and function of proteins, RNA, and protein complexes. We present the EVcouplings framework, a fully integrated open-source application and Python package for coevolutionary analysis. The framework enables generation of sequence alignments, calculation and evaluation of evolutionary couplings (ECs), and de novo prediction of structure and mutation effects. The combination of an easy-to-use, flexible command-line interface and an underlying modular Python package makes the full power of coevolutionary analyses available to entry-level and advanced users. AVAILABILITY AND IMPLEMENTATION: https://github.com/debbiemarkslab/evcouplings.


Subject(s)
Sequence Analysis , Software , Proteins , RNA , Sequence Alignment
7.
Proteins ; 86(10): 1064-1074, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30020551

ABSTRACT

Binding small ligands such as ions or macromolecules such as DNA, RNA, and other proteins is one important aspect of the molecular function of proteins. Many binding sites remain without experimental annotations. Predicting binding sites on a per-residue level is challenging, but if 3D structures are known, information about coevolving residue pairs (evolutionary couplings) can predict catalytic residues through mutual information. Here, we predicted protein binding sites from evolutionary couplings derived from a global statistical model using maximum entropy. Additionally, we included information from sequence variation. A simple method using a weighted sum over eight scores substantially outperformed random (F1 = 19.3% ± 0.7% vs F1 = 2% for random). Training a neural network on these eight scores (along with predicted solvent accessibility and conservation in protein families) improved performance substantially (F1 = 26.2% ± 0.8%). Although the machine learning was limited by the small data set and possibly wrong annotations of binding sites, the predicted binding sites formed spatial clusters in the protein. The source code of the binding site predictions is available through GitHub: https://github.com/Rostlab/bindPredict.
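
As a rough illustration of the evaluation described in this abstract, the sketch below combines per-residue scores with a weighted sum, thresholds the result, and computes F1 against annotated binding residues. The function names, weights, threshold, and toy data are all hypothetical; none of them is taken from bindPredict, and only three scores per residue are used here instead of eight.

```python
def weighted_sum(scores, weights):
    """Combine one residue's scores into a single value."""
    return sum(s * w for s, w in zip(scores, weights))


def f1_score(predicted, observed):
    """F1 = harmonic mean of precision and recall over residue index sets."""
    true_pos = len(predicted & observed)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(observed)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): residue index -> three made-up scores.
residue_scores = {
    1: [0.9, 0.8, 0.7],
    2: [0.1, 0.2, 0.1],
    3: [0.6, 0.9, 0.5],
    4: [0.2, 0.1, 0.3],
}
weights = [0.5, 0.3, 0.2]   # hypothetical weights
threshold = 0.5             # hypothetical cutoff

# A residue is predicted as binding when its weighted sum reaches the cutoff.
predicted = {
    res for res, scores in residue_scores.items()
    if weighted_sum(scores, weights) >= threshold
}
observed = {1, 4}  # hypothetical annotated binding residues

print(sorted(predicted), f1_score(predicted, observed))
```

With these toy values, residues 1 and 3 pass the cutoff and one of the two annotated residues is recovered, so precision and recall are both 0.5 and F1 is 0.5.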


Subject(s)
Molecular Evolution , Proteins/chemistry , Binding Sites , Biological Evolution , DNA/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Protein Databases , Entropy , Genetic Variation , Humans , Machine Learning , Biological Models , Molecular Models , Computer Neural Networks , Protein Binding , Proteins/genetics , Proteins/metabolism
8.
Nat Methods ; 12(8): 751-4, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26121406

ABSTRACT

Accurate determination of protein structure by NMR spectroscopy is challenging for larger proteins, for which experimental data are often incomplete and ambiguous. Evolutionary sequence information together with advances in maximum entropy statistical methods provide a rich complementary source of structural constraints. We have developed a hybrid approach (evolutionary coupling-NMR spectroscopy; EC-NMR) combining sparse NMR data with evolutionary residue-residue couplings and demonstrate accurate structure determination for several proteins 6-41 kDa in size.


Subject(s)
Computational Biology/methods , Magnetic Resonance Spectroscopy/methods , Proteins/chemistry , Algorithms , X-Ray Crystallography , Molecular Evolution , Humans , Hydrodynamics , Three-Dimensional Imaging , Statistical Models , Molecular Conformation , Protein Conformation , Proto-Oncogene Proteins/chemistry , Proto-Oncogene Proteins p21(ras) , ras Proteins/chemistry
9.
BMC Bioinformatics ; 15: 85, 2014 Mar 26.
Article in English | MEDLINE | ID: mdl-24669753

ABSTRACT

BACKGROUND: Twenty years of improved technology and growing sequence databases now render residue-residue contact constraints inferred from correlated mutations in large protein families accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software. RESULTS: Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library "libfreecontact", complete with command line tool "freecontact", as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability. CONCLUSIONS: FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).


Subject(s)
Computational Biology/methods , Proteins/chemistry , Protein Sequence Analysis/methods , Software , Algorithms , Protein Conformation , Proteins/genetics
10.
Res Sq ; 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38260496

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome [1-6]. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data [7] and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders [8] from potentially healthy individuals [9]. popEVE identifies 442 genes in patients in this developmental disorder cohort, including evidence of 123 novel genetic disorders, many without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. A majority of these variants are close to interacting partners in 3D complexes. Preliminary analyses on child exomes indicate that popEVE can identify candidate variants without the need for inheritance labels. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable.

11.
BMC Bioinformatics ; 14 Suppl 3: S7, 2013.
Article in English | MEDLINE | ID: mdl-23514582

ABSTRACT

BACKGROUND: Any method that de novo predicts protein function should do better than random. More challenging, it also ought to outperform simple homology-based inference. METHODS: Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar or lower limit for future improvements. RESULTS AND CONCLUSIONS: During the development of these methods, we faced two surprises. Firstly, our most successful implementation for the baseline ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Secondly, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods span from top to bottom performers at CAFA, but also the reasons for these differences were unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.


Subject(s)
Proteins/physiology , Amino Acid Sequence Homology , Algorithms , Proteins/genetics
12.
medRxiv ; 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38076790

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders from potentially healthy individuals. popEVE identifies 442 genes in a cohort of developmental disorder cases, including evidence of 119 novel genetic disorders without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable. Interactive web viewer and downloads available at pop.evemodel.org.

13.
Nat Biotechnol ; 35(2): 128-135, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28092658

ABSTRACT

Many high-throughput experimental technologies have been developed to assess the effects of large numbers of mutations (variation) on phenotypes. However, designing functional assays for these methods is challenging, and systematic testing of all combinations is impossible, so robust methods to predict the effects of genetic variation are needed. Most prediction methods exploit evolutionary sequence conservation but do not consider the interdependencies of residues or bases. We present EVmutation, an unsupervised statistical method for predicting the effects of mutations that explicitly captures residue dependencies between positions. We validate EVmutation by comparing its predictions with outcomes of high-throughput mutagenesis experiments and measurements of human disease mutations and show that it outperforms methods that do not account for epistasis. EVmutation can be used to assess the quantitative effects of mutations in genes of any organism. We provide pre-computed predictions for ∼7,000 human proteins at http://evmutation.org/.
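
The idea of scoring a mutation with a model that includes dependencies between positions can be sketched as follows. The sketch assumes a pairwise model with site terms `h` and coupling terms `J`, and takes the effect of a substitution to be the difference in statistical energy between the mutant and wild-type sequences. The sign convention (higher energy means more favorable), the function names, and every parameter value are assumptions made for illustration; this is not the EVmutation code or its API.

```python
from itertools import combinations


def statistical_energy(seq, h, J):
    """Sum of site terms h[i][a] and pairwise terms J[(i, j)][(a, b)] for i < j."""
    energy = sum(h[i].get(a, 0.0) for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        energy += J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


def mutation_effect(wild_type, position, new_residue, h, J):
    """Delta E = E(mutant) - E(wild type).

    Under the convention assumed here, a negative value means the
    substitution is less favorable than the wild type.
    """
    mutant = wild_type[:position] + new_residue + wild_type[position + 1:]
    return statistical_energy(mutant, h, J) - statistical_energy(wild_type, h, J)


# Hypothetical toy parameters for a 3-residue sequence.
h = [{"A": 0.5, "G": 0.25}, {"K": 1.0}, {"E": 0.75, "D": 0.5}]
J = {(0, 2): {("A", "E"): 1.5, ("A", "D"): -0.5}}

print(mutation_effect("AKE", 2, "D", h, J))
```

In this toy case a site-only model would score E to D at the last position as -0.25, while the coupling to position 0 brings the total to -2.25, which is the kind of dependency a conservation-only method would not see.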


Subject(s)
Conserved Sequence/genetics , DNA Mutational Analysis/methods , Genetic Epistasis/genetics , Genetic Variation/genetics , High-Throughput Nucleotide Sequencing/methods , Proteome/genetics , Amino Acid Sequence/genetics , Molecular Evolution , Humans , Molecular Sequence Data , Mutation/genetics , Proteome/chemistry
14.
Nat Commun ; 6: 6077, 2015 Jan 13.
Article in English | MEDLINE | ID: mdl-25584517

ABSTRACT

Insect odorant receptors (ORs) comprise an enormous protein family that translates environmental chemical signals into neuronal electrical activity. These heptahelical receptors are proposed to function as ligand-gated ion channels and/or to act metabotropically as G protein-coupled receptors (GPCRs). Resolving their signalling mechanism has been hampered by the lack of tertiary structural information and primary sequence similarity to other proteins. We use amino acid evolutionary covariation across these ORs to define restraints on structural proximity of residue pairs, which permit de novo generation of three-dimensional models. The validity of our analysis is supported by the location of functionally important residues in highly constrained regions of the protein. Importantly, insect OR models exhibit a transmembrane domain packing arrangement distinct from that of canonical GPCRs, establishing the structural unrelatedness of these receptor families. The evolutionary couplings and models predict odour binding and ion conduction domains, and provide a template for rational structure-activity dissection.


Subject(s)
Amino Acids/chemistry , Molecular Evolution , Insects/metabolism , Odorant Receptors/chemistry , Amino Acids/genetics , Animals , Odorant Receptors/genetics , Xenopus
15.
Elife ; 3, 2014 Sep 25.
Article in English | MEDLINE | ID: mdl-25255213

ABSTRACT

Protein-protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein-protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein-protein interaction networks and used for interaction predictions at residue resolution.


Subject(s)
Escherichia coli Proteins/chemistry , Escherichia coli/genetics , Bacterial Genome , Protein Interaction Mapping , Protein Databases , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Molecular Evolution , Gene Expression , Gene Regulatory Networks , Molecular Models , Protein Binding , Protein Conformation
16.
Nat Biotechnol ; 30(11): 1072-80, 2012 Nov.
Article in English | MEDLINE | ID: mdl-23138306

ABSTRACT

Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics.


Subject(s)
Genetic Variation/genetics , Chemical Models , Genetic Models , Molecular Models , Proteins/chemistry , Proteins/genetics , Protein Sequence Analysis/methods , Amino Acid Sequence , Computer Simulation , Molecular Sequence Data , Protein Conformation
17.
PLoS One ; 6(12): e28766, 2011.
Article in English | MEDLINE | ID: mdl-22163331

ABSTRACT

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
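
For contrast with the maximum entropy model described in this abstract, the raw "observed correlations" it calls noisy can be illustrated with mutual information between two alignment columns. This is a minimal sketch of that pairwise baseline only, not the EVfold method; the toy alignment and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def column(alignment, index):
    """Extract one column from a list of equal-length aligned sequences."""
    return [seq[index] for seq in alignment]


def mutual_information(col_i, col_j):
    """Mutual information in bits between two equally long alignment columns."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (freq_i[a] * freq_j[b]))
    return mi


# Hypothetical toy alignment: columns 0 and 1 vary together, column 3 varies
# independently of column 0, and column 2 is fully conserved.
alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]

print(mutual_information(column(alignment, 0), column(alignment, 1)))
print(mutual_information(column(alignment, 0), column(alignment, 3)))
```

Because this measure looks at each column pair in isolation, a chain of correlated positions can make two positions appear coupled even when they only share a common partner; separating such indirect signal from direct couplings is the motivation the abstract gives for a global model.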


Subject(s)
Proteins/chemistry , Animals , Catalytic Domain , Computational Biology/methods , Drug Design , Entropy , Molecular Evolution , Genetic Variation , Genome , Humans , Three-Dimensional Imaging , Statistical Models , Protein Conformation , Protein Tertiary Structure , Reproducibility of Results , Rhodopsin/chemistry , Trypsin/chemistry