ABSTRACT
Over the years, hundreds of enzyme reaction mechanisms have been studied using experimental and simulation methods. This rich literature on biological catalysis is now ripe for use as the foundation of new knowledge-based approaches to investigate enzyme mechanisms. Here, we present a tool able to automatically infer mechanistic paths for a given three-dimensional active site and enzyme reaction, based on a set of catalytic rules compiled from the Mechanism and Catalytic Site Atlas, a database of enzyme mechanisms. EzMechanism (pronounced as 'Easy' Mechanism) is available to everyone through a web user interface. When studying a mechanism, EzMechanism facilitates and improves the generation of hypotheses, by making sure that relevant information is considered, as derived from the literature on both related and unrelated enzymes. We validated EzMechanism on a set of 62 enzymes and have identified paths for further improvement, including the need for additional and more generic catalytic rules.
ABSTRACT
The surprising decision by Novo Nordisk Foundation (NNF) to discontinue funding for the Center for Protein Research in Copenhagen should prompt discussions about public and private commitment to support basic research.
ABSTRACT
Enzymes have been shaped by evolution over billions of years to catalyse the chemical reactions that support life on earth. Dispersed in the literature, or organised in online databases, knowledge about enzymes can be structured in distinct dimensions, either related to their quality as biological macromolecules, such as their sequence and structure, or related to their chemical functions, such as the catalytic site, kinetics, mechanism, and overall reaction. The evolution of enzymes can only be understood when each of these dimensions is considered. In addition, many of the properties of enzymes only make sense in the light of evolution. We start this review by outlining the main paradigms of enzyme evolution, including gene duplication and divergence, convergent evolution, and evolution by recombination of domains. In the second part, we overview the current collective knowledge about enzymes, as organised by different types of data and collected in several databases. We also highlight some increasingly powerful computational tools that can be used to close gaps in understanding, in particular for types of data that require laborious experimental protocols. We believe that recent advances in protein structure prediction will be a powerful catalyst for the prediction of binding, mechanism, and ultimately, chemical reactions. A comprehensive mapping of enzyme function and evolution may be attainable in the near future.
Subjects
Computational Biology , Enzymes , Proteins , Catalysis , Catalytic Domain , Enzymes/genetics , Enzymes/metabolism , Molecular Evolution , Proteins/genetics
ABSTRACT
Proteins are essential macromolecules for the maintenance of living systems. Many of them perform their function by interacting with other molecules in regions called binding sites. The identification and characterization of these regions are of fundamental importance to determine protein function, being a fundamental step in processes such as drug design and discovery. However, identifying such binding regions is not trivial due to the drawbacks of experimental methods, which are costly and time-consuming. Here we propose GRaSP-web, a web server that uses GRaSP (Graph-based Residue neighborhood Strategy to Predict binding sites), a residue-centric method based on graphs that uses machine learning to predict putative ligand binding site residues. The method outperformed 6 state-of-the-art residue-centric methods (MCC of 0.61). Also, GRaSP-web is scalable as it takes 10-20 seconds to predict binding sites for a protein complex (the state-of-the-art residue-centric method takes 2-5 h on average). It proved to be consistent in predicting binding sites for bound/unbound structures (MCC 0.61 for both) and for a large dataset of multi-chain proteins (4500 entries, MCC 0.61). GRaSP-web is freely available at https://grasp.ufv.br.
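The MCC values quoted above refer to the Matthews correlation coefficient, a standard score for binary classification. The following is a minimal sketch of how it could be computed for per-residue binding-site predictions; the function name and the example counts are illustrative assumptions and do not come from the GRaSP paper or its code.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one protein (binding vs non-binding residues),
# used only to show the call; they are not results from the paper.
score = matthews_corrcoef(tp=30, tn=900, fp=20, fn=15)
print(f"MCC = {score:.2f}")
```

MCC is often preferred over accuracy for this kind of task because binding residues are usually a small minority of all residues, and the score stays near zero for a predictor that labels everything as non-binding.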
Subjects
Machine Learning , Proteins , Proteins/chemistry , Binding Sites , Ligands , Protein Domains , Protein Binding
ABSTRACT
Reduced activity of insulin/insulin-like growth factor signaling (IIS) increases healthy lifespan among diverse animal species. Downstream of IIS, multiple evolutionarily conserved transcription factors (TFs) are required; however, distinct TFs are likely responsible for these effects in different tissues. Here we have asked which TFs can extend healthy lifespan within distinct cell types of the adult nervous system in Drosophila. Starting from published single-cell transcriptomic data, we report that forkhead (FKH) is endogenously expressed in neurons, whereas forkhead-box-O (FOXO) is expressed in glial cells. Accordingly, we find that neuronal FKH and glial FOXO exert independent prolongevity effects. We have further explored the role of neuronal FKH in a model of Alzheimer's disease-associated neuronal dysfunction, where we find that increased neuronal FKH preserves behavioral function and reduces ubiquitinated protein aggregation. Finally, using transcriptomic profiling, we identify Atg17, a member of the Atg1 autophagy initiation family, as one FKH-dependent target whose neuronal overexpression is sufficient to extend healthy lifespan. Taken together, our results underscore the importance of cell type-specific mapping of TF activity to preserve healthy function with age.
Subjects
Drosophila Proteins/metabolism , Drosophila melanogaster/growth & development , Forkhead Transcription Factors/metabolism , Developmental Gene Expression Regulation , Longevity , Neuroglia/metabolism , Neurons/metabolism , Animals , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Female , Forkhead Transcription Factors/genetics , Gene Expression Profiling , Male , Neuroglia/cytology , Neurons/cytology , Transcriptome
ABSTRACT
The catalytic residues of an enzyme comprise the amino acids located in the active center responsible for accelerating the enzyme-catalyzed reaction. These residues lower the activation energy of reactions by performing several catalytic functions. Decades of enzymology research have established general themes regarding the roles of specific residues in these catalytic reactions, but it has been more difficult to explore these roles in a more systematic way. Here, we review the data on the catalytic residues of 648 enzymes, as annotated in the Mechanism and Catalytic Site Atlas (M-CSA), and compare our results with those in previous studies. We structured this analysis around three key properties of the catalytic residues: amino acid type, catalytic function, and sequence conservation in homologous proteins. As expected, we observed that catalysis is mostly accomplished by a small set of residues performing a limited number of catalytic functions. Catalytic residues are typically highly conserved, but to a smaller degree in homologues that perform different reactions or are nonenzymes (pseudoenzymes). Cross-analysis yielded further insights revealing which residues perform particular functions and how often. We obtained more detailed specificity rules for certain functions by identifying the chemical group upon which the residue acts. Finally, we show the mutation tolerance of the catalytic residues based on their roles. The characterization of the catalytic residues, their functions, and conservation, as presented here, is key to understanding the impact of mutations in evolution, disease, and enzyme design. The tools developed for this analysis are available at the M-CSA website and allow for user-specific analysis of the same data.
Subjects
Amino Acids/chemistry , Catalytic Domain , Enzymes/chemistry , Amino Acid Sequence , Amino Acids/metabolism , Animals , Biocatalysis , Conserved Sequence , Protein Databases , Enzymes/metabolism , Humans
ABSTRACT
MOTIVATION: The discovery of protein-ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein-ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites have been proposed, as they can be scalable, fast and low-cost. RESULTS: We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue-centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made comparable or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable as it took 10-20 s on average to predict the binding site for a protein complex whereas the state-of-the-art residue-centric method takes 2-5 h on average. AVAILABILITY AND IMPLEMENTATION: The source code and datasets are available at https://github.com/charles-abreu/GRaSP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Proteins , Software , Binding Sites , Hand Strength , Ligands
ABSTRACT
At first glance, longevity and immunity appear to be different traits that have not much in common except the fact that the immune system promotes survival upon pathogenic infection. Substantial evidence however points to a molecularly intertwined relationship between the immune system and ageing. Although this link is well-known throughout the animal kingdom, its genetic basis is complex and still poorly understood. To address this question, we here provide a compilation of all genes concomitantly known to be involved in immunity and ageing in humans and three well-studied model organisms, the nematode worm Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and the house mouse Mus musculus. By analysing human orthologs among these species, we identified 7 evolutionarily conserved signalling cascades that act pleiotropically on ageing and immunity: the insulin/TOR network, three MAPK (ERK, p38, JNK), JAK/STAT, TGF-β, and NF-κB pathways. We review current evidence for these pathways linking immunity and lifespan, and their role in the detrimental dysregulation of the immune system with age, known as immunosenescence. We argue that the phenotypic effects of these pathways are often context-dependent and vary, for example, between tissues, sexes, and types of pathogenic infection. Future research therefore needs to explore a higher temporal, spatial and environmental resolution to fully comprehend the connection between ageing and immunity.
ABSTRACT
BACKGROUND: Proteases are key drivers in many biological processes, in part due to their specificity towards their substrates. However, depending on the family and molecular function, they can also display substrate promiscuity which can also be essential. Databases compiling specificity matrices derived from experimental assays have provided valuable insights into protease substrate recognition. Despite this, there are still gaps in our knowledge of the structural determinants. Here, we compile a set of protease crystal structures with bound peptide-like ligands to create a protocol for modelling substrates bound to protease structures, and for studying observables associated to the binding recognition. RESULTS: As an application, we modelled a subset of protease-peptide complexes for which experimental cleavage data are available to compare with informational entropies obtained from protease-specificity matrices. The modelled complexes were subjected to conformational sampling using the Backrub method in Rosetta, and multiple observables from the simulations were calculated and compared per peptide position. We found that some of the calculated structural observables, such as the relative accessible surface area and the interaction energy, can help characterize a protease's substrate recognition, giving insights for the potential prediction of novel substrates by combining additional approaches. CONCLUSION: Overall, our approach provides a repository of protease structures with annotated data, and an open source computational protocol to reproduce the modelling and dynamic analysis of the protease-peptide complexes.
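The informational entropies mentioned above can be illustrated with a per-position Shannon entropy over the residues observed at one substrate position. This is a minimal sketch under that assumption; the function name and the toy counts are hypothetical and are not taken from the published protocol or from any specificity database.

```python
import math


def positional_entropy(counts: dict[str, float]) -> float:
    """Shannon entropy (in bits) of residue frequencies at one substrate position."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Toy counts for two hypothetical positions of a specificity matrix.
strict_position = {"R": 95, "K": 5}                       # strong preference
loose_position = {"A": 25, "G": 25, "S": 25, "T": 25}     # no preference
print(positional_entropy(strict_position), positional_entropy(loose_position))
```

A low entropy indicates a position where the protease tolerates few residues, while a high entropy indicates a promiscuous position; these per-position values are the kind of quantity that can be compared against structural observables from the simulations.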
Subjects
Molecular Models , Peptide Hydrolases/metabolism , Peptides/chemistry , Peptides/metabolism , Automation , Ligands , Peptide Hydrolases/chemistry , Protein Conformation , Software , Substrate Specificity
ABSTRACT
MOTIVATION: Cofactors are essential for many enzyme reactions. The Protein Data Bank (PDB) contains >67 000 entries containing enzyme structures, many with bound cofactor or cofactor-like molecules. This work aims to identify and categorize these small molecules in the PDB and make it easier to find them. RESULTS: The Protein Data Bank in Europe (PDBe; pdbe.org) has implemented a pipeline to identify enzyme cofactor and cofactor-like molecules, which are now part of the PDBe weekly release process. AVAILABILITY AND IMPLEMENTATION: Information is made available on the individual PDBe entry pages at pdbe.org and programmatically through the PDBe REST API (pdbe.org/api). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Protein Databases , Coenzymes , Europe , Protein Conformation
ABSTRACT
MOTIVATION: Understanding the protein structural context and patterning on proteins of genomic variants can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence. RESULTS: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information. AVAILABILITY AND IMPLEMENTATION: https://www.ebi.ac.uk/thornton-srv/databases/VarMap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Genomics , Software , Amino Acid Sequence , Protein Databases , Molecular Sequence Annotation , Proteins
ABSTRACT
Advancing age is the dominant risk factor for most of the major killer diseases in developed countries. Hence, ameliorating the effects of ageing may prevent multiple diseases simultaneously. Drugs licensed for human use against specific diseases have proved to be effective in extending lifespan and healthspan in animal models, suggesting that there is scope for drug repurposing in humans. New bioinformatic methods to identify and prioritise potential anti-ageing compounds for humans are therefore of interest. In this study, we first used drug-protein interaction information to rank 1,147 drugs by their likelihood of targeting ageing-related gene products in humans. Among 19 statistically significant drugs, 6 have already been shown to have pro-longevity properties in animal models (p < 0.001). Using the targets of each drug, we established their association with ageing at multiple levels of biological action including pathways, functions and protein interactions. Finally, combining all the data, we calculated a ranked list of drugs that identified tanespimycin, an inhibitor of HSP-90, as the top-ranked novel anti-ageing candidate. We experimentally validated the pro-longevity effect of tanespimycin through its HSP-90 target in Caenorhabditis elegans.
Subjects
Aging/drug effects , Computational Biology/methods , Drug Discovery/methods , Longevity/drug effects , Protective Agents/pharmacology , Aging/genetics , Animals , Caenorhabditis elegans/drug effects , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/metabolism , Drug Interactions , Humans , Protective Agents/metabolism , Protein Binding
ABSTRACT
M-CSA (Mechanism and Catalytic Site Atlas) is a database of enzyme active sites and reaction mechanisms that can be accessed at www.ebi.ac.uk/thornton-srv/m-csa. Our objectives with M-CSA are to provide an open data resource for the community to browse known enzyme reaction mechanisms and catalytic sites, and to use the dataset to understand enzyme function and evolution. M-CSA results from the merging of two existing databases, MACiE (Mechanism, Annotation and Classification in Enzymes), a database of enzyme mechanisms, and CSA (Catalytic Site Atlas), a database of catalytic sites of enzymes. We are releasing M-CSA as a new website and underlying database architecture. At the moment, M-CSA contains 961 entries, 423 of these with detailed mechanism information, and 538 with information on the catalytic site residues only. In total, these cover 81% (195/241) of third level EC numbers with a PDB structure, and 30% (840/2793) of fourth level EC numbers with a PDB structure, out of 6028 in total. By searching for close homologues, we are able to extend M-CSA coverage of PDB and UniProtKB to 51 993 structures and to over five million sequences, respectively, of which about 40% and 30% have a conserved active site.
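The entry counts and coverage figures quoted above can be checked with simple arithmetic. The snippet below is only a worked check of the numbers stated in the abstract; the helper name is an assumption introduced for illustration.

```python
def coverage_percent(covered: int, total: int) -> int:
    """Percentage of covered items, rounded to the nearest integer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * covered / total)


# Entries with detailed mechanisms plus entries with catalytic residues only.
total_entries = 423 + 538                      # 961, as stated in the abstract

# EC number coverage among EC numbers that have a PDB structure.
third_level = coverage_percent(195, 241)       # 195/241 is about 80.9%
fourth_level = coverage_percent(840, 2793)     # 840/2793 is about 30.1%
print(total_entries, third_level, fourth_level)
```

The rounded values agree with the 961 entries, 81% and 30% reported in the text.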
Subjects
Protein Databases , Enzymes/chemistry , Enzymes/metabolism , Biocatalysis , Catalytic Domain , Data Curation , Humans , Internet , User-Computer Interface , Web Browser
ABSTRACT
Haploinsufficiency in DYRK1A is associated with a recognizable developmental syndrome, though the mechanism of action of pathogenic missense mutations is currently unclear. Here we present 19 de novo mutations in this gene, including five missense mutations, identified by the Deciphering Developmental Disorders study. Protein structural analysis reveals that the missense mutations are either close to the ATP or peptide binding sites within the kinase domain, or are important for protein stability, suggesting they lead to a loss of the protein's function mechanism. Furthermore, there is some correlation between the magnitude of the change and the severity of the resultant phenotype. A comparison of the distribution of the pathogenic mutations along the length of DYRK1A with that of natural variants, as found in the ExAC database, confirms that mutations in the N-terminal end of the kinase domain are more disruptive of protein function. In particular, pathogenic mutations occur in significantly closer proximity to the ATP and the substrate peptide than the natural variants. Overall, we suggest that de novo dominant mutations in DYRK1A account for nearly 0.5% of severe developmental disorders due to substantially reduced kinase function.
Subjects
Autistic Disorder/genetics , Developmental Disabilities/genetics , Intellectual Disability/genetics , Protein Serine-Threonine Kinases/genetics , Protein-Tyrosine Kinases/genetics , Autistic Disorder/pathology , Developmental Disabilities/physiopathology , Female , Haploinsufficiency/genetics , Humans , Intellectual Disability/pathology , Male , Mutation , Missense Mutation , Pedigree , Phenotype , Protein Conformation , Protein Serine-Threonine Kinases/chemistry , Protein-Tyrosine Kinases/chemistry , Structure-Activity Relationship , Dyrk Kinases
ABSTRACT
Motivation: One goal of synthetic biology is to make new enzymes to generate new products, but identifying the starting enzymes for further investigation is often elusive and relies on expert knowledge, intensive literature searching and trial and error. Results: We present Transform Molecules in Enzyme Reactions, an online computational tool that transforms query substrate molecules into products using enzyme reactions. The most similar native enzyme reactions for each transformation are found, highlighting those that may be of most interest for enzyme design and directed evolution approaches. Availability and implementation: https://www.ebi.ac.uk/thornton-srv/transform-miner.
Subjects
Enzymes/analysis , Software
ABSTRACT
DNA methylation is an important epigenetic modification in many species that is critical for development, and implicated in ageing and many complex diseases, such as cancer. Many cost-effective genome-wide analyses of DNA modifications rely on restriction enzymes capable of digesting genomic DNA at defined sequence motifs. There are hundreds of restriction enzyme families but few are used to date, because no tool is available for the systematic evaluation of restriction enzyme combinations that can enrich for certain sites of interest in a genome. Herein, we present customised Reduced Representation Bisulfite Sequencing (cuRRBS), a novel and easy-to-use computational method that solves this problem. By computing the optimal enzymatic digestions and size selection steps required, cuRRBS generalises the traditional MspI-based Reduced Representation Bisulfite Sequencing (RRBS) protocol to all restriction enzyme combinations. In addition, cuRRBS estimates the fold-reduction in sequencing costs and provides a robustness value for the personalised RRBS protocol, allowing users to tailor the protocol to their experimental needs. Moreover, we show in silico that cuRRBS-defined restriction enzymes consistently out-perform MspI digestion in many biological systems, considering both CpG and CHG contexts. Finally, we have validated the accuracy of cuRRBS predictions for single and double enzyme digestions using two independent experimental datasets.
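The core idea of digesting a genome at a restriction site and then keeping fragments within a size window can be sketched as follows. This is an illustrative toy on a single linear strand, assuming the MspI recognition site CCGG cut after the first C; the function names and the example sequence are hypothetical, and this is not the cuRRBS implementation or its optimisation procedure.

```python
def digest(sequence: str, site: str = "CCGG", cut_offset: int = 1) -> list[str]:
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the position of the cut within the site
    (1 for MspI, which cuts C^CGG).
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep only fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence with two MspI sites; real use would iterate over chromosomes.
fragments = digest("AACCGGTTTTCCGGAA")
selected = size_select(fragments, min_len=4, max_len=10)
print(fragments, selected)
```

Evaluating many enzyme combinations and size windows with this kind of in silico digestion, and scoring how many sites of interest fall in the retained fragments, is the general style of search the abstract describes.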
Subjects
Computational Biology/methods , DNA Methylation/genetics , DNA Sequence Analysis/economics , DNA Sequence Analysis/methods , Whole Genome Sequencing/methods , Animals , Arabidopsis/genetics , Binding Sites/genetics , CCCTC-Binding Factor/genetics , CCCTC-Binding Factor/metabolism , CpG Islands/genetics , DNA Restriction Enzymes/chemistry , Humans , Induced Pluripotent Stem Cells/cytology , Mice , Nuclear Respiratory Factor 1/genetics , Nuclear Respiratory Factor 1/metabolism
ABSTRACT
Isomerization reactions are fundamental in biology, and isomers usually differ in their biological role and pharmacological effects. In this study, we have cataloged the isomerization reactions known to occur in biology using a combination of manual and computational approaches. This method provides a robust basis for comparison and clustering of the reactions into classes. Comparing our results with the Enzyme Commission (EC) classification, the standard approach to represent enzyme function on the basis of the overall chemistry of the catalyzed reaction, expands our understanding of the biochemistry of isomerization. The grouping of reactions involving stereoisomerism is straightforward with two distinct types (racemases/epimerases and cis-trans isomerases), but reactions entailing structural isomerism are diverse and challenging to classify using a hierarchical approach. This study provides an overview of which isomerases occur in nature, how we should describe and classify them, and their diversity.
Subjects
Biological Evolution , Isomerases/metabolism , Biocatalysis , Isomerases/chemistry , Isomerism , Protein Conformation
ABSTRACT
We present a generic, multidisciplinary approach for improving our understanding of novel missense variants in recently discovered disease genes exhibiting genetic heterogeneity, by combining clinical and population genetics with protein structural analysis. Using six new de novo missense diagnoses in TBL1XR1 from the Deciphering Developmental Disorders study, together with population variation data, we show that the β-propeller structure of the ubiquitous WD40 domain provides a convincing way to discriminate between pathogenic and benign variation. Children with likely pathogenic mutations in this gene have severely delayed language development, often accompanied by intellectual disability, autism, dysmorphology and gastrointestinal problems. Amino acids affected by likely pathogenic missense mutations are either crucial for the stability of the fold, forming part of a highly conserved symmetrically repeating hydrogen-bonded tetrad, or located at the top face of the β-propeller, where 'hotspot' residues affect the binding of β-catenin to the TBLR1 protein. In contrast, those altered by population variation are significantly less likely to be spatially clustered towards the top face or to be at buried or highly conserved residues. This result is useful not only for interpreting benign and pathogenic missense variants in this gene, but also in other WD40 domains, many of which are associated with disease.
Subjects
Developmental Disabilities/diagnosis , Developmental Disabilities/genetics , Genetic Heterogeneity , Missense Mutation , Nuclear Proteins/chemistry , Cytoplasmic and Nuclear Receptors/chemistry , Repressor Proteins/chemistry , beta Catenin/chemistry , Amino Acid Sequence , Child , Preschool Child , Developmental Disabilities/metabolism , Developmental Disabilities/pathology , Female , Gene Expression , Population Genetics , Humans , Hydrogen Bonding , Male , Molecular Models , Molecular Sequence Data , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Prognosis , Protein Binding , Protein Domains , Protein Secondary Structure , Cytoplasmic and Nuclear Receptors/genetics , Cytoplasmic and Nuclear Receptors/metabolism , Repressor Proteins/genetics , Repressor Proteins/metabolism , Sequence Alignment , beta Catenin/genetics , beta Catenin/metabolism
ABSTRACT
We present EC-BLAST (http://www.ebi.ac.uk/thornton-srv/software/rbl/), an algorithm and Web tool for quantitative similarity searches between enzyme reactions at three levels: bond change, reaction center and reaction structure similarity. It uses bond changes and reaction patterns for all known biochemical reactions derived from atom-atom mapping across each reaction. EC-BLAST has the potential to improve enzyme classification, identify previously uncharacterized or new biochemical transformations, improve the assignment of enzyme function to sequences, and assist in enzyme engineering.