ABSTRACT
The generation of an active [FeFe]-hydrogenase requires the synthesis of a complex metal center, the H-cluster, by three dedicated maturases: the radical S-adenosyl-l-methionine (SAM) enzymes HydE and HydG, and the GTPase HydF. A key step of [FeFe]-hydrogenase maturation is the synthesis of the dithiomethylamine (DTMA) bridging ligand, a process recently shown to involve the aminomethyl-lipoyl-H-protein from the glycine cleavage system, whose methylamine group originates from serine and ammonium. Here we use functional assays together with electron paramagnetic resonance and electron-nuclear double resonance spectroscopies to show that serine or aspartate together with their respective ammonia-lyase enzymes can provide the nitrogen for DTMA biosynthesis during in vitro [FeFe]-hydrogenase maturation. We also report bioinformatic analysis of the hyd operon, revealing a strong association with genes encoding ammonia-lyases, suggesting important biochemical and metabolic connections. Together, our results provide evidence that ammonia-lyases play an important role in [FeFe]-hydrogenase maturation by delivering the ammonium required for dithiomethylamine ligand synthesis.
ABSTRACT
The catalytic residues of an enzyme comprise the amino acids located in the active center responsible for accelerating the enzyme-catalyzed reaction. These residues lower the activation energy of reactions by performing several catalytic functions. Decades of enzymology research have established general themes regarding the roles of specific residues in these catalytic reactions, but it has been more difficult to explore these roles in a more systematic way. Here, we review the data on the catalytic residues of 648 enzymes, as annotated in the Mechanism and Catalytic Site Atlas (M-CSA), and compare our results with those in previous studies. We structured this analysis around three key properties of the catalytic residues: amino acid type, catalytic function, and sequence conservation in homologous proteins. As expected, we observed that catalysis is mostly accomplished by a small set of residues performing a limited number of catalytic functions. Catalytic residues are typically highly conserved, but to a lesser degree in homologues that perform different reactions or are nonenzymes (pseudoenzymes). Cross-analysis yielded further insights revealing which residues perform particular functions and how often. We obtained more detailed specificity rules for certain functions by identifying the chemical group upon which the residue acts. Finally, we show the mutation tolerance of the catalytic residues based on their roles. The characterization of the catalytic residues, their functions, and conservation, as presented here, is key to understanding the impact of mutations in evolution, disease, and enzyme design. The tools developed for this analysis are available at the M-CSA website and allow for user-specific analysis of the same data.
Subject(s)
Amino Acids/chemistry ; Catalytic Domain ; Enzymes/chemistry ; Amino Acid Sequence ; Amino Acids/metabolism ; Animals ; Biocatalysis ; Conserved Sequence ; Databases, Protein ; Enzymes/metabolism ; Humans
ABSTRACT
M-CSA (Mechanism and Catalytic Site Atlas) is a database of enzyme active sites and reaction mechanisms that can be accessed at www.ebi.ac.uk/thornton-srv/m-csa. Our objectives with M-CSA are to provide an open data resource for the community to browse known enzyme reaction mechanisms and catalytic sites, and to use the dataset to understand enzyme function and evolution. M-CSA results from the merging of two existing databases, MACiE (Mechanism, Annotation and Classification in Enzymes), a database of enzyme mechanisms, and CSA (Catalytic Site Atlas), a database of catalytic sites of enzymes. We are releasing M-CSA as a new website and underlying database architecture. At the moment, M-CSA contains 961 entries, 423 of these with detailed mechanism information, and 538 with information on the catalytic site residues only. In total, these cover 81% (195/241) of third level EC numbers with a PDB structure, and 30% (840/2793) of fourth level EC numbers with a PDB structure, out of 6028 in total. By searching for close homologues, we are able to extend M-CSA coverage of PDB and UniProtKB to 51 993 structures and to over five million sequences, respectively, of which about 40% and 30% have a conserved active site.
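The entry counts and coverage percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the function and variable names are illustrative and do not come from M-CSA.

```python
def coverage(covered: int, total: int) -> float:
    """Return coverage as a percentage of the total."""
    return 100.0 * covered / total

# Entry counts quoted in the abstract.
entries_with_mechanism = 423
entries_sites_only = 538
total_entries = entries_with_mechanism + entries_sites_only

# EC numbers with a PDB structure that are covered, at two levels of the hierarchy.
third_level = coverage(195, 241)
fourth_level = coverage(840, 2793)

print(total_entries)
print(f"{third_level:.1f}% {fourth_level:.1f}%")
```

Rounded to whole percentages, the two ratios match the 81% and 30% reported in the abstract, and the two entry counts add up to the stated total of 961.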
Subject(s)
Databases, Protein ; Enzymes/chemistry ; Enzymes/metabolism ; Biocatalysis ; Catalytic Domain ; Data Curation ; Humans ; Internet ; User-Computer Interface ; Web Browser
ABSTRACT
The tautomerase superfamily (TSF) consists of more than 11,000 nonredundant sequences present throughout the biosphere. Characterized members have attracted much attention because of the unusual and key catalytic role of an N-terminal proline. These few characterized members catalyze a diverse range of chemical reactions, but the full scale of their chemical capabilities and biological functions remains unknown. To gain new insight into TSF structure-function relationships, we performed a global analysis of similarities across the entire superfamily and computed a sequence similarity network to guide classification into distinct subgroups. Our results indicate that TSF members are found in all domains of life, with most being present in bacteria. The eukaryotic members of the cis-3-chloroacrylic acid dehalogenase subgroup are limited to fungal species, whereas the macrophage migration inhibitory factor subgroup has wide eukaryotic representation (including mammals). Unexpectedly, we found that 346 TSF sequences lack Pro-1, of which 85% are present in the malonate semialdehyde decarboxylase subgroup. The computed network also enabled the identification of similarity paths, namely sequences that link functionally diverse subgroups and exhibit transitional structural features that may help explain reaction divergence. A structure-guided comparison of these linker proteins identified conserved transitions between them, and kinetic analysis paralleled these observations. Phylogenetic reconstruction of the linker set was consistent with these findings. Our results also suggest that contemporary TSF members may have evolved from a short 4-oxalocrotonate tautomerase-like ancestor followed by gene duplication and fusion. Our new linker-guided strategy can be used to enrich the discovery of sequence/structure/function transitions in other enzyme superfamilies.
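The idea of a sequence similarity network and of a "similarity path" linking two subgroups can be illustrated with a small sketch. This is not the authors' pipeline: the pairwise scores, the sequence labels, and the threshold below are all hypothetical, and a real network would typically be built from alignment scores over thousands of sequences.

```python
from collections import deque

def build_network(scores, threshold):
    """Connect two sequences when their pairwise score meets the threshold."""
    graph = {}
    for (a, b), score in scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def similarity_path(graph, start, goal):
    """Breadth-first search for the shortest chain of similar sequences."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the two sequences fall in disconnected parts of the network

# Hypothetical scores: A1/A2 and B1/B2 are two subgroups, L1 is a linker.
scores = {
    ("A1", "A2"): 0.90,
    ("A2", "L1"): 0.60,
    ("L1", "B1"): 0.55,
    ("B1", "B2"): 0.85,
    ("A1", "B2"): 0.10,
}

relaxed = build_network(scores, threshold=0.5)
strict = build_network(scores, threshold=0.7)
print(similarity_path(relaxed, "A1", "B2"))
print(similarity_path(strict, "A1", "B2"))
```

At the relaxed threshold the linker sequence joins the two subgroups into one path; at the stricter threshold the subgroups separate and no path exists, which is how threshold choice shapes subgroup boundaries in this toy example.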
Subject(s)
Enzymes/chemistry ; Enzymes/metabolism ; Eukaryota/enzymology ; Multigene Family ; Amino Acid Sequence ; Animals ; Binding Sites ; Crystallography, X-Ray ; Enzymes/genetics ; Eukaryota/chemistry ; Eukaryota/classification ; Eukaryota/genetics ; Evolution, Molecular ; Humans ; Kinetics ; Molecular Sequence Data ; Phylogeny ; Plants/chemistry ; Plants/enzymology ; Plants/genetics ; Sequence Alignment
ABSTRACT
InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.
Subject(s)
Computational Biology/methods ; Databases, Protein ; Protein Interaction Domains and Motifs ; Software ; Humans ; Molecular Sequence Annotation ; Phylogeny
ABSTRACT
We present EC-BLAST (http://www.ebi.ac.uk/thornton-srv/software/rbl/), an algorithm and Web tool for quantitative similarity searches between enzyme reactions at three levels: bond change, reaction center and reaction structure similarity. It uses bond changes and reaction patterns for all known biochemical reactions derived from atom-atom mapping across each reaction. EC-BLAST has the potential to improve enzyme classification, identify previously uncharacterized or new biochemical transformations, improve the assignment of enzyme function to sequences, and assist in enzyme engineering.
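To make the notion of bond-change similarity concrete, the sketch below compares two reactions by the bonds they form and cleave. The abstract does not state how EC-BLAST scores similarity, so the Tanimoto coefficient is used here purely as an assumption, and the bond-change labels and both example reactions are hypothetical.

```python
from collections import Counter

def tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto coefficient on two multisets of bond changes."""
    shared = sum((a & b).values())  # element-wise minimum of the counts
    union = sum((a | b).values())   # element-wise maximum of the counts
    return shared / union if union else 1.0

# Hypothetical bond-change fingerprints for two reactions.
ester_hydrolysis = Counter({
    "C-O cleaved": 1, "O-H cleaved": 1,
    "C-O formed": 1, "O-H formed": 1,
})
amide_formation = Counter({
    "C-O cleaved": 1, "N-H cleaved": 1,
    "C-N formed": 1, "O-H formed": 1,
})

print(tanimoto(ester_hydrolysis, amide_formation))
```

The same function could in principle be applied to fingerprints of reaction centres or whole reaction structures, the other two levels named in the abstract, provided those are also encoded as counted features.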
Subject(s)
Algorithms ; Databases, Protein ; Enzymes/chemistry ; Enzymes/metabolism ; Software ; Animals ; Biochemical Phenomena ; Catalysis ; Enzymes/classification ; Humans ; Internet
ABSTRACT
Extracting chemical features such as Atom-Atom Mapping (AAM), Bond Changes (BCs) and Reaction Centres from biochemical reactions helps us understand the chemical composition of enzymatic reactions. Reaction Decoder is a robust command-line tool that performs this task with high accuracy. It supports standard chemical input/output exchange formats (i.e. RXN/SMILES), computes AAM, highlights BCs and creates images of the mapped reaction. This aids the analysis of metabolic pathways and enables comparative studies of chemical reactions based on these features. AVAILABILITY AND IMPLEMENTATION: This software is implemented in Java, supported on Windows, Linux and Mac OSX, and freely available at https://github.com/asad/ReactionDecoder. CONTACT: asad@ebi.ac.uk or s9asad@gmail.com.
Subject(s)
Biochemistry/methods ; Computational Biology/methods ; Metabolic Networks and Pathways ; Software ; Data Mining ; Databases, Chemical
ABSTRACT
Understanding which residues in an enzyme are catalytic and what function they perform is crucial to many biology studies, particularly those leading to new therapeutics and enzyme design. The original version of the Catalytic Site Atlas (CSA) (http://www.ebi.ac.uk/thornton-srv/databases/CSA), published in 2004, which catalogs the residues involved in enzyme catalysis in experimentally determined protein structures, had only 177 curated entries and employed a simplistic approach to expanding these annotations to homologous enzyme structures. Here we present a new version of the CSA (CSA 2.0), which greatly expands the number of both curated (968) and automatically annotated catalytic sites in enzyme structures, utilizing a new method for annotation transfer. The curated entries are used, along with the variation in residue type from the sequence comparison, to generate 3D templates of the catalytic sites, which in turn can be used to find catalytic sites in new structures. To ease the transfer of CSA annotations to other resources, a new ontology has been developed: the Enzyme Mechanism Ontology, which has permitted the transfer of annotations to Mechanism, Annotation and Classification in Enzymes (MACiE) and UniProt Knowledge Base (UniProtKB) resources. The CSA database schema has been re-designed and both the CSA data and search capabilities are presented in a new modern web interface.
Subject(s)
Catalytic Domain ; Databases, Protein ; Enzymes/chemistry ; Biological Ontologies ; Internet ; Sequence Analysis, Protein
ABSTRACT
The Structure-Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure-function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies 'look alike', making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.
Subject(s)
Databases, Protein ; Enzymes/chemistry ; Enzymes/classification ; Enzymes/metabolism ; Internet ; Molecular Sequence Annotation ; Sequence Alignment ; Structure-Activity Relationship
ABSTRACT
HydE and HydG are radical S-adenosyl-l-methionine enzymes required for the maturation of [FeFe]-hydrogenase (HydA) and produce the nonprotein organic ligands characteristic of its unique catalytic cluster. The catalytic cluster of HydA (the H-cluster) is a typical [4Fe-4S] cubane bridged to a 2Fe-subcluster that contains two carbon monoxides, three cyanides, and a bridging dithiomethylamine as ligands. While recent studies have shed light on the nature of diatomic ligand biosynthesis by HydG, little information exists on the function of HydE. Herein, we present biochemical, spectroscopic, bioinformatic, and molecular modeling data that together map the active site and provide significant insight into the role of HydE in H-cluster biosynthesis. Electron paramagnetic resonance and UV-visible spectroscopic studies demonstrate that reconstituted HydE binds two [4Fe-4S] clusters and copurifies with S-adenosyl-l-methionine. Incorporation of deuterium from D2O into 5'-deoxyadenosine, the cleavage product of S-adenosyl-l-methionine, coupled with molecular docking experiments suggests that the HydE substrate contains a thiol functional group. This information, along with HydE sequence similarity and genome context networks, has allowed us to redefine the presumed mechanism for HydE away from BioB-like sulfur insertion chemistry; these data collectively suggest that the sulfur atoms in the dithiomethylamine bridge of the H-cluster are likely derived from HydE's thiol-containing substrate.
Subject(s)
Clostridium acetobutylicum/enzymology ; Dimethylamines/metabolism ; Hydrogenase/metabolism ; Iron-Sulfur Proteins/metabolism ; Protein Processing, Post-Translational ; Sulfur/metabolism ; Catalysis ; Catalytic Domain ; Deoxyadenosines/chemistry ; Deoxyadenosines/metabolism ; Deuterium/chemistry ; Electron Spin Resonance Spectroscopy ; Hydrogenase/chemistry ; Iron/metabolism ; Iron-Sulfur Proteins/chemistry ; Molecular Docking Simulation ; Spectrophotometry, Ultraviolet ; Sulfur/chemistry
ABSTRACT
As the volume of data relating to proteins increases, researchers rely more and more on the analysis of published data, thus increasing the importance of good access to these data that vary from the supplemental material of individual articles, all the way to major reference databases with professional staff and long-term funding. Specialist protein resources fill an important middle ground, providing interactive web interfaces to their databases for a focused topic or family of proteins, using specialized approaches that are not feasible in the major reference databases. Many are labors of love, run by a single lab with little or no dedicated funding and there are many challenges to building and maintaining them. This perspective arose from a meeting of several specialist protein resources and major reference databases held at the Wellcome Trust Genome Campus (Cambridge, UK) on August 11 and 12, 2014. During this meeting some common key challenges involved in creating and maintaining such resources were discussed, along with various approaches to address them. In laying out these challenges, we aim to inform users about how these issues impact our resources and illustrate ways in which our working together could enhance their accuracy, currency, and overall value.
Subject(s)
Databases, Protein/standards ; Molecular Sequence Annotation ; Proteins ; Data Curation
ABSTRACT
The availability of comprehensive information about enzymes plays an important role in answering questions relevant to interdisciplinary fields such as biochemistry, enzymology, biofuels, bioengineering and drug discovery. At the EMBL European Bioinformatics Institute, we have developed an enzyme portal (http://www.ebi.ac.uk/enzymeportal) to provide this wealth of information on enzymes from multiple in-house resources addressing particular data classes: protein sequence and structure, reactions, pathways and small molecules. The fact that these data reside in separate databases makes information discovery cumbersome. The main goal of the portal is to simplify this process for end users.
Subject(s)
Databases, Protein ; Enzymes/chemistry ; Enzymes/metabolism ; Disease ; Enzymes/genetics ; Internet ; Protein Conformation ; User-Computer Interface
ABSTRACT
MACiE (which stands for Mechanism, Annotation and Classification in Enzymes) is a database of enzyme reaction mechanisms, and can be accessed from http://www.ebi.ac.uk/thornton-srv/databases/MACiE/. This article presents the release of Version 3 of MACiE, which not only extends the dataset to 335 entries, covering 182 of the EC sub-subclasses with a crystal structure available (~90%), but also incorporates greater chemical and structural detail. This version of MACiE represents a shift in emphasis for new entries, from non-homologous representatives covering EC reaction space to enzymes with mechanisms of interest to our users and collaborators with a view to exploring the chemical diversity of life. We present new tools for exploring the data in MACiE and comparing entries as well as new analyses of the data and new searches, many of which can now be accessed via dedicated Perl scripts.
Subject(s)
Databases, Protein ; Enzymes/chemistry ; Biocatalysis ; Biochemical Phenomena ; Catalytic Domain ; Coenzymes/chemistry ; Enzymes/classification ; Internet ; Molecular Sequence Annotation
ABSTRACT
FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. Gathering together this range of data into a single resource allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily as well as providing a means to analyse trends across many superfamilies. This is done not only within the context of an enzyme's sequence and structure but also the relationships of their reactions. Developed in tandem with the CATH database, it currently comprises 276 superfamilies covering ~1800 (70%) of sequence assigned enzyme reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple sequence alignments using both domain structural alignments supplemented with domain sequences and whole sequence alignments based on commonality of multi-domain architectures. These trees are decorated with functional annotations such as metabolite similarity as well as annotations from manually curated resources such as the Catalytic Site Atlas and MACiE for enzyme mechanisms. The resource is freely available through a web interface: www.ebi.ac.uk/thornton-srv/databases/FunTree.
Subject(s)
Databases, Protein ; Enzymes/chemistry ; Enzymes/classification ; Biological Evolution ; Enzymes/metabolism ; Phylogeny ; Protein Structure, Tertiary ; Sequence Alignment ; Sequence Analysis, Protein
ABSTRACT
Precision medicine aims at tailoring treatments to individual patients' characteristics. In this regard, recognizing the significance of sex and gender becomes indispensable for meeting the distinct healthcare needs of diverse populations. To this end, continuing a trend of improving data quality observed since 2014, the European Genome-phenome Archive (EGA) established a policy in 2018 that requires data providers to declare the sex of donor samples, aiming to enhance data accuracy and prevent imbalance in sex classification. We analyzed sex classification imbalance in human data from EGA and its U.S. counterpart, the database of Genotypes and Phenotypes (dbGaP). Our findings show a significant decrease in samples classified as unknown in EGA, potentially promoting better sex reporting during data collection. Based on our findings, we raise awareness of sample imbalance problems and provide a list of recommendations for enhancing biomedical research practices.
ABSTRACT
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis, we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates, on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly, a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life.
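A matrix of exchanges between primary E.C. classes can be tallied from parent-to-child function changes along a tree. The sketch below is a minimal illustration under stated assumptions: the edge list is hypothetical, the function names are invented for this example, and the authors' actual procedure for inferring changes from the phylogenetic trees is not described in the abstract.

```python
from collections import Counter

def primary_class(ec: str) -> int:
    """Return the first (primary) level of an E.C. number such as '3.1.1.1'."""
    return int(ec.split(".")[0])

def exchange_matrix(edges, n_classes=6):
    """Count function changes between primary E.C. classes.

    edges: (parent_ec, child_ec) pairs, one per tree branch.
    n_classes: assumed to be 6, the number of primary classes in use
    when this study was published; pass 7 to include translocases.
    """
    counts = Counter()
    for parent_ec, child_ec in edges:
        if parent_ec != child_ec:  # only branches where the function changed
            counts[(primary_class(parent_ec), primary_class(child_ec))] += 1
    classes = range(1, n_classes + 1)
    return [[counts[(i, j)] for j in classes] for i in classes]

# Hypothetical branches of a superfamily tree.
edges = [
    ("3.1.1.1", "3.1.1.2"),   # change within class 3
    ("3.1.1.1", "4.2.1.1"),   # class 3 to class 4
    ("5.3.1.1", "4.1.2.13"),  # class 5 to class 4
    ("2.7.1.1", "2.7.1.2"),   # change within class 2
    ("1.1.1.1", "1.1.1.1"),   # no change, not counted
]

matrix = exchange_matrix(edges)
for row in matrix:
    print(row)
```

Row i, column j holds the number of observed changes from class i to class j, so diagonal entries count changes that stay within one primary class (typically a change of substrate or sub-subclass) and off-diagonal entries count changes in overall chemistry.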
Subject(s)
Enzymes/chemistry ; Enzymes/physiology ; Evolution, Molecular ; Sequence Analysis, Protein/methods ; Amino Acid Sequence ; Molecular Sequence Data ; Structure-Activity Relationship
ABSTRACT
MOTIVATION: Organic enzyme cofactors are involved in many enzyme reactions. Therefore, the analysis of cofactors is crucial to gain a better understanding of enzyme catalysis. To aid this, we have created the CoFactor database. RESULTS: CoFactor provides a web interface to access hand-curated data extracted from the literature on organic enzyme cofactors in biocatalysis, as well as automatically collected information. CoFactor includes information on the conformational and solvent accessibility variation of the enzyme-bound cofactors, as well as mechanistic and structural information about the hosting enzymes. AVAILABILITY: The database is publicly available and can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/CoFactor.
Subject(s)
Coenzymes/chemistry ; Databases, Factual ; Biocatalysis ; Catalysis ; Coenzymes/metabolism ; Enzymes/chemistry ; Enzymes/metabolism ; Internet ; Protein Conformation
ABSTRACT
SUMMARY: Metal-MACiE is a new publicly available web-based database, held in MySQL, which aims to organize the available information on the properties and the roles of metals in the context of the catalytic mechanisms of metalloenzymes. Metal-MACiE, which currently covers 75% of metal-dependent enzyme commission (EC) sub-sub-classes and is continuously growing, exploits the existing MACiE database for the annotation of the reaction mechanisms. The two databases constitute complementary sources of information for enzymology, biochemistry and molecular pharmacology studies. AVAILABILITY: http://www.ebi.ac.uk/thornton-srv/databases/Metal_MACiE/home.html.
Subject(s)
Computational Biology/methods ; Metals/chemistry ; Software ; Biocatalysis ; Databases, Factual ; Databases, Protein ; Metalloproteins/chemistry ; Metalloproteins/metabolism ; Metals/metabolism
ABSTRACT
Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps, first by similarity-based assignment of newly obtained sequences to homologous groups, followed by transferring to them the known functions of similar biochemically characterized homologs. Due to the fundamental differences in their etiologies and practice, how these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that, except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number of uncharacterized homologs to which they are related is problematic.
Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure-Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).
Subject(s)
Computational Biology/methods ; Databases, Protein ; Enzymes ; Enzymes/chemistry ; Enzymes/classification ; Enzymes/physiology ; Molecular Sequence Annotation ; Structure-Activity Relationship
ABSTRACT
MACiE (Mechanism, Annotation and Classification in Enzymes) is a database of enzyme reaction mechanisms, and is publicly available as a web-based data resource. This paper presents the first release of a web-based search tool to explore enzyme reaction mechanisms in MACiE. We also present Version 2 of MACiE, which doubles the dataset available (from Version 1). MACiE can be accessed from http://www.ebi.ac.uk/thornton-srv/databases/MACiE/