Results 1 - 20 of 103
1.
Cell ; 163(3): 535-7, 2015 Oct 22.
Article in English | MEDLINE | ID: mdl-26496596

ABSTRACT

Using mutation libraries and deep sequencing, Aakre et al. study the evolution of protein-protein interactions using a toxin-antitoxin model. The results indicate probable trajectories via "intermediate" proteins that are promiscuous, thus avoiding transitions via non-interactions. These results extend observations about other biological interactions and enzyme evolution, suggesting broadly general principles.


Subject(s)
Evolution, Molecular , Mesorhizobium/metabolism , Protein Interaction Maps
2.
Biochemistry ; 60(22): 1776-1786, 2021 06 08.
Article in English | MEDLINE | ID: mdl-34019384

ABSTRACT

The tautomerase superfamily (TSF) is a collection of enzymes and proteins that share a simple β-α-β structural scaffold. Most members are constructed from a single-core β-α-β motif or two consecutively fused β-α-β motifs in which the N-terminal proline (Pro-1) plays a key and unusual role as a catalytic residue. The cumulative evidence suggests that a gene fusion event took place in the evolution of the TSF followed by duplication (of the newly fused gene) to result in the diversification of activity that is seen today. Analysis of the sequence similarity network (SSN) for the TSF identified several linking proteins ("linkers") whose similarity links subgroups of these contemporary proteins that might hold clues about structure-function relationship changes accompanying the emergence of new activities. A previously uncharacterized pair of linkers (designated N1 and N2) was identified in the SSN that connected the 4-oxalocrotonate tautomerase (4-OT) and cis-3-chloroacrylic acid dehalogenase (cis-CaaD) subgroups. N1, in the cis-CaaD subgroup, has the full complement of active site residues for cis-CaaD activity, whereas N2, in the 4-OT subgroup, lacks a key arginine (Arg-39) for canonical 4-OT activity. Kinetic characterization and nuclear magnetic resonance analysis show that N1 has activities observed for other characterized members of the cis-CaaD subgroup with varying degrees of efficiencies. N2 is a modest 4-OT but shows enhanced hydratase activity using allene and acetylene compounds, which might be due to the presence of Arg-8 along with Arg-11. Crystallographic analysis provides a structural context for these observations.


Subject(s)
Hydrolases/chemistry , Isomerases/chemistry , Amino Acid Sequence , Binding Sites/physiology , Catalysis , Catalytic Domain/physiology , Evolution, Molecular , Kinetics , Magnetic Resonance Spectroscopy/methods , Models, Chemical
3.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface
4.
Biochemistry ; 59(16): 1592-1603, 2020 04 28.
Article in English | MEDLINE | ID: mdl-32242662

ABSTRACT

Tautomerase superfamily (TSF) members are constructed from a single β-α-β unit or two consecutively joined β-α-β units. This pattern prevails throughout the superfamily consisting of more than 11,000 members where homo- or heterohexamers are localized in the 4-oxalocrotonate tautomerase (4-OT) subgroup and trimers are found in the other four subgroups. One exception is a subset of sequences that are double the length of the short 4-OTs in the 4-OT subgroup, where the coded proteins form trimers. Characterization of two members revealed an interesting dichotomy. One is a symmetric trimer, whereas the other is an asymmetric trimer. One monomer is flipped 180° relative to the other two monomers so that three unique protein-protein interfaces are created that are composed of different residues. A bioinformatics analysis of the fused 4-OT subset shows a further division into two clusters with a total of 133 sequences. The analysis showed that members of one cluster (86 sequences) have more salt bridges if the asymmetric trimer forms, whereas the members of the other cluster (47 sequences) have more salt bridges if the symmetric trimer forms. This hypothesis was examined by the kinetic and structural characterization of two proteins within each cluster. As predicted, all four proteins function as 4-OTs, where two assemble into asymmetric trimers (designated R7 and F6) and two form symmetric trimers (designated W0 and Q0). These findings can be extended to the other sequences in the two clusters in the fused 4-OT subset, thereby annotating their oligomer properties and activities.


Subject(s)
Bacterial Proteins/chemistry , Isomerases/chemistry , Protein Structure, Quaternary , Alcaligenaceae/enzymology , Amino Acid Sequence , Binding Sites , Bordetella/enzymology , Burkholderia/enzymology , Burkholderiaceae/enzymology , Computational Biology , Kinetics , Sequence Alignment
5.
Bioinformatics ; 35(3): 442-451, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30084920

ABSTRACT

Motivation: Critical evaluation of methods for protein function prediction shows that data integration improves the performance of methods that predict protein function, but a basic BLAST-based method is still a top contender. We sought to engineer a method that modernizes the classical approach while avoiding pitfalls common to state-of-the-art methods. Results: We present a method for predicting protein function, Effusion, which uses a sequence similarity network to add context for homology transfer, a probabilistic model to account for the uncertainty in labels and function propagation, and the structure of the Gene Ontology (GO) to best utilize sparse input labels and make consistent output predictions. Effusion's model makes it practical to integrate rare experimental data and abundant primary sequence and sequence similarity. We demonstrate Effusion's performance using a critical evaluation method and provide an in-depth analysis. We also dissect the design decisions we used to address challenges for predicting protein function. Finally, we propose directions in which the framework of the method can be modified for additional predictive power. Availability and implementation: The source code for an implementation of Effusion is freely available at https://github.com/babbittlab/effusion. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Proteins/chemistry , Software , Gene Ontology
6.
Proc Natl Acad Sci U S A ; 114(45): E9549-E9558, 2017 11 07.
Article in English | MEDLINE | ID: mdl-29078300

ABSTRACT

Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: Insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold.


Subject(s)
Nitroreductases/genetics , Computational Biology/methods , Evolution, Molecular , Flavin Mononucleotide/genetics , Models, Molecular , Phylogeny
7.
Biochemistry ; 58(22): 2617-2627, 2019 06 04.
Article in English | MEDLINE | ID: mdl-31074977

ABSTRACT

A 4-oxalocrotonate tautomerase (4-OT) trimer has been isolated from Burkholderia lata, and a kinetic, mechanistic, and structural analysis has been performed. The enzyme is the third described oligomer state for 4-OT along with a homo- and heterohexamer. The 4-OT trimer is part of a small subset of sequences (133 sequences) within the 4-OT subgroup of the tautomerase superfamily (TSF). The TSF has two distinct features: members are composed of a single β-α-β unit (homo- and heterohexamer) or two consecutively joined β-α-β units (trimer) and generally have a catalytic amino-terminal proline. The enzyme, designated as fused 4-OT, functions as a 4-OT where the active site groups (Pro-1, Arg-39, Arg-76, Phe-115, Arg-127) mirror those in the canonical 4-OT from Pseudomonas putida mt-2. Inactivation by 2-oxo-3-pentynoate suggests that Pro-1 of fused 4-OT has a low pKa enabling the prolyl nitrogen to function as a general base. A remarkable feature of the fused 4-OT is the absence of P3 rotational symmetry in the structure (1.5 Å resolution). The asymmetric arrangement of the trimer is not due to the fusion of the two β-α-β building blocks because an engineered "unfused" variant that breaks the covalent bond between the two units (to generate a heterohexamer) assumes the same asymmetric oligomerization state. It remains unknown how the different active site configurations contribute to the observed overall activities and whether the asymmetry has a biological purpose or role in the evolution of TSF members.


Subject(s)
Bacterial Proteins/chemistry , Isomerases/chemistry , Amino Acid Sequence , Bacterial Proteins/genetics , Bacterial Proteins/isolation & purification , Burkholderia/enzymology , Catalytic Domain , Fatty Acids, Unsaturated/chemistry , Isomerases/genetics , Isomerases/isolation & purification , Kinetics , Models, Chemical , Mutation , Protein Structure, Quaternary , Pseudomonas putida/enzymology , Sequence Alignment
8.
J Biol Chem ; 293(7): 2342-2357, 2018 02 16.
Article in English | MEDLINE | ID: mdl-29184004

ABSTRACT

The tautomerase superfamily (TSF) consists of more than 11,000 nonredundant sequences present throughout the biosphere. Characterized members have attracted much attention because of the unusual and key catalytic role of an N-terminal proline. These few characterized members catalyze a diverse range of chemical reactions, but the full scale of their chemical capabilities and biological functions remains unknown. To gain new insight into TSF structure-function relationships, we performed a global analysis of similarities across the entire superfamily and computed a sequence similarity network to guide classification into distinct subgroups. Our results indicate that TSF members are found in all domains of life, with most being present in bacteria. The eukaryotic members of the cis-3-chloroacrylic acid dehalogenase subgroup are limited to fungal species, whereas the macrophage migration inhibitory factor subgroup has wide eukaryotic representation (including mammals). Unexpectedly, we found that 346 TSF sequences lack Pro-1, of which 85% are present in the malonate semialdehyde decarboxylase subgroup. The computed network also enabled the identification of similarity paths, namely sequences that link functionally diverse subgroups and exhibit transitional structural features that may help explain reaction divergence. A structure-guided comparison of these linker proteins identified conserved transitions between them, and kinetic analysis paralleled these observations. Phylogenetic reconstruction of the linker set was consistent with these findings. Our results also suggest that contemporary TSF members may have evolved from a short 4-oxalocrotonate tautomerase-like ancestor followed by gene duplication and fusion. Our new linker-guided strategy can be used to enrich the discovery of sequence/structure/function transitions in other enzyme superfamilies.


Subject(s)
Enzymes/chemistry , Enzymes/metabolism , Eukaryota/enzymology , Multigene Family , Amino Acid Sequence , Animals , Binding Sites , Crystallography, X-Ray , Enzymes/genetics , Eukaryota/chemistry , Eukaryota/classification , Eukaryota/genetics , Evolution, Molecular , Humans , Kinetics , Molecular Sequence Data , Phylogeny , Plants/chemistry , Plants/enzymology , Plants/genetics , Sequence Alignment
9.
Plant Cell ; 28(10): 2632-2650, 2016 10.
Article in English | MEDLINE | ID: mdl-27650333

ABSTRACT

Marchantia polymorpha is a basal terrestrial land plant, which like most liverworts accumulates structurally diverse terpenes believed to serve in deterring disease and herbivory. Previous studies have suggested that the mevalonate and methylerythritol phosphate pathways, present in evolutionarily diverged plants, are also operative in liverworts. However, the genes and enzymes responsible for the chemical diversity of terpenes have yet to be described. In this study, we resorted to a HMMER search tool to identify 17 putative terpene synthase genes from M. polymorpha transcriptomes. Functional characterization identified four diterpene synthase genes phylogenetically related to those found in diverged plants and nine rather unusual monoterpene and sesquiterpene synthase-like genes. The presence of separate monofunctional diterpene synthases for ent-copalyl diphosphate and ent-kaurene biosynthesis is similar to orthologs found in vascular plants, pushing the date of the underlying gene duplication and neofunctionalization of the ancestral diterpene synthase gene family to >400 million years ago. By contrast, the mono- and sesquiterpene synthases represent a distinct class of enzymes, not related to previously described plant terpene synthases and only distantly so to microbial-type terpene synthases. The absence of a Mg2+ binding, aspartate-rich, DDXXD motif places these enzymes in a noncanonical family of terpene synthases.


Subject(s)
Alkyl and Aryl Transferases/metabolism , Marchantia/enzymology , Marchantia/metabolism , Alkyl and Aryl Transferases/genetics , Evolution, Molecular , Marchantia/genetics , Transcriptome/genetics
10.
Nature ; 502(7473): 698-702, 2013 Oct 31.
Article in English | MEDLINE | ID: mdl-24056934

ABSTRACT

Assigning valid functions to proteins identified in genome projects is challenging: overprediction and database annotation errors are the principal concerns. We and others are developing computation-guided strategies for functional discovery with 'metabolite docking' to experimentally derived or homology-based three-dimensional structures. Bacterial metabolic pathways often are encoded by 'genome neighbourhoods' (gene clusters and/or operons), which can provide important clues for functional assignment. We recently demonstrated the synergy of docking and pathway context by 'predicting' the intermediates in the glycolytic pathway in Escherichia coli. Metabolite docking to multiple binding proteins and enzymes in the same pathway increases the reliability of in silico predictions of substrate specificities because the pathway intermediates are structurally similar. Here we report that structure-guided approaches for predicting the substrate specificities of several enzymes encoded by a bacterial gene cluster allowed the correct prediction of the in vitro activity of a structurally characterized enzyme of unknown function (PDB 2PMQ), 2-epimerization of trans-4-hydroxy-L-proline betaine (tHyp-B) and cis-4-hydroxy-D-proline betaine (cHyp-B), and also the correct identification of the catabolic pathway in which Hyp-B 2-epimerase participates. The substrate-liganded pose predicted by virtual library screening (docking) was confirmed experimentally. The enzymatic activities in the predicted pathway were confirmed by in vitro assays and genetic analyses; the intermediates were identified by metabolomics; and repression of the genes encoding the pathway by high salt concentrations was established by transcriptomics, confirming the osmolyte role of tHyp-B. This study establishes the utility of structure-guided functional predictions to enable the discovery of new metabolic pathways.


Subject(s)
Bacteria , Enzymes/chemistry , Enzymes/genetics , Genome, Bacterial/genetics , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation/methods , Structural Homology, Protein , Bacteria/enzymology , Bacteria/genetics , Bacteria/metabolism , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Enzymes/metabolism , Gene Expression Profiling , Genes, Bacterial/genetics , Glycolysis , Kinetics , Metabolism , Metabolomics , Models, Molecular , Multigene Family/genetics , Operon , Substrate Specificity
11.
Nature ; 498(7452): 123-6, 2013 Jun 06.
Article in English | MEDLINE | ID: mdl-23676670

ABSTRACT

The identification of novel metabolites and the characterization of their biological functions are major challenges in biology. X-ray crystallography can reveal unanticipated ligands that persist through purification and crystallization. These adventitious protein-ligand complexes provide insights into new activities, pathways and regulatory mechanisms. We describe a new metabolite, carboxy-S-adenosyl-l-methionine (Cx-SAM), its biosynthetic pathway and its role in transfer RNA modification. The structure of CmoA, a member of the SAM-dependent methyltransferase superfamily, revealed a ligand consistent with Cx-SAM in the catalytic site. Mechanistic analyses showed an unprecedented role for prephenate as the carboxyl donor and the involvement of a unique ylide intermediate as the carboxyl acceptor in the CmoA-mediated conversion of SAM to Cx-SAM. A second member of the SAM-dependent methyltransferase superfamily, CmoB, recognizes Cx-SAM and acts as a carboxymethyltransferase to convert 5-hydroxyuridine into 5-oxyacetyl uridine at the wobble position of multiple tRNAs in Gram-negative bacteria, resulting in expanded codon-recognition properties. CmoA and CmoB represent the first documented synthase and transferase for Cx-SAM. These findings reveal new functional diversity in the SAM-dependent methyltransferase superfamily and expand the metabolic and biological contributions of SAM-based biochemistry. These discoveries highlight the value of structural genomics approaches in identifying ligands within the context of their physiologically relevant macromolecular binding partners, and in revealing their functions.


Subject(s)
Escherichia coli Proteins/metabolism , Methyltransferases/metabolism , One-Carbon Group Transferases/metabolism , RNA, Transfer/genetics , RNA, Transfer/metabolism , S-Adenosylmethionine/analogs & derivatives , S-Adenosylmethionine/chemistry , S-Adenosylmethionine/metabolism , Biocatalysis , Biosynthetic Pathways , Catalytic Domain , Crystallography, X-Ray , Cyclohexanecarboxylic Acids/metabolism , Cyclohexenes/metabolism , Escherichia coli/enzymology , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/genetics , Ligands , Methyltransferases/deficiency , Methyltransferases/genetics , Models, Molecular , Molecular Weight , One-Carbon Group Transferases/chemistry , Protein Multimerization , Protein Structure, Secondary , RNA, Bacterial/chemistry , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , RNA, Transfer/chemistry , S-Adenosylmethionine/biosynthesis , Uridine/analogs & derivatives , Uridine/chemistry , Uridine/metabolism
12.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Interaction Domains and Motifs , Software , Humans , Molecular Sequence Annotation , Phylogeny
13.
Biochemistry ; 57(31): 4651-4662, 2018 08 07.
Article in English | MEDLINE | ID: mdl-30052428

ABSTRACT

The rapidly expanding number of protein sequences found in public databases can improve our understanding of how protein functions evolve. However, our current knowledge of protein function likely represents a small fraction of the diverse repertoire that exists in nature. Integrative computational methods can facilitate the discovery of new protein functions and enzymatic reactions through the observation and investigation of the complex sequence-structure-function relationships within protein superfamilies. Here, we highlight the use of sequence similarity networks (SSNs) to identify previously unexplored sequence and function space. We exemplify this approach using the nitroreductase (NTR) superfamily. We demonstrate that SSN investigations can provide a rapid and effective means to classify groups of proteins, therefore exposing experimentally unexplored sequences that may exhibit novel functionality. Integration of such approaches with systematic experimental characterization will expand our understanding of the functional diversity of enzymes and their associated physiological roles.


Subject(s)
Databases, Protein , Proteins/chemistry , Amino Acid Sequence , Computational Biology/methods , Evolution, Molecular , Nitroreductases/chemistry , Nitroreductases/metabolism , Proteins/metabolism , Structure-Activity Relationship
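
The sequence similarity network idea highlighted in this abstract can be illustrated with a minimal sketch. It assumes pairwise similarity scores have already been computed by some external tool; the score values, the threshold, and the function names `build_ssn` and `clusters` are hypothetical and are not taken from the article.

```python
from collections import defaultdict


def build_ssn(pairwise_scores, threshold):
    """Build an undirected network: nodes are sequences, and an edge joins
    two sequences whose similarity score meets the (assumed) threshold."""
    graph = defaultdict(set)
    for (a, b), score in pairwise_scores.items():
        graph[a]  # touch the key so isolated sequences still appear as nodes
        graph[b]
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def clusters(graph):
    """Return connected components, a simple stand-in for candidate subgroups."""
    seen = set()
    result = []
    for node in sorted(graph):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Toy scores (invented for illustration only).
scores = {("A", "B"): 80, ("B", "C"): 75, ("C", "D"): 20, ("D", "E"): 90}
network = build_ssn(scores, threshold=50)
print(clusters(network))  # [['A', 'B', 'C'], ['D', 'E']]
```

Raising or lowering the threshold merges or splits clusters, which is why the choice of cutoff matters when such networks are used to propose subgroups.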
14.
PLoS Comput Biol ; 13(2): e1005284, 2017 02.
Article in English | MEDLINE | ID: mdl-28187133

ABSTRACT

Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.


Subject(s)
Databases, Protein , Peroxiredoxins/chemistry , Peroxiredoxins/classification , Protein Interaction Mapping/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Amino Acid Sequence , Binding Sites , Database Management Systems , Enzyme Activation , High-Throughput Screening Assays/methods , Molecular Sequence Data , Multigene Family , Peroxiredoxins/ultrastructure , Protein Binding
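
The self-identification criterion described in this abstract (a group is accepted when a search returns the group itself and nothing else) can be illustrated with a toy sketch. This is not the MISST implementation: the in-memory `toy_db`, the three-letter signatures, and the functions `signature_search` and `is_self_identifying` are hypothetical stand-ins for a real database search.

```python
# Toy database: sequence id -> placeholder functional-site signature.
# The signatures are invented and do not represent real Prx motifs.
toy_db = {"s1": "CPT", "s2": "CPT", "s3": "CPS", "s4": "CPS"}


def signature_search(group):
    """Hypothetical search: return every database entry whose signature
    matches the signature of at least one member of the group."""
    signatures = {toy_db[member] for member in group}
    return {seq_id for seq_id, sig in toy_db.items() if sig in signatures}


def is_self_identifying(group, search):
    """A group passes when the search recovers exactly its own members."""
    return set(search(group)) == set(group)


print(is_self_identifying({"s1", "s2"}, signature_search))  # True
print(is_self_identifying({"s1"}, signature_search))        # False: s2 is also hit
print(is_self_identifying({"s1", "s3"}, signature_search))  # False: all four are hit
```

In this toy setting, a group that misses members would need agglomeration, and a group that mixes two signatures would need division, mirroring the two directions the abstract describes.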
15.
Arch Biochem Biophys ; 636: 50-56, 2017 12 15.
Article in English | MEDLINE | ID: mdl-29111295

ABSTRACT

A Pseudomonas sp. UW4 protein (UniProt K9NIA5) of unknown function was identified as similar to 4-oxalocrotonate tautomerase (4-OT)-like and cis-3-chloroacrylic acid dehalogenase (cis-CaaD)-like subgroups of the tautomerase superfamily (TSF). This protein lacks only Tyr-103 of the amino acids critical for cis-CaaD activity (Pro-1, His-28, Arg-70, Arg-73, Tyr-103, Glu-114). As it may represent an important variant of these enzymes, its kinetic and structural properties have been determined. The protein shows tautomerase activity with phenylenolpyruvate, but lacks native 4-OT activity and dehalogenase activity with the isomers of 3-chloroacrylic acid. It shows mostly low-level hydratase activity at pH 7.0, converting 2-oxo-3-pentynoate to acetopyruvate, consistent with cis-CaaD-like behavior. At pH 9.0, this compound results primarily in covalent modification of Pro-1, which is consistent with 4-OT-like behavior. These observations could reflect a pKa for Pro-1 that is closer to that of cis-CaaD (∼9.2) than to 4-OT (∼6.4). A structure of the native enzyme, at 2.6 Å resolution, highlights differences at the active site from those of 4-OT and cis-CaaD that add to our understanding of how contemporary TSF reactions and mechanisms may have diverged from a common 4-OT-like ancestor.


Subject(s)
Bacterial Proteins/chemistry , Hydrolases/chemistry , Pseudomonas/enzymology , Crystallography, X-Ray , Kinetics , Protein Domains
16.
PLoS Biol ; 12(4): e1001843, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24756107

ABSTRACT

The cytosolic glutathione transferase (cytGST) superfamily comprises more than 13,000 nonredundant sequences found throughout the biosphere. Their key roles in metabolism and defense against oxidative damage have led to thousands of studies over several decades. Despite this attention, little is known about the physiological reactions they catalyze and most of the substrates used to assay cytGSTs are synthetic compounds. A deeper understanding of relationships across the superfamily could provide new clues about their functions. To establish a foundation for expanded classification of cytGSTs, we generated similarity-based subgroupings for the entire superfamily. Using the resulting sequence similarity networks, we chose targets that broadly covered unknown functions and report here experimental results confirming GST-like activity for 82 of them, along with 37 new 3D structures determined for 27 targets. These new data, along with experimentally known GST reactions and structures reported in the literature, were painted onto the networks to generate a global view of their sequence-structure-function relationships. The results show how proteins of both known and unknown function relate to each other across the entire superfamily and reveal that the great majority of cytGSTs have not been experimentally characterized or annotated by canonical class. A mapping of taxonomic classes across the superfamily indicates that many taxa are represented in each subgroup and highlights challenges for classification of superfamily sequences into functionally relevant classes. Experimental determination of disulfide bond reductase activity in many diverse subgroups illustrates a theme common for many reaction types. Finally, sequence comparison between an enzyme that catalyzes a reductive dechlorination reaction relevant to bioremediation efforts with some of its closest homologs reveals differences among them likely to be associated with evolution of this unusual reaction. Interactive versions of the networks, associated with functional and other types of information, can be downloaded from the Structure-Function Linkage Database (SFLD; http://sfld.rbvi.ucsf.edu).


Subject(s)
Glutathione Transferase/genetics , Glutathione Transferase/ultrastructure , Models, Molecular , Amino Acid Sequence , Base Sequence , Binding Sites , Computational Biology , Databases, Protein , Glutathione/chemistry , Protein Structure, Tertiary , Sequence Alignment , Structure-Activity Relationship
17.
Nucleic Acids Res ; 43(9): 4602-13, 2015 May 19.
Article in English | MEDLINE | ID: mdl-25855808

ABSTRACT

Enzyme-mediated modifications at the wobble position of tRNAs are essential for the translation of the genetic code. We report the genetic, biochemical and structural characterization of CmoB, the enzyme that recognizes the unique metabolite carboxy-S-adenosine-L-methionine (Cx-SAM) and catalyzes a carboxymethyl transfer reaction resulting in formation of 5-oxyacetyluridine at the wobble position of tRNAs. CmoB is distinctive in that it is the only known member of the SAM-dependent methyltransferase (SDMT) superfamily that utilizes a naturally occurring SAM analog as the alkyl donor to fulfill a biologically meaningful function. Biochemical and genetic studies define the in vitro and in vivo selectivity for Cx-SAM as alkyl donor over the vastly more abundant SAM. Complementary high-resolution structures of the apo- and Cx-SAM bound CmoB reveal the determinants responsible for this remarkable discrimination. Together, these studies provide mechanistic insight into the enzymatic and non-enzymatic features of this alkyl transfer reaction which affords the broadened specificity required for tRNAs to recognize multiple synonymous codons.


Subject(s)
Escherichia coli Proteins/chemistry , Methyltransferases/chemistry , RNA, Transfer/metabolism , S-Adenosylmethionine/analogs & derivatives , Binding Sites , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Ligands , Methyltransferases/genetics , Methyltransferases/metabolism , Mutation , RNA, Transfer/chemistry , S-Adenosylmethionine/chemistry , Thermodynamics
18.
BMC Bioinformatics ; 17(1): 458, 2016 Nov 11.
Article in English | MEDLINE | ID: mdl-27835946

ABSTRACT

BACKGROUND: Development of automatable processes for clustering proteins into functionally relevant groups is a critical hurdle as an increasing number of sequences are deposited into databases. Experimental function determination is exceptionally time-consuming and can't keep pace with the identification of protein sequences. A tool, DASP (Deacon Active Site Profiler), was previously developed to identify protein sequences with active site similarity to a query set. Development of two iterative, automatable methods for clustering proteins into functionally relevant groups exposed algorithmic limitations to DASP. RESULTS: The accuracy and efficiency of DASP was significantly improved through six algorithmic enhancements implemented in two stages: DASP2 and DASP3. Validation demonstrated DASP3 provides greater score separation between true positives and false positives than earlier versions. In addition, DASP3 shows similar performance to previous versions in clustering protein structures into isofunctional groups (validated against manual curation), but DASP3 gathers and clusters protein sequences into isofunctional groups more efficiently than DASP and DASP2. CONCLUSIONS: DASP algorithmic enhancements resulted in improved efficiency and accuracy of identifying proteins that contain active site features similar to those of the query set. These enhancements provide incremental improvement in structure database searches and initial sequence database searches; however, the enhancements show significant improvement in iterative sequence searches, suggesting DASP3 is an appropriate tool for the iterative processes required for clustering proteins into isofunctional groups.


Subject(s)
Algorithms , Sequence Analysis, Protein/methods , Amino Acid Motifs , Amino Acid Sequence , Catalytic Domain , Cluster Analysis , Databases, Protein , Proteins/chemistry
19.
Nucleic Acids Res ; 42(Database issue): D521-30, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24271399

ABSTRACT

The Structure-Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure-function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies 'look alike', making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.


Subject(s)
Databases, Protein , Enzymes/chemistry , Enzymes/classification , Enzymes/metabolism , Internet , Molecular Sequence Annotation , Sequence Alignment , Structure-Activity Relationship
20.
Proc Natl Acad Sci U S A ; 110(36): E3381-7, 2013 Sep 03.
Article in English | MEDLINE | ID: mdl-23959887

ABSTRACT

Although the universe of protein structures is vast, these innumerable structures can be categorized into a finite number of folds. New functions commonly evolve by elaboration of existing scaffolds, for example, via domain insertions. Thus, understanding structural diversity of a protein fold evolving via domain insertions is a fundamental challenge. The haloalkanoic dehalogenase superfamily serves as an excellent model system wherein a variable cap domain accessorizes the ubiquitous Rossmann-fold core domain. Here, we determine the impact of the cap-domain insertion on the sequence and structure divergence of the core domain. Through quantitative analysis on a unique dataset of 154 core-domain-only and cap-domain-only structures, basic principles of their evolution have been uncovered. The relationship between sequence and structure divergence of the core domain is shown to be monotonic and independent of the corresponding type of domain insert, reflecting the robustness of the Rossmann fold to mutation. However, core domains with the same cap type share greater similarity at the sequence and structure levels, suggesting interplay between the cap and core domains. Notably, results reveal that the variance in structure maps to α-helices flanking the central β-sheet and not to the domain-domain interface. Collectively, these results hint at intramolecular coevolution where the fold diverges differentially in the context of an accessory domain, a feature that might also apply to other multidomain superfamilies.


Subject(s)
Hydrolases/chemistry , Protein Structure, Secondary , Protein Structure, Tertiary , Evolution, Molecular , Genetic Variation , Hydrolases/classification , Hydrolases/genetics , Models, Molecular , Mutagenesis, Insertional , Phylogeny , Principal Component Analysis , Protein Folding