Results 1-20 of 103
1.
Biochemistry ; 60(22): 1776-1786, 2021 06 08.
Article in English | MEDLINE | ID: mdl-34019384

ABSTRACT

The tautomerase superfamily (TSF) is a collection of enzymes and proteins that share a simple β-α-β structural scaffold. Most members are constructed from a single-core β-α-β motif or two consecutively fused β-α-β motifs in which the N-terminal proline (Pro-1) plays a key and unusual role as a catalytic residue. The cumulative evidence suggests that a gene fusion event took place in the evolution of the TSF followed by duplication (of the newly fused gene) to result in the diversification of activity that is seen today. Analysis of the sequence similarity network (SSN) for the TSF identified several linking proteins ("linkers") whose similarity links subgroups of these contemporary proteins that might hold clues about structure-function relationship changes accompanying the emergence of new activities. A previously uncharacterized pair of linkers (designated N1 and N2) was identified in the SSN that connected the 4-oxalocrotonate tautomerase (4-OT) and cis-3-chloroacrylic acid dehalogenase (cis-CaaD) subgroups. N1, in the cis-CaaD subgroup, has the full complement of active site residues for cis-CaaD activity, whereas N2, in the 4-OT subgroup, lacks a key arginine (Arg-39) for canonical 4-OT activity. Kinetic characterization and nuclear magnetic resonance analysis show that N1 has activities observed for other characterized members of the cis-CaaD subgroup with varying degrees of efficiency. N2 is a modest 4-OT but shows enhanced hydratase activity using allene and acetylene compounds, which might be due to the presence of Arg-8 along with Arg-11. Crystallographic analysis provides a structural context for these observations.


Subject(s)
Hydrolases/chemistry , Isomerases/chemistry , Amino Acid Sequence , Binding Sites/physiology , Catalysis , Catalytic Domain/physiology , Molecular Evolution , Kinetics , Magnetic Resonance Spectroscopy/methods , Chemical Models
2.
Science ; 371(6533), 2021 03 05.
Article in English | MEDLINE | ID: mdl-33674467

ABSTRACT

The mechanisms that underlie the adaptation of enzyme activities and stabilities to temperature are fundamental to our understanding of molecular evolution and how enzymes work. Here, we investigate the molecular and evolutionary mechanisms of enzyme temperature adaptation, combining deep mechanistic studies with comprehensive sequence analyses of thousands of enzymes. We show that temperature adaptation in ketosteroid isomerase (KSI) arises primarily from one residue change with limited, local epistasis, and we establish the underlying physical mechanisms. This residue change occurs in diverse KSI backgrounds, suggesting parallel adaptation to temperature. We identify residues associated with organismal growth temperature across 1005 diverse bacterial enzyme families, suggesting widespread parallel adaptation to temperature. We assess the residue properties, molecular interactions, and interaction networks that appear to underlie temperature adaptation.


Subject(s)
Physiological Adaptation , Bacterial Proteins/chemistry , Molecular Evolution , Steroid Isomerases/chemistry , Amino Acid Substitution , Bacterial Proteins/genetics , Enzyme Stability , Mutation , Steroid Isomerases/genetics , Temperature
3.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32449511

ABSTRACT

Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps, first by similarity-based assignment of newly obtained sequences to homologous groups, followed by transferring to them the known functions of similar biochemically characterized homologs. Due to the fundamental differences in their etiologies and practice, how these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number of uncharacterized homologs to which they are related is problematic.
Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).


Subject(s)
Computational Biology/methods , Protein Databases , Enzymes , Enzymes/chemistry , Enzymes/classification , Enzymes/physiology , Molecular Sequence Annotation , Structure-Activity Relationship
4.
Biochemistry ; 59(16): 1592-1603, 2020 04 28.
Article in English | MEDLINE | ID: mdl-32242662

ABSTRACT

Tautomerase superfamily (TSF) members are constructed from a single β-α-β unit or two consecutively joined β-α-β units. This pattern prevails throughout the superfamily, which consists of more than 11,000 members: homo- or heterohexamers are localized in the 4-oxalocrotonate tautomerase (4-OT) subgroup, and trimers are found in the other four subgroups. One exception is a subset of sequences that are double the length of the short 4-OTs in the 4-OT subgroup, where the coded proteins form trimers. Characterization of two members revealed an interesting dichotomy. One is a symmetric trimer, whereas the other is an asymmetric trimer. In the asymmetric trimer, one monomer is flipped 180° relative to the other two monomers so that three unique protein-protein interfaces are created that are composed of different residues. A bioinformatics analysis of the fused 4-OT subset shows a further division into two clusters with a total of 133 sequences. The analysis showed that members of one cluster (86 sequences) have more salt bridges if the asymmetric trimer forms, whereas the members of the other cluster (47 sequences) have more salt bridges if the symmetric trimer forms. This prediction was tested by the kinetic and structural characterization of two proteins within each cluster. As predicted, all four proteins function as 4-OTs, where two assemble into asymmetric trimers (designated R7 and F6) and two form symmetric trimers (designated W0 and Q0). These findings can be extended to the other sequences in the two clusters in the fused 4-OT subset, thereby annotating their oligomer properties and activities.
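The cluster-level prediction above rests on comparing salt-bridge counts between the two possible trimer arrangements. A minimal sketch of that comparison follows, assuming atom coordinates are already available for a symmetric and an asymmetric model of the same sequence. The `Atom` record, the atoms treated as charged, and the 4.0 Å cutoff are illustrative assumptions (the cutoff is a common convention, not necessarily the one used in this study), and the toy coordinates are made up.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    chain: str
    resseq: int
    resname: str
    name: str
    xyz: tuple[float, float, float]


# Side-chain atoms treated as charged (an illustrative choice).
ACIDIC = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}
BASIC = {("LYS", "NZ"), ("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2")}


def interface_salt_bridges(atoms: list[Atom], cutoff: float = 4.0) -> int:
    """Count distinct acid/base residue pairs on different chains within `cutoff` angstroms."""
    acids = [a for a in atoms if (a.resname, a.name) in ACIDIC]
    bases = [a for a in atoms if (a.resname, a.name) in BASIC]
    pairs = set()
    for acid in acids:
        for base in bases:
            if acid.chain == base.chain:
                continue  # keep only protein-protein interface contacts
            if math.dist(acid.xyz, base.xyz) <= cutoff:
                # Count residue pairs, not atom pairs.
                pairs.add(((acid.chain, acid.resseq), (base.chain, base.resseq)))
    return len(pairs)


def preferred_trimer(models: dict[str, list[Atom]]) -> str:
    """Return the name of the assembly model with the most interface salt bridges."""
    return max(models, key=lambda name: interface_salt_bridges(models[name]))


# Hypothetical toy models: the Asp-Lys pair is in contact only in the asymmetric one.
symmetric = [Atom("A", 10, "ASP", "OD1", (0.0, 0.0, 0.0)),
             Atom("B", 20, "LYS", "NZ", (6.0, 0.0, 0.0))]
asymmetric = [Atom("A", 10, "ASP", "OD1", (0.0, 0.0, 0.0)),
              Atom("A", 10, "ASP", "OD2", (0.0, 1.0, 0.0)),
              Atom("B", 20, "LYS", "NZ", (3.0, 0.0, 0.0))]
print(preferred_trimer({"symmetric": symmetric, "asymmetric": asymmetric}))
```

Because pairs are collected per residue, the two carboxylate oxygens of Asp-10 contacting the same Lys count as a single salt bridge.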


Subject(s)
Bacterial Proteins/chemistry , Isomerases/chemistry , Quaternary Protein Structure , Alcaligenaceae/enzymology , Amino Acid Sequence , Binding Sites , Bordetella/enzymology , Burkholderia/enzymology , Burkholderiaceae/enzymology , Computational Biology , Kinetics , Sequence Alignment
5.
Methods Enzymol ; 620: 315-347, 2019.
Article in English | MEDLINE | ID: mdl-31072492

ABSTRACT

Integrative computational methods can facilitate the discovery of new protein functions and enzymatic reactions by enabling the observation and investigation of complex sequence-structure-function and evolutionary relationships within protein superfamilies. Here, we highlight the use of sequence similarity networks (SSNs) and phylogenetic reconstructions to map the functional divergence and evolutionary history of protein superfamilies. We exemplify this approach using the nitroreductase (NTR) flavoenzyme superfamily, demonstrating that SSN investigations can provide a rapid and effective means to classify groups of proteins, expose sequence similarity relationships across the global scale of a protein superfamily, and efficiently support detailed phylogenetic analyses. Integration of such approaches with systematic experimental characterization will expand our understanding of the functional diversity of enzymes, their evolution, and their associated physiological roles.
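In an SSN, each sequence is a node and an edge joins two sequences when their pairwise similarity (often an all-vs-all BLAST score or -log10 E-value) meets a chosen threshold; groups that stay connected at that threshold are candidate subgroups. The sketch below assumes the pairwise scores have already been computed and extracts connected components with a union-find structure. The function name `ssn_clusters` and the toy scores are hypothetical.

```python
def ssn_clusters(nodes: list[str], scores: dict[tuple[str, str], float],
                 threshold: float) -> list[set[str]]:
    """Connected components of a sequence similarity network.

    nodes: sequence identifiers.
    scores: pairwise similarity, e.g. -log10(E-value), keyed by (id1, id2).
    threshold: minimum score for an edge to be drawn.
    """
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:           # draw an edge only at or above the threshold
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    # Largest clusters first; ties broken by the smallest identifier.
    return sorted(clusters.values(), key=lambda c: (-len(c), min(c)))


# Hypothetical scores for four sequences.
scores = {("a", "b"): 40.0, ("b", "c"): 12.0, ("c", "d"): 35.0}
print(ssn_clusters(["a", "b", "c", "d"], scores, threshold=10))
print(ssn_clusters(["a", "b", "c", "d"], scores, threshold=20))
```

Raising the threshold removes the weaker b-c edge, so the single cluster breaks into two; SSNs are commonly inspected across a range of thresholds for this reason.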


Subject(s)
Computational Biology/methods , Nitroreductases/chemistry , Protein Databases , Molecular Evolution , Molecular Models , Nitroreductases/genetics , Nitroreductases/metabolism , Phylogeny , Protein Sequence Analysis
6.
Biochemistry ; 58(22): 2617-2627, 2019 06 04.
Article in English | MEDLINE | ID: mdl-31074977

ABSTRACT

A 4-oxalocrotonate tautomerase (4-OT) trimer has been isolated from Burkholderia lata, and a kinetic, mechanistic, and structural analysis has been performed. The enzyme is the third described oligomer state for 4-OT along with a homo- and heterohexamer. The 4-OT trimer is part of a small subset of sequences (133 sequences) within the 4-OT subgroup of the tautomerase superfamily (TSF). The TSF has two distinct features: members are composed of a single β-α-β unit (homo- and heterohexamer) or two consecutively joined β-α-β units (trimer) and generally have a catalytic amino-terminal proline. The enzyme, designated as fused 4-OT, functions as a 4-OT where the active site groups (Pro-1, Arg-39, Arg-76, Phe-115, Arg-127) mirror those in the canonical 4-OT from Pseudomonas putida mt-2. Inactivation by 2-oxo-3-pentynoate suggests that Pro-1 of fused 4-OT has a low pKa enabling the prolyl nitrogen to function as a general base. A remarkable feature of the fused 4-OT is the absence of P3 rotational symmetry in the structure (1.5 Å resolution). The asymmetric arrangement of the trimer is not due to the fusion of the two β-α-β building blocks because an engineered "unfused" variant that breaks the covalent bond between the two units (to generate a heterohexamer) assumes the same asymmetric oligomerization state. It remains unknown how the different active site configurations contribute to the observed overall activities and whether the asymmetry has a biological purpose or role in the evolution of TSF members.


Subject(s)
Bacterial Proteins/chemistry , Isomerases/chemistry , Amino Acid Sequence , Bacterial Proteins/genetics , Bacterial Proteins/isolation & purification , Burkholderia/enzymology , Catalytic Domain , Unsaturated Fatty Acids/chemistry , Isomerases/genetics , Isomerases/isolation & purification , Kinetics , Chemical Models , Mutation , Quaternary Protein Structure , Pseudomonas putida/enzymology , Sequence Alignment
7.
Bioinformatics ; 35(3): 442-451, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30084920

ABSTRACT

Motivation: Critical evaluation of methods for protein function prediction shows that data integration improves the performance of methods that predict protein function, but a basic BLAST-based method is still a top contender. We sought to engineer a method that modernizes the classical approach while avoiding pitfalls common to state-of-the-art methods. Results: We present a method for predicting protein function, Effusion, which uses a sequence similarity network to add context for homology transfer, a probabilistic model to account for the uncertainty in labels and function propagation, and the structure of the Gene Ontology (GO) to best utilize sparse input labels and make consistent output predictions. Effusion's model makes it practical to integrate rare experimental data and abundant primary sequence and sequence similarity. We demonstrate Effusion's performance using a critical evaluation method and provide an in-depth analysis. We also dissect the design decisions we used to address challenges for predicting protein function. Finally, we propose directions in which the framework of the method can be modified for additional predictive power. Availability and implementation: The source code for an implementation of Effusion is freely available at https://github.com/babbittlab/effusion. Supplementary information: Supplementary data are available at Bioinformatics online.
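One common way to make GO predictions consistent with the ontology is to enforce the true-path rule: a term should score at least as high as any of its descendants. The sketch below does this by propagating each term's score to all of its ancestors and keeping the maximum. It is a simplified stand-in for illustration only, not Effusion's probabilistic model; the `make_consistent` function, the `parents` mapping, and the term labels form a hypothetical toy DAG.

```python
def make_consistent(scores: dict[str, float],
                    parents: dict[str, list[str]]) -> dict[str, float]:
    """Raise each GO term's score to at least the max score of its descendants."""
    consistent = dict(scores)
    for term, score in scores.items():
        stack = list(parents.get(term, []))
        seen: set[str] = set()
        while stack:                     # visit every ancestor of `term` once
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            consistent[ancestor] = max(consistent.get(ancestor, 0.0), score)
            stack.extend(parents.get(ancestor, []))
    return consistent


# Hypothetical DAG: "C" has two parents, both of which sit under "root".
parents = {"C": ["A", "B"], "A": ["root"], "B": ["root"]}
raw = {"C": 0.9, "A": 0.2, "B": 0.95, "root": 0.5}
print(make_consistent(raw, parents))
```

Here "A" is lifted from 0.2 to 0.9 because its child "C" scores 0.9, and "root" ends at 0.95, the highest score anywhere below it.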


Subject(s)
Computational Biology , Proteins/chemistry , Software , Gene Ontology
8.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Protein Databases , Molecular Sequence Annotation , Animals , Genetic Databases , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Amino Acid Sequence Homology , Software , User-Computer Interface
9.
Methods Enzymol ; 606: 1-71, 2018.
Article in English | MEDLINE | ID: mdl-30097089

ABSTRACT

The radical SAM superfamily contains over 100,000 homologous enzymes that catalyze a remarkably broad range of reactions required for life, including metabolism, nucleic acid modification, and biogenesis of cofactors. While the highly conserved SAM-binding motif responsible for formation of the key 5'-deoxyadenosyl radical intermediate is a key structural feature that simplifies identification of superfamily members, our understanding of their structure-function relationships is complicated by the modular nature of their structures, which exhibit varied and complex domain architectures. To gain new insight about these relationships, we classified the entire set of sequences into similarity-based subgroups that could be visualized using sequence similarity networks. This superfamily-wide analysis reveals important features that had not previously been appreciated from studies focused on one or a few members. Functional information mapped to the networks indicates which members have been experimentally or structurally characterized, their known reaction types, and their phylogenetic distribution. Despite the biological importance of radical SAM chemistry, the vast majority of superfamily members have never been experimentally characterized in any way, suggesting that many new reactions remain to be discovered. In addition to 20 subgroups with at least one known function, we identified additional subgroups made up entirely of sequences of unknown function. Importantly, our results indicate that even general reaction types fail to track well with our sequence similarity-based subgroupings, raising major challenges for function prediction for currently identified and new members that continue to be discovered. Interactive similarity networks and other data from this analysis are available from the Structure-Function Linkage Database.


Subject(s)
Enzymes/classification , Free Radicals/metabolism , Protein Domains/genetics , S-Adenosylmethionine/metabolism , Amino Acid Sequence/genetics , Computational Biology , Enzymes/chemistry , Enzymes/genetics , Enzymes/metabolism , Molecular Evolution , Free Radicals/chemistry , Phylogeny , S-Adenosylmethionine/chemistry , Sequence Alignment , Structure-Activity Relationship
10.
Biochemistry ; 57(31): 4651-4662, 2018 08 07.
Article in English | MEDLINE | ID: mdl-30052428

ABSTRACT

The rapidly expanding number of protein sequences found in public databases can improve our understanding of how protein functions evolve. However, our current knowledge of protein function likely represents a small fraction of the diverse repertoire that exists in nature. Integrative computational methods can facilitate the discovery of new protein functions and enzymatic reactions through the observation and investigation of the complex sequence-structure-function relationships within protein superfamilies. Here, we highlight the use of sequence similarity networks (SSNs) to identify previously unexplored sequence and function space. We exemplify this approach using the nitroreductase (NTR) superfamily. We demonstrate that SSN investigations can provide a rapid and effective means to classify groups of proteins, therefore exposing experimentally unexplored sequences that may exhibit novel functionality. Integration of such approaches with systematic experimental characterization will expand our understanding of the functional diversity of enzymes and their associated physiological roles.


Subject(s)
Protein Databases , Proteins/chemistry , Amino Acid Sequence , Computational Biology/methods , Molecular Evolution , Nitroreductases/chemistry , Nitroreductases/metabolism , Proteins/metabolism , Structure-Activity Relationship
12.
J Biol Chem ; 293(7): 2342-2357, 2018 02 16.
Article in English | MEDLINE | ID: mdl-29184004

ABSTRACT

The tautomerase superfamily (TSF) consists of more than 11,000 nonredundant sequences present throughout the biosphere. Characterized members have attracted much attention because of the unusual and key catalytic role of an N-terminal proline. These few characterized members catalyze a diverse range of chemical reactions, but the full scale of their chemical capabilities and biological functions remains unknown. To gain new insight into TSF structure-function relationships, we performed a global analysis of similarities across the entire superfamily and computed a sequence similarity network to guide classification into distinct subgroups. Our results indicate that TSF members are found in all domains of life, with most being present in bacteria. The eukaryotic members of the cis-3-chloroacrylic acid dehalogenase subgroup are limited to fungal species, whereas the macrophage migration inhibitory factor subgroup has wide eukaryotic representation (including mammals). Unexpectedly, we found that 346 TSF sequences lack Pro-1, of which 85% are present in the malonate semialdehyde decarboxylase subgroup. The computed network also enabled the identification of similarity paths, namely sequences that link functionally diverse subgroups and exhibit transitional structural features that may help explain reaction divergence. A structure-guided comparison of these linker proteins identified conserved transitions between them, and kinetic analysis paralleled these observations. Phylogenetic reconstruction of the linker set was consistent with these findings. Our results also suggest that contemporary TSF members may have evolved from a short 4-oxalocrotonate tautomerase-like ancestor followed by gene duplication and fusion. Our new linker-guided strategy can be used to enrich the discovery of sequence/structure/function transitions in other enzyme superfamilies.
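A similarity path between two subgroups can be approximated as a shortest path in the SSN from any member of one subgroup to any member of the other; the interior nodes on that path are candidate linkers. The sketch below runs a multi-source breadth-first search over an unweighted edge list, assuming the network has already been thresholded. The function name `similarity_path` and the node labels are hypothetical.

```python
from collections import deque
from typing import Optional


def similarity_path(edges: list[tuple[str, str]], source: set[str],
                    target: set[str]) -> Optional[list[str]]:
    """Shortest path in a thresholded SSN from any source node to any target node."""
    adjacency: dict[str, set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    previous: dict[str, Optional[str]] = {n: None for n in source}
    queue = deque(sorted(source))        # multi-source BFS from the whole subgroup
    while queue:
        node = queue.popleft()
        if node in target:
            path = []
            step: Optional[str] = node
            while step is not None:      # walk back to the source subgroup
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None                          # the two subgroups are disconnected


# Hypothetical network: subgroup a*, two intermediate nodes, subgroup b*.
edges = [("a1", "a2"), ("a2", "L1"), ("L1", "L2"), ("L2", "b1"), ("b1", "b2")]
path = similarity_path(edges, {"a1", "a2"}, {"b1", "b2"})
print(path, "linkers:", path[1:-1])
```

`path[1:-1]` gives the linker sequences. When several equally short paths exist, the sorted neighbour order picks one of them, so other valid linkers may be missed.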


Subject(s)
Enzymes/chemistry , Enzymes/metabolism , Eukaryota/enzymology , Multigene Family , Amino Acid Sequence , Animals , Binding Sites , X-Ray Crystallography , Enzymes/genetics , Eukaryota/chemistry , Eukaryota/classification , Eukaryota/genetics , Molecular Evolution , Humans , Kinetics , Molecular Sequence Data , Phylogeny , Plants/chemistry , Plants/enzymology , Plants/genetics , Sequence Alignment
13.
Arch Biochem Biophys ; 636: 50-56, 2017 12 15.
Article in English | MEDLINE | ID: mdl-29111295

ABSTRACT

A Pseudomonas sp. UW4 protein (UniProt K9NIA5) of unknown function was identified as similar to 4-oxalocrotonate tautomerase (4-OT)-like and cis-3-chloroacrylic acid dehalogenase (cis-CaaD)-like subgroups of the tautomerase superfamily (TSF). This protein lacks only Tyr-103 of the amino acids critical for cis-CaaD activity (Pro-1, His-28, Arg-70, Arg-73, Tyr-103, Glu-114). As it may represent an important variant of these enzymes, its kinetic and structural properties have been determined. The protein shows tautomerase activity with phenylenolpyruvate, but lacks native 4-OT activity and dehalogenase activity with the isomers of 3-chloroacrylic acid. It shows mostly low-level hydratase activity at pH 7.0, converting 2-oxo-3-pentynoate to acetopyruvate, consistent with cis-CaaD-like behavior. At pH 9.0, this compound results primarily in covalent modification of Pro-1, which is consistent with 4-OT-like behavior. These observations could reflect a pKa for Pro-1 that is closer to that of cis-CaaD (∼9.2) than to 4-OT (∼6.4). A structure of the native enzyme, at 2.6 Å resolution, highlights differences at the active site from those of 4-OT and cis-CaaD that add to our understanding of how contemporary TSF reactions and mechanisms may have diverged from a common 4-OT-like ancestor.
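The pKa argument can be made concrete with the Henderson-Hasselbalch relation, which gives the fraction of Pro-1 present as the neutral (unprotonated) amine as 1/(1 + 10^(pKa - pH)). The sketch below treats Pro-1 as a single isolated ionizable group with the approximate pKa values quoted above. That is a simplifying assumption that ignores coupling to other active-site residues.

```python
def fraction_unprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an amine present as the neutral base."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


# Approximate Pro-1 pKa values quoted in the abstract above.
for label, pka in [("4-OT-like", 6.4), ("cis-CaaD-like", 9.2)]:
    for ph in (7.0, 9.0):
        print(f"{label} (pKa {pka}), pH {ph}: {fraction_unprotonated(pka, ph):.3f}")
```

With a pKa near 6.4, roughly 80% of Pro-1 is already neutral at pH 7.0. With a pKa near 9.2, the neutral fraction rises from under 1% at pH 7.0 to almost 40% at pH 9.0, so the pH shift matters far more in the cis-CaaD-like case.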


Subject(s)
Bacterial Proteins/chemistry , Hydrolases/chemistry , Pseudomonas/enzymology , X-Ray Crystallography , Kinetics , Protein Domains
14.
Proc Natl Acad Sci U S A ; 114(45): E9549-E9558, 2017 11 07.
Article in English | MEDLINE | ID: mdl-29078300

ABSTRACT

Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: Insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold.


Subject(s)
Nitroreductases/genetics , Computational Biology/methods , Molecular Evolution , Flavin Mononucleotide/genetics , Molecular Models , Phylogeny
16.
Database (Oxford) ; 2017(1), 2017 01 01.
Article in English | MEDLINE | ID: mdl-28365730

ABSTRACT

With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, there is a bottleneck in annotating molecular function to a sequence. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of the challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry. Also presented is the Enzyme Structure-Function Ontology (ESFO), which has been designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD and is used to guide the biocuration processes within the SFLD. Database URL: http://sfld.rbvi.ucsf.edu/.


Subject(s)
Protein Databases , Enzymes/chemistry , Enzymes/genetics , Gene Ontology , Molecular Sequence Annotation , Protein Structural Homology , Structure-Activity Relationship
17.
PLoS Comput Biol ; 13(2): e1005284, 2017 02.
Article in English | MEDLINE | ID: mdl-28187133

ABSTRACT

Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily.
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.
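The self-identification criterion can be expressed as a set comparison: a candidate group is accepted only if a search built from the group returns exactly its members. In the sketch below, the GenBank search is replaced by a hypothetical motif lookup (`motif_search`); the database, motif labels, and both functions are illustrative assumptions, not MISST's actual search.

```python
def motif_search(group: set[str], db: dict[str, str]) -> set[str]:
    """Hypothetical stand-in for a database search: return every sequence whose
    functional-site motif matches a motif found among the group's members."""
    profile = {db[seq] for seq in group}
    return {seq for seq, motif in db.items() if motif in profile}


def self_identifies(group: set[str], db: dict[str, str]) -> bool:
    """Accept the group only if its search returns the group and nothing else."""
    return motif_search(group, db) == set(group)


# Hypothetical database: sequence ID -> functional-site motif label.
db = {"s1": "motif1", "s2": "motif1", "s3": "motif1", "s4": "motif6"}
print(self_identifies({"s1", "s2", "s3"}, db))  # all motif1 sequences: accepted
print(self_identifies({"s1", "s2"}, db))        # search also finds s3: rejected
```

A rejected group is one whose search also returns non-members; in an iterative scheme like the one described, such a group would be expanded or split and then re-tested.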


Subject(s)
Protein Databases , Peroxiredoxins/chemistry , Peroxiredoxins/classification , Protein Interaction Mapping/methods , Protein Sequence Analysis/methods , Amino Acid Sequence Homology , Amino Acid Sequence , Binding Sites , Database Management Systems , Enzyme Activation , High-Throughput Screening Assays/methods , Molecular Sequence Data , Multigene Family , Peroxiredoxins/ultrastructure , Protein Binding
18.
Protein Sci ; 26(4): 677-699, 2017 04.
Article in English | MEDLINE | ID: mdl-28054422

ABSTRACT

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants: amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by the Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
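The divisive loop described above can be sketched with two pluggable steps: a `split` function that proposes sub-clusters and a `self_identifies` predicate that stands in for the DASP2 search. Both are hypothetical placeholders here; the toy predicate only checks that all members share one curated family label, and the toy splitter simply halves a sorted list.

```python
from typing import Callable, Iterable

Group = frozenset[str]


def divisive_cluster(structures: Iterable[str],
                     split: Callable[[Group], list[Group]],
                     self_identifies: Callable[[Group], bool]
                     ) -> tuple[list[Group], list[Group]]:
    """Split groups until each one self-identifies or is reduced to a singlet."""
    accepted: list[Group] = []
    singlets: list[Group] = []
    pending: list[Group] = [frozenset(structures)]
    while pending:
        group = pending.pop()
        if not group:
            continue                        # nothing to evaluate
        if len(group) == 1:
            singlets.append(group)
        elif self_identifies(group):
            accepted.append(group)          # deemed a functionally relevant group
        else:
            pending.extend(split(group))    # parts must be non-empty and smaller


    return accepted, singlets


# Hypothetical stand-ins: curated family labels and a naive halving splitter.
family = {"e1": "MR", "e2": "MR", "e3": "MLE", "e4": "MLE", "e5": "GlucD"}


def same_family(group: Group) -> bool:
    return len({family[s] for s in group}) == 1


def halve(group: Group) -> list[Group]:
    members = sorted(group)
    mid = len(members) // 2
    return [frozenset(members[:mid]), frozenset(members[mid:])]


print(divisive_cluster(family, halve, same_family))
```

The loop terminates because every split returns non-empty, strictly smaller parts. With naive halving, the MLE pair e3/e4 is separated and ends up as two singlets, which shows why a real splitter would cluster active-site profiles instead.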


Subject(s)
Protein Databases , Glutathione Transferase/chemistry , Glutathione Transferase/genetics , Phosphopyruvate Hydratase/chemistry , Phosphopyruvate Hydratase/genetics , Protein Sequence Analysis/methods
19.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subject(s)
Computational Biology/methods , Protein Databases , Protein Interaction Domains and Motifs , Software , Humans , Molecular Sequence Annotation , Phylogeny
20.
Methods Mol Biol ; 1446: 111-132, 2017.
Article in English | MEDLINE | ID: mdl-27812939

ABSTRACT

The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Everything from identifying the nearest well-annotated homologue of a protein of interest, to predicting where misannotation has occurred, to knowing how confident you can be in the annotations assigned to those proteins is critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These inferences can range from identifying misannotation or other errors in a predicted function to accurately predicting the function of an enzyme of entirely unknown function. Although GO annotation applies to any gene product, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.


Subject(s)
Enzymes/classification , Enzymes/genetics , Gene Ontology , Molecular Sequence Annotation/methods , Animals , Computational Biology/methods , Protein Databases , Enzymes/metabolism , Humans