1 - 20 of 319
1.
BMC Bioinformatics ; 25(1): 50, 2024 Jan 30.
Article En | MEDLINE | ID: mdl-38291384

BACKGROUND: Enzymes play an irreplaceable role in sustaining living organisms. The Enzyme Commission (EC) number of an enzyme indicates its essential functions. Correctly identifying the first digit (family class) of the EC number of a given enzyme has been a hot topic over the past twenty years. Several previous methods represented enzymes by their functional domain composition. However, this representation leads to the curse of dimensionality, which reduces the efficiency of these methods. Moreover, most previous methods can handle only enzymes that belong to a single family class, although several enzymes belong to two or more family classes. RESULTS: In this study, a fast and efficient multi-label classifier, named PredictEFC, was designed. To construct this classifier, a novel feature extraction scheme was developed for processing the functional domain information of enzymes; it counts the distribution of each functional domain entry across the seven family classes in the training dataset. Based on this scheme, each training or test enzyme was encoded as a 7-dimensional vector by fusing its functional domain information with these statistics. Random k-labelsets (RAKEL) was adopted to build the classifier, with random forest as the base classification algorithm. Two runs of tenfold cross-validation on the training dataset showed that the accuracy of PredictEFC reached 0.8493 and 0.8370. Independent tests on two datasets yielded accuracy values of 0.9118 and 0.8777. CONCLUSION: The performance of PredictEFC was slightly lower than that of the classifier directly using functional domain composition. However, its efficiency was sharply improved: its running time was less than one-tenth of that classifier's.
In addition, PredictEFC was superior to classifiers using traditional dimensionality reduction methods and to some previous methods, and it can be transferred to predicting the enzyme family classes of other species. Finally, a web server was set up at http://124.221.158.221/ for easy use.
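The domain-statistics encoding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: domain IDs such as "PF1" are hypothetical, and because the abstract says only that each enzyme's domains are "fused" with the per-domain class counts, summing those counts and normalizing to a distribution is our assumption. The abstract also does not say whether a training enzyme's own labels are excluded from the counts when that enzyme is encoded.

```python
from collections import defaultdict

NUM_CLASSES = 7  # EC family classes 1..7


def domain_class_counts(train):
    """train: list of (domains, classes) pairs, where domains is a set of
    domain IDs and classes is a set of EC first digits (1..7).
    Returns a dict mapping each domain ID to its 7 per-class counts."""
    counts = defaultdict(lambda: [0] * NUM_CLASSES)
    for domains, classes in train:
        for d in domains:
            for c in classes:
                counts[d][c - 1] += 1
    return dict(counts)


def encode(domains, counts):
    """Assumed fusion rule: sum the class-count vectors of the enzyme's
    domains, then normalize so the 7 entries sum to 1. Enzymes with no
    domain seen in training map to the zero vector."""
    vec = [0.0] * NUM_CLASSES
    for d in domains:
        for i, n in enumerate(counts.get(d, [0] * NUM_CLASSES)):
            vec[i] += n
    total = sum(vec)
    return [v / total for v in vec] if total else vec
```

For example, with training enzymes ({"PF1", "PF2"}, {1}) and ({"PF2"}, {2, 3}), an enzyme carrying PF1 and PF2 is encoded as [0.5, 0.25, 0.25, 0, 0, 0, 0]. The fixed 7-dimensional output, regardless of how many distinct domains exist, is what avoids the high-dimensional domain-composition representation.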


Algorithms , Enzymes , Enzymes/classification
2.
IEEE Trans Nanobioscience ; 22(4): 967-977, 2023 10.
Article En | MEDLINE | ID: mdl-37159315

In this article, a set of abstract chemical reactions has been employed to construct a novel nonlinear biomolecular controller, i.e., the Brink controller (BC) with direct positive autoregulation (DPAR), referred to as the BC-DPAR controller. Compared with dual-rail-representation-based controllers such as the quasi sliding mode (QSM) controller, the BC-DPAR controller directly reduces the number of chemical reaction networks (CRNs) required to realize an ultrasensitive input-output response, because it does not involve a subtraction module; this reduces the complexity of DNA implementations. The action mechanisms and steady-state constraints of the two nonlinear controllers, the BC-DPAR and QSM controllers, are then investigated further. Considering the mapping between CRNs and DNA implementations, a CRN-based enzymatic reaction process with delay is constructed, and a DNA strand displacement (DSD) scheme representing time delay is proposed. Compared with the QSM controller, the BC-DPAR controller reduces the number of required abstract chemical reactions and DSD reactions by 33.3% and 31.8%, respectively. Finally, an enzymatic reaction scheme with the BC-DPAR controller is designed using DSD reactions. The results show that the output substance of the enzymatic reaction process can approach the target level at a quasi-steady state under both delay-free and nonzero-delay conditions, but the target level can be sustained only for a finite period, mainly because of fuel strand depletion.


DNA , Enzymes , DNA/chemistry , Enzymes/classification
3.
J Biol Chem ; 298(10): 102435, 2022 10.
Article En | MEDLINE | ID: mdl-36041629

Natural proteins are often only slightly more stable in the native state than the denatured state, and an increase in environmental temperature can easily shift the balance toward unfolding. Therefore, the engineering of proteins to improve protein stability is an area of intensive research. Thermostable proteins are required to withstand industrial process conditions, for increased shelf-life of protein therapeutics, for developing robust 'biobricks' for synthetic biology applications, and for research purposes (e.g., structure determination). In addition, thermostability buffers the often destabilizing effects of mutations introduced to improve other properties. Rational design approaches to engineering thermostability require structural information, but even with advanced computational methods, it is challenging to predict or parameterize all the relevant structural factors with sufficient precision to anticipate the results of a given mutation. Directed evolution is an alternative when structures are unavailable but requires extensive screening of mutant libraries. Recently, however, bioinspired approaches based on phylogenetic analyses have shown great promise. Leveraging the rapid expansion in sequence data and bioinformatic tools, ancestral sequence reconstruction can generate highly stable folds for novel applications in industrial chemistry, medicine, and synthetic biology. This review provides an overview of the factors important for successful inference of thermostable proteins by ancestral sequence reconstruction and what it can reveal about the determinants of stability in proteins.


Directed Molecular Evolution , Enzymes , Protein Engineering , Proteins , Enzyme Stability , Phylogeny , Protein Engineering/methods , Protein Stability , Proteins/chemistry , Proteins/classification , Proteins/genetics , Temperature , Directed Molecular Evolution/methods , Enzymes/chemistry , Enzymes/classification , Enzymes/genetics
4.
Nucleic Acids Res ; 50(D1): D571-D577, 2022 01 07.
Article En | MEDLINE | ID: mdl-34850161

Thirty years have elapsed since the emergence of the sequence-based family classification of carbohydrate-active enzymes that became the CAZy database over 20 years ago, freely available for browsing and download at www.cazy.org. In the era of large-scale sequencing and high-throughput biology, it is important to examine the position of this specialist database, which is deeply rooted in human curation. The three primary tasks of the CAZy curators are (i) to maintain and update the family classification of this class of enzymes, (ii) to classify sequences newly released by GenBank and the Protein Data Bank and (iii) to capture and present functional information for each family. The CAZy website is updated once a month. Here we briefly summarize the increase in novel families and the annotations conducted during the last 8 years. We present several important changes that facilitate taxonomic navigation and allow the entire set of annotations to be downloaded. Most importantly, we highlight the considerable amount of work that accompanies the analysis and reporting of biochemical data from the literature.


Carbohydrates/chemistry , Databases, Nucleic Acid , Databases, Protein , Enzymes/chemistry , Carbohydrates/classification , Enzyme Activation/genetics , Enzymes/classification , Humans
5.
Drug Discov Today ; 27(1): 117-133, 2022 01.
Article En | MEDLINE | ID: mdl-34537332

Enzyme-based therapeutics (EBTs) have the potential to tap into an almost unmeasurable amount of enzyme biodiversity and treat myriad conditions. Although EBTs were some of the first biologics used clinically, the rate of development of newer EBTs has lagged behind that of other biologics. Here, we review the history of EBTs, and discuss the state of each class of EBT, their potential clinical advantages, and the unique challenges to their development. Additionally, we discuss key remaining technical barriers that, if addressed, could increase the diversity and rate of the development of EBTs.


Drug Discovery/methods , Enzyme Replacement Therapy , Enzyme Therapy , Enzymes , Drug Development/methods , Enzyme Replacement Therapy/methods , Enzyme Replacement Therapy/trends , Enzyme Therapy/methods , Enzyme Therapy/trends , Enzymes/classification , Enzymes/pharmacology , Humans
6.
Comput Biol Chem ; 94: 107558, 2021 Oct.
Article En | MEDLINE | ID: mdl-34481129

Classifying proteins into their respective enzyme classes is of interest to researchers for a variety of reasons. The openly accessible Protein Data Bank (PDB) contains more than 160,000 structures, with more added every day. This paper proposes an attention-based bidirectional LSTM model (ABLE), trained on data oversampled with SMOTE, that classifies a protein into one of the six enzyme classes or a negative class using only its primary structure, given as a FASTA sequence string. Our proposed model achieves the highest F1-score, 0.834, on a dataset of proteins from the PDB. We compare our model against eighteen baseline machine learning and deep learning networks, including CNN, LSTM, Bi-LSTM, GRU, and the state-of-the-art DeepEC model. We conduct experiments with two oversampling techniques, SMOTE and ADASYN. To corroborate the results, we perform extensive experimentation and statistical testing.
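The SMOTE oversampling step mentioned above can be sketched as follows: each synthetic sample lies on the segment between a random minority-class sample and one of its k nearest minority neighbors. This is a generic illustration assuming numeric feature vectors; the abstract does not say how sequences were encoded before oversampling, and the function name and defaults are ours. In practice a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`, `ADASYN`) would typically be used.

```python
import numpy as np


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority-class feature rows
    (shape [n, d], n >= 2) by interpolating toward random k-nearest
    minority neighbors."""
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    n = len(X)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                # random minority sample
        nb = X[rng.choice(neighbors[i])]   # one of its k nearest neighbors
        gap = rng.random()                 # position along the segment, in [0, 1)
        synthetic[j] = X[i] + gap * (nb - X[i])
    return synthetic
```

Because synthetic points are interpolations, they stay inside the region spanned by the minority class; ADASYN differs mainly in generating more samples near minority points that are harder to learn.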


Enzymes/chemistry , Machine Learning , Neural Networks, Computer , Enzymes/classification , Enzymes/metabolism
7.
PLoS Comput Biol ; 17(9): e1009446, 2021 09.
Article En | MEDLINE | ID: mdl-34555022

Only a small fraction of genes deposited in databases have been experimentally characterised. Most proteins have their function assigned automatically, which can result in erroneous annotations. The reliability of current annotations in public databases is largely unknown; experimental attempts to validate the accuracy within individual enzyme classes are lacking. In this study we surveyed functional annotations in the BRENDA enzyme database. We first applied a high-throughput experimental platform to verify functional annotations to an enzyme class of S-2-hydroxyacid oxidases (EC 1.1.3.15). We chose 122 representative sequences of the class and screened them for their predicted function. Based on the experimental results, predicted domain architecture and similarity to previously characterised S-2-hydroxyacid oxidases, we inferred that at least 78% of sequences in the enzyme class are misannotated. We experimentally confirmed four alternative activities among the misannotated sequences and showed that misannotation in the enzyme class increased over time. Finally, we performed a computational analysis of annotations to all enzyme classes in the BRENDA database, and showed that nearly 18% of all sequences are annotated to an enzyme class while sharing no similarity or domain architecture with experimentally characterised representatives. We showed that even well-studied enzyme classes of industrial relevance are affected by the problem of functional misannotation.


Alcohol Oxidoreductases/classification , Databases, Protein/statistics & numerical data , Molecular Sequence Annotation/statistics & numerical data , Alcohol Oxidoreductases/chemistry , Alcohol Oxidoreductases/genetics , Animals , Computational Biology , Enzymes/chemistry , Enzymes/classification , Enzymes/genetics , Humans , Models, Molecular , Protein Domains , Sequence Homology, Amino Acid
8.
Protein Sci ; 30(9): 1935-1945, 2021 09.
Article En | MEDLINE | ID: mdl-34118089

Enzymes are critical proteins in every organism. They speed up essential chemical reactions, help fight diseases, and are widely used in the pharmaceutical and manufacturing industries. Wet-lab experiments to determine an enzyme's function are time-consuming and expensive, so computational approaches to this problem are becoming necessary. Usually, an enzyme is highly specific in performing its function; however, some enzymes can perform multiple functions. A multi-functional enzyme has vast potential because it reduces the need to discover and use different enzymes for different functions. We propose an approach to predict a multi-functional enzyme's function down to the most specific, fourth level of the Enzyme Commission (EC) number hierarchy. Previous studies could predict the function only up to level 1. Using a dataset of 2,583 multi-functional enzymes, we achieved a hierarchical subset accuracy of 71.4% and a macro F1 score of 96.1% at the fourth level. The robustness of the network was further tested on a multi-functional isoforms dataset. Our method is broadly applicable and may be used to discover better enzymes. The web server can be freely accessed at http://hecnet.cbrlab.org/.
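One plausible reading of "hierarchical subset accuracy at level k" can be sketched as follows: truncate every true and predicted EC number to its first k digits, then count an enzyme as correct only if the two truncated sets match exactly. This definition is our assumption; the abstract does not define the metric, and the function names are hypothetical.

```python
def truncate(ec, level):
    """Keep the first `level` fields of an EC number, e.g. '1.1.1.1' -> '1.1'."""
    return ".".join(ec.split(".")[:level])


def hierarchical_subset_accuracy(true_sets, pred_sets, level):
    """Fraction of enzymes whose predicted EC set, truncated to `level`,
    exactly equals the truncated true EC set (assumed definition)."""
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    hits = 0
    for t, p in zip(true_sets, pred_sets):
        if {truncate(e, level) for e in t} == {truncate(e, level) for e in p}:
            hits += 1
    return hits / len(true_sets)
```

Under this reading, a prediction of {2.7.11.1, 3.1.3.48} for a true set {2.7.11.1, 3.1.3.16} counts as correct at level 3 but wrong at level 4. This is why subset accuracy at the fourth level can be much lower than a per-label score such as macro F1.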


Deep Learning , Enzymes/chemistry , Enzymes/classification , Biocatalysis , Datasets as Topic , Enzymes/metabolism , Structure-Activity Relationship , Terminology as Topic
9.
Genome Biol Evol ; 13(4), 2021 04 05.
Article En | MEDLINE | ID: mdl-33682003

Cobalamin is a cofactor in essential metabolic pathways in animals and one of the water-soluble vitamins. It is a complex compound synthesized solely by prokaryotes. Cobalamin dependence is scattered across the tree of life; in particular, fungi and plants were deemed devoid of cobalamin. We demonstrate that cobalamin is utilized by all non-Dikarya fungal lineages. This observation is supported by the genomic presence of both B12-dependent enzymes and cobalamin-modifying enzymes. Fungal cobalamin-dependent enzymes are highly similar to their animal homologs. Phylogenetic analyses support a scenario of vertical inheritance of cobalamin usage with several losses. Cobalamin usage was probably lost in Mucorinae and at the base of Dikarya, the group that contains most of the model organisms, which hindered the discovery of B12-dependent metabolism in fungi. Our results indicate that cobalamin dependence was a widely distributed trait, at least in Opisthokonta, across diverse microbial eukaryotes, and was likely present in the last eukaryotic common ancestor (LECA).


Fungi/enzymology , Vitamin B 12/metabolism , Enzymes/classification , Enzymes/genetics , Fungal Proteins/classification , Fungal Proteins/genetics , Fungi/classification , Fungi/genetics , Genome, Fungal , Metabolic Networks and Pathways/genetics , Phylogeny
10.
Comput Math Methods Med ; 2021: 6683051, 2021.
Article En | MEDLINE | ID: mdl-33488764

Metabolic pathways are an important type of biological pathway. They produce the essential molecules and energy that sustain living organisms. Each metabolic pathway consists of a chain of chemical reactions that require enzymes. Thus, chemicals and enzymes are the two major components of each metabolic pathway. Although many metabolic pathways have been uncovered, the metabolic pathway system is still far from complete: some chemicals or enzymes in a given pathway remain undiscovered. Besides traditional experiments to detect these hidden chemicals or enzymes, an alternative is to design efficient computational methods. In this study, we proposed a powerful multilabel classifier, called iMPTCE-Hnetwork, that uniformly assigns chemicals and enzymes to the metabolic pathway types reported in KEGG. The classifier used embedding features derived, through the network embedding algorithm Mashup, from a heterogeneous network in which chemicals and enzymes are nodes and the interactions between them are edges. The popular RAndom k-labELsets (RAKEL) algorithm was employed to construct the classifier, with a support vector machine (polynomial kernel) as the base classifier. Ten-fold cross-validation showed good performance, with accuracy above 0.800 and exact match above 0.750. Several comparisons were made to demonstrate the superiority of iMPTCE-Hnetwork.
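The RAKEL algorithm used here (and in the PredictEFC record above) can be sketched as follows. It trains m label-powerset models, each on a random subset of k labels in which every distinct 0/1 pattern is treated as one class. Each label's final decision is the mean vote of the models that cover it, compared against a threshold. This is a minimal sketch: the 1-nearest-neighbor base learner stands in for the SVM (polynomial kernel) or random forest named in the abstracts, and the class names and defaults are hypothetical.

```python
import numpy as np


class NearestNeighbor:
    """Stand-in multi-class base learner (1-NN); the papers used an SVM or random forest."""

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, float), list(y)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        d = np.linalg.norm(X[:, None, :] - self.X[None, :, :], axis=-1)
        return [self.y[i] for i in d.argmin(axis=1)]


class RAkEL:
    """RAndom k-labELsets with overlapping random label subsets."""

    def __init__(self, k=3, m=10, threshold=0.5, seed=0, base=NearestNeighbor):
        self.k, self.m, self.threshold = k, m, threshold
        self.rng, self.base = np.random.default_rng(seed), base

    def fit(self, X, Y):
        Y = np.asarray(Y, int)
        self.n_labels = Y.shape[1]
        self.models = []
        for _ in range(self.m):
            size = min(self.k, self.n_labels)
            subset = np.sort(self.rng.choice(self.n_labels, size=size, replace=False))
            # Label powerset: each distinct 0/1 pattern on the subset is one class.
            classes = [tuple(row) for row in Y[:, subset]]
            self.models.append((subset, self.base().fit(X, classes)))
        return self

    def predict(self, X):
        votes = np.zeros((len(X), self.n_labels))
        covered = np.zeros(self.n_labels)
        for subset, model in self.models:
            votes[:, subset] += np.array(model.predict(X))
            covered[subset] += 1
        # Mean vote per label; labels no model covers default to 0.
        score = np.divide(votes, covered, out=np.zeros_like(votes), where=covered > 0)
        return (score > self.threshold).astype(int)
```

Small k keeps each label-powerset problem tractable while still modeling label correlations within a subset. With k equal to the number of labels and m = 1, the method reduces to a plain label powerset.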


Algorithms , Metabolic Networks and Pathways , Classification/methods , Computational Biology , Databases, Protein , Enzymes/classification , Humans , Support Vector Machine
11.
Nucleic Acids Res ; 49(D1): D1122-D1129, 2021 01 08.
Article En | MEDLINE | ID: mdl-33068433

Inhibitors that form covalent bonds with their targets have traditionally been considered highly risky because of potential off-target effects and toxicity. However, with the clinical validation and approval of many covalent inhibitors during the past decade, the design and discovery of novel covalent inhibitors have attracted increasing attention. A large amount of scattered experimental data on covalent inhibitors has been reported, but a resource integrating this experimental information for covalent inhibitor discovery is still lacking. In this study, we present the Covalent Inhibitor Database (CovalentInDB), the largest online database providing structural information and experimental data for covalent inhibitors. CovalentInDB contains 4511 covalent inhibitors (including 68 approved drugs) with 57 different reactive warheads for 280 protein targets. Crystal structures of some of the proteins bound to a covalent inhibitor are provided to visualize the protein-ligand interactions around the binding site. Each covalent inhibitor is annotated with its structure, warhead, experimental bioactivity, physicochemical properties, etc. Moreover, CovalentInDB provides the covalent reaction mechanism and the corresponding experimental verification methods for each inhibitor towards its target. High-quality datasets can be downloaded for evaluating and developing computational methods for covalent drug design. CovalentInDB is freely accessible at http://cadd.zju.edu.cn/cidb/.


Databases, Factual , Drugs, Investigational/chemistry , Enzyme Inhibitors/chemistry , Enzymes/chemistry , Prescription Drugs/chemistry , Binding Sites , Datasets as Topic , Drugs, Investigational/classification , Drugs, Investigational/therapeutic use , Enzyme Inhibitors/therapeutic use , Enzymes/classification , Enzymes/metabolism , Humans , Internet , Molecular Docking Simulation , Prescription Drugs/classification , Prescription Drugs/therapeutic use , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Software , Thermodynamics
12.
Nucleic Acids Res ; 49(D1): D639-D643, 2021 01 08.
Article En | MEDLINE | ID: mdl-33152079

Microorganisms produce natural products that are frequently used in the development of antibacterial, antiviral, and anticancer drugs, pesticides, herbicides, or fungicides. In recent years, genome mining has evolved into a prominent method to access this potential. antiSMASH is one of the most popular tools for this task. Here, we present version 3 of the antiSMASH database, providing a means to access and query precomputed antiSMASH-5.2-detected biosynthetic gene clusters from representative, publicly available, high-quality microbial genomes via an interactive graphical user interface. In version 3, the database contains 147 517 high quality BGC regions from 388 archaeal, 25 236 bacterial and 177 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/.


Data Mining , Databases as Topic , Enzymes/classification , Biosynthetic Pathways/genetics , Multigene Family , Search Engine
13.
Anal Chem ; 93(2): 737-744, 2021 01 19.
Article En | MEDLINE | ID: mdl-33284580

Quantification of multiple disease-related microRNAs (miRNAs) is of great significance for clinical diagnosis. Based on the simultaneous multi-element detection ability of inductively coupled plasma-mass spectrometry (ICP-MS) and the good specificity of multicomponent nucleic acid enzymes (MNAzymes), a novel and simple method combining an MNAzyme amplification strategy with lanthanide labeling and ICP-MS detection was proposed for the sensitive and simultaneous detection of three miRNAs (miRNA-21, miRNA-155, and miRNA-10b). Specifically, a probe consisting of streptavidin-modified magnetic beads (SA-MBs) and three DNA substrates labeled with lanthanide tags (¹⁵⁹Tb/¹⁶⁵Ho/¹⁷⁵Lu) was constructed. In the presence of the target miRNAs, three pairs of MNAzymes were assembled, each pair hybridized with the corresponding miRNA, and the substrates on the SA-MBs were then cleaved by the activated MNAzymes, continuously releasing the fragments carrying lanthanide tags. The released lanthanide tags in the supernatant were collected after magnetic separation and analyzed by ICP-MS, enabling simultaneous quantification of multiple miRNAs. The lanthanide tag signal correlated linearly with miRNA concentration in the range of 50-1000 pmol L⁻¹ (miRNA-21) and 50-2000 pmol L⁻¹ (miRNA-155 and miRNA-10b). The limits of detection for the three miRNAs were 11-20 pmol L⁻¹, with relative standard deviations of 2.2-2.7%. The recoveries of target miRNAs in human serum and HepG-2 cells were in the range of 87.2-111% and 93.3-111%, respectively. Overall, the method is ideal for the simultaneous quantification of multiple miRNAs, with the advantages of low spectral interference, high sensitivity, good selectivity, and strong resistance to complex matrices.
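The linear calibration and detection-limit figures reported above can be illustrated with a generic sketch. It fits signal against concentration by least squares, reads unknowns back off the line, and estimates the limit of detection as three times the standard deviation of blank signals divided by the slope. The 3σ criterion is a common convention and only an assumption here, since the abstract does not state how its limits of detection were computed. The function names and numbers below are hypothetical.

```python
import numpy as np


def calibrate(conc, signal):
    """Least-squares line signal = slope * conc + intercept.
    Returns (slope, intercept, r_squared)."""
    conc, signal = np.asarray(conc, float), np.asarray(signal, float)
    slope, intercept = np.polyfit(conc, signal, 1)  # highest power first
    pred = slope * conc + intercept
    ss_res = np.sum((signal - pred) ** 2)
    ss_tot = np.sum((signal - signal.mean()) ** 2)
    return slope, intercept, 1 - ss_res / ss_tot


def limit_of_detection(blank_signals, slope):
    """Assumed 3-sigma criterion: LOD = 3 * SD(blank) / slope, in concentration units."""
    return 3 * np.std(blank_signals, ddof=1) / slope


def quantify(signal, slope, intercept):
    """Invert the calibration line to estimate a concentration from a signal."""
    return (signal - intercept) / slope
```

For instance, with blank signals of 9, 10 and 11 counts (SD = 1) and a slope of 2 counts per pmol L⁻¹, this convention gives an LOD of 1.5 pmol L⁻¹.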


Enzymes/metabolism , Lanthanoid Series Elements/chemistry , Mass Spectrometry/methods , MicroRNAs/chemistry , Catalysis , Chelating Agents , DNA Probes , Enzymes/classification , Humans , Magnesium , Organometallic Compounds
14.
Nucleic Acids Res ; 49(D1): D1233-D1243, 2021 01 08.
Article En | MEDLINE | ID: mdl-33045737

Drug-metabolizing enzymes (DMEs) are critical determinants of drug safety and efficacy, and the interactome of DMEs has attracted extensive attention. There are 3 major interaction types in an interactome: microbiome-DME interaction (MICBIO), xenobiotics-DME interaction (XEOTIC) and host protein-DME interaction (HOSPPI). The interaction data of each type are essential for drug metabolism, and considering multiple types together has implications for the future practice of precision medicine. However, no database had been designed to systematically provide data on all types of DME interactions. Here, a database of the Interactome of Drug-Metabolizing Enzymes (INTEDE) was therefore constructed to offer these interaction data. First, 1047 unique DMEs (448 host and 599 microbial) were confirmed, for the first time, using the drugs they metabolize. Second, for these newly confirmed DMEs, all types of their interactions (3359 MICBIOs between 225 microbial species and 185 DMEs; 47 778 XEOTICs between 4150 xenobiotics and 501 DMEs; 7849 HOSPPIs between 565 human proteins and 566 DMEs) were comprehensively collected and provided, enabling crosstalk analysis among multiple types. Because of the huge amount of accumulated data, INTEDE makes it possible to generalize key features for revealing disease etiology and optimizing clinical treatment. INTEDE is freely accessible at: https://idrblab.org/intede/.


Databases, Factual , Drugs, Investigational/metabolism , Enzymes/metabolism , Inactivation, Metabolic/genetics , Prescription Drugs/metabolism , Protein Processing, Post-Translational , Xenobiotics/metabolism , Bacteria/enzymology , DNA Methylation , Enzymes/classification , Fungi/enzymology , Histones/genetics , Histones/metabolism , Humans , Internet , Metabolic Clearance Rate , Microbiota/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Software
15.
Stud Hist Philos Sci ; 84: 37-45, 2020 12.
Article En | MEDLINE | ID: mdl-33218464

This paper investigates the case of enzyme classification to evaluate different ideals for regulating values in science. I show that epistemic and non-epistemic considerations are inevitably and untraceably entangled in enzyme classification, and argue that this has significant implications for the two main kinds of views on values in science, namely, Epistemic Priority Views and Joint Satisfaction Views. More precisely, I argue that the case of enzyme classification poses a problem for the usability and descriptive accuracy of these two views. The paper ends by suggesting that these two views provide different but complementary perspectives, and that both are useful for evaluating values in science.


Enzymes/classification
16.
Biomed Res Int ; 2020: 9235920, 2020.
Article En | MEDLINE | ID: mdl-32596396

Enzymes are proteins that can efficiently catalyze specific biochemical reactions, and they are widely present in the human body. Developing an efficient method to identify human enzymes is vital to select enzymes from the vast number of human proteins and to investigate their functions. Nevertheless, only a limited amount of research has been conducted on the classification of human enzymes and nonenzymes. In this work, we developed a support vector machine- (SVM-) based predictor to classify human enzymes using the amino acid composition (AAC), the composition of k-spaced amino acid pairs (CKSAAP), and selected informative amino acid pairs through the use of a feature selection technique. A training dataset including 1117 human enzymes and 2099 nonenzymes and a test dataset including 684 human enzymes and 1270 nonenzymes were constructed to train and test the proposed model. The results of jackknife cross-validation showed that the overall accuracy was 76.46% for the training set and 76.21% for the test set, which are higher than the 72.6% achieved in previous research. Furthermore, various feature extraction methods and mainstream classifiers were compared in this task, and informative feature parameters of k-spaced amino acid pairs were selected and compared. The results suggest that our classifier can be used in human enzyme identification effectively and efficiently and can help to understand their functions and develop new drugs.
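The two sequence encodings named above can be sketched as follows. AAC gives the 20 relative residue frequencies. CKSAAP counts, for each gap k, the ordered residue pairs (seq[i], seq[i+k+1]) over the 400 possible pairs, normalized by the number of pairs counted. This is a generic illustration: the maximum gap `k_max` used in the paper is not given in the abstract (the default here is hypothetical), and skipping pairs that contain nonstandard residues is our choice.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: 20 relative frequencies of standard residues."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def cksaap(seq, k_max=3):
    """Composition of k-spaced amino acid pairs for k = 0..k_max:
    400 normalized pair frequencies per gap, concatenated."""
    seq = seq.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing nonstandard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features
```

The resulting vector has 20 + 400 × (k_max + 1) dimensions when both encodings are concatenated, which is why the abstract pairs CKSAAP with a feature selection step before training the SVM.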


Amino Acids/chemistry , Enzymes/chemistry , Proteins/chemistry , Algorithms , Computational Biology , Databases, Protein , Enzymes/classification , Humans , Proteins/classification , Support Vector Machine
17.
Nucleic Acids Res ; 48(W1): W110-W115, 2020 07 02.
Article En | MEDLINE | ID: mdl-32406917

The CUPP platform includes a web server for functional annotation and sub-grouping of carbohydrate active enzymes (CAZymes) based on a novel peptide-based similarity assessment algorithm, i.e. protein grouping according to Conserved Unique Peptide Patterns (CUPP). This online platform is open to all users and there is no login requirement. The web server allows the user to perform genome-based annotation of carbohydrate active enzymes to CAZy families, CAZy subfamilies, CUPP groups and EC numbers (function) via assessment of peptide-motifs by CUPP. The web server is intended for functional annotation assessment of the CAZy inventory of prokaryotic and eukaryotic organisms from genomic DNA (up to 30 MB compressed) or directly from amino acid sequences (up to 10 MB compressed). The custom query sequences are assessed using the CUPP annotation algorithm, and the outcome is displayed in interactive summary result pages of CAZymes. The results displayed allow for inspection of members of the individual CUPP groups and include information about experimentally characterized members. The web server and the other resources on the CUPP platform can be accessed from https://cupp.info.


Carbohydrate Metabolism , Enzymes/chemistry , Enzymes/genetics , Molecular Sequence Annotation , Software , Algorithms , Enzymes/classification , Enzymes/metabolism , Internet , Peptides/chemistry , Sequence Analysis, DNA , Sequence Analysis, Protein
18.
Database (Oxford) ; 2020, 2020 01 01.
Article En | MEDLINE | ID: mdl-32449511

Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps: first, similarity-based assignment of newly obtained sequences to homologous groups, and second, transfer to them of the known functions of similar biochemically characterized homologs. Because of the fundamental differences in their origins and practice, how these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that, except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number of uncharacterized homologs to which they are related is problematic.
Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).


Computational Biology/methods , Databases, Protein , Enzymes , Enzymes/chemistry , Enzymes/classification , Enzymes/physiology , Molecular Sequence Annotation , Structure-Activity Relationship
19.
Proc Natl Acad Sci U S A ; 117(10): 5310-5318, 2020 03 10.
Article En | MEDLINE | ID: mdl-32079722

The ubiquity of phospho-ligands suggests that phosphate binding emerged at the earliest stage of protein evolution. To evaluate this hypothesis and unravel its details, we identified all phosphate-binding protein lineages in the Evolutionary Classification of Protein Domains database. We found at least 250 independent evolutionary lineages that bind small-molecule cofactors and metabolites with phosphate moieties. For many lineages, phosphate binding emerged later as a niche functionality, but for the oldest protein lineages, phosphate binding was the founding function. Across some 4 billion years of protein evolution, side-chain binding, in which the phosphate moiety does not interact with the backbone at all, emerged most frequently. However, in the oldest lineages, and most characteristically in αβα sandwich enzyme domains, N-helix binding sites dominate, where the phosphate moiety sits atop the N terminus of an α-helix. This discrepancy is explained by the observation that N-helix binding is uniquely realized by short, contiguous sequences with reduced amino acid diversity, foremost Gly, Ser, and Thr. The latter two amino acids preferentially interact with both the backbone amide and the side-chain hydroxyl (bidentate interaction) to promote binding by short sequences. We conclude that the first αβα sandwich domains emerged from shorter and simpler polypeptides that bound phospho-ligands via N-helix sites.


Enzymes/chemistry , Enzymes/classification , Evolution, Molecular , Phosphate-Binding Proteins/chemistry , Phosphate-Binding Proteins/classification , Amino Acid Sequence , Binding Sites , Databases, Protein , Ligands , Protein Binding , Protein Domains
20.
Int J Mol Sci ; 20(21), 2019 Oct 29.
Article En | MEDLINE | ID: mdl-31671806

The Enzyme Commission (EC) number is a numerical classification scheme for enzymes, based on the chemical reactions they catalyze. This classification follows the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. Six enzyme classes were recognised in the first Enzyme Classification and Nomenclature List, reported by the International Union of Biochemistry in 1961. However, a new enzyme class was recently added because the six existing EC classes could not describe enzymes involved in the movement of ions or molecules across membranes. Such enzymes are now classified in the new EC class of translocases (EC 7). Several computational methods have been developed to predict the EC number. However, because of this change, all such methods are now outdated and need updating. In this work, we developed a new multi-task quantitative structure-activity relationship (QSAR) method aimed at predicting all 7 EC classes and subclasses. To do so, we developed an alignment-free model based on artificial neural networks that proved to be very successful.
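The four-level EC hierarchy, including the newer translocase class (EC 7), can be made concrete with a small parser. It splits an EC number into its four fields and maps the first digit to its class name. This is a sketch with hypothetical function names; it keeps fields as strings so that unassigned levels ("-") and preliminary serial numbers (such as "n1") pass through unchanged, and it deliberately rejects first digits outside 1 to 7.

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",  # added to the original six classes
}


def parse_ec(ec):
    """Split e.g. 'EC 3.1.3.16' into (class_name, [class, subclass,
    sub-subclass, serial]); raise ValueError for malformed numbers."""
    parts = ec.removeprefix("EC").strip().split(".")
    if len(parts) != 4 or parts[0] not in {str(c) for c in EC_CLASSES}:
        raise ValueError(f"not an EC number: {ec!r}")
    return EC_CLASSES[int(parts[0])], parts
```

A predictor built for the original six classes would have no output for any number that this parser maps to "Translocases", which is the gap the record above describes.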


Enzymes/chemistry , Enzymes/classification , Quantitative Structure-Activity Relationship , Algorithms , Computational Biology/methods , Databases, Factual , Enzymes/metabolism , Linear Models , Machine Learning , Nonlinear Dynamics , Peptidyl Transferases , Proteins/chemistry , Proteins/genetics , Sensitivity and Specificity
...