Results 1 - 17 of 17
1.
bioRxiv ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38586025

ABSTRACT

In eukaryotes, protein kinase signaling is regulated by a diverse array of post-translational modifications (PTMs), including phosphorylation of Ser/Thr residues and oxidation of cysteine (Cys) residues. While regulation by activation segment phosphorylation of Ser/Thr residues is well understood, relatively little is known about how oxidation of cysteine residues modulates catalysis. In this study, we investigate redox regulation of the AMPK-related Brain-selective kinases (BRSK) 1 and 2, and detail how broad catalytic activity is directly regulated through reversible oxidation and reduction of evolutionarily conserved Cys residues within the catalytic domain. We show that redox-dependent control of BRSKs is a dynamic and multilayered process involving oxidative modifications of several Cys residues, including the formation of intramolecular disulfide bonds involving a pair of Cys residues near the catalytic HRD motif and a highly conserved T-Loop Cys with a BRSK-specific Cys within an unusual CPE motif at the end of the activation segment. Consistently, mutation of the CPE-Cys increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells. Molecular modeling and molecular dynamics simulations indicate that oxidation of the CPE-Cys destabilizes a conserved salt bridge network critical for allosteric activation. The occurrence of spatially proximal Cys amino acids in diverse Ser/Thr protein kinase families suggests that disulfide-mediated control of catalytic activity may be a prevalent mechanism for regulation within the broader AMPK family.

2.
Nat Commun ; 14(1): 6804, 2023 10 26.
Article in English | MEDLINE | ID: mdl-37884510

ABSTRACT

The necroptosis pathway is a lytic, pro-inflammatory mode of cell death that is widely implicated in human disease, including renal, pulmonary, gut and skin inflammatory pathologies. The precise mechanism of the terminal steps in the pathway, where the RIPK3 kinase phosphorylates and triggers a conformation change and oligomerization of the terminal pathway effector, MLKL, is only emerging. Here, we structurally identify RIPK3-mediated phosphorylation of the human MLKL activation loop as a cue for MLKL pseudokinase domain dimerization. MLKL pseudokinase domain dimerization subsequently drives formation of elongated homotetramers. Negative stain electron microscopy and modelling support nucleation of the MLKL tetramer assembly by a central coiled coil formed by the extended, ~80 Å brace helix that connects the pseudokinase and executioner four-helix bundle domains. Mutational data assert MLKL tetramerization as an essential prerequisite step to enable the release and reorganization of four-helix bundle domains for membrane permeabilization and cell death.


Subject(s)
Protein Kinases , Receptor-Interacting Protein Serine-Threonine Kinases , Humans , Phosphorylation , Necrosis , Protein Kinases/metabolism , Dimerization , Cell Death , Receptor-Interacting Protein Serine-Threonine Kinases/metabolism , Apoptosis
3.
Elife ; 12, 2023 10 26.
Article in English | MEDLINE | ID: mdl-37883155

ABSTRACT

Catalytic signaling outputs of protein kinases are dynamically regulated by an array of structural mechanisms, including allosteric interactions mediated by intrinsically disordered segments flanking the conserved catalytic domain. The doublecortin-like kinases (DCLKs) are a family of microtubule-associated proteins characterized by a flexible C-terminal autoregulatory 'tail' segment that varies in length across the various human DCLK isoforms. However, the mechanism whereby these isoform-specific variations contribute to unique modes of autoregulation is not well understood. Here, we employ a combination of statistical sequence analysis, molecular dynamics simulations, and in vitro mutational analysis to define hallmarks of DCLK family evolutionary divergence, including analysis of splice variants within the DCLK1 sub-family, which arise through alternative codon usage and serve to 'supercharge' the inhibitory potential of the DCLK1 C-tail. We identify co-conserved motifs that readily distinguish DCLKs from all other calcium calmodulin kinases (CAMKs), and a 'Swiss Army' assembly of distinct motifs that tether the C-terminal tail to conserved ATP and substrate-binding regions of the catalytic domain to generate a scaffold for autoregulation through C-tail dynamics. Consistently, deletions and mutations that alter C-terminal tail length or interfere with co-conserved interactions within the catalytic domain alter intrinsic protein stability, nucleotide/inhibitor binding, and catalytic activity, suggesting isoform-specific regulation of activity through alternative splicing. Our studies provide a detailed framework for investigating kinome-wide regulation of catalytic output through cis-regulatory events mediated by intrinsically disordered segments, opening new avenues for the design of mechanistically divergent DCLK1 modulators, stabilizers, or degraders.


Subject(s)
Biological Evolution , Protein Serine-Threonine Kinases , Humans , Protein Isoforms/genetics , Protein Serine-Threonine Kinases/genetics , Alternative Splicing , Calcium, Dietary , Doublecortin-Like Kinases
4.
bioRxiv ; 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37034755

ABSTRACT

Catalytic signaling outputs of protein kinases are dynamically regulated by an array of structural mechanisms, including allosteric interactions mediated by intrinsically disordered segments flanking the conserved catalytic domain. The Doublecortin Like Kinases (DCLKs) are a family of microtubule-associated proteins characterized by a flexible C-terminal autoregulatory 'tail' segment that varies in length across the various human DCLK isoforms. However, the mechanism whereby these isoform-specific variations contribute to unique modes of autoregulation is not well understood. Here, we employ a combination of statistical sequence analysis, molecular dynamics simulations and in vitro mutational analysis to define hallmarks of DCLK family evolutionary divergence, including analysis of splice variants within the DCLK1 sub-family, which arise through alternative codon usage and serve to 'supercharge' the inhibitory potential of the DCLK1 C-tail. We identify co-conserved motifs that readily distinguish DCLKs from all other Calcium Calmodulin Kinases (CAMKs), and a 'Swiss-army' assembly of distinct motifs that tether the C-terminal tail to conserved ATP and substrate-binding regions of the catalytic domain to generate a scaffold for auto-regulation through C-tail dynamics. Consistently, deletions and mutations that alter C-terminal tail length or interfere with co-conserved interactions within the catalytic domain alter intrinsic protein stability, nucleotide/inhibitor-binding, and catalytic activity, suggesting isoform-specific regulation of activity through alternative splicing. Our studies provide a detailed framework for investigating kinome-wide regulation of catalytic output through cis-regulatory events mediated by intrinsically disordered segments, opening new avenues for the design of mechanistically-divergent DCLK1 modulators, stabilizers or degraders.

5.
Microbiol Spectr ; 11(3): e0325222, 2023 06 15.
Article in English | MEDLINE | ID: mdl-36995217

ABSTRACT

Pneumococcal pneumonia remains a WHO high-priority disease despite multivalent conjugate vaccines administered in clinical practice worldwide. A protein-based, serotype-independent vaccine has long promised comprehensive coverage of most clinical isolates of the pneumococcus. Along with numerous pneumococcal surface protein immunogens, the pneumococcal serine-rich repeat protein (PsrP) has been investigated as a potential vaccine target due to its surface exposure and functions toward bacterial virulence and lung infection. Three critical criteria for its vaccine potential - the clinical prevalence, serotype distribution, and sequence homology of PsrP - have yet to be well characterized. Here, we used genomes of 13,454 clinically isolated pneumococci from the Global Pneumococcal Sequencing project to investigate PsrP presence among isolates and its distribution among serotypes, and to interrogate its homology as a protein across species. These isolates represent all age groups, countries worldwide, and types of pneumococcal infection. We found PsrP present in at least 50% of all isolates across all determined serotypes and nontypeable (NT) clinical isolates. Using a combination of peptide matching and HMM profiles built on full-length and individual PsrP domains, we identified novel variants that expand PsrP diversity and prevalence. We also observed sequence variability in its basic region (BR) between isolates and serotypes. PsrP has a strong vaccine potential due to its breadth of coverage, especially in nonvaccine serotypes (NVTs) when exploiting its regions of conservation in vaccine design.
IMPORTANCE: An updated outlook on PsrP prevalence and serotype distribution sheds new light on the comprehensiveness of a PsrP-based protein vaccine. The protein is present in all vaccine serotypes and highly present in the next wave of potentially disease-causing serotypes not included in the current multivalent conjugate vaccines. Furthermore, PsrP is strongly correlated with clinical isolates associated with pneumococcal disease as opposed to pneumococcal carriage. PsrP is also highly present in strains and serotypes from Africa, where the need for a protein-based vaccine is the greatest, giving new reasoning to pursue PsrP as a protein vaccine.


Subject(s)
Pneumococcal Infections , Streptococcus pneumoniae , Humans , Vaccines, Conjugate , Prevalence , Pneumococcal Infections/epidemiology , Pneumococcal Infections/prevention & control , Pneumococcal Infections/microbiology , Pneumococcal Vaccines
7.
Brief Bioinform ; 24(1), 2023 01 19.
Article in English | MEDLINE | ID: mdl-36642409

ABSTRACT

Protein language models, trained on millions of biologically observed sequences, generate feature-rich numerical representations of protein sequences. These representations, called sequence embeddings, can infer structure-functional properties, despite protein language models being trained on primary sequence alone. While sequence embeddings have been applied toward tasks such as structure and function prediction, applications toward alignment-free sequence classification have been hindered by the lack of studies to derive, quantify and evaluate relationships between protein sequence embeddings. Here, we develop workflows and visualization methods for the classification of protein families using sequence embedding derived from protein language models. A benchmark of manifold visualization methods reveals that Neighbor Joining (NJ) embedding trees are highly effective in capturing global structure while achieving similar performance in capturing local structure compared with popular dimensionality reduction techniques such as t-SNE and UMAP. The statistical significance of hierarchical clusters on a tree is evaluated by resampling embeddings using a variational autoencoder (VAE). We demonstrate the application of our methods in the classification of two well-studied enzyme superfamilies, phosphatases and protein kinases. Our embedding-based classifications remain consistent with and extend upon previously published sequence alignment-based classifications. We also propose a new hierarchical classification for the S-Adenosyl-L-Methionine (SAM) enzyme superfamily which has been difficult to classify using traditional alignment-based approaches. Beyond applications in sequence classification, our results further suggest NJ trees are a promising general method for visualizing high-dimensional data sets.


Subject(s)
Amino Acid Sequence , Proteins , Cluster Analysis , Proteins/chemistry , Sequence Alignment
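
As a rough illustration of the Neighbor Joining (NJ) idea named in the abstract of item 7, the sketch below computes pairwise distances between embedding vectors and then picks the first pair of sequences that NJ would join, using the standard Q criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i of the distance matrix. This is not the authors' workflow: the function names are hypothetical, the use of Euclidean distance is an assumption, and a full NJ tree would repeat this selection step while merging nodes.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise distances between embedding vectors."""
    n = len(vectors)
    return [[euclidean(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def nj_first_join(dist):
    """Return ((i, j), q) for the pair minimizing the NJ Q criterion.

    Only the first selection step of Neighbor Joining is shown; ties are
    resolved in favor of the first pair encountered.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, math.inf
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q
```

In practice, `distance_matrix` would be fed per-sequence embeddings (for example, one averaged vector per protein), and the resulting matrix passed to an established tree-building library rather than to a hand-written loop.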
8.
Nat Plants ; 8(11): 1289-1303, 2022 11.
Article in English | MEDLINE | ID: mdl-36357524

ABSTRACT

Rhamnogalacturonan I (RG-I) is a major plant cell wall pectic polysaccharide defined by its repeating disaccharide backbone structure of [4)-α-D-GalA-(1,2)-α-L-Rha-(1,]. A family of RG-I:Rhamnosyltransferases (RRT) has previously been identified, but synthesis of the RG-I backbone has not been demonstrated in vitro because the identity of Rhamnogalacturonan I:Galacturonosyltransferase (RG-I:GalAT) was unknown. Here a putative glycosyltransferase, At1g28240/MUCI70, is shown to be an RG-I:GalAT. The name RGGAT1 is proposed to reflect the catalytic activity of this enzyme. When incubated together with the rhamnosyltransferase RRT4, the combined activities of RGGAT1 and RRT4 result in elongation of RG-I acceptors in vitro into a polymeric product. RGGAT1 is a member of a new GT family categorized as GT116, which does not group into existing GT-A clades and is phylogenetically distinct from the GALACTURONOSYLTRANSFERASE (GAUT) family of GalA transferases that synthesize the backbone of the pectin homogalacturonan. RGGAT1 has a predicted GT-A fold structure but employs a metal-independent catalytic mechanism that is rare among glycosyltransferases with this fold type. The identification of RGGAT1 and the 8-member Arabidopsis GT116 family provides a new avenue for studying the mechanism of RG-I synthesis and the function of RG-I in plants.


Subject(s)
Arabidopsis , Pectins , Polymerization , Pectins/metabolism , Arabidopsis/metabolism , Glycosyltransferases/metabolism , Polysaccharides/metabolism
9.
J Biol Chem ; 298(8): 102212, 2022 08.
Article in English | MEDLINE | ID: mdl-35780833

ABSTRACT

Hydrophobic cores are fundamental structural properties of proteins typically associated with protein folding and stability; however, how the hydrophobic core shapes protein evolution and function is poorly understood. Here, we investigated the role of conserved hydrophobic cores in fold-A glycosyltransferases (GT-As), a large superfamily of enzymes that catalyze formation of glycosidic linkages between diverse donor and acceptor substrates through distinct catalytic mechanisms (inverting versus retaining). Using hidden Markov models and protein structural alignments, we identify similarities in the phosphate-binding cassette (PBC) of GT-As and unrelated nucleotide-binding proteins, such as UDP-sugar pyrophosphorylases. We demonstrate that GT-As have diverged from other nucleotide-binding proteins through structural elaboration of the PBC and its unique hydrophobic tethering to the F-helix, which harbors the catalytic base (xED-Asp). While the hydrophobic tethering is conserved across diverse GT-A fold enzymes, some families, such as B3GNT2, display variations in tethering interactions and core packing. We evaluated the structural and functional impact of these core variations through experimental mutational analysis and molecular dynamics simulations and find that some of the core mutations (T336I in B3GNT2) increase catalytic efficiency by modulating the conformational occupancy of the catalytic base between the "D-in" and acceptor-accessible "D-out" conformations. Taken together, our studies support a model of evolution in which the GT-A core evolved progressively through elaboration upon an ancient PBC found in diverse nucleotide-binding proteins, and malleability of this core provided the structural framework for evolving new catalytic and substrate-binding functions in extant GT-A fold enzymes.


Subject(s)
Glycosyltransferases , Protein Folding , Glycosyltransferases/metabolism , Humans , Molecular Conformation , Molecular Dynamics Simulation , Nucleotides
10.
Mol Biol Evol ; 38(12): 5625-5639, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34515793

ABSTRACT

The emergence of multicellularity is strongly correlated with the expansion of tyrosine kinases, a conserved family of signaling enzymes that regulates pathways essential for cell-to-cell communication. Although tyrosine kinases have been classified from several model organisms, a molecular-level understanding of tyrosine kinase evolution across all holozoans is currently lacking. Using a hierarchical sequence constraint-based classification of diverse holozoan tyrosine kinases, we construct a new phylogenetic tree that identifies two ancient clades of cytoplasmic and receptor tyrosine kinases separated by the presence of an extended insert segment in the kinase domain connecting the D and E-helices. Present in nearly all receptor tyrosine kinases, this fast-evolving insertion imparts diverse functionalities, such as post-translational modification sites and regulatory interactions. Eph and EGFR receptor tyrosine kinases are two exceptions which lack this insert, each forming an independent lineage characterized by unique functional features. We also identify common constraints shared across multiple tyrosine kinase families which warrant the designation of three new subgroups: Src module (SrcM), insulin receptor kinase-like (IRKL), and fibroblast, platelet-derived, vascular, and growth factor receptors (FPVR). Subgroup-specific constraints reflect shared autoinhibitory interactions involved in kinase conformational regulation. Conservation analyses describe how diverse tyrosine kinase signaling functions arose through the addition of family-specific motifs upon subgroup-specific features and coevolving protein domains. We propose the oldest tyrosine kinases, IRKL, SrcM, and Csk, originated from unicellular premetazoans and were coopted for complex multicellular functions. The increased frequency of oncogenic variants in more recent tyrosine kinases suggests that lineage-specific functionalities are selectively altered in human cancers.


Subject(s)
Evolution, Molecular , Protein-Tyrosine Kinases , Tyrosine , Phosphorylation , Phylogeny , Protein-Tyrosine Kinases/genetics , Protein-Tyrosine Kinases/metabolism , Receptor Protein-Tyrosine Kinases/genetics , Receptor Protein-Tyrosine Kinases/metabolism , Signal Transduction , Tyrosine/metabolism
11.
BMC Bioinformatics ; 22(1): 446, 2021 Sep 18.
Article in English | MEDLINE | ID: mdl-34537014

ABSTRACT

BACKGROUND: Protein kinases are among the largest druggable families of signaling proteins, involved in various human diseases, including cancers and neurodegenerative disorders. Despite their clinical relevance, nearly 30% of the 545 human protein kinases remain highly understudied. Comparative genomics is a powerful approach for predicting and investigating the functions of understudied kinases. However, an incomplete knowledge of kinase orthologs across fully sequenced kinomes severely limits the application of comparative genomics approaches for illuminating understudied kinases. Here, we introduce KinOrtho, a query- and graph-based orthology inference method that combines full-length and domain-based approaches to map one-to-one kinase orthologs across 17 thousand species. RESULTS: Using multiple metrics, we show that KinOrtho performed better than existing methods in identifying kinase orthologs across evolutionarily divergent species and eliminated potential false positives by flagging sequences without a proper kinase domain for further evaluation. We demonstrate the advantage of using domain-based approaches for identifying domain fusion events, highlighting a case between an understudied serine/threonine kinase TAOK1 and a metabolic kinase PIK3C2A with high co-expression in human cells. We also identify evolutionary fission events involving the understudied OBSCN kinase domains, further highlighting the value of domain-based orthology inference approaches. Using KinOrtho-defined orthologs, Gene Ontology annotations, and machine learning, we propose putative biological functions of several understudied kinases, including the role of TP53RK in cell cycle checkpoint(s), the involvement of TSSK3 and TSSK6 in acrosomal vesicle localization, and potential functions for the ULK4 pseudokinase in neuronal development.
CONCLUSIONS: In sum, KinOrtho presents a novel query-based tool to identify one-to-one orthologous relationships across thousands of proteomes that can be applied to any protein family of interest. We exploit KinOrtho here to identify kinase orthologs and show that its well-curated kinome ortholog set can serve as a valuable resource for illuminating understudied kinases, and the KinOrtho framework can be extended to any protein family of interest.


Subject(s)
Biological Evolution , Genomics , Humans , Molecular Sequence Annotation , Protein Kinases/genetics , Protein Serine-Threonine Kinases , Proteins
12.
Glycobiology ; 31(11): 1472-1477, 2021 12 18.
Article in English | MEDLINE | ID: mdl-34351427

ABSTRACT

Glycosyltransferases (GTs) play a central role in sustaining all forms of life through the biosynthesis of complex carbohydrates. Despite significant strides made in recent years to establish computational resources, databases and tools to understand the nature and role of carbohydrates and related glycoenzymes, a data analytics framework that connects the sequence-structure-function relationships to the evolution of GTs is currently lacking. This hinders the characterization of understudied GTs and the synthetic design of GTs for medical and biotechnology applications. Here, we present GTXplorer as an integrated platform that presents evolutionary information of GTs adopting a GT-A fold in an intuitive format enabling in silico investigation through comparative sequence analysis to derive informed hypotheses about their function. The tree view mode provides an overview of the evolutionary relationships of GT-A families and allows users to select phylogenetically relevant families for comparisons. The selected families can then be compared in the alignment view at the residue level using annotated weblogo stacks of the GT-A core specific to the selected clade, family, or subfamily. All data are easily accessible and can be downloaded for further analysis. GTXplorer can be accessed at https://vulcan.cs.uga.edu/gtxplorer/ or from GitHub at https://github.com/esbgkannan/GTxplorer to deploy locally. By packaging multiple data streams into an accessible, user-friendly format, GTXplorer presents the first evolutionary data analytics platform for comparative glycomics.


Subject(s)
Computational Biology , Glycosyltransferases/chemistry , Biocatalysis , Carbohydrates/biosynthesis , Carbohydrates/chemistry , Glycomics , Glycosyltransferases/metabolism , Protein Folding
14.
J Biol Chem ; 297(1): 100843, 2021 07.
Article in English | MEDLINE | ID: mdl-34058199

ABSTRACT

Peters Plus Syndrome (PTRPLS OMIM #261540) is a severe congenital disorder of glycosylation where patients have multiple structural anomalies, including Peters anomaly of the eye (anterior segment dysgenesis), disproportionate short stature, brachydactyly, dysmorphic facial features, developmental delay, and variable additional abnormalities. PTRPLS patients and some Peters Plus-like (PTRPLS-like) patients (who only have a subset of PTRPLS phenotypes) have mutations in the gene encoding β1,3-glucosyltransferase (B3GLCT). B3GLCT catalyzes the transfer of glucose to O-linked fucose on thrombospondin type-1 repeats. Most B3GLCT substrate proteins belong to the ADAMTS superfamily and play critical roles in extracellular matrix. We sought to determine whether the PTRPLS or PTRPLS-like mutations abrogated B3GLCT activity. B3GLCT has two putative active sites, one in the N-terminal region and the other in the C-terminal glycosyltransferase domain. Using sequence analysis and in vitro activity assays, we demonstrated that the C-terminal domain catalyzes transfer of glucose to O-linked fucose. We also generated a homology model of B3GLCT and identified D421 as the catalytic base. PTRPLS and PTRPLS-like mutations were individually introduced into B3GLCT, and the mutated enzymes were evaluated using in vitro enzyme assays and cell-based functional assays. Our results demonstrated that PTRPLS mutations caused loss of B3GLCT enzymatic activity and/or significantly reduced protein stability. In contrast, B3GLCT with PTRPLS-like mutations retained enzymatic activity, although some showed a minor destabilizing effect. Overall, our data support the hypothesis that loss of glucose from B3GLCT substrate proteins is responsible for the defects observed in PTRPLS patients, but not for those observed in PTRPLS-like patients.


Subject(s)
Cleft Lip/enzymology , Cleft Lip/genetics , Cornea/abnormalities , Galactosyltransferases/genetics , Galactosyltransferases/metabolism , Glucosyltransferases/genetics , Glucosyltransferases/metabolism , Growth Disorders/enzymology , Growth Disorders/genetics , Limb Deformities, Congenital/enzymology , Limb Deformities, Congenital/genetics , Mutation/genetics , ADAMTS Proteins/metabolism , Amino Acid Motifs , Amino Acid Sequence , Biocatalysis , Cornea/enzymology , Enzyme Stability , Fucose/metabolism , Galactosyltransferases/chemistry , Glucose/metabolism , Glucosyltransferases/chemistry , HEK293 Cells , Humans , Kinetics , Models, Molecular , Protein Domains , Repetitive Sequences, Amino Acid , Structural Homology, Protein
15.
BMC Bioinformatics ; 21(1): 520, 2020 Nov 12.
Article in English | MEDLINE | ID: mdl-33183223

ABSTRACT

BACKGROUND: Protein kinases are a large family of druggable proteins that are genomically and proteomically altered in many human cancers. Kinase-targeted drugs are emerging as promising avenues for personalized medicine because of the differential response shown by altered kinases to drug treatment in patients and cell-based assays. However, an incomplete understanding of the relationships connecting genome, proteome and drug sensitivity profiles presents a major bottleneck in targeting kinases for personalized medicine. RESULTS: In this study, we propose a multi-component Quantitative Structure-Mutation-Activity Relationship Tests (QSMART) model and neural networks framework for providing explainable models of protein kinase inhibition and drug response ([Formula: see text]) profiles in cell lines. Using non-small cell lung cancer as a case study, we show that interaction terms that capture associations between drugs, pathways, and mutant kinases quantitatively contribute to the response of two EGFR inhibitors (afatinib and lapatinib). In particular, protein-protein interactions associated with the JNK apoptotic pathway, associations between lung development and axon extension, and interaction terms connecting drug substructures and the volume/charge of mutant residues at specific structural locations contribute significantly to the observed [Formula: see text] values in cell-based assays. CONCLUSIONS: By integrating multi-omics data in the QSMART model, we not only predict drug responses in cancer cell lines with high accuracy but also identify features and explainable interaction terms contributing to the accuracy. Although we have tested our multi-component explainable framework on protein kinase inhibitors, it can be extended across the proteome to investigate the complex relationships connecting genotypes and drug sensitivity profiles.


Subject(s)
Neural Networks, Computer , Protein Kinase Inhibitors/chemistry , Quantitative Structure-Activity Relationship , Afatinib/pharmacology , Carcinoma, Non-Small-Cell Lung/metabolism , Carcinoma, Non-Small-Cell Lung/pathology , Cell Line, Tumor , ErbB Receptors/antagonists & inhibitors , ErbB Receptors/genetics , ErbB Receptors/metabolism , Humans , Lapatinib/pharmacology , Lung Neoplasms/metabolism , Lung Neoplasms/pathology , MAP Kinase Signaling System/drug effects , Mutation , Precision Medicine , Protein Interaction Maps/drug effects , Protein Kinase Inhibitors/metabolism , Protein Kinase Inhibitors/pharmacology
16.
Elife ; 9, 2020 04 01.
Article in English | MEDLINE | ID: mdl-32234211

ABSTRACT

Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular functions. The evolutionary basis for their complex and diverse modes of catalytic functions remains enigmatic. Here, based on deep mining of over half a million GT-A fold sequences, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and emergence of hypervariable loops extending from the core contributed to GT-A diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during evolution. Using evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 90% accuracy and deployed it for the annotation of understudied GTs. Our studies provide an evolutionary framework for investigating complex relationships connecting GT-A fold sequence, structure, function and regulation.


Carbohydrates are one of the major groups of large biological molecules that regulate nearly all aspects of life. Yet, unlike DNA or proteins, carbohydrates are made without a template to follow. Instead, these molecules are built from a set of sugar-based building blocks by the intricate activities of a large and diverse family of enzymes known as glycosyltransferases. An incomplete understanding of how glycosyltransferases recognize and build diverse carbohydrates presents a major bottleneck in developing therapeutic strategies for diseases associated with abnormalities in these enzymes. It also limits efforts to engineer these enzymes for biotechnology applications and biofuel production. Taujale et al. have now used evolutionary approaches to map the evolution of a major subset of glycosyltransferases from species across the tree of life to understand how these enzymes evolved such precise mechanisms to build diverse carbohydrates. First, a minimal structural unit was defined based on being shared among a group of over half a million unique glycosyltransferase enzymes with different activities. Further analysis then showed that the diverse activities of these enzymes evolved through the accumulation of mutations within this structural unit, as well as in much more variable regions in the enzyme that extend from the minimal unit. Taujale et al. then built an extended family tree for this collection of glycosyltransferases and details of the evolutionary relationships between the enzymes helped them to create a machine learning framework that could predict which sugar-containing molecules were the raw materials for a given glycosyltransferase. This framework could make predictions with nearly 90% accuracy based only on information that can be deciphered from the gene for that enzyme. 
These findings will provide scientists with new hypotheses for investigating the complex relationships connecting the genetic information about glycosyltransferases with their structures and activities. Further refinement of the machine learning framework may eventually enable the design of enzymes with properties that are desirable for applications in biotechnology.


Subject(s)
Glycosyltransferases/chemistry , Protein Folding , Evolution, Molecular , Humans , Phylogeny , Substrate Specificity
17.
J Microbiol Biol Educ ; 16(2): 198-202, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26753026

ABSTRACT

New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.
