ABSTRACT
Glycans constitute the most complicated post-translational modification, modulating protein activity in health and disease. However, structural annotation from tandem mass spectrometry (MS/MS) data is a bottleneck in glycomics, preventing high-throughput endeavors and relegating glycomics to a few experts. Trained on a newly curated set of 500,000 annotated MS/MS spectra, here we present CandyCrunch, a dilated residual neural network predicting glycan structure from raw liquid chromatography-MS/MS data in seconds (top-1 accuracy: 90.3%). We developed an open-access Python-based workflow of raw data conversion and prediction, followed by automated curation and fragment annotation, with predictions recapitulating and extending expert annotation. We demonstrate that this can be used for de novo annotation, diagnostic fragment identification and high-throughput glycomics. For maximum impact, this entire pipeline is tightly interlaced with our glycowork platform and can be easily tested at https://colab.research.google.com/github/BojarLab/CandyCrunch/blob/main/CandyCrunch.ipynb . We envision CandyCrunch to democratize structural glycomics and the elucidation of biological roles of glycans.
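The top-1 accuracy quoted above is the fraction of spectra for which the highest-ranked predicted structure equals the annotated one. A minimal sketch of that metric follows; the function name, the toy predictions, and the toy labels are illustrative assumptions and are not taken from CandyCrunch or its dataset.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   truths: Sequence[str], k: int = 1) -> float:
    """Fraction of spectra whose true structure is among the first k candidates."""
    if len(ranked_predictions) != len(truths):
        raise ValueError("need one ranked candidate list per true label")
    if not truths:
        raise ValueError("need at least one spectrum")
    hits = sum(truth in ranked[:k]
               for ranked, truth in zip(ranked_predictions, truths))
    return hits / len(truths)


# Toy example (hypothetical rankings): each inner list is ordered best-first.
predictions = [
    ["Gal(b1-3)GalNAc", "GlcNAc(b1-3)GalNAc"],
    ["Neu5Ac(a2-3)Gal(b1-3)GalNAc", "Neu5Ac(a2-6)GalNAc"],
    ["GlcNAc(b1-3)GalNAc", "Gal(b1-3)GalNAc"],
    ["Fuc(a1-2)Gal(b1-3)GalNAc", "Gal(b1-3)GalNAc"],
]
truths = [
    "Gal(b1-3)GalNAc",
    "Neu5Ac(a2-6)GalNAc",
    "GlcNAc(b1-3)GalNAc",
    "Fuc(a1-2)Gal(b1-3)GalNAc",
]

print(top_k_accuracy(predictions, truths, k=1))
print(top_k_accuracy(predictions, truths, k=2))
```

In this toy set the second spectrum is only recovered at rank 2, so top-2 accuracy is higher than top-1 accuracy.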
Subjects
Deep Learning , Polysaccharides , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Polysaccharides/chemistry , Polysaccharides/analysis , Glycomics/methods , Humans , Liquid Chromatography/methods , Software , Workflow , Computer Neural Networks
ABSTRACT
Breast milk is abundant with functionalized milk oligosaccharides (MOs) to nourish and protect the neonate. Yet we lack a comprehensive understanding of the repertoire and evolution of MOs across Mammalia. We report ~400 MO-species associations (>100 novel structures) from milk glycomics of nine mostly understudied species: alpaca, beluga whale, black rhinoceros, bottlenose dolphin, impala, L'Hoest's monkey, pygmy hippopotamus, domestic sheep, and striped dolphin. This revealed the hitherto unknown existence of the LacdiNAc motif (GalNAcβ1-4GlcNAc) in MOs of all species except alpaca, sheep, and striped dolphin, indicating the widespread occurrence of this potentially antimicrobial motif in MOs. We also characterize glucuronic acid-containing MOs in the milk of impala, dolphins, sheep, and rhinoceros, previously only reported in cows. We demonstrate that these GlcA-MOs exhibit potent immunomodulatory effects. Our study extends the number of known MOs by >15%. Combined with >1900 curated MO-species associations, we characterize MO motif distributions, presenting an exhaustive overview of MO biodiversity.
Subjects
Antelopes , New World Camelids , Dolphins , Stenella , Humans , Female , Newborn Infant , Animals , Cattle , Sheep , Human Milk , Oligosaccharides
ABSTRACT
Artificial intelligence (AI) methods are increasingly being integrated into prediction software used in bioinformatics and in its glycoscience branch, known as glycoinformatics. Although AI techniques have evolved over the past decades, their applications in glycoscience are not yet widespread. This limited use is partly explained by the peculiarities of glyco-data, which are notoriously hard to produce and analyze. Nonetheless, over time, the accumulation of glycomics, glycoproteomics, and glycan-binding data has reached a point where even the most recent deep learning methods can provide predictors with good performance. We discuss the historical development of the application of various AI methods in the broader field of glycoinformatics. A particular focus is placed on shining a light on challenges in glyco-data handling, contextualized by lessons learnt from related disciplines. Ending on the discussion of state-of-the-art deep learning approaches in glycoinformatics, we also envision the future of glycoinformatics, including developments that need to occur in order to truly unleash the capabilities of glycoscience in the systems biology era.
Subjects
Artificial Intelligence , Glycomics , Glycomics/methods , Software , Computational Biology/methods , Polysaccharides
ABSTRACT
Structural details of oligosaccharides, or glycans, often carry biological relevance, which is why they are typically elucidated using tandem mass spectrometry. Common approaches to distinguish isomers rely on diagnostic glycan fragments for annotating topologies or linkages. Diagnostic fragments are often only known informally among practitioners or stem from individual studies, with unclear validity or generalizability, causing annotation heterogeneity and hampering new analysts. Drawing on a curated set of 237,000 O-glycomics spectra, we here present a rule-based machine learning workflow to uncover quantifiably valid and generalizable diagnostic fragments. This results in fragmentation rules to robustly distinguish common O-glycan isomers for reduced glycans in negative ion mode. We envision this resource to improve glycan annotation accuracy and concomitantly make annotations more transparent and homogeneous across analysts.
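One way to make "quantifiably valid" concrete is to score a candidate diagnostic fragment by how precisely and how completely its presence identifies one isomer among labeled spectra. The sketch below does this with precision and recall. It is an illustration under stated assumptions, not the published workflow: the function name, the m/z tolerance, the isomer labels, and all peak values are hypothetical.

```python
from typing import List, Sequence, Tuple

Spectrum = Tuple[str, Sequence[float]]  # (isomer label, observed fragment m/z values)


def fragment_rule_stats(spectra: List[Spectrum], fragment_mz: float,
                        target: str, tolerance: float = 0.01) -> Tuple[float, float]:
    """Precision and recall of the rule 'fragment present => isomer is target'."""
    tp = fp = fn = 0
    for isomer, peaks in spectra:
        present = any(abs(p - fragment_mz) <= tolerance for p in peaks)
        if present and isomer == target:
            tp += 1   # fragment seen in the isomer it should indicate
        elif present:
            fp += 1   # fragment seen in another isomer
        elif isomer == target:
            fn += 1   # target isomer lacking the fragment
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labeled spectra; the m/z values are made up for illustration.
spectra = [
    ("isomer_A", [100.0, 250.5]),
    ("isomer_A", [250.5, 300.2]),
    ("isomer_A", [100.0]),
    ("isomer_B", [100.0, 300.2]),
    ("isomer_B", [250.5, 100.0]),
]

precision, recall = fragment_rule_stats(spectra, fragment_mz=250.5, target="isomer_A")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A fragment would only qualify as diagnostic if both values stay high across independent datasets, which is the generalizability concern raised above.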
ABSTRACT
Plant lectins have garnered attention for their roles as laboratory probes and potential therapeutics. Here, we report the discovery and characterization of Cucumis melo agglutinin (CMA1), a new R-type lectin from melon. Our findings reveal CMA1's unique glycan-binding profile, mechanistically explained by its 3D structure, augmenting our understanding of R-type lectins. We expressed CMA1 recombinantly and assessed its binding specificity using multiple glycan arrays, covering 1,046 unique sequences. This resulted in a complex binding profile, strongly preferring C2-substituted, beta-linked galactose (both GalNAc and Fucα1-2Gal), which we contrasted with the established R-type lectin Ricinus communis agglutinin 1 (RCA1). We also report binding of specific glycosaminoglycan subtypes and a general enhancement of binding by sulfation. Further validation using agglutination, thermal shift assays, and surface plasmon resonance confirmed and quantified this binding specificity in solution. Finally, we solved the high-resolution structure of the CMA1 N-terminal domain using X-ray crystallography, supporting our functional findings at the molecular level. Our study provides a comprehensive understanding of CMA1, laying the groundwork for further exploration of its biological and therapeutic potential.
ABSTRACT
Glycans are essential to all scales of biology, with their intricate structures being crucial for their biological functions. The structural complexity of glycans is communicated through simplified and unified visual representations according to the Symbol Nomenclature for Glycans (SNFG) guidelines adopted by the community. Here, we introduce GlycoDraw, a Python-native implementation for high-throughput generation of high-quality, SNFG-compliant glycan figures with flexible display options. GlycoDraw is released as part of our glycan analysis ecosystem, glycowork, facilitating integration into existing workflows by enabling fully automated annotation of glycan-related figures and thus assisting the analysis of e.g. differential abundance data or glycomics mass spectra.
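At its core, SNFG assigns each monosaccharide a shape and a color. The sketch below looks up those symbols for the residues of a glycan written in IUPAC-condensed notation. It is not GlycoDraw's API: the table covers only a handful of common monosaccharides, and the function name is a hypothetical choice for illustration.

```python
import re

# Small excerpt of the SNFG shape/color assignments (not the full table).
SNFG = {
    "Glc": ("circle", "blue"),
    "Gal": ("circle", "yellow"),
    "Man": ("circle", "green"),
    "GlcNAc": ("square", "blue"),
    "GalNAc": ("square", "yellow"),
    "Fuc": ("triangle", "red"),
    "Neu5Ac": ("diamond", "purple"),
    "Xyl": ("star", "orange"),
}


def snfg_symbols(glycan: str):
    """Return (residue, shape, color) for each residue, in written order."""
    # Replace linkages such as (b1-4) and branch brackets with spaces,
    # leaving only the monosaccharide names.
    names = re.sub(r"\([^()]*\)|\[|\]", " ", glycan).split()
    return [(name, *SNFG[name]) for name in names]  # KeyError if not in the excerpt


for residue, shape, color in snfg_symbols("Neu5Ac(a2-3)Gal(b1-3)GalNAc"):
    print(f"{residue}: {color} {shape}")
```

A drawing tool additionally has to lay out branches and draw the linkage labels, which this lookup deliberately leaves out.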
Subjects
Ecosystem , Polysaccharides , Polysaccharides/chemistry , Glycomics , Tandem Mass Spectrometry
ABSTRACT
Controlling gene expression with sophisticated logic gates has been and remains one of the central aims of synthetic biology. However, conventional implementations of biocomputers use central processing units (CPUs) assembled from multiple protein-based gene switches, limiting the programming flexibility and complexity that can be achieved within single cells. Here, we introduce a CRISPR/Cas9-based core processor that enables different sets of user-defined guide RNA inputs to program a single transcriptional regulator (dCas9-KRAB) to perform a wide range of bitwise computations, from simple Boolean logic gates to arithmetic operations such as the half adder. Furthermore, we built a dual-core CPU combining two orthogonal core processors in a single cell. In principle, human cells integrating multiple orthogonal CRISPR/Cas9-based core processors could offer enormous computational capacity.
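The half adder mentioned above computes a sum bit (A XOR B) and a carry bit (A AND B). Because dCas9-KRAB is a repressor, a natural abstraction of one regulated promoter is a NOR gate: the output is ON only when neither guide RNA input is present. The sketch below composes a half adder from NOR gates alone. It is a logical illustration under that assumption, not the genetic circuit layout reported by the authors.

```python
def nor(a: int, b: int) -> int:
    """Repressor-style gate: output is 1 only if both inputs are 0."""
    return int(not (a or b))


def half_adder(a: int, b: int):
    """Return (sum, carry) for two input bits using only NOR gates."""
    n = nor(a, b)
    xnor = nor(nor(a, n), nor(b, n))
    sum_bit = nor(xnor, xnor)              # NOT(XNOR) = XOR
    carry_bit = nor(nor(a, a), nor(b, b))  # NOR of the inverted inputs = AND
    return sum_bit, carry_bit


# Print the full truth table.
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"A={a} B={b} -> sum={s} carry={c}")
```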
Subjects
CRISPR-Cas Systems , Molecular Computers , Gene Expression Regulation , HEK293 Cells , Humans
ABSTRACT
While glycans are crucial for biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include these diverse carbohydrates into workflows. Here, we present glycowork, an open-source Python package designed for glycan-related data science and machine learning by end users. Glycowork includes functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models and learned glycan representations. We envision that glycowork can extract further insights from glycan datasets and demonstrate this with workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.
Subjects
Data Science , Machine Learning , Polysaccharides/chemistry , Software , Factual Databases
ABSTRACT
In this study, we designed and built a gene switch that employs metabolically inert l-glucose to regulate transgene expression in mammalian cells via d-idonate-mediated control of the bacterial regulator LgnR. To this end, we engineered a metabolic cascade in mammalian cells to produce the inducer molecule d-idonate from its precursor l-glucose by ectopically expressing the Paracoccus species 43P-derived catabolic enzymes LgdA, LgnH, and LgnI. To obtain ON- and OFF-switches, we fused LgnR to the human transcriptional silencer domain Krüppel associated box (KRAB) and the viral trans-activator domain VP16, respectively. Thus, these artificial transcription factors KRAB-LgnR or VP16-LgnR modulated cognate promoters containing LgnR-specific binding sites in a d-idonate-dependent manner as a direct result of l-glucose metabolism. In a proof-of-concept experiment, we show that the switches can control production of the model biopharmaceutical rituximab in both transiently and stably transfected HEK-293T cells, as well as CHO-K1 cells. Rituximab production reached 5.9 µg/ml in stably transfected HEK-293T cells and 3.3 µg/ml in stably transfected CHO-K1 cells.
Subjects
Gene Regulatory Networks , Glucose , Rituximab/biosynthesis , Animals , CHO Cells , Cricetulus , Reporter Genes , Glycosylation , HEK293 Cells , Humans , Paracoccus/enzymology , Plasmids , Sugar Acids , Transcription Factors/genetics , Transfection
ABSTRACT
Engineered proteins with enhanced or altered functionality, generated for example by mutation or domain fusion, are at the core of nearly all synthetic biology endeavors in the context of precision medicine, also known as personalized medicine. From designer receptors sensing elevated blood markers to effectors rerouting signaling pathways to synthetic transcription factors and the customized therapeutics they regulate, engineered proteins play a crucial role at every step of novel therapeutic approaches using synthetic biology. Here, recent developments in protein engineering aided by advances in directed evolution, de novo design, and machine learning are discussed. Building on clinical successes already achieved with chimeric antigen receptor (CAR-) T cells and other cell-based therapies, these developments are expected to further enhance the capabilities of mammalian synthetic biology in biomedical and other applications.
Subjects
Biomedical Technology , Protein Engineering , Synthetic Biology , Animals , Biomedical Technology/methods , Humans , Precision Medicine , Synthetic Biology/trends , Therapeutics/methods , Therapeutics/trends
ABSTRACT
Strategies for expanding the sensor space of designer receptors are urgently needed to tailor cell-based therapies to respond to any type of medically relevant molecules. Here, we describe a universal approach to designing receptor scaffolds that enables antibody-specific molecular input to activate JAK/STAT, MAPK, PLCG or PI3K/Akt signaling rewired to transgene expression driven by synthetic promoters. To demonstrate its scope, we equipped the GEMS (generalized extracellular molecule sensor) platform with antibody fragments targeting a synthetic azo dye, nicotine, a peptide tag and the PSA (prostate-specific antigen) biomarker, thereby covering inputs ranging from small molecules to proteins. These four GEMS devices provided robust signaling and transgene expression with high signal-to-noise ratios in response to their specific ligands. The sensitivity of the nicotine- and PSA-specific GEMS devices matched the clinically relevant concentration ranges, and PSA-specific GEMS were able to detect pathological PSA levels in the serum of patients diagnosed with prostate cancer.
Subjects
Tumor Biomarkers/analysis , Fluorescent Dyes/chemistry , Nicotine/chemistry , Prostate-Specific Antigen/chemistry , Prostatic Neoplasms/diagnostic imaging , Cultured Cells , Fluorescent Dyes/chemical synthesis , HEK293 Cells , Humans , Male , Nicotine/chemical synthesis , Prostate-Specific Antigen/chemical synthesis
ABSTRACT
Capitalizing on the ability of mammalian cells to conduct complex post-translational modifications, most protein therapeutics are currently produced in cell culture systems. Addition of a signal peptide to the product protein enables its accumulation in the cell culture supernatant, but separation of the product from endogenously secreted proteins remains costly and labor-intensive. We considered that global downregulation of translation of non-product proteins would be an efficient strategy to minimize downstream processing requirements. Therefore, taking advantage of the ability of mammalian protein kinase R (PKR) to switch off most cellular translation processes in response to infection by viruses, we fused a caffeine-inducible dimerization domain to the catalytic domain of PKR. Addition of caffeine to this construct results in homodimerization and activation of PKR, effectively rewiring rapid global translational downregulation to the addition of the stimulus in a dose-dependent manner. Then, to protect translation of the target therapeutic, we screened viral and cellular internal ribosomal entry sites (IRESes) known or suspected to be resistant to PKR-induced translational stress. After choosing the best-in-class Seneca valley virus (SVV) IRES, we additionally screened for IRES transactivation factors (ITAFs) as well as for supplementary small molecules to further boost the production titer of the product protein under conditions of global translational downregulation. Importantly, the residual global translation activity of roughly 10% under maximal downregulation is sufficient to maintain cellular viability during a production timeframe of at least five days. 
Standard industrially used adherent as well as suspension-adapted cell lines transfected with this synthetic biology-inspired Protein Kinase R-Enhanced Protein Production (PREPP) system could produce several medicinally relevant protein therapeutics, such as the blockbuster drug rituximab, in substantial quantities and with significantly higher purity than previous culture technologies. We believe incorporation of such purity-by-design technology in the production process will alleviate downstream processing bottlenecks in future biopharmaceutical manufacturing.
Subjects
Metabolic Engineering/methods , Protein Biosynthesis/genetics , Animals , Monoclonal Antibodies/biosynthesis , Monoclonal Antibodies/isolation & purification , Caffeine/pharmacology , Catalysis , Cell Cycle , Cell Line , Down-Regulation , Reporter Genes/genetics , Humans , Metabolomics , Post-Translational Protein Processing , Ribosomes/genetics , Ribosomes/metabolism , Rituximab/biosynthesis , Rituximab/isolation & purification , Transfection , Viruses/genetics
ABSTRACT
Brassinosteroids, which control plant growth and development, are sensed by the membrane receptor kinase BRASSINOSTEROID INSENSITIVE 1 (BRI1). Brassinosteroid binding to the BRI1 leucine-rich repeat (LRR) domain induces heteromerisation with a SOMATIC EMBRYOGENESIS RECEPTOR KINASE (SERK)-family co-receptor. This process allows the cytoplasmic kinase domains of BRI1 and SERK to interact, trans-phosphorylate and activate each other. Here we report crystal structures of the BRI1 kinase domain in its activated form and in complex with nucleotides. BRI1 has structural features reminiscent of both serine/threonine and tyrosine kinases, providing insight into the evolution of dual-specificity kinases in plants. Phosphorylation of Thr1039, Ser1042 and Ser1044 causes formation of a catalytically competent activation loop. Mapping previously identified serine/threonine and tyrosine phosphorylation sites onto the structure, we analyse their contribution to brassinosteroid signaling. The location of known genetic missense alleles provides detailed insight into the BRI1 kinase mechanism, while our analyses are inconsistent with a previously reported guanylate cyclase activity. We identify a protein interaction surface on the C-terminal lobe of the kinase and demonstrate that the isolated BRI1, SERK2 and SERK3 cytoplasmic segments form homodimers in solution and have a weak tendency to heteromerise. We propose a model in which heterodimerisation of the BRI1 and SERK ectodomains brings their cytoplasmic kinase domains into a catalytically competent arrangement, an interaction that can be modulated by the BRI1 inhibitor protein BKI1.
Subjects
Arabidopsis/metabolism , Brassinosteroids/metabolism , Serine/metabolism , Signal Transduction/physiology , Arabidopsis/genetics , Arabidopsis Proteins/metabolism , Mutation , Phosphorylation , Protein Kinases/metabolism , Protein Serine-Threonine Kinases/metabolism , Threonine/metabolism
ABSTRACT
Milk oligosaccharides, complex carbohydrates unique to mammalian milk, play crucial roles in infant nutrition and immune development. This review explores their biochemical diversity, tracing the evolutionary paths that have led to their variation across different species. We highlight the intersection of nutrition, biology, and chemistry in understanding these compounds. Additionally, we discuss the latest computational methods and analytical techniques that have revolutionized the study of milk oligosaccharides, offering insights into their structural complexity and functional roles. This brief but essential review aims not only to provide a deeper understanding of milk oligosaccharides but also to discuss the road toward their potential applications.
Subjects
Human Milk , Oligosaccharides , Humans , Infant , Animals , Human Milk/chemistry , Oligosaccharides/chemistry , Mammals
ABSTRACT
Motivation: Structural analysis of glycans poses significant challenges in glycobiology due to their complex sequences. Research questions, such as analyzing the sequence content of the α1-6 branch in N-glycans, are biologically meaningful yet can be hard to automate. Results: Here, we introduce a regular expression system that is designed for glycans, feature-complete, and closely aligned with regular expression formatting. We use this to annotate glycan motifs of arbitrary complexity, perform differential expression analysis on designated sequence stretches, or elucidate branch-specific binding specificities of lectins in an automated manner. We are confident that glycan regular expressions will empower computational analyses of these sequences. Availability and implementation: Our regular expression framework for glycans is implemented in Python and is incorporated into the open-source glycowork package (version 1.1+). Code and documentation are available at https://github.com/BojarLab/glycowork/blob/master/glycowork/motif/regex.py.
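To illustrate the kind of question involved, the sketch below uses Python's standard `re` module on an IUPAC-condensed string to report which sialic acid linkages cap LacNAc units. This is plain string matching and not the glycan regular expression syntax of glycowork: it ignores branching topology, which is precisely the limitation that motivates a dedicated system. The pattern and function name are assumptions made for illustration.

```python
import re

# Neu5Ac linked a2-3 or a2-6 to Gal(b1-4)GlcNAc; the linkage digit is captured.
SIALYL_LACNAC = re.compile(r"Neu5Ac\(a2-([36])\)Gal\(b1-4\)GlcNAc")


def sialic_acid_linkages(glycan: str):
    """List the Neu5Ac linkages found on LacNAc units, in written order."""
    return ["a2-" + digit for digit in SIALYL_LACNAC.findall(glycan)]


# A disialylated biantennary N-glycan in IUPAC-condensed notation.
n_glycan = (
    "Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)"
    "[Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]"
    "Man(b1-4)GlcNAc(b1-4)GlcNAc"
)

print(sialic_acid_linkages(n_glycan))
```

The string match cannot say on which arm each linkage sits once branches are nested or written in a different order, so branch-aware questions such as the α1-6 arm example need structure-aware matching.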
ABSTRACT
Glycans, present across all domains of life, comprise a wide range of monosaccharides assembled into complex, branching structures. Here, we present an in silico protocol to construct biosynthetic networks from a list of observed glycans using the Python package glycowork. We describe steps for data preparation, network construction, feature analysis, and data export. This protocol is implemented in Python using example data and can be adapted for use with customized datasets. For complete details on the use and execution of this protocol, please refer to Thomès et al. (1).
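The idea of a biosynthetic network can be sketched with the standard library alone: treat each observed glycan as a node and draw an edge from a precursor to a product when the product equals the precursor plus one monosaccharide added at the non-reducing end. The version below is a deliberate simplification and not the glycowork implementation: it only recognizes linear extensions written as a prefix of the IUPAC-condensed string, it ignores branching, and it does not infer unobserved intermediates. The function name is hypothetical.

```python
import re

# One monosaccharide plus its linkage at the start of the string, e.g. "Fuc(a1-2)".
UNIT = re.compile(r"[A-Za-z0-9]+\([ab][0-9]-[0-9]\)")


def build_network(glycans):
    """Return (precursor, product) edges for single-residue linear extensions."""
    known = set(glycans)
    edges = []
    for product in glycans:
        match = UNIT.match(product)
        if match is None:
            continue  # a lone monosaccharide has no precursor in this scheme
        precursor = product[match.end():]
        if precursor in known:
            edges.append((precursor, product))
    return edges


# Well-known milk oligosaccharides built on lactose.
glycans = [
    "Gal(b1-4)Glc",                       # lactose
    "Fuc(a1-2)Gal(b1-4)Glc",              # 2'-fucosyllactose
    "Neu5Ac(a2-3)Gal(b1-4)Glc",           # 3'-sialyllactose
    "GlcNAc(b1-3)Gal(b1-4)Glc",           # lacto-N-triose II
    "Gal(b1-4)GlcNAc(b1-3)Gal(b1-4)Glc",  # lacto-N-neotetraose
]

for precursor, product in build_network(glycans):
    print(f"{precursor} -> {product}")
```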
Subjects
Polysaccharides , Polysaccharides/biosynthesis , Polysaccharides/metabolism , Polysaccharides/chemistry , Software , Biosynthetic Pathways , Computer Simulation , Computational Biology/methods
ABSTRACT
Glycosylation is described as a non-templated biosynthesis. Yet, the template-free premise is antithetical to the observation that different N-glycans are consistently placed at specific sites. It has been proposed that glycosite-proximal protein structures could constrain glycosylation and explain the observed microheterogeneity. Using site-specific glycosylation data, we trained a hybrid neural network to parse glycosites (recurrent neural network) and match them to feasible N-glycosylation events (graph neural network). From glycosite-flanking sequences, the algorithm predicts most human N-glycosylation events documented in the GlyConnect database and proposed structures corresponding to observed monosaccharide composition of the glycans at these sites. The algorithm also recapitulated glycosylation in Enhanced Aromatic Sequons, SARS-CoV-2 spike, and IgG3 variants, thus demonstrating the ability of the algorithm to predict both glycan structure and abundance. Thus, protein structure constrains glycosylation, and the neural network enables predictive in silico glycosylation of uncharacterized or novel protein sequences and genetic variants.
ABSTRACT
There is an increasing appreciation for the role of cell surface glycans in modulating interactions with extracellular ligands and participating in intercellular communication. We recently reported the existence of sialoglycoRNAs, where mammalian small RNAs are covalently linked to N-glycans through the modified base acp3U and trafficked to the cell surface. However, little is currently known about the role of O-glycosylation, another major class of carbohydrate polymer modifications. Here, we use parallel genetic, enzymatic, and mass spectrometry approaches to demonstrate that O-linked glycan biosynthesis is responsible for the majority of sialoglycoRNA levels. By examining the O-glycans associated with RNA from cell lines and colon organoids, we find known and previously unreported O-linked glycan structures. Further, we find that O-linked glycans released from small RNA from organoids derived from ulcerative colitis patients exhibit higher levels of sialylation than glycans from healthy organoids. Together, our work provides flexible tools to interrogate O-linked glycoRNAs (O-glycoRNA) and suggests that they may be modulated in human disease.
ABSTRACT
Glycomics, the comprehensive profiling of all glycan structures in samples, is rapidly expanding to enable insights into physiology and disease mechanisms. However, glycan structure complexity and glycomics data interpretation present challenges, especially for differential expression analysis. Here, we present a framework for differential glycomics expression analysis. Our methodology encompasses specialized and domain-informed methods for data normalization and imputation, glycan motif extraction and quantification, differential expression analysis, motif enrichment analysis, time series analysis, and meta-analytic capabilities, synthesizing results across multiple studies. All methods are integrated into our open-source glycowork package, facilitating performant workflows and user-friendly access. We demonstrate these methods using dedicated simulations and glycomics datasets of N-, O-, lipid-linked, and free glycans. Differential expression tests here focus on human datasets and cancer vs. healthy tissue comparisons. Our rigorous approach allows for robust, reliable, and comprehensive differential expression analyses in glycomics, contributing to advancing glycomics research and its translation to clinical and diagnostic applications.
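Relative abundances in glycomics sum to a constant within each sample, so a change in one glycan shifts all others. One common way to handle such compositional data is a centered log-ratio (CLR) transform before testing. The sketch below applies CLR per sample and then computes a per-glycan effect size and Welch t statistic with the standard library. It is an assumption-laden illustration, not the glycowork workflow: the choice of CLR, the function names, and the abundance values are all hypothetical, and zero abundances would first need imputation.

```python
import math
from statistics import mean, variance


def clr(sample):
    """Centered log-ratio transform of one sample (all abundances must be > 0)."""
    logs = [math.log(x) for x in sample]
    center = mean(logs)
    return [value - center for value in logs]


def welch_t(a, b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def compare_groups(control, case):
    """Per glycan: (difference in mean CLR abundance, Welch t), case vs. control."""
    control_clr = [clr(sample) for sample in control]
    case_clr = [clr(sample) for sample in case]
    results = []
    for i in range(len(control_clr[0])):
        x_control = [sample[i] for sample in control_clr]
        x_case = [sample[i] for sample in case_clr]
        results.append((mean(x_case) - mean(x_control), welch_t(x_case, x_control)))
    return results


# Made-up relative abundances (%) of three glycans in three samples per group.
healthy = [[50.0, 30.0, 20.0], [48.0, 32.0, 20.0], [52.0, 29.0, 19.0]]
disease = [[30.0, 30.0, 40.0], [28.0, 33.0, 39.0], [33.0, 29.0, 38.0]]

for index, (effect, t_stat) in enumerate(compare_groups(healthy, disease)):
    print(f"glycan {index}: effect={effect:+.2f} t={t_stat:+.2f}")
```

Turning the t statistics into p-values and correcting for multiple testing would be the next steps and are omitted here.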
Subjects
Glycomics , Polysaccharides , Humans , Glycomics/methods , Polysaccharides/chemistry
ABSTRACT
Glycans are important polysaccharides on cellular surfaces that are bound to glycoproteins and glycolipids. These are one of the most common post-translational modifications of proteins in eukaryotic cells. They play important roles in protein folding, cell-cell interactions, and other extracellular processes. Changes in glycan structures may influence the course of different diseases, such as infections or cancer. Glycans are commonly represented using the IUPAC-condensed notation. IUPAC-condensed is a textual representation of glycans operating on the same topological level as the Symbol Nomenclature for Glycans (SNFG) that assigns colored, geometrical shapes to the main monomers. These symbols are then connected in tree-like structures, visualizing the glycan structure on a topological level. Yet for a representation on the atomic level, notations such as SMILES should be used. To our knowledge, there is no easy-to-use, general, open-source, and offline tool to convert the IUPAC-condensed notation to SMILES. Here, we present the open-access Python package GlyLES for the generalizable generation of SMILES representations out of IUPAC-condensed representations. GlyLES uses a grammar to read in the monomer tree from the IUPAC-condensed notation. From this tree, the tool can compute the atomic structures of each monomer based on their IUPAC-condensed descriptions. In the last step, it merges all monomers into the atomic structure of a glycan in the SMILES notation. GlyLES is the first package that allows conversion from the IUPAC-condensed notation of glycans to SMILES strings. This may have multiple applications, including straightforward visualization, substructure search, molecular modeling and docking, and a new featurization strategy for machine-learning algorithms. GlyLES is available at https://github.com/kalininalab/GlyLES .
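The first step described above, reading the monomer tree from IUPAC-condensed notation, can be sketched with a small stack-based parser. This is not GlyLES code: GlyLES is described as using a grammar, whereas the version below is a simplified assumption that handles monosaccharide names, parenthesized linkages, and square-bracket branches, and it stops before any SMILES generation. The function names and the dictionary layout are hypothetical.

```python
import re

# Tokens: "[" or "]" (branch delimiters), "(...)" (a linkage), or a residue name.
TOKEN = re.compile(r"\[|\]|\([^()]*\)|[^\[\]()]+")


def parse_iupac(glycan: str) -> dict:
    """Parse IUPAC-condensed text into a tree rooted at the reducing end."""
    tokens = TOKEN.findall(glycan)
    if not tokens or tokens[-1][0] in "[](":
        raise ValueError("glycan must end with a monosaccharide name")
    root = {"name": tokens[-1], "linkage": None, "children": []}
    current, stack, linkage = root, [], None
    # Walk from the reducing end (rightmost residue) towards the left.
    for token in reversed(tokens[:-1]):
        if token == "]":            # read right-to-left, "]" opens a branch
            stack.append(current)
        elif token == "[":          # branch finished: return to the branch point
            current = stack.pop()
        elif token.startswith("("):
            linkage = token[1:-1]
        else:
            if linkage is None:
                raise ValueError(f"missing linkage before {token!r}")
            child = {"name": token, "linkage": linkage, "children": []}
            current["children"].append(child)
            current, linkage = child, None
    return root


def count_monomers(node: dict) -> int:
    """Total number of monosaccharides in the tree."""
    return 1 + sum(count_monomers(child) for child in node["children"])


tree = parse_iupac("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc")
print(tree["name"], count_monomers(tree))
```

Each node keeps the linkage to its parent, which is the information a later step would need to place the glycosidic bond between the atomic structures of two monomers.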