ABSTRACT
Glycoproteomics is a powerful yet analytically challenging research tool. Software packages aiding the interpretation of complex glycopeptide tandem mass spectra have appeared, but their relative performance remains untested. Conducted through the HUPO Human Glycoproteomics Initiative, this community study, comprising both developers and users of glycoproteomics software, evaluates solutions for system-wide glycopeptide analysis. The same mass spectrometry-based glycoproteomics datasets from human serum were shared with participants and the relative team performance for N- and O-glycopeptide data analysis was comprehensively established by orthogonal performance tests. Although the results were variable, several high-performance glycoproteomics informatics strategies were identified. Deep analysis of the data revealed key performance-associated search parameters and led to recommendations for improved 'high-coverage' and 'high-accuracy' glycoproteomics search solutions. This study concludes that diverse software packages for comprehensive glycopeptide data analysis exist, points to several high-performance search strategies and specifies key variables that will guide future software developments and assist informatics decision-making in glycoproteomics.
Subject(s)
Glycopeptides/blood, Glycoproteins/blood, Informatics/methods, Proteome/analysis, Proteomics/methods, Research Personnel/statistics & numerical data, Software, Glycosylation, Humans, Proteome/metabolism, Tandem Mass Spectrometry
ABSTRACT
Glycopeptides in peptide or digested protein samples pose a number of analytical and bioinformatics challenges beyond those posed by unmodified peptides or peptides with smaller posttranslational modifications. Exact structural elucidation of glycans is generally beyond the capability of a single mass spectrometry experiment, so a reasonable level of identification for tandem mass spectrometry, taken by several glycopeptide software tools, is that of peptide sequence and glycan composition, meaning the number of monosaccharides of each distinct mass, e.g., HexNAc(2)Hex(5) rather than Man5. Even at this level, however, glycopeptide analysis poses challenges: finding glycopeptide spectra when they are a tiny fraction of the total spectra; assigning spectra with unanticipated glycans, not in the initial glycan database; and finding, scoring, and labeling diagnostic peaks in tandem mass spectra. Here, we discuss recent improvements to Byonic, a glycoproteomics search program, that address these three issues. Byonic now supports filtering spectra by m/z peaks, so that the user can limit attention to spectra with diagnostic peaks, e.g., at least two out of three of 204.087 for HexNAc, 274.092 for NeuAc (with water loss), and 366.139 for HexNAc-Hex, all within a set mass tolerance, e.g., ± 0.01 Da. Also new is a glycan "wildcard" search, which allows an unspecified mass within a user-set mass range to be applied to N- or O-linked glycans and enables assignment of spectra with unanticipated glycans. Finally, the next release of Byonic supports user-specified peak annotations from user-defined posttranslational modifications. We demonstrate the utility of these new software features by finding previously unrecognized glycopeptides in publicly available data, including glycosylated neuropeptides from rat brain.
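The diagnostic-peak filter described in this abstract can be illustrated with a minimal sketch. It is not Byonic's implementation or API; the function names (`count_diagnostic_hits`, `passes_filter`) and the representation of a spectrum as a plain list of m/z values are assumptions made for illustration. Only the three m/z values, the "two out of three" rule, and the ± 0.01 Da tolerance come from the abstract.

```python
# Diagnostic oxonium-ion m/z values quoted in the abstract.
DIAGNOSTIC_MZ = {
    "HexNAc": 204.087,
    "NeuAc (water loss)": 274.092,
    "HexNAc-Hex": 366.139,
}


def count_diagnostic_hits(peaks_mz, tol_da=0.01):
    """Count how many diagnostic m/z values have a peak within tol_da.

    peaks_mz is a hypothetical representation: an iterable of peak m/z floats.
    """
    peaks = list(peaks_mz)
    hits = 0
    for target in DIAGNOSTIC_MZ.values():
        if any(abs(mz - target) <= tol_da for mz in peaks):
            hits += 1
    return hits


def passes_filter(peaks_mz, min_hits=2, tol_da=0.01):
    """Keep a spectrum if at least min_hits diagnostic peaks are present."""
    return count_diagnostic_hits(peaks_mz, tol_da) >= min_hits


# Two of three diagnostic peaks present: the spectrum is kept.
likely_glyco = [204.0865, 366.1402, 500.0]
# Only one diagnostic peak present: the spectrum is dropped.
unlikely_glyco = [204.087, 300.0]

print(passes_filter(likely_glyco))
print(passes_filter(unlikely_glyco))
```

A real filter would probably also apply an intensity threshold so that noise peaks near a diagnostic m/z do not count; the abstract does not say how that is handled, so it is left out here.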
Subject(s)
Glycopeptides/metabolism, Post-Translational Protein Processing, Proteomics/methods, Software, Animals, Endothelial Cells/metabolism, Glycosylation, Humans, Natural Killer Cells/metabolism, Neuropeptides/metabolism, Sprague-Dawley Rats, T-Lymphocytes/metabolism
ABSTRACT
Charge deconvolution infers the mass from mass over charge (m/z) measurements in electrospray ionization mass spectra. When applied over a wide input m/z or broad target mass range, charge-deconvolution algorithms can produce artifacts, such as false masses at one-half or one-third of the correct mass. Indeed, a maximum entropy term in the objective function of MaxEnt, the most commonly used charge deconvolution algorithm, favors a deconvolved spectrum with many peaks over one with fewer peaks. Here we describe a new "parsimonious" charge deconvolution algorithm that produces fewer artifacts. The algorithm is especially well-suited to high-resolution native mass spectrometry of intact glycoproteins and protein complexes. Deconvolution of native mass spectra poses special challenges due to salt and small molecule adducts, multimers, wide mass ranges, and fewer and lower charge states. We demonstrate the performance of the new deconvolution algorithm on a range of samples. On the heavily glycosylated plasma properdin glycoprotein, the new algorithm could deconvolve monomer and dimer simultaneously and, when focused on the m/z range of the monomer, gave accurate and interpretable masses for glycoforms that had previously been analyzed manually using m/z peaks rather than deconvolved masses. On therapeutic antibodies, the new algorithm facilitated the analysis of extensions, truncations, and Fab glycosylation. The algorithm facilitates the use of native mass spectrometry for the qualitative and quantitative analysis of protein and protein assemblies.
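The half-mass artifact mentioned in this abstract follows from the relation between neutral mass and m/z. The sketch below is not the parsimonious algorithm from the paper and not MaxEnt; it only shows why a candidate at one-half of the correct mass can explain some of the observed peaks. It assumes positive-mode protonated ions, so that m/z = M/z + m_p with m_p ≈ 1.007276 Da; the function names and the 50 kDa example are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, approximate mass of a proton


def mz_of(mass, z):
    """m/z of a species of neutral mass `mass` carrying z protons."""
    return mass / z + PROTON_MASS


def neutral_mass(mz, z):
    """Invert mz_of: neutral mass implied by an m/z value and a charge."""
    return z * (mz - PROTON_MASS)


def count_matches(candidate_mass, observed_mz, charges, tol=0.01):
    """Number of observed peaks explained by candidate_mass at some charge."""
    matched = 0
    for mz in observed_mz:
        if any(abs(mz_of(candidate_mass, z) - mz) <= tol for z in charges):
            matched += 1
    return matched


# Hypothetical 50 kDa species observed at charge states 10 through 14.
true_mass = 50000.0
observed = [mz_of(true_mass, z) for z in (10, 11, 12, 13, 14)]
charges = range(1, 31)

# The true mass explains all five peaks.
print(count_matches(true_mass, observed, charges))
# Half the mass explains only the even-charge peaks (z = 10, 12, 14 map to
# z = 5, 6, 7), which is how a false peak at M/2 can arise.
print(count_matches(true_mass / 2, observed, charges))
```

In this toy case the half-mass candidate is exposed by the odd charge states it cannot explain. With few charge states, as the abstract notes for native spectra, there is less evidence of this kind to distinguish the true mass from its fractions.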
Subject(s)
Algorithms, Humanized Monoclonal Antibodies/analysis, Cetuximab/analysis, Glycoproteins/analysis, Immunoglobulin G/analysis, Infliximab/analysis, Properdin/analysis, Daclizumab, Entropy, Glycosylation, Humans, Peptide Fragments/analysis, Peptide Mapping, Proteolysis, Solutions, Electrospray Ionization Mass Spectrometry/instrumentation, Electrospray Ionization Mass Spectrometry/methods, Static Electricity, Trypsin/chemistry
ABSTRACT
The sialyl-Lewis A (sLeA) glycan forms the basis of the CA19-9 assay and is the current best biomarker for pancreatic cancer, but because it is not elevated in ~25% of pancreatic cancers, it is not useful for early diagnosis. We hypothesized that sLeA-low tumors secrete glycans that are related to sLeA but not detectable by CA19-9 antibodies. We used a method called motif profiling to predict that a structural isomer of sLeA called sialyl-Lewis X (sLeX) is elevated in the plasma of some sLeA-low cancers. We corroborated this prediction in a set of 48 plasma samples and in a blinded set of 200 samples. An antibody sandwich assay formed by the capture and detection of sLeX was elevated in 13 of 69 cancers that were not elevated in sLeA, and a novel hybrid assay of sLeA capture and sLeX detected 24 of 69 sLeA-low cancers. A two-marker panel based on combined sLeA and sLeX detection differentiated 109 pancreatic cancers from 91 benign pancreatic diseases with 79% accuracy (74% sensitivity and 78% specificity), significantly better than sLeA alone, which yielded 68% accuracy (65% sensitivity and 71% specificity). Furthermore, sLeX staining was evident in tumors that do not elevate plasma sLeA, including those with poorly differentiated ductal adenocarcinoma. Thus, glycan-based biomarkers could characterize distinct subgroups of patients. In addition, the combined use of sLeA and sLeX, or related glycans, could lead to a biomarker panel that is useful in the clinical diagnosis of pancreatic cancer. Précis: This paper shows that a structural isomer of the current best biomarker for pancreatic cancer, CA19-9, is elevated in the plasma of patients who are low in CA19-9, potentially enabling more comprehensive detection and classification of pancreatic cancers.
Subject(s)
Pancreatic Ductal Carcinoma/blood, Oligosaccharides/blood, Pancreatic Neoplasms/blood, Monoclonal Antibodies/chemistry, Tumor-Associated Carbohydrate Antigens/analysis, Tumor-Associated Carbohydrate Antigens/chemistry, Tumor-Associated Carbohydrate Antigens/genetics, CA-19-9 Antigen, Carbohydrate Sequence, Pancreatic Ductal Carcinoma/chemistry, Pancreatic Ductal Carcinoma/diagnosis, Pancreatic Ductal Carcinoma/immunology, Gene Expression, Humans, Immunoassay, Molecular Sequence Data, Oligosaccharides/chemistry, Oligosaccharides/immunology, Pancreatic Neoplasms/chemistry, Pancreatic Neoplasms/diagnosis, Pancreatic Neoplasms/immunology, Polysaccharides/chemistry, Polysaccharides/immunology, Sensitivity and Specificity, Sialyl Lewis X Antigen
ABSTRACT
Glycans are critical to protein biology and are useful as disease biomarkers. Many studies of glycans rely on clinical specimens, but the low amount of sample available for some specimens limits the experimental options. Here we present a method to obtain information about protein glycosylation using a minimal amount of protein. We treat proteins that were captured or directly spotted in small microarrays (2.2 mm × 2.2 mm) with exoglycosidases to successively expose underlying features, and then we probe the native or exposed features using a panel of lectins or glycan-binding reagents. We developed an algorithm to interpret the data and provide predictions about the glycan motifs that are present in the sample. We demonstrated the efficacy of the method to characterize differences between glycoproteins in their sialic acid linkages and N-linked glycan branching, and we validated the assignments by comparing results from mass spectrometry and chromatography. The amount of protein used on-chip was about 11 ng. The method also proved effective for analyzing the glycosylation of a cancer biomarker in human plasma, MUC5AC, using only 20 µL of the plasma. A glycan on MUC5AC that is associated with cancer had mostly 2,3-linked sialic acid, whereas other glycans on MUC5AC had a 2,6 linkage of sialic acid. The on-chip glycan modification and probing (on-chip GMAP) method provides a platform for analyzing protein glycosylation in clinical specimens and could complement the existing toolkit for studying glycosylation in disease.
Subject(s)
Mucin 5AC/blood, Polysaccharides/analysis, Algorithms, Glycosylation, Humans, Microarray Analysis, Polysaccharides/chemical synthesis, Software
ABSTRACT
The fucose post-translational modification is frequently increased in pancreatic cancer, thus forming the basis for promising biomarkers, but a subset of pancreatic cancer patients does not elevate the known fucose-containing biomarkers. We hypothesized that such patients elevate glycan motifs with fucose in linkages and contexts different from the known fucose-containing biomarkers. We used a database of glycan array data to identify the lectins CCL2 to detect glycan motifs with fucose in a 3' linkage; CGL2 for motifs with fucose in a 2' linkage; and RSL for fucose in all linkages. We used several practical methods to test the lectins and determine the optimal mode of detection, and we then tested whether the lectins detected glycans in pancreatic cancer patients who did not elevate the sialyl-Lewis A glycan, which is upregulated in ~75% of pancreatic adenocarcinomas. Patients who did not upregulate sialyl-Lewis A, which contains fucose in a 4' linkage, tended to upregulate fucose in a 3' linkage, as detected by CCL2, but they did not upregulate total fucose or fucose in a 2' linkage. CCL2 binding was high in cancerous epithelia from pancreatic tumors, including areas negative for sialyl-Lewis A and a related motif containing 3' fucose, sialyl-Lewis X. Thus, glycans containing 3' fucose may complement sialyl-Lewis A to contribute to improved detection of pancreatic cancer. Furthermore, the use of panels of recombinant lectins may uncover details about glycosylation that could be important for characterizing and detecting cancer.
Subject(s)
Adenocarcinoma/metabolism, Fucose/metabolism, Lectins/metabolism, Pancreatic Neoplasms/metabolism, Polysaccharides/metabolism, Up-Regulation, Chemokine CCL2/metabolism, Humans, Molecular Probes, Polysaccharides/chemistry
ABSTRACT
Lectin-glycan interactions have critical functions in multiple normal and pathological processes, but the binding partners and functions for many glycans and lectins are not known. An important step in better understanding glycan-lectin biology is enabling systematic quantification and analysis of the interactions. Glycan arrays can provide the experimental information for such analyses, and the thousands of glycan array datasets available through the Consortium for Functional Glycomics provide the opportunity to extend the analyses to a broad scale. We developed software, based on our previously described Motif Segregation algorithm, for the automated analysis of glycan array data, and we analyzed the entire storehouse of 2883 datasets from the Consortium for Functional Glycomics. We mined the resulting database to make comparisons of specificities across multiple lectins and comparisons between glycans in their lectin receptors. Of the lectins in the database, viral lectins were the most different from other organism types, with specificities nearly always restricted to sialic acids, and mammalian lectins had the most diverse range of specificities. Certain mammalian lectins were unique in their specificities for sulfated glycans. Simple modifications to a lactosamine core structure radically altered the types of lectins that were highly specific for the glycan. Unmodified lactosamine was specifically recognized by plant, fungal, viral, and mammalian lectins; sialylation shifted the binding mainly to viral lectins; and sulfation resulted in mainly mammalian lectins with the highest specificities. We anticipate that this analysis program and database will be valuable in fundamental glycobiology studies, detailed analyses of lectin specificities, and practical applications in translational research.
Subject(s)
Lectins/chemistry, Polysaccharides/chemistry, Software, Amino Acid Motifs, Animals, Binding Sites, Chemical Databases, Humans, Microarray Analysis, Molecular Models, Protein Binding, Species Specificity
ABSTRACT
BACKGROUND AND AIMS: The CA19-9 antigen is the current best biomarker for pancreatic cancer, but it is not elevated in about 25% of pancreatic cancer patients at a cutoff that gives a 25% false-positive rate. We hypothesized that antigens related to the CA19-9 antigen, which is a glycan called sialyl-Lewis A (sLeA), are elevated in distinct subsets of pancreatic cancers. METHODS: We profiled the levels of multiple glycans and mucin glycoforms in plasma from 200 subjects with either pancreatic cancer or benign pancreatic disease, and we validated selected findings in additional cohorts of 116 and 100 subjects, the latter run blinded and including cancers that exclusively were early-stage. RESULTS: We found significant elevations in two glycans: an isomer of sLeA called sialyl-Lewis X, present both in sulfated and non-sulfated forms; and the sialylated form of a marker for pluripotent stem cells, type 1 N-acetyl-lactosamine. The glycans performed as well as sLeA as individual markers and were elevated in distinct groups of patients, resulting in a 3-marker panel that significantly improved upon any individual biomarker. The panel gave 85% sensitivity and 90% specificity in the combined discovery and validation cohorts, relative to 54% sensitivity and 86% specificity for sLeA; and it gave 80% sensitivity and 84% specificity in the independent test cohort, as opposed to 66% sensitivity and 72% specificity for sLeA. CONCLUSIONS: Glycans related to sLeA are elevated in distinct subsets of pancreatic cancers and yield improved diagnostic accuracy over CA19-9.
ABSTRACT
Human experts can annotate peaks in MALDI-TOF profiles of detached N-glycans with some degree of accuracy. Even though MALDI-TOF profiles give only intact masses without any fragmentation information, expert knowledge of the most common glycans and biosynthetic pathways in the biological system can point to a small set of most likely glycan structures at the "cartoon" level of detail. Cartoonist is a recently developed, fully automatic annotation tool for MALDI-TOF glycan profiles. Here we benchmark Cartoonist's automatic annotations against human expert annotations on human and mouse N-glycan data from the Consortium for Functional Glycomics. We find that Cartoonist and expert annotations largely agree, but the expert tends to annotate more specifically, meaning fewer suggested structures per peak, and Cartoonist more comprehensively, meaning more annotated peaks. On peaks for which both Cartoonist and the expert give unique cartoons, the two cartoons agree in over 90% of all cases. This article is part of a Special Issue entitled: Computational Proteomics.
Subject(s)
Algorithms, Automated Pattern Recognition/methods, Polysaccharides/chemistry, Sequence Analysis/methods, Software, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods, Benchmarking, Carbohydrate Sequence, Humans, Molecular Sequence Data, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
Recent research has uncovered unexpected ways that glycans contribute to biology, as well as new strategies for combatting disease using approaches involving glycans. To make full use of glycans for clinical applications, we need more detailed information on the location, nature, and dynamics of glycan expression in vivo. Such studies require the use of specimens acquired directly from patients. Effective studies of clinical specimens require low-volume assays, high precision measurements, and the ability to process many samples. Assays using affinity reagents (lectins and glycan-binding antibodies) can meet these requirements, but further developments are needed to make the methods routine and effective. Recent advances in the use of glycan-binding proteins involve improved determination of specificity using glycan arrays; the availability of databases for mining and analyzing glycan array data; lectin engineering methods; and the ability to quantitatively interpret lectin measurements. Here, we describe many of the challenges and opportunities involved in the application of these new approaches to the study of biological samples. The new tools hold promise for developing methods to improve the outcomes of patients afflicted with diseases characterized by aberrant glycan expression.
Subject(s)
Antibodies/metabolism, Biomarkers/analysis, Glycoproteins/metabolism, Lectins/metabolism, Neoplasms/diagnosis, Polysaccharides/metabolism, Protein Array Analysis/methods, Binding Sites, Carrier Proteins/analysis, Humans, Neoplasms/metabolism
ABSTRACT
The glycan array is a powerful tool for investigating the specificities of glycan-binding proteins. By incubating a glycan-binding protein on a glycan array, the relative binding to hundreds of different oligosaccharides can be quantified in parallel. Based on these data, much information can be obtained about the preference of a glycan-binding protein for specific subcomponents of oligosaccharides or motifs. In many cases, the analysis and interpretation of glycan array data can be time consuming and imprecise if done manually. Recently we developed software, called GlycoSearch, to facilitate the analysis and interpretation of glycan array data based on the previously developed methods called Motif Segregation and Outlier Motif Analysis. Here we describe the principles behind the software, the use of the software, and an example application. The automated, objective, and precise analysis of glycan array data should enhance the value of the data for a broad range of research applications.
Subject(s)
Computational Biology/methods, Glycomics/methods, Polysaccharides/metabolism, Software, Statistics as Topic, Plant Lectins/chemistry, Protein Binding
ABSTRACT
The validation of candidate biomarkers often is hampered by the lack of a reliable means of assessing and comparing performance. We present here a reference set of serum and plasma samples to facilitate the validation of biomarkers for resectable pancreatic cancer. The reference set includes a large cohort of stage I-II pancreatic cancer patients, recruited from 5 different institutions, and relevant control groups. We characterized the performance of the current best serological biomarker for pancreatic cancer, CA 19-9, using plasma samples from the reference set to provide a benchmark for future biomarker studies and to further our knowledge of CA 19-9 in early-stage pancreatic cancer and the control groups. CA 19-9 distinguished pancreatic cancers from the healthy and chronic pancreatitis groups with an average sensitivity and specificity of 70-74%, similar to previous studies using all stages of pancreatic cancer. Chronic pancreatitis patients did not show CA 19-9 elevations, but patients with benign biliary obstruction had elevations nearly as high as the cancer patients. We gained additional information about the biomarker by comparing two distinct assays. The two CA 19-9 assays agreed well in overall performance but diverged in measurements of individual samples, potentially due to subtle differences in antibody specificity as revealed by glycan array analysis. Thus, the reference set promises to be a valuable resource for biomarker validation and comparison, and the CA 19-9 data presented here will be useful for benchmarking and for exploring relationships to CA 19-9.
Subject(s)
CA-19-9 Antigen/blood, Pancreatic Neoplasms/blood, Aged, Aged 80 and over, Tumor Biomarkers/blood, Female, Humans, Male, Middle Aged, Chronic Pancreatitis/blood, Sensitivity and Specificity
ABSTRACT
The glycan array is a powerful tool for investigating the specificities of glycan-binding proteins. By incubating a glycan-binding protein on a glycan array, the relative binding to hundreds of different oligosaccharides can be quantified in parallel. Based on these data, much information can be obtained about the preference of a glycan-binding protein for specific subcomponents of oligosaccharides, or motifs. In many cases, the analysis and interpretation of glycan array data can be time consuming and imprecise if done manually. Recently, GlycoSearch software was developed to facilitate the analysis and interpretation of glycan array data based on two previously developed methods, Motif Segregation and Outlier Motif Analysis. Here, the principles behind this method and the use of this new tool for mining glycan array data are described. The automated, objective, and precise analysis of glycan array data should enhance the value of these data for a broad range of research applications.
Subject(s)
Lectins/chemistry, Polysaccharides/chemistry, Software, Binding Sites, Plant Lectins/chemistry, Protein Array Analysis/methods, Substrate Specificity
ABSTRACT
PURPOSE: Lectins are valuable tools for detecting specific glycans in biological samples, but the interpretation of the measurements can be ambiguous due to the complexities of lectin specificities. Here, we present an approach to improve the accuracy of interpretation by converting lectin measurements into quantitative predictions of the presence of various glycan motifs. EXPERIMENTAL DESIGN: The conversion relies on a database of analyzed glycan array data that provides information on the specificities of the lectins for each of the motifs. We tested the method using measurements of lectin binding to glycans on glycan arrays and then applied the method to predicting motifs on the protein mucin 1 (MUC1) expressed in eight different pancreatic cancer cell lines. RESULTS: The combined measurements from several lectins were more accurate than individual measurements for predicting the presence or absence of motifs on arrayed glycans. The analysis of MUC1 revealed that each cell line expressed a unique pattern of glycoforms, and that the glycoforms significantly differed between MUC1 collected from conditioned media and MUC1 collected from cell lysates. CONCLUSIONS AND CLINICAL RELEVANCE: This new method could provide more accurate analyses of glycans in biological samples and make the use of lectins more practical and effective for a broad range of researchers.