1.
Anal Chem ; 93(32): 11215-11224, 2021 08 17.
Article in English | MEDLINE | ID: mdl-34355890

ABSTRACT

The accurate processing of complex liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) data from biological samples is a major challenge for metabolomics, proteomics, and related approaches. Here, we present the pipelines and systems for threshold-avoiding quantification (PASTAQ) LC-MS/MS preprocessing toolset, which allows highly accurate quantification of data-dependent acquisition LC-MS/MS datasets. PASTAQ performs compound quantification using single-stage (MS1) data and implements novel algorithms for high-performance and accurate quantification, retention time alignment, feature detection, and linking annotations from multiple identification engines. PASTAQ offers straightforward parameterization and automatic generation of quality control plots for data and preprocessing assessment. This design results in smaller variance when analyzing replicates of proteomes mixed with known ratios and allows the detection of peptides over a larger dynamic concentration range compared to widely used proteomics preprocessing tools. The performance of the pipeline is also demonstrated in a biological human serum dataset for the identification of gender-related proteins.


Subject(s)
Proteomics , Tandem Mass Spectrometry , Algorithms , Chromatography, Liquid , Humans , Peptides , Proteome
2.
Anal Chem ; 92(24): 16138-16148, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33317272

ABSTRACT

Mass spectrometry imaging (MSI) is a technique that provides comprehensive molecular information with high spatial resolution from tissue. Today, there is a strong push toward sharing data sets through public repositories in many research fields where MSI is commonly applied; yet, there is no standardized protocol for analyzing these data sets in a reproducible manner. Shifts in the mass-to-charge ratio (m/z) of molecular peaks present a major obstacle that can make it impossible to distinguish one compound from another. Here, we present a label-free m/z alignment approach that is compatible with multiple instrument types and makes no assumptions on the sample's molecular composition. Our approach, MSIWarp (https://github.com/horvatovichlab/MSIWarp), finds an m/z recalibration function by maximizing a similarity score that considers both the intensity and m/z position of peaks matched between two spectra. MSIWarp requires only centroid spectra to find the recalibration function and is thereby readily applicable to almost any MSI data set. To deal with particularly misaligned or peak-sparse spectra, we provide an option to detect and exclude spurious peak matches with a tailored random sample consensus (RANSAC) procedure. We evaluate our approach with four publicly available data sets from both time-of-flight (TOF) and Orbitrap instruments and demonstrate up to 88% improvement in m/z alignment.

3.
Anal Chem ; 88(8): 4229-38, 2016 Apr 19.
Article in English | MEDLINE | ID: mdl-26959230

ABSTRACT

Complex shotgun proteomics peptide profiles obtained in quantitative differential protein expression studies, such as in biomarker discovery, may be affected by multiple experimental factors. These preanalytical factors may affect the measured protein abundances which in turn influence the outcome of the associated statistical analysis and validation. It is therefore important to determine which factors influence the abundance of peptides in a complex proteomics experiment and to identify those peptides that are most influenced by these factors. In the current study we analyzed depleted human serum samples to evaluate experimental factors that may influence the resulting peptide profile such as the residence time in the autosampler at 4 °C, stopping or not stopping the trypsin digestion with acid, the type of blood collection tube, different hemolysis levels, differences in clotting times, the number of freeze-thaw cycles, and different trypsin/protein ratios. To this end we used a two-level fractional factorial design of resolution IV ($2_{\mathrm{IV}}^{7-3}$). The design required analysis of 16 samples in which the main effects were not confounded by two-factor interactions. Data preprocessing using the Threshold Avoiding Proteomics Pipeline (Suits, F.; Hoekman, B.; Rosenling, T.; Bischoff, R.; Horvatovich, P. Anal. Chem. 2011, 83, 7786-7794, ref 1) produced a data-matrix containing quantitative information on 2,559 peaks. The intensity of the peaks was log-transformed, and peaks having intensities of a low t-test significance (p-value > 0.05) and a low absolute fold ratio (<2) between the two levels of each factor were removed. The remaining peaks were subjected to analysis of variance (ANOVA)-simultaneous component analysis (ASCA). Permutation tests were used to identify which of the preanalytical factors influenced the abundance of the measured peptides most significantly.
The most important preanalytical factors affecting peptide intensity were (1) the hemolysis level, (2) stopping trypsin digestion with acid, and (3) the trypsin/protein ratio. This provides guidelines for the experimentalist to keep the ratio of trypsin/protein constant and to control the trypsin reaction by stopping it with acid at an accurately set pH. The hemolysis level cannot be controlled tightly as it depends on the status of a patient's blood (e.g., red blood cells are more fragile in patients undergoing chemotherapy) and the care with which blood was sampled (e.g., by avoiding shear stress). However, its level can be determined with a simple UV spectrophotometric measurement and samples with extreme levels or the peaks affected by hemolysis can be discarded from further analysis. The loadings of the ASCA model led to peptide peaks that were most affected by a given factor, for example, to hemoglobin-derived peptides in the case of the hemolysis level. Peak intensity differences for these peptides were assessed by means of extracted ion chromatograms confirming the results of the ASCA model.
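
The 16-run layout described in this abstract can be illustrated with a minimal sketch. The abstract does not state which generators were used, so the choice below (E = ABC, F = BCD, G = ACD, a common textbook choice for a resolution IV design with seven factors) is an assumption, as are the function names. The check confirms that no main effect column coincides with any two-factor interaction column, which is what resolution IV guarantees.

```python
from itertools import product


def fractional_factorial_2_7_3():
    """Build 16 runs of a two-level design for 7 factors (levels coded -1/+1).

    Assumed generators (not stated in the abstract): E=ABC, F=BCD, G=ACD.
    """
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):  # full factorial in A..D
        e = a * b * c
        f = b * c * d
        g = a * c * d
        runs.append((a, b, c, d, e, f, g))
    return runs


def main_effects_clear_of_two_factor_interactions(runs):
    """True if every main-effect column is orthogonal to every 2-factor product column."""
    n_factors = len(runs[0])
    for i in range(n_factors):
        for j in range(n_factors):
            for k in range(j + 1, n_factors):
                # A nonzero sum would mean factor i is aliased with interaction j*k.
                if sum(r[i] * r[j] * r[k] for r in runs) != 0:
                    return False
    return True


runs = fractional_factorial_2_7_3()
print(len(runs), main_effects_clear_of_two_factor_interactions(runs))
```

Each of the seven columns would be mapped to one preanalytical factor (for example hemolysis level or trypsin/protein ratio); that mapping is arbitrary here.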


Subject(s)
Peptides/blood , Principal Component Analysis , Proteins/analysis , Proteomics , Analysis of Variance , Humans
4.
Adv Exp Med Biol ; 926: 21-47, 2016.
Article in English | MEDLINE | ID: mdl-27686804

ABSTRACT

Proteogenomics is a multi-omics research field that has the aim to efficiently integrate genomics, transcriptomics and proteomics. With this approach it is possible to identify new patient-specific proteoforms that may have implications in disease development, specifically in cancer. Understanding the impact of a large number of mutations detected at the genomics level is needed to assess the effects at the proteome level. Proteogenomics data integration would help in identifying molecular changes that are persistent across multiple molecular layers and enable better interpretation of molecular mechanisms of disease, such as the causal relationship between single nucleotide polymorphisms (SNPs) and the expression of transcripts and translation of proteins compared to mainstream proteomics approaches. Identifying patient-specific protein forms and getting a better picture of molecular mechanisms of disease opens the avenue for precision and personalized medicine. Proteogenomics is, however, a challenging interdisciplinary science that requires the understanding of sample preparation, data acquisition and processing for genomics, transcriptomics and proteomics. This chapter aims to guide the reader through the technology and bioinformatics aspects of these multi-omics approaches, illustrated with proteogenomics applications having clinical or biological relevance.


Subject(s)
Chromosome Mapping/methods , Proteogenomics/methods , Proteome/genetics , RNA, Messenger/genetics , Software , Amino Acid Sequence , Base Sequence , Cell Line , Chromosome Mapping/statistics & numerical data , Fibroblasts/cytology , Fibroblasts/metabolism , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Mass Spectrometry , Polymorphism, Single Nucleotide , Precision Medicine/methods , Proteogenomics/instrumentation , Proteome/metabolism , RNA, Messenger/metabolism
5.
Mol Cell Proteomics ; 12(1): 263-76, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23115301

ABSTRACT

In this paper, we compare the performance of six different feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery: the t test, the Mann-Whitney-Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine-recursive feature elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA). The methods were evaluated using human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features that are related to the spiked peptides without selecting unrelated features. Whereas many studies have to rely on classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly from the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and extents of sample class separation determined by the concentration level of spiked compounds. For each feature selection method and data set, the performance for selecting a set of features related to spiked compounds was assessed using the harmonic mean of the recall and the precision (f-score) and the geometric mean of the recall and the true negative rate (g-score). We conclude that the univariate t test and the mww test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly with increasing sample size up to a point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets independent of spiking level and number of samples. 
Linear SVM-RFE performs poorly for selecting features related to the spiked compounds, even though the classification error is relatively low.
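
The two scores named in this abstract follow directly from their definitions: the f-score is the harmonic mean of recall and precision, and the g-score is the geometric mean of recall and the true negative rate. The sketch below computes both from a confusion table of selected versus spiked features; the function name and the example counts are illustrative assumptions, not values from the study.

```python
import math


def selection_scores(tp, fp, fn, tn):
    """Return (f_score, g_score) for a feature selection result.

    tp: selected features related to spiked peptides
    fp: selected features unrelated to spiked peptides
    fn: spiked-peptide features that were not selected
    tn: unrelated features that were correctly left out
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    true_negative_rate = tn / (tn + fp) if (tn + fp) else 0.0

    # Harmonic mean of recall and precision.
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    # Geometric mean of recall and true negative rate.
    g_score = math.sqrt(recall * true_negative_rate)
    return f_score, g_score


# Hypothetical counts for illustration only.
f_score, g_score = selection_scores(tp=8, fp=2, fn=2, tn=88)
print(f_score, g_score)
```

A method that selects few features with high precision but low recall, as reported for PCDA and PLSDA, is penalized by both scores because recall enters each of them.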


Subject(s)
Metabolomics/methods , Peptides/cerebrospinal fluid , Peptides/urine , Proteomics/methods , Animals , Biomarkers/analysis , Chromatography, Liquid , Computational Biology , Data Interpretation, Statistical , Gene Expression Profiling , Humans , Mass Spectrometry , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated , Principal Component Analysis , Swine
6.
Proteomics ; 14(7-8): 862-71, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24478260

ABSTRACT

Scanning MS by MALDI MS imaging (MALDI-MSI) creates large volumetric global datasets that describe the location and identity of ions registered at each sampling location. While thousands of ion peaks are recorded in a typical whole-tissue analysis, only a fraction of these measured molecules are purposefully scrutinized within a given experimental design. To address this need, we recently reported new methods to query the full volume of MALDI-MSI data that correlate all ion masses to one another. As an example of this utility, we demonstrate that specific ion peak m/z signatures can be used to localize similar histological structures within tissue samples. In this study, we use the example of ion peak masses that are associated with tissue spaces occupied by airway bronchioles in rat lung samples. The volume of raw data was preprocessed into structures of 0.1 mass unit bins containing metadata collected at each sampling position. Interactive visualization in ParaView identified ion peaks that especially showed strong association with airway bronchioles but not vascular or parenchymal tissue compartments. Further iterative statistical correlation queries provided ranked indices of all m/z values in the global dataset regarding coincident distributions at any given X, Y position in the histological spaces occupied by bronchioles. The study further provides methods for extracting important information contained in global datasets that previously was unseen or inaccessible.


Subject(s)
Ions , Molecular Imaging/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Animals , Molecular Weight , Rats , Tissue Distribution
7.
Mol Cell Proteomics ; 11(6): M111.015974, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22318370

ABSTRACT

Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open source label-free data processing workflows, we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS, and MZmine and our in-house developed modules on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows, and interestingly, heterogeneous combinations of algorithms often performed better than the homogeneous workflows. Our scoring method showed that the union of feature matrices of different workflows outperformed the original homogeneous workflows in some cases. msCompare is open source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for our framework by integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy) to allow scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS data sets.


Subject(s)
Data Interpretation, Statistical , Software , Adult , Aged , Algorithms , Animals , Biomarkers/cerebrospinal fluid , Biomarkers/urine , Chromatography, Liquid/standards , Chromatography, Reverse-Phase , Female , Humans , Male , Mass Spectrometry/standards , Middle Aged , Online Systems , Peptide Fragments/chemistry , Peptide Mapping , Proteomics , Reference Standards , Swine
8.
Anal Chem ; 85(9): 4398-404, 2013 May 07.
Article in English | MEDLINE | ID: mdl-23537055

ABSTRACT

Mass spectrometry imaging (MSI) generates large volumetric data sets consisting of mass to charge ratio (m/z), ion current, and x,y coordinate location. These data sets usually serve limited purposes centered on measuring the distribution of a small set of ions with known m/z. Such earmarked queries consider only a fraction of the full mass spectrum captured, and there are few tools to assist the exploration of the remaining volume of unknown data in terms of demonstrating similarity or discordance in tissue compartment distribution patterns. Here we present a novel, interactive approach to extract information from MSI data that relies on precalculated data structures to perform queries of large data sets with a typical laptop. We have devised methods to query the full volume to find new m/z values of potential interest based on similarity to biological structures or to the spatial distribution of known ions. We describe these query methods in detail and provide examples demonstrating the power of the methods to "discover" m/z values of ions that have such potentially interesting correlations. The "discovered" ions may be further correlated with either positional locations or the coincident distribution of other ions using successive queries. Finally, we show it is possible to gain insight to the fragmentation pattern of the parent molecule from such correlations. The ability to discover new ions of interest in the unknown bulk of an MSI data set offers the potential to further our understanding of biological and physiological processes related to health and disease.


Subject(s)
Lung/cytology , Animals , Rats , Rats, Wistar , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
9.
J Proteome Res ; 11(4): 2048-60, 2012 Apr 06.
Article in English | MEDLINE | ID: mdl-22320401

ABSTRACT

The experimental autoimmune encephalomyelitis (EAE) model resembles certain aspects of multiple sclerosis (MScl), with common features such as motor dysfunction, axonal degradation, and infiltration of T-cells. We studied the cerebrospinal fluid (CSF) proteome in the EAE rat model to identify proteomic changes relevant for MScl disease pathology. EAE was induced in male Lewis rats by injection of myelin basic protein (MBP) together with complete Freund's adjuvant (CFA). An inflammatory control group was injected with CFA alone, and a nontreated group served as healthy control. CSF was collected at days 10 and 14 after immunization and analyzed by bottom-up proteomics on Orbitrap LC-MS and QTOF LC-MS platforms in two independent laboratories. By combining results, 44 proteins were discovered to be significantly increased in EAE animals compared to both control groups, 25 of which have not been mentioned in relation to the EAE model before. Lysozyme C1, fetuin B, T-kininogen, serum paraoxonase/arylesterase 1, glutathione peroxidase 3, complement C3, and afamin are among the proteins significantly elevated in this rat EAE model. Two proteins, afamin and complement C3, were validated in an independent sample set using quantitative selected reaction monitoring mass spectrometry. The molecular weights of the identified differentially abundant proteins indicated an increased transport across the blood-brain barrier (BBB) at the peak of the disease, caused by an increase in BBB permeability.


Subject(s)
Cerebrospinal Fluid Proteins/analysis , Disease Models, Animal , Encephalomyelitis, Autoimmune, Experimental/cerebrospinal fluid , Multiple Sclerosis/cerebrospinal fluid , Proteome/analysis , Proteomics/methods , Animals , Body Weight , Cerebrospinal Fluid Proteins/chemistry , Chromatography, Liquid , Male , Mass Spectrometry , Paralysis/cerebrospinal fluid , Rats , Rats, Inbred Lew
10.
Bioinformatics ; 27(8): 1176-8, 2011 Apr 15.
Article in English | MEDLINE | ID: mdl-21349866

ABSTRACT

Warp2D is a novel time alignment approach, which uses the overlapping peak volume of the reference and sample peak lists to correct misleading peak shifts. Here, we present an easy-to-use web interface for a high-throughput Warp2D batch processing time alignment service using the Dutch Life Science Grid, reducing processing time from days to hours. This service provides the warping function, the sample chromatogram peak list with adjusted retention times and normalized quality scores based on the sum of overlapping peak volume of all peaks. Heat maps before and after time alignment are created from the arithmetic mean of the sum of overlapping peak area rearranged with hierarchical clustering, allowing the quality control of the time alignment procedure. A Taverna workflow and a command line tool are provided for remote processing of local user data. AVAILABILITY: Online data processing service is available at http://www.nbpp.nl/warp2d.html. Taverna workflow is available at myExperiment with title '2D Time Alignment-Webservice and Workflow' at http://www.myexperiment.org/workflows/1283.html. Command line tool is available at http://www.nbpp.nl/Warp2D_commandline.zip. CONTACT: p.l.horvatovich@rug.nl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromatography, Liquid/methods , Mass Spectrometry/methods , Metabolomics/methods , Proteomics/methods , Software , Animals , High-Throughput Screening Assays , Internet , Mice
11.
Anal Chem ; 83(20): 7786-94, 2011 Oct 15.
Article in English | MEDLINE | ID: mdl-21879761

ABSTRACT

We present a new proteomics analysis pipeline focused on maximizing the dynamic range of detected molecules in liquid chromatography-mass spectrometry (LC-MS) data and accurately quantifying low-abundance peaks to identify those with biological relevance. Although there has been much work to improve the quality of data derived from LC-MS instruments, the goal of this study was to extend the dynamic range of analyzed compounds by making full use of the information available within each data set and across multiple related chromatograms in an experiment. Our aim was to distinguish low-abundance signal peaks from noise by noting their coherent behavior across multiple data sets, and central to this is the need to delay the culling of noise peaks until the final peak-matching stage of the pipeline, when peaks from a single sample appear in the context of all others. The application of thresholds that might discard signal peaks early is thereby avoided, hence the name TAPP: threshold-avoiding proteomics pipeline. TAPP focuses on quantitative low-level processing of raw LC-MS data and includes novel preprocessing, peak detection, time alignment, and cluster-based matching. We demonstrate the performance of TAPP on biologically relevant sample data consisting of porcine cerebrospinal fluid spiked over a wide range of concentrations with horse heart cytochrome c.


Subject(s)
Chromatography, High Pressure Liquid , Mass Spectrometry , Proteomics , Animals , Cytochromes c/analysis , Horses , Myocardium/metabolism
12.
Clin Chem ; 57(12): 1703-11, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21998343

ABSTRACT

BACKGROUND: Because cerebrospinal fluid (CSF) is in close contact with diseased areas in neurological disorders, it is an important source of material in the search for molecular biomarkers. However, sample handling for CSF collected from patients in a clinical setting might not always be adequate for use in proteomics and metabolomics studies. METHODS: We left CSF for 0, 30, and 120 min at room temperature immediately after sample collection and centrifugation/removal of cells. At 2 laboratories CSF proteomes were subjected to tryptic digestion and analyzed by use of nano-liquid chromatography (LC) Orbitrap mass spectrometry (MS) and chipLC quadrupole TOF-MS. Metabolome analysis was performed at 3 laboratories by NMR, GC-MS, and LC-MS. Targeted analyses of cystatin C and albumin were performed by LC-tandem MS in the selected reaction monitoring mode. RESULTS: We did not find significant changes in the measured proteome and metabolome of CSF stored at room temperature after centrifugation, except for 2 peptides and 1 metabolite, 2,3,4-trihydroxybutanoic (threonic) acid, of 5780 identified peptides and 93 identified metabolites. A sensitive protein stability marker, cystatin C, was not affected. CONCLUSIONS: The measured proteome and metabolome of centrifuged human CSF is stable at room temperature for up to 2 hours. We cannot exclude, however, that changes undetectable with our current methodology, such as denaturation or proteolysis, might occur because of sample handling conditions. The stability we observed gives laboratory personnel at the collection site sufficient time to aliquot samples before freezing and storage at -80 °C.


Subject(s)
Metabolome , Proteome/metabolism , Specimen Handling , Cerebrospinal Fluid , Chromatography, Gas , Chromatography, Liquid , Humans , Magnetic Resonance Spectroscopy , Mass Spectrometry/methods , Time Factors
13.
J Proteome Res ; 9(3): 1483-95, 2010 Mar 05.
Article in English | MEDLINE | ID: mdl-20070124

ABSTRACT

Time alignment of complex LC-MS data remains a challenge in proteomics and metabolomics studies. This work describes modifications of the Dynamic Time Warping (DTW) and the Parametric Time Warping (PTW) algorithms that improve the alignment quality for complex, highly variable LC-MS data sets. Regular DTW or PTW use one-dimensional profiles such as the Total Ion Chromatogram (TIC) or Base Peak Chromatogram (BPC) resulting in correct alignment if the signals have a relatively simple structure. However, when aligning the TICs of chromatograms from complex mixtures with large concentration variability such as serum or urine, both algorithms often lead to misalignment of peaks and thus incorrect comparisons in the subsequent statistical analysis. This is mainly due to the fact that compounds with different m/z values but similar retention times are not considered separately but confounded in the benefit function of the algorithms using only one-dimensional information. Thus, it is necessary to treat the information of different mass traces separately in the warping function to ensure that compounds having the same m/z value and retention time are aligned to each other. The Component Detection Algorithm (CODA) is widely used to calculate the quality of an LC-MS mass trace. By combining CODA with the warping algorithms of DTW or PTW (DTW-CODA or PTW-CODA), we include only high quality mass traces measured by CODA in the benefit function. Our results show that using several CODA selected high quality mass traces in DTW-CODA and PTW-CODA significantly improves the alignment quality of three different, highly complex LC-MS data sets. Moreover, DTW-CODA leads to better preservation of peak shape as compared to the original DTW-TIC algorithm, which often suffers from a substantial peak shape distortion. 
Our results show that combining CODA-selected mass traces with different time alignment algorithms is a general principle that provides accurate alignment for highly complex samples with large concentration variability.
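
To make the idea of warping on several mass traces concrete, the following is a minimal sketch of classic dynamic time warping in which the local cost sums the intensity differences over a set of mass traces instead of a single TIC. It is not the authors' DTW-CODA implementation: the CODA quality filter is not reproduced (the traces passed in are assumed to have been selected already), and the absolute-difference cost and function name are assumptions made for illustration.

```python
def dtw_multi_trace(ref_traces, sample_traces):
    """Align two chromatograms with plain DTW over several mass traces.

    ref_traces, sample_traces: lists of traces; each trace is a list of
    intensities over time. Trace k in both lists is assumed to be the same
    m/z channel. Returns (total_cost, path) where path holds
    (ref_index, sample_index) pairs.
    """
    n = len(ref_traces[0])
    m = len(sample_traces[0])

    def local_cost(i, j):
        # Sum of per-trace differences keeps different m/z channels separate.
        return sum(abs(r[i] - s[j]) for r, s in zip(ref_traces, sample_traces))

    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = local_cost(i - 1, j - 1) + best_prev

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path


# One trace with a peak that elutes one scan earlier in the sample.
cost, path = dtw_multi_trace([[0, 0, 1, 0, 0]], [[0, 1, 0, 0, 0]])
print(cost, path)
```

Because unconstrained DTW may stretch or compress the time axis freely, it can distort peak shapes; the abstract notes that restricting the benefit function to high-quality traces reduces this effect.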


Subject(s)
Algorithms , Chromatography, Liquid/methods , Computational Biology/methods , Mass Spectrometry/methods , Metabolomics/methods , Blood Proteins/analysis , Databases, Factual , Humans , Proteinuria/blood , Time Factors , Trypsin/chemistry , Trypsin/metabolism , Urine/chemistry
14.
J Proteome Res ; 8(12): 5511-22, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19845411

ABSTRACT

To standardize the use of cerebrospinal fluid (CSF) for biomarker research, a set of stability studies has been performed on porcine samples to investigate the influence of common sample handling procedures on proteins, peptides, metabolites and free amino acids. This study focuses on the effect on proteins and peptides, analyzed by applying label-free quantitation using microfluidics nanoscale liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (chipLC-MS) as well as matrix-assisted laser desorption ionization Fourier transform ion cyclotron resonance mass spectrometry (MALDI-FT-ICR-MS) and Orbitrap LC-MS/MS to trypsin-digested CSF samples. The factors assessed were a 30 or 120 min time delay at room temperature before storage at -80 °C after the collection of CSF in order to mimic potential delays in the clinic (delayed storage), storage at 4 °C after trypsin digestion to mimic the time that samples remain in the cooled autosampler of the analyzer, and repeated freeze-thaw cycles to mimic storage and handling procedures in the laboratory. The delayed storage factor was also analyzed by gas chromatography mass spectrometry (GC-MS) and liquid chromatography mass spectrometry (LC-MS) for changes of metabolites and free amino acids, respectively. Our results show that repeated freeze/thawing introduced changes in transthyretin peptide levels. The trypsin digested samples left at 4 °C in the autosampler showed a time-dependent decrease of peak areas for peptides from prostaglandin D-synthase and serotransferrin. Delayed storage of CSF led to changes in prostaglandin D-synthase derived peptides as well as to increased levels of certain amino acids and metabolites. The changes of metabolites, amino acids and proteins in the delayed storage study appear to be related to remaining white blood cells. 
Our recommendations are to centrifuge CSF samples immediately after collection to remove white blood cells, aliquot, and then snap-freeze the supernatant in liquid nitrogen for storage at -80 °C. Preferably samples should not be left in the autosampler for more than 24 h and freeze/thaw cycles should be avoided if at all possible.


Subject(s)
Cerebrospinal Fluid/chemistry , Protein Stability , Proteome/chemistry , Specimen Handling/methods , Tissue Preservation/methods , Amino Acids , Biomarkers/cerebrospinal fluid , Cryopreservation , Humans , Intramolecular Oxidoreductases/metabolism , Leukocytes/chemistry , Leukocytes/metabolism , Lipocalins/metabolism , Metabolomics , Peptides , Proteins , Proteome/metabolism , Proteomics/methods , Reference Standards , Specimen Handling/standards , Tissue Preservation/standards
15.
Bioinformatics ; 24(8): 1070-7, 2008 Apr 15.
Article in English | MEDLINE | ID: mdl-18353791

ABSTRACT

MOTIVATION: Mass spectrometry data are subjected to considerable noise. Good noise models are required for proper detection and quantification of peptides. We have characterized noise in both quadrupole time-of-flight (Q-TOF) and ion trap data, and have constructed models for the noise. RESULTS: We find that the noise in Q-TOF data from Applied Biosystems QSTAR fits well to a combination of multinomial and Poisson model with detector dead-time correction. In comparison, ion trap noise from Agilent MSD-Trap-SL is larger than the Q-TOF noise and is proportional to Poisson noise. We then demonstrate that the noise model can be used to improve deisotoping for peptide detection, by estimating appropriate cutoffs of the goodness of fit parameter at prescribed error rates. The noise models also have implications in noise reduction, retention time alignment and significance testing for biomarker discovery.


Subject(s)
Artifacts , Models, Chemical , Proteins/chemistry , Sequence Analysis, Protein/methods , Spectrometry, Mass, Electrospray Ionization/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Computer Simulation , Models, Statistical , Proteins/analysis , Proteomics/methods , Reproducibility of Results , Sensitivity and Specificity
16.
Anal Chem ; 80(9): 3095-104, 2008 May 01.
Article in English | MEDLINE | ID: mdl-18396914

ABSTRACT

We describe a new time alignment method that takes advantage of both dimensions of LC-MS data to resolve ambiguities in peak matching while remaining computationally efficient. This approach, Warp2D, combines peak extraction with a two-dimensional correlation function to provide a reliable alignment scoring function that is insensitive to spurious peaks and background noise. One-dimensional alignment methods are often based on the total-ion-current elution profile of the spectrum and are unable to distinguish peaks of different masses. Our approach uses one-dimensional alignment in time, but with a scoring function derived from the overlap of peaks in two dimensions, thereby combining the specificity of two-dimensional methods with the computational performance of one-dimensional methods. The peaks are approximated as two-dimensional Gaussians of varying width. This approximation allows peak overlap (the measure of alignment quality) to be calculated analytically, without computationally intensive numerical integration in two dimensions. To demonstrate the general applicability of Warp2D, we chose a variety of complex samples that have substantial biological and analytical variability, including human serum and urine. We show that Warp2D works well with these diverse sample sets and with minimal tuning of parameters, based on the reduced standard deviation of peak elution times after warping. The combination of high computational speed, robustness with complex samples, and lack of need for detailed tuning makes this alignment method well suited to high-throughput LC-MS studies.
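
The abstract states that approximating peaks as two-dimensional Gaussians lets the overlap be computed analytically. One way this can work, sketched below, is to take the overlap as the integral of the product of two axis-aligned Gaussians, which factorizes into a retention time term and an m/z term, each with a closed form. Whether Warp2D defines overlap exactly this way is an assumption, and the `Peak` fields and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    """Hypothetical peak model: an axis-aligned 2D Gaussian."""
    height: float
    rt: float        # retention time centre
    sigma_rt: float  # width along retention time
    mz: float        # m/z centre
    sigma_mz: float  # width along m/z


def gaussian_overlap_1d(mu1, sigma1, mu2, sigma2):
    """Closed form of the integral of exp(-(x-mu1)^2/(2*sigma1^2)) * exp(-(x-mu2)^2/(2*sigma2^2))."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    scale = math.sqrt(2 * math.pi) * sigma1 * sigma2 / math.sqrt(var_sum)
    return scale * math.exp(-((mu1 - mu2) ** 2) / (2 * var_sum))


def peak_overlap_2d(p, q):
    """Overlap of two 2D Gaussian peaks; separable, so no numerical integration is needed."""
    return (
        p.height
        * q.height
        * gaussian_overlap_1d(p.rt, p.sigma_rt, q.rt, q.sigma_rt)
        * gaussian_overlap_1d(p.mz, p.sigma_mz, q.mz, q.sigma_mz)
    )


reference = Peak(height=100.0, rt=30.0, sigma_rt=0.5, mz=500.25, sigma_mz=0.01)
shifted = Peak(height=100.0, rt=31.0, sigma_rt=0.5, mz=500.25, sigma_mz=0.01)
other_mass = Peak(height=100.0, rt=30.0, sigma_rt=0.5, mz=501.25, sigma_mz=0.01)
print(peak_overlap_2d(reference, shifted), peak_overlap_2d(reference, other_mass))
```

In this sketch a peak at a different m/z contributes essentially nothing to the score even when it co-elutes, which is the property that lets a one-dimensional warp in time use two-dimensional information.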


Subject(s)
Data Interpretation, Statistical , Gas Chromatography-Mass Spectrometry/methods , Adult , Aged , Aged, 80 and over , Animals , Blood Proteins/analysis , Cytochromes c/analysis , Female , Horses , Humans , Middle Aged , Pregnancy , Urinalysis/methods , Uterine Cervical Neoplasms/blood , Uterine Cervical Neoplasms/urine
17.
Anal Chem ; 80(18): 7012-21, 2008 Sep 15.
Article in English | MEDLINE | ID: mdl-18715018

ABSTRACT

Correlation optimized warping (COW) based on the total ion current (TIC) is a widely used time alignment algorithm (COW-TIC). This approach works successfully on chromatograms containing few compounds and having a well-defined TIC. In this paper, we have combined COW with a component detection algorithm (CODA) to align LC-MS chromatograms containing thousands of biological compounds with overlapping chromatographic peaks, a situation where COW-TIC often fails. CODA is a variable selection procedure that selects mass chromatograms with low noise and low background (so-called "high-quality" mass chromatograms). High-quality mass chromatograms selected in each COW segment ensure that the same compounds (based on their mass and their retention time) are used in the two-dimensional benefit function of COW to obtain correct and optimal alignments (COW-CODA). The performance of the COW-CODA algorithm was evaluated on three types of complex data sets obtained from the LC-MS analysis of samples commonly used for biomarker discovery and compared to COW-TIC using a new global comparison method based on overlapping peak area: trypsin-digested serum obtained from cervical cancer patients, trypsin-digested serum from a single patient that was treated with varying preanalytical parameters (factorial design study), and urine from pregnant and nonpregnant women. While COW-CODA did result in minor misalignments in rare cases, it was clearly superior to the COW-TIC algorithm, especially when applied to highly variable chromatograms (factorial design, urine). The presented algorithm thus enables automatic time alignment and accurate peak matching of multiple LC-MS data sets obtained from complex body fluids that are often used for biomarker discovery.
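The idea of selecting low-noise, low-background mass chromatograms can be sketched as follows. This is a simplified smoothness-based score written for illustration: it compares a length-scaled trace with a smoothed, standardized copy of itself, so spikes and constant background score low. The function names, the window of 5 points, and the 0.8 threshold are assumptions, and the published CODA quality index may be defined differently.

```python
import math
import statistics


def quality_score(intensities, window=5):
    """Score one mass chromatogram; smooth, well-defined peaks score near 1.

    Simplified illustration, not the published CODA definition.
    """
    n = len(intensities)
    if n <= window:
        raise ValueError("chromatogram must be longer than the window")
    norm = math.sqrt(sum(x * x for x in intensities))
    if norm == 0.0:
        return 0.0
    # Length-scaled raw trace: sensitive to spikes.
    scaled = [x / norm for x in intensities]
    # Moving average: suppresses spikes and random noise.
    smoothed = [sum(intensities[i:i + window]) / window
                for i in range(n - window + 1)]
    mean = statistics.fmean(smoothed)
    sd = statistics.stdev(smoothed)
    if sd == 0.0:
        return 0.0  # pure constant background carries no peak information
    # Mean subtraction penalizes traces dominated by background.
    standardized = [(x - mean) / sd for x in smoothed]
    half = window // 2  # pair each smoothed point with the centre of its window
    total = sum(scaled[i + half] * standardized[i]
                for i in range(n - window + 1))
    return total / math.sqrt(n - window)


def select_high_quality(chromatograms, threshold=0.8, window=5):
    """Return the m/z keys whose traces score at or above the threshold."""
    return sorted(mz for mz, trace in chromatograms.items()
                  if quality_score(trace, window) >= threshold)
```

In a COW-CODA style workflow, only the selected traces would feed the alignment benefit function within each segment; that coupling is not shown here.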


Subject(s)
Algorithms , Chromatography, Liquid/methods , Mass Spectrometry/methods , Female , Humans , Pregnancy , Reference Standards , Reproducibility of Results , Time Factors , Trypsin/metabolism , Urine/chemistry , Uterine Cervical Neoplasms/blood
18.
J Chromatogr A ; 1373: 61-72, 2014 Dec 19.
Article in English | MEDLINE | ID: mdl-25482036

ABSTRACT

Retention time alignment is one of the most challenging steps in processing LC-MS datasets of complex proteomics samples acquired within a differential profiling study. A large number of time alignment methods have been developed for accurate pre-processing of such datasets. These methods generally assume that common compounds elute in the same order but they do not test whether this assumption holds. If this assumption is not valid, alignments based on a monotonic retention time function will lose accuracy for peaks that depart from the expected order of the retention time correspondence function. To address this issue, we propose a quality control method that assesses if a pair of complex LC-MS datasets can be aligned with the same alignment performance based on statistical tests before correcting retention time shifts. The algorithm first confirms the presence of an adequate number of common peaks (>∼100 accurately matched peak pairs), then determines if the probability for a conserved elution order of those common peaks is sufficiently high (>0.01) and finally performs retention time alignment of two LC-MS chromatograms. This procedure was applied to LC-MS and LC-MS/MS datasets from two different inter-laboratory proteomics studies showing that a large number of common peaks in chromatograms acquired by different laboratories change elution order with considerable retention time differences.
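The two checks that precede alignment, enough common peaks and a conserved elution order, can be sketched in a few lines. The sketch assumes matched peak pairs are already available as (retention time in run A, retention time in run B) tuples and measures order conservation as the fraction of concordant pairs. The function names and the 0.95 concordance cutoff are hypothetical; the abstract's own criterion is a probability estimated by statistical testing, which this simple fraction does not reproduce. Only the requirement of roughly 100 matched pairs is taken from the text.

```python
from itertools import combinations


def concordant_fraction(pairs):
    """Fraction of peak-pair combinations that keep the same elution order.

    `pairs` holds (rt_in_run_a, rt_in_run_b) for peaks matched between runs.
    Ties in either run are ignored. Returns 0.0 if nothing is comparable.
    """
    concordant = 0
    discordant = 0
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        product = (a1 - a2) * (b1 - b2)
        if product > 0:
            concordant += 1   # same order in both runs
        elif product < 0:
            discordant += 1   # elution order swapped
    total = concordant + discordant
    return concordant / total if total else 0.0


def alignment_precheck(pairs, min_pairs=100, min_concordance=0.95):
    """Decide whether a monotonic retention time correction looks justified.

    min_concordance is an illustrative cutoff, not a value from the abstract.
    Returns (ok, fraction).
    """
    fraction = concordant_fraction(pairs)
    ok = len(pairs) >= min_pairs and fraction >= min_concordance
    return ok, fraction
```

The pairwise loop is quadratic in the number of matched peaks, which is acceptable for a few hundred pairs; a larger set would call for a merge-sort based inversion count.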


Subject(s)
Chromatography, High Pressure Liquid/methods , Mass Spectrometry/methods , Algorithms , Monte Carlo Method , Probability , Proteomics/methods , Time Factors
19.
Electrophoresis ; 28(23): 4493-505, 2007 Dec.
Article in English | MEDLINE | ID: mdl-18041038

ABSTRACT

The discovery of biomarkers in easily accessible body fluids such as serum is one of the most challenging topics in proteomics, requiring highly efficient separation and detection methodologies. Here, we present the application of a microfluidics-based LC-MS system (chip-LC-MS) to the label-free profiling of immunodepleted, trypsin-digested serum in comparison to conventional capillary LC-MS (cap-LC-MS). Both systems proved to have a repeatability of approximately 20% RSD for peak area, all sample preparation steps included, while repeatability of the LC-MS part by itself was less than 10% RSD for the chip-LC-MS system. Importantly, the chip-LC-MS system had a twofold higher resolution in the LC dimension and resulted in a lower average charge state of the tryptic peptide ions generated in the ESI interface when compared to cap-LC-MS, while requiring approximately 30 times less (~5 pmol) sample. To characterize the capability of both systems to find discriminating peptides in trypsin-digested serum samples, five out of ten individually prepared, identical sera were spiked with horse heart cytochrome c. A comprehensive data processing methodology was applied, including 2-D smoothing, resolution reduction, peak picking, time alignment, and matching of the individual peak lists to create an aligned peak matrix amenable to statistical analysis. Statistical analysis by supervised classification and variable selection showed that both LC-MS systems could discriminate the two sample groups. However, the chip-LC-MS system allowed 55% of the overall signal to be assigned to selected peaks, compared with 32% for the cap-LC-MS system.


Subject(s)
Capillary Electrochromatography , Peptides/blood , Protein Array Analysis , Serum/chemistry , Analysis of Variance , Animals , Biomarkers/blood , Humans , Proteomics/methods , Reproducibility of Results , Sensitivity and Specificity , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Tissue Array Analysis , Trypsin/metabolism
20.
Int J Bioinform Res Appl ; 2(2): 161-76, 2006.
Article in English | MEDLINE | ID: mdl-18048160

ABSTRACT

The distributions of residue hydrophobicity for individual domains as well as for the aggregates of domains on a single chain have been found to exhibit well-defined second-order hydrophobic moment profiles. This indicates that most of the domains do fold into a stable entity with a core composed predominantly of hydrophobic residues as well as a prevalence of hydrophobic residues at the interface between domains. A simple scoring function based upon the relative hydrophobic moment dipole orientations shows that 80% of the dipoles of adjacent domains point to each other, highlighting hydrophobic residue prevalence at the domain interfaces.


Subject(s)
Computational Biology/methods , Amino Acid Motifs , Databases, Protein , Hydrophobic and Hydrophilic Interactions , Models, Statistical , Molecular Conformation , Protein Conformation , Protein Folding , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/chemistry , Proteomics/methods , Solvents/chemistry