1.
Anal Chem ; 93(32): 11215-11224, 2021 08 17.
Article in English | MEDLINE | ID: mdl-34355890

ABSTRACT

The accurate processing of complex liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) data from biological samples is a major challenge for metabolomics, proteomics, and related approaches. Here, we present the pipelines and systems for threshold-avoiding quantification (PASTAQ) LC-MS/MS preprocessing toolset, which allows highly accurate quantification of data-dependent acquisition LC-MS/MS datasets. PASTAQ performs compound quantification using single-stage (MS1) data and implements novel algorithms for high-performance and accurate quantification, retention time alignment, feature detection, and linking annotations from multiple identification engines. PASTAQ offers straightforward parameterization and automatic generation of quality control plots for data and preprocessing assessment. This design results in smaller variance when analyzing replicates of proteomes mixed with known ratios and allows the detection of peptides over a larger dynamic concentration range compared to widely used proteomics preprocessing tools. The performance of the pipeline is also demonstrated in a biological human serum dataset for the identification of gender-related proteins.


Subjects
Proteomics , Tandem Mass Spectrometry , Algorithms , Chromatography, Liquid , Humans , Peptides , Proteome
2.
Anal Chem ; 92(24): 16138-16148, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33317272

ABSTRACT

Mass spectrometry imaging (MSI) is a technique that provides comprehensive molecular information with high spatial resolution from tissue. Today, there is a strong push toward sharing data sets through public repositories in many research fields where MSI is commonly applied; yet, there is no standardized protocol for analyzing these data sets in a reproducible manner. Shifts in the mass-to-charge ratio (m/z) of molecular peaks present a major obstacle that can make it impossible to distinguish one compound from another. Here, we present a label-free m/z alignment approach that is compatible with multiple instrument types and makes no assumptions on the sample's molecular composition. Our approach, MSIWarp (https://github.com/horvatovichlab/MSIWarp), finds an m/z recalibration function by maximizing a similarity score that considers both the intensity and m/z position of peaks matched between two spectra. MSIWarp requires only centroid spectra to find the recalibration function and is thereby readily applicable to almost any MSI data set. To deal with particularly misaligned or peak-sparse spectra, we provide an option to detect and exclude spurious peak matches with a tailored random sample consensus (RANSAC) procedure. We evaluate our approach with four publicly available data sets from both time-of-flight (TOF) and Orbitrap instruments and demonstrate up to 88% improvement in m/z alignment.

3.
Anal Chem ; 88(8): 4229-38, 2016 Apr 19.
Article in English | MEDLINE | ID: mdl-26959230

ABSTRACT

Complex shotgun proteomics peptide profiles obtained in quantitative differential protein expression studies, such as in biomarker discovery, may be affected by multiple experimental factors. These preanalytical factors may affect the measured protein abundances which in turn influence the outcome of the associated statistical analysis and validation. It is therefore important to determine which factors influence the abundance of peptides in a complex proteomics experiment and to identify those peptides that are most influenced by these factors. In the current study we analyzed depleted human serum samples to evaluate experimental factors that may influence the resulting peptide profile such as the residence time in the autosampler at 4 °C, stopping or not stopping the trypsin digestion with acid, the type of blood collection tube, different hemolysis levels, differences in clotting times, the number of freeze-thaw cycles, and different trypsin/protein ratios. To this end we used a two-level fractional factorial design of resolution IV ($2_{\mathrm{IV}}^{7-3}$). The design required analysis of 16 samples in which the main effects were not confounded by two-factor interactions. Data preprocessing using the Threshold Avoiding Proteomics Pipeline (Suits, F.; Hoekman, B.; Rosenling, T.; Bischoff, R.; Horvatovich, P. Anal. Chem. 2011, 83, 7786-7794, ref 1) produced a data matrix containing quantitative information on 2,559 peaks. The intensity of the peaks was log-transformed, and peaks having intensities of a low t-test significance (p-value > 0.05) and a low absolute fold ratio (<2) between the two levels of each factor were removed. The remaining peaks were subjected to analysis of variance (ANOVA)-simultaneous component analysis (ASCA). Permutation tests were used to identify which of the preanalytical factors influenced the abundance of the measured peptides most significantly.
The most important preanalytical factors affecting peptide intensity were (1) the hemolysis level, (2) stopping trypsin digestion with acid, and (3) the trypsin/protein ratio. This provides guidelines for the experimentalist to keep the ratio of trypsin/protein constant and to control the trypsin reaction by stopping it with acid at an accurately set pH. The hemolysis level cannot be controlled tightly as it depends on the status of a patient's blood (e.g., red blood cells are more fragile in patients undergoing chemotherapy) and the care with which blood was sampled (e.g., by avoiding shear stress). However, its level can be determined with a simple UV spectrophotometric measurement and samples with extreme levels or the peaks affected by hemolysis can be discarded from further analysis. The loadings of the ASCA model led to peptide peaks that were most affected by a given factor, for example, to hemoglobin-derived peptides in the case of the hemolysis level. Peak intensity differences for these peptides were assessed by means of extracted ion chromatograms confirming the results of the ASCA model.
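
As an illustration of the design described in this abstract, the following minimal sketch builds a 16-run two-level design for seven factors and checks that no main effect is confounded with a two-factor interaction. The generators E = ABC, F = BCD, G = ACD are a common textbook choice and an assumption here; the abstract does not state which generators or which factor-to-letter assignment the authors used.

```python
from itertools import combinations, product

# Hypothetical labels: the abstract lists seven factors but does not say
# which letter each one was assigned to.
FACTORS = ("A", "B", "C", "D", "E", "F", "G")


def fractional_factorial_2_7_3():
    """Return the 16 runs of a 2^(7-3) design as tuples of -1/+1 levels.

    A, B, C, D form a full 2^4 factorial; E, F, G are derived with the
    assumed generators E = ABC, F = BCD, G = ACD.
    """
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        runs.append((a, b, c, d, a * b * c, b * c * d, a * c * d))
    return runs


def main_effects_clear(runs):
    """True if every main-effect column is orthogonal to every
    two-factor interaction column built from two other factors."""
    n = len(runs[0])
    for i in range(n):
        for j, k in combinations(range(n), 2):
            if i in (j, k):
                continue
            if sum(r[i] * r[j] * r[k] for r in runs) != 0:
                return False
    return True


design = fractional_factorial_2_7_3()
print(len(design), "runs; main effects clear:", main_effects_clear(design))
```

With these generators every word in the defining relation has four letters, which is what makes the design resolution IV: main effects are aliased only with three-factor and higher interactions, while two-factor interactions remain aliased with each other.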


Subjects
Peptides/blood , Principal Component Analysis , Proteins/analysis , Proteomics , Analysis of Variance , Humans
4.
Adv Exp Med Biol ; 926: 21-47, 2016.
Article in English | MEDLINE | ID: mdl-27686804

ABSTRACT

Proteogenomics is a multi-omics research field that has the aim to efficiently integrate genomics, transcriptomics and proteomics. With this approach it is possible to identify new patient-specific proteoforms that may have implications in disease development, specifically in cancer. Understanding the impact of a large number of mutations detected at the genomics level is needed to assess the effects at the proteome level. Proteogenomics data integration would help in identifying molecular changes that are persistent across multiple molecular layers and enable better interpretation of molecular mechanisms of disease, such as the causal relationship between single nucleotide polymorphisms (SNPs) and the expression of transcripts and translation of proteins compared to mainstream proteomics approaches. Identifying patient-specific protein forms and getting a better picture of molecular mechanisms of disease opens the avenue for precision and personalized medicine. Proteogenomics is, however, a challenging interdisciplinary science that requires the understanding of sample preparation, data acquisition and processing for genomics, transcriptomics and proteomics. This chapter aims to guide the reader through the technology and bioinformatics aspects of these multi-omics approaches, illustrated with proteogenomics applications having clinical or biological relevance.


Subjects
Chromosome Mapping/methods , Proteogenomics/methods , Proteome/genetics , RNA, Messenger/genetics , Software , Amino Acid Sequence , Base Sequence , Cell Line , Chromosome Mapping/statistics & numerical data , Fibroblasts/cytology , Fibroblasts/metabolism , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Mass Spectrometry , Polymorphism, Single Nucleotide , Precision Medicine/methods , Proteogenomics/instrumentation , Proteome/metabolism , RNA, Messenger/metabolism
5.
Mol Cell Proteomics ; 12(1): 263-76, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23115301

ABSTRACT

In this paper, we compare the performance of six different feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery, namely the t test, the Mann-Whitney-Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine-recursive feature elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA), using human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features that are related to the spiked peptides without selecting unrelated features. Whereas many studies have to rely on classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly from the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and extents of sample class separation determined by the concentration level of spiked compounds. For each feature selection method and data set, the performance for selecting a set of features related to spiked compounds was assessed using the harmonic mean of the recall and the precision (f-score) and the geometric mean of the recall and the true negative rate (g-score). We conclude that the univariate t test and the mww test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly with increasing sample size up to a point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets independent of spiking level and number of samples.
Linear SVM-RFE performs poorly for selecting features related to the spiked compounds, even though the classification error is relatively low.
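
The two scores defined in this abstract can be written out directly. The sketch below is only an illustration of those definitions (harmonic mean of recall and precision, geometric mean of recall and true negative rate); the function name, the set-based interface, and the convention of returning 0.0 for undefined ratios are assumptions, not taken from the paper.

```python
from math import sqrt


def selection_scores(selected, spiked, all_features):
    """Return (f_score, g_score) for a feature selection result.

    selected     -- features picked by the selection method
    spiked       -- features truly related to the spiked peptides
    all_features -- every feature in the data set
    """
    selected, spiked, all_features = set(selected), set(spiked), set(all_features)
    tp = len(selected & spiked)                  # spiked and selected
    fp = len(selected - spiked)                  # selected but unrelated
    fn = len(spiked - selected)                  # spiked but missed
    tn = len(all_features - selected - spiked)   # unrelated and not selected

    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    true_negative_rate = tn / (tn + fp) if tn + fp else 0.0

    # f-score: harmonic mean of recall and precision
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    # g-score: geometric mean of recall and true negative rate
    g_score = sqrt(recall * true_negative_rate)
    return f_score, g_score


# Toy example: 10 features, 4 of them spiked, 3 of the 4 recovered plus 1 false hit.
f_score, g_score = selection_scores({0, 1, 2, 7}, {0, 1, 2, 3}, range(10))
print(f_score, g_score)
```

Because the g-score uses the true negative rate instead of precision, it stays high when the pool of unrelated features is very large, which is why the two scores can rank methods differently.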


Subjects
Metabolomics/methods , Peptides/cerebrospinal fluid , Peptides/urine , Proteomics/methods , Animals , Biomarkers/analysis , Chromatography, Liquid , Computational Biology , Data Interpretation, Statistical , Gene Expression Profiling , Humans , Mass Spectrometry , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated , Principal Component Analysis , Swine
6.
Proteomics ; 14(7-8): 862-71, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24478260

ABSTRACT

Scanning MS by MALDI MS imaging (MALDI-MSI) creates large volumetric global datasets that describe the location and identity of ions registered at each sampling location. While thousands of ion peaks are recorded in a typical whole-tissue analysis, only a fraction of these measured molecules are purposefully scrutinized within a given experimental design. To address this need, we recently reported new methods to query the full volume of MALDI-MSI data that correlate all ion masses to one another. As an example of this utility, we demonstrate that specific ion peak m/z signatures can be used to localize similar histological structures within tissue samples. In this study, we use the example of ion peak masses that are associated with tissue spaces occupied by airway bronchioles in rat lung samples. The volume of raw data was preprocessed into structures of 0.1 mass unit bins containing metadata collected at each sampling position. Interactive visualization in ParaView identified ion peaks that showed an especially strong association with airway bronchioles but not with vascular or parenchymal tissue compartments. Further iterative statistical correlation queries provided ranked indices of all m/z values in the global dataset regarding coincident distributions at any given X, Y position in the histological spaces occupied by bronchioles. The study further provides methods for extracting important information contained in global datasets that previously was unseen or inaccessible.


Subjects
Ions , Molecular Imaging/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Animals , Molecular Weight , Rats , Tissue Distribution
7.
Mol Cell Proteomics ; 11(6): M111.015974, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22318370

ABSTRACT

Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open source label-free data processing workflows, we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS, and MZmine and our in-house developed modules on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows, and interestingly, heterogeneous combinations of algorithms often performed better than the homogenous workflows. Our scoring method showed that the union of feature matrices of different workflows outperformed the original homogenous workflows in some cases. msCompare is open source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for our framework by integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy) to allow scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS data sets.


Subjects
Data Interpretation, Statistical , Software , Adult , Aged , Algorithms , Animals , Biomarkers/cerebrospinal fluid , Biomarkers/urine , Chromatography, Liquid/standards , Chromatography, Reverse-Phase , Female , Humans , Male , Mass Spectrometry/standards , Middle Aged , Online Systems , Peptide Fragments/chemistry , Peptide Mapping , Proteomics , Reference Standards , Swine
8.
Anal Chem ; 85(9): 4398-404, 2013 May 07.
Article in English | MEDLINE | ID: mdl-23537055

ABSTRACT

Mass spectrometry imaging (MSI) generates large volumetric data sets consisting of mass-to-charge ratio (m/z), ion current, and x,y coordinate location. These data sets usually serve limited purposes centered on measuring the distribution of a small set of ions with known m/z. Such earmarked queries consider only a fraction of the full mass spectrum captured, and there are few tools to assist the exploration of the remaining volume of unknown data in terms of demonstrating similarity or discordance in tissue compartment distribution patterns. Here we present a novel, interactive approach to extract information from MSI data that relies on precalculated data structures to perform queries of large data sets with a typical laptop. We have devised methods to query the full volume to find new m/z values of potential interest based on similarity to biological structures or to the spatial distribution of known ions. We describe these query methods in detail and provide examples demonstrating the power of the methods to "discover" m/z values of ions that have such potentially interesting correlations. The "discovered" ions may be further correlated with either positional locations or the coincident distribution of other ions using successive queries. Finally, we show it is possible to gain insight into the fragmentation pattern of the parent molecule from such correlations. The ability to discover new ions of interest in the unknown bulk of an MSI data set offers the potential to further our understanding of biological and physiological processes related to health and disease.


Subjects
Lung/cytology , Animals , Rats , Rats, Wistar , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
9.
J Proteome Res ; 11(4): 2048-60, 2012 Apr 06.
Article in English | MEDLINE | ID: mdl-22320401

ABSTRACT

The experimental autoimmune encephalomyelitis (EAE) model resembles certain aspects of multiple sclerosis (MScl), with common features such as motor dysfunction, axonal degradation, and infiltration of T-cells. We studied the cerebrospinal fluid (CSF) proteome in the EAE rat model to identify proteomic changes relevant for MScl disease pathology. EAE was induced in male Lewis rats by injection of myelin basic protein (MBP) together with complete Freund's adjuvant (CFA). An inflammatory control group was injected with CFA alone, and a nontreated group served as healthy control. CSF was collected at day 10 and 14 after immunization and analyzed by bottom-up proteomics on Orbitrap LC-MS and QTOF LC-MS platforms in two independent laboratories. By combining results, 44 proteins were discovered to be significantly increased in EAE animals compared to both control groups, 25 of which have not been mentioned in relation to the EAE model before. Lysozyme C1, fetuin B, T-kininogen, serum paraoxonase/arylesterase 1, glutathione peroxidase 3, complement C3, and afamin are among the proteins significantly elevated in this rat EAE model. Two proteins, afamin and complement C3, were validated in an independent sample set using quantitative selected reaction monitoring mass spectrometry. The molecular weights of the identified differentially abundant proteins indicated an increased transport across the blood-brain barrier (BBB) at the peak of the disease, caused by an increase in BBB permeability.


Subjects
Cerebrospinal Fluid Proteins/analysis , Disease Models, Animal , Encephalomyelitis, Autoimmune, Experimental/cerebrospinal fluid , Multiple Sclerosis/cerebrospinal fluid , Proteome/analysis , Proteomics/methods , Animals , Body Weight , Cerebrospinal Fluid Proteins/chemistry , Chromatography, Liquid , Male , Mass Spectrometry , Paralysis/cerebrospinal fluid , Rats , Rats, Inbred Lew
10.
Bioinformatics ; 27(8): 1176-8, 2011 Apr 15.
Article in English | MEDLINE | ID: mdl-21349866

ABSTRACT

Warp2D is a novel time alignment approach, which uses the overlapping peak volume of the reference and sample peak lists to correct misleading peak shifts. Here, we present an easy-to-use web interface for a high-throughput Warp2D batch processing time alignment service using the Dutch Life Science Grid, reducing processing time from days to hours. This service provides the warping function, the sample chromatogram peak list with adjusted retention times and normalized quality scores based on the sum of overlapping peak volume of all peaks. Heat maps before and after time alignment are created from the arithmetic mean of the sum of overlapping peak area rearranged with hierarchical clustering, allowing the quality control of the time alignment procedure. A Taverna workflow and a command line tool are provided for remote processing of local user data. AVAILABILITY: The online data processing service is available at http://www.nbpp.nl/warp2d.html. The Taverna workflow is available at myExperiment with the title '2D Time Alignment-Webservice and Workflow' at http://www.myexperiment.org/workflows/1283.html. The command line tool is available at http://www.nbpp.nl/Warp2D_commandline.zip. CONTACT: p.l.horvatovich@rug.nl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatography, Liquid/methods , Mass Spectrometry/methods , Metabolomics/methods , Proteomics/methods , Software , Animals , High-Throughput Screening Assays , Internet , Mice
11.
Anal Chem ; 83(20): 7786-94, 2011 Oct 15.
Article in English | MEDLINE | ID: mdl-21879761

ABSTRACT

We present a new proteomics analysis pipeline focused on maximizing the dynamic range of detected molecules in liquid chromatography-mass spectrometry (LC-MS) data and accurately quantifying low-abundance peaks to identify those with biological relevance. Although there has been much work to improve the quality of data derived from LC-MS instruments, the goal of this study was to extend the dynamic range of analyzed compounds by making full use of the information available within each data set and across multiple related chromatograms in an experiment. Our aim was to distinguish low-abundance signal peaks from noise by noting their coherent behavior across multiple data sets, and central to this is the need to delay the culling of noise peaks until the final peak-matching stage of the pipeline, when peaks from a single sample appear in the context of all others. The application of thresholds that might discard signal peaks early is thereby avoided, hence the name TAPP: threshold-avoiding proteomics pipeline. TAPP focuses on quantitative low-level processing of raw LC-MS data and includes novel preprocessing, peak detection, time alignment, and cluster-based matching. We demonstrate the performance of TAPP on biologically relevant sample data consisting of porcine cerebrospinal fluid spiked over a wide range of concentrations with horse heart cytochrome c.


Subjects
Chromatography, High Pressure Liquid , Mass Spectrometry , Proteomics , Animals , Cytochromes c/analysis , Horses , Myocardium/metabolism
12.
Clin Chem ; 57(12): 1703-11, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21998343

ABSTRACT

BACKGROUND: Because cerebrospinal fluid (CSF) is in close contact with diseased areas in neurological disorders, it is an important source of material in the search for molecular biomarkers. However, sample handling for CSF collected from patients in a clinical setting might not always be adequate for use in proteomics and metabolomics studies. METHODS: We left CSF for 0, 30, and 120 min at room temperature immediately after sample collection and centrifugation/removal of cells. At 2 laboratories CSF proteomes were subjected to tryptic digestion and analyzed by use of nano-liquid chromatography (LC) Orbitrap mass spectrometry (MS) and chipLC quadrupole TOF-MS. Metabolome analysis was performed at 3 laboratories by NMR, GC-MS, and LC-MS. Targeted analyses of cystatin C and albumin were performed by LC-tandem MS in the selected reaction monitoring mode. RESULTS: We did not find significant changes in the measured proteome and metabolome of CSF stored at room temperature after centrifugation, except for 2 peptides and 1 metabolite, 2,3,4-trihydroxybutanoic (threonic) acid, of 5780 identified peptides and 93 identified metabolites. A sensitive protein stability marker, cystatin C, was not affected. CONCLUSIONS: The measured proteome and metabolome of centrifuged human CSF is stable at room temperature for up to 2 hours. We cannot exclude, however, that changes undetectable with our current methodology, such as denaturation or proteolysis, might occur because of sample handling conditions. The stability we observed gives laboratory personnel at the collection site sufficient time to aliquot samples before freezing and storage at -80 °C.


Subjects
Metabolome , Proteome/metabolism , Specimen Handling , Cerebrospinal Fluid , Chromatography, Gas , Chromatography, Liquid , Humans , Magnetic Resonance Spectroscopy , Mass Spectrometry/methods , Time Factors
13.
J Proteome Res ; 9(3): 1483-95, 2010 Mar 05.
Article in English | MEDLINE | ID: mdl-20070124

ABSTRACT

Time alignment of complex LC-MS data remains a challenge in proteomics and metabolomics studies. This work describes modifications of the Dynamic Time Warping (DTW) and the Parametric Time Warping (PTW) algorithms that improve the alignment quality for complex, highly variable LC-MS data sets. Regular DTW or PTW use one-dimensional profiles such as the Total Ion Chromatogram (TIC) or Base Peak Chromatogram (BPC) resulting in correct alignment if the signals have a relatively simple structure. However, when aligning the TICs of chromatograms from complex mixtures with large concentration variability such as serum or urine, both algorithms often lead to misalignment of peaks and thus incorrect comparisons in the subsequent statistical analysis. This is mainly due to the fact that compounds with different m/z values but similar retention times are not considered separately but confounded in the benefit function of the algorithms using only one-dimensional information. Thus, it is necessary to treat the information of different mass traces separately in the warping function to ensure that compounds having the same m/z value and retention time are aligned to each other. The Component Detection Algorithm (CODA) is widely used to calculate the quality of an LC-MS mass trace. By combining CODA with the warping algorithms of DTW or PTW (DTW-CODA or PTW-CODA), we include only high quality mass traces measured by CODA in the benefit function. Our results show that using several CODA selected high quality mass traces in DTW-CODA and PTW-CODA significantly improves the alignment quality of three different, highly complex LC-MS data sets. Moreover, DTW-CODA leads to better preservation of peak shape as compared to the original DTW-TIC algorithm, which often suffers from a substantial peak shape distortion. 
Our results show that combining CODA-selected mass traces with different time alignment algorithms is a general principle that provides accurate alignment for highly complex samples with large concentration variability.
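
For readers unfamiliar with the baseline that this abstract modifies, here is a minimal sketch of textbook dynamic time warping on two one-dimensional profiles (for example, two TICs). It is plain DTW with an absolute-difference cost, not the DTW-CODA or PTW-CODA variants described above; the function name and the cost choice are assumptions made for illustration.

```python
def dtw_align(reference, sample):
    """Return (total_cost, path) for classic dynamic time warping.

    path is a list of (reference_index, sample_index) pairs that maps
    each point of one profile onto the other.
    """
    n, m = len(reference), len(sample)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning the first i reference
    # points with the first j sample points
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - sample[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance both profiles
                cost[i - 1][j],      # stretch the sample point
                cost[i][j - 1],      # stretch the reference point
            )

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# A single peak that elutes one scan earlier in the sample than in the reference.
reference_tic = [0, 0, 1, 2, 1, 0, 0]
sample_tic = [0, 1, 2, 1, 0, 0, 0]
total_cost, warp_path = dtw_align(reference_tic, sample_tic)
print(total_cost, warp_path)
```

Because this cost function sees only the summed signal at each time point, co-eluting compounds with different m/z values are indistinguishable to it, which is the limitation the abstract addresses by restricting the benefit function to selected mass traces.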


Subjects
Algorithms , Chromatography, Liquid/methods , Computational Biology/methods , Mass Spectrometry/methods , Metabolomics/methods , Blood Proteins/analysis , Databases, Factual , Humans , Proteinuria/blood , Time Factors , Trypsin/chemistry , Trypsin/metabolism , Urine/chemistry
14.
J Proteome Res ; 8(12): 5511-22, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19845411

ABSTRACT

To standardize the use of cerebrospinal fluid (CSF) for biomarker research, a set of stability studies has been performed on porcine samples to investigate the influence of common sample handling procedures on proteins, peptides, metabolites and free amino acids. This study focuses on the effects on proteins and peptides, analyzed by applying label-free quantitation using microfluidics nanoscale liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (chipLC-MS) as well as matrix-assisted laser desorption ionization Fourier transform ion cyclotron resonance mass spectrometry (MALDI-FT-ICR-MS) and Orbitrap LC-MS/MS to trypsin-digested CSF samples. The factors assessed were a 30 or 120 min time delay at room temperature before storage at -80 °C after the collection of CSF in order to mimic potential delays in the clinic (delayed storage), storage at 4 °C after trypsin digestion to mimic the time that samples remain in the cooled autosampler of the analyzer, and repeated freeze-thaw cycles to mimic storage and handling procedures in the laboratory. The delayed storage factor was also analyzed by gas chromatography mass spectrometry (GC-MS) and liquid chromatography mass spectrometry (LC-MS) for changes of metabolites and free amino acids, respectively. Our results show that repeated freeze-thaw cycles introduced changes in transthyretin peptide levels. The trypsin-digested samples left at 4 °C in the autosampler showed a time-dependent decrease of peak areas for peptides from prostaglandin D-synthase and serotransferrin. Delayed storage of CSF led to changes in prostaglandin D-synthase derived peptides as well as to increased levels of certain amino acids and metabolites. The changes of metabolites, amino acids and proteins in the delayed storage study appear to be related to remaining white blood cells.
Our recommendations are to centrifuge CSF samples immediately after collection to remove white blood cells, aliquot, and then snap-freeze the supernatant in liquid nitrogen for storage at -80 °C. Preferably, samples should not be left in the autosampler for more than 24 h, and freeze-thaw cycles should be avoided if at all possible.


Subjects
Cerebrospinal Fluid/chemistry , Protein Stability , Proteome/chemistry , Specimen Handling/methods , Tissue Preservation/methods , Amino Acids , Biomarkers/cerebrospinal fluid , Cryopreservation , Humans , Intramolecular Oxidoreductases/metabolism , Leukocytes/chemistry , Leukocytes/metabolism , Lipocalins/metabolism , Metabolomics , Peptides , Proteins , Proteome/metabolism , Proteomics/methods , Reference Standards , Specimen Handling/standards , Tissue Preservation/standards
15.
Bioinformatics ; 24(8): 1070-7, 2008 Apr 15.
Article in English | MEDLINE | ID: mdl-18353791

ABSTRACT

MOTIVATION: Mass spectrometry data are subject to considerable noise. Good noise models are required for proper detection and quantification of peptides. We have characterized noise in both quadrupole time-of-flight (Q-TOF) and ion trap data, and have constructed models for the noise. RESULTS: We find that the noise in Q-TOF data from Applied Biosystems QSTAR fits well to a combination of multinomial and Poisson models with detector dead-time correction. In comparison, ion trap noise from Agilent MSD-Trap-SL is larger than the Q-TOF noise and is proportional to Poisson noise. We then demonstrate that the noise model can be used to improve deisotoping for peptide detection, by estimating appropriate cutoffs of the goodness-of-fit parameter at prescribed error rates. The noise models also have implications in noise reduction, retention time alignment and significance testing for biomarker discovery.


Subjects
Artifacts , Models, Chemical , Proteins/chemistry , Sequence Analysis, Protein/methods , Spectrometry, Mass, Electrospray Ionization/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Computer Simulation , Models, Statistical , Proteins/analysis , Proteomics/methods , Reproducibility of Results , Sensitivity and Specificity
16.
Anal Chem ; 80(9): 3095-104, 2008 May 01.
Article in English | MEDLINE | ID: mdl-18396914

ABSTRACT

We describe a new time alignment method that takes advantage of both dimensions of LC-MS data to resolve ambiguities in peak matching while remaining computationally efficient. This approach, Warp2D, combines peak extraction with a two-dimensional correlation function to provide a reliable alignment scoring function that is insensitive to spurious peaks and background noise. One-dimensional alignment methods are often based on the total-ion-current elution profile of the spectrum and are unable to distinguish peaks of different masses. Our approach uses one-dimensional alignment in time, but with a scoring function derived from the overlap of peaks in two dimensions, thereby combining the specificity of two-dimensional methods with the computational performance of one-dimensional methods. The peaks are approximated as two-dimensional Gaussians of varying width. This approximation allows peak overlap (the measure of alignment quality) to be calculated analytically, without computationally intensive numerical integration in two dimensions. To demonstrate the general applicability of Warp2D, we chose a variety of complex samples that have substantial biological and analytical variability, including human serum and urine. We show that Warp2D works well with these diverse sample sets and with minimal tuning of parameters, based on the reduced standard deviation of peak elution times after warping. The combination of high computational speed, robustness with complex samples, and lack of need for detailed tuning makes this alignment method well suited to high-throughput LC-MS studies.


Subjects
Statistical Data Interpretation , Gas Chromatography-Mass Spectrometry/methods , Adult , Aged , Aged 80 and over , Animals , Blood Proteins/analysis , Cytochromes c/analysis , Female , Horses , Humans , Middle Aged , Pregnancy , Urinalysis/methods , Uterine Cervical Neoplasms/blood , Uterine Cervical Neoplasms/urine
17.
Anal Chem ; 80(18): 7012-21, 2008 Sep 15.
Article in English | MEDLINE | ID: mdl-18715018

ABSTRACT

Correlation optimized warping (COW) based on the total ion current (TIC) is a widely used time alignment algorithm (COW-TIC). This approach works successfully on chromatograms containing few compounds and having a well-defined TIC. In this paper, we have combined COW with a component detection algorithm (CODA) to align LC-MS chromatograms containing thousands of biological compounds with overlapping chromatographic peaks, a situation where COW-TIC often fails. CODA is a variable selection procedure that selects mass chromatograms with low noise and low background (so-called "high-quality" mass chromatograms). High-quality mass chromatograms selected in each COW segment ensure that the same compounds (based on their mass and their retention time) are used in the two-dimensional benefit function of COW to obtain correct and optimal alignments (COW-CODA). The performance of the COW-CODA algorithm was evaluated on three types of complex data sets obtained from the LC-MS analysis of samples commonly used for biomarker discovery and compared to COW-TIC using a new global comparison method based on overlapping peak area: trypsin-digested serum obtained from cervical cancer patients, trypsin-digested serum from a single patient that was treated with varying preanalytical parameters (factorial design study), and urine from pregnant and nonpregnant women. While COW-CODA did result in minor misalignments in rare cases, it was clearly superior to the COW-TIC algorithm, especially when applied to highly variable chromatograms (factorial design, urine). The presented algorithm thus enables automatic time alignment and accurate peak matching of multiple LC-MS data sets obtained from complex body fluids that are often used for biomarker discovery.


Subjects
Algorithms , Liquid Chromatography/methods , Mass Spectrometry/methods , Female , Humans , Pregnancy , Reference Standards , Reproducibility of Results , Time Factors , Trypsin/metabolism , Urine/chemistry , Uterine Cervical Neoplasms/blood
18.
J Chromatogr A ; 1373: 61-72, 2014 Dec 19.
Article in English | MEDLINE | ID: mdl-25482036

ABSTRACT

Retention time alignment is one of the most challenging steps in processing LC-MS datasets of complex proteomics samples acquired within a differential profiling study. A large number of time alignment methods have been developed for accurate pre-processing of such datasets. These methods generally assume that common compounds elute in the same order but they do not test whether this assumption holds. If this assumption is not valid, alignments based on a monotonic retention time function will lose accuracy for peaks that depart from the expected order of the retention time correspondence function. To address this issue, we propose a quality control method that assesses if a pair of complex LC-MS datasets can be aligned with the same alignment performance based on statistical tests before correcting retention time shifts. The algorithm first confirms the presence of an adequate number of common peaks (>∼100 accurately matched peak pairs), then determines if the probability for a conserved elution order of those common peaks is sufficiently high (>0.01) and finally performs retention time alignment of two LC-MS chromatograms. This procedure was applied to LC-MS and LC-MS/MS datasets from two different inter-laboratory proteomics studies showing that a large number of common peaks in chromatograms acquired by different laboratories change elution order with considerable retention time differences.
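The idea of checking elution order before aligning can be sketched as follows. This is not the statistical test used in the paper, which the abstract does not specify; it is a simple illustration that gates on the number of matched peak pairs (the approximate threshold of 100 comes from the abstract) and then measures what fraction of peak pairs keep the same relative order in both runs. The function names and the concordance measure are assumptions made for this example.

```python
from itertools import combinations

# Approximate minimum number of matched peak pairs quoted in the abstract.
MIN_COMMON_PEAKS = 100


def enough_common_peaks(matched_rt_a, min_pairs=MIN_COMMON_PEAKS):
    """Return True if there are enough matched peak pairs to attempt alignment."""
    return len(matched_rt_a) >= min_pairs


def concordant_fraction(matched_rt_a, matched_rt_b):
    """Fraction of peak pairs whose elution order is the same in both runs.

    matched_rt_a[i] and matched_rt_b[i] are the retention times of the
    same compound in run A and run B. Pairs tied in either run are skipped.
    """
    if len(matched_rt_a) != len(matched_rt_b):
        raise ValueError("retention time lists must have equal length")
    concordant = 0
    comparable = 0
    for i, j in combinations(range(len(matched_rt_a)), 2):
        product = (matched_rt_a[i] - matched_rt_a[j]) * (matched_rt_b[i] - matched_rt_b[j])
        if product == 0:
            continue  # tie in at least one run, order is undefined
        comparable += 1
        if product > 0:
            concordant += 1
    if comparable == 0:
        raise ValueError("no comparable peak pairs")
    return concordant / comparable
```

A value close to 1 indicates that a monotonic retention time function is a reasonable model for the pair of runs; lower values flag datasets where many common peaks swap elution order.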


Subjects
High-Pressure Liquid Chromatography/methods , Mass Spectrometry/methods , Algorithms , Monte Carlo Method , Probability , Proteomics/methods , Time Factors
19.
Electrophoresis ; 28(23): 4493-505, 2007 Dec.
Article in English | MEDLINE | ID: mdl-18041038

ABSTRACT

The discovery of biomarkers in easily accessible body fluids such as serum is one of the most challenging topics in proteomics requiring highly efficient separation and detection methodologies. Here, we present the application of a microfluidics-based LC-MS system (chip-LC-MS) to the label-free profiling of immunodepleted, trypsin-digested serum in comparison to conventional capillary LC-MS (cap-LC-MS). Both systems proved to have a repeatability of approximately 20% RSD for peak area, all sample preparation steps included, while repeatability of the LC-MS part by itself was less than 10% RSD for the chip-LC-MS system. Importantly, the chip-LC-MS system had a two times higher resolution in the LC dimension and resulted in a lower average charge state of the tryptic peptide ions generated in the ESI interface when compared to cap-LC-MS while requiring approximately 30 times less (~5 pmol) sample. In order to characterize both systems for their capability to find discriminating peptides in trypsin-digested serum samples, five out of ten individually prepared, identical sera were spiked with horse heart cytochrome c. A comprehensive data processing methodology was applied including 2-D smoothing, resolution reduction, peak picking, time alignment, and matching of the individual peak lists to create an aligned peak matrix amenable to statistical analysis. Statistical analysis by supervised classification and variable selection showed that both LC-MS systems could discriminate the two sample groups. However, the chip-LC-MS system allowed 55% of the overall signal to be assigned to selected peaks, against 32% for the cap-LC-MS system.


Subjects
Capillary Electrochromatography , Peptides/blood , Protein Array Analysis , Serum/chemistry , Analysis of Variance , Animals , Biomarkers/blood , Humans , Proteomics/methods , Reproducibility of Results , Sensitivity and Specificity , Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry/methods , Tissue Array Analysis , Trypsin/metabolism
20.
Int J Bioinform Res Appl ; 2(2): 161-76, 2006.
Article in English | MEDLINE | ID: mdl-18048160

ABSTRACT

The distributions of residue hydrophobicity for individual domains as well as for the aggregates of domains on a single chain have been found to exhibit well-defined second-order hydrophobic moment profiles. This indicates that most of the domains do fold into a stable entity with a core composed predominantly of hydrophobic residues as well as a prevalence of hydrophobic residues at the interface between domains. A simple scoring function based upon the relative hydrophobic moment dipole orientations shows that 80% of the dipoles of adjacent domains point to each other, highlighting hydrophobic residue prevalence at the domain interfaces.


Subjects
Computational Biology/methods , Amino Acid Motifs , Protein Databases , Hydrophobic and Hydrophilic Interactions , Statistical Models , Molecular Conformation , Protein Conformation , Protein Folding , Protein Secondary Structure , Protein Tertiary Structure , Proteins/chemistry , Proteomics/methods , Solvents/chemistry