1.
Anal Chem ; 92(21): 14466-14475, 2020 11 03.
Article in English | MEDLINE | ID: mdl-33079518

ABSTRACT

A data-independent acquisition (DIA) approach is being increasingly adopted as a promising strategy for identification and quantitation of proteomes. As most DIA data sets are acquired with wide isolation windows, highly complex MS/MS spectra are generated, which negatively impacts obtaining peptide information through classical protein database searches. Therefore, the analysis of DIA data mainly relies on the evidence of the existence of peptides from prebuilt spectral libraries. Consequently, one major weakness of this method is that it does not account for peptides that are not included in the spectral library, precluding the use of DIA for discovery studies. Here, we present a strategy termed Precursor ion And Small Slice-DIA (PASS-DIA) in which MS/MS spectra are acquired with small isolation windows (slices) and MS/MS spectra are interpreted with accurately determined precursor ion masses. This method enables the direct application of conventional spectrum-centric analysis pipelines for peptide identification and precursor ion-based quantitation. The performance of PASS-DIA was observed to be superior to both data-dependent acquisition (DDA) and conventional DIA experiments with 69 and 48% additional protein identifications, respectively. Application of PASS-DIA for the analysis of post-translationally modified peptides again highlighted its superior performance in characterizing phosphopeptides (77% more), N-terminal acetylated peptides (56% more), and N-glycopeptides (83% more) as compared to DDA alone. Finally, the use of PASS-DIA to characterize a rare proteome of human fallopian tube organoids enabled 34% additional protein identifications than DDA alone and revealed biologically relevant pathways including low abundance proteins. Overall, PASS-DIA is a novel DIA approach for use as a discovery tool that outperforms both conventional DDA and DIA experiments to provide additional protein information. 
We believe that the PASS-DIA method is an important strategy for discovery-type studies when deeper proteome characterization is required.


Subject(s)
Proteomics/methods , Tandem Mass Spectrometry , Data Interpretation, Statistical
2.
J Proteome Res ; 18(2): 616-622, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30525664

ABSTRACT

We designed a metaproteomic analysis method (ComPIL) to accommodate the ever-increasing number of sequences against which experimental shotgun proteomics spectra could be accurately and rapidly queried. Our objective was to create these large databases for the analysis of complex metasamples with unknown composition, including those derived from human, animal, and environmental microbiomes. The amount of high-throughput sequencing data has substantially increased since our original database was assembled in 2014. Here, we present a rebuild of the ComPIL libraries comprised of updated publicly disseminated sequence data as well as a modified version of the search engine ProLuCID-ComPIL optimized for querying experimental spectra. ComPIL 2.0 consists of 113 million protein records and roughly 4.8 billion unique tryptic peptide sequences and is 2.3 times the size of our original version. We searched a data set collected on a healthy human gut microbiome proteomic sample and compared the results to demonstrate that ComPIL 2.0 showed a substantial increase in the number of unique identified peptides and proteins compared to the first ComPIL version. The high confidence of protein identification and accuracy demonstrated by the use of ComPIL 2.0 may encourage the method's application for large-scale proteomic annotation of complex protein systems.
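To make the "unique tryptic peptide sequences" count concrete, the following is a minimal sketch of in silico tryptic digestion. It is not taken from ComPIL: the abstract does not state the digestion rules, so the common convention (cleave after K or R unless the next residue is P, no missed cleavages) and the `min_len`/`max_len` bounds are assumptions, and the function names are hypothetical.

```python
def tryptic_peptides(sequence, min_len=6, max_len=50):
    """Return fully tryptic peptides with zero missed cleavages.

    Assumed rule: cleave C-terminal to K or R, except when the next
    residue is P. The length bounds are illustrative defaults.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        cleave = residue in "KR" and not is_last and sequence[i + 1] != "P"
        if cleave or is_last:
            peptide = sequence[start:i + 1]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


def unique_peptides(proteins, **kwargs):
    """Collect the set of distinct peptides across many protein sequences."""
    seen = set()
    for seq in proteins:
        seen.update(tryptic_peptides(seq, **kwargs))
    return seen
```

Collecting peptides into a set is what makes the count "unique": a peptide shared by many protein records is stored once, which is one reason a peptide total can grow more slowly than the number of protein records.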


Subject(s)
Complex Mixtures/analysis , Databases, Protein , Proteomics/methods , Amino Acid Sequence , Animals , Bacterial Proteins/analysis , Gastrointestinal Microbiome , Humans , Peptides/analysis , Search Engine
3.
J Proteome Res ; 16(12): 4425-4434, 2017 12 01.
Article in English | MEDLINE | ID: mdl-28965411

ABSTRACT

The Human Proteome Project aims to map all human proteins, including missing proteins as well as proteoforms with post-translational modifications, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). The neXtProt and Ensembl databases are usually used to provide curated information on human coding genes. However, to find these proteoforms, we (Chr #11 team) first introduce a streamlined pipeline using customized and concatenated neXtProt and GENCODE databases, the latter derived from Ensembl, with a controlled false discovery rate (FDR). Because of the large databases used in this pipeline, we applied more stringent FDR filtering (0.1% at the peptide level and 1% at the protein level) to claim novel findings, such as GENCODE ASVs and missing proteins, from a human hippocampus data set (MSV000081385) and ProteomeXchange (PXD007166). Using our next generation proteomic pipeline (nextPP) with neXtProt and GENCODE databases, two missing proteins, activity-regulated cytoskeleton-associated protein (ARC, Chr 8) and glutamate receptor ionotropic, kainate 5 (GRIK5, Chr 19), were additionally identified with two or more unique peptides from human brain tissues. Additionally, by applying the pipeline to human brain related data sets such as cortex (PXD000067 and PXD000561), spinal cord, and fetal brain (PXD000561), seven GENCODE ASVs, namely ACTN4-012 (Chr.19), DPYSL2-005 (Chr.8), MPRIP-003 (Chr.17), NCAM1-013 (Chr.11), EPB41L1-017 (Chr.20), AGAP1-004 (Chr.2), and CPNE5-005 (Chr.6), were identified from two or more data sets. The identified peptides of GENCODE ASVs were mapped onto novel exon insertions, alternative translations at the 5'-untranslated region, or novel protein coding sequences. Applying the pipeline to male reproductive organ related data sets, 52 GENCODE ASVs were identified from two testis (PXD000561 and PXD002179) data sets and a spermatozoa (PXD003947) data set. Four of the 52 GENCODE ASVs, namely RAB11FIP5-008 (Chr. 2), RP13-347D8.7-001 (Chr. X), PRDX4-002 (Chr. X), and RP11-666A8.13-001 (Chr. 17), were identified in all three samples.


Subject(s)
Brain Chemistry , Chromosomes, Human/genetics , Databases, Protein , Proteomics/methods , Alternative Splicing , Hippocampus/chemistry , Humans , Male , Protein Processing, Post-Translational , Spermatozoa/chemistry , Testis/chemistry
4.
PLoS Genet ; 10(10): e1004588, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25299455

ABSTRACT

In addition to the DNA contributed by sperm and oocytes, embryos receive parent-specific epigenetic information that can include histone variants, histone post-translational modifications (PTMs), and DNA methylation. However, a global view of how such marks are erased or retained during gamete formation and reprogrammed after fertilization is lacking. To focus on features conveyed by histones, we conducted a large-scale proteomic identification of histone variants and PTMs in sperm and mixed-stage embryo chromatin from C. elegans, a species that lacks conserved DNA methylation pathways. The fate of these histone marks was then tracked using immunostaining. Proteomic analysis found that sperm harbor ~2.4-fold lower levels of histone PTMs than embryos and revealed differences in classes of PTMs between sperm and embryos. Sperm chromatin repackaging involves the incorporation of the sperm-specific histone H2A variant HTAS-1, a widespread erasure of histone acetylation, and the retention of histone methylation at sites that mark the transcriptional history of chromatin domains during spermatogenesis. After fertilization, we show HTAS-1 and 6 histone PTM marks distinguish sperm and oocyte chromatin in the new embryo and characterize distinct paternal and maternal histone remodeling events during the oocyte-to-embryo transition. These include the exchange of histone H2A that is marked by ubiquitination, retention of HTAS-1, removal of the H2A variant HTZ-1, and differential reprogramming of histone PTMs. This work identifies novel and conserved features of paternal chromatin that are specified during spermatogenesis and processed in the embryo. Furthermore, our results show that different species, even those with diverged DNA packaging and imprinting strategies, use conserved histone modification and removal mechanisms to reprogram epigenetic information.


Subject(s)
Caenorhabditis elegans/embryology , Caenorhabditis elegans/genetics , Epigenesis, Genetic , Histones/metabolism , Spermatozoa/physiology , Acetylation , Amino Acid Sequence , Animals , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Chromatin/metabolism , Embryo, Nonmammalian , Female , Gene Expression Regulation, Developmental , Male , Methylation , Molecular Sequence Data , Oocytes/metabolism , Protein Processing, Post-Translational , Spermatozoa/metabolism , Ubiquitination
5.
J Proteome Res ; 15(11): 4082-4090, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27537616

ABSTRACT

In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using liquid-chromatography and mass-spectrometry-based large proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four engrailed steps as follows. First, using three different search engines, SEQUEST, MASCOT, and MS-GF+, individual proteomic searches were performed against the neXtProt database. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into E-scores normalized using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. Finally, we compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including 477 alternative splicing variants (vs 182 using the CPP) were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral pattern from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
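The final filtering step (two or more peptides per protein, FDR controlled at 1% at the protein level) can be illustrated with a generic target-decoy sketch. This is an assumption-laden illustration, not the algorithm used by ProteinInferencer or the IPP: the abstract does not say how FDR is estimated, so the simple decoys/targets estimate, the tuple layout, and the name `filter_by_fdr` are all hypothetical.

```python
def filter_by_fdr(entries, max_fdr=0.01, min_peptides=2):
    """Keep target proteins above the deepest score cutoff meeting max_fdr.

    entries: iterable of (protein_id, score, is_decoy, n_peptides),
    where a higher score means a better match (assumed convention).
    FDR at a cutoff is estimated as decoys / targets (assumed estimator).
    """
    # Require a minimum number of peptides per protein before ranking.
    candidates = [e for e in entries if e[3] >= min_peptides]
    candidates.sort(key=lambda e: e[1], reverse=True)

    targets = decoys = 0
    best_cut = 0
    for rank, (_, _, is_decoy, _) in enumerate(candidates, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest rank at which the estimate is still acceptable.
        if targets and decoys / targets <= max_fdr:
            best_cut = rank

    return [pid for pid, _, is_decoy, _ in candidates[:best_cut] if not is_decoy]
```

Applying the peptide-count requirement before estimating FDR means single-peptide hits never contribute to the target or decoy counts, which is one plausible reading of the abstract's description.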


Subject(s)
Hippocampus/chemistry , Proteogenomics/methods , Proteomics/methods , Search Engine , Alternative Splicing , Computational Biology/methods , Databases, Protein , False Positive Reactions , Humans , Mass Spectrometry/methods
6.
BMC Genomics ; 17(1): 642, 2016 08 16.
Article in English | MEDLINE | ID: mdl-27528457

ABSTRACT

BACKGROUND: Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. RESULTS: Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. CONCLUSIONS: The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. 
The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.


Subject(s)
Databases, Protein , Proteome , Proteomics/methods , Search Engine , Bacterial Proteins , Gastrointestinal Microbiome , Host-Pathogen Interactions , Humans , Peptides , Reproducibility of Results
7.
J Proteome Res ; 14(12): 5028-37, 2015 Dec 04.
Article in English | MEDLINE | ID: mdl-26549206

ABSTRACT

The goal of the Chromosome-Centric Human Proteome Project (C-HPP) is to fully provide proteomic information from each human chromosome, including novel proteoforms, such as novel protein-coding variants expressed from noncoding genomic regions, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). In the 144 LC/MS/MS raw files from human hippocampal tissues of control, epilepsy, and Alzheimer's disease, we identified the novel proteoforms with a workflow including integrated proteomic pipeline using three different search engines, MASCOT, SEQUEST, and MS-GF+. With a <1% false discovery rate (FDR) at the protein level, the 11 detected peptides mapped to four translated long noncoding RNA variants against the customized databases of GENCODE lncRNA, which also mapped to coding-proteins at different chromosomal sites. We also identified four novel ASVs against the customized databases of GENCODE transcript. The target peptides from the variants were validated by tandem MS fragmentation pattern from their corresponding synthetic peptides. Additionally, a total of 128 SAAVs paired with their wild-type peptides were identified with FDR <1% at the peptide level using a customized database from neXtProt including nonsynonymous single nucleotide polymorphism (nsSNP) information. Among these results, several novel variants related in neuro-degenerative disease were identified using the workflow that could be applicable to C-HPP studies. All raw files used in this study were deposited in ProteomeXchange (PXD000395).


Subject(s)
Alzheimer Disease/metabolism , Epilepsy/metabolism , Hippocampus/metabolism , Proteomics/methods , Alternative Splicing , Alzheimer Disease/genetics , Amino Acid Sequence , Case-Control Studies , Chromatography, Liquid , Chromosomes, Human , Databases, Genetic , Databases, Protein , Epilepsy/genetics , Genetic Variation , Hippocampus/physiology , Humans , Molecular Sequence Data , Polymorphism, Single Nucleotide , Software , Tandem Mass Spectrometry , Workflow
8.
Bioinformatics ; 30(15): 2208-9, 2014 Aug 01.
Article in English | MEDLINE | ID: mdl-24681903

ABSTRACT

MOTIVATION: We introduce Census 2, an update of a mass spectrometry data analysis tool for peptide/protein quantification. New features for analysis of isobaric labeling, such as Tandem Mass Tag (TMT) or Isobaric Tags for Relative and Absolute Quantification (iTRAQ), have been added in this version, including a reporter ion impurity correction, a reporter ion intensity threshold filter and an option for weighted normalization to correct mixing errors. TMT/iTRAQ analysis can be performed on experiments using HCD (High Energy Collision Dissociation) only, CID (Collision Induced Dissociation)/HCD (High Energy Collision Dissociation) dual scans or HCD triple-stage mass spectrometry data. To improve measurement accuracy, we implemented weighted normalization, multiple tandem spectral approach, impurity correction and dynamic intensity threshold features. AVAILABILITY AND IMPLEMENTATION: Census 2 supports multiple input file formats including MS1/MS2, DTASelect, mzXML and pepXML. It requires JAVA version 6 or later to run. Free download of Census 2 for academic users is available at http://fields.scripps.edu/census/index.php. CONTACT: jyates@scripps.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
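As a rough illustration of why normalization corrects mixing errors, the sketch below applies a reporter ion intensity threshold and then a plain total-intensity normalization across channels. It is not Census 2's implementation: the abstract does not specify the weighting scheme or the impurity correction, so an unweighted approach is swapped in here, and the function name and `min_intensity` parameter are hypothetical.

```python
def normalize_channels(spectra, min_intensity=0.0):
    """Scale reporter ion channels so their summed intensities are equal.

    spectra: list of equal-length lists, one reporter ion intensity per
    channel. Spectra with any channel below min_intensity are dropped
    (a simple stand-in for an intensity threshold filter).
    """
    kept = [s for s in spectra if all(v >= min_intensity for v in s)]
    if not kept:
        return []

    n_channels = len(kept[0])
    # Total signal per channel; unequal totals suggest unequal sample mixing.
    totals = [sum(s[c] for s in kept) for c in range(n_channels)]
    mean_total = sum(totals) / n_channels
    factors = [mean_total / t if t > 0 else 1.0 for t in totals]

    return [[v * f for v, f in zip(s, factors)] for s in kept]
```

The underlying assumption is that most proteins do not change between samples, so a channel whose total signal is systematically higher probably reflects more loaded material rather than biology.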


Subject(s)
Mass Spectrometry/methods , Proteomics/methods , Statistics as Topic/methods , Animals , Cell Line , Isotope Labeling , Mice , Peptides/analysis , Peptides/chemistry , Proteins/analysis , Proteins/chemistry
9.
J Proteome Res ; 13(3): 1494-501, 2014 Mar 07.
Article in English | MEDLINE | ID: mdl-24417624

ABSTRACT

Chemical labeling of peptides prior to shotgun proteomics allows relative quantification of proteins in biological samples independent of sample origin. Current strategies utilize isobaric labels that fragment into reporter ions. However, quantification of reporter ions results in distorted ratio measurements due to contaminating peptides that are co-selected in the same precursor isolation window. Here, we show that quantitation of isobaric peptide fragment isotopologues in tandem mass spectra reduces precursor interference. The method is based on the relative quantitation of isobaric isotopologues of dimethylated peptide fragments in tandem mass spectra following higher energy collisional dissociation (HCD). The approach enables precise quantification of a proteome down to single spectra per protein and quantifies >90% of proteins in a MudPIT experiment and accurately measures proteins in a model cell line for cystic fibrosis.


Subject(s)
Peptide Fragments/analysis , Proteome/chemistry , Proteomics/methods , Carbon Isotopes , Cell Line , Cystic Fibrosis/metabolism , Humans , Isotope Labeling , Tandem Mass Spectrometry