1.
Mol Cell Proteomics ; 11(8): 478-91, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22493177

ABSTRACT

Peptide identification using tandem mass spectrometry is a core technology in proteomics. The latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision-induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been the lack of optimal database search software. Percolator is a post-search algorithm that uses semi-supervised machine learning to improve the rate of peptide spectrum matches (PSMs) while also providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software, including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.


Subject(s)
Algorithms , Peptides/analysis , Proteomics/methods , Software , Tandem Mass Spectrometry/methods , Artificial Intelligence , Chromatography, Liquid , Databases, Protein , Escherichia coli/metabolism , Escherichia coli Proteins/analysis , Fungal Proteins/analysis , Humans , Reproducibility of Results , Yeasts/metabolism
2.
Genome Res ; 21(5): 756-67, 2011 May.
Article in English | MEDLINE | ID: mdl-21460061

ABSTRACT

Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching more than 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into gene structure and propagation in the mouse genome. All these data have subsequently been used to improve the publicly available mouse annotation in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).


Subject(s)
Alternative Splicing , Genes , Peptides/genetics , Proteomics/methods , Pseudogenes/genetics , Tandem Mass Spectrometry/methods , Animals , Genome , Genomics/methods , Mice , Peptides/chemistry
3.
Cancer Res ; 70(3): 883-95, 2010 Feb 01.
Article in English | MEDLINE | ID: mdl-20103622

ABSTRACT

Comparative genomic hybridization (CGH) can reveal important disease genes but the large regions identified could sometimes contain hundreds of genes. Here we combine high-resolution CGH analysis of 598 human cancer cell lines with insertion sites isolated from 1,005 mouse tumors induced with the murine leukemia virus (MuLV). This cross-species oncogenomic analysis revealed candidate tumor suppressor genes and oncogenes mutated in both human and mouse tumors, making them strong candidates for novel cancer genes. A significant number of these genes contained binding sites for the stem cell transcription factors Oct4 and Nanog. Notably, mice carrying tumors with insertions in or near stem cell module genes, which are thought to participate in cell self-renewal, died significantly faster than mice without these insertions. A comparison of the profile we identified to that induced with the Sleeping Beauty (SB) transposon system revealed significant differences in the profile of recurrently mutated genes. Collectively, this work provides a rich catalogue of new candidate cancer genes for functional analysis.


Subject(s)
Comparative Genomic Hybridization/methods , Genetic Predisposition to Disease/genetics , Neoplasms/genetics , Tumor Suppressor Proteins/genetics , Animals , Binding Sites/genetics , Cell Line, Tumor , DNA Transposable Elements/genetics , Female , Genomics/methods , Homeodomain Proteins/metabolism , Humans , Male , Mice , Mice, Inbred C57BL , Mutagenesis, Insertional , Mutation , Nanog Homeobox Protein , Neoplasms/metabolism , Neoplasms/pathology , Octamer Transcription Factor-3/metabolism , Species Specificity , Stem Cells/metabolism , Tumor Suppressor Proteins/metabolism
4.
Methods Mol Biol ; 604: 43-53, 2010.
Article in English | MEDLINE | ID: mdl-20013363

ABSTRACT

A variety of methods are described in the literature to assign peptide sequences to observed tandem MS data. Typically, the identified peptides are associated only with an arbitrary score that reflects the quality of the peptide-spectrum match but not with a statistically meaningful significance measure. In this chapter, we discuss why statistical significance measures can simplify and unify the interpretation of MS-based proteomic experiments. In addition, we also present available software solutions that convert scores into sound statistical measures.


Subject(s)
Peptides/analysis , Software , Tandem Mass Spectrometry/methods , Databases, Protein , Statistical Distributions
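
One common way to turn arbitrary match scores into a significance measure is target-decoy q-value estimation. The following is a minimal sketch of that general idea and is not taken from the chapter or from any specific software it discusses. The function name, the `(score, is_decoy)` input format, and the simple decoys/targets FDR estimate are all assumptions made for illustration.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values from target-decoy search results (illustrative only).

    psms: list of (score, is_decoy) tuples, where a higher score means a
    better peptide-spectrum match. This input format is an assumption.
    Returns a list of (score, is_decoy, q_value), best score first.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Estimated FDR at each score threshold: decoys accepted / targets accepted.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at which this match would still be accepted,
    # i.e. the running minimum of FDR taken from the worst score upward.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


example = [(10, False), (9, False), (8, True), (7, False), (6, True)]
results = target_decoy_qvalues(example)
```

Matches with a q-value at or below a chosen cut-off (for example 0.01) would then be accepted; the cut-off is a choice left to the analyst.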
5.
J Proteome Res ; 8(6): 3176-81, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19338334

ABSTRACT

Sound scoring methods for sequence database search algorithms such as Mascot and Sequest are essential for sensitive and accurate peptide and protein identifications from proteomic tandem mass spectrometry data. In this paper, we present a software package that interfaces Mascot with Percolator, a well-performing machine learning method for rescoring database search results, and demonstrate it to be amenable to both low and high accuracy mass spectrometry data, outperforming all available Mascot scoring schemes as well as providing reliable significance measures. Mascot Percolator can be readily used as a stand-alone tool or integrated into existing data analysis pipelines.


Subject(s)
Peptide Fragments/analysis , Proteomics/methods , Software , Algorithms , Artificial Intelligence , Chromatography, Liquid , Databases, Protein , Peptide Fragments/chemistry , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, Protein , Tandem Mass Spectrometry
6.
Mol Cell Proteomics ; 7(5): 962-70, 2008 May.
Article in English | MEDLINE | ID: mdl-18216375

ABSTRACT

It is a major challenge to develop effective sequence database search algorithms to translate molecular weight and fragment mass information obtained from tandem mass spectrometry into high quality peptide and protein assignments. We investigated the peptide identification performance of Mascot and X!Tandem for mass tolerance settings common for low and high accuracy mass spectrometry. We demonstrated that sensitivity and specificity of peptide identification can vary substantially for different mass tolerance settings, but this effect was more significant for Mascot. We present an adjusted Mascot threshold, which allows the user to freely select the best trade-off between sensitivity and specificity. The adjusted Mascot threshold was compared with the default Mascot and X!Tandem scoring thresholds and shown to be more sensitive at the same false discovery rates for both low and high accuracy mass spectrometry data.


Subject(s)
Peptides/analysis , Proteomics/methods , Tandem Mass Spectrometry/methods , Algorithms , Animals , Cells, Cultured , Mice , Reproducibility of Results , Sensitivity and Specificity
7.
PLoS Comput Biol ; 3(10): 2032-42, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17967053

ABSTRACT

Network analysis transcends conventional pairwise approaches to data analysis as the context of components in a network graph can be taken into account. Such approaches are increasingly being applied to genomics data, where functional linkages are used to connect genes or proteins. However, while microarray gene expression datasets are now abundant and of high quality, few approaches have been developed for analysis of such data in a network context. We present a novel approach for 3-D visualisation and analysis of transcriptional networks generated from microarray data. These networks consist of nodes representing transcripts connected by virtue of their expression profile similarity across multiple conditions. Analysing genome-wide gene transcription across 61 mouse tissues, we describe the unusual topography of the large and highly structured networks produced, and demonstrate how they can be used to visualise, cluster, and mine large datasets. This approach is fast, intuitive, and versatile, and allows the identification of biological relationships that may be missed by conventional analysis techniques. This work has been implemented in a freely available open-source application named BioLayout Express(3D).


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression Regulation , Oligonucleotide Array Sequence Analysis/methods , Transcription, Genetic , Algorithms , Animals , Cluster Analysis , Gene Expression , Gene Regulatory Networks , Imaging, Three-Dimensional , Mice , Pattern Recognition, Automated , Software
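
The idea of connecting transcripts by expression profile similarity can be sketched as follows. This is an illustrative sketch only and is not the BioLayout Express(3D) implementation. The use of Pearson correlation, the 0.9 threshold, and the names `pearson` and `coexpression_edges` are assumptions, since the abstract specifies only "expression profile similarity".

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Build network edges between transcripts with similar profiles.

    profiles: dict mapping a transcript name to its expression values
    across conditions (hypothetical input format).
    Returns a list of (name_a, name_b, correlation) for every pair whose
    correlation is at least `threshold` (an assumed cut-off).
    """
    edges = []
    for (a, pa), (b, pb) in combinations(sorted(profiles.items()), 2):
        r = pearson(pa, pb)
        if r >= threshold:
            edges.append((a, b, r))
    return edges


profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
}
edges = coexpression_edges(profiles)
```

The resulting edge list can be passed to any graph layout or clustering tool; all-pairs comparison is quadratic in the number of transcripts, so a genome-wide analysis would need a more efficient approach than this sketch.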