ABSTRACT
MOTIVATION: There are several well-established paradigms for identifying and pinpointing discriminative peptides/proteins in shotgun proteomic data; examples include peptide-spectrum matching, de novo sequencing, open searches, and even hybrid approaches. Such an arsenal of complementary paradigms can provide deep data coverage, yet some discriminative peptides remain unidentified. RESULTS: We present DiagnoMass, a software tool that groups similar spectra into spectral clusters and then shortlists the clusters that are discriminative for biological conditions. DiagnoMass then communicates with proteomic tools to attempt to identify these clusters. We demonstrate the effectiveness of DiagnoMass by analyzing proteomic data from Escherichia coli, Salmonella, and Shigella, listing many high-quality discriminative spectral clusters that had thus far remained unidentified by widely adopted proteomic tools. DiagnoMass can also classify proteomic profiles. We anticipate that DiagnoMass will serve as a vital tool for pinpointing biomarkers. AVAILABILITY: DiagnoMass and related documentation, including a usage protocol, are available at http://www.diagnomass.com.
Subject(s)
Proteomics , Software , Proteomics/methods , Proteins/chemistry , Peptides/chemistry , Escherichia coli , Algorithms , Databases, Protein
ABSTRACT
MOTIVATION: Around 75% of all mass spectra remain unidentified by widely adopted proteomic strategies. We present DiagnoProt, an integrated computational environment that can efficiently cluster millions of spectra and use machine learning to shortlist high-quality unidentified mass spectra that are discriminative of different biological conditions. RESULTS: We exemplify the use of DiagnoProt by shortlisting 4366 high-quality unidentified tandem mass spectra that are discriminative of different types of the Aspergillus fungus. AVAILABILITY AND IMPLEMENTATION: DiagnoProt, a demonstration video and a user tutorial are available at http://patternlabforproteomics.org/diagnoprot . CONTACT: andrerfsilva@gmail.com or paulo@pcarvalho.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
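The cluster-then-shortlist idea described for DiagnoMass and DiagnoProt can be illustrated with a minimal Python sketch. It is not the algorithm implemented in either tool. Everything here is a hypothetical assumption for illustration: the Spectrum and Cluster classes, 1 Da m/z binning with square-root intensity scaling, greedy clustering by cosine similarity (threshold 0.8) within a 0.02 precursor m/z tolerance, and, in place of the machine learning that DiagnoProt reportedly uses, a simple purity rule that shortlists a cluster when at least 90% of its (three or more) spectra come from a single biological condition.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    precursor_mz: float
    peaks: list  # list of (m/z, intensity) pairs
    condition: str  # biological condition label (hypothetical field)


@dataclass
class Cluster:
    precursor_mz: float
    representative: dict  # unit-length binned vector of the seed spectrum
    members: list = field(default_factory=list)


def binned_vector(peaks, bin_width=1.0):
    """Sparse unit-length vector of sqrt-scaled intensities keyed by m/z bin."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + math.sqrt(intensity)
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm > 0 else vec


def cosine(a, b):
    # Both vectors are unit length, so their dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def cluster_spectra(spectra, min_cosine=0.8, precursor_tol=0.02):
    """Greedily assign each spectrum to its most similar compatible cluster."""
    clusters = []
    for s in spectra:
        vec = binned_vector(s.peaks)
        best, best_score = None, min_cosine
        for c in clusters:
            if abs(c.precursor_mz - s.precursor_mz) > precursor_tol:
                continue  # precursor masses too far apart to be the same peptide
            score = cosine(vec, c.representative)
            if score >= best_score:
                best, best_score = c, score
        if best is None:  # no similar cluster: this spectrum seeds a new one
            best = Cluster(s.precursor_mz, vec)
            clusters.append(best)
        best.members.append(s)
    return clusters


def shortlist(clusters, min_size=3, min_purity=0.9):
    """Keep clusters dominated by a single condition (hypothetical criterion)."""
    hits = []
    for c in clusters:
        condition, n = Counter(s.condition for s in c.members).most_common(1)[0]
        if len(c.members) >= min_size and n / len(c.members) >= min_purity:
            hits.append((condition, c))
    return hits


spectra = [
    Spectrum(500.25, [(175.1, 100), (300.2, 50), (401.3, 80)], "E. coli"),
    Spectrum(500.26, [(175.1, 90), (300.2, 55), (401.3, 70)], "E. coli"),
    Spectrum(500.25, [(175.1, 110), (300.2, 45), (401.3, 85)], "E. coli"),
    Spectrum(650.40, [(200.1, 100), (350.2, 60)], "Shigella"),
    Spectrum(650.41, [(200.1, 95), (350.2, 65)], "Salmonella"),
]
clusters = cluster_spectra(spectra)
hits = shortlist(clusters)
for condition, cluster in hits:
    print(condition, len(cluster.members), cluster.precursor_mz)  # E. coli 3 500.25
```

In this toy data the three E. coli spectra form one condition-specific cluster and are shortlisted, whereas the mixed Shigella/Salmonella cluster is neither large nor pure enough. A real pipeline would then attempt to identify the shortlisted clusters, which is the step DiagnoMass delegates to other proteomic tools.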