ABSTRACT
Genome-based peptide fingerprint scanning (GFS) maps several types of protein mass spectral (MS) data directly to the loci in the genome that may have encoded the protein. This process can be used either for protein identification or for proteogenomic mapping, that is, gene finding and annotation based on proteomic data. Inputs to the program are one or more mass spectrometry files from peptide mass fingerprinting and/or tandem MS (MS/MS), along with one or more sequences to search them against; the output is the coordinates of any matches found. This unit describes the use of GFS and the subsequent analysis of results.
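The general idea of matching peptide masses against a genomic sequence can be illustrated with a minimal sketch. It is not the GFS implementation: it assumes a simple six-frame translation with the standard genetic code, an idealized tryptic digest (cleavage after K or R unless followed by P, no missed cleavages), monoisotopic neutral masses, and a fixed absolute tolerance. All function names and the `tol_da` parameter are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses (Da); one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def tryptic_peptides(aas):
    """Yield (start_index, peptide); split at stops and after K/R not before P."""
    start = 0
    for i, aa in enumerate(aas):
        nxt = aas[i + 1] if i + 1 < len(aas) else ""
        if aa == "*":
            if i > start:
                yield start, aas[start:i]
            start = i + 1
        elif aa in "KR" and nxt != "P":
            yield start, aas[start:i + 1]
            start = i + 1
    if start < len(aas):
        yield start, aas[start:]


def scan(dna, observed_masses, tol_da=0.01):
    """Return (strand, start, end, peptide, observed_mass) for each match.

    Coordinates are 0-based, half-open, and given on the forward strand.
    """
    hits, n = [], len(dna)
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            aas = "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                          for i in range(frame, n - 2, 3))
            for aa_start, pep in tryptic_peptides(aas):
                if "X" in pep:  # skip peptides spanning ambiguous bases
                    continue
                mass = peptide_mass(pep)
                for obs in observed_masses:
                    if abs(mass - obs) <= tol_da:
                        s = frame + 3 * aa_start
                        e = s + 3 * len(pep)
                        if strand == "-":  # map back to forward coordinates
                            s, e = n - e, n - s
                        hits.append((strand, s, e, pep, obs))
    return hits
```

Calling `scan(dna, masses)` with a DNA string and a list of observed peptide masses returns the candidate loci. A real search would also need to handle missed cleavages, modifications, introns, and scoring of clustered hits, none of which this sketch attempts.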
Subject(s)
Algorithms , Mass Spectrometry/methods , Peptide Mapping/methods , Proteins/chemistry , Proteins/genetics , Sequence Analysis/methods , Software , Amino Acid Sequence , Base Sequence , Chromosome Mapping , Genetic Code , Molecular Sequence Data
ABSTRACT
The interpretation of mass spectrometry data for protein identification has become a vital component of proteomics research. However, most existing software tools rely on protein databases, so their success is limited, especially as annotation efforts fail to keep pace with sequencing. We present a publicly available, web-based version of a software tool that maps peptide mass fingerprint data directly to their genomic origin, allowing for genome-based, annotation-independent protein identification.
Subject(s)
Genomics/methods , Internet , Mass Spectrometry , Peptide Fragments , Protein Databases , Proteins/analysis , Proteomics/methods
ABSTRACT
We present a web-based application that uses whole-protein masses determined by mass spectrometry to identify putative co- and posttranslational proteolytic cleavages and chemical modifications. The protein cleavage and modification engine (PROCLAME) requires as input an intact mass measurement and a precursor identification based on peptide mass fingerprinting or tandem mass spectrometry. The approach predicts mass-modifying events using a depth-first tree search, bounded by a set of rules controlled by a custom-built fuzzy logic engine, to explore a large number of possible combinations of modifications that could account for the experimental mass. Candidates are saved during a search if they fall within a user-specified instrument mass accuracy; the total number of possible candidates searched depends on a specified fuzzy cutoff score. Candidates are scored and ranked using a simple probabilistic model. An intact mass measurement generally does not contain enough information to determine a single unique protein characterization; however, the program provides utility by expediting the identification of sets of putative events consistent with the mass data and ranking them for further investigation. The approach uses a simple, intuitive rule base and lends itself to the discovery of unannotated posttranslational events. We assessed the program with in silico-generated test data and with published data from an analysis of large ribosomal subunit proteins, both from the yeast S. cerevisiae. Results indicate a high degree of sensitivity and specificity in characterizing proteins whose masses resulted from reasonable proteolysis and covalent modification scenarios. The application is available on the web at http://proclame.unc.edu.
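The bounded depth-first search over combinations of mass-modifying events can be sketched as follows. This is an illustration only, not PROCLAME's code: the fuzzy logic rule engine is replaced here by two plain limits (a maximum total number of events and a per-event maximum count), and the probabilistic ranking is replaced by a simple sort on event count and mass error. The event table, its monoisotopic mass shifts, and all names are assumptions chosen for the example.

```python
# Hypothetical event table: name -> (mass shift in Da, max occurrences).
EVENTS = {
    "acetylation": (42.010565, 2),
    "methylation": (14.015650, 3),
    "phosphorylation": (79.966331, 3),
    "met_removal": (-131.040485, 1),  # loss of the initiator methionine
}


def find_candidates(precursor_mass, observed_mass, events=EVENTS,
                    tol_da=0.01, max_events=3):
    """Depth-first search for event combinations explaining observed_mass.

    Returns a list of (event_names, mass_error) tuples, ranked by fewest
    events and then by smallest absolute mass error (a stand-in for a
    real scoring model).
    """
    names = sorted(events)
    results = []

    def dfs(first_idx, mass, chosen):
        error = mass - observed_mass
        if abs(error) <= tol_da:  # within instrument accuracy: keep it
            results.append((tuple(chosen), error))
        if len(chosen) == max_events:  # bound on search depth
            return
        # Only move forward through the sorted names, so each multiset of
        # events is visited exactly once.
        for j in range(first_idx, len(names)):
            name = names[j]
            delta, max_count = events[name]
            if chosen.count(name) >= max_count:  # per-event rule
                continue
            chosen.append(name)
            dfs(j, mass + delta, chosen)
            chosen.pop()

    dfs(0, precursor_mass, [])
    results.sort(key=lambda r: (len(r[0]), abs(r[1])))
    return results
```

For a precursor whose unmodified mass is known, `find_candidates(precursor_mass, observed_mass)` lists the event sets consistent with the measured intact mass. Several sets can match the same mass, which is why ranking and follow-up investigation are still needed.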