ABSTRACT
Proteins in a proteome can be identified from a sequence of K integers equal to the digitized volumes of subsequences with L residues from the primary sequence of a stretched protein. Exhaustive computations on the proteins of Helicobacter pylori (UniProt ID UP000000210) with L and K in the range 4-8 show that approximately 90% of the proteins can be identified uniquely in this manner. This computational result can be translated into practice with a nanopore, an emerging technology that does not require analyte immobilization, proteolysis or labeling. Unlike other methods, most of which focus on a specific target protein, nanopore-based methods enable the identification of multiple proteins from a sample in a single run. Recent work by Kennedy, Kolmogorov and associates shows that the blockade current due to a protein molecule translocating through a nanopore is roughly proportional to the volume of one or more contiguous residues. The present study points to a modified version in which the volumes of subsequences (rather than of single residues) may be obtained by integrating the blockade current due to L contiguous residues. The advantages arising from this include lower detector bandwidth, elimination of the homopolymer problem and reduced noise. Because an identifier is based on near as well as distant (up to 2KL-L) residues, this approach uses more global information than an approach based on single residues and short-range correlations. The results of the study, which are available in a data supplement, are discussed in detail. Potential implementation issues are addressed.
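The identifier described in the abstract can be illustrated with a minimal Python sketch. Several details are assumptions rather than the study's method: the residue volumes are commonly cited approximate values in cubic angstroms, the quantization step `step` is hypothetical, and the default spacing of 2L between subsequence starts is inferred only from the stated maximum span of 2KL-L residues. The function names and the `start` parameter are likewise illustrative.

```python
from collections import Counter

# Approximate residue volumes in cubic angstroms (illustrative values;
# not necessarily the ones used in the study).
VOLUME = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1,
    "P": 112.7, "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0,
    "Q": 143.8, "H": 153.2, "M": 162.9, "I": 166.7, "L": 166.7,
    "K": 168.6, "R": 173.4, "F": 189.9, "Y": 193.6, "W": 227.8,
}


def identifier(seq, L, K, start=0, stride=None, step=50.0):
    """Return a tuple of K digitized subsequence volumes, or None if seq is too short.

    stride: distance between the starts of consecutive subsequences
            (assumed to be 2*L, so the identifier spans 2*K*L - L residues).
    step:   hypothetical quantization step in cubic angstroms.
    """
    stride = 2 * L if stride is None else stride
    span = (K - 1) * stride + L
    if start + span > len(seq):
        return None
    codes = []
    for k in range(K):
        begin = start + k * stride
        # Total volume of the L residues in this subsequence.
        volume = sum(VOLUME[r] for r in seq[begin:begin + L])
        # Digitize by integer division with the quantization step.
        codes.append(int(volume // step))
    return tuple(codes)


def uniquely_identified(proteome, L, K):
    """Return the sorted names of proteins whose identifier no other protein shares."""
    ids = {name: identifier(seq, L, K) for name, seq in proteome.items()}
    counts = Counter(v for v in ids.values() if v is not None)
    return sorted(n for n, v in ids.items() if v is not None and counts[v] == 1)


# Toy sequences, not real proteins.
toy = {
    "p1": "GGGGAAAAGGGGWWWW",
    "p2": "GGGGWWWWGGGGAAAA",
    "p3": "WWWWAAAAWWWW",
    "p4": "GGGG",
}
print(uniquely_identified(toy, L=4, K=2))
```

In this toy set, p1 and p2 yield the same identifier because the sampled subsequences coincide, p4 is too short to yield one, and only p3 is reported as unique. Sequences containing letters outside the 20 standard residues would need separate handling.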
Subject(s)
Bacterial Proteins/isolation & purification; Helicobacter pylori/genetics; Models, Statistical; Peptide Mapping/statistics & numerical data; Proteome/isolation & purification; Amino Acid Sequence; Amino Acids; Bacterial Proteins/genetics; Databases, Protein; Helicobacter pylori/chemistry; Nanopores; Peptide Fragments/analysis; Peptide Mapping/methods; Proteome/genetics
ABSTRACT
Nine replicate samples of peptides from soybean leaves, each spiked with a different concentration of bovine apotransferrin peptides, were analyzed on a mass spectrometer using multidimensional protein identification technology (MudPIT). Proteins were detected from the peptide tandem mass spectra, and the numbers of spectra were statistically evaluated for variation between samples. The results corroborate prior knowledge that combining spectra from replicate samples increases the number of identifiable proteins and that a summed spectral count for a protein increases linearly with increasing molar amounts of protein. Furthermore, statistical analysis of spectral counts for proteins in two- and three-way comparisons between replicates and combined replicates revealed little significant variation arising from run-to-run differences or data-dependent instrument ion sampling that might falsely suggest differential protein accumulation. In these experiments, spectral counting was enabled by PANORAMICS, probability-based software that predicts proteins detected by sets of observed peptides. Three alternative approaches to counting spectra were also evaluated for comparison. As the counting thresholds were changed from weaker to more stringent, the accuracy of ratio determination also changed. These results suggest that thresholds for counting can be empirically set to improve relative quantitation. Altogether, the data confirm the accuracy and reliability of label-free spectral counting in the relative, quantitative analysis of proteins between samples.
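The idea of summing spectral counts over replicates and comparing them between samples can be sketched in Python as follows. This is only an illustration under stated assumptions: each run is represented as a list of hypothetical (protein, score) pairs, `threshold` stands in for a generic counting threshold, and none of this reflects how PANORAMICS or the alternative counting approaches work.

```python
from collections import Counter


def spectral_counts(runs, threshold=0.0):
    """Sum, over replicate runs, the spectra per protein whose score meets the threshold."""
    counts = Counter()
    for run in runs:
        for protein, score in run:
            if score >= threshold:
                counts[protein] += 1
    return counts


def count_ratio(runs_a, runs_b, protein, threshold=0.0):
    """Ratio of summed spectral counts for one protein between two samples.

    Returns None when the protein has no counted spectra in sample B.
    """
    a = spectral_counts(runs_a, threshold)[protein]
    b = spectral_counts(runs_b, threshold)[protein]
    return a / b if b else None


# Made-up spectra: (protein label, hypothetical identification score).
sample_a = [[("TF", 0.9), ("TF", 0.6), ("RBCL", 0.95)], [("TF", 0.8), ("TF", 0.99)]]
sample_b = [[("TF", 0.9), ("RBCL", 0.9)], [("TF", 0.5)]]

print(count_ratio(sample_a, sample_b, "TF"))                 # all spectra counted
print(count_ratio(sample_a, sample_b, "TF", threshold=0.7))  # stricter counting
```

With these made-up values the ratio shifts when the threshold is tightened, which mirrors the abstract's observation that the counting threshold affects ratio determination.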