ABSTRACT
A FASTA file archive and reference resource has been added to ProteomeCommons.org. Motivation for this new functionality derives from two primary sources. The first is the recent FASTA standardization work done by the Human Proteome Organization's Proteomics Standards Initiative (HUPO-PSI). The second is the general lack of a uniform mechanism for properly citing the FASTA files used in a study and for publicly accessing those files after publication. An extension to the Tranche data sharing network has been developed that includes web pages, documentation, and tools for facilitating the use of FASTA files. These include conversion to the new HUPO-PSI format, and provisions for both citing and publicly archiving FASTA files. This new resource is available immediately, free of charge, and can be accessed at http://www.proteomecommons.org/data/fasta/. Source code for related tools is also freely available under the BSD license.
Subject(s)
Computational Biology/methods; Proteomics/methods; Database Management Systems; Databases, Factual; Databases, Protein; Humans; Information Storage and Retrieval; Internet; Peptide Mapping; Peptides/chemistry; Proteome; User-Computer Interface
ABSTRACT
A current focus of proteomics research is the establishment of acceptable confidence measures in the assignment of protein identifications in an unknown sample. Development of new algorithmic approaches would greatly benefit from a standard reference set of spectra for known proteins for the purpose of testing and training. Here we describe an openly available library of mass spectra generated on an ABI 4700 MALDI TOF/TOF from 246 known, individually purified and trypsin-digested protein samples. The initial full release of the Aurum Dataset includes gel images, peak lists, spectra, search result files, decoy database analysis files, FASTA file of protein sequences, manual curation, and summary pages describing protein coverage and peptides matched by MS/MS followed by decoy database analysis using Mascot, Sequest, and X!Tandem. The data are publicly available for use at ProteomeCommons.org.
Subject(s)
Peptide Mapping/methods; Proteomics; Recombinant Proteins/chemistry; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods; Tandem Mass Spectrometry/methods; Amino Acid Sequence; Databases, Factual; Humans; Molecular Sequence Data; Peptide Library; Reference Standards
ABSTRACT
We have recently announced the largest database of protein-ligand complexes, Binding MOAD (Mother of All Databases). After the August 2004 update, Binding MOAD contains 6816 complexes. There are 2220 protein families and 3316 unique ligands. After searching 6000+ crystallography papers, we have obtained binding data for 1793 (27%) of the complexes. We have also created a non-redundant set of complexes with only one complex from each protein family; in that set, 630 (28%) of the unique complexes have binding data. Here, we present information about the data provided at the Binding MOAD website. We also present the results of mining Binding MOAD to map the degree of solvent exposure for binding sites. We have determined that most cavities and ligands (70-85%) are well buried in the complexes. This fits with the common paradigm that a large degree of contact between the ligand and protein is significant in molecular recognition. GoCAV and the GoCAV viewer are the tools we created for this study. To share our data and make our online dataset more useful to other research groups, we have integrated the viewer into the Binding MOAD website (www.BindingMOAD.org).
Subject(s)
Databases, Protein; Proteins/chemistry; Proteins/metabolism; User-Computer Interface; Binding Sites; Crystallography, X-Ray; Humans; Internet; Ligands; Models, Chemical; Models, Molecular; Protein Binding; Solvents/chemistry
ABSTRACT
Multiple reaction monitoring (MRM) is a highly sensitive method of targeted mass spectrometry (MS) that can be used to selectively detect and quantify peptides based on the screening of specified precursor peptide-to-fragment ion transitions. MRM-MS sensitivity depends critically on the tuning of instrument parameters, such as collision energy and cone voltage, for the generation of maximal product ion signal. Although generalized equations and values exist for such instrument parameters, there is no clear indication that optimal signal can be reliably produced for all types of MRM transitions using such an algorithmic approach. To address this issue, we have devised a workflow that runs on both Waters Quattro Premier and ABI 4000 QTRAP triple quadrupole instruments and allows rapid determination of the optimal value of any programmable instrument parameter for each MRM transition. Here, we demonstrate the strategy for the optimization of collision energy and cone voltage, but the method could also be applied to other instrument parameters, such as declustering potential. The workflow incrementally adjusts the precursor and product m/z values at the hundredth decimal place to create a series of MRM targets at different collision energies that can be cycled through in rapid succession within a single run, avoiding any run-to-run variability in execution or comparison. Results are easily visualized and quantified using the MRM software package Mr. M to determine the optimal instrument parameters for each transition.
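The m/z offset idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function name `build_ce_series`, the dictionary layout, the example m/z values, and the choice to shift precursor and product by the same 0.01 increment are all assumptions made for illustration. The point is only that each collision energy gets its own nominally distinct transition, so that all of them can be scheduled in a single run.

```python
def build_ce_series(precursor_mz, product_mz, ce_values, step_hundredths=1):
    """Build one MRM target per collision energy.

    Each target's precursor and product m/z are shifted by a multiple of
    0.01 so the instrument software treats them as distinct transitions.
    (Hypothetical helper; the shift scheme is an assumption.)
    """
    # Work in integer hundredths to avoid floating-point drift.
    precursor_h = round(precursor_mz * 100)
    product_h = round(product_mz * 100)
    targets = []
    for i, ce in enumerate(ce_values):
        offset = i * step_hundredths
        targets.append({
            "precursor_mz": (precursor_h + offset) / 100,
            "product_mz": (product_h + offset) / 100,
            "collision_energy": ce,
        })
    return targets


# Example: one transition scanned at five collision energies (illustrative values).
series = build_ce_series(523.28, 774.41, [15, 20, 25, 30, 35])
for target in series:
    print(target)
```

Because the offsets are far smaller than a typical quadrupole isolation window, every entry in such a list would still monitor the same ion pair; the peak areas per entry could then be compared to pick the best collision energy.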
Subject(s)
Mass Spectrometry/methods; Proteomics/methods; Algorithms; Amino Acid Sequence; Area Under Curve; Biomarkers/chemistry; Computational Biology/methods; Fungal Proteins/chemistry; Haemophilus influenzae/metabolism; Ions; Molecular Sequence Data; Peptides/chemistry; Proteome; Software
ABSTRACT
Unidentified tandem mass spectra typically represent 50-90% of the spectra acquired in proteomics studies. This manuscript describes a novel algorithm, "Bonanza", for clustering spectra without knowledge of peptide or protein identifications. Further analysis leverages existing peptide identifications to infer related, likely valid identifications. Significantly more spectra can be identified with this approach, including spectra with unexpected potential modifications or amino-acid substitutions.
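The abstract above does not describe how Bonanza works internally, so the following Python sketch should not be read as that algorithm. It only illustrates the general idea of grouping spectra without any peptide identifications, using a generic technique swapped in for illustration: binned peak lists compared by cosine similarity and clustered greedily. The function names, the 1.0 m/z bin width, the 0.8 threshold, and the peak lists are all assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_spectra(spectra, threshold=0.8):
    """Greedy clustering: join the first cluster whose representative
    (its first member) is similar enough, else start a new cluster."""
    clusters = []  # list of (representative vector, member indices)
    for index, peaks in enumerate(spectra):
        vector = bin_spectrum(peaks)
        for representative, members in clusters:
            if cosine(representative, vector) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((vector, [index]))
    return [members for _, members in clusters]


# Illustrative (m/z, intensity) peak lists; not real data.
spectra = [
    [(175.1, 50.0), (304.2, 100.0), (617.3, 80.0)],
    [(175.1, 45.0), (304.2, 110.0), (617.3, 70.0)],
    [(147.1, 90.0), (262.1, 30.0), (489.3, 100.0)],
]
clusters = cluster_spectra(spectra)
print(clusters)  # [[0, 1], [2]]
```

In a setting like the one the abstract describes, a cluster that contains at least one identified spectrum could then be used to propose candidate identifications for its unidentified members.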