1.
J Proteome Res ; 10(4): 2088-94, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21222473

ABSTRACT

Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the postprocessing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value for both sets and individual peptide spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web server and a downloadable application that make this routinely available to the proteomics community. The web server offers a range of outputs including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline also provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download.


Subject(s)
Algorithms , Peptides/analysis , Search Engine , Tandem Mass Spectrometry/instrumentation , Tandem Mass Spectrometry/methods , Databases, Protein , Humans , Internet , Proteomics/instrumentation , Proteomics/methods , User-Computer Interface
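The decoy-database approach to FDR mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, not the web server's code: it assumes each PSM carries a single score where higher is better, estimates FDR at each rank as (decoy hits)/(target hits), and converts those estimates into monotone q-values. The `PSM` class and `target_decoy_qvalues` function are hypothetical names, and tied scores are not treated specially.

```python
from dataclasses import dataclass

@dataclass
class PSM:
    spectrum_id: str
    score: float    # assumption: higher score = better match
    is_decoy: bool  # True if the matched sequence came from the decoy database

def target_decoy_qvalues(psms):
    """Return {spectrum_id: q_value} for target PSMs.

    FDR at each rank is estimated as (#decoy hits) / (#target hits)
    among the PSMs ranked at or above that position.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # No targets accepted yet: treat the estimate as 100% FDR.
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value = lowest FDR at which the PSM would still be accepted,
    # so take a running minimum from the bottom of the list (capped at 1).
    qvals, running_min = [0.0] * len(fdrs), 1.0
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return {p.spectrum_id: q for p, q in zip(ranked, qvals) if not p.is_decoy}
```

Accepting every target PSM with a q-value at or below 0.01 then gives an estimated 1% FDR for the accepted set.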
2.
Proteomics ; 10(6): 1127-40, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20077415

ABSTRACT

Alternative splicing (AS) and processing of pre-messenger RNAs explain the discrepancy between the number of genes and proteome complexity in multicellular eukaryotic organisms. However, relatively few alternative protein isoforms have been experimentally identified, particularly at the protein level. In this study, we assess the ability of proteomics to inform on differentially spliced protein isoforms in human and four other model eukaryotes. The number of Ensembl-annotated genes for which proteomic data exist that inform on AS exceeds 33% of the alternatively spliced genes in the human and worm genomes. Examining AS in chicken via proteomics for the first time, we find support for over 600 AS genes. However, although peptide identifications support only a small fraction of the alternative protein isoforms annotated in Ensembl, many more variants are amenable to proteomic identification. There remains a sizeable gap between these existing identifications (10-52% of AS genes) and those that are theoretically feasible (90-99%). We also compare annotations between Swiss-Prot and Ensembl, recommending use of both to maximize coverage of AS. We propose that targeted proteomic experiments using selected reactions and standards are essential to uncover further alternative isoforms and discuss the issues surrounding these strategies.


Subject(s)
Peptides/chemistry , Protein Isoforms/genetics , Proteomics/methods , Alternative Splicing , Amino Acid Sequence , Animals , Chickens , Computational Biology , Databases, Protein , Feasibility Studies , Genome, Human , Humans , Molecular Sequence Data , NADH Dehydrogenase/genetics , Proteome/genetics , Sequence Alignment
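One way to judge whether an alternative isoform is amenable to proteomic identification is to ask whether it yields at least one peptide that no other isoform of the same gene produces. A rough sketch, assuming a simplified trypsin rule (cleave after K or R except before P, no missed cleavages) and an arbitrary minimum peptide length of 6; the function names are hypothetical and this is not the pipeline used in the study.

```python
import re

def tryptic_peptides(seq, min_len=6):
    """Simplified trypsin digest: split C-terminal to K/R unless the next residue is P."""
    # Zero-width split points (lookbehind/lookahead) need Python 3.7+.
    peptides = re.split(r"(?<=[KR])(?!P)", seq)
    # Drop empty strings and peptides too short to be useful for identification.
    return {p for p in peptides if len(p) >= min_len}

def distinguishing_peptides(isoforms, min_len=6):
    """For each isoform, return the peptides that no other isoform of the gene produces.

    isoforms: {isoform_name: protein_sequence} for a single gene.
    """
    digests = {name: tryptic_peptides(seq, min_len) for name, seq in isoforms.items()}
    result = {}
    for name, peps in digests.items():
        # Union of peptides from all sibling isoforms.
        others = set().union(*(p for n, p in digests.items() if n != name))
        result[name] = peps - others
    return result
```

An isoform whose set comes back empty has, under these assumptions, no tryptic peptide that could distinguish it from its siblings.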
3.
Proteomics ; 9(15): 3928-33, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19637238

ABSTRACT

In proteomics, rapid developments in instrumentation led to the acquisition of increasingly large data sets. Correspondingly, ProDaC was founded in 2006 as a Coordination Action project within the 6th European Union Framework Programme to support data sharing and community-wide data collection. The objectives of ProDaC were the development of documentation and storage standards, setup of a standardized data submission pipeline and collection of data. Ending in March 2009, ProDaC has delivered a comprehensive toolbox of standards and computer programs to achieve these goals.


Subject(s)
Data Collection/standards , Databases, Protein/standards , Proteomics/standards , Database Management Systems/standards , European Union
4.
Proteomics ; 9(5): 1220-9, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19253293

ABSTRACT

LC-MS experiments can generate large quantities of data, for which a variety of database search engines are available to make peptide and protein identifications. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. Different search engines produce different identification sets, so employing more than one search engine could result in an increased number of peptides (and proteins) being identified, if an appropriate mechanism for combining data can be defined. We have developed a search-engine-independent score based on FDR, called the FDR Score, which allows peptide identifications from different search engines to be combined. The results demonstrate that the observed FDR is significantly different when analysing the set of identifications made by all three search engines, by each pair of search engines or by a single search engine. Our algorithm assigns identifications to groups according to the set of search engines that have made the identification, and re-assigns the score (combined FDR Score). The combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine.


Subject(s)
Algorithms , Computational Biology/methods , Information Storage and Retrieval , Peptides/analysis , Proteomics/methods , Databases, Protein , Models, Statistical , Proteins/analysis , Reproducibility of Results , Software
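The grouping step described in this abstract can be sketched as follows. This is a simplified illustration, not the published FDR Score algorithm: it assumes each identification arrives with a per-engine q-value, combines those values with an arithmetic mean (a hypothetical rule chosen for brevity), groups identifications by the exact set of engines that reported them, and re-estimates target-decoy FDR separately inside each group. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def combined_group_fdr(identifications):
    """Re-estimate FDR within groups defined by the set of engines that made each identification.

    identifications: {psm_id: {"engines": {engine_name: q_value}, "is_decoy": bool}}
    Returns {psm_id: group_q_value} for target identifications only.
    """
    # 1. Group identifications by the exact set of engines that reported them.
    groups = defaultdict(list)
    for psm_id, info in identifications.items():
        engine_set = frozenset(info["engines"])
        # Hypothetical combining rule: arithmetic mean of the per-engine q-values.
        avg_q = mean(info["engines"].values())
        groups[engine_set].append((avg_q, psm_id, info["is_decoy"]))

    result = {}
    for members in groups.values():
        # 2. Within a group, rank by averaged q-value (lower = more confident).
        members.sort()
        targets = decoys = 0
        fdrs = []
        for _, _, is_decoy in members:
            decoys += is_decoy
            targets += not is_decoy
            fdrs.append(decoys / targets if targets else 1.0)
        # 3. Convert to monotone q-values within the group (running minimum, capped at 1).
        running_min = 1.0
        for i in range(len(members) - 1, -1, -1):
            running_min = min(running_min, fdrs[i])
            _, psm_id, is_decoy = members[i]
            if not is_decoy:
                result[psm_id] = running_min
    return result
```

Keeping the groups separate reflects the abstract's observation that the observed FDR differs significantly between identifications made by all engines, by pairs of engines, or by a single engine.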
5.
Methods Mol Biol ; 484: 319-32, 2008.
Article in English | MEDLINE | ID: mdl-18592189

ABSTRACT

Driven by advances in mass spectrometry and analytical chemistry, coupled with the expanding number of completely sequenced genomes, proteomics is becoming a widely exploited technology for characterizing the proteins found in living systems. As proteomics becomes increasingly high-throughput, there is a parallel need for storage of the large quantities of data generated, to support data exchange and allow further analyses. The capture and storage of such data, along with subsequent release and dissemination, not only aid in sharing of the data throughout the proteomics community but also provide scientific insights into observations made across different laboratories, instruments, and software. Growing numbers of resources offer a range of approaches for the capture, storage, and dissemination of proteomic experimental data, reflecting the fact that proteomics has now come of age in the postgenomic era and is delivering large, complex datasets that are rich in information. This chapter demonstrates how one such resource, PepSeeker, can be used to mine useful information from proteomic data, which can then be exploited for peptide identification algorithms via a better understanding of how peptides fragment inside mass spectrometers.


Subject(s)
Computational Biology/methods , Databases, Protein , Information Storage and Retrieval/methods , Proteomics , Algorithms , Mass Spectrometry/methods , Peptides/analysis , Peptides/genetics , User-Computer Interface
6.
Nucleic Acids Res ; 36(Web Server issue): W485-90, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18440977

ABSTRACT

Despite the growing volumes of proteomic data, integration of the underlying results remains problematic owing to differences in formats, data captured, protein accessions and services available from the individual repositories. To address this, we present the ISPIDER Central Proteomic Database search (http://www.ispider.manchester.ac.uk/cgi-bin/ProteomicSearch.pl), an integration service offering novel search capabilities over leading, mature, proteomic repositories including PRoteomics IDEntifications database (PRIDE), PepSeeker, PeptideAtlas and the Global Proteome Machine. It enables users to search for proteins and peptides that have been characterised in mass spectrometry-based proteomics experiments from different groups, stored in different databases, and view the collated results with specialist viewers/clients. In order to overcome limitations imposed by the great variability in protein accessions used by individual laboratories, the European Bioinformatics Institute's Protein Identifier Cross-Reference (PICR) service is used to resolve accessions from different sequence repositories. Custom-built clients allow users to view peptide/protein identifications in different contexts from multiple experiments and repositories, as well as integration with the Dasty2 client supporting any annotations available from Distributed Annotation System servers. Further information on the protein hits may also be added via external web services able to take a protein as input. This web server offers the first truly integrated access to proteomics repositories and provides a unique service to biologists interested in mass spectrometry-based proteomics.


Subject(s)
Databases, Protein , Proteomics , Software , Computer Graphics , Internet , Mass Spectrometry , Systems Integration
7.
Proteomics ; 7(16): 2787-99, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17640002

ABSTRACT

Whilst the array of techniques available for quantitative proteomics continues to grow, the attendant bioinformatic software tools are similarly expanding in number. The capture and analysis of such quantitative data are crucial to the experiment, and the methods used to process them will critically affect the quality of the data obtained. These tools must deal with a variety of issues, including identification of labelled and unlabelled peptide species, location of the corresponding MS scans in the experiment, construction of representative ion chromatograms, location of the true peptide ion chromatogram start and end, elimination of background signal in the mass spectrum and chromatogram, and calculation of both peptide and protein ratios/abundances. A variety of tools and approaches are available, in part restricted by the nature of the experiment to be performed and the available instrumentation. Although there is currently no single consensus on precisely how to calculate protein and peptide abundances, many common themes have emerged which identify and reduce many of the key sources of error. These issues will be discussed, along with those relating to deposition of quantitative data. Mature data standards for quantitative proteomics are not yet available, although formats are beginning to emerge.


Subject(s)
Proteomics , Reference Standards , Software
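Since the review notes that there is no single consensus on computing peptide and protein ratios, the following is only one simple sketch: a peptide ratio from summed heavy and light ion-chromatogram intensities after subtracting a constant background level, and a protein ratio taken as the median of its peptides' log2 ratios. Both the constant-background model and the median summary are assumptions, and the function names are hypothetical.

```python
import math
from statistics import median

def peptide_ratio(heavy_xic, light_xic, background=0.0):
    """Heavy/light ratio from two extracted ion chromatograms (lists of intensities).

    Assumes a constant background level, subtracted point by point and clipped at zero.
    """
    heavy = sum(max(i - background, 0.0) for i in heavy_xic)
    light = sum(max(i - background, 0.0) for i in light_xic)
    if light == 0:
        raise ValueError("light channel has no signal above background")
    return heavy / light

def protein_ratio(peptide_ratios):
    """Summarise peptide ratios as 2 ** median(log2 ratio); non-positive ratios are skipped."""
    logs = [math.log2(r) for r in peptide_ratios if r > 0]
    if not logs:
        raise ValueError("no positive peptide ratios to summarise")
    return 2 ** median(logs)
```

Taking the median on the log scale keeps a single outlier peptide from dominating the protein ratio and treats two-fold increases and decreases symmetrically.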
8.
Proteome Sci ; 5: 4, 2007 Feb 01.
Article in English | MEDLINE | ID: mdl-17270041

ABSTRACT

BACKGROUND: Proteomics continues to play a critical role in post-genomic science as continued advances in mass spectrometry and analytical chemistry support the separation and identification of increasing numbers of peptides and proteins from their characteristic mass spectra. In order to facilitate the sharing of these data, various standard formats have been, and continue to be, developed. However, these formats are still not fully mature and cannot yet cope with the increasing number of quantitative proteomic technologies that are being developed. RESULTS: We propose an extension to the PRIDE and mzData XML schemas to accommodate the concept of multiple samples per experiment and, in addition, to capture the intensities of the iTRAQ reporter ions in the entry. A simple Java client has been developed to capture and convert the raw data from common spectral file formats into a valid PRIDE XML entry; it also uses a third-party open source tool to generate iTRAQ reporter intensities from Mascot output. CONCLUSION: We describe an extension to the PRIDE and mzData schemas to enable the capture of quantitative data. Currently this is limited to iTRAQ data but is readily extensible to other quantitative proteomic technologies. Furthermore, a software tool has been developed which enables conversion from various mass spectrum file formats and corresponding Mascot peptide identifications to PRIDE-formatted XML. The tool represents a simple approach to preparing quantitative and qualitative data for submission to repositories such as PRIDE, which is necessary to facilitate data deposition and sharing in public-domain databases. The software is freely available from http://www.mcisb.org/software/PrideWizard.

9.
J Proteome Res ; 6(1): 399-408, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17203985

ABSTRACT

Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, up to one of these "missed cleavages" is considered by the bioinformatics search tools, usually after digestion of the in silico proteome by trypsin. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF dataset of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking FASTA-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.


Subject(s)
Mass Spectrometry/methods , Proteins/chemistry , Proteomics/methods , Trypsin/chemistry , Algorithms , Animals , Humans , Hydrolysis , Internet , Models, Statistical , Models, Theoretical , Probability , Reproducibility of Results , Software
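The masking idea can be illustrated with a simple in silico digest. This is a minimal sketch, assuming the simplified trypsin rule (cleave C-terminal to K or R unless the next residue is P) and assuming that a separate predictor, not shown, supplies the cut positions it confidently expects to be missed; those positions are never used as cut points. The names and parameters are hypothetical and do not reproduce the MissedCleave program.

```python
def cleavage_sites(seq):
    """Cut positions (index of the first residue after the cut) for a simplified trypsin rule."""
    return [i + 1 for i in range(len(seq) - 1) if seq[i] in "KR" and seq[i + 1] != "P"]

def digest(seq, max_missed=1, masked=frozenset()):
    """In silico tryptic digest allowing up to `max_missed` missed cleavages.

    masked: cut positions predicted to be missed (hypothetical predictor output);
    they are excluded as cut points, so peptides simply span those sites.
    """
    cuts = [0] + [c for c in cleavage_sites(seq) if c not in masked] + [len(seq)]
    peptides = set()
    for i in range(len(cuts) - 1):
        # A peptide from cuts[i] to cuts[j] skips (j - i - 1) internal cut sites.
        for j in range(i + 1, min(i + 2 + max_missed, len(cuts))):
            peptides.add(seq[cuts[i]:cuts[j]])
    return peptides
```

With `masked` empty and `max_missed=1` this reproduces the usual one-missed-cleavage search space; passing predicted sites together with `max_missed=0` typically yields fewer candidate peptides while still covering the sites that are expected to stay uncut.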
10.
Nucleic Acids Res ; 34(Database issue): D649-54, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381951

ABSTRACT

Proteome science relies on bioinformatics tools to characterize proteins via their proteolytic peptides which are identified via characteristic mass spectra generated after their ions undergo fragmentation in the gas phase within the mass spectrometer. The resulting secondary ion mass spectra are compared with protein sequence databases in order to identify the amino acid sequence. Although these search tools (e.g. SEQUEST, Mascot, X!Tandem, Phenyx) are frequently successful, much is still not understood about the amino acid sequence patterns which promote/protect particular fragmentation pathways, and hence lead to the presence/absence of particular ions from different ion series. In order to advance this area, we have developed a database, PepSeeker (http://nwsr.smith.man.ac.uk/pepseeker), which captures this peptide identification and ion information from proteome experiments. The database currently contains >185,000 peptides and associated database search information. Users may query this resource to retrieve peptide, protein and spectral information based on protein or peptide information, including the amino acid sequence itself represented by regular expressions coupled with ion series information. We believe this database will be useful to proteome researchers wishing to understand gas phase peptide ion chemistry in order to improve peptide identification strategies. Questions can be addressed to j.selley@manchester.ac.uk.


Subject(s)
Databases, Protein , Peptide Fragments/analysis , Proteome/chemistry , Proteomics/methods , Internet , Mass Spectrometry , Peptide Fragments/chemistry , Proteome/metabolism , Sequence Analysis, Protein , User-Computer Interface
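The kind of query described here, a regular expression over the peptide sequence combined with ion-series information, can be sketched in memory. The `PeptideRecord` fields and the `query` function are hypothetical and do not reflect PepSeeker's actual schema or interface; observed y ions are represented simply as a set of ion indices.

```python
import re
from dataclasses import dataclass, field

@dataclass
class PeptideRecord:
    sequence: str
    protein: str
    # Hypothetical representation: {1, 2, 5} means y1, y2 and y5 were observed.
    observed_y_ions: set = field(default_factory=set)

def query(records, pattern, require_y=None, protein=None):
    """Return records whose sequence matches `pattern` (a regular expression),
    optionally restricted to one protein and to spectra in which a given y ion was seen."""
    regex = re.compile(pattern)
    hits = []
    for rec in records:
        if not regex.search(rec.sequence):
            continue
        if protein is not None and rec.protein != protein:
            continue
        if require_y is not None and require_y not in rec.observed_y_ions:
            continue
        hits.append(rec)
    return hits
```

For example, `query(records, r"[DE]P")` would return the peptides containing an Asp-Pro or Glu-Pro motif, whose fragment ions could then be compared.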
11.
Protein Sci ; 12(10): 2348-59, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14500893

ABSTRACT

It is well established that recognition between exposed edges of beta-sheets is an important mode of protein-protein interaction and can have pathological consequences; for instance, it has been linked to the aggregation of proteins into a fibrillar structure, which is associated with a number of predominantly neurodegenerative disorders. A number of protective mechanisms have evolved in the edge strands of beta-sheets, preventing the aggregation and insolubility of most natural beta-sheet proteins. Such mechanisms are unfavorable in the interior of a beta-sheet. The problem of distinguishing edge strands from central strands based on sequence information alone is important in predicting residues and mutations likely to be involved in aggregation, and is also a first step in predicting folding topology. Here we report support vector machine (SVM) and decision tree methods developed to distinguish edge strands from central strands in a representative set of protein domains. Interestingly, rules generated by the decision tree method are in close agreement with our knowledge of protein structure and are potentially useful in a number of different biological applications. When trained on strands from proteins of known structure, using structure-based (Dictionary of Secondary Structure in Proteins) strand assignments, both methods achieved mean cross-validated prediction accuracies of approximately 78%. These accuracies were reduced when strand assignments from secondary structure prediction were used. Further investigation of this effect revealed that it could be explained by a significant reduction in the accuracy of standard secondary structure prediction methods for edge strands, in comparison with central strands.


Subject(s)
Computational Biology/methods , Protein Structure, Secondary , Proteins/chemistry , Algorithms , Amino Acid Sequence , Artificial Intelligence , Databases, Protein , Decision Trees , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Mathematical Computing , Models, Molecular , Molecular Sequence Data , Protein Binding , Reproducibility of Results , Solubility
12.
Protein Sci ; 11(7): 1862-6, 2002 Jul.
Article in English | MEDLINE | ID: mdl-12070339

ABSTRACT

The association of amyloid fibril formation with a number of important diseases, and the extensive study of this process in vitro, has resulted in a large literature containing a vast amount of information about the fibril formation process. This includes mutations and experimental conditions that promote or protect against fibril formation. A database (fibril_one) was designed to hold information relating to the formation of fibrils. It was populated by extensive searches of the literature and other databases. A powerful World Wide Web query interface to the database was developed, enabling a simple and effective method to view amyloidogenic mutations associated with specific proteins. The Web interface was used to identify trends in the data. This revealed that mutations promoting fibril formation through altered folding tend to be associated with destabilization of the native fold. In particular, tendencies of mutations to disrupt the native secondary structure and packing in the hydrophobic core were discovered to be significant. Query access to the database is available freely on the World Wide Web at http://www.bioinformatics.leeds.ac.uk/group/online/fibril_one.


Subject(s)
Amyloid beta-Peptides/metabolism , Databases, Protein , Humans , Internet