1.
Bioinformatics ; 34(9): 1594-1596, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29267848

ABSTRACT

Summary: Gap-filling is a necessary step to produce quality genome-scale metabolic reconstructions capable of flux-balance simulation. Most available gap-filling tools use an organism-agnostic approach, where reactions are selected from a database to fill gaps without consideration of the target organism. In contrast, our likelihood-based gap-filling with probabilistic annotations selects candidate reactions based on a likelihood score derived specifically from the target organism's genome. Here, we present two new implementations of probabilistic annotation and likelihood-based gap-filling: a web service called ProbAnnoWeb, and a standalone Python package called ProbAnnoPy. Availability and implementation: Our tools are available as a web service with no installation needed (ProbAnnoWeb) at probannoweb.systemsbiology.net, and as a local Python package implementation (ProbAnnoPy) at github.com/PriceLab/probannopy. Contact: evangelos.simeonidis@systemsbiology.org or nathan.price@systemsbiology.org. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genome; Likelihood Functions; Software
2.
J Proteome Res ; 15(11): 4091-4100, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27577934

ABSTRACT

The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem that is especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/.
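
The two-step strategy recommended in this abstract (search against a simpler database, then check the uniqueness of the discovered peptides against a more complex one) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the toy sequences, the entry IDs, and the function names `proteins_containing` and `uniqueness_report` are hypothetical, and exact substring matching stands in for whatever peptide-to-protein mapping the authors actually use.

```python
from typing import Dict, Iterable, List


def proteins_containing(peptide: str, database: Dict[str, str]) -> List[str]:
    """Return the sorted IDs of all entries whose sequence contains the peptide."""
    return sorted(pid for pid, seq in database.items() if peptide in seq)


def uniqueness_report(
    peptides: Iterable[str],
    small_db: Dict[str, str],
    large_db: Dict[str, str],
) -> Dict[str, Dict[str, List[str]]]:
    """Map each peptide to the entries it matches in the small and large databases."""
    return {
        pep: {
            "small": proteins_containing(pep, small_db),
            "large": proteins_containing(pep, large_db),
        }
        for pep in peptides
    }


# Hypothetical toy databases: the larger tier adds an isoform and an extra entry.
tier1 = {"P1": "MKTAYIAKQR", "P2": "MSLLTEVETYVLK"}
tier4 = dict(tier1, **{"P1-2": "MKTAYIAKQRGG", "X9": "AAAPEPTIDEK"})

# Peptides assumed to have been identified in a search against the small tier.
report = uniqueness_report(["TAYIAK", "EVETYVLK"], tier1, tier4)
for pep, hits in report.items():
    status = "unique" if len(hits["large"]) == 1 else "shared"
    print(pep, hits, status + " in the larger tier")
```

In this toy case, a peptide that maps to a single entry in the small tier turns out to be shared with an isoform once the larger tier is consulted, which is the kind of ambiguity the second step is meant to expose.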


Subjects
Databases, Protein/trends; Proteomics/methods; Computational Biology/methods; HeLa Cells; Humans; Liver/chemistry; Liver/cytology; Mass Spectrometry; Protein Isoforms/analysis; Proteins/analysis
3.
Proteomics ; 14(21-22): 2389-99, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25092112

ABSTRACT

Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It presently stands as one of the greatest barriers in collaborative efforts such as the Human Proteome Project and public repositories such as the PRoteomics IDEntifications (PRIDE) database. Here we present a framework for reporting protein identifications that seeks to improve capabilities for comparing results generated by different inference tools. This framework standardizes the terminology for describing protein identification results, associated with the HUPO-Proteomics Standards Initiative (PSI) mzIdentML standard, while still allowing for differing methodologies to reach that final state. It is proposed that developers of software for reporting identification results will adopt this terminology in their outputs. While the new terminology does not require any changes to the core mzIdentML model, it represents a significant change in practice, and, as such, the rules will be released via a new version of the mzIdentML specification (version 1.2) so that consumers of files are able to determine whether the new guidelines have been adopted by export software.


Subjects
Mass Spectrometry/standards; Proteins/analysis; Proteomics/standards; Software/standards; Databases, Protein; Humans; Mass Spectrometry/methods; Proteomics/methods
4.
J Proteome Res ; 13(1): 60-75, 2014 Jan 03.
Article in English | MEDLINE | ID: mdl-24261998

ABSTRACT

The kidney, urine, and plasma proteomes are intimately related: proteins and metabolic waste products are filtered from the plasma by the kidney and excreted via the urine, while kidney proteins may be secreted into the circulation or released into the urine. Shotgun proteomics data sets derived from human kidney, urine, and plasma samples were collated and processed using a uniform software pipeline, and relative protein abundances were estimated by spectral counting. The resulting PeptideAtlas builds yielded 4005, 2491, and 3553 nonredundant proteins at 1% FDR for the kidney, urine, and plasma proteomes, respectively; for kidney and plasma, these are the largest high-confidence protein sets to date. The same pipeline applied to all available human data yielded a 2013 Human PeptideAtlas build containing 12,644 nonredundant proteins and at least one peptide for each of ∼14,000 Swiss-Prot entries, an increase over 2012 of ∼7.5% of the predicted human proteome. We demonstrate that abundances are correlated between plasma and urine, examine the most abundant urine proteins not derived from either plasma or kidney, and consider the biomarker potential of proteins associated with renal decline. This analysis forms part of the Biology and Disease-driven Human Proteome Project (B/D-HPP) and is a contribution to the Chromosome-centric Human Proteome Project (C-HPP) special issue.
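
As a rough illustration of how relative abundances can be estimated by spectral counting, the sketch below computes the normalized spectral abundance factor (NSAF), one commonly used length-corrected normalization. This is an assumption for illustration only: the abstract does not say which spectral-counting formula the authors applied, and the protein names, counts, lengths, and the function name `nsaf` are hypothetical.

```python
from typing import Dict


def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor for each protein.

    Each count is divided by the protein length (longer proteins yield more
    peptides), and the results are rescaled so that they sum to 1.
    """
    if not spectral_counts:
        return {}
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("all spectral counts are zero")
    return {p: value / total for p, value in saf.items()}


# Hypothetical counts and sequence lengths (in residues).
counts = {"protein_A": 100, "protein_B": 50, "protein_C": 10}
lengths = {"protein_A": 500, "protein_B": 250, "protein_C": 100}

abundances = nsaf(counts, lengths)
for protein, value in sorted(abundances.items(), key=lambda kv: -kv[1]):
    print(f"{protein}: {value:.3f}")
```

Dividing by length before normalizing means that protein_A and protein_B, which have the same count per residue, receive the same estimate even though their raw counts differ by a factor of two.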


Subjects
Proteins/metabolism; Proteome; Chromatography, Liquid; Humans; Tandem Mass Spectrometry
5.
J Proteome Res ; 12(1): 162-71, 2013 Jan 04.
Article in English | MEDLINE | ID: mdl-23215161

ABSTRACT

The Human Proteome Project was launched in September 2010 with the goal of characterizing at least one protein product from each protein-coding gene. Here we assess how much of the proteome has been detected to date via tandem mass spectrometry by analyzing PeptideAtlas, a compendium of human-derived LC-MS/MS proteomics data from many laboratories around the world. All data sets are processed with a consistent set of parameters using the Trans-Proteomic Pipeline and subjected to a 1% protein FDR filter before inclusion in PeptideAtlas. Therefore, PeptideAtlas contains only high-confidence protein identifications. To increase proteome coverage, we explored new comprehensive public data sources for data likely to add new proteins to the Human PeptideAtlas. We then folded these data into a Human PeptideAtlas 2012 build and mapped it to Swiss-Prot, a protein sequence database curated to contain one entry per human protein-coding gene. We find that this latest PeptideAtlas build includes at least one peptide for each of ~12500 Swiss-Prot entries, leaving ~7500 gene products yet to be confidently cataloged. We characterize these "PA-unseen" proteins in terms of tissue localization, transcript abundance, and Gene Ontology enrichment, and propose reasons for their absence from PeptideAtlas and strategies for detecting them in the future.
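
To make the idea of a 1% FDR filter concrete, here is a minimal Python sketch based on the target-decoy approach, one common way to estimate a false discovery rate. This is an assumption for illustration: the abstract does not describe how the Trans-Proteomic Pipeline or PeptideAtlas computes the protein-level FDR, and the scores, the data layout, and the function name `filter_at_fdr` are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[float, bool]  # (score, is_decoy); a higher score means a better match


def filter_at_fdr(hits: List[Hit], max_fdr: float = 0.01) -> List[Hit]:
    """Keep target hits from the longest score-ranked prefix whose estimated
    FDR (decoys / targets) does not exceed max_fdr."""
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked hits to keep
    for i, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [h for h in ranked[:cutoff] if not h[1]]


# Hypothetical identifications: 50 high-scoring targets, one decoy,
# then 10 lower-scoring targets.
hits = (
    [(100.0 - i, False) for i in range(50)]
    + [(49.5, True)]
    + [(40.0 - i, False) for i in range(10)]
)

kept_strict = filter_at_fdr(hits, max_fdr=0.01)
kept_loose = filter_at_fdr(hits, max_fdr=0.05)
print(len(kept_strict), len(kept_loose))
```

With the strict threshold, the list is cut just above the first decoy, because including it would push the estimated FDR above 1%; with the looser 5% threshold, all target hits are retained.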


Subjects
Chromosomes, Human, Pair 20; Peptides; Proteome; Chromosomes, Human, Pair 20/genetics; Chromosomes, Human, Pair 20/metabolism; Databases, Protein; Gene Expression; Genome, Human; Humans; Peptides/genetics; Peptides/metabolism; Proteome/genetics; Proteome/metabolism; Tandem Mass Spectrometry
6.
Nat Methods ; 7(1): 43-6, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19966807

ABSTRACT

Selected reaction monitoring (SRM) uses sensitive and specific mass spectrometric assays to measure target analytes across multiple samples, but it has not been broadly applied in proteomics owing to the tedious assay development process for each protein. We describe a method based on crude synthetic peptide libraries for the high-throughput development of SRM assays. We illustrate the power of the approach by generating and applying validated SRM assays for all Saccharomyces cerevisiae kinases and phosphatases.


Subjects
Biological Assay/methods; High-Throughput Screening Assays/methods; Mass Spectrometry/methods; Peptide Library; Proteins/analysis; Proteome/analysis; Databases, Protein; Phosphoric Monoester Hydrolases/metabolism; Protein Kinases/metabolism; Reproducibility of Results; Saccharomyces cerevisiae/enzymology
7.
Mol Cell Proteomics ; 10(9): M110.006353, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21632744

ABSTRACT

Human blood plasma can be obtained relatively noninvasively and contains proteins from most, if not all, tissues of the body. Therefore, an extensive, quantitative catalog of plasma proteins is an important starting point for the discovery of disease biomarkers. In 2005, we showed that different proteomics measurements using different sample preparation and analysis techniques identify significantly different sets of proteins, and that a comprehensive plasma proteome can be compiled only by combining data from many different experiments. Applying advanced computational methods developed for the analysis and integration of very large and diverse data sets generated by tandem MS measurements of tryptic peptides, we have now compiled a high-confidence human plasma proteome reference set with well over twice the identified proteins of previous high-confidence sets. It includes a hierarchy of protein identifications at different levels of redundancy following a clearly defined scheme, which we propose as a standard that can be applied to any proteomics data set to facilitate cross-proteome analyses. Further, to aid in development of blood-based diagnostics using techniques such as selected reaction monitoring, we provide a rough estimate of protein concentrations using spectral counting. We identified 20,433 distinct peptides, from which we inferred a highly nonredundant set of 1929 protein sequences at a false discovery rate of 1%. We have made this resource available via PeptideAtlas, a large, multiorganism, publicly accessible compendium of peptides identified in tandem MS experiments conducted by laboratories around the world.


Subjects
Biomarkers/blood; Blood Proteins; Peptides; Plasma/chemistry; Proteome/analysis; Proteomics/methods; Algorithms; Blood Proteins/analysis; Blood Proteins/chemistry; Blood Proteins/standards; Chromatography, Liquid; Databases, Protein; Humans; Mass Spectrometry; Peptides/blood; Peptides/chemistry; Peptides/standards; Proteome/chemistry; Reference Standards; Software; Trypsin/metabolism
8.
Proteomics ; 12(18): 2895-9, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22837157

ABSTRACT

Proteome information resources of farm animals are lagging behind those of the classical model organisms despite their important biological and economic relevance. Here, we present a Bovine PeptideAtlas, representing a first collection of Bos taurus proteome data sets within the PeptideAtlas framework. This database was built primarily as a source of information for designing selected reaction monitoring assays for studying milk production and mammary gland health, but it has an intrinsic general value for the farm animal research community. The Bovine PeptideAtlas comprises 1921 proteins at 1.2% false discovery rate (FDR) and 8559 distinct peptides at 0.29% FDR identified in 107 samples from six tissues. The PeptideAtlas web interface has a rich set of visualization and data exploration tools, enabling users to interactively mine information about individual proteins and peptides, their prototypic features, genome mappings, and supporting spectral evidence.


Subjects
Mammary Glands, Animal/chemistry; Milk/chemistry; Proteome/chemistry; Amino Acid Sequence; Animals; Cattle; Databases, Protein; Female; Haptoglobins/chemistry; Molecular Sequence Data; Proteomics
9.
Proteomics ; 12(8): 1170-5, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22318887

ABSTRACT

Public repositories for proteomics data have accelerated proteomics research by enabling more efficient cross-analyses of datasets, supporting the creation of protein and peptide compendia of experimental results, supporting the development and testing of new software tools, and facilitating the manuscript review process. The repositories available to date have been designed to accommodate either shotgun experiments or generic proteomic data files. Here, we describe a new kind of proteomic data repository for the collection and representation of data from selected reaction monitoring (SRM) measurements. The PeptideAtlas SRM Experiment Library (PASSEL) allows researchers to easily submit proteomic data sets generated by SRM. The raw data are automatically processed in a uniform manner and the results are stored in a database, where they may be downloaded or browsed via a web interface that includes a chromatogram viewer. PASSEL enables cross-analysis of SRM data, supports optimization of SRM data collection, and facilitates the review process of SRM data. Further, PASSEL will help in the assessment of proteotypic peptide performance in a wide array of samples containing the same peptide, as well as across multiple experimental protocols.


Subjects
Chromatography, Liquid/methods; Databases, Protein/standards; Peptides/analysis; Proteomics/methods; Software; Tandem Mass Spectrometry/methods; Algorithms; Electronic Data Processing; Humans; Internet; Peptide Library; Proteomics/standards; Tandem Mass Spectrometry/standards
10.
Proteomics ; 10(6): 1190-5, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20082347

ABSTRACT

Electron transfer dissociation (ETD) is an alternative fragmentation technique to CID that has recently become commercially available. ETD has several advantages over CID. It is less prone to fragmenting amino acid side chains, especially those that are modified, thus yielding fragment ion spectra with more uniform peak intensities. Further, precursor ions of longer peptides and higher charge states can be fragmented and identified. However, analysis of ETD spectra has a few important differences that require the optimization of the software packages used for the analysis of CID data or the development of specialized tools. We have adapted the Trans-Proteomic Pipeline to process ETD data. Specifically, we have added support for fragment ion spectra from high-charge precursors, compatibility with charge-state estimation algorithms, provisions for the use of the Lys-C protease, capabilities for ETD spectrum library building, and updates to the data formats to differentiate CID and ETD spectra. We show the results of processing data sets from several different types of ETD instruments and demonstrate that application of the ETD-enhanced Trans-Proteomic Pipeline can increase the number of spectrum identifications at a fixed false discovery rate by as much as 100% over native output from a single sequence search engine.


Subjects
Computational Biology/methods; Peptides/analysis; Proteomics/methods; Software; Tandem Mass Spectrometry/methods