Results 1 - 20 of 31
1.
J Proteome Res ; 15(2): 677-8, 2016 Feb 05.
Article in English | MEDLINE | ID: mdl-26680273

ABSTRACT

Selenocysteine is a naturally occurring proteinogenic amino acid that is encoded in the genomic sequence of relatively abundant proteins in many of the model species commonly used for biomedical research. On the basis of an analysis of publicly available proteomics information, it was discovered that peptides containing selenocysteine were not being identified in tandem mass spectrometry proteomics data. Once the chemical basis for this exclusion was understood, a simple alteration in search parameters led to the confident identification of selenocysteine-containing peptides from existing proteomics data, with no change in experimental protocols required.


Subject(s)
Peptides/metabolism , Proteomics/methods , Selenocysteine/metabolism , Selenoproteins/metabolism , Amino Acid Sequence , Animals , Humans , Molecular Sequence Data , Peptides/genetics , Selenocysteine/genetics , Selenoproteins/chemistry , Selenoproteins/genetics , Tandem Mass Spectrometry
2.
J Proteome Res ; 15(3): 983-90, 2016 Mar 04.
Article in English | MEDLINE | ID: mdl-26842767

ABSTRACT

Large-scale proteomics has made it possible to broadly screen samples for the presence of many types of post-translational modifications, such as phosphorylation, acetylation, and ubiquitination. This type of data has allowed the localization of these modifications to either a specific site on a proteolytically generated peptide or to within a small domain on the peptide. The resulting modification acceptor sites can then be mapped onto the appropriate protein sequences and the information archived. This paper describes the usage of a very large archive of experimental observations of human post-translational modifications to create a map of the most reproducible modification observations onto the complete set of human protein sequences. This set of modification acceptor sites was then directly translated into the genomic coordinates for the codons for the residues at those sites. We constructed the database g2pDB using this protein-to-codon site mapping information. The information in g2pDB has been made available through a RESTful-style API, allowing researchers to determine which specific protein modifications would be perturbed by a set of observed nucleotide variants determined by high-throughput DNA or RNA sequencing.
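
The protein-to-codon translation described in this abstract can be illustrated with a minimal Python sketch. It is not the g2pDB implementation or its API: the function names, the exon coordinates, and the variant positions below are all hypothetical, and the sketch assumes a coding sequence given as 1-based inclusive exon intervals that starts in frame at the first coding base.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def codon_coordinates(cds_exons: List[Exon], strand: str, residue: int) -> List[int]:
    """Return the three genomic positions that encode a 1-based residue index."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    forward = strand == "+"
    coding_positions: List[int] = []
    # Walk the coding exons in transcript order (reversed on the minus strand).
    for start, end in sorted(cds_exons, reverse=not forward):
        span = range(start, end + 1) if forward else range(end, start - 1, -1)
        coding_positions.extend(span)
    first = 3 * (residue - 1)
    if residue < 1 or first + 3 > len(coding_positions):
        raise ValueError("residue lies outside the coding sequence")
    return coding_positions[first:first + 3]


def perturbed_sites(cds_exons: List[Exon], strand: str,
                    modified_residues: List[int],
                    variant_positions: List[int]) -> List[int]:
    """List the modified residues whose codon overlaps any variant position."""
    variants = set(variant_positions)
    return [r for r in modified_residues
            if variants.intersection(codon_coordinates(cds_exons, strand, r))]


# Hypothetical two-exon gene; the codon for residue 2 straddles the intron.
exons = [(100, 104), (200, 206)]
print(codon_coordinates(exons, "+", 2))
print(perturbed_sites(exons, "+", [1, 2, 4], [200, 205]))
```

Flattening the exons into one list of coding positions keeps the sketch short; a production mapping would more likely work on intervals and handle incomplete or frame-shifted transcripts.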


Subject(s)
Databases, Protein , Protein Processing, Post-Translational , Acetylation , Amino Acid Sequence , Humans , Molecular Sequence Annotation , Peptide Mapping , Phosphorylation , Proteomics , Software
3.
J Proteome Res ; 15(2): 411-21, 2016 Feb 05.
Article in English | MEDLINE | ID: mdl-26718741

ABSTRACT

The honey bee is a key pollinator in agricultural operations as well as a model organism for studying the genetics and evolution of social behavior. The Apis mellifera genome has been sequenced and annotated twice over, enabling proteomics and functional genomics methods for probing relevant aspects of its biology. One troubling trend that emerged from proteomic analyses is that honey bee peptide samples consistently result in lower peptide identification rates compared with other organisms. This suggests that the genome annotation can be improved, or that atypical biological processes are interfering with the mass spectrometry workflow. First, we tested whether high levels of polymorphisms could explain some of the missed identifications by searching spectra against the reference proteome (OGSv3.2) versus a customized proteome of a single honey bee, but our results indicate that this contribution was minor. Likewise, error-tolerant peptide searches led us to eliminate unexpected post-translational modifications as a major factor in missed identifications. We then used a proteogenomic approach with ~1500 raw files to search for missing genes and new exons, to revive discarded annotations and to identify over 2000 new coding regions. These results will contribute to a more comprehensive genome annotation and facilitate continued research on this important insect.


Subject(s)
Bees/genetics , Genome, Insect/genetics , Genomics/methods , Molecular Sequence Annotation/methods , Animals , Bees/metabolism , Insect Proteins/genetics , Insect Proteins/metabolism , Mass Spectrometry/methods , Polymorphism, Single Nucleotide , Protein Processing, Post-Translational , Proteolysis , Proteome/genetics , Proteome/metabolism , Proteomics/methods
4.
J Proteome Res ; 15(11): 3951-3960, 2016 Nov 04.
Article in English | MEDLINE | ID: mdl-27487407

ABSTRACT

The HUPO Human Proteome Project (HPP) has two overall goals: (1) stepwise completion of the protein parts list, the draft human proteome, including confidently identifying and characterizing at least one protein product from each protein-coding gene, with increasing emphasis on sequence variants, post-translational modifications (PTMs), and splice isoforms of those proteins; and (2) making proteomics an integrated counterpart to genomics throughout the biomedical and life sciences community. PeptideAtlas and GPMDB reanalyze all major human mass spectrometry data sets available through ProteomeXchange with standardized protocols and stringent quality filters; neXtProt curates and integrates mass spectrometry and other findings to present the most up-to-date authoritative compendium of the human proteome. The HPP Guidelines for Mass Spectrometry Data Interpretation version 2.1 were applied to manuscripts submitted for this 2016 C-HPP-led special issue [www.thehpp.org/guidelines]. The Human Proteome presented as neXtProt version 2016-02 has 16,518 confident protein identifications (Protein Existence [PE] Level 1), up from 13,664 at 2012-12, 15,646 at 2013-09, and 16,491 at 2014-10. There are 485 proteins that would have been PE1 under the Guidelines v1.0 from 2012 but now have insufficient evidence due to the agreed-upon more stringent Guidelines v2.0 to reduce false positives. neXtProt and PeptideAtlas now both require two non-nested, uniquely mapping (proteotypic) peptides of at least 9 aa in length. There are 2,949 missing proteins (PE2+3+4) as the baseline for submissions for this fourth annual C-HPP special issue of Journal of Proteome Research. PeptideAtlas has 14,629 canonical (plus 1187 uncertain and 1755 redundant) entries. GPMDB has 16,190 EC4 entries, and the Human Protein Atlas has 10,475 entries with supportive evidence. neXtProt, PeptideAtlas, and GPMDB are rich resources of information about PTMs, single amino acid variants (SAAVs), and splice isoforms. Meanwhile, the Biology- and Disease-driven (B/D)-HPP has created comprehensive SRM resources, generated popular protein lists to guide targeted proteomics assays for specific diseases, and launched an Early Career Researchers initiative.


Subject(s)
Guidelines as Topic/standards , Protein Processing, Post-Translational , Proteome/analysis , Databases, Protein , Disease Susceptibility , Humans , Mass Spectrometry , Polymorphism, Single Nucleotide , Protein Isoforms , Proteomics/methods
5.
J Proteome Res ; 15(2): 339-59, 2016 Feb 05.
Article in English | MEDLINE | ID: mdl-26680015

ABSTRACT

Claudins are the major transmembrane protein components of tight junctions in human endothelia and epithelia. Tissue-specific expression of claudin members suggests that this protein family is not only essential for sustaining the role of tight junctions in cell permeability control but also vital in organizing cell contact signaling by protein-protein interactions. How this protein family is collectively processed and regulated is key to understanding the role of junctional proteins in preserving cell identity and tissue integrity. This review first provides a brief overview of the functional context of endogenous human claudin members, drawing on the extensive and already thoroughly reviewed body of claudin biology research, and then identifies existing and future proteomics techniques that may be applicable to systematically characterizing the chemical forms and interacting protein partners of this protein family in humans. The ability to elucidate claudin-based signaling networks may provide new insight into cell development and differentiation programs that are crucial to tissue stability and manipulation.


Subject(s)
Claudins/metabolism , Proteomics/methods , Signal Transduction , Tight Junctions/metabolism , Claudins/genetics , Endothelium/metabolism , Epithelium/metabolism , Glycosylation , Humans , Multigene Family , Protein Interaction Maps
6.
Anal Chem ; 88(5): 2847-55, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26849966

ABSTRACT

The growing complexity of proteomics samples and the desire for deeper analysis drive the development of both better MS instruments and advanced multidimensional separation schemes. We applied 1D, 2D, and 3D LC-MS/MS separation protocols (all of reversed-phase C18 functionality) to a tryptic digest of whole Jurkat cell lysate to estimate the depth of proteome coverage and to collect high-quality peptide retention information. We varied the pH of the eluent and the hydrophobicity of the ion-pairing modifier to achieve good separation orthogonality (utilization of MS instrument time). All separation modes employed identical LC settings with formic-acid-based eluents in the last dimension. The 2D protocol used a high pH-low pH scheme with 21 concatenated fractions. In the 3D protocol, six concatenated fractions from the first dimension (C18, heptafluorobutyric acid) were analyzed using the identical 2D LC-MS procedure. This approach permitted a detailed evaluation of the analysis output consuming 21× and 126× the analysis time and sample load compared to 1D. Acquisition over 189 h of instrument time in 3D mode resulted in the identification of ∼14 000 proteins and ∼250 000 unique peptides. We estimated the dynamic range via peak intensity at the MS2 level as approximately 10^4.2, 10^5.6, and 10^6.2 for the 1D, 2D, and 3D protocols, respectively. The uniform distribution of the number of acquired MS/MS, protein, and peptide identifications across all 126 fractions and through the chromatographic time scale in the last LC-MS stage indicates good separation orthogonality. The protocol is scalable and is amenable to the use of peptide retention prediction in all dimensions. All these features make it a very good candidate for large-scale bottom-up proteomic runs, which target both protein identification and the collection of peptide retention data sets for targeted quantitative applications.
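
The run counts and dynamic-range figures quoted in this abstract follow from simple arithmetic, sketched below in Python. The per-run time is an assumption (the 189 h of 3D acquisition split evenly over all runs); the abstract itself only gives the total. The variable and function names are illustrative.

```python
def total_runs(first_dimension_fractions: int, second_dimension_fractions: int) -> int:
    """Number of final LC-MS runs when every first-dimension fraction
    is itself separated into the given number of second-dimension fractions."""
    return first_dimension_fractions * second_dimension_fractions


FRACTIONS_2D = 21            # concatenated fractions in the 2D protocol
FIRST_DIM_FRACTIONS_3D = 6   # concatenated first-dimension fractions in 3D
INSTRUMENT_HOURS_3D = 189    # total acquisition time reported for 3D mode

# log10 of the dynamic range reported for each protocol
LOG10_DYNAMIC_RANGE = {"1D": 4.2, "2D": 5.6, "3D": 6.2}

runs_3d = total_runs(FIRST_DIM_FRACTIONS_3D, FRACTIONS_2D)  # 6 x 21 = 126 runs
hours_per_run = INSTRUMENT_HOURS_3D / runs_3d               # 1.5 h if split evenly

# Fold gain in dynamic range of 3D over 1D: 10^(6.2 - 4.2), about 100-fold.
gain_3d_over_1d = 10 ** (LOG10_DYNAMIC_RANGE["3D"] - LOG10_DYNAMIC_RANGE["1D"])

print(f"3D runs: {runs_3d}, hours per run: {hours_per_run}")
print(f"3D vs 1D dynamic range gain: about {gain_3d_over_1d:.0f}-fold")
```

The same product explains the 21× and 126× multipliers for analysis time and sample load relative to the single 1D run.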


Subject(s)
Chromatography, High Pressure Liquid/methods , Mass Spectrometry/methods , Peptides/chemistry , Proteomics
7.
Bioinformatics ; 31(12): 2056-8, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-25697819

ABSTRACT

The Global Proteome Machine and Database (GPMDB) representational state transfer (REST) service was designed to provide simplified access to the proteomics information in GPMDB using a stable set of methods and parameters. Version 1 of this interface gives access to 25 methods for retrieving experimental information about protein post-translational modifications, amino acid variants, alternative splicing variants and protein cleavage patterns. AVAILABILITY AND IMPLEMENTATION: GPMDB data and database tables are freely available for commercial and non-commercial use. All software is also freely available, under the Artistic License. http://rest.thegpm.org/1 (GPMDB REST Service), http://wiki.thegpm.org/wiki/GPMDB_REST (Service description and help), and http://www.thegpm.org (GPM main project description and documentation). The code for the interface and an example REST client is available at ftp://ftp.thegpm.org/repos/gpmdb_rest


Subject(s)
Algorithms , Computational Biology/methods , Databases, Factual , Information Storage and Retrieval/methods , Protein Processing, Post-Translational , Proteome/analysis , Software , Humans , Proteomics/methods
8.
J Proteome Res ; 14(9): 3452-60, 2015 Sep 04.
Article in English | MEDLINE | ID: mdl-26155816

ABSTRACT

Remarkable progress continues on the annotation of the proteins identified in the Human Proteome and on finding credible proteomic evidence for the expression of "missing proteins". Missing proteins are those with no previous protein-level evidence or insufficient evidence to make a confident identification upon reanalysis in PeptideAtlas and curation in neXtProt. Enhanced with several major new data sets published in 2014, the human proteome presented as neXtProt, version 2014-09-19, has 16,491 unique confident proteins (PE level 1), up from 13,664 at 2012-12 and 15,646 at 2013-09. That leaves 2948 missing proteins from genes classified as having protein existence level PE 2, 3, or 4, as well as 616 dubious proteins at PE 5. Here, we document the progress of the HPP and discuss the importance of assessing the quality of evidence, confirming automated findings and considering alternative protein matches for spectra and peptides. We provide guidelines for proteomics investigators to apply in reporting newly identified proteins.


Subject(s)
Guidelines as Topic , Proteins/chemistry , Proteome , Humans
9.
J Proteome Res ; 14(12): 4995-5006, 2015 Dec 04.
Article in English | MEDLINE | ID: mdl-26435392

ABSTRACT

V-erb-b2 erythroblastic leukemia viral oncogene homologue 2, known as ERBB2, is an important oncogene in the development of certain cancers. It can form a heterodimer with other epidermal growth factor receptor family members and activate kinase-mediated downstream signaling pathways. The ERBB2 gene is located on chromosome 17 and is amplified in a subset of cancers, such as breast, gastric, and colon cancer. Of particular interest to the Chromosome-Centric Human Proteome Project (C-HPP) initiative is the amplification mechanism that typically results in overexpression of a set of genes adjacent to ERBB2, which provides evidence of a linkage between gene location and expression. In this report we studied patient samples from ERBB2-positive tumors together with adjacent control nontumor tissues. In addition, non-ERBB2-expressing patient samples were selected as a comparison to study the effect of expression of this oncogene. We detected 196 proteins in ERBB2-positive patient tumor samples that had minimal overlap (29 proteins) with the non-ERBB2 tumor samples. Interaction and pathway analysis identified the extracellular signal regulated kinase (ERK) cascade and actin polymerization and actomyosin assembly contraction as pathways of importance in ERBB2+ and ERBB2- gastric cancer samples, respectively. The raw data files are deposited at ProteomeXchange (identifier: PXD002674) as well as GPMDB.


Subject(s)
Receptor, ErbB-2/metabolism , Stomach Neoplasms/genetics , Stomach Neoplasms/metabolism , Case-Control Studies , Cell Line, Tumor , Gene Expression Profiling , Humans , Immunohistochemistry , In Situ Hybridization, Fluorescence
10.
J Proteome Res ; 13(1): 15-20, 2014 Jan 03.
Article in English | MEDLINE | ID: mdl-24364385

ABSTRACT

One year ago the Human Proteome Project (HPP) leadership designated the baseline metrics for the Human Proteome Project to be based on neXtProt with a total of 13,664 proteins validated at protein evidence level 1 (PE1) by mass spectrometry, antibody-capture, Edman sequencing, or 3D structures. Corresponding chromosome-specific data were provided from PeptideAtlas, GPMdb, and Human Protein Atlas. This year, the neXtProt total is 15,646 and the other resources, which are inputs to neXtProt, have high-quality identifications and additional annotations for 14,012 in PeptideAtlas, 14,869 in GPMdb, and 10,976 in HPA. We propose to remove 638 genes from the denominator that are "uncertain" or "dubious" in Ensembl, UniProt/SwissProt, and neXtProt. That leaves 3844 "missing proteins", currently having no or inadequate documentation, to be found from a new denominator of 19,490 protein-coding genes. We present those tabulations and web links and discuss current strategies to find the missing proteins.
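
The "missing proteins" count in this abstract is the difference between the proposed denominator and the neXtProt PE1 total, which the short Python check below reproduces. The earlier denominator and the coverage fraction are derived here from the quoted figures rather than stated in the abstract, and the names are illustrative.

```python
PE1_BY_RELEASE = {"2012-12": 13_664, "2013-09": 15_646}  # neXtProt PE1 totals
NEW_DENOMINATOR = 19_490   # protein-coding genes after the proposed removal
REMOVED_GENES = 638        # "uncertain" or "dubious" genes proposed for removal


def missing_proteins(denominator: int, confident: int) -> int:
    """Genes in the denominator that still lack confident protein evidence."""
    return denominator - confident


missing = missing_proteins(NEW_DENOMINATOR, PE1_BY_RELEASE["2013-09"])   # 3,844
gain = PE1_BY_RELEASE["2013-09"] - PE1_BY_RELEASE["2012-12"]             # 1,982
previous_denominator = NEW_DENOMINATOR + REMOVED_GENES                   # derived
coverage = PE1_BY_RELEASE["2013-09"] / NEW_DENOMINATOR                   # about 0.80

print(f"missing proteins: {missing}")
print(f"PE1 gain in one year: {gain}")
print(f"denominator before removal (derived): {previous_denominator}")
print(f"PE1 coverage of new denominator: {coverage:.1%}")
```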


Subject(s)
Proteome , Chromosomes, Human , Humans , Mass Spectrometry
11.
J Proteome Res ; 12(6): 2805-17, 2013 Jun 07.
Article in English | MEDLINE | ID: mdl-23647160

ABSTRACT

In this study we selected three breast cancer cell lines (SKBR3, SUM149 and SUM190) with different oncogene expression levels involved in ERBB2 and EGFR signaling pathways as a model system for the evaluation of selective integration of subsets of transcriptomic and proteomic data. We assessed the oncogene status with reads per kilobase per million mapped reads (RPKM) values for ERBB2 (14.4, 400, and 300 for SUM149, SUM190, and SKBR3, respectively) and for EGFR (60.1, not detected, and 1.4 for the same 3 cell lines). We then used RNA-Seq data to identify those oncogenes with significant transcript levels in these cell lines (total 31) and interrogated the corresponding proteomics data sets for proteins with significant interaction values with these oncogenes. The number of observed interactors for each oncogene showed a significant range, e.g., 4.2% (JAK1) to 27.3% (MYC). The percentage is measured as a fraction of the total protein interactions in a given data set vs total interactors for that oncogene in STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, version 9.0) and I2D (Interologous Interaction Database, version 1.95). This approach allowed us to focus on 4 main oncogenes, ERBB2, EGFR, MYC, and GRB2, for pathway analysis. We used bioinformatics sites GeneGo, PathwayCommons and NCI receptor signaling networks to identify pathways that contained the four main oncogenes and had good coverage in the transcriptomic and proteomic data sets as well as a significant number of oncogene interactors. The four pathways identified were ERBB signaling, EGFR1 signaling, integrin outside-in signaling, and validated targets of C-MYC transcriptional activation. The greater dynamic range of the RNA-Seq values allowed the use of transcript ratios to correlate observed protein values with the relative levels of the ERBB2 and EGFR transcripts in each of the four pathways. This provided us with potential proteomic signatures for the SUM149 and SUM190 cell lines: growth factor receptor-bound protein 7 (GRB7), Crk-like protein (CRKL) and Catenin delta-1 (CTNND1) for ERBB signaling; caveolin 1 (CAV1), plectin (PLEC) for EGFR signaling; filamin A (FLNA) and actinin alpha1 (ACTN1) (associated with high levels of EGFR transcript) for integrin signaling; branched chain amino-acid transaminase 1 (BCAT1), carbamoyl-phosphate synthetase (CAD), nucleolin (NCL) (high levels of EGFR transcript); transferrin receptor (TFRC), metadherin (MTDH) (high levels of ERBB2 transcript) for MYC signaling; S100-A2 protein (S100A2), caveolin 1 (CAV1), Serpin B5 (SERPINB5), stratifin (SFN), PYD and CARD domain containing (PYCARD), and EPH receptor A2 (EPHA2) for PI3K signaling, p53 subpathway. Future studies of inflammatory breast cancer (IBC), from which the cell lines were derived, will be used to explore the significance of these observations.
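
For readers unfamiliar with the RPKM unit used in this abstract, the Python sketch below applies its standard definition and then forms the kind of transcript ratio the abstract describes. The read counts and transcript length in the first example are hypothetical; only the ERBB2 RPKM values (14.4 and 400) come from the abstract.

```python
def rpkm(mapped_reads: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return mapped_reads / (kilobases * millions)


def transcript_ratio(rpkm_a: float, rpkm_b: float) -> float:
    """Relative transcript level of condition A over condition B."""
    return rpkm_a / rpkm_b


# Hypothetical gene: 1,000 reads on a 2 kb transcript, 10 million mapped reads.
example = rpkm(mapped_reads=1_000, transcript_length_bp=2_000,
               total_mapped_reads=10_000_000)
print(f"example RPKM: {example}")

# ERBB2 values reported in the abstract for SUM190 and SUM149.
erbb2_ratio = transcript_ratio(400, 14.4)
print(f"ERBB2 SUM190/SUM149 ratio: {erbb2_ratio:.1f}")
```

Because RPKM normalizes for both transcript length and sequencing depth, ratios such as this one can be compared across cell lines sequenced to different depths.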


Subject(s)
Breast Neoplasms/genetics , ErbB Receptors/genetics , Gene Expression Regulation, Neoplastic , Neoplasm Proteins/genetics , RNA, Messenger/genetics , Receptor, ErbB-2/genetics , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Cell Line, Tumor , ErbB Receptors/metabolism , Female , Gene Expression Profiling , Genome-Wide Association Study , Humans , Inflammation , Molecular Sequence Annotation , Neoplasm Proteins/metabolism , Proteomics , RNA, Messenger/metabolism , Receptor, ErbB-2/metabolism , Signal Transduction
12.
J Proteome Res ; 12(1): 45-57, 2013 Jan 04.
Article in English | MEDLINE | ID: mdl-23259914

ABSTRACT

We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins will guide the selection of appropriate samples for discovery studies as well as antibody reagents. We have also illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused on the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.


Subject(s)
Chromosomes, Human, Pair 17 , Genome, Human , Proteins , Proteomics , Amino Acid Sequence , Chromosomes, Human, Pair 17/genetics , Chromosomes, Human, Pair 17/metabolism , Databases, Protein , Gene Expression , Human Genome Project , Humans , Proteins/classification , Proteins/genetics , Proteins/metabolism
13.
Proc Natl Acad Sci U S A ; 106(33): 13785-90, 2009 Aug 18.
Article in English | MEDLINE | ID: mdl-19666589

ABSTRACT

Acetylation is a well-studied posttranslational modification that has been associated with a broad spectrum of biological processes, notably gene regulation. Many studies have contributed to our knowledge of the enzymology underlying acetylation, including efforts to understand the molecular mechanism of substrate recognition by several acetyltransferases, but traditional experiments to determine intrinsic features of substrate site specificity have proven challenging. Here, we combine experimental methods with clustering analysis of protein sequences to predict protein acetylation based on the sequence characteristics of acetylated lysines within histones with our unique prediction tool PredMod. We define a local amino acid sequence composition that represents potential acetylation sites by implementing a clustering analysis of histone and nonhistone sequences. We show that this sequence composition has predictive power on 2 independent experimental datasets of acetylation marks. Finally, we detect acetylation for selected putative substrates using mass spectrometry, and report several nonhistone acetylated substrates in budding yeast. Our approach, combined with more traditional experimental methods, may be useful for identifying acetylated substrates proteome-wide.


Subject(s)
Acetylation , Proteomics/methods , Saccharomyces cerevisiae/genetics , Algorithms , Amino Acid Motifs , Amino Acid Sequence , Cluster Analysis , Computational Biology/methods , Histones/chemistry , Humans , Mass Spectrometry/methods , Molecular Sequence Data , Proteins/chemistry , Proteome , ROC Curve , Saccharomyces cerevisiae/physiology
14.
J Proteome Res ; 10(2): 656-68, 2011 Feb 04.
Article in English | MEDLINE | ID: mdl-21067242

ABSTRACT

Experiments to probe for protein-protein interactions are the focus of functional proteomic studies, thus proteomic data repositories are increasingly likely to contain a large cross-section of such information. Here, we use the Global Proteome Machine database (GPMDB), which is the largest curated and publicly available proteomic data repository derived from tandem mass spectrometry, to develop an in silico protein interaction analysis tool. Using a human histone protein for method development, we positively identified an interaction partner from each histone protein family that forms the histone octameric complex. Moreover, this method, applied to the α subunits of the human proteasome, identified all of the subunits in the 20S core particle. Furthermore, we applied this approach to human integrin αIIb and integrin β3, a major receptor involved in the activation of platelets. We identified 28 proteins, including a protein network for integrin and platelet activation. In addition, proteins interacting with integrin β1 obtained using this method were validated by comparing them to those identified in a formaldehyde-supported coimmunoprecipitation experiment, protein-protein interaction databases and the literature. Our results demonstrate that in silico protein interaction analysis is a novel tool for identifying known/candidate protein-protein interactions and proteins with shared functions in a protein network.


Subject(s)
Databases, Protein , Protein Interaction Mapping/methods , Proteome/analysis , Proteomics/methods , Computer Simulation , Humans , Proteome/metabolism , Reproducibility of Results , Tandem Mass Spectrometry
15.
Bioinformatics ; 22(22): 2830-2, 2006 Nov 15.
Article in English | MEDLINE | ID: mdl-16877754

ABSTRACT

MOTIVATION: Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity between peptide MS/MS spectra and a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on developing improved database scoring methods, we modified the software to allow users to redefine the scoring function and replace the native TANDEM scoring function while leaving the remaining core application intact. Redefinition is performed at run time so multiple scoring functions are available to be selected and applied from a single search engine binary. We introduce the implementation of the pluggable scoring algorithm and also provide implementations of two TANDEM compatible scoring functions, one previously described scoring function compatible with PeptideProphet and one very simple scoring function that quantitative researchers may use to begin their development. This extension builds on the open-source TANDEM project and will facilitate research into and dissemination of novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++. AVAILABILITY: Source code for the scoring functions is available from http://proteomics.fhcrc.org
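
The idea of selecting a scoring function at run time can be sketched as a small registry, shown below in Python. This is not the TANDEM plug-in interface (which the abstract says is written in C++); the registry, the two toy scoring functions, and the peak data are all hypothetical and serve only to illustrate the pattern of swapping the score while the rest of the search stays unchanged.

```python
from typing import Callable, Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)
ScoreFn = Callable[[List[Peak], List[float], float], float]

SCORERS: Dict[str, ScoreFn] = {}  # name -> scoring function, filled at import time


def register(name: str) -> Callable[[ScoreFn], ScoreFn]:
    """Decorator that makes a scoring function selectable by name."""
    def decorator(fn: ScoreFn) -> ScoreFn:
        SCORERS[name] = fn
        return fn
    return decorator


def matched_peaks(peaks: List[Peak], fragments: List[float], tol: float) -> List[Peak]:
    """Observed peaks lying within tol of any theoretical fragment m/z."""
    return [(mz, inten) for mz, inten in peaks
            if any(abs(mz - f) <= tol for f in fragments)]


@register("shared_peak_count")
def shared_peak_count(peaks: List[Peak], fragments: List[float], tol: float) -> float:
    return float(len(matched_peaks(peaks, fragments, tol)))


@register("matched_intensity")
def matched_intensity(peaks: List[Peak], fragments: List[float], tol: float) -> float:
    return sum(inten for _, inten in matched_peaks(peaks, fragments, tol))


def score(name: str, peaks: List[Peak], fragments: List[float], tol: float = 0.3) -> float:
    """Score a spectrum against theoretical fragments with the named function."""
    return SCORERS[name](peaks, fragments, tol)


spectrum = [(100.0, 10.0), (200.2, 5.0), (300.0, 1.0)]
theoretical = [100.1, 200.0, 400.0]
for scorer_name in SCORERS:
    print(scorer_name, score(scorer_name, spectrum, theoretical))
```

Keeping peak matching in a shared helper means a new score only has to define how matched peaks are combined, which is the part a researcher would typically want to vary.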


Subject(s)
Computational Biology/methods , Databases, Factual , Mass Spectrometry/methods , Algorithms , Amino Acid Sequence , Data Interpretation, Statistical , Databases, Protein , Information Storage and Retrieval , Peptides , Programming Languages , Proteomics , Software
16.
Methods Mol Biol ; 328: 217-28, 2006.
Article in English | MEDLINE | ID: mdl-16785652

ABSTRACT

This chapter describes the use of an open-source, freely available informatics system for the identification of proteins using tandem mass spectra of peptides derived from an enzymatic digest of a mixture of mature proteins. The chapter describes the use of features of the Global Proteome Machine (GPM) interface that assist in making comprehensive assignments between spectra and sequences, including the detection of point mutations, posttranslational modifications, and experimental artifacts. The use of this interface to validate results using the GPM Database is also described. This data repository allows analysts to compare their own results to those obtained by other scientists to determine the degree to which their data are consistent with previous measurements.


Subject(s)
Proteins/chemistry , Proteomics/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Animals , Computational Biology/methods , Humans , Molecular Sequence Data , Peptides/chemistry , Programming Languages , Sequence Analysis, Protein/instrumentation , Software
17.
Trends Biotechnol ; 20(12 Suppl): S35-8, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12570158

ABSTRACT

Proteomics has become dominated by large amounts of experimental data and interpreted results. This experimental data cannot be effectively used without understanding the fundamental structure of its information content and representing that information in such a way that knowledge can be extracted from it. This review explores the structure of this information with regard to three fundamental issues: the extraction of relevant information from raw data, the scale of the projects involved and the statistical significance of protein identification results.


Subject(s)
Computational Biology/methods , Computational Biology/trends , Proteome , Algorithms , Chromatography, High Pressure Liquid , Databases as Topic , Electrophoresis, Gel, Two-Dimensional , Mass Spectrometry , Software
19.
J Proteomics ; 72(5): 838-52, 2009 Jul 21.
Article in English | MEDLINE | ID: mdl-19121650

ABSTRACT

Multiple reaction monitoring (MRM), commonly employed for the mass spectrometric detection of small molecules, is rapidly gaining ground in proteomics. Its high sensitivity and specificity make this targeted approach particularly useful when sample throughput or proteome coverage limits global studies. Existing tools to design MRM assays rely exclusively on theoretical predictions, or combine them with previous observations on the same type of sample. The additional mass spectrometric experimentation this requires can pose significant demands on time and material. To overcome these challenges, a new MRM worksheet was introduced into the Global Proteome Machine database (GPMDB) that provided all of the information needed to design MRM transitions based solely on archived observations made by other researchers in previous experiments. This required replacing the precursor ion intensity by the number of peptide observations, which proved to be an adequate substitute if peptides did not occur in multiple forms. While the absence of collision energy information proved largely inconsequential, successful prediction of unique transitions depended on the type of fragment ion involved. The design of MRM assays for iTRAQ-labeled tryptic peptides obtained from human platelet proteins demonstrated the usefulness of the MRM worksheet also for quantitative applications. This workflow, which relies exclusively on experimental observations stored in data repositories, therefore represents an attractive alternative for the prediction of MRM transitions prior to experimental validation and optimization.


Subject(s)
Proteomics/instrumentation , Proteomics/methods , Blood Platelets/metabolism , Blood Proteins/chemistry , Computational Biology/methods , Databases, Protein , Humans , Ions , Mass Spectrometry/methods , Peptides/chemistry , Proteins/chemistry , Proteome
20.
Mass Spectrom Rev ; 27(1): 1-19, 2008.
Article in English | MEDLINE | ID: mdl-17979143

ABSTRACT

Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) has been successfully applied to elucidating biological questions through the analysis of proteins, peptides, and nucleic acids. Here, we review the different approaches for analyzing the data that are generated by MALDI-MS. The first step in the analysis is the processing of the raw data to find peaks that correspond to the analytes. The peaks are characterized by their areas (or heights) and their centroids. The peak area can be used as a measure of the quantity of the analyte, and the centroid can be used to determine the mass of the analyte. The masses are then compared to models of the analyte, and these models are ranked according to how well they fit the data and their significance is calculated. This allows the determination of the identity (sequence and modifications) of the analytes. We show how this general data analysis workflow is applied to protein and nucleic acid chemistry as well as proteomics.
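
The two peak descriptors named in this abstract, area and centroid, can be computed from sampled data as in the Python sketch below. It assumes a single, already isolated and baseline-corrected peak; the trapezoidal area and the intensity-weighted centroid are common choices rather than the specific methods of the review, and the sample values are hypothetical.

```python
from typing import Sequence


def peak_area(mz: Sequence[float], intensity: Sequence[float]) -> float:
    """Trapezoidal area under a sampled peak (a proxy for analyte quantity)."""
    if len(mz) != len(intensity) or len(mz) < 2:
        raise ValueError("need at least two (m/z, intensity) samples")
    return sum((mz[i + 1] - mz[i]) * (intensity[i] + intensity[i + 1]) / 2.0
               for i in range(len(mz) - 1))


def peak_centroid(mz: Sequence[float], intensity: Sequence[float]) -> float:
    """Intensity-weighted mean m/z of a sampled peak (an estimate of its mass)."""
    total = sum(intensity)
    if len(mz) != len(intensity) or total <= 0:
        raise ValueError("need matching samples with positive total intensity")
    return sum(m * i for m, i in zip(mz, intensity)) / total


# Hypothetical symmetric peak sampled at three points around m/z 1000.
mz_values = [999.9, 1000.0, 1000.1]
intensities = [1.0, 2.0, 1.0]
print(f"area: {peak_area(mz_values, intensities):.3f}")
print(f"centroid: {peak_centroid(mz_values, intensities):.3f}")
```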


Subject(s)
Complex Mixtures/analysis , Computational Biology/trends , Nucleic Acids/analysis , Peptide Mapping/methods , Proteomics/trends , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/trends