Search | BVS Colombia Search Portal

1.

Beyond protein expression, MOPED goes multi-omics.

Montague, Elizabeth; Janko, Imre; Stanberry, Larissa; Lee, Elaine; Choiniere, John; Anderson, Nathaniel; Stewart, Elizabeth; Broomall, William; Higdon, Roger; Kolker, Natali; Kolker, Eugene.

Nucleic Acids Res ; 43(Database issue): D1145-51, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25404128

ABSTRACT

MOPED (Multi-Omics Profiling Expression Database; http://moped.proteinspire.org) has transitioned from solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface, MOPED presents consistently processed data for gene, protein and pathway expression. To improve data quality, consistency and use, MOPED includes metadata detailing experimental design and analysis methods. The multi-omics data are integrated through direct links between genes and proteins and further connected to pathways and experiments. MOPED now contains over 5 million records, information for approximately 75,000 genes and 50,000 proteins from four organisms (human, mouse, worm, yeast). These records correspond to 670 unique combinations of experiment, condition, localization and tissue. MOPED includes the following new features: pathway expression, Pathway Details pages, experimental metadata checklists, experiment summary statistics and more advanced searching tools. Advanced searching enables querying for genes, proteins, experiments, pathways and keywords of interest. The system is enhanced with visualizations for comparing across different data types. In the future MOPED will expand the number of organisms, increase integration with pathways and provide connections to disease.

Subject(s)

Databases, Genetic , Gene Expression Profiling , Proteomics , Animals , Humans , Internet , Mice , Proteins/genetics , Proteins/metabolism

2.

Can "normal" protein expression ranges be estimated with high-throughput proteomics?

Higdon, Roger; Kolker, Eugene.

J Proteome Res ; 14(6): 2398-407, 2015 Jun 05.

Article in English | MEDLINE | ID: mdl-25877823

ABSTRACT

Although biological science discovery often involves comparing conditions to a normal state, in proteomics little is actually known about normal. Two Human Proteome studies featured in Nature offer new insights into protein expression and an opportunity to assess how high-throughput proteomics measures normal protein ranges. We use data from these studies to estimate technical and biological variability in protein expression and compare them to other expression data sets from normal tissue. Results show that measured protein expression across same-tissue replicates varies by ±4- to 10-fold for most proteins. Coefficients of variation (CV) for protein expression measurements range from 62% to 117% across different tissue experiments; however, adjusting for technical variation reduced this variability by as much as 50%. In addition, the CV could also be reduced by limiting comparisons to proteins with at least 3 unique peptide identifications, as the CV was on average 33% lower than for proteins with 2 or fewer peptide identifications. We also selected 13 housekeeping proteins and genes that were expressed across all tissues with low variability to determine their utility as a reference set for normalization and comparative purposes. These results present the first step toward estimating normal protein ranges by determining the variability in expression measurements through combining publicly available data. They support an approach that combines standard protocols with replicates of normal tissues to estimate normal protein ranges for large numbers of proteins and tissues. This would be a tremendous resource for normal cellular physiology and comparisons of proteomics studies.
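
As a minimal sketch of the variability measure named in this abstract, the coefficient of variation (CV) is the sample standard deviation divided by the mean, expressed here as a percentage. The function name and the replicate values are illustrative assumptions, not data or code from the study.

```python
import statistics

def coefficient_of_variation(values):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean

# Hypothetical expression measurements for one protein across replicates.
replicates = [10.0, 20.0, 30.0]
cv_percent = coefficient_of_variation(replicates)
```

Here the sample standard deviation is 10 and the mean is 20, so the CV is 50%.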

Subject(s)

High-Throughput Screening Assays , Proteins/metabolism , Proteomics , Humans , Reference Values , Reproducibility of Results

3.

OMICS studies: How about metadata checklist and data publications?

Kolker, Eugene; Stewart, Elizabeth.

J Proteome Res ; 13(3): 1783-4, 2014 Mar 07.

Article in English | MEDLINE | ID: mdl-24494788

ABSTRACT

Data fully utilized by the community resources promote progress rather than repetition. Effective data sharing can accelerate the transition from data to actionable knowledge, yet barriers to data sharing remain, both technological and procedural. The DELSA community has tackled the sharing barrier by creating a multi-omics metadata checklist for the life sciences. The checklist and associated data publication examples are now jointly published in Big Data and OMICS: A Journal of Integrative Biology. The checklist will enable diverse datasets to be easily harmonized and reused for richer analyses. It will facilitate data deposits, stand alone as a data publication, and grant appropriate credit to researchers. We invite the broader life sciences community to test the checklist for feedback and improvements.

Subject(s)

Checklist/statistics & numerical data , Computational Biology/organization & administration , Information Dissemination , Humans , Publishing/organization & administration

4.

MOPED enables discoveries through consistently processed proteomics data.

Higdon, Roger; Stewart, Elizabeth; Stanberry, Larissa; Haynes, Winston; Choiniere, John; Montague, Elizabeth; Anderson, Nathaniel; Yandl, Gregory; Janko, Imre; Broomall, William; Fishilevich, Simon; Lancet, Doron; Kolker, Natali; Kolker, Eugene.

J Proteome Res ; 13(1): 107-13, 2014 Jan 03.

Article in English | MEDLINE | ID: mdl-24350770

ABSTRACT

The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is an expanding proteomics resource to enable biological and biomedical discoveries. MOPED aggregates simple, standardized and consistently processed summaries of protein expression and metadata from proteomics (mass spectrometry) experiments from human and model organisms (mouse, worm, and yeast). The latest version of MOPED adds new estimates of protein abundance and concentration as well as relative (differential) expression data. MOPED provides a new updated query interface that allows users to explore information by organism, tissue, localization, condition, experiment, or keyword. MOPED supports the Human Proteome Project's efforts to generate chromosome- and disease-specific proteomes by providing links from proteins to chromosome and disease information as well as many complementary resources. MOPED supports a new omics metadata checklist to harmonize data integration, analysis, and use. MOPED's development is driven by the user community, which spans 90 countries and guides future development that will transform MOPED into a multiomics resource. MOPED encourages users to submit data in a simple format. They can use the metadata checklist to generate a data publication for this submission. As a result, MOPED will provide even greater insights into complex biological processes and systems and enable deeper and more comprehensive biological and biomedical discoveries.

Subject(s)

Databases, Protein , Proteomics , Animals , Humans , User-Computer Interface

5.

Differential expression analysis for pathways.

Haynes, Winston A; Higdon, Roger; Stanberry, Larissa; Collins, Dwayne; Kolker, Eugene.

PLoS Comput Biol ; 9(3): e1002967, 2013.

Article in English | MEDLINE | ID: mdl-23516350

ABSTRACT

Life science technologies generate a deluge of data that hold the keys to unlocking the secrets of important biological functions and disease mechanisms. We present DEAP, Differential Expression Analysis for Pathways, which capitalizes on information about biological pathways to identify important regulatory patterns from differential expression data. DEAP makes significant improvements over existing approaches by including information about pathway structure and discovering the most differentially expressed portion of the pathway. On simulated data, DEAP significantly outperformed traditional methods: with high differential expression, DEAP increased power by two orders of magnitude; with very low differential expression, DEAP doubled the power. DEAP performance was illustrated on two different gene and protein expression studies. DEAP discovered fourteen important pathways related to chronic obstructive pulmonary disease and interferon treatment that existing approaches omitted. On the interferon study, DEAP guided focus towards a four protein path within the 26 protein Notch signalling pathway.

Subject(s)

Computational Biology/methods , Gene Expression Profiling/methods , Models, Biological , Signal Transduction , Algorithms , Computer Simulation , Databases, Genetic , Disease/genetics , Humans , Reproducibility of Results

6.

Reproducibility: In praise of open research measures.

Kolker, Eugene; Altintas, Ilkay; Bourne, Philip; Faris, Jack; Fox, Geoffrey; Frishman, Dmitrij; Geraci, Christy; Hancock, William; Lin, Biaoyang; Lancet, Doron; Lisitsa, Andrey; Knight, Rob; Martens, Lennart; Mesirov, Jill; Özdemir, Vural; Schultes, Erik; Smith, Todd; Snyder, Michael; Srivastava, Sanjeeva; Toppo, Stefano; Wilmes, Paul.

Nature ; 498(7453): 170, 2013 Jun 13.

Article in English | MEDLINE | ID: mdl-23765483

Subject(s)

Access to Information , Biological Science Disciplines/standards , Periodicals as Topic/standards , Research/standards , Reproducibility of Results

7.

MOPED: Model Organism Protein Expression Database.

Kolker, Eugene; Higdon, Roger; Haynes, Winston; Welch, Dean; Broomall, William; Lancet, Doron; Stanberry, Larissa; Kolker, Natali.

Nucleic Acids Res ; 40(Database issue): D1093-9, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22139914

ABSTRACT

Large numbers of mass spectrometry proteomics studies are being conducted to understand all types of biological processes. The size and complexity of proteomics data hinder efforts to easily share, integrate, query and compare the studies. The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is a new and expanding proteomics resource that enables rapid browsing of protein expression information from publicly available studies on humans and model organisms. MOPED is designed to simplify the comparison and sharing of proteomics data for the greater research community. MOPED uniquely provides protein level expression data, meta-analysis capabilities and quantitative data from standardized analysis. Data can be queried for specific proteins, browsed based on organism, tissue, localization and condition and sorted by false discovery rate and expression. MOPED empowers users to visualize their own expression data and compare it with existing studies. Further, MOPED links to various protein and pathway databases, including GeneCards, Entrez, UniProt, KEGG and Reactome. The current version of MOPED contains over 43,000 proteins with at least one spectral match and more than 11 million high certainty spectra.

Subject(s)

Databases, Protein , Proteins/metabolism , Animals , Humans , Mass Spectrometry , Mice , Models, Animal , Proteomics , User-Computer Interface

8.

In-silico human genomics with GeneCards.

Stelzer, Gil; Dalah, Irina; Stein, Tsippi Iny; Satanower, Yigeal; Rosen, Naomi; Nativ, Noam; Oz-Levi, Danit; Olender, Tsviya; Belinky, Frida; Bahir, Iris; Krug, Hagit; Perco, Paul; Mayer, Bernd; Kolker, Eugene; Safran, Marilyn; Lancet, Doron.

Hum Genomics ; 5(6): 709-17, 2011 Oct.

Article in English | MEDLINE | ID: mdl-22155609

ABSTRACT

Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help to introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on inhouse and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools.

Subject(s)

Databases, Genetic , Genes/genetics , Genome, Human , Genomics , Computational Biology , Humans

9.

The necessity of adjusting tests of protein category enrichment in discovery proteomics.

Louie, Brenton; Higdon, Roger; Kolker, Eugene.

Bioinformatics ; 26(24): 3007-11, 2010 Dec 15.

Article in English | MEDLINE | ID: mdl-21068002

ABSTRACT

MOTIVATION: Enrichment tests are used in high-throughput experimentation to measure the association between gene or protein expression and membership in groups or pathways. The Fisher's exact test is commonly used. We specifically examined the associations produced by the Fisher test between protein identification by mass spectrometry discovery proteomics, and their Gene Ontology (GO) term assignments in a large yeast dataset. We found that direct application of the Fisher test is misleading in proteomics due to the bias in mass spectrometry to preferentially identify proteins based on their biochemical properties. False inference about associations can be made if this bias is not corrected. Our method adjusts Fisher tests for these biases and produces associations more directly attributable to protein expression rather than experimental bias. RESULTS: Using logistic regression, we modeled the association between protein identification and GO term assignments while adjusting for identification bias in mass spectrometry. The model accounts for five biochemical properties of peptides: (i) hydrophobicity, (ii) molecular weight, (iii) transfer energy, (iv) beta turn frequency and (v) isoelectric point. The model was fit on 181 060 peptides from 2678 proteins identified in 24 yeast proteomics datasets with a 1% false discovery rate. In analyzing the association between protein identification and their GO term assignments, we found that 25% (134 out of 544) of Fisher tests that showed significant association (q-value ≤0.05) were non-significant after adjustment using our model. Simulations generating yeast protein sets enriched for identification propensity show that unadjusted enrichment tests were biased while our approach worked well.
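
As a rough illustration of the unadjusted enrichment test that this abstract takes as its starting point, the sketch below computes a one-sided Fisher's exact test p-value from the hypergeometric tail. It does not implement the authors' logistic-regression adjustment; the function name, argument names and example counts are assumptions made for illustration.

```python
from math import comb

def fisher_enrichment_pvalue(k, n, K, N):
    """One-sided Fisher's exact test p-value for enrichment.

    N: total number of proteins in the background
    K: background proteins annotated with the term
    n: identified proteins
    k: identified proteins annotated with the term
    Returns P(X >= k) for X hypergeometric with parameters (N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible tables contribute nothing to the tail sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: 10 proteins, 5 annotated, 5 identified, all 5 annotated.
p_value = fisher_enrichment_pvalue(k=5, n=5, K=5, N=10)
```

With these counts only one of the 252 possible selections is as extreme as the observed one, so the p-value is 1/252.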

Subject(s)

Mass Spectrometry/methods , Proteins/classification , Proteomics/methods , Fungal Proteins/chemistry , Fungal Proteins/metabolism , Hydrophobic and Hydrophilic Interactions , Logistic Models , Peptides/chemistry , Proteins/chemistry , Proteins/genetics

10.

Estimating false discovery rates for peptide and protein identification using randomized databases.

Hather, Gregory; Higdon, Roger; Bauman, Andrew; von Haller, Priska D; Kolker, Eugene.

Proteomics ; 10(12): 2369-76, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20391536

ABSTRACT

MS-based proteomics characterizes protein contents of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching together the protein sequence database and its randomized version and comparing the score distributions of the randomized versus nonrandomized matches. This work introduces a straightforward isotonic regression-based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as other methods used for comparison, but also has the advantages of being: (i) monotonic in the score, (ii) computationally simple, and (iii) not dependent on assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification using summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two-peptide rule. Finally, by estimating both the FDRs and LFDRs, we showed for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).
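
The idea of estimating FDR as a function of score from randomized-database matches can be sketched with a simple counting estimate: at a given score threshold, divide the number of randomized (decoy) matches by the number of nonrandomized (target) matches at or above that threshold. This is only the basic counting approach, not the isotonic regression method the abstract introduces; the function name and the scores are hypothetical.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the cumulative FDR at a score threshold.

    The estimate is (# decoy matches >= threshold) divided by
    (# target matches >= threshold), capped at 1.0.
    """
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, decoys / targets)

# Hypothetical match scores from a combined target + randomized search.
target_scores = [5.0, 4.0, 3.0, 2.0, 1.0]
decoy_scores = [3.0, 1.0]

fdr_strict = decoy_fdr(target_scores, decoy_scores, threshold=3.0)
fdr_loose = decoy_fdr(target_scores, decoy_scores, threshold=1.0)
```

At a threshold of 3.0 the estimate is 1/3, and at 1.0 it is 2/5. A raw counting estimate like this one is not guaranteed to be monotonic in the score, which is one of the properties the abstract lists as an advantage of the isotonic approach.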

Subject(s)

Computational Biology , Databases, Protein , Peptides/analysis , Proteins/analysis

11.

A note on the false discovery rate and inconsistent comparisons between experiments.

Higdon, Roger; van Belle, Gerald; Kolker, Eugene.

Bioinformatics ; 24(10): 1225-8, 2008 May 15.

Article in English | MEDLINE | ID: mdl-18424815

ABSTRACT

MOTIVATION: The false discovery rate (FDR) has been widely adopted to address the multiple comparisons issue in high-throughput experiments such as microarray gene-expression studies. However, while the FDR is quite useful as an approach to limit false discoveries within a single experiment, like other multiple comparison corrections it may be an inappropriate way to compare results across experiments. This article uses several examples based on gene-expression data to demonstrate the potential misinterpretations that can arise from using FDR to compare across experiments. Researchers should be aware of these pitfalls and wary of using FDR to compare experimental results. FDR should be augmented with other measures such as p-values and expression ratios. It is worth including standard error and variance information for meta-analyses and, if possible, the raw data for re-analyses. This is especially important for high-throughput studies because data are often re-used for different objectives, including comparing common elements across many experiments. No single error rate or data summary may be appropriate for all of the different objectives.
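
One way to see why FDR values can mislead across experiments is that an FDR-adjusted value depends on all the other p-values in the same experiment. The sketch below assumes the Benjamini-Hochberg procedure, which the abstract does not name, and uses made-up p-values: the same raw p-value of 0.01 receives a different adjusted value in each hypothetical experiment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical experiments that share the same first p-value (0.01).
experiment_a = [0.01, 0.011, 0.012, 0.013]  # many small p-values
experiment_b = [0.01, 0.5, 0.6, 0.9]        # one small p-value

q_a = benjamini_hochberg(experiment_a)
q_b = benjamini_hochberg(experiment_b)
```

In this toy case the shared p-value of 0.01 is adjusted to 0.013 in the first experiment and to 0.04 in the second, even though the underlying evidence for that one comparison is identical.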

Subject(s)

Algorithms , Artifacts , Data Interpretation, Statistical , False Positive Reactions , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Reproducibility of Results , Sensitivity and Specificity

12.

Big data and ethics review for health systems research in LMICs: understanding risk, uncertainty and ignorance -- and catching the black swans?

Dereli, Türkay; Coskun, Yavuz; Kolker, Eugene; Güner, Oner; Agirbasli, Mehmet; Ozdemir, Vural.

Am J Bioeth ; 14(2): 48-50, 2014.

Article in English | MEDLINE | ID: mdl-24521341

Subject(s)

Developing Countries , Health Services Research/ethics , Informed Consent/ethics , Research Subjects , Humans

13.

Host airway proteins interact with Staphylococcus aureus during early pneumonia.

Ventura, Christy L; Higdon, Roger; Kolker, Eugene; Skerrett, Shawn J; Rubens, Craig E.

Infect Immun ; 76(3): 888-98, 2008 Mar.

Article in English | MEDLINE | ID: mdl-18195024

ABSTRACT

Staphylococcus aureus is a major cause of hospital-acquired pneumonia and is emerging as an important etiological agent of community-acquired pneumonia. Little is known about the specific host-pathogen interactions that occur when S. aureus first enters the airway. A shotgun proteomics approach was utilized to identify the airway proteins associated with S. aureus during the first 6 h of infection. Host proteins eluted from bacteria recovered from the airways of mice 30 min or 6 h following intranasal inoculation under anesthesia were subjected to liquid chromatography and tandem mass spectrometry. A total of 513 host proteins were associated with S. aureus 30 min and/or 6 h postinoculation. A majority of the identified proteins were host cytosolic proteins, suggesting that S. aureus was rapidly internalized by phagocytes in the airway and that significant host cell lysis occurred during early infection. In addition, extracellular matrix and secreted proteins, including fibronectin, antimicrobial peptides, and complement components, were associated with S. aureus at both time points. The interaction of 12 host proteins shown to bind to S. aureus in vitro was demonstrated in vivo for the first time. The association of hemoglobin, which is thought to be the primary staphylococcal iron source during infection, with S. aureus in the airway was validated by immunoblotting. Thus, we used our recently developed S. aureus pneumonia model and shotgun proteomics to validate previous in vitro findings and to identify nearly 500 other proteins that interact with S. aureus in vivo. The data presented here provide novel insights into the host-pathogen interactions that occur when S. aureus enters the airway.

Subject(s)

Host-Pathogen Interactions , Pneumonia/microbiology , Proteins/isolation & purification , Staphylococcal Infections/microbiology , Staphylococcus aureus/chemistry , Animals , Bronchoalveolar Lavage Fluid/chemistry , Bronchoalveolar Lavage Fluid/microbiology , Chromatography, Liquid , Female , Humans , Immunoblotting , Male , Mice , Mice, Inbred C57BL , Protein Binding , Proteins/chemistry , Proteome/analysis , Proteome/isolation & purification , Tandem Mass Spectrometry

14.

Staphylococcus aureus elicits marked alterations in the airway proteome during early pneumonia.

Ventura, Christy L; Higdon, Roger; Hohmann, Laura; Martin, Daniel; Kolker, Eugene; Liggitt, H Denny; Skerrett, Shawn J; Rubens, Craig E.

Infect Immun ; 76(12): 5862-72, 2008 Dec.

Article in English | MEDLINE | ID: mdl-18852243

ABSTRACT

Pneumonia caused by Staphylococcus aureus is a growing concern in the health care community. We hypothesized that characterization of the early innate immune response to bacteria in the lungs would provide insight into the mechanisms used by the host to protect itself from infection. An adult mouse model of Staphylococcus aureus pneumonia was utilized to define the early events in the innate immune response and to assess the changes in the airway proteome during the first 6 h of pneumonia. S. aureus actively replicated in the lungs of mice inoculated intranasally under anesthesia to cause significant morbidity and mortality. By 6 h postinoculation, the release of proinflammatory cytokines caused effective recruitment of neutrophils to the airway. Neutrophil influx, loss of alveolar architecture, and consolidated pneumonia were observed histologically 6 h postinoculation. Bronchoalveolar lavage fluids from mice inoculated with phosphate-buffered saline (PBS) or S. aureus were depleted of overabundant proteins and subjected to strong cation exchange fractionation followed by liquid chromatography and tandem mass spectrometry to identify the proteins present in the airway. No significant changes in response to PBS inoculation or 30 min following S. aureus inoculation were observed. However, a dramatic increase in extracellular proteins was observed 6 h postinoculation with S. aureus, with the increase dominated by inflammatory and coagulation proteins. The data presented here provide a comprehensive evaluation of the rapid and vigorous innate immune response mounted in the host airway during the earliest stages of S. aureus pneumonia.

Subject(s)

Pneumonia, Staphylococcal/immunology , Proteome/immunology , Staphylococcal Infections/immunology , Animals , Blotting, Western , Bronchoalveolar Lavage Fluid/chemistry , Bronchoalveolar Lavage Fluid/cytology , Chromatography, Liquid , Cytokines/analysis , Cytokines/immunology , Female , Lung/microbiology , Lung/pathology , Male , Mice , Mice, Inbred C57BL , Neutrophil Infiltration/immunology , Pneumonia, Staphylococcal/microbiology , Pneumonia, Staphylococcal/pathology , Staphylococcal Infections/pathology , Staphylococcus aureus

15.

Toward a standards-compliant genomic and metagenomic publication record.

Garrity, George M; Field, Dawn; Kyrpides, Nikos; Hirschman, Lynette; Sansone, Susanna-Assunta; Angiuoli, Samuel; Cole, James R; Glöckner, Frank Oliver; Kolker, Eugene; Kowalchuk, George; Moran, Mary Ann; Ussery, Dave; White, Owen.

OMICS ; 12(2): 157-60, 2008 Jun.

Article in English | MEDLINE | ID: mdl-18564916

ABSTRACT

Increasingly, we are aware as a community of the growing need to manage the avalanche of genomic and metagenomic data, in addition to related data types like ribosomal RNA and barcode sequences, in a way that tightly integrates contextual data with traditional literature in a machine-readable way. It is for this reason that the Genomic Standards Consortium (GSC) formed in 2005. Here we suggest that we move beyond the development of standards and tackle standards compliance and improved data capture at the level of the scientific publication. We are supported in this goal by the fact that the scientific community is in the midst of a publishing revolution. This revolution is marked by a growing shift away from a traditional dichotomy between "journal articles" and "database entries" and an increasing adoption of hybrid models of collecting and disseminating scientific information. With respect to genomes and metagenomes and related data types, we feel the scientific community would be best served by the immediate launch of a central repository of short, highly structured "Genome Notes" that must be standards compliant. This could be done in the context of an existing journal, but we also suggest the more radical solution of launching a new journal. Such a journal could be designed to cater to a wide range of standards-related content types that are not currently centralized in the published literature. It could also support the demand for centralizing aspects of the "gray literature" (documents developed by institutions or communities) such as the call by the GSC for a central repository of Standard Operating Procedures describing the genomic annotation pipelines of the major sequencing centers. We argue that such an "eJournal," published under the Open Access paradigm by the GSC, could be an attractive publishing forum for a broader range of standardization initiatives within, and beyond, the GSC and thereby fill an unoccupied yet increasingly important niche within the current research landscape.

Subject(s)

Genomics/standards , Guideline Adherence , Publications

16.

Meeting report: the fourth Genomic Standards Consortium (GSC) workshop.

Field, Dawn; Glöckner, Frank Oliver; Garrity, George M; Gray, Tanya; Sterk, Peter; Cochrane, Guy; Vaughan, Robert; Kolker, Eugene; Kottmann, Renzo; Kyrpides, Nikos; Angiuoli, Sam; Dawyndt, Peter; Guralnick, Robert; Goldstein, Philip; Hall, Neil; Hirschman, Lynette; Kravitz, Saul; Lister, Allyson L; Markowitz, Victor; Thomson, Nick; Whetzel, Trish.

OMICS ; 12(2): 101-8, 2008 Jun.

Article in English | MEDLINE | ID: mdl-18564914

ABSTRACT

This meeting report summarizes the proceedings of the "eGenomics: Cataloguing our Complete Genome Collection IV" workshop held June 6-8, 2007, at the National Institute for Environmental eScience (NIEeS), Cambridge, United Kingdom. This fourth workshop of the Genomic Standards Consortium (GSC) was a mix of short presentations, strategy discussions, and technical sessions. Speakers provided progress reports on the development of the "Minimum Information about a Genome Sequence" (MIGS) specification and the closely integrated "Minimum Information about a Metagenome Sequence" (MIMS) specification. The key outcome of the workshop was consensus on the next version of the MIGS/MIMS specification (v1.2). This drove further definition and restructuring of the MIGS/MIMS XML schema (syntax). With respect to semantics, a term vetting group was established to ensure that terms are properly defined and submitted to the appropriate ontology projects. Perhaps the single most important outcome of the workshop was a proposal to move beyond the concept of "minimum" to create a far richer XML schema that would define a "Genomic Contextual Data Markup Language" (GCDML) suitable for wider semantic integration across databases. GCDML will contain not only curated information (e.g., compliant with MIGS/MIMS), but also be extended to include a variety of data processing and calculations. Further information about the Genomic Standards Consortium and its range of activities can be found at http://gensc.org.

Subject(s)

Databases, Genetic , Genomics , Education , Programming Languages , Reference Standards

17.

A predictive model for identifying proteins by a single peptide match.

Higdon, Roger; Kolker, Eugene.

Bioinformatics ; 23(3): 277-80, 2007 Feb 01.

Article in English | MEDLINE | ID: mdl-17121779

ABSTRACT

MOTIVATION: Tandem mass-spectrometry of trypsin digests, followed by database searching, is one of the most popular approaches in high-throughput proteomics studies. Peptides are considered identified if they pass certain scoring thresholds. To avoid false positive protein identification, ≥2 unique peptides identified within a single protein are generally recommended. Still, in a typical high-throughput experiment, hundreds of proteins are identified only by a single peptide. We introduce here a method for distinguishing between true and false identifications among single-hit proteins. The approach is based on randomized database searching and usage of logistic regression models with cross-validation. This approach is implemented to analyze three bacterial samples, enabling recovery of 68-98% of the correct single-hit proteins with an error rate of <2%. This results in a 22-65% increase in the number of identified proteins. Identifying true single-hit proteins will lead to discovering many crucial regulators, biomarkers and other low abundance proteins. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Databases, Protein , Information Storage and Retrieval/methods , Mass Spectrometry/methods , Peptide Mapping/methods , Proteins/analysis , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Computer Simulation , Database Management Systems , Logistic Models , Models, Chemical , Models, Molecular , Molecular Sequence Data , Pattern Recognition, Automated , Proteins/chemistry , Regression Analysis

18.

Protein identification and expression analysis using mass spectrometry.

Kolker, Eugene; Higdon, Roger; Hogan, Jason M.

Trends Microbiol ; 14(5): 229-35, 2006 May.

Article in English | MEDLINE | ID: mdl-16603360

ABSTRACT

The identification and quantification of the proteins that a whole organism expresses under certain conditions is a main focus of high-throughput proteomics. Advanced proteomics approaches generate new biologically relevant data and potent hypotheses. A practical report of what proteome studies can and cannot accomplish in common laboratory settings is presented here. The review discusses the most popular tandem mass-spectrometry-based methods and focuses on how to produce reliable results. A step-by-step description of proteome experiments is given, including sample preparation, digestion, labeling, liquid chromatography, data processing, database searching and statistical analysis. The difficulties and bottlenecks of proteome analysis are addressed and the requirements for further improvements are discussed. Several diverse high-throughput proteomics-based studies of microorganisms are described.

Subject(s)

Proteins/analysis; Proteomics/methods; Spectrometry, Mass, Electrospray Ionization/methods; Amino Acid Sequence; Data Interpretation, Statistical; Molecular Sequence Data

19.

Experiment-specific estimation of peptide identification probabilities using a randomized database.

Higdon, Roger; Hogan, Jason M; Kolker, Natali; van Belle, Gerald; Kolker, Eugene.

OMICS ; 11(4): 351-65, 2007.

Article in English | MEDLINE | ID: mdl-18092908

ABSTRACT

Determining the error rate for peptide and protein identification accurately and reliably is necessary to enable evaluation and cross-comparisons of high-throughput proteomics experiments. Currently, peptide identification is based either on preset scoring thresholds or on probabilistic models trained on datasets that are often dissimilar to experimental results. The false discovery rates (FDR) and peptide identification probabilities for these preset thresholds or models often vary greatly across different experimental treatments, organisms, or instruments used in specific experiments. To overcome these difficulties, randomized databases have been used to estimate the FDR. However, the cumulative FDR may include low probability identifications when there are a large number of peptide identifications and exclude high probability identifications when there are few. To overcome this logical inconsistency, this study expands the use of randomized databases to generate experiment-specific estimates of peptide identification probabilities. These experiment-specific probabilities are generated by logistic and Loess regression models of the peptide scores obtained from original and reshuffled database matches. These experiment-specific probabilities are shown to approximate "true" probabilities very well, based on known standard protein mixtures across different experiments. Probabilities generated by the earlier Peptide_Prophet and more recent LIPS models are shown to differ significantly from this study's experiment-specific probabilities, especially for unknown samples. The experiment-specific probabilities reliably estimate the accuracy of peptide identifications and overcome potential logical inconsistencies of the cumulative FDR. This estimation method is demonstrated using a Sequest database search, LIPS model, and a reshuffled database. However, this approach is generally applicable to any search algorithm, peptide scoring, and statistical model when using a randomized database.
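
The contrast this abstract draws between the cumulative FDR and a score-specific identification probability can be illustrated with a minimal Python sketch. It is not the method used in the study: a simple count within a score window stands in for the logistic and Loess regression models, and the function names and score values are hypothetical.

```python
def cumulative_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring >= threshold.

    Assumes matches to the reshuffled (decoy) database are about as
    frequent as false matches to the original database, so the decoy
    count above the threshold estimates the number of false targets.
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)


def local_probability(target_scores, decoy_scores, lo, hi):
    """Estimate P(correct) for target matches with lo <= score < hi.

    A crude windowed stand-in for a regression model of the scores.
    """
    n_target = sum(lo <= s < hi for s in target_scores)
    n_decoy = sum(lo <= s < hi for s in decoy_scores)
    if n_target == 0:
        return None
    return max(0.0, 1.0 - n_decoy / n_target)


# Hypothetical scores: many confident target matches plus a few weak ones.
weak_targets = [1.0, 1.2, 1.4, 1.6]
strong_targets = [3.0 + 0.05 * i for i in range(56)]
target = weak_targets + strong_targets
decoy = [1.1, 1.3, 1.5]  # matches to the reshuffled database

fdr = cumulative_fdr(target, decoy, threshold=1.0)       # whole accepted set
p_weak = local_probability(target, decoy, 1.0, 2.0)      # weak matches only
p_strong = local_probability(target, decoy, 3.0, 6.0)    # strong matches only
```

With these illustrative numbers the accepted set as a whole has a low cumulative FDR, yet the weak matches inside it have a low individual probability of being correct. This is the kind of inconsistency that score-specific probabilities are meant to expose.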

Subject(s)

Databases, Protein; Peptides/chemistry; Algorithms; Models, Biological; Probability; Random Allocation; Regression Analysis; Software

20.

New metrics for comparative genomics.

Galperin, Michael Y; Kolker, Eugene.

Curr Opin Biotechnol ; 17(5): 440-7, 2006 Oct.

Article in English | MEDLINE | ID: mdl-16978854

ABSTRACT

The availability of genome sequences from a variety of organisms presents an opportunity to apply this sequence information to solving the key problems of molecular biology. One of the principal roadblocks on this path is the lack of appropriate descriptors and metrics that could succinctly represent the new knowledge stemming from the genomic data. Several new metrics have recently been used in comparative genome analysis, yet challenges remain in finding an appropriate language for the emerging discipline of systems biology.

Subject(s)

Computational Biology/methods; Genomics/methods; Animals; Computational Biology/standards; Genome/genetics; Genomics/standards; Humans; Phylogeny; Proteomics/methods; Proteomics/standards; Systems Biology/methods; Systems Biology/standards
