1.
Proteomes ; 5(1), 2017 Feb 03.
Article in English | MEDLINE | ID: mdl-28248256

ABSTRACT

Medulloblastoma (MB) is the most common malignant pediatric brain tumor. Patient survival has remained largely the same for the past 20 years, with therapies causing significant health, cognitive, behavioral and developmental complications for those who survive the tumor. In this study, we profiled the total transcriptome and proteome of two established MB cell lines, Daoy and UW228, using high-throughput RNA sequencing (RNA-Seq) and label-free nano-LC-MS/MS-based quantitative proteomics, coupled with advanced pathway analysis. While Daoy has been suggested to belong to the sonic hedgehog (SHH) subtype, the exact UW228 subtype is not yet clearly established. Thus, a goal of this study was to identify protein markers and pathways that would help elucidate their subtype classification. A number of differentially expressed genes and proteins, including several adhesion, cytoskeletal and signaling molecules, were observed between the two cell lines. While several cancer-associated genes/proteins exhibited similar expression across the two cell lines, upregulation of a number of signature proteins and enrichment of key components of SHH and WNT signaling pathways were uniquely observed in Daoy and UW228, respectively. The novel information on differentially expressed genes/proteins and enriched pathways provides insights into the biology of MB, which could help elucidate their subtype classification.

2.
OMICS ; 19(4): 197-208, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25831060

ABSTRACT

Complex diseases are caused by a combination of genetic and environmental factors, creating a difficult challenge for diagnosis and defining subtypes. This review article describes how distinct disease subtypes can be identified through integration and analysis of clinical and multi-omics data. A broad shift toward molecular subtyping of disease using genetic and omics data has yielded successful results in cancer and other complex diseases. To determine molecular subtypes, patients are first classified by applying clustering methods to different types of omics data, then these results are integrated with clinical data to characterize distinct disease subtypes. An example of this molecular-data-first approach is in research on Autism Spectrum Disorder (ASD), a spectrum of social communication disorders marked by tremendous etiological and phenotypic heterogeneity. In the case of ASD, omics data such as exome sequences and gene and protein expression data are combined with clinical data such as psychometric testing and imaging to enable subtype identification. Novel ASD subtypes have been proposed, such as CHD8, using this molecular subtyping approach. Broader use of molecular subtyping in complex disease research is impeded by data heterogeneity, diversity of standards, and ineffective analysis tools. The future of molecular subtyping for ASD and other complex diseases calls for an integrated resource to identify disease mechanisms, classify new patients, and inform effective treatment options. This in turn will empower and accelerate precision medicine and personalized healthcare.


Subjects
Autism Spectrum Disorder/genetics; Genomics; Precision Medicine; Autism Spectrum Disorder/classification; Autism Spectrum Disorder/therapy; Cluster Analysis; Humans; Molecular Typing
3.
Hum Genomics ; 5(6): 709-17, 2011 Oct.
Article in English | MEDLINE | ID: mdl-22155609

ABSTRACT

Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on in-house and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools.


Subjects
Databases, Genetic; Genes/genetics; Genome, Human; Genomics; Computational Biology; Humans
4.
Bioinformatics ; 26(24): 3007-11, 2010 Dec 15.
Article in English | MEDLINE | ID: mdl-21068002

ABSTRACT

MOTIVATION: Enrichment tests are used in high-throughput experimentation to measure the association between gene or protein expression and membership in groups or pathways. The Fisher's exact test is commonly used. We specifically examined the associations produced by the Fisher test between protein identification by mass spectrometry discovery proteomics, and their Gene Ontology (GO) term assignments in a large yeast dataset. We found that direct application of the Fisher test is misleading in proteomics due to the bias in mass spectrometry to preferentially identify proteins based on their biochemical properties. False inference about associations can be made if this bias is not corrected. Our method adjusts Fisher tests for these biases and produces associations more directly attributable to protein expression rather than experimental bias. RESULTS: Using logistic regression, we modeled the association between protein identification and GO term assignments while adjusting for identification bias in mass spectrometry. The model accounts for five biochemical properties of peptides: (i) hydrophobicity, (ii) molecular weight, (iii) transfer energy, (iv) beta turn frequency and (v) isoelectric point. The model was fit on 181 060 peptides from 2678 proteins identified in 24 yeast proteomics datasets with a 1% false discovery rate. In analyzing the association between protein identification and their GO term assignments, we found that 25% (134 out of 544) of Fisher tests that showed significant association (q-value ≤0.05) were non-significant after adjustment using our model. Simulations generating yeast protein sets enriched for identification propensity show that unadjusted enrichment tests were biased while our approach worked well.


Subjects
Mass Spectrometry/methods; Proteins/classification; Proteomics/methods; Fungal Proteins/chemistry; Fungal Proteins/metabolism; Hydrophobic and Hydrophilic Interactions; Logistic Models; Peptides/chemistry; Proteins/chemistry; Proteins/genetics
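
For orientation, the unadjusted enrichment test that the abstract above takes as its starting point can be sketched as a one-sided Fisher exact (hypergeometric) p-value. This is only a minimal illustration: the function name and arguments are hypothetical, and it does not implement the authors' logistic-regression adjustment for identification bias.

```python
from math import comb


def fisher_enrichment_p(hits_in_term, identified, term_size, background):
    """One-sided Fisher exact p-value for over-representation of a GO term.

    background   -- number of proteins in the reference proteome
    term_size    -- number of background proteins annotated with the term
    identified   -- number of proteins identified in the experiment
    hits_in_term -- number of identified proteins annotated with the term

    Returns P(X >= hits_in_term) when `identified` proteins are drawn
    at random, without replacement, from the background.
    """
    total = comb(background, identified)
    upper = min(identified, term_size)
    tail = sum(
        comb(term_size, i) * comb(background - term_size, identified - i)
        for i in range(hits_in_term, upper + 1)
    )
    return tail / total


# Toy numbers (not from the paper): 3 of 3 identified proteins carry a term
# that annotates 4 of 10 background proteins.
p_value = fisher_enrichment_p(3, 3, 4, 10)
```

The test assumes every protein is equally likely to be identified. That is the assumption the abstract reports as violated in mass spectrometry data, so a p-value computed this way would still need the kind of adjustment the authors describe.
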
5.
Proteomics ; 10(12): 2369-76, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20391536

ABSTRACT

MS-based proteomics characterizes protein contents of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching together the protein sequence database and its randomized version and comparing the score distributions of the randomized versus nonrandomized matches. This work introduces a straightforward isotonic regression-based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as other methods used for comparison, but also has the advantages of being: (i) monotonic in the score, (ii) computationally simple, and (iii) not dependent on assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification using summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two-peptide rule. Finally, by estimating both the FDRs and LFDRs, we showed that, for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).


Subjects
Computational Biology; Databases, Protein; Peptides/analysis; Proteins/analysis
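
As a rough illustration of the isotonic idea in the abstract above, the sketch below fits a monotone curve to the decoy indicator of score-ranked matches using the pool-adjacent-violators algorithm. The function names and the `(score, is_decoy)` input layout are assumptions made for this example. The authors' actual estimator, and the conversion from a decoy fraction to an LFDR, are not reproduced here.

```python
def pava_non_decreasing(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while a block's mean exceeds its successor's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def isotonic_decoy_fraction(matches):
    """Monotone estimate of the decoy fraction as the score decreases.

    matches -- list of (score, is_decoy) pairs from a search against a
               combined forward + randomized database (hypothetical layout).
    Returns (score, fitted_fraction) pairs, best score first.
    """
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    indicators = [1 if is_decoy else 0 for _, is_decoy in ordered]
    fitted = pava_non_decreasing(indicators)
    return [(score, frac) for (score, _), frac in zip(ordered, fitted)]


toy = [(5, False), (4, True), (3, False), (2, True), (1, True)]
curve = isotonic_decoy_fraction(toy)
```

Because the fit is constrained to be non-decreasing as the score drops, a lower-scoring match is never assigned a smaller decoy fraction than a higher-scoring one, and no distributional form is assumed for the scores.
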
6.
OMICS ; 13(4): 285-9, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19691437

ABSTRACT

Personalized medicine heralds the future of medicine. The 2009 Westlake International Conference on Personalized Medicine held from May 29-31, 2009, in Hangzhou, China (www.westlakeconference.org) brought together approximately 200 participants and over 40 presenters from a dozen countries. The conference was discussed in detail, with focus on the following major areas of personalized medicine: preventative and predictive medicine, cancer systems biology and therapy, physiology and artificial liver systems, nanotechnology and informatics, proteomics and systems pathology, genomics and pharmacogenomics, and epigenetics and new techniques. During the conference, the Asian Association of Systems Biology and Medicine (AASBM) was officially formed, and the ad hoc steering members for each of the major Asian countries were appointed to recruit key players from each country. The IBC (Interdisciplinary Bio Central) journal was suggested as the official journal for the AASBM, as it is a truly open journal (www.ibc7.org). The Web site for AASBM will be aasbm.org.


Subjects
Congresses as Topic; Genomics; International Cooperation; Nanotechnology; Neoplasms; Pharmacogenetics/methods; Proteomics; Societies, Medical; Systems Biology/methods; Humans
7.
OMICS ; 11(4): 351-65, 2007.
Article in English | MEDLINE | ID: mdl-18092908

ABSTRACT

Determining the error rate for peptide and protein identification accurately and reliably is necessary to enable evaluation and cross-comparisons of high-throughput proteomics experiments. Currently, peptide identification is based either on preset scoring thresholds or on probabilistic models trained on datasets that are often dissimilar to experimental results. The false discovery rates (FDR) and peptide identification probabilities for these preset thresholds or models often vary greatly across different experimental treatments, organisms, or instruments used in specific experiments. To overcome these difficulties, randomized databases have been used to estimate the FDR. However, the cumulative FDR may include low probability identifications when there are a large number of peptide identifications and exclude high probability identifications when there are few. To overcome this logical inconsistency, this study expands the use of randomized databases to generate experiment-specific estimates of peptide identification probabilities. These experiment-specific probabilities are generated by logistic and Loess regression models of the peptide scores obtained from original and reshuffled database matches. These experiment-specific probabilities are shown to closely approximate "true" probabilities based on known standard protein mixtures across different experiments. Probabilities generated by the earlier Peptide_Prophet and more recent LIPS models are shown to differ significantly from this study's experiment-specific probabilities, especially for unknown samples. The experiment-specific probabilities reliably estimate the accuracy of peptide identifications and overcome potential logical inconsistencies of the cumulative FDR. This estimation method is demonstrated using a Sequest database search, LIPS model, and a reshuffled database. However, this approach is generally applicable to any search algorithm, peptide scoring, and statistical model when using a randomized database.


Subjects
Databases, Protein; Peptides/chemistry; Algorithms; Models, Biological; Probability; Random Allocation; Regression Analysis; Software
8.
OMICS ; 9(3): 233-50, 2005.
Article in English | MEDLINE | ID: mdl-16209638

ABSTRACT

High-throughput protein analysis by tandem mass spectrometry produces anywhere from thousands to millions of spectra that are being used for peptide and protein identifications. Though each spectrum corresponds only to one charged peptide (ion) state, repetitive database searches of multiple charge states are typically conducted since the resolution of many common mass spectrometers is not sufficient to determine the charge state. The resulting database searches are both error-prone and time-consuming. We describe a straightforward, accurate approach to charge state estimation (CHASTE). CHASTE relies on fragment ion peak distributions, and by using reliable logistic regression models, combines different measurements to improve its accuracy. CHASTE's performance has been validated on data sets, comprised of known peptide dissociation spectra, obtained by replicate analyses of our earlier developed protein standard mixture using ion trap mass spectrometers at different laboratories. CHASTE was able to reduce the number of needed database searches by at least 60% and the number of redundant searches by at least 90% virtually without any informational loss. This greatly alleviates one of the major bottlenecks in high-throughput peptide and protein identifications. Thresholds and parameter estimates can be tailored to specific analysis situations, pipelines, and instrumentation. CHASTE was implemented in Java GUI-based and command-line-based interfaces.


Subjects
Mass Spectrometry; Proteomics/methods; Computer Graphics; Databases, Protein; Peptides/analysis; Predictive Value of Tests; Proteins/analysis; Reproducibility of Results; Software; User-Computer Interface
9.
OMICS ; 9(4): 364-79, 2005.
Article in English | MEDLINE | ID: mdl-16402894

ABSTRACT

Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications, using at best generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with changing organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward then a randomized database and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."


Subjects
Databases, Protein; Mass Spectrometry/methods; Peptides/chemistry; Proteins/chemistry
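
The combined forward-plus-randomized search recommended in the abstract above lends itself to a short sketch. The code below uses one common convention, namely that the number of randomized-database hits above a score threshold estimates the number of false forward hits above it. That convention, the function names, and the `(score, is_decoy)` layout are assumptions for illustration, and the paper's exact formula may differ.

```python
def decoy_fdr(matches, threshold):
    """Estimated false positive rate among forward hits with score >= threshold.

    matches -- list of (score, is_decoy) pairs from a search against a
               forward database with a reshuffled copy appended.
    """
    forward = sum(1 for s, d in matches if s >= threshold and not d)
    decoys = sum(1 for s, d in matches if s >= threshold and d)
    if forward == 0:
        return 0.0
    return min(1.0, decoys / forward)


def threshold_for_fdr(matches, max_fdr):
    """Most permissive observed score whose estimated rate is <= max_fdr."""
    for candidate in sorted({s for s, _ in matches}):
        if decoy_fdr(matches, candidate) <= max_fdr:
            return candidate
    return None


toy = [(3.0, False), (2.5, False), (2.0, True), (1.5, False), (1.0, True)]
rate_at_2 = decoy_fdr(toy, 2.0)         # 1 decoy / 2 forward hits
cutoff = threshold_for_fdr(toy, 0.5)    # lowest score meeting a 50% target
```

Scanning thresholds from the lowest score upward returns the cutoff that keeps the most identifications while meeting the requested error rate, which is the kind of study-specific threshold setting the abstract argues for.
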
10.
OMICS ; 8(1): 79-92, 2004.
Article in English | MEDLINE | ID: mdl-15107238

ABSTRACT

Mixtures of moderate complexity were formed from 23 peptides and 12 proteins digested with trypsin, all individually characterized. These mixtures were analyzed with replicates in full and windowed m/z ranges using online high-performance reverse phase liquid chromatography coupled via electrospray ionization to an ion trap mass spectrometer. The resulting spectra were searched using SEQUEST against databases of different sizes and contents and confidences of the observed identifications were evaluated by our earlier statistical model. These data were then combined with biologically derived spectral data, searched, and further evaluated. All peptides but one and all proteins were identified with high confidence. Additionally, the presence and behavior of quadruply charged peptides was analyzed. The properties of the proposed peptide and protein mixtures as well as the performance of the statistical model were carefully investigated. These mixtures mimic the complexity seen in large-scale proteomics experiments, and are proposed to serve as quality assessment standards for future proteome studies.


Subjects
Proteomics/methods; Trypsin/pharmacology; Animals; Databases as Topic; Mass Spectrometry; Peptides/chemistry; Proteins/chemistry; Proteome; Spectrometry, Mass, Electrospray Ionization; Statistics as Topic
11.
OMICS ; 8(4): 357-69, 2004.
Article in English | MEDLINE | ID: mdl-15703482

ABSTRACT

This study addresses the issue of peptide identification resulting from tandem mass spectrometry proteomics analysis followed by database search. This work shows that the Logistic Identification of Peptides (LIP) Index achieves high sensitivity and specificity for peptide classification relative to a manually verified "gold" standard and also accurately estimates the probability of a correct peptide match. The LIP Index is a weighted average of SEQUEST output variables based on logistic regression models and is a transparent, easy to use, inclusive, extendable, and statistically sound approach to classify correct peptide identifications. Modifications, such as normalizing cross-correlations (Xcorr) for peptide length, adjusting for charge state, and the number of tryptic termini, significantly improve the fit of the logistic regression models, as well as increase sensitivity and specificity. The LIP Index also incorporates earlier developed statistical models on spectral quality assessment and peptide identification, which further improves sensitivity and specificity.


Subjects
Computational Biology/methods; Mass Spectrometry/methods; Peptides/chemistry; Software; Algorithms; Bacterial Proteins/chemistry; Chromatography, Liquid; Databases as Topic; Databases, Protein; Logistic Models; Models, Statistical; Models, Theoretical; Probability; Proteins/chemistry; Proteomics; ROC Curve; Sensitivity and Specificity; Trypsin/pharmacology
12.
Anal Chem ; 75(17): 4646-58, 2003 Sep 01.
Article in English | MEDLINE | ID: mdl-14632076

ABSTRACT

A statistical model is presented for computing probabilities that proteins are present in a sample on the basis of peptides assigned to tandem mass (MS/MS) spectra acquired from a proteolytic digest of the sample. Peptides that correspond to more than a single protein in the sequence database are apportioned among all corresponding proteins, and a minimal protein list sufficient to account for the observed peptide assignments is derived using the expectation-maximization algorithm. Using peptide assignments to spectra generated from a sample of 18 purified proteins, as well as complex H. influenzae and Halobacterium samples, the model is shown to produce probabilities that are accurate and have high power to discriminate correct from incorrect protein identifications. This method allows filtering of large-scale proteomics data sets with predictable sensitivity and false positive identification error rates. Fast, consistent, and transparent, it provides a standard for publishing large-scale protein identification data sets in the literature and for comparing the results obtained from different experiments.


Subjects
Mass Spectrometry/methods; Models, Statistical; Proteins/analysis; Proteins/chemistry; Amino Acid Sequence; Humans; Molecular Sequence Data; Peptides/analysis; Peptides/chemistry
13.
Anal Chem ; 74(20): 5383-92, 2002 Oct 15.
Article in English | MEDLINE | ID: mdl-12403597

ABSTRACT

We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.


Subjects
Peptides/analysis; Algorithms; Databases, Factual; Mass Spectrometry; Models, Statistical
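
To make the expectation-maximization step in the abstract above concrete, here is a deliberately simplified sketch: a one-dimensional mixture of two Gaussians fitted to search scores, returning for each score the posterior probability of belonging to the higher-scoring ("correct") component. The two-Gaussian form, the initialisation, and all names are assumptions for illustration. The published model also uses the number of tryptic termini and is not reproduced here.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def em_correct_probabilities(scores, iterations=100):
    """Fit a two-Gaussian mixture by EM; return P(correct | score) per score."""
    lo, hi = min(scores), max(scores)
    mean_bad, mean_good = lo, hi                  # crude initial guesses
    sd_bad = sd_good = max((hi - lo) / 4, 1e-6)
    weight_good = 0.5
    posteriors = [0.5] * len(scores)
    for _ in range(iterations):
        # E-step: posterior probability that each score is a correct match.
        posteriors = []
        for x in scores:
            good = weight_good * normal_pdf(x, mean_good, sd_good)
            bad = (1 - weight_good) * normal_pdf(x, mean_bad, sd_bad)
            posteriors.append(good / (good + bad) if good + bad > 0 else 0.5)
        # M-step: re-estimate the mixture parameters from the posteriors.
        n_good = sum(posteriors)
        n_bad = len(scores) - n_good
        if n_good < 1e-9 or n_bad < 1e-9:
            break  # one component has vanished; keep the current estimates
        mean_good = sum(p * x for p, x in zip(posteriors, scores)) / n_good
        mean_bad = sum((1 - p) * x for p, x in zip(posteriors, scores)) / n_bad
        var_good = sum(p * (x - mean_good) ** 2
                       for p, x in zip(posteriors, scores)) / n_good
        var_bad = sum((1 - p) * (x - mean_bad) ** 2
                      for p, x in zip(posteriors, scores)) / n_bad
        sd_good = max(sqrt(var_good), 1e-6)
        sd_bad = max(sqrt(var_bad), 1e-6)
        weight_good = n_good / len(scores)
    return posteriors


toy_scores = [0.1, 0.2, 0.3, 0.2, 0.1, 3.0, 3.1, 2.9, 3.2]
probabilities = em_correct_probabilities(toy_scores)
```

Summing `1 - p` over the matches retained by a filter gives an expected number of incorrect assignments. This is one way to read the abstract's "predictable false identification error rates", although that reading is an interpretation rather than something the abstract states.
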
14.
OMICS ; 6(2): 207-12, 2002.
Article in English | MEDLINE | ID: mdl-12143966

ABSTRACT

Several methods have been used to identify peptides that correspond to tandem mass spectra. In this work, we describe a data set of low energy tandem mass spectra generated from a control mixture of known protein components that can be used to evaluate the accuracy of these methods. As an example, these spectra were searched by the SEQUEST application against a human peptide sequence database. The numbers of resulting correct and incorrect peptide assignments were then determined. We show how the sensitivity and error rate are affected by the use of various filtering criteria based upon SEQUEST scores and the number of tryptic termini of assigned peptides.


Subjects
Mass Spectrometry/methods; Peptides/chemistry; Amino Acid Sequence; Animals; Humans; Molecular Sequence Data; Sensitivity and Specificity; Software
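
The evaluation described in the abstract above, where a filter is applied to assignments of known correctness and sensitivity and error rate are then read off, can be sketched as follows. The tuple layout `(score, tryptic_termini, is_correct)` and the toy values are hypothetical and are not data from the paper.

```python
def evaluate_filter(assignments, min_score, min_termini):
    """Sensitivity and error rate of a score / tryptic-termini filter.

    assignments -- list of (score, tryptic_termini, is_correct) tuples for
                   spectra from a control mixture of known proteins.
    Returns (sensitivity, error_rate), where sensitivity is the fraction of
    all correct assignments that pass the filter and error_rate is the
    fraction of passing assignments that are incorrect.
    """
    passed = [a for a in assignments
              if a[0] >= min_score and a[1] >= min_termini]
    total_correct = sum(1 for a in assignments if a[2])
    passed_correct = sum(1 for a in passed if a[2])
    sensitivity = passed_correct / total_correct if total_correct else 0.0
    error_rate = (len(passed) - passed_correct) / len(passed) if passed else 0.0
    return sensitivity, error_rate


toy = [
    (3.5, 2, True), (2.8, 2, True), (2.9, 1, False),
    (1.5, 2, False), (2.2, 1, True), (3.1, 0, False),
]
score_only = evaluate_filter(toy, 2.5, 0)
score_and_termini = evaluate_filter(toy, 2.5, 2)
```

In this toy set, adding the tryptic-termini requirement lowers the error rate without lowering sensitivity. Whether that holds on real data is what a control mixture of the kind described is meant to test.
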