ABSTRACT
MOPED (Multi-Omics Profiling Expression Database; http://moped.proteinspire.org) has transitioned from solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface, MOPED presents consistently processed data for gene, protein and pathway expression. To improve data quality, consistency and use, MOPED includes metadata detailing experimental design and analysis methods. The multi-omics data are integrated through direct links between genes and proteins and further connected to pathways and experiments. MOPED now contains over 5 million records, information for approximately 75,000 genes and 50,000 proteins from four organisms (human, mouse, worm, yeast). These records correspond to 670 unique combinations of experiment, condition, localization and tissue. MOPED includes the following new features: pathway expression, Pathway Details pages, experimental metadata checklists, experiment summary statistics and more advanced searching tools. Advanced searching enables querying for genes, proteins, experiments, pathways and keywords of interest. The system is enhanced with visualizations for comparing across different data types. In the future MOPED will expand the number of organisms, increase integration with pathways and provide connections to disease.
Subjects
Genetic Databases , Gene Expression Profiling , Proteomics , Animals , Humans , Internet , Mice , Proteins/genetics , Proteins/metabolism
ABSTRACT
Although biological science discovery often involves comparing conditions to a normal state, in proteomics little is actually known about normal. Two Human Proteome studies featured in Nature offer new insights into protein expression and an opportunity to assess how high-throughput proteomics measures normal protein ranges. We use data from these studies to estimate technical and biological variability in protein expression and compare them to other expression data sets from normal tissue. Results show that measured protein expression across same-tissue replicates varies by ±4- to 10-fold for most proteins. Coefficients of variation (CV) for protein expression measurements range from 62% to 117% across different tissue experiments; however, adjusting for technical variation reduced this variability by as much as 50%. In addition, the CV could be reduced by limiting comparisons to proteins with 3 or more unique peptide identifications, as the CV was on average 33% lower than for proteins with 2 or fewer peptide identifications. We also selected 13 housekeeping proteins and genes that were expressed across all tissues with low variability to determine their utility as a reference set for normalization and comparative purposes. These results present the first step toward estimating normal protein ranges by determining the variability in expression measurements through combining publicly available data. They support an approach that combines standard protocols with replicates of normal tissues to estimate normal protein ranges for large numbers of proteins and tissues. This would be a tremendous resource for normal cellular physiology and comparisons of proteomics studies.
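The two variability measures named in this abstract, the coefficient of variation and the fold range across replicates, can be illustrated with a minimal sketch. The replicate values below are hypothetical and are not taken from the studies; the function name is likewise an assumption made for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Hypothetical expression measurements for one protein across
# three same-tissue replicates (arbitrary units).
replicates = [10.0, 20.0, 30.0]

cv = coefficient_of_variation(replicates)
# Fold range: ratio of the largest to the smallest replicate value.
fold_range = max(replicates) / min(replicates)

print(f"CV = {cv:.1f}%, fold range = {fold_range:.1f}x")
```

In practice the CV would be computed per protein and then summarized across proteins within each tissue experiment; how the study aggregated these values is not specified here.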
Subjects
High-Throughput Screening Assays , Proteins/metabolism , Proteomics , Humans , Reference Values , Reproducibility of Results
ABSTRACT
The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is an expanding proteomics resource to enable biological and biomedical discoveries. MOPED aggregates simple, standardized and consistently processed summaries of protein expression and metadata from proteomics (mass spectrometry) experiments from human and model organisms (mouse, worm, and yeast). The latest version of MOPED adds new estimates of protein abundance and concentration as well as relative (differential) expression data. MOPED provides an updated query interface that allows users to explore information by organism, tissue, localization, condition, experiment, or keyword. MOPED supports the Human Proteome Project's efforts to generate chromosome- and disease-specific proteomes by providing links from proteins to chromosome and disease information as well as many complementary resources. MOPED supports a new omics metadata checklist to harmonize data integration, analysis, and use. MOPED's development is driven by the user community, which spans 90 countries and guides future development that will transform MOPED into a multiomics resource. MOPED encourages users to submit data in a simple format. They can use the metadata checklist to generate a data publication for this submission. As a result, MOPED will provide even greater insights into complex biological processes and systems and enable deeper and more comprehensive biological and biomedical discoveries.
Subjects
Protein Databases , Proteomics , Animals , Humans , User-Computer Interface
ABSTRACT
Life science technologies generate a deluge of data that hold the keys to unlocking the secrets of important biological functions and disease mechanisms. We present DEAP, Differential Expression Analysis for Pathways, which capitalizes on information about biological pathways to identify important regulatory patterns from differential expression data. DEAP makes significant improvements over existing approaches by including information about pathway structure and discovering the most differentially expressed portion of the pathway. On simulated data, DEAP significantly outperformed traditional methods: with high differential expression, DEAP increased power by two orders of magnitude; with very low differential expression, DEAP doubled the power. DEAP performance was illustrated on two different gene and protein expression studies. DEAP discovered fourteen important pathways related to chronic obstructive pulmonary disease and interferon treatment that existing approaches omitted. On the interferon study, DEAP guided focus towards a four protein path within the 26 protein Notch signalling pathway.
Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Biological Models , Signal Transduction , Algorithms , Computer Simulation , Genetic Databases , Disease/genetics , Humans , Reproducibility of Results
ABSTRACT
Large numbers of mass spectrometry proteomics studies are being conducted to understand all types of biological processes. The size and complexity of proteomics data hinder efforts to easily share, integrate, query and compare the studies. The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is a new and expanding proteomics resource that enables rapid browsing of protein expression information from publicly available studies on humans and model organisms. MOPED is designed to simplify the comparison and sharing of proteomics data for the greater research community. MOPED uniquely provides protein level expression data, meta-analysis capabilities and quantitative data from standardized analysis. Data can be queried for specific proteins, browsed based on organism, tissue, localization and condition and sorted by false discovery rate and expression. MOPED empowers users to visualize their own expression data and compare it with existing studies. Further, MOPED links to various protein and pathway databases, including GeneCards, Entrez, UniProt, KEGG and Reactome. The current version of MOPED contains over 43,000 proteins with at least one spectral match and more than 11 million high certainty spectra.
Subjects
Protein Databases , Proteins/metabolism , Animals , Humans , Mass Spectrometry , Mice , Animal Models , Proteomics , User-Computer Interface
ABSTRACT
MOTIVATION: Enrichment tests are used in high-throughput experimentation to measure the association between gene or protein expression and membership in groups or pathways. The Fisher's exact test is commonly used. We specifically examined the associations produced by the Fisher test between protein identification by mass spectrometry discovery proteomics, and their Gene Ontology (GO) term assignments in a large yeast dataset. We found that direct application of the Fisher test is misleading in proteomics due to the bias in mass spectrometry to preferentially identify proteins based on their biochemical properties. False inference about associations can be made if this bias is not corrected. Our method adjusts Fisher tests for these biases and produces associations more directly attributable to protein expression rather than experimental bias. RESULTS: Using logistic regression, we modeled the association between protein identification and GO term assignments while adjusting for identification bias in mass spectrometry. The model accounts for five biochemical properties of peptides: (i) hydrophobicity, (ii) molecular weight, (iii) transfer energy, (iv) beta turn frequency and (v) isoelectric point. The model was fit on 181 060 peptides from 2678 proteins identified in 24 yeast proteomics datasets with a 1% false discovery rate. In analyzing the association between protein identification and their GO term assignments, we found that 25% (134 out of 544) of Fisher tests that showed significant association (q-value ≤0.05) were non-significant after adjustment using our model. Simulations generating yeast protein sets enriched for identification propensity show that unadjusted enrichment tests were biased while our approach worked well.
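For reference, the unadjusted enrichment test that this abstract takes as its starting point can be sketched as a one-sided Fisher's exact (hypergeometric) test. This is only the baseline test, not the bias-adjusted logistic regression model described in the abstract; the function name, argument names and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n_identified, n_in_term, n_total):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= k), where X is the number of identified proteins that
    carry a given GO term, under the hypergeometric null of no association.
    """
    upper = min(n_identified, n_in_term)
    denominator = comb(n_total, n_identified)
    numerator = sum(
        comb(n_in_term, i) * comb(n_total - n_in_term, n_identified - i)
        for i in range(k, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 10 proteins in the proteome, 5 annotated with the
# term, 5 identified by mass spectrometry, all 5 of which carry the term.
p_value = fisher_enrichment_p(k=5, n_identified=5, n_in_term=5, n_total=10)
print(p_value)
```

Because this test treats every protein as equally likely to be identified, it cannot separate true enrichment from identification bias, which is the problem the abstract's adjustment addresses.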
Subjects
Mass Spectrometry/methods , Proteins/classification , Proteomics/methods , Fungal Proteins/chemistry , Fungal Proteins/metabolism , Hydrophobic and Hydrophilic Interactions , Logistic Models , Peptides/chemistry , Proteins/chemistry , Proteins/genetics
ABSTRACT
MS-based proteomics characterizes protein contents of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching together the protein sequence database and its randomized version and comparing the score distributions of the randomized versus nonrandomized matches. This work introduces a straightforward isotonic regression-based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as other methods used for comparison, but also has the advantages of being: (i) monotonic in the score, (ii) computationally simple, and (iii) not dependent on assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification using summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two-peptide rule. Finally, by estimating both the FDRs and LFDRs, we showed for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).
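The core of an isotonic regression approach can be sketched with the pool-adjacent-violators algorithm, applied here to decoy indicators ordered by decreasing score. This is a minimal illustration under stated assumptions, not the authors' implementation: the peptide-spectrum matches are hypothetical, and converting the fitted decoy fraction into a local FDR depends on the search design (separate or concatenated databases), which is not shown.

```python
def pava_nondecreasing(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Hypothetical matches as (score, is_decoy); 1 marks a randomized-database hit.
psms = [(9.1, 0), (8.4, 0), (7.7, 1), (6.9, 0), (5.2, 1), (4.8, 1)]
psms.sort(key=lambda match: match[0], reverse=True)  # best score first

labels = [is_decoy for _, is_decoy in psms]
# Monotone estimate of the decoy fraction as the score decreases.
decoy_fraction = pava_nondecreasing(labels)
print(decoy_fraction)
```

The fit is monotonic in the score by construction and requires no assumption about the shape of the score distributions, which matches the advantages listed in the abstract.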
Subjects
Computational Biology , Protein Databases , Peptides/analysis , Proteins/analysis
ABSTRACT
MOTIVATION: The false discovery rate (FDR) has been widely adopted to address the multiple comparisons issue in high-throughput experiments such as microarray gene-expression studies. However, while the FDR is quite useful as an approach to limit false discoveries within a single experiment, like other multiple comparison corrections it may be an inappropriate way to compare results across experiments. This article uses several examples based on gene-expression data to demonstrate the potential misinterpretations that can arise from using FDR to compare across experiments. Researchers should be aware of these pitfalls and wary of using FDR to compare experimental results. FDR should be augmented with other measures such as p-values and expression ratios. It is worth including standard error and variance information for meta-analyses and, if possible, the raw data for re-analyses. This is especially important for high-throughput studies because data are often re-used for different objectives, including comparing common elements across many experiments. No single error rate or data summary may be appropriate for all of the different objectives.
Subjects
Algorithms , Artifacts , Statistical Data Interpretation , False Positive Reactions , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
Staphylococcus aureus is a major cause of hospital-acquired pneumonia and is emerging as an important etiological agent of community-acquired pneumonia. Little is known about the specific host-pathogen interactions that occur when S. aureus first enters the airway. A shotgun proteomics approach was utilized to identify the airway proteins associated with S. aureus during the first 6 h of infection. Host proteins eluted from bacteria recovered from the airways of mice 30 min or 6 h following intranasal inoculation under anesthesia were subjected to liquid chromatography and tandem mass spectrometry. A total of 513 host proteins were associated with S. aureus 30 min and/or 6 h postinoculation. A majority of the identified proteins were host cytosolic proteins, suggesting that S. aureus was rapidly internalized by phagocytes in the airway and that significant host cell lysis occurred during early infection. In addition, extracellular matrix and secreted proteins, including fibronectin, antimicrobial peptides, and complement components, were associated with S. aureus at both time points. The interaction of 12 host proteins shown to bind to S. aureus in vitro was demonstrated in vivo for the first time. The association of hemoglobin, which is thought to be the primary staphylococcal iron source during infection, with S. aureus in the airway was validated by immunoblotting. Thus, we used our recently developed S. aureus pneumonia model and shotgun proteomics to validate previous in vitro findings and to identify nearly 500 other proteins that interact with S. aureus in vivo. The data presented here provide novel insights into the host-pathogen interactions that occur when S. aureus enters the airway.
Subjects
Host-Pathogen Interactions , Pneumonia/microbiology , Proteins/isolation & purification , Staphylococcal Infections/microbiology , Staphylococcus aureus/chemistry , Animals , Bronchoalveolar Lavage Fluid/chemistry , Bronchoalveolar Lavage Fluid/microbiology , Liquid Chromatography , Female , Humans , Immunoblotting , Male , Mice , Inbred C57BL Mice , Protein Binding , Proteins/chemistry , Proteome/analysis , Proteome/isolation & purification , Tandem Mass Spectrometry
ABSTRACT
Pneumonia caused by Staphylococcus aureus is a growing concern in the health care community. We hypothesized that characterization of the early innate immune response to bacteria in the lungs would provide insight into the mechanisms used by the host to protect itself from infection. An adult mouse model of Staphylococcus aureus pneumonia was utilized to define the early events in the innate immune response and to assess the changes in the airway proteome during the first 6 h of pneumonia. S. aureus actively replicated in the lungs of mice inoculated intranasally under anesthesia to cause significant morbidity and mortality. By 6 h postinoculation, the release of proinflammatory cytokines caused effective recruitment of neutrophils to the airway. Neutrophil influx, loss of alveolar architecture, and consolidated pneumonia were observed histologically 6 h postinoculation. Bronchoalveolar lavage fluids from mice inoculated with phosphate-buffered saline (PBS) or S. aureus were depleted of overabundant proteins and subjected to strong cation exchange fractionation followed by liquid chromatography and tandem mass spectrometry to identify the proteins present in the airway. No significant changes in response to PBS inoculation or 30 min following S. aureus inoculation were observed. However, a dramatic increase in extracellular proteins was observed 6 h postinoculation with S. aureus, with the increase dominated by inflammatory and coagulation proteins. The data presented here provide a comprehensive evaluation of the rapid and vigorous innate immune response mounted in the host airway during the earliest stages of S. aureus pneumonia.
Subjects
Staphylococcal Pneumonia/immunology , Proteome/immunology , Staphylococcal Infections/immunology , Animals , Western Blotting , Bronchoalveolar Lavage Fluid/chemistry , Bronchoalveolar Lavage Fluid/cytology , Liquid Chromatography , Cytokines/analysis , Cytokines/immunology , Female , Lung/microbiology , Lung/pathology , Male , Mice , Inbred C57BL Mice , Neutrophil Infiltration/immunology , Staphylococcal Pneumonia/microbiology , Staphylococcal Pneumonia/pathology , Staphylococcal Infections/pathology , Staphylococcus aureus
ABSTRACT
MOTIVATION: Tandem mass-spectrometry of trypsin digests, followed by database searching, is one of the most popular approaches in high-throughput proteomics studies. Peptides are considered identified if they pass certain scoring thresholds. To avoid false positive protein identification, ≥2 unique peptides identified within a single protein are generally recommended. Still, in a typical high-throughput experiment, hundreds of proteins are identified only by a single peptide. We introduce here a method for distinguishing between true and false identifications among single-hit proteins. The approach is based on randomized database searching and usage of logistic regression models with cross-validation. This approach is implemented to analyze three bacterial samples, enabling recovery of 68-98% of the correct single-hit proteins with an error rate of <2%. This results in a 22-65% increase in the number of identified proteins. Identifying true single-hit proteins will lead to discovering many crucial regulators, biomarkers and other low abundance proteins. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Protein Databases , Information Storage and Retrieval/methods , Mass Spectrometry/methods , Peptide Mapping/methods , Proteins/analysis , Sequence Alignment/methods , Protein Sequence Analysis/methods , Algorithms , Amino Acid Sequence , Computer Simulation , Database Management Systems , Logistic Models , Chemical Models , Molecular Models , Molecular Sequence Data , Automated Pattern Recognition , Proteins/chemistry , Regression Analysis
ABSTRACT
Distinguishing Alzheimer's disease (AD) and frontotemporal dementia (FTD) currently relies on a clinical history and examination, but positron emission tomography with [(18)F] fluorodeoxyglucose (FDG-PET) shows different patterns of hypometabolism in these disorders that might aid differential diagnosis. Six dementia experts with variable FDG-PET experience made independent, forced choice, diagnostic decisions in 45 patients with pathologically confirmed AD (n = 31) or FTD (n = 14) using five separate methods: (1) review of clinical summaries, (2) a diagnostic checklist alone, (3) summary and checklist, (4) transaxial FDG-PET scans and (5) FDG-PET stereotactic surface projection (SSP) metabolic and statistical maps. In addition, we evaluated the effect of the sequential review of a clinical summary followed by SSP. Visual interpretation of SSP images was superior to clinical assessment and had the best inter-rater reliability (mean kappa = 0.78) and diagnostic accuracy (89.6%). It also had the highest specificity (97.6%) and sensitivity (86%), and positive likelihood ratio for FTD (36.5). The addition of FDG-PET to clinical summaries increased diagnostic accuracy and confidence for both AD and FTD. It was particularly helpful when raters were uncertain in their clinical diagnosis. Visual interpretation of FDG-PET after brief training is more reliable and accurate in distinguishing FTD from AD than clinical methods alone. FDG-PET adds important information that appropriately increases diagnostic confidence, even among experienced dementia specialists.
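The reported measures (sensitivity, specificity, accuracy and positive likelihood ratio) all derive from a 2x2 table of rater decisions against pathology. The sketch below shows the arithmetic with hypothetical counts in which FTD is treated as the positive class; it does not reproduce the study's figures, which were summarized across six raters.

```python
def diagnostic_summary(tp, fn, tn, fp):
    """Compute standard accuracy measures from 2x2 confusion counts."""
    sensitivity = tp / (tp + fn)   # true positives among all positives
    specificity = tn / (tn + fp)   # true negatives among all negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # LR+ = sensitivity / (1 - specificity); unbounded with no false positives.
    lr_positive = float("inf") if fp == 0 else sensitivity / (1.0 - specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_positive": lr_positive,
    }


# Hypothetical single-rater result: 14 FTD cases (12 called correctly)
# and 31 AD cases (30 called correctly).
summary = diagnostic_summary(tp=12, fn=2, tn=30, fp=1)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A high positive likelihood ratio means that a positive call shifts the odds strongly toward the positive diagnosis, which is why it is reported alongside sensitivity and specificity.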
Subjects
Brain/diagnostic imaging , Dementia/diagnostic imaging , Adult , Aged , Alzheimer Disease/diagnosis , Alzheimer Disease/diagnostic imaging , Dementia/diagnosis , Differential Diagnosis , Disease Progression , Female , Fluorodeoxyglucose F18 , Humans , Computer-Assisted Image Processing/methods , Male , Middle Aged , Observer Variation , Positron-Emission Tomography , Radiopharmaceuticals , Sensitivity and Specificity
ABSTRACT
The identification and quantification of the proteins that a whole organism expresses under certain conditions is a main focus of high-throughput proteomics. Advanced proteomics approaches generate new biologically relevant data and potent hypotheses. A practical report of what proteome studies can and cannot accomplish in common laboratory settings is presented here. The review discusses the most popular tandem mass-spectrometry-based methods and focuses on how to produce reliable results. A step-by-step description of proteome experiments is given, including sample preparation, digestion, labeling, liquid chromatography, data processing, database searching and statistical analysis. The difficulties and bottlenecks of proteome analysis are addressed and the requirements for further improvements are discussed. Several diverse high-throughput proteomics-based studies of microorganisms are described.
Subjects
Proteins/analysis , Proteomics/methods , Electrospray Ionization Mass Spectrometry/methods , Amino Acid Sequence , Statistical Data Interpretation , Molecular Sequence Data
ABSTRACT
Determining the error rate for peptide and protein identification accurately and reliably is necessary to enable evaluation and cross-comparisons of high-throughput proteomics experiments. Currently, peptide identification is based either on preset scoring thresholds or on probabilistic models trained on datasets that are often dissimilar to experimental results. The false discovery rates (FDR) and peptide identification probabilities for these preset thresholds or models often vary greatly across different experimental treatments, organisms, or instruments used in specific experiments. To overcome these difficulties, randomized databases have been used to estimate the FDR. However, the cumulative FDR may include low probability identifications when there are a large number of peptide identifications and exclude high probability identifications when there are few. To overcome this logical inconsistency, this study expands the use of randomized databases to generate experiment-specific estimates of peptide identification probabilities. These experiment-specific probabilities are generated by logistic and Loess regression models of the peptide scores obtained from original and reshuffled database matches. These experiment-specific probabilities are shown to approximate "true" probabilities based on known standard protein mixtures very well across different experiments. Probabilities generated by the earlier Peptide_Prophet and more recent LIPS models are shown to differ significantly from this study's experiment-specific probabilities, especially for unknown samples. The experiment-specific probabilities reliably estimate the accuracy of peptide identifications and overcome potential logical inconsistencies of the cumulative FDR. This estimation method is demonstrated using a Sequest database search, LIPS model, and a reshuffled database. However, this approach is generally applicable to any search algorithm, peptide scoring, and statistical model when using a randomized database.
Subjects
Protein Databases , Peptides/chemistry , Algorithms , Biological Models , Probability , Random Allocation , Regression Analysis , Software
ABSTRACT
Medulloblastoma (MB) is the most common malignant pediatric brain tumor. Patient survival has remained largely the same for the past 20 years, with therapies causing significant health, cognitive, behavioral and developmental complications for those who survive the tumor. In this study, we profiled the total transcriptome and proteome of two established MB cell lines, Daoy and UW228, using high-throughput RNA sequencing (RNA-Seq) and label-free nano-LC-MS/MS-based quantitative proteomics, coupled with advanced pathway analysis. While Daoy has been suggested to belong to the sonic hedgehog (SHH) subtype, the exact UW228 subtype is not yet clearly established. Thus, a goal of this study was to identify protein markers and pathways that would help elucidate their subtype classification. A number of differentially expressed genes and proteins, including a number of adhesion, cytoskeletal and signaling molecules, were observed between the two cell lines. While several cancer-associated genes/proteins exhibited similar expression across the two cell lines, upregulation of a number of signature proteins and enrichment of key components of SHH and WNT signaling pathways were uniquely observed in Daoy and UW228, respectively. The novel information on differentially expressed genes/proteins and enriched pathways provide insights into the biology of MB, which could help elucidate their subtype classification.
ABSTRACT
Proteome analysis, utilizing high-throughput proteomics approaches, involves studying proteins that a whole organism (or specific tissue or cellular compartment) expresses under certain conditions. Intrinsic difficulties of these studies, as well as the enormous volumes of data they typically produce, make proteome analysis and interpretation very difficult. As with any high-throughput approach, proteomics experiments should be carefully designed, analyzed, and verified. In addition to computational standards, experimental standards--simple and complex mixtures of known proteins--for high-throughput proteomics have to be developed and utilized. This article discusses such experimental standards and their implementations.
Subjects
Proteome/analysis , Proteome/standards , Proteomics/standards , Animals , Humans , Proteomics/instrumentation , Biomedical Technology Assessment/standards
ABSTRACT
Current approaches in human embryonic stem cell (hESC) to pancreatic beta cell differentiation have largely been based on knowledge gained from developmental studies of the epithelial pancreas, while the potential roles of other supporting tissue compartments have not been fully explored. One such tissue is the pancreatic mesenchyme that supports epithelial organogenesis throughout embryogenesis. We hypothesized that detailed characterization of the pancreatic mesenchyme might result in the identification of novel factors not used in current differentiation protocols. Supplementing existing hESC differentiation conditions with such factors might create a more comprehensive simulation of normal development in cell culture. To validate our hypothesis, we took advantage of a novel transgenic mouse model to isolate the pancreatic mesenchyme at distinct embryonic and postnatal stages for subsequent proteomic analysis. Refined sample preparation and analysis conditions across four embryonic and prenatal time points resulted in the identification of 21,498 peptides with high-confidence mapping to 1,502 proteins. Expression analysis of pancreata confirmed the presence of three potentially important factors in cell differentiation: Galectin-1 (LGALS1), Neuroplastin (NPTN), and the Laminin α-2 subunit (LAMA2). Two of the three factors (LGALS1 and LAMA2) increased expression of pancreatic progenitor transcript levels in a published hESC to beta cell differentiation protocol. In addition, LAMA2 partially blocks cell culture induced beta cell dedifferentiation. Summarily, we provide evidence that proteomic analysis of supporting tissues such as the pancreatic mesenchyme allows for the identification of potentially important factors guiding hESC to pancreas differentiation.
ABSTRACT
This case study evaluates and tracks the vitality of a city (Seattle), based on a data-driven approach, using strategic, robust, and sustainable metrics. This case study was collaboratively conducted by the Downtown Seattle Association (DSA) and CDO Analytics teams. The DSA is a nonprofit organization focused on making the city of Seattle and its Downtown a healthy and vibrant place to Live, Work, Shop, and Play. DSA primarily operates through public policy advocacy, community and business development, and marketing. In 2010, the organization turned to CDO Analytics (cdoanalytics.org) to develop a process that can guide and strategically focus DSA efforts and resources for maximal benefit to the city of Seattle and its Downtown. CDO Analytics was asked to develop clear, easily understood, and robust metrics for a baseline evaluation of the health of the city, as well as for ongoing monitoring and comparisons of the vitality, sustainability, and growth. The DSA and CDO Analytics teams strategized on how to effectively assess and track the vitality of Seattle and its Downtown. The two teams filtered a variety of data sources, and evaluated the veracity of multiple diverse metrics. This iterative process resulted in the development of a small number of strategic, simple, reliable, and sustainable metrics across four pillars of activity: Live, Work, Shop, and Play. Data from the 5 years before 2010 were used to develop and train the metrics and model, and data from the 5 years from 2010 onward were used for testing and validation. This work enabled DSA to routinely track these strategic metrics, use them to monitor the vitality of Downtown Seattle, prioritize improvements, and identify new value-added programs. As a result, the four-pillar approach became an integral part of the data-driven decision-making and execution of the Seattle community's improvement activities.
The approach described in this case study is actionable, robust, inexpensive, and easy to adopt and sustain. It can be applied to cities, districts, counties, regions, states, or countries, enabling cross-comparisons and improvements of vitality, sustainability, and growth.
Subjects
City Planning/methods , Organizational Case Studies , Humans , Machine Learning , Washington
ABSTRACT
BACKGROUND: Latino individuals are the largest minority group and the fastest growing population group in the United States, yet there are few studies comparing the clinical features of Alzheimer disease (AD) in this population with those found in Anglo (white non-Latino) patients. OBJECTIVE: To compare the age at AD symptom onset in Latino and Anglo individuals. DESIGN: Cross-sectional assessment using standardized methods to collect and compare age at AD symptom onset, demographic variables, and medical variables. SETTING: Five National Institute on Aging-sponsored Alzheimer's Disease Centers with experience evaluating Spanish-speaking individuals. PATIENTS: We evaluated 119 Latino and 55 Anglo patients who had a diagnosis of AD. MAIN OUTCOME MEASURE: Age at symptom onset. RESULTS: After adjusting for center, sex, and years of education, Latino patients had a mean age at symptom onset 6.8 years earlier (95% confidence interval, 3.5-10.3 years earlier) than Anglo patients. CONCLUSIONS: An earlier age at symptom onset suggests that US mainland Latino individuals may experience an increased burden of AD compared with Anglo individuals. The basis for the younger age at symptom onset remains obscure.
Subjects
Alzheimer Disease/ethnology , Alzheimer Disease/epidemiology , Geriatric Assessment , Hispanic or Latino/statistics & numerical data , White People/statistics & numerical data , Age of Onset , Aged , Case-Control Studies , Cost of Illness , Cross-Sectional Studies , Educational Status , Female , Humans , Male , Neuropsychological Tests
ABSTRACT
Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications, using at best generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with changing organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward then a randomized database and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."
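A combined forward-plus-randomized search yields, for each match, a score and a flag indicating whether it hit the randomized part of the database. The sketch below estimates the error rate at a score threshold as the ratio of randomized hits to forward hits, one common convention that is assumed here and is not necessarily the estimator used in the study. The function names and the match data are hypothetical.

```python
def estimate_error_rate(matches, threshold):
    """Estimate the false positive rate among forward hits at a threshold.

    matches: iterable of (score, is_randomized) pairs from a combined search.
    Assumed convention: randomized hits / forward hits at or above the score.
    """
    forward = sum(1 for score, rnd in matches if score >= threshold and not rnd)
    randomized = sum(1 for score, rnd in matches if score >= threshold and rnd)
    if forward == 0:
        return 1.0 if randomized else 0.0
    return min(1.0, randomized / forward)


def threshold_for_error_rate(matches, max_rate):
    """Return the lowest score threshold whose estimate is <= max_rate."""
    for threshold in sorted({score for score, _ in matches}):
        if estimate_error_rate(matches, threshold) <= max_rate:
            return threshold
    return None  # no threshold achieves the requested rate


# Hypothetical matches from a combined search.
matches = [
    (5.0, False), (4.5, False), (4.0, False), (3.5, True),
    (3.0, False), (2.5, True), (2.0, False),
]

chosen = threshold_for_error_rate(matches, max_rate=0.25)
print(chosen, estimate_error_rate(matches, chosen))
```

Setting the threshold from the data in this way is what lets an error rate be stated quantitatively for a given experiment rather than described as "high confidence."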