Abstract
Execution of the DNA damage response (DDR) relies upon a dynamic array of protein modifications. Using quantitative proteomics, we have globally profiled ubiquitination, acetylation, and phosphorylation in response to UV and ionizing radiation. To improve acetylation site profiling, we developed the strategy FACET-IP. Our datasets of 33,500 ubiquitination and 16,740 acetylation sites provide valuable insight into DDR remodeling of the proteome. We find that K6- and K33-linked polyubiquitination undergo bulk increases in response to DNA damage, raising the possibility that these linkages are largely dedicated to DDR function. We also show that Cullin-RING ligases mediate 10% of DNA damage-induced ubiquitination events and that EXO1 is an SCF-Cyclin F substrate in the response to UV radiation. Our extensive datasets uncover additional regulated sites on known DDR players such as PCNA and identify previously unknown DDR targets such as CENPs, underscoring the broad impact of the DDR on cellular physiology.
Subjects
DNA Damage , Proteomics/methods , Acetylation/radiation effects , Cullin Proteins/metabolism , DNA Repair , DNA Repair Enzymes/metabolism , Databases, Protein , Exodeoxyribonucleases/metabolism , HeLa Cells , Humans , Phosphorylation/radiation effects , Proteasome Endopeptidase Complex/metabolism , Protein Array Analysis/statistics & numerical data , Proteome/metabolism , Proteome/radiation effects , Proteomics/statistics & numerical data , Spindle Apparatus/metabolism , Ubiquitination/radiation effects
Abstract
Histomorphology and immunohistochemistry are the most common ways of cancer classification in routine cancer diagnostics, but often reach their limits in determining the organ origin in metastasis. These cancers of unknown primary, which are mostly adenocarcinomas or squamous cell carcinomas, therefore require more sophisticated methodologies of classification. Here, we report a multiplex protein profiling-based approach for the classification of fresh frozen and formalin-fixed paraffin-embedded (FFPE) cancer tissue samples using the digital western blot technique DigiWest. A DigiWest-compatible FFPE extraction protocol was developed, and a total of 634 antibodies were tested in an initial set of 16 FFPE samples covering tumors from different origins. Of the 303 detected antibodies, 102 yielded significant correlation of signals in 25 pairs of fresh frozen and FFPE primary tumor samples, including head and neck squamous cell carcinomas (HNSC), lung squamous cell carcinomas (LUSC), lung adenocarcinomas (LUAD), colorectal adenocarcinomas (COAD), and pancreatic adenocarcinomas (PAAD). For this signature of 102 analytes (covering 88 total proteins and 14 phosphoproteins), a support vector machine (SVM) algorithm was developed. This allowed for the classification of the tissue of origin for all five tumor types studied here with high overall accuracies in both fresh frozen (90.4%) and FFPE (77.6%) samples. In addition, the SVM classifier reached an overall accuracy of 88% in an independent validation cohort of 25 FFPE tumor samples. Our results indicate that DigiWest-based protein profiling represents a valuable method for cancer classification, yielding conclusive and decisive data not only from fresh frozen specimens but also FFPE samples, thus making this approach attractive for routine clinical applications.
Subjects
Blotting, Western/methods , Neoplasms/classification , Protein Array Analysis/methods , Algorithms , Biomarkers, Tumor/metabolism , Blotting, Western/statistics & numerical data , Cryopreservation , Formaldehyde , Humans , Neoplasm Proteins/metabolism , Neoplasms/diagnosis , Neoplasms/metabolism , Organ Specificity , Paraffin Embedding , Protein Array Analysis/statistics & numerical data , Support Vector Machine , Tissue Fixation
Abstract
The assessment of programmed death 1 ligand 1 (PD-L1) expression by immunohistochemistry (IHC) is the US Food and Drug Administration (FDA)-approved predictive marker to select responders to checkpoint blockade anti-PD-1/PD-L1 axis immunotherapies. Different PD-L1 IHC assays use different antibodies and different scoring methods in tumor cells and immune cells. Multiple studies have compared the performance of these assays with variable results. Here, we investigate an alternative method for assessment of PD-L1 using a new technology known as digital spatial profiling. We use a previously described standardization tissue microarray (TMA) to assess the accuracy of the method and compare the digital spatial profiler (DSP) to each of the FDA-approved PD-L1 assays, one laboratory-developed test (LDT) assay, and three quantitative fluorescence assays. The standardized cell line Index TMA contains 10 isogenic cell lines in triplicate expressing various ranges of PD-L1. The dynamic range of PD-L1 digital counts was measured in the 10 cell lines on the Index TMA using the GeoMx DSP assay and read on the nCounter platform. The digital method shows very high correlation with IHC scored with quantitative software and with quantitative fluorescence. High correlation of PD-L1 DSP counts was seen between rows on the same Index TMA. Finally, experiments from two Index TMAs showed that the reproducibility of DSP counts was independent of variable slide storage time over a three-week period after antibody labeling but before collection of cleaved tags. In summary, DSP appears to have quantitative potential comparable to quantitative IHC. It is possible that this technology could be used as a PD-L1 protein measurement system for companion diagnostic testing for immune therapy.
Subjects
B7-H1 Antigen/metabolism , Tissue Array Analysis/methods , B7-H1 Antigen/analysis , Biomarkers/analysis , Biomarkers/metabolism , Cell Line , Humans , Immunohistochemistry/methods , Immunohistochemistry/statistics & numerical data , Protein Array Analysis/methods , Protein Array Analysis/statistics & numerical data , Reproducibility of Results , Tissue Array Analysis/statistics & numerical data
Abstract
With the advent of high-throughput proteomics, the type and amount of data pose a significant challenge to the statistical approaches used to validate current quantitative analysis. Whereas many studies focus on analysis at the protein level, the analysis of peptide-level data provides insight into changes at the sub-protein level, including splice variants, isoforms and a range of post-translational modifications. Statistical evaluation of liquid chromatography-mass spectrometry/mass spectrometry peptide-based label-free differential data is most commonly performed using a t-test or analysis of variance, often after the application of data imputation to reduce the number of missing values. In high-throughput proteomics, statistical analysis methods and imputation techniques are difficult to evaluate, given the lack of gold standard data sets. Here, we use experimental and resampled data to evaluate the performance of four statistical analysis methods and the added value of imputation, for different numbers of biological replicates. We find that three or four replicates are the minimum requirement for high-throughput data analysis and confident assignment of significant changes. Data imputation does increase sensitivity in some cases, but leads to a much higher actual false discovery rate. Additionally, we find that the empirical Bayes method (limma) achieves the highest sensitivity, and we thus recommend its use for performing differential expression analysis at the peptide level.
Subjects
Peptides/genetics , Peptides/metabolism , Proteomics/methods , Bayes Theorem , Chromatography, Liquid , Computational Biology/methods , Computer Simulation , Data Interpretation, Statistical , Databases, Protein/statistics & numerical data , Humans , Protein Array Analysis/statistics & numerical data , Proteomics/statistics & numerical data , Sequence Analysis, Protein/methods , Sequence Analysis, Protein/statistics & numerical data , Tandem Mass Spectrometry
Abstract
Accurate prognostic prediction using molecular information is a challenging area of research that is essential to the development of precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, such as patient survival time. There are considerable statistical and computational challenges due to the high dimensionality of the problem. Furthermore, data are available for different tumor types; hence, data integration across tumors is desirable. Censored survival outcomes add a further level of complexity to the inferential procedure. We develop Bayesian hierarchical survival models that accommodate all of these challenges. We use a hierarchical Bayesian accelerated failure time model for survival regression. Furthermore, we assume a sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated "The Cancer Proteome Atlas" (TCPA), which contains high-quality reverse-phase protein array-based protein expression data as well as detailed clinical annotation, including survival times. Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors through correlated prior structures.
Subjects
Biometry/methods , Neoplasms/metabolism , Neoplasms/mortality , Proteome/metabolism , Proteomics/statistics & numerical data , Bayes Theorem , Computer Simulation , Data Interpretation, Statistical , Humans , Kidney Neoplasms/metabolism , Kidney Neoplasms/mortality , Markov Chains , Models, Statistical , Monte Carlo Method , Prognosis , Protein Array Analysis/statistics & numerical data , Survival Analysis
Abstract
Lung squamous cell carcinoma (LUSCC), a major type of lung cancer, has high morbidity and mortality rates. Far fewer prognostic markers are available for LUSCC than for lung adenocarcinoma. Protein biomarkers, moreover, offer advantages of economy, accuracy, and stability. The aim of this study was to construct a protein prognostic model for LUSCC. Protein expression data for LUSCC were downloaded from The Cancer Proteome Atlas (TCPA) database, and clinical data for LUSCC patients were downloaded from The Cancer Genome Atlas (TCGA) database. A total of 237 proteins were identified in 325 LUSCC patients based on the TCPA and TCGA databases. Using Kaplan-Meier survival analysis and univariate and multivariate Cox analysis, a prognostic prediction model consisting of 6 proteins (CHK1_pS345, CHK2, IRS1, PAXILLIN, BRCA2 and BRAF_pS445) was established. After a risk value was calculated for each patient from the coefficient of each protein in the risk model, the LUSCC patients were divided into a high-risk group and a low-risk group. Survival analysis demonstrated a significant difference between these two groups (p = 4.877e-05). The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was 0.699, suggesting that the prognostic risk model could effectively predict the survival of LUSCC patients. Univariate and multivariate analysis indicated that this prognostic model could be used as an independent prognostic factor for LUSCC patients. Protein co-expression analysis showed that 21 proteins were co-expressed with the proteins in the risk model. In conclusion, our study constructed a protein prognostic model that could effectively predict the prognosis of LUSCC patients.
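The risk-value step described in this abstract can be sketched as a weighted sum of protein expression values followed by a split into two groups. This is a minimal illustration only: the coefficient values below are hypothetical placeholders (the abstract does not report the fitted Cox coefficients), and the use of the median as the cutoff is an assumption, since the abstract does not state how the groups were separated.

```python
from statistics import median

# Hypothetical coefficients for illustration; NOT the values fitted in the study.
COEFFICIENTS = {
    "CHK1_pS345": 0.40,
    "CHK2": -0.35,
    "IRS1": 0.25,
    "PAXILLIN": 0.30,
    "BRCA2": -0.20,
    "BRAF_pS445": 0.15,
}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear risk value: sum of coefficient * expression over the model proteins."""
    return sum(coef * expression[protein] for protein, coef in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score.

    The median cutoff is an assumption made for this sketch.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
```

With real data, `expression` would hold one patient's normalized protein levels keyed by the same protein names, and the resulting groups would then be compared with a survival analysis.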
Subjects
Biomarkers, Tumor/genetics , Carcinoma, Squamous Cell/mortality , Gene Expression Profiling , Lung Neoplasms/mortality , Protein Array Analysis/statistics & numerical data , Carcinoma, Squamous Cell/diagnosis , Carcinoma, Squamous Cell/genetics , Carcinoma, Squamous Cell/pathology , Cell Line, Tumor , Cohort Studies , Datasets as Topic , Female , Gene Expression Regulation, Neoplastic , Humans , Kaplan-Meier Estimate , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Male , Neoplasm Staging , Prognosis , ROC Curve , Risk Assessment/methods
Abstract
Integrating data from multiple regulatory layers across cancer types could elucidate additional mechanisms of oncogenesis. Using antibody-based protein profiling of 736 cancer cell lines, along with matching transcriptomic data, we show that pan-cancer bimodality in the amounts of mRNA, protein, and protein phosphorylation reveals mechanisms related to the epithelial-mesenchymal transition (EMT). Based on the bimodal expression of E-cadherin, we define an EMT signature consisting of 239 genes, many of which were not previously associated with EMT. By querying gene expression signatures collected from cancer cell lines after small-molecule perturbations, we identify enrichment for histone deacetylase (HDAC) inhibitors as inducers of EMT, and kinase inhibitors as mesenchymal-to-epithelial transition (MET) promoters. Causal modeling of protein-based signaling identifies putative drivers of EMT. In conclusion, integrative analysis of pan-cancer proteomic and transcriptomic data reveals key regulatory mechanisms of oncogenic transformation.
Subjects
Epithelial-Mesenchymal Transition/genetics , Neoplasms/genetics , Neoplasms/metabolism , Antigens, CD , Cadherins/genetics , Cadherins/metabolism , Carcinogenesis , Cell Line, Tumor , Computational Biology , Epithelial-Mesenchymal Transition/drug effects , Histone Deacetylase Inhibitors/pharmacology , Humans , Models, Genetic , Models, Statistical , Neoplasms/pathology , Phosphorylation , Protein Array Analysis/statistics & numerical data , Protein Kinase Inhibitors/pharmacology , Proteomics , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Neoplasm/genetics , RNA, Neoplasm/metabolism , Transcriptome
Abstract
Background: The differentially expressed proteins (DEPs) involved in the effect of hydrogen-rich water on myocardial ischemia reperfusion injury (MIRI), together with their biological processes and signaling pathways, were analyzed. Methods: Twenty Wistar rats were randomly and equally divided into a control group and a hydrogen-rich water group. Hearts were removed and mounted in a Langendorff device. The control group was perfused with K-R solution, and the hydrogen-rich water group was perfused with K-R solution + hydrogen-rich water. Protein was extracted from the ventricular tissues, and GSR-CAA-67 was used to identify the DEPs between the two groups. DEPs were analyzed through bioinformatic methods. Results: Compared with the control group, the expression of 25 proteins was significantly decreased in the treatment group (P<0.05). For the DEPs, 359 biological processes, including the regulation of signaling pathways, immune reaction and formation of cardiovascular endothelial cells, were identified by GO enrichment analysis. Five signaling pathways were identified by KEGG pathway enrichment analysis. Conclusions: Twenty-five proteins involved in the reduction of MIRI by hydrogen-rich water were identified by high-throughput GSR-CAA-67. The biological processes and metabolic pathways involving the DEPs were summarized, providing theoretical evidence for the clinical application of hydrogen-rich water.
Subjects
Hydrogen/pharmacology , Myocardial Reperfusion Injury/drug therapy , Myocardial Reperfusion Injury/metabolism , Myocardium/metabolism , Proteins/metabolism , Animals , Cluster Analysis , Computational Biology , Gene Ontology , Male , Protein Array Analysis/statistics & numerical data , Proteins/analysis , Rats, Wistar , Water/chemistry
Abstract
In this paper, we introduce a novel computational method for constructing protein networks based on reverse phase protein array (RPPA) data to identify complex patterns in protein signaling. The method is applied to phosphoproteomic profiles of basal expression and activation/phosphorylation of 76 key signaling proteins in three breast cancer cell lines (MCF7, LCC1, and LCC9). Temporal RPPA data are acquired at 48h, 96h, and 144h after knocking down four genes in separate experiments. These genes are selected from a previous study as important determinants for breast cancer survival. Interaction networks are constructed by analyzing the expression levels of protein pairs using a multivariate analysis of variance model. A new scoring criterion is introduced to determine relevant protein pairs. Through a network topology based analysis, we search for wiring patterns to identify key proteins that are associated with significant changes in expression levels across various experimental conditions.
Subjects
Breast Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Neoplasm Proteins/genetics , Protein Array Analysis/statistics & numerical data , Protein Processing, Post-Translational , ATPases Associated with Diverse Cellular Activities/antagonists & inhibitors , ATPases Associated with Diverse Cellular Activities/genetics , ATPases Associated with Diverse Cellular Activities/metabolism , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Cell Line, Tumor , Cysteine-Rich Protein 61/antagonists & inhibitors , Cysteine-Rich Protein 61/genetics , Cysteine-Rich Protein 61/metabolism , Female , Humans , Intracellular Signaling Peptides and Proteins/antagonists & inhibitors , Intracellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/metabolism , MCF-7 Cells , Multivariate Analysis , Neoplasm Proteins/antagonists & inhibitors , Neoplasm Proteins/metabolism , Phosphorylation , Proteasome Endopeptidase Complex/genetics , Proteasome Endopeptidase Complex/metabolism , RNA Polymerase II/antagonists & inhibitors , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Signal Transduction , Tumor Suppressor Proteins/antagonists & inhibitors , Tumor Suppressor Proteins/genetics , Tumor Suppressor Proteins/metabolism
Abstract
Mass spectrometry is being used to identify protein biomarkers that can facilitate the development of drug treatments. Mass spectrometry-based labeling proteomic experiments produce complex data that are hierarchical in nature, often from studies with small sample sizes. The generalized linear model (GLM) is the most popular approach in proteomics for comparing protein abundances between groups. However, GLM does not address all the complexities of proteomics data, such as repeated measures and variance heterogeneity. Linear models for microarray data (LIMMA) and mixed models are two approaches that can address some of these data complexities to provide better statistical estimates. We compared these three statistical models (GLM, LIMMA, and mixed models) under two different normalization approaches (quantile normalization and median sweeping) to demonstrate when each approach performs best for tagged proteins. We evaluated these methods using a spiked-in data set of known protein abundances, a systemic lupus erythematosus (SLE) data set, and simulated data from multiplexed labeling experiments that use tandem mass tags (TMT). Data are available via ProteomeXchange with identifier PXD005486. We found median sweeping to be the preferred approach to data normalization; with this normalization approach, findings overlapped across all methods, with GLM being a subset of mixed models. We conclude that the mixed model had the best type I error with median sweeping, whereas LIMMA had the better overall statistical properties regardless of normalization approach.
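As an illustration of the median sweeping normalization named above, the sketch below follows one commonly described three-step formulation: log-transform and center each spectrum on its own median, summarize each protein per channel by the median over its spectra, then center each channel on its median across proteins. The abstract itself does not spell out these steps, so the procedure, the input layout, and the function name `median_sweep` are assumptions made for this example.

```python
import math
from collections import defaultdict
from statistics import median


def median_sweep(spectra):
    """Normalize reporter-ion intensities by median sweeping (illustrative sketch).

    spectra: list of (protein_id, [intensity per channel]) with positive intensities.
    Returns {protein_id: [normalized log2 abundance per channel]}.
    """
    # Step 1: log2-transform and subtract each spectrum's own median.
    by_protein = defaultdict(list)
    for protein, intensities in spectra:
        logs = [math.log2(x) for x in intensities]
        center = median(logs)
        by_protein[protein].append([v - center for v in logs])

    # Step 2: protein abundance per channel = median over that protein's spectra.
    proteins = {
        p: [median(column) for column in zip(*rows)]
        for p, rows in by_protein.items()
    }

    # Step 3: subtract each channel's median across all proteins.
    n_channels = len(next(iter(proteins.values())))
    channel_medians = [
        median(values[c] for values in proteins.values()) for c in range(n_channels)
    ]
    return {
        p: [values[c] - channel_medians[c] for c in range(n_channels)]
        for p, values in proteins.items()
    }
```

After the final step, every channel has a median of zero across proteins, which is the property that makes channels comparable before a model such as GLM, LIMMA, or a mixed model is fitted.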
Subjects
Blood Proteins/isolation & purification , Escherichia coli Proteins/isolation & purification , Lupus Erythematosus, Systemic/genetics , Models, Statistical , Protein Array Analysis/statistics & numerical data , Blood Proteins/chemistry , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/chemistry , Humans , Lupus Erythematosus, Systemic/blood , Lupus Erythematosus, Systemic/diagnosis , Lupus Erythematosus, Systemic/pathology , Proteomics/methods , Proteomics/statistics & numerical data , Staining and Labeling/methods
Abstract
Peptide Microarray Immunoassay (PMI for brevity) is a novel technology that enables researchers to map a large number of proteomic measurements at a peptide level, providing information regarding the relationship between antibody response and clinical sensitivity. PMI studies aim at recognizing antigen-specific antibodies from serum samples and at detecting epitope regions of the protein antigen. PMI data present new challenges for statistical analysis mainly due to the structural dependence among peptides. A PMI is made of a complete library of consecutive peptides. They are synthesized by systematically shifting a window of a fixed number of amino acids through the finite sequence of amino acids of the antigen protein as ordered in the primary structure of the protein. This implies that consecutive peptides have a certain number of amino acids in common and hence are structurally dependent. We propose a new flexible Bayesian hierarchical model framework, which allows one to detect recognized peptides and bound epitope regions in a single framework, taking into account the structural dependence between peptides through a suitable latent Markov structure. The proposed model is illustrated using PMI data from a recent study about egg allergy. A simulation study shows that the proposed model is more powerful and robust in terms of epitope detection than simpler models overlooking some of the dependence structure.
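The construction of the peptide library described above, a fixed-length window shifted along the antigen sequence, can be sketched as follows. The default window length and shift are illustrative assumptions, not values taken from the study.

```python
def overlapping_peptides(sequence, length=15, shift=3):
    """Return the consecutive peptides obtained by sliding a window along `sequence`.

    Consecutive peptides share `length - shift` residues, which is the source of
    the structural dependence between neighboring spots on the array.
    """
    if length <= 0 or shift <= 0:
        raise ValueError("length and shift must be positive")
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, shift)]
```

For example, with `length=4` and `shift=2` the sequence `"ABCDEFGH"` yields `ABCD`, `CDEF`, and `EFGH`, where each peptide shares two residues with its neighbor.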
Subjects
Epitopes , Models, Statistical , Protein Array Analysis/statistics & numerical data , Bayes Theorem , Biostatistics , Desensitization, Immunologic , Egg Hypersensitivity/immunology , Egg Hypersensitivity/therapy , Egg Proteins, Dietary/immunology , Epitopes/genetics , Humans , Markov Chains , Ovalbumin/immunology , Peptides/genetics , Peptides/immunology , Proteomics/statistics & numerical data , Signal-To-Noise Ratio
Abstract
BACKGROUND: The latest version of the microarray-based test ImmunoCAP ISAC 112™ includes the native walnut (Juglans regia) molecules 2S albumin (nJug r 1), vicilin (nJug r 2) and lipid transfer protein (nJug r 3). In view of the many unexpected cases of isolated positivity to nJug r 2 occurring in daily practice, we evaluated the association of these reactivities with clinical symptoms, as well as the relationship between sIgE to nJug r 2 and sIgE to cross-reactive carbohydrate determinants (CCDs). METHODS: Sera from 320 consecutive allergic outpatients tested by ImmunoCAP ISAC 112™ were considered. The medical records of all nJug r 2-positive patients were reviewed to assess clinical symptoms related to walnut allergy. A linear regression analysis was performed to evaluate the correlation between nJug r 2 and CCD (nMUXF3) sIgE values, and a CAP inhibition assay was carried out to confirm the possible cross-reactivity between CCDs and nJug r 2. RESULTS: Thirty-seven of the 320 sera tested (11.6%) were positive to nJug r 2. Among them, three (8.1%) and eight (21.6%) also scored positive for nJug r 1 and nJug r 3, respectively. Twenty-seven (73%) sera showed isolated nJug r 2 positivity. Only nJug r 1 reactors had symptoms attributable to walnut allergy. Twenty-five of the 37 nJug r 2-positive sera (67.6%) were also positive to nMUXF3, and the IgE levels to nJug r 2 and nMUXF3 were significantly correlated (p<0.0001; r²=0.787). After incubation with nMUXF3, sIgE reactivity to both nMUXF3 and nJug r 2 was completely inhibited. CONCLUSIONS: The unexpected isolated sIgE reactivity to nJug r 2 found by ImmunoCAP ISAC 112™ is frequently related to reactivity to cross-reactive carbohydrate epitopes and lacks clinical significance.
Subjects
Allergens/blood , Carbohydrates/immunology , Carrier Proteins/blood , Immunoglobulin E/blood , Nut Hypersensitivity/blood , Protein Array Analysis/statistics & numerical data , Seed Storage Proteins/blood , Allergens/immunology , Bias , Carrier Proteins/immunology , Cross Reactions , Epitopes/immunology , Humans , Juglans/chemistry , Juglans/immunology , Linear Models , Nut Hypersensitivity/diagnosis , Nut Hypersensitivity/immunology , Outpatients , Seed Storage Proteins/immunology
Abstract
Using a new type of array technology, the reverse phase protein array (RPPA), we measure time-course protein expression for a set of selected markers that are known to coregulate biological functions in a pathway structure. To accommodate the complex dependent nature of the data, including temporal correlation and pathway dependence for the protein markers, we propose a mixed effects model with temporal and protein-specific components. We develop a sequence of random probability measures (RPM) to account for the dependence in time of the protein expression measurements. Marginally, for each RPM we assume a Dirichlet process model. The dependence is introduced by defining multivariate beta distributions for the unnormalized weights of the stick-breaking representation. We also acknowledge the pathway dependence among proteins via a conditionally autoregressive model. Applying our model to the RPPA data, we reveal a pathway-dependent functional profile for the set of proteins as well as marginal expression profiles over time for individual markers.
Subjects
Models, Statistical , Protein Array Analysis/statistics & numerical data , Proteomics/statistics & numerical data , Bayes Theorem , Biomarkers, Tumor/metabolism , Biometry , Cell Line, Tumor , Data Interpretation, Statistical , ErbB Receptors/antagonists & inhibitors , ErbB Receptors/metabolism , Female , Humans , Lapatinib , Linear Models , Markov Chains , Monte Carlo Method , Multivariate Analysis , Ovarian Neoplasms/drug therapy , Ovarian Neoplasms/metabolism , Quinazolines/pharmacology , Signal Transduction/drug effects , Statistics, Nonparametric
Abstract
A major challenge in the field of high-throughput proteomics is the conversion of the large volume of experimental data that is generated into biological knowledge. Typically, proteomics experiments involve the combination and comparison of multiple data sets and the analysis and annotation of these combined results. Although there are some commercial applications that provide some of these functions, there is a need for a free, open source, multifunction tool for advanced proteomics data analysis. We have developed the Visualize program, which allows users to visualize, analyze, and annotate proteomics data; combine data from multiple runs; and quantitate differences between individual runs and combined data sets. Visualize is licensed under the GNU GPL and can be downloaded from http://proteomics.mcw.edu/visualize. It is available as compiled client-based executable files for both Windows and Mac OS X platforms, as well as Perl source code.
Subjects
Proteomics/statistics & numerical data , Software , Algorithms , Amino Acid Sequence , Computational Biology , Computer Simulation , Data Interpretation, Statistical , Databases, Protein/statistics & numerical data , Humans , Mass Spectrometry/statistics & numerical data , Protein Array Analysis/statistics & numerical data , Proteins/chemistry , Proteins/isolation & purification
Abstract
Nuisance factors in a protein-array study add obfuscating variation to spot intensity measurements, diminishing the accuracy and precision of protein concentration predictions. The effects of nuisance factors may be reduced by design of experiments, and by estimating and then subtracting nuisance effects. Estimated nuisance effects also inform about the quality of the study and suggest refinements for future studies. We demonstrate a method to reduce nuisance effects by incorporating a non-interfering internal calibration in the study design and its complementary analysis of variance. We illustrate this method by applying a chip-level internal calibration in a biomarker discovery study. The variability of sample intensity estimates was reduced by 16% to 92%, with a median of 58%; confidence interval widths were reduced by 8% to 70%, with a median of 35%. Calibration diagnostics revealed processing nuisance trends potentially related to spot print order and chip location on a slide. The accuracy and precision of a protein-array study may be increased by incorporating a non-interfering internal calibration. Internal calibration modeling diagnostics improve confidence in study results and suggest process steps that may need refinement. Though developed for our protein-array studies, this internal calibration method is applicable to other targeted array-based studies.
Subjects
Protein Array Analysis/statistics & numerical data , Analysis of Variance , Biostatistics , Enzyme-Linked Immunosorbent Assay/methods , Enzyme-Linked Immunosorbent Assay/statistics & numerical data , Humans , Models, Statistical , Protein Array Analysis/methods
Abstract
The biomedical literature has always played a critical role in the development of hypotheses to test, experimental design, and the analysis of study results. Yet, the ever-expanding body of biomedical literature is starting to present new challenges, in which locating pertinent literature from among the millions of published research articles is often a challenging task. A regular expression-based pattern matching method has been developed to profile the various gene and protein factors that may play a role in various tissues contained within an organism. This methodology has been demonstrated through the profiling of the various factors that are involved in the development of the inner ear, and is shown to be both effective and accurate.
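A regular expression-based profiling pass of the kind described above might look like the following minimal sketch. The factor families, synonyms, and tissue terms are hypothetical examples chosen for illustration; the abstract does not list the patterns that were actually used.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only.
FACTOR_PATTERNS = {
    "FGF": re.compile(r"\bFGF[- ]?\d+\b", re.IGNORECASE),
    "BMP": re.compile(r"\bBMP[- ]?\d+\b", re.IGNORECASE),
    "SHH": re.compile(r"\b(?:SHH|sonic hedgehog)\b", re.IGNORECASE),
}
TISSUE_PATTERN = re.compile(r"\b(?:inner ear|cochlea|otic vesicle)\b", re.IGNORECASE)


def profile_factors(abstracts):
    """Count, per factor family, the abstracts that mention it alongside the tissue."""
    counts = Counter()
    for text in abstracts:
        if not TISSUE_PATTERN.search(text):
            continue  # skip abstracts that never mention the tissue of interest
        for family, pattern in FACTOR_PATTERNS.items():
            if pattern.search(text):
                counts[family] += 1
    return counts
```

Counting each abstract at most once per family keeps a single paper that repeats a gene name many times from dominating the profile.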
Subjects
Ear, Inner/growth & development , Ear, Inner/metabolism , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation, Developmental , Computational Biology , Data Mining , Growth Substances/genetics , Growth Substances/metabolism , Humans , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Pattern Recognition, Automated , Protein Array Analysis/statistics & numerical data , Software
Abstract
Mass spectrometry is one of the main tools for protein identification in complex mixtures. When the sequence of the protein is known, we can check to see if the known mass distribution of peptides for a given protein is present in the recorded mass distribution of the mixture being analyzed. Unfortunately, this general approach suffers from high false-positive rates, since in a complex mixture, the likelihood that we will observe any particular mass distribution is high, whether or not the protein of interest is in the mixture. In this paper, we propose a scoring methodology and algorithm for protein identification that make use of a new experimental technique, which we call receptor arrays, for separating a mixture based on another differentiating property of peptides called isoelectric point (pI). We perform extensive simulation experiments on several genomes and show that additional information about peptides can achieve an average 30% reduction in false-positive rates over existing methods, while achieving very high true-positive identification rates.
Subjects
Protein Array Analysis/methods; Proteins/chemistry; Proteomics/methods; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods; Algorithms; Archaeal Proteins/chemistry; Archaeal Proteins/genetics; Archaeal Proteins/isolation & purification; Computational Biology; Escherichia coli Proteins/chemistry; Escherichia coli Proteins/genetics; Escherichia coli Proteins/isolation & purification; Isoelectric Point; Protein Array Analysis/statistics & numerical data; Proteins/genetics; Proteins/isolation & purification; Proteomics/statistics & numerical data; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/isolation & purification; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/statistics & numerical data
ABSTRACT
Protein microarrays are similar to DNA microarrays: both enable the parallel interrogation of thousands of probes immobilized on a surface. Consequently, they have benefited from technologies previously developed for DNA microarrays. However, assumptions for the analysis of DNA microarrays do not always translate to protein arrays, especially in the case of normalization. Hence, we have developed an experimental and computational framework to assess normalization procedures for protein microarrays. Specifically, we profiled two sera with markedly different autoantibody compositions. To analyze intra- and interarray variability, we compared a set of control proteins across subarrays and the corresponding spots across multiple arrays, respectively. To estimate the degree to which the normalization could help reveal true biological separability, we tested the difference in the signal between the sera relative to the variability within replicates. Next, by mixing the sera in different proportions (titrations), we correlated the reactivity of proteins with serum concentration. Finally, we analyzed the effect of normalization procedures on the list of reactive proteins. We compared global and quantile normalization, techniques that have traditionally been employed for DNA microarrays, with a novel normalization approach based on a robust linear model (RLM) making explicit use of control proteins. We show that RLM normalization is able to reduce both intra- and interarray technical variability while maintaining biological differences. Moreover, in titration experiments, RLM normalization enhances the correlation of protein signals with serum concentration. Conversely, while quantile and global normalization can reduce interarray technical variability, neither is as effective as RLM normalization in maintaining biological differences. Most importantly, both introduce artifacts that distort the signals and affect the correct identification of reactive proteins, impairing their use for biomarker discovery. Hence, we show RLM normalization is better suited to protein arrays than approaches used for DNA microarrays.
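The idea of normalizing on control proteins rather than on the whole signal distribution can be sketched as follows. This is a deliberately simplified median-offset stand-in, not the robust linear model the abstract describes. The spot names, the log2-scale signal values, and the function name are assumptions made for the example.

```python
from statistics import median

def normalize_by_controls(arrays, control_ids):
    """Subtract a per-array offset estimated from control spots (log2 scale)."""
    # Reference level for each control spot: its median across all arrays.
    reference = {
        c: median(array[c] for array in arrays) for c in control_ids
    }
    normalized = []
    for array in arrays:
        # Robust estimate of this array's technical shift from the reference.
        offset = median(array[c] - reference[c] for c in control_ids)
        normalized.append({spot: value - offset for spot, value in array.items()})
    return normalized

# Made-up log2 signals: array 2 carries a uniform technical shift of +1
# on its controls, plus a larger difference on proteinA.
arrays = [
    {"ctrl1": 10.0, "ctrl2": 12.0, "ctrl3": 8.0, "proteinA": 9.0},
    {"ctrl1": 11.0, "ctrl2": 13.0, "ctrl3": 9.0, "proteinA": 14.0},
]
result = normalize_by_controls(arrays, ["ctrl1", "ctrl2", "ctrl3"])
```

In this toy case the controls line up after normalization, while the difference in `proteinA` shrinks from 5 to 4 rather than vanishing: only the shift shared with the controls is removed. Quantile normalization, by contrast, forces every array onto the same distribution, which is one way real differences between sera could be distorted.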
Subjects
Autoantibodies/blood; Linear Models; Protein Array Analysis/statistics & numerical data; Humans; Models, Statistical; Normal Distribution
ABSTRACT
INTRODUCTION: Few studies have searched for specific biomarkers of thromboembolic (arterial and venous) diseases using Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (SELDI-TOF-MS). MATERIALS AND METHODS: We screened for potential biomarkers in 69 plasma samples, including samples from 20 patients with idiopathic deep vein thrombosis (DVT), 20 patients with acute myocardial infarction (AMI), and 29 healthy controls without a history of thromboembolism. Pretreated plasma samples were analyzed on the Protein Biology System IIc plus SELDI-TOF-MS (Ciphergen Biosystems, Fremont, CA). Proteomic spectra of mass-to-charge ratio (m/z) were generated by applying plasma to immobilized metal affinity capture (IMAC-3) ProteinChip arrays activated with copper. RESULTS: A pattern of three biomarkers (m/z: 2 667, 5 914, and 6 890 Da, respectively) with a total accuracy of 100% was selected based on their collective contribution to the optimal separation between patients with AMI and healthy controls. Another pattern consisting of only one biomarker (m/z: 5 914 Da) completely discriminated patients with DVT from control subjects. For further analysis between patients with AMI and those with DVT, a pattern of four biomarkers (m/z: 3 418, 5 271, 33 378, and 68 125 Da, respectively) was selected with a total accuracy of 82.5%. CONCLUSIONS: Plasma proteomic profiling with SELDI-TOF-MS and ProteinChip technologies provides high sensitivity and specificity in discriminating patients with thrombosis from healthy subjects. The discovered biomarkers might show great potential for early diagnosis of thromboembolic diseases.
Subjects
Biomarkers/blood; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods; Thromboembolism/blood; Thromboembolism/diagnosis; Adult; Aged; Aged, 80 and over; Blood Chemical Analysis/methods; Blood Chemical Analysis/statistics & numerical data; Blood Proteins/analysis; Case-Control Studies; Female; Humans; Male; Middle Aged; Myocardial Infarction/blood; Myocardial Infarction/diagnosis; Protein Array Analysis/methods; Protein Array Analysis/statistics & numerical data; ROC Curve; Reproducibility of Results; Sensitivity and Specificity; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/statistics & numerical data
ABSTRACT
In recent years, antibody microarray technology has made significant progress, going from proof-of-concept designs to established high-performing technology platforms capable of targeting non-fractionated complex proteomes. In these cross-disciplinary efforts, a particular focus has lately been placed on two key technological issues: sample handling and data handling. To this end, robust protocols have been designed for direct labelling of whole proteomes compatible with sensitive fluorescence-based sensing. Tagging of the proteins with biotin in a single-colour approach has, in many cases, proven to be the preferred approach. Furthermore, based on modified approaches adopted from the DNA microarray field, the first bioinformatic standards for antibody microarray data analysis have emerged, though general standard operating procedures remain to be implemented.