Results 1 - 4 of 4
1.
Sci Data ; 8(1): 115, 2021 04 23.
Article in English | MEDLINE | ID: mdl-33893311

ABSTRACT

Using 11 proteomics datasets, mostly available through the PRIDE database, we assembled a reference expression map for 191 cancer cell lines and 246 clinical tumour samples, across 13 lineages. We found unique peptides identified only in tumour samples despite a much higher coverage in cell lines. These were mainly mapped to proteins related to regulation of signalling receptor activity. Correlations between baseline expression in cell lines and tumours were calculated. We found these to be highly similar across all samples with most similarity found within a given sample type. Integration of proteomics and transcriptomics data showed median correlation across cell lines to be 0.58 (range between 0.43 and 0.66). Additionally, in agreement with previous studies, variation in mRNA levels was often a poor predictor of changes in protein abundance. To our knowledge, this work constitutes the first meta-analysis focusing on cancer-related public proteomics datasets. We therefore also highlight shortcomings and limitations of such studies. All data is available through PRIDE dataset identifier PXD013455 and in Expression Atlas.


Subject(s)
Neoplasm Proteins/biosynthesis; Neoplasms/metabolism; Cell Line, Tumor; Datasets as Topic; Humans; Neoplasm Proteins/genetics; Neoplasms/genetics; Proteomics; RNA, Messenger/biosynthesis; RNA, Messenger/genetics; Transcriptome
2.
Proteomics ; 20(21-22): e2000009, 2020 11.
Article in English | MEDLINE | ID: mdl-32937025

ABSTRACT

Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein coding genes. Computational methods for imputing the missing values using RNA expression data usually allow only for imputations of proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed using deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. This method is tested on four datasets, including human cell lines and human and mouse tissues. This method predicts the protein expression values with average R² scores between 0.46 and 0.54, which is significantly better than predictions based on correlations using the RNA expression data alone. Moreover, it is demonstrated that the derived models can be "transferred" across experiments and species. For instance, the model derived from human tissues gave an R² = 0.51 when applied to mouse tissue data. It is concluded that protein abundances generated in label-free MS experiments can be computationally predicted using functional annotated attributes and can be used to highlight aberrant protein abundance values.


Subject(s)
Deep Learning; Animals; Mass Spectrometry; Mice; Molecular Sequence Annotation; Proteins; Proteomics
3.
Nat Commun ; 7: 11778, 2016 06 02.
Article in English | MEDLINE | ID: mdl-27250503

ABSTRACT

Complete annotation of the human genome is indispensable for medical research. The GENCODE consortium strives to provide this, augmenting computational and experimental evidence with manual annotation. The rapidly developing field of proteogenomics provides evidence for the translation of genes into proteins and can be used to discover and refine gene models. However, for both the proteomics and annotation groups, there is a lack of guidelines for integrating this data. Here we report a stringent workflow for the interpretation of proteogenomic data that could be used by the annotation community to interpret novel proteogenomic evidence. Based on reprocessing of three large-scale publicly available human data sets, we show that a conservative approach, using stringent filtering, is required to generate valid identifications. Evidence has been found supporting 16 novel protein-coding genes being added to GENCODE. Despite this, many peptide identifications in pseudogenes cannot be annotated due to the absence of orthogonal supporting evidence.


Subject(s)
Genome, Human; Molecular Sequence Annotation/methods; Proteins/genetics; Proteogenomics/methods; Pseudogenes; Amino Acid Sequence; Gene Expression Regulation; Gene Ontology; Humans; Molecular Sequence Annotation/statistics & numerical data; Open Reading Frames; Proteins/metabolism
4.
Nucleic Acids Res ; 44(D1): D746-52, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26481351

ABSTRACT

Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown seven-fold (1572 studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects. Plant studies constitute a quarter of Atlas data. For genes of interest, the user can view baseline expression in tissues, and differential expression for biologically meaningful pairwise comparisons, estimated using consistent methodology across all of Atlas. Our first proteomics study in human tissues is now displayed alongside transcriptomics data in the same tissues. Novel analyses and visualisations include: 'enrichment' in each differential comparison of GO terms, Reactome, Plant Reactome pathways and InterPro domains; hierarchical clustering (by baseline expression) of most variable genes and experimental conditions; and, for a given gene-condition, distribution of baseline expression across biological replicates.


Subject(s)
Databases, Genetic; Gene Expression Profiling; Plants/metabolism; Proteins/metabolism; Proteomics; Animals; Cell Line, Tumor; Humans; Plants/genetics; User-Computer Interface