Search | BVS Violência e Saúde

Using Deep Learning to Extrapolate Protein Expression Measurements.

Barzine, Mitra Parissa; Freivalds, Karlis; Wright, James C; Opmanis, Martins; Rituma, Darta; Ghavidel, Fatemeh Zamanzad; Jarnuczak, Andrew F; Celms, Edgars; Cerans, Karlis; Jonassen, Inge; Lace, Lelde; Vizcaíno, Juan Antonio; Choudhary, Jyoti Sharma; Brazma, Alvis; Viksna, Juris.

Proteomics ; 20(21-22): e2000009, 2020 Nov.

Article in English | MEDLINE | ID: mdl-32937025

ABSTRACT

Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein-coding genes. Computational methods for imputing missing values using RNA expression data usually allow imputation only for proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed that uses deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. The method is tested on four datasets, including human cell lines and human and mouse tissues. It predicts protein expression values with average R² scores between 0.46 and 0.54, which is significantly better than predictions based on correlations using the RNA expression data alone. Moreover, it is demonstrated that the derived models can be "transferred" across experiments and species. For instance, the model derived from human tissues gave an R² of 0.51 when applied to mouse tissue data. It is concluded that protein abundances generated in label-free MS experiments can be computationally predicted using functionally annotated attributes, and that such predictions can be used to highlight aberrant protein abundance values.

Subjects

Deep Learning, Animals, Mass Spectrometry, Mice, Molecular Sequence Annotation, Proteins, Proteomics
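The abstract above summarises predictive accuracy as average R² scores. As a minimal sketch, assuming the standard coefficient of determination R² = 1 - SS_res / SS_tot (the abstract does not say how the scores were computed or averaged), the metric can be calculated from observed and predicted abundances as follows. The function name and toy values are hypothetical and are not taken from the study.

```python
def r2_score(observed, predicted):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    Hypothetical helper for illustration; not the study's code.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Toy log-abundance values (illustrative only, not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(r2_score(observed, predicted))  # 0.9
```

A score of 1 means perfect prediction, 0 means no better than always predicting the mean, and poor models can score below 0. On this scale, the reported range of 0.46 to 0.54 corresponds to roughly half of the variance in measured abundance being explained.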

Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants.

Petryszak, Robert; Keays, Maria; Tang, Y Amy; Fonseca, Nuno A; Barrera, Elisabet; Burdett, Tony; Füllgrabe, Anja; Fuentes, Alfonso Muñoz-Pomer; Jupp, Simon; Koskinen, Satu; Mannion, Oliver; Huerta, Laura; Megy, Karine; Snow, Catherine; Williams, Eleanor; Barzine, Mitra; Hastings, Emma; Weisser, Hendrik; Wright, James; Jaiswal, Pankaj; Huber, Wolfgang; Choudhary, Jyoti; Parkinson, Helen E; Brazma, Alvis.

Nucleic Acids Res ; 44(D1): D746-52, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26481351

ABSTRACT

Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown seven-fold (1572 studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects. Plant studies constitute a quarter of Atlas data. For genes of interest, the user can view baseline expression in tissues, and differential expression for biologically meaningful pairwise comparisons, estimated using consistent methodology across all of Atlas. Our first proteomics study in human tissues is now displayed alongside transcriptomics data in the same tissues. Novel analyses and visualisations include: 'enrichment' in each differential comparison of GO terms, Reactome, Plant Reactome pathways and InterPro domains; hierarchical clustering (by baseline expression) of most variable genes and experimental conditions; and, for a given gene-condition, distribution of baseline expression across biological replicates.

Subjects

Databases, Genetic, Gene Expression Profiling, Plants/metabolism, Proteins/metabolism, Proteomics, Animals, Cell Line, Tumor, Humans, Plants/genetics, User-Computer Interface

An integrated landscape of protein expression in human cancer.

Jarnuczak, Andrew F; Najgebauer, Hanna; Barzine, Mitra; Kundu, Deepti J; Ghavidel, Fatemeh; Perez-Riverol, Yasset; Papatheodorou, Irene; Brazma, Alvis; Vizcaíno, Juan Antonio.

Sci Data ; 8(1): 115, 2021 Apr 23.

Article in English | MEDLINE | ID: mdl-33893311

ABSTRACT

Using 11 proteomics datasets, mostly available through the PRIDE database, we assembled a reference expression map for 191 cancer cell lines and 246 clinical tumour samples across 13 lineages. We found unique peptides identified only in tumour samples, despite much higher coverage in cell lines. These peptides mainly mapped to proteins related to regulation of signalling receptor activity. We calculated correlations between baseline expression in cell lines and tumours, and found these to be highly similar across all samples, with the greatest similarity within a given sample type. Integration of proteomics and transcriptomics data showed a median correlation across cell lines of 0.58 (range 0.43 to 0.66). Additionally, in agreement with previous studies, variation in mRNA levels was often a poor predictor of changes in protein abundance. To our knowledge, this work constitutes the first meta-analysis focusing on cancer-related public proteomics datasets. We therefore also highlight shortcomings and limitations of such studies. All data are available through PRIDE dataset identifier PXD013455 and in Expression Atlas.

Subjects

Neoplasm Proteins/biosynthesis, Neoplasms/metabolism, Cell Line, Tumor, Datasets as Topic, Humans, Neoplasm Proteins/genetics, Neoplasms/genetics, Proteomics, RNA, Messenger/biosynthesis, RNA, Messenger/genetics, Transcriptome
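The abstract above reports a median mRNA/protein correlation across cell lines, with a range. A minimal sketch of one way such a summary could be computed follows, assuming Pearson correlation over the genes quantified in both data types within each cell line, followed by the median, minimum and maximum across cell lines. The abstract does not state the coefficient, gene filtering or normalisation used; the nested-dictionary layout, function names and toy values are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def median_mrna_protein_correlation(mrna, protein):
    """Per-cell-line mRNA/protein correlation, summarised across cell lines.

    mrna, protein: hypothetical layout {cell_line: {gene: abundance}}.
    For each cell line present in both, correlate values over the genes
    quantified in both, then return (median, minimum, maximum).
    """
    per_line = []
    for line in mrna.keys() & protein.keys():
        genes = sorted(mrna[line].keys() & protein[line].keys())
        if len(genes) < 2:
            continue  # too few shared genes to correlate
        r = pearson([mrna[line][g] for g in genes],
                    [protein[line][g] for g in genes])
        per_line.append(r)
    if not per_line:
        raise ValueError("no cell line has enough shared genes")
    return statistics.median(per_line), min(per_line), max(per_line)


# Toy values (illustrative only, not data from the study).
mrna = {"line_A": {"g1": 1.0, "g2": 2.0, "g3": 3.0},
        "line_B": {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 5.0}}
protein = {"line_A": {"g1": 2.0, "g2": 4.0, "g3": 6.0},
           "line_B": {"g1": 3.0, "g2": 1.0, "g3": 2.0}}
print(median_mrna_protein_correlation(mrna, protein))
```

Summarising per cell line with the median, rather than pooling every sample into one correlation, keeps a single outlying cell line from dominating the figure; a rank-based coefficient would be an equally plausible choice that the abstract does not rule out.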

Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow.

Wright, James C; Mudge, Jonathan; Weisser, Hendrik; Barzine, Mitra P; Gonzalez, Jose M; Brazma, Alvis; Choudhary, Jyoti S; Harrow, Jennifer.

Nat Commun ; 7: 11778, 2016 Jun 02.

Article in English | MEDLINE | ID: mdl-27250503

ABSTRACT

Complete annotation of the human genome is indispensable for medical research. The GENCODE consortium strives to provide this, augmenting computational and experimental evidence with manual annotation. The rapidly developing field of proteogenomics provides evidence for the translation of genes into proteins and can be used to discover and refine gene models. However, for both the proteomics and annotation groups, there is a lack of guidelines for integrating these data. Here we report a stringent workflow for the interpretation of proteogenomic data that could be used by the annotation community to interpret novel proteogenomic evidence. Based on reprocessing of three large-scale, publicly available human data sets, we show that a conservative approach using stringent filtering is required to generate valid identifications. We found evidence supporting the addition of 16 novel protein-coding genes to GENCODE. Despite this, many peptide identifications in pseudogenes cannot be annotated because orthogonal supporting evidence is lacking.

Subjects

Genome, Human, Molecular Sequence Annotation/methods, Proteins/genetics, Proteogenomics/methods, Pseudogenes, Amino Acid Sequence, Gene Expression Regulation, Gene Ontology, Humans, Molecular Sequence Annotation/statistics & numerical data, Open Reading Frames, Proteins/metabolism
