Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 3 de 3
Filtrar
Mais filtros











Base de dados
Intervalo de ano de publicação
1.
Hum Mol Genet ; 23(22): 5866-78, 2014 Nov 15.
Artigo em Inglês | MEDLINE | ID: mdl-24939910

RESUMO

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.


Assuntos
Proteínas/genética , Biologia Computacional , Genoma Humano , Humanos , Fases de Leitura Aberta , Peptídeos/genética , Proteínas/metabolismo , Proteômica
2.
Mol Biol Evol ; 29(9): 2265-83, 2012 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-22446687

RESUMO

Advances in high-throughput mass spectrometry are making proteomics an increasingly important tool in genome annotation projects. Peptides detected in mass spectrometry experiments can be used to validate gene models and verify the translation of putative coding sequences (CDSs). Here, we have identified peptides that cover 35% of the genes annotated by the GENCODE consortium for the human genome as part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases. We detected the translation to protein of "novel" and "putative" protein-coding transcripts as well as transcripts annotated as pseudogenes and nonsense-mediated decay targets. We provide a detailed overview of the population of alternatively spliced protein isoforms that are detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered. Three groups of genes were highly overrepresented. We detected alternative isoforms for 10 of the 25 possible heterogeneous nuclear ribonucleoproteins, proteins with a key role in the splicing process. Alternative isoforms generated from interchangeable homologous exons and from short indels were also significantly enriched, both in human experiments and in parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (almost 25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts. Many of the alternative splicing events that give rise to these alternative isoforms are conserved in mouse. It was striking that very few of these conserved splicing events broke Pfam functional domains or would damage globular protein structures. This evidence of a strong bias toward subtle differences in CDS and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints.


Assuntos
Processamento Alternativo , Proteínas/química , Proteínas/genética , Proteômica , Sequência de Aminoácidos , Animais , Domínio Catalítico , Drosophila , Genoma , Humanos , Camundongos , Modelos Moleculares , Anotação de Sequência Molecular , Dados de Sequência Molecular , Degradação do RNAm Mediada por Códon sem Sentido , Peptídeos/química , Peptídeos/genética , Complexo de Endopeptidases do Proteassoma/química , Biossíntese de Proteínas , Conformação Proteica , Domínios e Motivos de Interação entre Proteínas , Isoformas de Proteínas , Proteínas/metabolismo , Alinhamento de Sequência
3.
Genome Res ; 21(5): 756-67, 2011 May.
Artigo em Inglês | MEDLINE | ID: mdl-21460061

RESUMO

Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform large-scale gene validation and discovery in the mouse genome for the first time. In searching an excess of 10 million spectra, we have been able to validate 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes, which are not covered in any mouse annotation data sources. One such novel protein-coding gene is a fusion protein that spans the Ins2 and Igf2 loci to produce a transcript encoding the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but are translated and are therefore resurrected into new coding loci. This work not only highlights an important utility for MS data in genome annotation but also provides unique insights into the gene structure and propagation in the mouse genome. All these data have been subsequently used to improve the publicly available mouse annotation available in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).


Assuntos
Processamento Alternativo , Genes , Peptídeos/genética , Proteômica/métodos , Pseudogenes/genética , Espectrometria de Massas em Tandem/métodos , Animais , Genoma , Genômica/métodos , Camundongos , Peptídeos/química
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA