Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 8 de 8
Filtrar
Mais filtros











Base de dados
Intervalo de ano de publicação
1.
Nucleic Acids Res ; 2024 Jul 02.
Artigo em Inglês | MEDLINE | ID: mdl-38953162

RESUMO

Ribosome profiling experiments support the translation of a range of novel human open reading frames. By contrast, most peptides from large-scale proteomics experiments derive from just one source, 5' untranslated regions. Across the human genome we find evidence for 192 translated upstream regions, most of which would produce protein isoforms with extended N-terminal ends. Almost all of these N-terminal extensions are from highly abundant genes, which suggests that the novel regions we detect are just the tip of the iceberg. These upstream regions have characteristics that are not typical of coding exons. Their GC-content is remarkably high, even higher than 5' regions in other genes, and a large majority have non-canonical start codons. Although some novel upstream regions have cross-species conservation - five have orthologues in invertebrates for example - the reading frames of two thirds are not conserved beyond simians. These non-conserved regions also have no evidence of purifying selection, which suggests that much of this translation is not functional. In addition, non-conserved upstream regions have significantly more peptides in cancer cell lines than would be expected, a strong indication that an aberrant or noisy translation initiation process may play an important role in translation from upstream regions.

2.
Genome Biol Evol ; 14(12)2022 Dec 07.
Artigo em Inglês | MEDLINE | ID: mdl-36346145

RESUMO

The mutually exclusive splicing of tandem duplicated exons produces protein isoforms that are identical save for a homologous region that allows for the fine tuning of protein function. Tandem duplicated exon substitution events are rare, yet highly important alternative splicing events. Most events are ancient, their isoforms are highly expressed, and they have significantly more pathogenic mutations than other splice events. Here, we analyzed the physicochemical properties and functional roles of the homologous polypeptide regions produced by the 236 tandem duplicated exon substitutions annotated in the human gene set. We find that the most important structural and functional residues in these homologous regions are maintained, and that most changes are conservative rather than drastic. Three quarters of the isoforms produced from tandem duplicated exon substitution events are tissue-specific, particularly in nervous and cardiac tissues, and tandem duplicated exon substitution events are enriched in functional terms related to structures in the brain and skeletal muscle. We find considerable evidence for the convergent evolution of tandem duplicated exon substitution events in vertebrates, arthropods, and nematodes. Twelve human gene families have orthologues with tandem duplicated exon substitution events in both Drosophila melanogaster and Caenorhabditis elegans. Six of these gene families are ion transporters, suggesting that tandem exon duplication in genes that control the flow of ions into the cell has an adaptive benefit. The ancient origins, the strong indications of tissue-specific functions, and the evidence of convergent evolution suggest that these events may have played important roles in the evolution of animal tissues and organs.


Assuntos
Processamento Alternativo , Drosophila melanogaster , Animais , Humanos , Drosophila melanogaster/genética , Drosophila melanogaster/metabolismo , Éxons , Splicing de RNA , Isoformas de Proteínas/genética , Evolução Molecular
3.
Expert Rev Proteomics ; 12(6): 579-93, 2015.
Artigo em Inglês | MEDLINE | ID: mdl-26496066

RESUMO

The authors have carried out an investigation of the two "draft maps of the human proteome" published in 2014 in Nature. The findings include an abundance of poor spectra, low-scoring peptide-spectrum matches and incorrectly identified proteins in both these studies, highlighting clear issues with the application of false discovery rates. This noise means that the claims made by the two papers - the identification of high numbers of protein coding genes, the detection of novel coding regions and the draft tissue maps themselves - should be treated with considerable caution. The authors recommend that clinicians and researchers do not use the unfiltered data from these studies. Despite this these studies will inspire further investigation into tissue-based proteomics. As long as this future work has proper quality controls, it could help produce a consensus map of the human proteome and improve our understanding of the processes that underlie health and disease.


Assuntos
Bases de Dados de Proteínas , Proteoma/genética , Humanos , Peptídeos , Proteômica
4.
PLoS Comput Biol ; 11(6): e1004325, 2015 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-26061177

RESUMO

Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectroscopy experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectroscopy. We find fewer splice events than would be expected: we identified peptides for almost 64% of human protein coding genes, but detected just 282 splice events. This data suggests that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved--all the homologous exons we identified evolved over 460 million years ago--and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles.


Assuntos
Processamento Alternativo/genética , Éxons/genética , Isoformas de Proteínas/genética , Sequência de Aminoácidos , Animais , Biologia Computacional , Bases de Dados Genéticas , Humanos , Camundongos , Modelos Moleculares , Dados de Sequência Molecular , Especificidade de Órgãos/genética , Peptídeos/química , Peptídeos/genética , Peptídeos/metabolismo , Conformação Proteica , Isoformas de Proteínas/química , Isoformas de Proteínas/metabolismo , Alinhamento de Sequência , Análise de Sequência de DNA
5.
Bioinformatics ; 31(14): 2257-61, 2015 Jul 15.
Artigo em Inglês | MEDLINE | ID: mdl-25735770

RESUMO

Transposable elements constitute a large fraction of vertebrate genomes and, during evolution, may be co-opted for new functions. Exonization of transposable elements inserted within or close to host genes is one possible way to generate new genes, and alternative splicing of the new exons may represent an intermediate step in this process. The genes TMPO and ZNF451 are present in all vertebrate lineages. Although they are not evolutionarily related, mammalian TMPO and ZNF451 do have something in common-they both code for splice isoforms that contain LAP2alpha domains. We found that these LAP2alpha domains have sequence similarity to repetitive sequences in non-mammalian genomes, which are in turn related to the first ORF from a DIRS1-like retrotransposon. This retrotransposon domestication happened separately and resulted in proteins that combine retrotransposon and host protein domains. The alternative splicing of the retrotransposed sequence allowed the production of both the new and the untouched original isoforms, which may have contributed to the success of the colonization process. The LAP2alpha-specific isoform of TMPO (LAP2α) has been co-opted for important roles in the cell, whereas the ZNF451 LAP2alpha isoform is evolving under strong purifying selection but remains uncharacterized.


Assuntos
Processamento Alternativo , Proteínas de Ligação a DNA/genética , Mamíferos/genética , Proteínas de Membrana/genética , Retroelementos , Fatores de Transcrição/genética , Aminoaciltransferases , Animais , Evolução Molecular , Éxons , Genoma , Humanos , Isoformas de Proteínas/genética , Vertebrados/genética
6.
J Proteome Res ; 14(4): 1880-7, 2015 Apr 03.
Artigo em Inglês | MEDLINE | ID: mdl-25732134

RESUMO

Although eukaryotic cells express a wide range of alternatively spliced transcripts, it is not clear whether genes tend to express a range of transcripts simultaneously across cells, or produce dominant isoforms in a manner that is either tissue-specific or regardless of tissue. To date, large-scale investigations into the pattern of transcript expression across distinct tissues have produced contradictory results. Here, we attempt to determine whether genes express a dominant splice variant at the protein level. We interrogate peptides from eight large-scale human proteomics experiments and databases and find that there is a single dominant protein isoform, irrespective of tissue or cell type, for the vast majority of the protein-coding genes in these experiments, in partial agreement with the conclusions from the most recent large-scale RNAseq study. Remarkably, the dominant isoforms from the experimental proteomics analyses coincided overwhelmingly with the reference isoforms selected by two completely orthogonal sources, the consensus coding sequence variants, which are agreed upon by separate manual genome curation teams, and the principal isoforms from the APPRIS database, predicted automatically from the conservation of protein sequence, structure, and function.


Assuntos
Fases de Leitura Aberta/genética , Peptídeos/genética , Isoformas de Proteínas/genética , Proteômica/métodos , Biologia Computacional , Bases de Dados de Proteínas , Humanos
7.
Hum Mol Genet ; 23(22): 5866-78, 2014 Nov 15.
Artigo em Inglês | MEDLINE | ID: mdl-24939910

RESUMO

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.


Assuntos
Proteínas/genética , Biologia Computacional , Genoma Humano , Humanos , Fases de Leitura Aberta , Peptídeos/genética , Proteínas/metabolismo , Proteômica
8.
Mol Biol Evol ; 29(9): 2265-83, 2012 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-22446687

RESUMO

Advances in high-throughput mass spectrometry are making proteomics an increasingly important tool in genome annotation projects. Peptides detected in mass spectrometry experiments can be used to validate gene models and verify the translation of putative coding sequences (CDSs). Here, we have identified peptides that cover 35% of the genes annotated by the GENCODE consortium for the human genome as part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases. We detected the translation to protein of "novel" and "putative" protein-coding transcripts as well as transcripts annotated as pseudogenes and nonsense-mediated decay targets. We provide a detailed overview of the population of alternatively spliced protein isoforms that are detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered. Three groups of genes were highly overrepresented. We detected alternative isoforms for 10 of the 25 possible heterogeneous nuclear ribonucleoproteins, proteins with a key role in the splicing process. Alternative isoforms generated from interchangeable homologous exons and from short indels were also significantly enriched, both in human experiments and in parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (almost 25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts. Many of the alternative splicing events that give rise to these alternative isoforms are conserved in mouse. It was striking that very few of these conserved splicing events broke Pfam functional domains or would damage globular protein structures. This evidence of a strong bias toward subtle differences in CDS and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints.


Assuntos
Processamento Alternativo , Proteínas/química , Proteínas/genética , Proteômica , Sequência de Aminoácidos , Animais , Domínio Catalítico , Drosophila , Genoma , Humanos , Camundongos , Modelos Moleculares , Anotação de Sequência Molecular , Dados de Sequência Molecular , Degradação do RNAm Mediada por Códon sem Sentido , Peptídeos/química , Peptídeos/genética , Complexo de Endopeptidases do Proteassoma/química , Biossíntese de Proteínas , Conformação Proteica , Domínios e Motivos de Interação entre Proteínas , Isoformas de Proteínas , Proteínas/metabolismo , Alinhamento de Sequência
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA