Results 1 - 20 of 45
1.
Nucleic Acids Res ; 52(14): 8112-8126, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-38953162

ABSTRACT

Ribosome profiling experiments support the translation of a range of novel human open reading frames. By contrast, most peptides from large-scale proteomics experiments derive from just one source, 5' untranslated regions. Across the human genome we find evidence for 192 translated upstream regions, most of which would produce protein isoforms with extended N-terminal ends. Almost all of these N-terminal extensions are from highly abundant genes, which suggests that the novel regions we detect are just the tip of the iceberg. These upstream regions have characteristics that are not typical of coding exons. Their GC-content is remarkably high, even higher than 5' regions in other genes, and a large majority have non-canonical start codons. Although some novel upstream regions have cross-species conservation - five have orthologues in invertebrates for example - the reading frames of two thirds are not conserved beyond simians. These non-conserved regions also have no evidence of purifying selection, which suggests that much of this translation is not functional. In addition, non-conserved upstream regions have significantly more peptides in cancer cell lines than would be expected, a strong indication that an aberrant or noisy translation initiation process may play an important role in translation from upstream regions.


Subjects
5' Untranslated Regions, Protein Biosynthesis, Humans, Initiator Codon/genetics, Base Composition, Human Genome, Animals, Open Reading Frames/genetics, Conserved Sequence, Peptides/genetics, Peptides/metabolism
2.
Nucleic Acids Res ; 50(D1): D54-D59, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34755885

ABSTRACT

APPRIS (https://appris.bioinfo.cnio.es) is a well-established database housing annotations for protein isoforms for a range of species. APPRIS selects principal isoforms based on protein structure and function features and on cross-species conservation. Most coding genes produce a single main protein isoform and the principal isoforms chosen by the APPRIS database best represent this main cellular isoform. Human genetic data, experimental protein evidence and the distribution of clinical variants all support the relevance of APPRIS principal isoforms. APPRIS annotations and principal isoforms have now been expanded to 10 model organisms. In this paper we highlight the most recent updates to the database. APPRIS annotations have been generated for two new species, cow and chicken, the protein structural information has been augmented with reliable models from the EMBL-EBI AlphaFold database, and we have substantially expanded the confirmatory proteomics evidence available for the human genome. The most significant change in APPRIS has been the implementation of TRIFID functional isoform scores. TRIFID functional scores are assigned to all splice isoforms, and APPRIS uses the TRIFID functional scores and proteomics evidence to determine principal isoforms when core methods cannot.


Subjects
Protein Databases, Protein Isoforms/genetics, Proteins/genetics, Proteomics, Animals, Cattle, Chickens/genetics, Humans, Protein Conformation, Protein Isoforms/classification, Proteins/chemistry, Proteins/classification
3.
Bioinformatics ; 38(Suppl_2): ii89-ii94, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124785

ABSTRACT

MOTIVATION: Selecting the splice variant that best represents a coding gene is a crucial first step in many experimental analyses, and vital for mapping clinically relevant variants. This study compares the longest isoforms, MANE Select transcripts, APPRIS principal isoforms, and expression data, and aims to determine which method is best for selecting biologically important reference splice variants for large-scale analyses. RESULTS: Proteomics analyses and human genetic variation data suggest that most coding genes have a single main protein isoform. We show that APPRIS principal isoforms and MANE Select transcripts best describe these main cellular isoforms, and find that using the longest splice variant as the representative is a poor strategy. Exons unique to the longest splice isoforms are not under selective pressure, and so are unlikely to be functionally relevant. Expression data are also a poor means of selecting the main splice variant. APPRIS principal and MANE Select exons are under purifying selection, while exons specific to alternative transcripts are not. There are MANE and APPRIS representatives for almost 95% of genes, and where they agree they are particularly effective, coinciding with the main proteomics isoform for over 98.2% of genes. AVAILABILITY AND IMPLEMENTATION: APPRIS principal isoforms for human, mouse and other model species can be downloaded from the APPRIS database (https://appris.bioinfo.cnio.es), GENCODE genes (https://www.gencodegenes.org/) and the Ensembl website (https://www.ensembl.org). MANE Select transcripts for the human reference set are available from the Ensembl, GENCODE and RefSeq databases (https://www.ncbi.nlm.nih.gov/refseq/). Lists of splice variants where MANE and APPRIS coincide are available from the APPRIS database. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteomics, Animals, Exons, Humans, Mice, Mutation, Protein Isoforms/genetics, Protein Isoforms/metabolism
4.
Nucleic Acids Res ; 49(14): 8232-8246, 2021 08 20.
Article in English | MEDLINE | ID: mdl-34302486

ABSTRACT

Most coding genes in the human genome are annotated with multiple alternative transcripts. However, clear evidence for the functional relevance of the protein isoforms produced by these alternative transcripts is often hard to find. Alternative isoforms generated from tandem exon duplication-derived substitutions are an exception. These splice events are rare, but have important functional consequences. Here, we have catalogued the 236 tandem exon duplication-derived substitutions annotated in the GENCODE human reference set. We find that more than 90% of the events have a last common ancestor in teleost fish, so are at least 425 million years old, and twenty-one can be traced back to the Bilateria clade. Alternative isoforms generated from tandem exon duplication-derived substitutions also have significantly more clinical impact than other alternative isoforms. Tandem exon duplication-derived substitutions have >25 times as many pathogenic and likely pathogenic mutations as other alternative events. Tandem exon duplication-derived substitutions appear to have vital functional roles in the cell and may have played a prominent part in metazoan evolution.


Subjects
Molecular Evolution, Fishes/genetics, Human Genome/genetics, Protein Isoforms/genetics, Alternative Splicing/genetics, Animals, Exons/genetics, Gene Duplication/genetics, Humans, Molecular Sequence Annotation, Sequence Alignment
5.
Genet Med ; 24(11): 2351-2366, 2022 11.
Article in English | MEDLINE | ID: mdl-36083290

ABSTRACT

PURPOSE: Germline loss-of-function variants in CTNNB1 cause neurodevelopmental disorder with spastic diplegia and visual defects (NEDSDV; OMIM 615075) and are the most frequent, recurrent monogenic cause of cerebral palsy (CP). We investigated the range of clinical phenotypes owing to disruptions of CTNNB1 to determine the association between NEDSDV and CP. METHODS: Genetic information from 404 individuals with collectively 392 pathogenic CTNNB1 variants was ascertained for the study. From these, detailed phenotypes for 52 previously unpublished individuals were collected and combined with 68 previously published individuals with comparable clinical information. The functional effects of selected CTNNB1 missense variants were assessed using TOPFlash assay. RESULTS: The phenotypes associated with pathogenic CTNNB1 variants were similar. A diagnosis of CP was not significantly associated with any set of traits that defined a specific phenotypic subgroup, indicating that CP is not additional to NEDSDV. Two CTNNB1 missense variants were dominant negative regulators of WNT signaling, highlighting the utility of the TOPFlash assay to functionally assess variants. CONCLUSION: NEDSDV is a clinically homogeneous disorder irrespective of initial clinical diagnoses, including CP, or entry points for genetic testing.


Subjects
Intellectual Disability, Neurodevelopmental Disorders, Humans, Phenotype, Neurodevelopmental Disorders/genetics, Wnt Signaling Pathway/genetics, Intellectual Disability/genetics, Genomics, beta Catenin/genetics
6.
Trends Biochem Sci ; 42(2): 98-110, 2017 02.
Article in English | MEDLINE | ID: mdl-27712956

ABSTRACT

Alternative splicing is commonly believed to be a major source of cellular protein diversity. However, although many thousands of alternatively spliced transcripts are routinely detected in RNA-seq studies, reliable large-scale mass spectrometry-based proteomics analyses identify only a small fraction of annotated alternative isoforms. The clearest finding from proteomics experiments is that most human genes have a single main protein isoform, while those alternative isoforms that are identified tend to be the most biologically plausible: those with the most cross-species conservation and those that do not compromise functional domains. Indeed, most alternative exons do not seem to be under selective pressure, suggesting that a large majority of predicted alternative transcripts may not even be translated into proteins.


Subjects
Alternative Splicing/genetics, Proteome/genetics, Exons, Protein Isoforms/genetics, Proteomics
7.
PLoS Comput Biol ; 16(10): e1008287, 2020 10.
Article in English | MEDLINE | ID: mdl-33017396

ABSTRACT

The role of alternative splicing is one of the great unanswered questions in cellular biology. There is strong evidence for alternative splicing at the transcript level, and transcriptomics experiments show that many splice events are tissue specific. It has been suggested that alternative splicing evolved in order to remodel tissue-specific protein-protein networks. Here we investigated the evidence for tissue-specific splicing among splice isoforms detected in a large-scale proteomics analysis. Although the data supporting alternative splicing is limited at the protein level, clear patterns emerged among the small numbers of alternative splice events that we could detect in the proteomics data. More than a third of these splice events were tissue-specific and most were ancient: over 95% of splice events that were tissue-specific in both proteomics and RNAseq analyses evolved prior to the ancestors of lobe-finned fish, at least 400 million years ago. By way of contrast, three in four alternative exons in the human gene set arose in the primate lineage, so our results cannot be extrapolated to the whole genome. Tissue-specific alternative protein forms in the proteomics analysis were particularly abundant in nervous and muscle tissues and their genes had roles related to the cytoskeleton and either the structure of muscle fibres or cell-cell connections. Our results suggest that this conserved tissue-specific alternative splicing may have played a role in the development of the vertebrate brain and heart.


Subjects
Alternative Splicing/genetics, Organ Specificity/genetics, Protein Isoforms, Animals, Computational Biology, Genome/genetics, Humans, Protein Isoforms/chemistry, Protein Isoforms/classification, Protein Isoforms/genetics, Proteomics
8.
Nucleic Acids Res ; 46(D1): D213-D217, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069475

ABSTRACT

The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the 'principal' isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein coding genes and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition, APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants.


Subjects
Genetic Databases, Protein Isoforms/genetics, Alternative Splicing, Amino Acid Sequence, Animals, Humans, Molecular Models, Molecular Sequence Annotation, Protein Conformation, Protein Isoforms/chemistry, Proteome/genetics, Reproducibility of Results, Sequence Alignment
9.
Nucleic Acids Res ; 46(14): 7070-7084, 2018 08 21.
Article in English | MEDLINE | ID: mdl-29982784

ABSTRACT

Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation on the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggests that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.


Subjects
Genes, Antibodies, DNA Copy Number Variations, Genetic Variation, Human Genome, Humans, Molecular Sequence Annotation, Proteins/genetics, Proteins/immunology, Proteins/metabolism, Pseudogenes
10.
Nucleic Acids Res ; 43(W1): W455-9, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25990727

ABSTRACT

This paper introduces the APPRIS WebServer (http://appris.bioinfo.cnio.es) and WebServices (http://apprisws.bioinfo.cnio.es). Both the web servers and the web services are based around the APPRIS Database, a database that presently houses annotations of splice isoforms for five different vertebrate genomes. The APPRIS WebServer and WebServices provide access to the computational methods implemented in the APPRIS Database, while the APPRIS WebServices also allow retrieval of the annotations. The APPRIS WebServer and WebServices annotate splice isoforms with protein structural and functional features, and with data from cross-species alignments. In addition, they can use the annotations of structure, function and conservation to select a single reference isoform for each protein-coding gene (the principal protein isoform). APPRIS principal isoforms have been shown to agree overwhelmingly with the main protein isoform detected in proteomics experiments. The APPRIS WebServer allows for the annotation of splice isoforms for individual genes, and provides a range of visual representations and tools to allow researchers to identify the likely effect of splicing events. The APPRIS WebServices permit users to generate annotations automatically in high throughput mode and to interrogate the annotations in the APPRIS Database. The APPRIS WebServices have been implemented using REST architecture to be flexible, modular and automatic.


Subjects
Molecular Sequence Annotation, Protein Isoforms/genetics, Software, Alternative Splicing, Animals, Cats, Cattle, Dogs, Humans, Internet, Mice, Protein Isoforms/chemistry, Protein Isoforms/metabolism, Rats
12.
Hum Mol Genet ; 23(22): 5866-78, 2014 Nov 15.
Article in English | MEDLINE | ID: mdl-24939910

ABSTRACT

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.


Subjects
Proteins/genetics, Computational Biology, Human Genome, Humans, Open Reading Frames, Peptides/genetics, Proteins/metabolism, Proteomics
13.
Bioinformatics ; 31(14): 2257-61, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25735770

ABSTRACT

Transposable elements constitute a large fraction of vertebrate genomes and, during evolution, may be co-opted for new functions. Exonization of transposable elements inserted within or close to host genes is one possible way to generate new genes, and alternative splicing of the new exons may represent an intermediate step in this process. The genes TMPO and ZNF451 are present in all vertebrate lineages. Although they are not evolutionarily related, mammalian TMPO and ZNF451 do have something in common: they both code for splice isoforms that contain LAP2alpha domains. We found that these LAP2alpha domains have sequence similarity to repetitive sequences in non-mammalian genomes, which are in turn related to the first ORF from a DIRS1-like retrotransposon. This retrotransposon domestication happened separately and resulted in proteins that combine retrotransposon and host protein domains. The alternative splicing of the retrotransposed sequence allowed the production of both the new and the untouched original isoforms, which may have contributed to the success of the colonization process. The LAP2alpha-specific isoform of TMPO (LAP2α) has been co-opted for important roles in the cell, whereas the ZNF451 LAP2alpha isoform is evolving under strong purifying selection but remains uncharacterized.


Subjects
Alternative Splicing, DNA-Binding Proteins/genetics, Mammals/genetics, Membrane Proteins/genetics, Retroelements, Transcription Factors/genetics, Aminoacyltransferases, Animals, Molecular Evolution, Exons, Genome, Humans, Protein Isoforms/genetics, Vertebrates/genetics
14.
PLoS Comput Biol ; 11(6): e1004325, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26061177

ABSTRACT

Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectroscopy experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectroscopy. We find fewer splice events than would be expected: we identified peptides for almost 64% of human protein coding genes, but detected just 282 splice events. This data suggests that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved (all the homologous exons we identified evolved over 460 million years ago), and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles.


Subjects
Alternative Splicing/genetics, Exons/genetics, Protein Isoforms/genetics, Amino Acid Sequence, Animals, Computational Biology, Genetic Databases, Humans, Mice, Molecular Models, Molecular Sequence Data, Organ Specificity/genetics, Peptides/chemistry, Peptides/genetics, Peptides/metabolism, Protein Conformation, Protein Isoforms/chemistry, Protein Isoforms/metabolism, Sequence Alignment, DNA Sequence Analysis
15.
Nucleic Acids Res ; 42(Database issue): D267-72, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24243844

ABSTRACT

FireDB (http://firedb.bioinfo.cnio.es) is a curated inventory of catalytic and biologically relevant small ligand-binding residues culled from the protein structures in the Protein Data Bank. Here we present the important new additions since the publication of FireDB in 2007. The database now contains an extensive list of manually curated biologically relevant compounds. Biologically relevant compounds are informative because of their role in protein function, but they are only a small fraction of the entire ligand set. For the remaining ligands, the FireDB provides cross-references to the annotations from publicly available biological, chemical and pharmacological compound databases. FireDB now has external references for 95% of contacting small ligands, making FireDB a more complete database and providing the scientific community with easy access to the pharmacological annotations of PDB ligands. In addition to the manual curation of ligands, FireDB also provides insights into the biological relevance of individual binding sites. Here, biological relevance is calculated from the multiple sequence alignments of related binding sites that are generated from all-against-all comparison of each FireDB binding site. The database can be accessed by RESTful web services and is available for download via MySQL.


Subjects
Catalytic Domain, Protein Databases, Proteins/chemistry, Binding Sites, Molecular Evolution, Internet, Ligands, Molecular Sequence Annotation, Pharmaceutical Preparations/chemistry, Proteins/genetics
16.
J Proteome Res ; 14(4): 1880-7, 2015 Apr 03.
Article in English | MEDLINE | ID: mdl-25732134

ABSTRACT

Although eukaryotic cells express a wide range of alternatively spliced transcripts, it is not clear whether genes tend to express a range of transcripts simultaneously across cells, or produce dominant isoforms in a manner that is either tissue-specific or regardless of tissue. To date, large-scale investigations into the pattern of transcript expression across distinct tissues have produced contradictory results. Here, we attempt to determine whether genes express a dominant splice variant at the protein level. We interrogate peptides from eight large-scale human proteomics experiments and databases and find that there is a single dominant protein isoform, irrespective of tissue or cell type, for the vast majority of the protein-coding genes in these experiments, in partial agreement with the conclusions from the most recent large-scale RNAseq study. Remarkably, the dominant isoforms from the experimental proteomics analyses coincided overwhelmingly with the reference isoforms selected by two completely orthogonal sources, the consensus coding sequence variants, which are agreed upon by separate manual genome curation teams, and the principal isoforms from the APPRIS database, predicted automatically from the conservation of protein sequence, structure, and function.


Subjects
Open Reading Frames/genetics, Peptides/genetics, Protein Isoforms/genetics, Proteomics/methods, Computational Biology, Protein Databases, Humans
17.
Expert Rev Proteomics ; 12(6): 579-93, 2015.
Article in English | MEDLINE | ID: mdl-26496066

ABSTRACT

The authors have carried out an investigation of the two "draft maps of the human proteome" published in 2014 in Nature. The findings include an abundance of poor spectra, low-scoring peptide-spectrum matches and incorrectly identified proteins in both these studies, highlighting clear issues with the application of false discovery rates. This noise means that the claims made by the two papers - the identification of high numbers of protein coding genes, the detection of novel coding regions and the draft tissue maps themselves - should be treated with considerable caution. The authors recommend that clinicians and researchers do not use the unfiltered data from these studies. Despite this, these studies will inspire further investigation into tissue-based proteomics. As long as this future work has proper quality controls, it could help produce a consensus map of the human proteome and improve our understanding of the processes that underlie health and disease.


Subjects
Protein Databases, Proteome/genetics, Humans, Peptides, Proteomics
19.
Nucleic Acids Res ; 41(Database issue): D110-7, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23161672

ABSTRACT

Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows annotators and researchers alike to easily identify functional changes brought about by splicing events. In addition to collecting, integrating and analyzing reliable predictions of the effect of splicing events, APPRIS also selects a single reference sequence for each gene, here termed the principal isoform, based on the annotations of structure, function and conservation for each transcript. APPRIS identifies a principal isoform for 85% of the protein-coding genes in the GENCODE 7 release for ENSEMBL. Analysis of the APPRIS data shows that at least 70% of the alternative (non-principal) variants would lose important functional or structural information relative to the principal isoform.


Subjects
Alternative Splicing, Protein Databases, Molecular Sequence Annotation, Protein Isoforms/chemistry, Protein Isoforms/genetics, Humans, Internet, Protein Isoforms/metabolism
20.
Mol Biol Evol ; 29(9): 2265-83, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22446687

ABSTRACT

Advances in high-throughput mass spectrometry are making proteomics an increasingly important tool in genome annotation projects. Peptides detected in mass spectrometry experiments can be used to validate gene models and verify the translation of putative coding sequences (CDSs). Here, we have identified peptides that cover 35% of the genes annotated by the GENCODE consortium for the human genome as part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases. We detected the translation to protein of "novel" and "putative" protein-coding transcripts as well as transcripts annotated as pseudogenes and nonsense-mediated decay targets. We provide a detailed overview of the population of alternatively spliced protein isoforms that are detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered. Three groups of genes were highly overrepresented. We detected alternative isoforms for 10 of the 25 possible heterogeneous nuclear ribonucleoproteins, proteins with a key role in the splicing process. Alternative isoforms generated from interchangeable homologous exons and from short indels were also significantly enriched, both in human experiments and in parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (almost 25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts. Many of the alternative splicing events that give rise to these alternative isoforms are conserved in mouse. It was striking that very few of these conserved splicing events broke Pfam functional domains or would damage globular protein structures. This evidence of a strong bias toward subtle differences in CDS and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints.


Subjects
Alternative Splicing, Proteins/chemistry, Proteins/genetics, Proteomics, Amino Acid Sequence, Animals, Catalytic Domain, Drosophila, Genome, Humans, Mice, Molecular Models, Molecular Sequence Annotation, Molecular Sequence Data, Nonsense Mediated mRNA Decay, Peptides/chemistry, Peptides/genetics, Proteasome Endopeptidase Complex/chemistry, Protein Biosynthesis, Protein Conformation, Protein Interaction Domains and Motifs, Protein Isoforms, Proteins/metabolism, Sequence Alignment