Results 1 - 20 of 40
1.
Nature ; 602(7897): 487-495, 2022 02.
Article in English | MEDLINE | ID: mdl-34942634

ABSTRACT

The emergence of SARS-CoV-2 variants of concern suggests viral adaptation to enhance human-to-human transmission [1,2]. Although much effort has focused on the characterization of changes in the spike protein in variants of concern, mutations outside of spike are likely to contribute to adaptation. Here, using unbiased abundance proteomics, phosphoproteomics, RNA sequencing and viral replication assays, we show that isolates of the Alpha (B.1.1.7) variant [3] suppress innate immune responses in airway epithelial cells more effectively than first-wave isolates. We found that the Alpha variant has markedly increased subgenomic RNA and protein levels of the nucleocapsid protein (N), Orf9b and Orf6, all known innate immune antagonists. Expression of Orf9b alone suppressed the innate immune response through interaction with TOM70, a mitochondrial protein that is required for activation of the RNA-sensing adaptor MAVS. Moreover, the activity of Orf9b and its association with TOM70 was regulated by phosphorylation. We propose that more effective innate immune suppression, through enhanced expression of specific viral antagonist proteins, increases the likelihood of successful transmission of the Alpha variant, and may increase in vivo replication and duration of infection [4]. The importance of mutations outside the spike coding region in the adaptation of SARS-CoV-2 to humans is underscored by the observation that similar mutations exist in the N and Orf9b regulatory regions of the Delta and Omicron variants.


Subject(s)
COVID-19/immunology , COVID-19/virology , Evolution, Molecular , Immune Evasion , Immunity, Innate/immunology , SARS-CoV-2/genetics , SARS-CoV-2/immunology , COVID-19/transmission , Coronavirus Nucleocapsid Proteins/chemistry , Coronavirus Nucleocapsid Proteins/metabolism , Humans , Immunity, Innate/genetics , Interferons/immunology , Mitochondrial Precursor Protein Import Complex Proteins/metabolism , Phosphoproteins/chemistry , Phosphoproteins/metabolism , Phosphorylation , Proteomics , RNA, Viral/genetics , RNA-Seq , SARS-CoV-2/classification , SARS-CoV-2/growth & development
2.
Mol Cell ; 80(6): 1078-1091.e6, 2020 12 17.
Article in English | MEDLINE | ID: mdl-33290746

ABSTRACT

We report that the SARS-CoV-2 nucleocapsid protein (N-protein) undergoes liquid-liquid phase separation (LLPS) with viral RNA. N-protein condenses with specific RNA genomic elements under physiological buffer conditions and condensation is enhanced at human body temperatures (33°C and 37°C) and reduced at room temperature (22°C). RNA sequence and structure in specific genomic regions regulate N-protein condensation while other genomic regions promote condensate dissolution, potentially preventing aggregation of the large genome. At low concentrations, N-protein preferentially crosslinks to specific regions characterized by single-stranded RNA flanked by structured elements and these features specify the location, number, and strength of N-protein binding sites (valency). Liquid-like N-protein condensates form in mammalian cells in a concentration-dependent manner and can be altered by small molecules. Condensation of N-protein is RNA sequence and structure specific, sensitive to human body temperature, and manipulatable with small molecules, and therefore presents a screenable process for identifying antiviral compounds effective against SARS-CoV-2.


Subject(s)
COVID-19/metabolism , Coronavirus Nucleocapsid Proteins/metabolism , Genome, Viral , Nucleocapsid/metabolism , RNA, Viral/metabolism , SARS-CoV-2/metabolism , Animals , Antiviral Agents/pharmacology , COVID-19/genetics , Chlorocebus aethiops , Coronavirus Nucleocapsid Proteins/genetics , Drug Evaluation, Preclinical , HEK293 Cells , Humans , Nucleocapsid/genetics , Phosphoproteins/genetics , Phosphoproteins/metabolism , SARS-CoV-2/genetics , Vero Cells , COVID-19 Drug Treatment
3.
Nucleic Acids Res ; 51(D1): D942-D949, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36420896

ABSTRACT

GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
Computational Biology , Genome, Human , Humans , Animals , Mice , Molecular Sequence Annotation , Computational Biology/methods , Genome, Human/genetics , Transcriptome/genetics , Gene Expression Profiling , Databases, Genetic
5.
Nucleic Acids Res ; 49(D1): D916-D923, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33270111

ABSTRACT

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.


Subject(s)
COVID-19/prevention & control , Computational Biology/methods , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , RNA, Long Noncoding/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Transcription, Genetic/genetics
6.
Mol Biol Evol ; 38(2): 486-501, 2021 01 23.
Article in English | MEDLINE | ID: mdl-32946576

ABSTRACT

Bumblebees are a diverse group of globally important pollinators in natural ecosystems and for agricultural food production. With both eusocial and solitary life-cycle phases, and some social parasite species, they are especially interesting models to understand social evolution, behavior, and ecology. Reports of many species in decline point to pathogen transmission, habitat loss, pesticide usage, and global climate change, as interconnected causes. These threats to bumblebee diversity make our reliance on a handful of well-studied species for agricultural pollination particularly precarious. To broadly sample bumblebee genomic and phenotypic diversity, we de novo sequenced and assembled the genomes of 17 species, representing all 15 subgenera, producing the first genus-wide quantification of genetic and genomic variation potentially underlying key ecological and behavioral traits. The species phylogeny resolves subgenera relationships, whereas incomplete lineage sorting likely drives high levels of gene tree discordance. Five chromosome-level assemblies show a stable 18-chromosome karyotype, with major rearrangements creating 25 chromosomes in social parasites. Differential transposable element activity drives changes in genome sizes, with putative domestications of repetitive sequences influencing gene coding and regulatory potential. Dynamically evolving gene families and signatures of positive selection point to genus-wide variation in processes linked to foraging, diet and metabolism, immunity and detoxification, as well as adaptations for life at high altitudes. Our study reveals how bumblebee genes and genomes have evolved across the Bombus phylogeny and identifies variations potentially linked to key ecological and behavioral traits of these important pollinators.


Subject(s)
Adaptation, Biological/genetics , Bees/genetics , Biological Evolution , Genome, Insect , Animals , Codon Usage , DNA Transposable Elements , Diet , Feeding Behavior , Gene Components , Genome Size , Selection, Genetic
7.
Genome Res ; 29(12): 2073-2087, 2019 12.
Article in English | MEDLINE | ID: mdl-31537640

ABSTRACT

The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. We previously developed PhyloCSF, a widely used tool to identify evolutionary signatures of protein-coding regions using multispecies genome alignments. Here, we present the first whole-genome PhyloCSF prediction tracks for human, mouse, chicken, fly, worm, and mosquito. We develop a workflow that uses machine learning to predict novel conserved protein-coding regions and efficiently guide their manual curation. We analyze more than 1000 high-scoring human PhyloCSF regions and confidently add 144 conserved protein-coding genes to the GENCODE gene set, as well as additional coding regions within 236 previously annotated protein-coding genes, and 169 pseudogenes, most of them disabled after primates diverged. The majority of these represent new discoveries, including 70 previously undetected protein-coding genes. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic data sets, and mass spectrometry evidence of translation for several new genes. Our discoveries required simultaneous comparative annotation of other vertebrate genomes, which we show is essential to remove spurious ORFs and to distinguish coding from pseudogene regions. Our new coding regions help elucidate disease-associated regions by revealing that 118 GWAS variants previously thought to be noncoding are in fact protein altering. Altogether, our PhyloCSF data sets and algorithms will help researchers seeking to interpret these genomes, while our new annotations present exciting loci for further experimental characterization.


Subject(s)
Exons , Genome, Human , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Open Reading Frames , Sequence Analysis, DNA , Animals , Humans , Pseudogenes
8.
Development ; 146(6)2019 03 28.
Article in English | MEDLINE | ID: mdl-30923056

ABSTRACT

Cell type specification during early nervous system development in Drosophila melanogaster requires precise regulation of gene expression in time and space. Resolving the programs driving neurogenesis has been a major challenge owing to the complexity and rapidity with which distinct cell populations arise. To resolve the cell type-specific gene expression dynamics in early nervous system development, we have sequenced the transcriptomes of purified neurogenic cell types across consecutive time points covering crucial events in neurogenesis. The resulting gene expression atlas comprises a detailed resource of global transcriptome dynamics that permits systematic analysis of how cells in the nervous system acquire distinct fates. We resolve known gene expression dynamics and uncover novel expression signatures for hundreds of genes among diverse neurogenic cell types, most of which remain unstudied. We also identified a set of conserved long noncoding RNAs (lncRNAs) that are regulated in a tissue-specific manner and exhibit spatiotemporal expression during neurogenesis with exquisite specificity. lncRNA expression is highly dynamic and demarcates specific subpopulations within neurogenic cell types. Our spatiotemporal transcriptome atlas provides a comprehensive resource for investigating the function of coding genes and noncoding RNAs during crucial stages of early neurogenesis.


Subject(s)
Drosophila melanogaster/genetics , Gene Expression Regulation, Developmental , Nervous System/embryology , Neurogenesis/genetics , RNA, Long Noncoding/genetics , Animals , Cell Lineage , Drosophila melanogaster/metabolism , Flow Cytometry , Gene Expression Profiling , Gene Regulatory Networks , In Situ Hybridization, Fluorescence , Neuroglia/physiology , Phylogeny , Transcriptome
9.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Subject(s)
Databases, Genetic , Genome, Human/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software
10.
Mol Biol Evol ; 36(10): 2328-2339, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31220870

ABSTRACT

Because of the degeneracy of the genetic code, multiple codons are translated into the same amino acid. Despite being "synonymous," these codons are not equally used. Selective pressures are thought to drive the choice among synonymous codons within a genome, while GC content, which is typically attributed to mutational drift, is the major determinant of variation across species. Here, we find that in addition to GC content, interspecies codon usage signatures can also be detected. More specifically, we show that a single amino acid, arginine, is the major contributor to codon usage bias differences across domains of life. We then exploit this finding and show that domain-specific codon bias signatures can be used to classify a given sequence into its corresponding domain of life with high accuracy. We then wondered whether the inclusion of codon autocorrelation patterns, which reflect the nonrandom distribution of codon occurrences throughout a transcript, might improve the classification performance of our algorithm. However, we find that autocorrelation patterns are not domain-specific, and surprisingly, are unrelated to tRNA reusage, in contrast to previous reports. Instead, our results suggest that codon autocorrelation patterns are a by-product of codon optimality throughout a sequence, where highly expressed genes display autocorrelated "optimal" codons, whereas lowly expressed genes display autocorrelated "nonoptimal" codons.
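As a rough illustration of the arginine codon usage signal described in this abstract, the following Python sketch computes what share of a coding sequence's arginine codons is taken by each of the six synonymous arginine codons. The function name `arginine_codon_fractions` is hypothetical, and the sketch is not the authors' classifier. It assumes the input is already in the correct reading frame and uses the standard genetic code.

```python
from collections import Counter

# The six codons that encode arginine in the standard genetic code.
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def arginine_codon_fractions(cds: str) -> dict[str, float]:
    """Return the fraction of arginine codons in `cds` taken by each synonymous codon.

    Hypothetical helper for illustration; assumes `cds` starts in frame 0.
    """
    # Accept RNA or DNA input by normalising to the DNA alphabet.
    seq = cds.upper().replace("U", "T")
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in ARG_CODONS)
    total = sum(counts.values())
    if total == 0:
        # No arginine codons at all: report zero for every codon.
        return {c: 0.0 for c in ARG_CODONS}
    return {c: counts[c] / total for c in ARG_CODONS}
```

A vector of such fractions, computed per genome or per gene, is one plausible feature representation for comparing codon usage across domains of life. The abstract does not specify which features the published method actually uses.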


Subject(s)
Archaea/genetics , Bacteria/genetics , Codon Usage , Eukaryota/genetics , Arginine/genetics , Base Composition , Humans , Molecular Sequence Annotation , RNA, Transfer/metabolism
11.
BMC Genet ; 21(1): 25, 2020 03 06.
Article in English | MEDLINE | ID: mdl-32138667

ABSTRACT

BACKGROUND: POLG, located on nuclear chromosome 15, encodes the DNA polymerase γ (Pol γ). Pol γ is responsible for the replication and repair of mitochondrial DNA (mtDNA). Pol γ is the only DNA polymerase found in mitochondria for most animal cells. Mutations in POLG are the most common single-gene cause of diseases of mitochondria and have been mapped over the coding region of the POLG ORF. RESULTS: Using PhyloCSF to survey alternative reading frames, we found a conserved coding signature in an alternative frame in exons 2 and 3 of POLG, herein referred to as ORF-Y, that arose de novo in placental mammals. Using the synplot2 program, synonymous site conservation was found among mammals in the region of the POLG ORF that is overlapped by ORF-Y. Ribosome profiling data revealed that ORF-Y is translated and that initiation likely occurs at a CUG codon. Inspection of an alignment of mammalian sequences containing ORF-Y revealed that the CUG codon has a strong initiation context and that a well-conserved predicted RNA stem-loop begins 14 nucleotides downstream. Such features are associated with enhanced initiation at near-cognate non-AUG codons. Reanalysis of the Kim et al. (2014) draft human proteome dataset yielded two unique peptides that map unambiguously to ORF-Y. An additional conserved uORF, herein referred to as ORF-Z, was also found in exon 2 of POLG. Lastly, we surveyed ClinVar variants that are synonymous with respect to the POLG ORF and found that most of these variants cause amino acid changes in ORF-Y or ORF-Z. CONCLUSIONS: We provide evidence for a novel coding sequence, ORF-Y, that overlaps the POLG ORF. Ribosome profiling and mass spectrometry data show that ORF-Y is expressed. PhyloCSF and synplot2 analysis show that ORF-Y is subject to strong purifying selection. An abundance of disease-correlated mutations that map to exons 2 and 3 of POLG but also affect ORF-Y provides potential clinical significance to this finding.
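The observation that a variant can be synonymous in the main ORF yet protein-altering in an overlapping ORF can be sketched in Python as below. The helper names `translate` and `variant_effect` are hypothetical, and the nine-nucleotide example sequence is invented for illustration; it is not POLG sequence. The sketch uses the standard genetic code and expects an unambiguous A/C/G/T input.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int) -> str:
    """Translate `seq` in the given frame (0, 1 or 2), dropping any partial codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def variant_effect(seq: str, pos: int, alt: str, frames=(0, 1)) -> dict[int, bool]:
    """For each frame, report whether substituting `alt` at 0-based `pos` changes the protein."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return {f: translate(seq, f) != translate(mutated, f) for f in frames}


# Invented example: G>C at position 5 keeps Leu in frame 0 (CTG -> CTC)
# but changes Trp to Ser in frame 1 (TGG -> TCG).
print(variant_effect("ATGCTGGCA", 5, "C"))  # {0: False, 1: True}
```

Applying the same comparison to each variant, with frame 0 standing for the annotated ORF and the second frame for the overlapping ORF, gives a simple way to flag "synonymous" variants that are in fact missense or nonsense in the alternative frame.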


Subject(s)
Codon, Initiator/genetics , DNA Polymerase gamma/genetics , Mitochondria/genetics , Ribosomes/genetics , Amino Acid Sequence , DNA, Mitochondrial/genetics , Exons/genetics , Humans , Nucleic Acid Conformation , Open Reading Frames/genetics
12.
Nucleic Acids Res ; 46(14): 7070-7084, 2018 08 21.
Article in English | MEDLINE | ID: mdl-29982784

ABSTRACT

Seventeen years after the sequencing of the human genome, the human proteome is still under revision. One in eight of the 22 210 coding genes listed by the Ensembl/GENCODE, RefSeq and UniProtKB reference databases are annotated differently across the three sets. We have carried out an in-depth investigation on the 2764 genes classified as coding by one or more sets of manual curators and not coding by others. Data from large-scale genetic variation analyses suggests that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. These potential non-coding genes also appear to be undergoing neutral evolution and have considerably less supporting transcript and protein evidence than other coding genes. We believe that the three reference databases currently overestimate the number of human coding genes by at least 2000, complicating and adding noise to large-scale biomedical experiments. Determining which potential non-coding genes do not code for proteins is a difficult but vitally important task since the human reference proteome is a fundamental pillar of most basic research and supports almost all large-scale biomedical projects.


Subject(s)
Genes , Antibodies , DNA Copy Number Variations , Genetic Variation , Genome, Human , Humans , Molecular Sequence Annotation , Proteins/genetics , Proteins/immunology , Proteins/metabolism , Pseudogenes
13.
J Biol Chem ; 293(12): 4434-4444, 2018 03 23.
Article in English | MEDLINE | ID: mdl-29386352

ABSTRACT

Although stop codon readthrough is used extensively by viruses to expand their gene expression, verified instances of mammalian readthrough have only recently been uncovered by systems biology and comparative genomics approaches. Previously, our analysis of conserved protein coding signatures that extend beyond annotated stop codons predicted stop codon readthrough of several mammalian genes, all of which have been validated experimentally. Four mRNAs display highly efficient stop codon readthrough, and these mRNAs have a UGA stop codon immediately followed by CUAG (UGA_CUAG) that is conserved throughout vertebrates. Extending on the identification of this readthrough motif, we here investigated stop codon readthrough, using tissue culture reporter assays, for all previously untested human genes containing UGA_CUAG. The readthrough efficiency of the annotated stop codon for the sequence encoding vitamin D receptor (VDR) was 6.7%. It was the highest of those tested but all showed notable levels of readthrough. The VDR is a member of the nuclear receptor superfamily of ligand-inducible transcription factors, and it binds its major ligand, calcitriol, via its C-terminal ligand-binding domain. Readthrough of the annotated VDR mRNA results in a 67 amino acid-long C-terminal extension that generates a VDR proteoform named VDRx. VDRx may form homodimers and heterodimers with VDR but, compared with VDR, VDRx displayed a reduced transcriptional response to calcitriol even in the presence of its partner retinoid X receptor.
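A minimal sketch of how a transcript might be screened for the UGA_CUAG context described above: the Python function below walks an ORF from a given start position, finds the first in-frame stop codon, and reports whether it is UGA immediately followed by CUAG. The function name `has_uga_cuag` and the test sequences are hypothetical, and the sketch says nothing about readthrough efficiency, which the study measured with reporter assays.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def has_uga_cuag(mrna: str, cds_start: int) -> bool:
    """True if the first in-frame stop codon after `cds_start` is UGA followed by CUAG.

    Hypothetical helper; `cds_start` is the 0-based index of the first base of the ORF.
    """
    # Accept DNA or RNA input by normalising to the RNA alphabet.
    seq = mrna.upper().replace("T", "U")
    for i in range(cds_start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            # Only the annotated (first in-frame) stop codon is inspected.
            return codon == "UGA" and seq[i + 3:i + 7] == "CUAG"
    # No in-frame stop codon was found.
    return False
```

For example, `has_uga_cuag("AUGGCCUGACUAGGG", 0)` evaluates to `True`, while the same sequence with a UAA stop codon evaluates to `False`.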


Subject(s)
Calcitriol/pharmacology , Calcium Channel Agonists/pharmacology , Codon, Terminator , Gene Expression Regulation/drug effects , Protein Biosynthesis , RNA, Messenger/metabolism , Receptors, Calcitriol/genetics , HEK293 Cells , HeLa Cells , Humans , Open Reading Frames , RNA, Messenger/genetics , Receptors, Calcitriol/biosynthesis
14.
Mol Biol Evol ; 33(12): 3108-3132, 2016 12.
Article in English | MEDLINE | ID: mdl-27604222

ABSTRACT

Translational stop codon readthrough emerged as a major regulatory mechanism affecting hundreds of genes in animal genomes, based on recent comparative genomics and ribosomal profiling evidence, but its evolutionary properties remain unknown. Here, we leverage comparative genomic evidence across 21 Anopheles mosquitoes to systematically annotate readthrough genes in the malaria vector Anopheles gambiae, and to provide the first study of abundant readthrough evolution, by comparison with 20 Drosophila species. Using improved comparative genomics methods for detecting readthrough, we identify evolutionary signatures of conserved, functional readthrough of 353 stop codons in the malaria vector, Anopheles gambiae, and of 51 additional Drosophila melanogaster stop codons, including several cases of double and triple readthrough and of readthrough of two adjacent stop codons. We find that most differences between the readthrough repertoires of the two species arose from readthrough gain or loss in existing genes, rather than birth of new genes or gene death; that readthrough-associated RNA structures are sometimes gained or lost while readthrough persists; that readthrough is more likely to be lost at TAA and TAG stop codons; and that readthrough is under continued purifying evolutionary selection in mosquito, based on population genetic evidence. We also determine readthrough-associated gene properties that predate readthrough, and identify differences in the characteristic properties of readthrough genes between clades. We estimate more than 600 functional readthrough stop codons in mosquito and 900 in fruit fly, provide evidence of readthrough control of peroxisomal targeting, and refine the phylogenetic extent of abundant readthrough as following divergence from centipede.


Subject(s)
Anopheles/genetics , Anopheles/metabolism , Codon, Terminator , Peptide Chain Termination, Translational , Animals , Biological Evolution , Codon , Drosophila melanogaster , Evolution, Molecular , Genomics , Open Reading Frames , Phylogeny , Protein Biosynthesis , Ribosomes/genetics , Ribosomes/metabolism
16.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA
17.
Anal Chem ; 88(7): 3967-75, 2016 Apr 05.
Article in English | MEDLINE | ID: mdl-27010111

ABSTRACT

Computational, genomic, and proteomic approaches have been used to discover nonannotated protein-coding small open reading frames (smORFs). Some novel smORFs have crucial biological roles in cells and organisms, which motivates the search for additional smORFs. Proteomic smORF discovery methods are advantageous because they detect smORF-encoded polypeptides (SEPs) to validate smORF translation and SEP stability. Because SEPs are shorter and less abundant than average proteins, SEP detection using proteomics faces unique challenges. Here, we optimize several steps in the SEP discovery workflow to improve SEP isolation and identification. These changes have led to the detection of several new human SEPs (novel human genes), improved confidence in the SEP assignments, and enabled quantification of SEPs under different cellular conditions. These improvements will allow faster detection and characterization of new SEPs and smORFs.


Subject(s)
Open Reading Frames/genetics , Peptides/analysis , Peptides/genetics , HEK293 Cells , Humans , K562 Cells , Peptides/isolation & purification , Tumor Cells, Cultured
18.
Nucleic Acids Res ; 42(14): 8928-38, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25013167

ABSTRACT

Stop codon readthrough is used extensively by viruses to expand their gene expression. Until recent discoveries in Drosophila, only a very limited number of readthrough cases in chromosomal genes had been reported. Analysis of conserved protein coding signatures that extend beyond annotated stop codons identified potential stop codon readthrough of four mammalian genes. Here we use a modified targeted bioinformatic approach to identify a further three mammalian readthrough candidates. All seven genes were tested experimentally using reporter constructs transfected into HEK-293T cells. Four displayed efficient stop codon readthrough, and these have UGA immediately followed by CUAG. Comparative genomic analysis revealed that in the four readthrough candidates containing UGA-CUAG, this motif is conserved not only in mammals but throughout vertebrates with the first six of the seven nucleotides being universally conserved. The importance of the CUAG motif was confirmed using a systematic mutagenesis approach. One gene, OPRL1, encoding an opiate receptor, displayed extremely efficient levels of readthrough (∼31%) in HEK-293T cells. Signals both 5' and 3' of the OPRL1 stop codon contribute to this high level of readthrough. The sequence UGA-CUA alone can support 1.5% readthrough, underlining its importance.


Subject(s)
Codon, Terminator , Protein Biosynthesis , Aminoglycosides/pharmacology , Animals , Anti-Bacterial Agents/pharmacology , Aquaporin 4/genetics , Conserved Sequence , HEK293 Cells , Humans , Mitogen-Activated Protein Kinase 10/genetics , Nucleotide Motifs , Phylogeny , Protein Biosynthesis/drug effects , Receptors, Opioid/genetics , Receptors, Opioid, kappa/genetics , Nociceptin Receptor
20.
J Proteome Res ; 13(3): 1757-65, 2014 Mar 07.
Article in English | MEDLINE | ID: mdl-24490786

ABSTRACT

The existence of nonannotated protein-coding human short open reading frames (sORFs) has been revealed through the direct detection of their sORF-encoded polypeptide (SEP) products. The discovery of novel SEPs increases the size of the genome and the proteome and provides insights into the molecular biology of mammalian cells, such as the prevalent usage of non-AUG start codons. Through modifications of the existing SEP-discovery workflow, we discover an additional 195 SEPs in K562 cells and extend this methodology to identify novel human SEPs in additional cell lines and human tissue for a final tally of 237 new SEPs. These results continue to expand the human genome and proteome and demonstrate that SEPs are a ubiquitous class of nonannotated polypeptides that require further investigation.


Subject(s)
Breast Neoplasms/chemistry , Genome, Human , Open Reading Frames , Peptides/analysis , Proteome/analysis , Breast Neoplasms/genetics , Cell Line , Chromatography, Liquid , Codon, Initiator/chemistry , Codon, Initiator/genetics , Female , Humans , K562 Cells , Peptides/chemistry , Protein Biosynthesis , Proteome/chemistry , Tandem Mass Spectrometry