ABSTRACT
Overlapping genes were thought to be essentially absent from the human genome until the discovery of abundant, frameshifted internal open reading frames (iORFs) nested within annotated protein coding sequences. However, it is currently unclear how many functional human iORFs exist and how they are expressed. We demonstrate that, in hundreds of cases, alternative transcript variants that bypass the start codon of annotated coding sequences (CDSs) can recode a human gene to express the iORF-encoded microprotein. While many human genes generate such non-coding alternative transcripts, they are poorly annotated. Here we develop a new analysis pipeline enabling the assignment of translated human iORFs to alternative transcripts, and provide long-read sequencing and molecular validation of their expression in dozens of cases. Finally, we demonstrate that a conserved DEDD2 iORF switches the function of this gene from pro- to anti-apoptotic. This work thus demonstrates that alternative transcript variants can broadly reprogram human genes to express frameshifted iORFs, revealing new levels of complexity in the human transcriptome and proteome.
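To make the notion of a frameshifted iORF concrete, the following minimal Python sketch scans a coding sequence for ATG-to-stop open reading frames in the +1 and +2 frames relative to the annotated frame. It is an illustration only and not the analysis pipeline described in the abstract: the function name `find_internal_orfs`, the `min_codons` threshold, and the restriction to ATG start codons are all assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_internal_orfs(cds, min_codons=10):
    """Return (frame_offset, start, end) tuples for ATG-to-stop ORFs that lie
    in the +1 or +2 frame of `cds`, whose annotated frame is offset 0.

    Hypothetical helper: ATG-only starts and the length cutoff are
    simplifying assumptions, not taken from the source.
    """
    cds = cds.upper()
    hits = []
    for offset in (1, 2):  # the two frames shifted from the annotated one
        i = offset
        while i + 3 <= len(cds):
            if cds[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon from the ATG until an in-frame stop codon.
            j = i
            while j + 3 <= len(cds) and cds[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(cds):
                break  # no in-frame stop before the CDS ends
            # Count sense codons (including the ATG, excluding the stop).
            if (j - i) // 3 >= min_codons:
                hits.append((offset, i, j + 3))
            i = j + 3  # resume scanning after the stop codon
    return hits
```

Coordinates are 0-based and end-exclusive, so `cds[start:end]` yields the ORF including its stop codon. A real analysis would additionally need transcript models and translation evidence such as Ribo-seq, which this sketch does not attempt.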
ABSTRACT
A major scientific drive is to characterize the protein-coding genome as it provides the primary basis for the study of human health. But the fundamental question remains: what has been missed in prior genomic analyses? Over the past decade, the translation of non-canonical open reading frames (ncORFs) has been observed across human cell types and disease states, with major implications for proteomics, genomics, and clinical science. However, the impact of ncORFs has been limited by the absence of a large-scale understanding of their contribution to the human proteome. Here, we report the collaborative efforts of stakeholders in proteomics, immunopeptidomics, Ribo-seq ORF discovery, and gene annotation, to produce a consensus landscape of protein-level evidence for ncORFs. We show that at least 25% of a set of 7,264 ncORFs give rise to translated gene products, yielding over 3,000 peptides in a pan-proteome analysis encompassing 3.8 billion mass spectra from 95,520 experiments. With these data, we developed an annotation framework for ncORFs and created public tools for researchers through GENCODE and PeptideAtlas. This work will provide a platform to advance ncORF-derived proteins in biomedical discovery and, beyond humans, diverse animals and plants where ncORFs are similarly observed.
ABSTRACT
Over the past 15 years, hundreds of previously undiscovered bacterial small open reading frame (sORF)-encoded polypeptides (SEPs) of fewer than fifty amino acids have been identified, and biological functions have been ascribed to an increasing number of SEPs from intergenic regions and small RNAs. However, despite numbering in the dozens in Escherichia coli, and hundreds to thousands in humans, same-strand nested sORFs that overlap protein coding genes in alternative reading frames remain understudied. In order to provide insight into this enigmatic class of unannotated genes, we characterized GndA, a 36-amino acid, heat shock-regulated SEP encoded within the +2 reading frame of the gnd gene in E. coli K-12 MG1655. We show that GndA pulls down components of respiratory complex I (RCI) and is required for proper localization of an RCI subunit during heat shock. At high temperature, GndA deletion (ΔGndA) cells exhibit perturbations in cell growth, NADH/NAD+ ratio, and expression of a number of genes including several associated with oxidative stress. These findings suggest that GndA may function in maintenance of homeostasis during heat shock. Characterization of GndA therefore supports the nascent but growing consensus that functional, overlapping genes occur in genomes from viruses to humans.
ABSTRACT
The conserved WD40-repeat protein WDR5 interacts with multiple proteins both inside and outside the nucleus. However, it is currently unclear whether and how the distribution of WDR5 between complexes is regulated. Here, we show that an unannotated microprotein EMBOW (endogenous microprotein binder of WDR5) dually encoded in the human SCRIB gene interacts with WDR5 and regulates its binding to multiple interaction partners, including KMT2A and KIF2A. EMBOW is cell cycle regulated, with two expression maxima at late G1 phase and G2/M phase. Loss of EMBOW decreases WDR5 interaction with KIF2A, aberrantly shortens mitotic spindle length, prolongs G2/M phase, and delays cell proliferation. In contrast, loss of EMBOW increases WDR5 interaction with KMT2A, leading to WDR5 binding to off-target genes, erroneously increasing H3K4me3 levels, and activating transcription of these genes. Together, these results implicate EMBOW as a regulator of WDR5 that regulates its interactions and prevents its off-target binding in multiple contexts.
Subjects
Chromatin, Intracellular Signaling Peptides and Proteins, Humans, Intracellular Signaling Peptides and Proteins/genetics, Cell Proliferation, Spindle Apparatus, Kinesins/genetics, Micropeptides
ABSTRACT
Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations for analysis of short, poorly conserved microprotein sequences, so additional approaches are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which delivers a depth of functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore, has potential to identify translated microproteins that are of biological importance and to provide molecular mechanism for their in vivo roles.
Subjects
Micropeptides, Proteogenomics, Humans, Cryoelectron Microscopy, Peptides, Proteogenomics/methods, Open Reading Frames
ABSTRACT
Translation of noncoding regions is ubiquitous and upregulated in disease. Kesner et al.¹ elucidate the mechanism by which the BAG6 complex exerts quality control over noncoding translation while targeting stable, noncanonical polypeptides to cellular membranes.
Subjects
Molecular Chaperones, Peptides, Quality Control
ABSTRACT
Thousands of unannotated small and alternative open reading frames (smORFs and alt-ORFs, respectively) have recently been revealed in mammalian genomes. While hundreds of mammalian smORF- and alt-ORF-encoded proteins (SEPs and alt-proteins, respectively) affect cell proliferation, the overwhelming majority of smORFs and alt-ORFs remain uncharacterized at the molecular level. Complicating the task of identifying the biological roles of smORFs and alt-ORFs, the SEPs and alt-proteins that they encode exhibit limited sequence homology to protein domains of known function. Experimental techniques for the functionalization of these gene classes are therefore required. Approaches combining chemical labeling and quantitative proteomics have greatly advanced our ability to identify and characterize functional SEPs and alt-proteins in high throughput. In this review, we briefly describe the principles of proteomic discovery of SEPs and alt-proteins, then summarize how these technologies interface with chemical labeling for identification of SEPs and alt-proteins with specific properties, as well as in defining the interactome of SEPs and alt-proteins.
Subjects
Peptides, Proteomics, Animals, Open Reading Frames, Peptides/chemistry, Proteins/genetics, Genome, Mammals/metabolism
ABSTRACT
RIBO-seq and proteogenomics have revealed that mammalian genomes harbor thousands of unannotated small and alternative open reading frames (smORFs, <100 amino acids, and alt-ORFs, >100 amino acids, respectively). Several dozen mammalian smORF-encoded proteins (SEPs) and alt-ORF-encoded proteins (alt-proteins) have been shown to play important biological roles, while the overwhelming majority of smORFs and alt-ORFs remain uncharacterized, particularly at the molecular level. Functional proteomics has the potential to reveal key properties of unannotated SEPs and alt-proteins in high throughput, and an approach to identify SEPs and alt-proteins undergoing regulated synthesis should be of broad utility. Here, we introduce a chemoproteomic pipeline based on bio-orthogonal non-canonical amino acid tagging (BONCAT) (Dieterich et al., 2006) to profile nascent SEPs and alt-proteins in human cells. This approach is able to identify cellular stress-induced and cell-cycle-regulated SEPs and alt-proteins in cells.
Graphical abstract: Schematic overview of BONCAT-based chemoproteomic profiling of nascent, unannotated small and alternative open reading frame-encoded proteins (SEPs and alt-proteins).
ABSTRACT
Proteogenomic identification of translated small open reading frames has revealed thousands of previously unannotated, largely uncharacterized microproteins, or polypeptides of less than 100 amino acids, and alternative proteins (alt-proteins) that are co-encoded with canonical proteins and are often larger. The subcellular localizations of microproteins and alt-proteins are generally unknown but can have significant implications for their functions. Proximity biotinylation is an attractive approach to define the protein composition of subcellular compartments in cells and in animals. Here, we developed a high-throughput technology to map unannotated microproteins and alt-proteins to subcellular localizations by proximity biotinylation with TurboID (MicroID). More than 150 microproteins and alt-proteins are associated with subnuclear organelles. One alt-protein, alt-LAMA3, localizes to the nucleolus and functions in pre-rRNA transcription. We applied MicroID in a mouse model, validating expression of a conserved nuclear microprotein, and establishing MicroID for discovery of microproteins and alt-proteins in vivo.
Subjects
Peptides, Proteins, Animals, Cell Nucleolus, Mice, Open Reading Frames, Peptides/genetics, Proteins/genetics
ABSTRACT
Many unannotated microproteins and alternative proteins (alt-proteins) are coencoded with canonical proteins, but few of their functions are known. Motivated by the hypothesis that alt-proteins undergoing regulated synthesis could play important cellular roles, we developed a chemoproteomic pipeline to identify nascent alt-proteins in human cells. We identified 22 actively translated alt-proteins or N-terminal extensions, one of which is post-transcriptionally upregulated by DNA damage stress. We further defined a nucleolar, cell-cycle-regulated alt-protein that negatively regulates assembly of the pre-60S ribosomal subunit (MINAS-60). Depletion of MINAS-60 increases the amount of cytoplasmic 60S ribosomal subunit, upregulating global protein synthesis and cell proliferation. Mechanistically, MINAS-60 represses the rate of late-stage pre-60S assembly and export to the cytoplasm. Together, these results implicate MINAS-60 as a potential checkpoint inhibitor of pre-60S assembly and demonstrate that chemoproteomics enables hypothesis generation for uncharacterized alt-proteins.
Subjects
Saccharomyces cerevisiae Proteins, Cell Cycle Proteins/metabolism, Humans, Ribosomal RNA, Ribosomal Proteins/metabolism, Large Eukaryotic Ribosome Subunits/genetics, Large Eukaryotic Ribosome Subunits/metabolism, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
Proteogenomic identification of translated small open reading frames in humans has revealed thousands of microproteins, or polypeptides of fewer than 100 amino acids, that were previously invisible to geneticists. Hundreds of microproteins have been shown to be essential for cell growth and proliferation, and many regulate macromolecular complexes. However, the vast majority of microproteins remain functionally uncharacterized, and many lack secondary structure and exhibit limited evolutionary conservation. One such intrinsically disordered microprotein is NBDY, a 68-amino acid component of membraneless organelles known as P-bodies. In this work, we show that NBDY can undergo liquid-liquid phase separation, a biophysical process thought to underlie the formation of membraneless organelles, in the presence of RNA in vitro. Phosphorylation of NBDY drives liquid phase remixing in vitro and macroscopic P-body dissociation in cells undergoing growth factor signaling and cell division. These results suggest that NBDY phosphorylation enables regulation of P-body dynamics during cell proliferation and, more broadly, that intrinsically disordered microproteins may contribute to liquid-liquid phase separation and remixing behavior to affect cellular processes.
Subjects
Intrinsically Disordered Proteins/chemical synthesis, Biomolecular Condensates, Humans, Intrinsically Disordered Proteins/chemistry, Particle Size, Phosphorylation
ABSTRACT
Thousands of human small and alternative open reading frames (smORFs and alt-ORFs, respectively) have recently been annotated. Many alt-ORFs are co-encoded with canonical proteins in multicistronic configurations, but few of their functions are known. Here, we report the detection of alt-RPL36, a protein co-encoded with human RPL36. Alt-RPL36 partially localizes to the endoplasmic reticulum, where it interacts with TMEM24, which transports the phosphatidylinositol 4,5-bisphosphate (PI(4,5)P2) precursor phosphatidylinositol from the endoplasmic reticulum to the plasma membrane. Knock-out of alt-RPL36 increases plasma membrane PI(4,5)P2 levels, upregulates PI3K-AKT-mTOR signaling, and increases cell size. Alt-RPL36 contains four phosphoserine residues, point mutations of which abolish interaction with TMEM24 and, consequently, alt-RPL36 effects on PI3K signaling and cell size. These results implicate alt-RPL36 as an upstream regulator of PI3K-AKT-mTOR signaling. More broadly, the RPL36 transcript encodes two sequence-independent polypeptides that co-regulate translation via different molecular mechanisms, expanding our knowledge of multicistronic human gene functions.
Subjects
Membrane Proteins/metabolism, Phosphatidylinositol 3-Kinases/metabolism, Proto-Oncogene Proteins c-akt/metabolism, Ribosomal Proteins/metabolism, Signal Transduction, TOR Serine-Threonine Kinases/metabolism, Alternative Splicing, Amino Acid Sequence, Base Sequence, Biological Transport, Cell Membrane/metabolism, Down-Regulation, Endoplasmic Reticulum/metabolism, HEK293 Cells, Humans, Membrane Proteins/genetics, Mutation, Phosphatidylinositol 4,5-Diphosphate/metabolism, Protein Binding, Ribosomal Proteins/genetics
ABSTRACT
Polypeptides generated from proteolytic processing of protein precursors, or proteolytic proteoforms, play an important role in diverse biological functions and diseases. However, their often-small size and intricate post-translational biogenesis preclude the use of simple genetic tagging in their cellular studies. Herein, we develop a labeling strategy for this class of proteoforms, based on residue-specific genetic code expansion labeling with a molecular beacon design. We demonstrate the utility of such a design by creating a molecular beacon reporter to detect amyloid-β peptides, known to be involved in the pathogenesis of Alzheimer's disease, as they are produced from amyloid precursor protein (APP) along the endocytic pathway of living cells.
Subjects
Amyloid beta-Peptides/metabolism, Amyloid beta-Protein Precursor/metabolism, Lysine/analogs & derivatives, Amino Acyl-tRNA Synthetases/genetics, Amino Acyl-tRNA Synthetases/metabolism, Amyloid beta-Peptides/chemistry, Amyloid beta-Protein Precursor/genetics, Archaeal Proteins/genetics, Archaeal Proteins/metabolism, Genetic Code, HEK293 Cells, Humans, Lysine/chemistry, Lysine/metabolism, Methanosarcina/enzymology, Fluorescence Microscopy, Site-Directed Mutagenesis, Post-Translational Protein Processing
ABSTRACT
DCP2 is an RNA-decapping enzyme that controls the stability of human RNAs that encode factors functioning in transcription and the immune response. While >1,800 human DCP2 substrates have been identified, compensatory expression changes secondary to genetic ablation of DCP2 have complicated a complete mapping of its regulome. Cell-permeable, selective chemical inhibitors of DCP2 could provide a powerful tool to study DCP2 specificity. Here, we report phage display selection of CP21, a bicyclic peptide ligand to DCP2. CP21 has high affinity and selectivity for DCP2 and inhibits DCP2 decapping activity toward selected RNA substrates in human cells. CP21 increases formation of P-bodies, liquid condensates enriched in intermediates of RNA decay, in a manner that resembles the deletion or mutation of DCP2. We used CP21 to identify 76 previously unreported DCP2 substrates. This work demonstrates that DCP2 inhibition can complement genetic approaches to study RNA decay.
Subjects
Heterocyclic Bridged Bicyclo Compounds/pharmacology, Drug Discovery, Endoribonucleases/antagonists & inhibitors, Enzyme Inhibitors/pharmacology, Peptides/pharmacology, Heterocyclic Bridged Bicyclo Compounds/chemical synthesis, Heterocyclic Bridged Bicyclo Compounds/chemistry, Endoribonucleases/metabolism, Enzyme Inhibitors/chemical synthesis, Enzyme Inhibitors/chemistry, HEK293 Cells, Humans, Molecular Conformation, Peptides/chemical synthesis, Peptides/chemistry
ABSTRACT
Proteogenomic identification of translated small open reading frames in humans has revealed thousands of microproteins, or polypeptides of fewer than 100 amino acids, that were previously invisible to geneticists. Hundreds of microproteins have been shown to be essential for cell growth and proliferation, and many regulate macromolecular complexes. One such regulatory microprotein is NBDY, a 68-amino acid component of the human cytoplasmic RNA decapping complex. Heterologously expressed NBDY was previously reported to regulate cytoplasmic ribonucleoprotein granules known as P-bodies and reporter gene stability, but the global effect of endogenous NBDY on the cellular transcriptome remained undefined. In this work, we demonstrate that endogenous NBDY directly interacts with the human RNA decapping complex through EDC4 and DCP1A and localizes to P-bodies. Global profiling of RNA stability changes in NBDY knockout (KO) cells reveals dysregulated stability of more than 1400 transcripts. DCP2 substrate transcript half-lives are both increased and decreased in NBDY KO cells, which correlates with 5' UTR length. NBDY deletion additionally alters the stability of non-DCP2 target transcripts, possibly as a result of downregulated expression of nonsense-mediated decay factors in NBDY KO cells. We present a comprehensive model of the regulation of RNA stability by NBDY.
Subjects
RNA Caps/chemistry, RNA Caps/metabolism, HEK293 Cells, Humans, Nonsense Mediated mRNA Decay/genetics, Nonsense Mediated mRNA Decay/physiology, Open Reading Frames/genetics, RNA Stability, Messenger RNA/chemistry, Messenger RNA/metabolism
ABSTRACT
Ribosome profiling and mass spectrometry have revealed thousands of small and alternative open reading frames (sm/alt-ORFs) that are translated into polypeptides variously termed as microproteins and alt-proteins in mammalian cells. Some micro-/alt-proteins exhibit stress-, cell-type-, and/or tissue-specific expression; understanding this regulated expression will be critical to elucidating their functions. While differential translation has been inferred by ribosome profiling, quantitative mass spectrometry-based proteomics is needed for direct comparison of microprotein and alt-protein expression between samples and conditions. However, while label-free quantitative proteomics has been applied to detect stress-dependent expression of bacterial microproteins, this approach has not yet been demonstrated for analysis of differential expression of unannotated ORFs in the more complex human proteome. Here, we present global micro-/alt-protein quantitation in two human leukemia cell lines, K562 and MOLT4. We identify 12 unannotated proteins that are differentially expressed in these cell lines. The expression of six micro/alt-proteins from cDNA was validated biochemically, and two were found to localize to the nucleus. Thus, we demonstrate that label-free comparative proteomics enables quantitation of micro-/alt-protein expression between human cell lines. We anticipate that this workflow will enable the discovery of regulated sm/alt-ORF products across many biological conditions in human cells.
Subjects
Proteome, Proteomics, Cell Line, Humans, Mass Spectrometry, Open Reading Frames, Proteome/genetics
ABSTRACT
Decapping is the first committed step in 5'-to-3' RNA decay, and in the cytoplasm of human cells, multiple decapping enzymes regulate the stabilities of distinct subsets of cellular transcripts. However, the complete set of RNAs regulated by any individual decapping enzyme remains incompletely mapped, and no consensus sequence or property is currently known to unambiguously predict decapping enzyme substrates. Dcp2 was the first-identified and best-studied eukaryotic decapping enzyme, but it has been shown to regulate the stability of <400 transcripts in mammalian cells to date. Here, we globally profile changes in the stability of the human transcriptome in Dcp2 knockout cells via TimeLapse-seq. We find that P-body enrichment is the strongest correlate of Dcp2-dependent decay and that modification with m6A exhibits an additive effect with P-body enrichment for Dcp2 targeting. These results are consistent with a model in which P-bodies represent sites where translationally repressed transcripts are sorted for decay by soluble cytoplasmic decay complexes through additional molecular marks.
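As a rough illustration of how transcript stability can be derived from metabolic labeling data of the kind TimeLapse-seq produces, the Python sketch below converts the fraction of newly synthesized reads for a transcript into a half-life. It assumes steady-state expression and first-order decay, under which the new fraction after labeling time t is 1 − exp(−k·t). These assumptions, the function name `half_life_from_fraction_new`, and the parameter names are introduced here for illustration and are not taken from the abstract.

```python
import math


def half_life_from_fraction_new(fraction_new, label_time_h):
    """Estimate a transcript half-life (hours) from the fraction of reads
    that are newly synthesized after `label_time_h` hours of labeling.

    Assumes steady state and first-order decay (an assumption of this
    sketch): fraction_new = 1 - exp(-k * t), so k = -ln(1 - fraction_new) / t.
    """
    if not 0.0 < fraction_new < 1.0:
        raise ValueError("fraction_new must lie strictly between 0 and 1")
    if label_time_h <= 0:
        raise ValueError("label_time_h must be positive")
    decay_rate = -math.log(1.0 - fraction_new) / label_time_h
    return math.log(2.0) / decay_rate
```

Under these assumptions, a transcript that is half new after a 2-hour label has a half-life of about 2 hours; comparing such estimates between knockout and wild-type cells would indicate whether a transcript is stabilized or destabilized.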