ABSTRACT
The 5'-untranslated region (5'-UTR) of mRNAs contains elements that affect expression, yet the rules by which these regions exert their effect are poorly understood. Here, we studied the impact of 5'-UTR sequences on protein levels in yeast, by constructing a large-scale library of mutants that differ only in the 10 bp preceding the translational start site of a fluorescent reporter. Using a high-throughput sequencing strategy, we obtained highly accurate measurements of protein abundance for over 2,000 unique sequence variants. The resulting pool spanned an approximately sevenfold range of protein levels, demonstrating the powerful consequences of sequence manipulations of even 1-10 nucleotides immediately upstream of the start codon. We devised computational models that predicted over 70% of the measured expression variability in held-out sequence variants. Notably, a combined model of the most prominent features successfully explained protein abundance in an additional, independently constructed library, whose nucleotide composition differed greatly from the library used to parameterize the model. Our analysis reveals the dominant contribution of the start codon context at positions -3 to -1, mRNA secondary structure, and out-of-frame upstream AUGs (uAUGs) to phenotypic diversity, thereby advancing our understanding of how protein levels are modulated by 5'-UTR sequences, and paving the way toward predictably tuning protein expression through manipulations of 5'-UTRs.
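As an illustration of two of the sequence features named above (the start codon context at positions -3 to -1 and out-of-frame uAUGs), here is a minimal Python sketch. It is not the authors' model: the function name `utr_features`, the returned keys, and the position convention (the nucleotide immediately upstream of the start codon is -1) are assumptions made for this example, and mRNA secondary structure is not computed because it would require an external folding tool.

```python
def utr_features(upstream: str) -> dict:
    """Extract simple features from the 10 nt preceding the start codon.

    Hypothetical helper for illustration only. Positions are counted
    relative to the A of the main AUG, so the last upstream base is -1.
    """
    # Accept RNA or DNA alphabets and normalise to DNA letters.
    seq = upstream.upper().replace("U", "T")
    if len(seq) != 10 or set(seq) - set("ACGT"):
        raise ValueError("expected exactly 10 nucleotides (A, C, G, T/U)")

    # Start codon context: the three bases at positions -3, -2, -1.
    context = seq[-3:]

    in_frame, out_of_frame = [], []
    # Scan every upstream triplet; an ATG starting at position p is
    # in frame with the main start codon only when p is a multiple of 3.
    for i in range(len(seq) - 2):
        if seq[i:i + 3] == "ATG":
            pos = i - len(seq)  # ranges from -10 to -3
            if pos % 3 == 0:
                in_frame.append(pos)
            else:
                out_of_frame.append(pos)

    return {
        "context_-3_to_-1": context,
        "in_frame_uATG_positions": in_frame,
        "out_of_frame_uATG_positions": out_of_frame,
    }


if __name__ == "__main__":
    print(utr_features("AAAATGAAAA"))
```

Features of this kind could then be fed, alongside a folding-energy estimate, into a regression model of reporter expression; the choice of model is left open here.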
Subject(s)
5' Untranslated Regions , Fungal Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Base Sequence , Initiator Codon , DNA Primers , Fungal Proteins/genetics , Nucleic Acid Conformation , Messenger RNA/genetics , Saccharomyces cerevisiae/genetics
ABSTRACT
Embryonic stem cell (ESC) self-renewal and cell fate decisions are driven by a broad array of molecular signals. While transcriptional regulators have been extensively studied in human ESCs (hESCs), the extent to which RNA-binding proteins (RBPs) contribute to human pluripotency remains unclear. Here, we carry out a proteome-wide screen and identify 810 proteins that bind RNA in hESCs. We reveal that RBPs are preferentially expressed in hESCs and dynamically regulated during early stem cell differentiation. Notably, many RBPs are affected by knockdown of OCT4, a master regulator of pluripotency, several dozen of which are directly targeted by this factor. Using cross-linking and immunoprecipitation (CLIP-seq), we find that the pluripotency-associated STAT3 and OCT4 transcription factors interact with RNA in hESCs and confirm the binding of STAT3 to the conserved NORAD long-noncoding RNA. Our findings indicate that RBPs have a more widespread role in human pluripotency than previously appreciated.
Subject(s)
Human Embryonic Stem Cells/metabolism , RNA-Binding Proteins/metabolism , Cell Differentiation/genetics , Cell Line , DNA/metabolism , Gene Expression Profiling , Gene Expression Regulation , Humans , Protein Binding , Proteome/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , RNA-Binding Proteins/genetics , STAT3 Transcription Factor/metabolism
ABSTRACT
RNA-binding proteins (RBPs) interact with RNA to form ribonucleoprotein particles (RNPs). The interactions between RBPs and their RNA partners are traditionally thought to be mediated by highly conserved RNA-binding domains (RBDs). Recently, high-throughput studies have led to the discovery of hundreds of novel proteins and domains, many of which do not follow the classical definition of RNA binding. Despite technological innovations, experimental screens are currently limited to the detection of specific types of RNPs, underscoring the importance of computational methods for predicting novel RBPs and RNA-interacting residues and interfaces. Here, we discuss major challenges in the computational prediction of RBPs and RBDs and outline new strategies to circumvent current limitations of experimental techniques.