ABSTRACT
BACKGROUND: RNA-binding proteins (RBPs) play vital roles in many processes in the cell. Different RBPs bind RNA with different sequence and structure specificities. While sequence specificities for a large set of 205 RBPs have been reported through the RNAcompete compendium, structure specificities are known for only a small fraction. The main limitation lies in the design of the RNAcompete technology, which tests RBP binding against unstructured RNA probes, making it difficult to infer structural preferences from these data. We recently developed RCK, an algorithm to infer sequence and structural binding models from RNAcompete data. The set of binding models enables, for the first time, a large-scale assessment of RNA structure in the RBPome. RESULTS: We re-validate and uncover the role of RNA structure in the RBPome through novel analysis of the largest-scale dataset to date. First, we show that RNA structure exists in presumably unstructured RNA probes and that its variability is correlated with RNA binding. Second, we examine the structural binding preferences of RBPs and discover an overall preference to bind RNA loops. Third, we significantly improve protein-binding prediction using RNA structure, both in vitro and in vivo. Lastly, we demonstrate that RNA structural binding preferences can be inferred for new proteins solely from their amino acid content. CONCLUSIONS: By counter-intuitively demonstrating through our analysis that we can predict both the RNA structure of and RBP binding to these putatively unstructured RNAs, we transform a compendium of RNA-binding proteins into a valuable resource for structure-based binding models. We uncover the important role RNA structure plays in protein-RNA interaction for hundreds of RNA-binding proteins.
Subjects
Nucleic Acid Conformation , RNA-Binding Proteins/chemistry , RNA/chemistry , Amino Acid Motifs , Binding Sites , Models, Theoretical , Nucleotide Motifs , Protein Binding , RNA/genetics , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Reproducibility of Results , Structure-Activity Relationship
ABSTRACT
RNA-binding proteins (RBPs) participate in diverse cellular processes and have important roles in human development and disease. The human genome, like that of many other eukaryotes, encodes hundreds of RBPs that contain canonical sequence-specific RNA-binding domains (RBDs) as well as numerous other unconventional RNA-binding proteins (ucRBPs). ucRBPs physically associate with RNA but lack common RBDs. The degree to which these proteins bind RNA in a sequence-specific manner is unknown. Here, we provide a detailed description of both the laboratory and data processing methods for RNAcompete, a method we have previously used to analyze the RNA-binding preferences of hundreds of RBD-containing RBPs from diverse eukaryotes. We also determine the RNA-binding preferences for two human ucRBPs, NUDT21 and CNBP, and use this analysis to exemplify the RNAcompete pipeline. The results of our RNAcompete experiments are consistent with independent RNA-binding data for these proteins and demonstrate the utility of RNAcompete for analyzing the growing repertoire of ucRBPs.
Subjects
Cleavage And Polyadenylation Specificity Factor/genetics , Microarray Analysis/methods , RNA-Binding Proteins/genetics , RNA/chemistry , Animals , Base Sequence , Binding Sites , Cleavage And Polyadenylation Specificity Factor/metabolism , Cloning, Molecular , DNA Primers/chemistry , DNA Primers/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression , Humans , Protein Binding , Protein Domains , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/metabolism , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Sequence Alignment
ABSTRACT
Deep neural networks have demonstrated improved performance at predicting sequence specificities of DNA- and RNA-binding proteins. However, it remains unclear why they perform better than previous methods that rely on k-mers and position weight matrices. Here, we highlight a recent deep learning-based software package, called ResidualBind, that analyzes RNA-protein interactions using only RNA sequence as an input feature and performs global importance analysis for model interpretability. We discuss practical considerations for model interpretability to uncover learned sequence motifs and their secondary structure preferences.
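
As an illustration of the idea behind global importance analysis mentioned above, the following is a minimal sketch, not the ResidualBind implementation or its API. It embeds a candidate motif into randomly sampled background sequences and reports the average change in a model's prediction. The function names, the default parameters, and the `toy_model` scorer (a simple best-window match against a hypothetical target motif `UGCAUG`) are all assumptions made for this example; in practice the model would be a trained network.

```python
import random

ALPHABET = "ACGU"


def random_rna(length, rng):
    """Draw a uniformly random RNA sequence of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def embed(seq, motif, pos):
    """Overwrite part of seq with motif, starting at position pos."""
    return seq[:pos] + motif + seq[pos + len(motif):]


def toy_model(seq, target="UGCAUG"):
    """Hypothetical stand-in for a trained predictor (an assumption):
    the best fraction of matching positions over all windows of seq."""
    k = len(target)
    best = 0
    for i in range(len(seq) - k + 1):
        matches = sum(a == b for a, b in zip(seq[i:i + k], target))
        best = max(best, matches)
    return best / k


def global_importance(model, motif, n_seqs=200, length=41, pos=17, seed=0):
    """Mean prediction with the motif embedded minus mean prediction on
    the same background sequences without it."""
    rng = random.Random(seed)
    backgrounds = [random_rna(length, rng) for _ in range(n_seqs)]
    with_motif = [embed(s, motif, pos) for s in backgrounds]
    base = sum(model(s) for s in backgrounds) / n_seqs
    treated = sum(model(s) for s in with_motif) / n_seqs
    return treated - base


if __name__ == "__main__":
    for motif in ("UGCAUG", "AAAAAA"):
        print(motif, round(global_importance(toy_model, motif), 3))
```

Because the same background sequences are scored with and without the embedded pattern, the difference isolates the effect of the pattern itself from the surrounding sequence context. Varying `pos`, or embedding two motifs at once, would probe positional and combinatorial effects in the same way.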