ABSTRACT
RNA-binding proteins (RBPs) are key co- and post-transcriptional regulators of gene expression, playing a crucial role in many biological processes. Experimental methods like CLIP-seq have enabled the identification of transcriptome-wide RNA-protein interactions for select proteins; however, the time- and resource-intensive nature of these technologies calls for the development of computational methods to complement their predictions. Here, we leverage recent, large-scale CLIP-seq experiments to construct a de novo predictor of RNA-protein interactions based on graph neural networks (GNNs). We show that the GNN method allows us not only to predict missing links in an RNA-protein network, but also to predict the entire complement of targets of previously unassayed proteins, and even to reconstruct the entire network of RNA-protein interactions in different conditions based on minimal information. Our results demonstrate the potential of modern machine learning methods to extract useful information on post-transcriptional regulation from large data sets.
Subjects
Neural Networks, Computer; RNA; Sequence Analysis, RNA/methods; RNA/genetics; RNA/metabolism; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Machine Learning
ABSTRACT
RNA-protein interactions have long been recognised as crucial regulators of gene expression. Recently, the development of scalable experimental techniques to measure these interactions has revolutionised the field, leading to the production of large-scale datasets which offer both opportunities and challenges for machine learning techniques. In this brief note, we discuss some of the major stumbling blocks to the use of machine learning in computational RNA biology, focusing specifically on the problem of predicting RNA-protein interactions from next-generation sequencing data.
Subjects
Computational Biology; Machine Learning; Computational Biology/methods; RNA/genetics
ABSTRACT
Complex networks can model a wide range of complex systems in nature and society, and many algorithms (network generators) capable of synthesizing networks with a few, very specific structural characteristics (degree distribution, average path length, etc.) have been developed. However, there remains a significant lack of generators capable of synthesizing networks that strongly resemble those observed in the real world, which can subsequently be used as a null model or to perform tasks such as extrapolation, compression and control. In this paper, we present a robust new approach, which we term Action-based Modeling, that creates a compact probabilistic model of a given target network, which can then be used to synthesize networks of arbitrary size. A statistical comparison to existing network generators is performed, and the results show that the performance of our approach is comparable to current state-of-the-art methods on a variety of network measures, while also yielding easily interpretable generators. Additionally, the action-based approach described herein allows the user to consider an arbitrarily large set of structural characteristics during the generator design process.