RESUMEN
Cardiovascular diseases (CVDs) represent a major concern for global health, whose mechanistic understanding is complicated by a complex interplay between genetic predisposition and environmental factors. Specifically, heart failure (HF), encompassing dilated cardiomyopathy (DC), ischemic cardiomyopathy (ICM), and hypertrophic cardiomyopathy (HCM), is a topic of substantial interest in basic and clinical research. Here, we used a Partial Correlation Coefficient-based algorithm (PCC) within the context of a meta-analysis framework to construct a Gene Regulatory Network (GRN) that identifies key regulators whose activity is perturbed in Heart Failure. By integrating data from multiple independent studies, our approach unveiled crucial regulatory associations between transcription factors (TFs) and structural genes, emphasizing their pivotal roles in regulating metabolic pathways, such as fatty acid metabolism, oxidative stress response, epithelial-to-mesenchymal transition, and coagulation. In addition to known associations, our analysis also identified novel regulators, including the identification of TFs FPM315 and OVOL2, which are implicated in dilated cardiomyopathies, and TEAD1 and TEAD2 in both dilated and ischemic cardiomyopathies. Moreover, we uncovered alterations in adipogenesis and oxidative phosphorylation pathways in hypertrophic cardiomyopathy and discovered a role for IL2 STAT5 signaling in heart failure. Our findings underscore the importance of TF activity in the initiation and progression of cardiac disease, highlighting their potential as pharmacological targets.
Asunto(s)
Enfermedades Cardiovasculares , Redes Reguladoras de Genes , Factores de Transcripción , Humanos , Enfermedades Cardiovasculares/genética , Enfermedades Cardiovasculares/metabolismo , Factores de Transcripción/metabolismo , Factores de Transcripción/genética , Regulación de la Expresión Génica , Algoritmos , Insuficiencia Cardíaca/genética , Insuficiencia Cardíaca/metabolismoRESUMEN
The interaction between RNA and RNA-binding proteins (RBPs) has a key role in the regulation of gene expression, in RNA stability, and in many other biological processes. RBPs accomplish these functions by binding target RNA molecules through specific sequence and structure motifs. The identification of these binding motifs is therefore fundamental to improve our knowledge of the cellular processes and how they are regulated. Here, we present BRIO (BEAM RNA Interaction mOtifs), a new web server designed for the identification of sequence and structure RNA-binding motifs in one or more RNA molecules of interest. BRIO enables the user to scan over 2508 sequence motifs and 2296 secondary structure motifs identified in Homo sapiens and Mus musculus, in three different types of experiments (PAR-CLIP, eCLIP, HITS). The motifs are associated with the binding of 186 RBPs and 69 protein domains. The web server is freely available at http://brio.bio.uniroma2.it.
Asunto(s)
Proteínas de Unión al ARN/metabolismo , ARN/química , Programas Informáticos , Animales , Secuencia de Bases , Línea Celular , Humanos , Internet , Ratones , Motivos de Nucleótidos , ARN/metabolismo , ARN Nuclear Pequeño/metabolismo , ARN Viral/metabolismo , Análisis de Secuencia de ARNRESUMEN
RNA molecules are able to bind proteins, DNA and other small or long RNAs using information at primary, secondary or tertiary structure level. Recent techniques that use cross-linking and immunoprecipitation of RNAs can detect these interactions and, if followed by high-throughput sequencing, molecules can be analysed to find recurrent elements shared by interactors, such as sequence and/or structure motifs. Many tools are able to find sequence motifs from lists of target RNAs, while others focus on structure using different approaches to find specific interaction elements. In this work, we make a systematic analysis of RBP-RNA and RNA-RNA datasets to better characterize the interaction landscape with information about multi-motifs on the same RNAs. To achieve this goal, we updated our BEAM algorithm to combine both sequence and structure information to create pairs of patterns that model motifs of interaction. This algorithm was applied to several RNA binding proteins and ncRNAs interactors, confirming already known motifs and discovering new ones. This landscape analysis on interaction variability reflects the diversity of target recognition and underlines that often both primary and secondary structure are involved in molecular recognition.
Asunto(s)
Motivos de Nucleótidos , Proteínas de Unión al ARN/química , ARN/química , Análisis de Secuencia de ARN/métodos , Algoritmos , Animales , Secuencia de Bases , Sitios de Unión , Línea Celular , Células HEK293 , Células Hep G2 , Humanos , Células K562 , Ratones , MicroARNs/química , MicroARNs/genética , MicroARNs/metabolismo , Unión Proteica , ARN/genética , ARN/metabolismo , Proteínas de Unión al ARN/genética , Proteínas de Unión al ARN/metabolismoRESUMEN
Motivation: Signaling and metabolic pathways are finely regulated by a network of protein phosphorylation events. Unraveling the nature of this intricate network, composed of kinases, target proteins and their interactions, is therefore of crucial importance. Although thousands of kinase-specific phosphorylations (KsP) have been annotated in model organisms their kinase-target network is far from being complete, with less studied organisms lagging behind. Results: In this work, we achieved an automated and accurate identification of kinase domains, inferring the residues that most likely contribute to peptide specificity. We integrated this information with the target peptides of known human KsP to predict kinase-specific interactions in other eukaryotes through a deep neural network, outperforming similar methods. We analyzed the differential conservation of kinase specificity among eukaryotes revealing the high conservation of the specificity of tyrosine kinases. With this approach we discovered 1590 novel KsP of potential clinical relevance in the human proteome. Availability and implementation: http://akid.bio.uniroma2.it. Supplementary information: Supplementary data are available at Bioinformatics online.
Asunto(s)
Fosfotransferasas/química , Proteoma , Transducción de Señal , Eucariontes , Humanos , FosforilaciónRESUMEN
The most frequently used approach for protein structure prediction is currently homology modeling. The 3D model building phase of this methodology is critical for obtaining an accurate and biologically useful prediction. The most widely employed tool to perform this task is MODELLER. This program implements the "modeling by satisfaction of spatial restraints" strategy and its core algorithm has not been altered significantly since the early 1990s. In this work, we have explored the idea of modifying MODELLER with two effective, yet computationally light strategies to improve its 3D modeling performance. Firstly, we have investigated how the level of accuracy in the estimation of structural variability between a target protein and its templates in the form of σ values profoundly influences 3D modeling. We show that the σ values produced by MODELLER are on average weakly correlated to the true level of structural divergence between target-template pairs and that increasing this correlation greatly improves the program's predictions, especially in multiple-template modeling. Secondly, we have inquired into how the incorporation of statistical potential terms (such as the DOPE potential) in the MODELLER's objective function impacts positively 3D modeling quality by providing a small but consistent improvement in metrics such as GDT-HA and lDDT and a large increase in stereochemical quality. Python modules to harness this second strategy are freely available at https://github.com/pymodproject/altmod. In summary, we show that there is a large room for improving MODELLER in terms of 3D modeling quality and we propose strategies that could be pursued in order to further increase its performance.
Asunto(s)
Modelos Moleculares , Programas Informáticos , Homología Estructural de Proteína , Algoritmos , Biología Computacional , Simulación de Dinámica Molecular/estadística & datos numéricos , Proteínas/química , Alineación de Secuencia/estadística & datos numéricosRESUMEN
Motivation: RNA structural motif finding is a relevant problem that becomes computationally hard when working on high-throughput data (e.g. eCLIP, PAR-CLIP), often represented by thousands of RNA molecules. Currently, the BEAM server is the only web tool capable to handle tens of thousands of RNA in input with a motif discovery procedure that is only limited by the current secondary structure prediction accuracies. Results: The recently developed method BEAM (BEAr Motifs finder) can analyze tens of thousands of RNA molecules and identify RNA secondary structure motifs associated to a measure of their statistical significance. BEAM is extremely fast thanks to the BEAR encoding that transforms each RNA secondary structure in a string of characters. BEAM also exploits the evolutionary knowledge contained in a substitution matrix of secondary structure elements, extracted from the RFAM database of families of homologous RNAs. The BEAM web server has been designed to streamline data pre-processing by automatically handling folding and encoding of RNA sequences, giving users a choice for the preferred folding program. The server provides an intuitive and informative results page with the list of secondary structure motifs identified, the logo of each motif, its significance, graphic representation and information about its position in the RNA molecules sharing it. Availability and implementation: The web server is freely available at http://beam.uniroma2.it/ and it is implemented in NodeJS and Python with all major browsers supported. Contact: marco.pietrosanto@uniroma2.it. Supplementary information: Supplementary data are available at Bioinformatics online.
Asunto(s)
ARN/química , Secuencia de Bases , Bases de Datos Factuales , Internet , Motivos de Nucleótidos , Análisis de Secuencia de ARN , Programas InformáticosRESUMEN
Phosphorylation is a widespread post-translational modification that modulates the function of a large number of proteins. Here we show that a significant proportion of all the domains in the human proteome is significantly enriched or depleted in phosphorylation events. A substantial improvement in phosphosites prediction is achieved by leveraging this observation, which has not been tapped by existing methods. Phosphorylation sites are often not shared between multiple occurrences of the same domain in the proteome, even when the phosphoacceptor residue is conserved. This is partly because of different functional constraints acting on the same domain in different protein contexts. Moreover, by augmenting domain alignments with structural information, we were able to provide direct evidence that phosphosites in protein-protein interfaces need not be positionally conserved, likely because they can modulate interactions simply by sitting in the same general surface area.
Asunto(s)
Fosforilación , Proteoma/metabolismo , Biología Computacional/métodos , Humanos , Fosfoproteínas/metabolismo , Dominios Proteicos , Proteoma/químicaRESUMEN
Structural information is crucial in ribonucleic acid (RNA) analysis and functional annotation; nevertheless, how to include such structural data is still a debated problem. Dot-bracket notation is the most common and simple representation for RNA secondary structures but its simplicity leads also to ambiguity requiring further processing steps to dissolve. Here we present BEAR (Brand nEw Alphabet for RNA), a new context-aware structural encoding represented by a string of characters. Each character in BEAR encodes for a specific secondary structure element (loop, stem, bulge and internal loop) with specific length. Furthermore, exploiting this informative and yet simple encoding in multiple alignments of related RNAs, we captured how much structural variation is tolerated in RNA families and convert it into transition rates among secondary structure elements. This allowed us to compute a substitution matrix for secondary structure elements called MBR (Matrix of BEAR-encoded RNA secondary structures), of which we tested the ability in aligning RNA secondary structures. We propose BEAR and the MBR as powerful resources for the RNA secondary structure analysis, comparison and classification, motif finding and phylogeny.
Asunto(s)
ARN/química , Algoritmos , Biología Computacional/métodos , Conformación de Ácido Nucleico , Análisis de Secuencia de ARNRESUMEN
Nucleos is a web server for the identification of nucleotide-binding sites in protein structures. Nucleos compares the structure of a query protein against a set of known template 3D binding sites representing nucleotide modules, namely the nucleobase, carbohydrate and phosphate. Structural features, clustering and conservation are used to filter and score the predictions. The predicted nucleotide modules are then joined to build whole nucleotide-binding sites, which are ranked by their score. The server takes as input either the PDB code of the query protein structure or a user-submitted structure in PDB format. The output of Nucleos is composed of ranked lists of predicted nucleotide-binding sites divided by nucleotide type (e.g. ATP-like). For each ranked prediction, Nucleos provides detailed information about the score, the template structure and the structural match for each nucleotide module composing the nucleotide-binding site. The predictions on the query structure and the template-binding sites can be viewed directly on the web through a graphical applet. In 98% of the cases, the modules composing correct predictions belong to proteins with no homology relationship between each other, meaning that the identification of brand-new nucleotide-binding sites is possible using information from non-homologous proteins. Nucleos is available at http://nucleos.bio.uniroma2.it/nucleos/.
Asunto(s)
Nucleótidos/metabolismo , Conformación Proteica , Programas Informáticos , Apoproteínas/química , Apoproteínas/metabolismo , Sitios de Unión , Internet , Proteínas/metabolismoRESUMEN
The webPDBinder (http://pdbinder.bio.uniroma2.it/PDBinder) is a web server for the identification of small ligand-binding sites in a protein structure. webPDBinder searches a protein structure against a library of known binding sites and a collection of control non-binding pockets. The number of similarities identified with the residues in the two sets is then used to derive a propensity value for each residue of the query protein associated to the likelihood that the residue is part of a ligand binding site. The predicted binding residues can be further refined using conservation scores derived from the multiple alignment of the PFAM protein family. webPDBinder correctly identifies residues belonging to the binding site in 77% of the cases and is able to identify binding pockets starting from holo or apo structures with comparable performances. This is important for all the real world cases where the query protein has been crystallized without a ligand and is also difficult to obtain clear similarities with bound pockets from holo pocket libraries. The input is either a PDB code or a user-submitted structure. The output is a list of predicted binding pocket residues with propensity and conservation values both in text and graphical format.
Asunto(s)
Proteínas/química , Programas Informáticos , Sitios de Unión , Internet , Ligandos , Modelos Moleculares , Conformación Proteica , Proteínas/metabolismoRESUMEN
We analyzed the structure of human long non-coding RNA (lncRNAs) genes to investigate whether the non-coding transcriptome is organized in modular domains, as is the case for protein-coding genes. To this aim, we compared all known human lncRNA exons and identified 340 pairs of exons with high sequence and/or secondary structure similarity but embedded in a dissimilar sequence context. We grouped these pairs in 106 clusters based on their reciprocal similarities. These shared modules are highly conserved between humans and the four great ape species, display evidence of purifying selection and likely arose as a result of recent segmental duplications. Our analysis contributes to the understanding of the mechanisms driving the evolution of the non-coding genome and suggests additional strategies towards deciphering the functional complexity of this class of molecules.
RESUMEN
BACKGROUND: Anecdotal evidence of the involvement of alternative splicing (AS) in the regulation of protein-protein interactions has been reported by several studies. AS events have been shown to significantly occur in regions where a protein interaction domain or a short linear motif is present. Several AS variants show partial or complete loss of interface residues, suggesting that AS can play a major role in the interaction regulation by selectively targeting the protein binding sites. In the present study we performed a statistical analysis of the alternative splicing of a non-redundant dataset of human protein-protein interfaces known at molecular level to determine the importance of this way of modulation of protein-protein interactions through AS. RESULTS: Using a Cochran-Mantel-Haenszel chi-square test we demonstrated that the alternative splicing-mediated partial removal of both heterodimeric and homodimeric binding sites occurs at lower frequencies than expected, and this holds true even if we consider only those isoforms whose sequence is less different from that of the canonical protein and which therefore allow to selectively regulate functional regions of the protein. On the other hand, large removals of the binding site are not significantly prevented, possibly because they are associated to drastic structural changes of the protein. The observed protection of the binding sites from AS is not preferentially directed towards putative hot spot interface residues, and is widespread to all protein functional classes. CONCLUSIONS: Our findings indicate that protein-protein binding sites are generally protected from alternative splicing-mediated partial removals. However, some cases in which the binding site is selectively removed exist, and here we discuss one of them.
Asunto(s)
Empalme Alternativo , Proteínas/química , Proteínas/metabolismo , Proteómica , Sitios de Unión , Proteínas Cullin/química , Proteínas Cullin/genética , Proteínas Cullin/metabolismo , Proteínas de Unión al ADN/química , Proteínas de Unión al ADN/genética , Proteínas de Unión al ADN/metabolismo , Humanos , Modelos Moleculares , Unión Proteica , Isoformas de Proteínas/química , Isoformas de Proteínas/genética , Isoformas de Proteínas/metabolismo , Multimerización de Proteína , Estructura Cuaternaria de Proteína , Proteínas/genética , TermodinámicaRESUMEN
Nearly half of known protein structures interact with phosphate-containing ligands, such as nucleotides and other cofactors. Many methods have been developed for the identification of metal ions-binding sites and some for bigger ligands such as carbohydrates, but none is yet available for the prediction of phosphate-binding sites. Here we describe Pfinder, a method that predicts binding sites for phosphate groups, both in the form of ions or as parts of other non-peptide ligands, in proteins of known structure. Pfinder uses the Query3D local structural comparison algorithm to scan a protein structure for the presence of a number of structural motifs identified for their ability to bind the phosphate chemical group. Pfinder has been tested on a data set of 52 proteins for which both the apo and holo forms were available. We obtained at least one correct prediction in 63% of the holo structures and in 62% of the apo. The ability of Pfinder to recognize a phosphate-binding site in unbound protein structures makes it an ideal tool for functional annotation and for complementing docking and drug design methods. The Pfinder program is available at http://pdbfun.uniroma2.it/pfinder.
Asunto(s)
Algoritmos , Fosfatos/química , Proteínas/química , Aminoácidos/química , Sitios de Unión , Ligandos , Modelos Moleculares , Anotación de Secuencia Molecular , Conformación Proteica , Pliegue de Proteína , Programas InformáticosRESUMEN
Phosfinder is a web server for the identification of phosphate binding sites in protein structures. Phosfinder uses a structural comparison algorithm to scan a query structure against a set of known 3D phosphate binding motifs. Whenever a structural similarity between the query protein and a phosphate binding motif is detected, the phosphate bound by the known motif is added to the protein structure thus representing a putative phosphate binding site. Predicted binding sites are then evaluated according to (i) their position with respect to the query protein solvent-excluded surface and (ii) the conservation of the binding residues in the protein family. The server accepts as input either the PDB code of the protein to be analyzed or a user-submitted structure in PDB format. All the search parameters are user modifiable. Phosfinder outputs a list of predicted binding sites with detailed information about their structural similarity with known phosphate binding motifs, and the conservation of the residues involved. A graphical applet allows the user to visualize the predicted binding sites on the query protein structure. The results on a set of 52 apo/holo structure pairs show that the performance of our method is largely unaffected by ligand-induced conformational changes. Phosfinder is available at http://phosfinder.bio.uniroma2.it.
Asunto(s)
Fosfatos/química , Conformación Proteica , Programas Informáticos , Sitios de Unión , Internet , Modelos Moleculares , Proteínas/químicaRESUMEN
BACKGROUND: The identification of ligand binding sites is a key task in the annotation of proteins with known structure but uncharacterized function. Here we describe a knowledge-based method exploiting the observation that unrelated binding sites share small structural motifs that bind the same chemical fragments irrespective of the nature of the ligand as a whole. RESULTS: PDBinder compares a query protein against a library of binding and non-binding protein surface regions derived from the PDB. The results of the comparison are used to derive a propensity value for each residue which is correlated with the likelihood that the residue is part of a ligand binding site. The method was applied to two different problems: i) the prediction of ligand binding residues and ii) the identification of which surface cleft harbours the binding site. In both cases PDBinder performed consistently better than existing methods. PDBinder has been trained on a non-redundant set of 1356 high-quality protein-ligand complexes and tested on a set of 239 holo and apo complex pairs. We obtained an MCC of 0.313 on the holo set with a PPV of 0.413 while on the apo set we achieved an MCC of 0.271 and a PPV of 0.372. CONCLUSIONS: We show that PDBinder performs better than existing methods. The good performance on the unbound proteins is extremely important for real-world applications where the location of the binding site is unknown. Moreover, since our approach is orthogonal to those used in other programs, the PDBinder propensity value can be integrated in other algorithms further increasing the final performance.
Asunto(s)
Algoritmos , Bases del Conocimiento , Proteínas/química , Animales , Sitios de Unión , Bases de Datos de Proteínas , Ligandos , Modelos Moleculares , Unión Proteica , Conformación Proteica , Estructura Terciaria de Proteína , Proteínas/metabolismoRESUMEN
Recently, modularity has emerged as a general attribute of complex biological systems. This is probably because modular systems lend themselves readily to optimization via random mutation followed by natural selection. Although they are not traditionally considered to evolve by this process, biological ligands are also modular, being composed of recurring chemical fragments, and moreover they exhibit similarities reminiscent of mutations (e.g. the few atoms differentiating adenine and guanine). Many ligands are also promiscuous in the sense that they bind to many different protein folds. Here, we investigated whether ligand chemical modularity is reflected in an underlying modularity of binding sites across unrelated proteins. We chose nucleotides as paradigmatic ligands, because they can be described as composed of well-defined fragments (nucleobase, ribose and phosphates) and are quite abundant both in nature and in protein structure databases. We found that nucleotide-binding sites do indeed show a modular organization and are composed of fragment-specific protein structural motifs, which parallel the modular structure of their ligands. Through an analysis of the distribution of these motifs in different proteins and in different folds, we discuss the evolutionary implications of these findings and argue that the structural features we observed can arise both as a result of divergence from a common ancestor or convergent evolution.
Asunto(s)
Nucleótidos/química , Proteínas/química , Secuencias de Aminoácidos , Sitios de Unión , Evolución Molecular , Ligandos , Modelos MolecularesRESUMEN
Recent research provides insight into the ability of miRNA to regulate various pathways in several cancer types. Despite their involvement in the regulation of the mRNA via targeting the 3'UTR, there are relatively few studies examining the changes in these regulatory mechanisms specific to single cancer types or shared between different cancer types. We analyzed samples where both miRNA and mRNA expression had been measured and performed a thorough correlation analysis on 7494 experimentally validated human miRNA-mRNA target-gene pairs in both healthy and tumoral samples. We show how more than 90% of these miRNA-mRNA interactions show a loss of regulation in the tumoral samples compared with their healthy counterparts. As expected, we found shared miRNA-mRNA dysregulated pairs among different tumors of the same tissue. However, anatomically different cancers also share multiple dysregulated interactions, suggesting that some cancer-related mechanisms are not tumor-specific. 2865 unique miRNA-mRNA pairs were identified across 13 cancer types, ≈ 40% of these pairs showed a loss of correlation in the tumoral samples in at least 2 out of the 13 analyzed cancers. Specifically, miR-200 family, miR-155 and miR-1 were identified, based on the computational analysis described below, as the miRNAs that potentially lose the highest number of interactions across different samples (only literature-based interactions were used for this analysis). Moreover, the miR-34a/ALDH2 and miR-9/MTHFD2 pairs show a switch in their correlation between healthy and tumor kidney samples suggesting a possible change in the regulation exerted by the miRNAs. Interestingly, the expression of these mRNAs is also associated with the overall survival. The disruption of miRNA regulation on its target, therefore, suggests the possible involvement of these pairs in cell malignant functions. The analysis reported here shows how the regulation of miRNA-mRNA interactions strongly differs between healthy and tumoral cells, based on the strong correlation variation between miRNA and its target that we obtained by analyzing the expression data of healthy and tumor tissue in highly reliable miRNA-target pairs. Finally, a go term enrichment analysis shows that the critical pairs identified are involved in cellular adhesion, proliferation, and migration.
RESUMEN
BACKGROUND: Protein phosphorylation modulates protein function in organisms at all levels of complexity. Parasites of the Leishmania genus undergo various developmental transitions in their life cycle triggered by changes in the environment. The molecular mechanisms that these organisms use to process and integrate these external cues are largely unknown. However Leishmania lacks transcription factors, therefore most regulatory processes may occur at a post-translational level and phosphorylation has recently been demonstrated to be an important player in this process. Experimental identification of phosphorylation sites is a time-consuming task. Moreover some sites could be missed due to the highly dynamic nature of this process or to difficulties in phospho-peptide enrichment. RESULTS: Here we present PhosTryp, a phosphorylation site predictor specific for trypansomatids. This method uses an SVM-based approach and has been trained with recent Leishmania phosphosproteomics data. PhosTryp achieved a 17% improvement in prediction performance compared with Netphos, a non organism-specific predictor. The analysis of the peptides correctly predicted by our method but missed by Netphos demonstrates that PhosTryp captures Leishmania-specific phosphorylation features. More specifically our results show that Leishmania kinases have sequence specificities which are different from their counterparts in higher eukaryotes. Consequently we were able to propose two possible Leishmania-specific phosphorylation motifs.We further demonstrate that this improvement in performance extends to the related trypanosomatids Trypanosoma brucei and Trypanosoma cruzi. Finally, in order to maximize the usefulness of PhosTryp, we trained a predictor combining all the peptides from L. infantum, T. brucei and T. cruzi. CONCLUSIONS: Our work demonstrates that training on organism-specific data results in an improvement that extends to related species. PhosTryp is freely available at http://phostryp.bio.uniroma2.it.
Asunto(s)
Trypanosoma/metabolismo , Animales , FosforilaciónRESUMEN
RNA primary and secondary motif discovery is an important step in the annotation and characterization of unknown interaction dynamics between RNAs and RNA-Binding Proteins, and several methods have been developed to meet the need of fast and efficient discovery of interaction motifs. Recent advances have increased the amount of data produced by experimental assays and there is no available method suitable for the analysis of all type of results. Here we present a simple workflow to help choosing the more appropriate method, depending on the starting situation, among the three algorithms that best cover the landscape of approaches. A detailed analysis is presented to highlight the need for different algorithms in different working settings. In conclusion, the proposed workflow depends on the nature of the starting data and on the availability of RNA annotations.
Asunto(s)
Conformación de Ácido Nucleico , ARN/química , Análisis de Secuencia de ARN/métodos , Algoritmos , Animales , Sitios de Unión/genética , Biología Computacional/métodos , Conjuntos de Datos como Asunto , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Humanos , Unión Proteica/genética , ARN/genética , ARN/metabolismo , Proteínas de Unión al ARN/metabolismo , Programas InformáticosRESUMEN
Structural characterization of RNAs is a dynamic field, offering many modelling possibilities. RNA secondary structure models are usually characterized by an encoding that depicts structural information of the molecule through string representations or graphs. In this work, we provide a generalization of the BEAR encoding (a context-aware structural encoding we previously developed) by expanding the set of alignments used for the construction of substitution matrices and then applying it to secondary structure encodings ranging from fine-grained to more coarse-grained representations. We also introduce a re-interpretation of the Shannon Information applied on RNA alignments, proposing a new scoring metric, the Relative Information Gain (RIG). The RIG score is available for any position in an alignment, showing how different levels of detail encoded in the RNA representation can contribute differently to convey structural information. The approaches presented in this study can be used alongside state-of-the-art tools to synergistically gain insights into the structural elements that RNAs and RNA families are composed of. This additional information could potentially contribute to their improvement or increase the degree of confidence in the secondary structure of families and any set of aligned RNAs.