ABSTRACT
RNA molecules undergo a number of chemical modifications that can alter their structure and molecular interactions. Previous studies have shown that RNA modifications can affect the formation of ribonucleoprotein (RNP) complexes and influence the assembly of membrane-less organelles such as stress granules (SGs). For instance, N6-methyladenosine (m6A) enhances SG formation, and N1-methyladenosine (m1A) prevents SGs from transitioning into solid-like aggregates. Yet very little is known about adenosine-to-inosine (A-to-I) editing, which is highly abundant in human cells and affects not only mRNAs but also noncoding RNAs. Here, we introduce CROSSalive, a predictor of A-to-I effects on RNA structure based on high-throughput in-cell experiments. Our method predicts the single- and double-stranded content of transcripts with 90% accuracy and identifies a general enrichment of double-stranded regions caused by A-to-I editing in long intergenic noncoding RNAs (lincRNAs). For the individual cases of NEAT1, NORAD, and XIST, we investigated the relationship between A-to-I editing and interactions with RNA-binding proteins using available CLIP data and catRAPID predictions. We found that A-to-I editing is linked to the alteration of interaction sites with proteins involved in phase separation, which suggests that A-to-I editing can influence RNP assembly. CROSSalive is available at http://service.tartaglialab.com/new_submission/crossalive.
Subject(s)
Adenosine; RNA, Long Noncoding; Humans; Adenosine/chemistry; RNA, Untranslated/genetics; RNA, Messenger/genetics; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; Inosine/metabolism
ABSTRACT
The mechanisms that regulate the switch between the epidermal progenitor state and differentiation are not fully understood. Recent findings indicate that the chromatin-remodelling BAF complex (Brg1-associated factor complex, also known as the SWI/SNF complex) and the transcription factor p63 recruit one another to open chromatin during epidermal differentiation. Here, we identify a long non-coding transcript that includes an ultraconserved element, uc.291, which physically interacts with ACTL6A and modulates chromatin remodelling to allow differentiation. Loss of uc.291 expression, both in primary keratinocytes and in three-dimensional skin equivalents, inhibits differentiation, as indicated by down-regulation of epidermal differentiation complex genes. ChIP experiments reveal that, upon uc.291 depletion, ACTL6A binds the promoters of differentiation genes and prevents the BAF complex from being targeted there to induce terminal differentiation genes. In the presence of uc.291, this inhibitory effect of ACTL6A is released, allowing chromatin changes that promote the expression of differentiation genes. Thus, uc.291 interacts with ACTL6A to modulate chromatin remodelling activity, allowing the transcription of late differentiation genes.
Subject(s)
Actins/genetics; Chromosomal Proteins, Non-Histone/genetics; DNA-Binding Proteins/genetics; RNA, Long Noncoding; Cells, Cultured; Chromatin/genetics; Chromatin Assembly and Disassembly; Chromosomal Proteins, Non-Histone/metabolism; Humans; RNA, Long Noncoding/genetics; Transcription Factors/genetics; Transcription Factors/metabolism
ABSTRACT
Specific elements of viral genomes regulate interactions within host cells. Here, we calculated the secondary structure content of >2000 coronaviruses and computed >100 000 interactions between human proteins and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The genomic regions display different degrees of conservation. The SARS-CoV-2 domain encompassing nucleotides 22 500-23 000 is conserved at both the sequence and the structural level, whereas the regions upstream and downstream of it vary significantly. This part of the viral sequence codes for the Spike S protein, which interacts with the human receptor angiotensin-converting enzyme 2 (ACE2). Thus, variability in Spike S is connected to different levels of viral entry into human cells across the population. Our predictions indicate that the 5' end of SARS-CoV-2 is highly structured and interacts with several human proteins. The binding proteins are involved in viral RNA processing, include double-stranded RNA-specific editases and ATP-dependent RNA helicases, and have a strong propensity to form stress granules and phase-separated assemblies. We propose that these proteins, which are also implicated in viral infections such as HIV, are selectively recruited by the SARS-CoV-2 genome to alter transcriptional and post-transcriptional regulation in host cells and to promote viral replication.
Subject(s)
Genome, Viral; Protein Interaction Maps; SARS-CoV-2/genetics; Angiotensin-Converting Enzyme 2/metabolism; COVID-19/virology; Humans; Protein Binding; SARS-CoV-2/pathogenicity; SARS-CoV-2/physiology; Spike Glycoprotein, Coronavirus/chemistry; Spike Glycoprotein, Coronavirus/genetics; Spike Glycoprotein, Coronavirus/metabolism; Virulence/genetics; Virus Internalization; Virus Replication
ABSTRACT
BACKGROUND: With more than 300 million people potentially infected every year, and with the habitat of mosquitoes expanding due to climate change, Dengue virus (DENV) infection can no longer be considered only a tropical disease. RNA secondary structure is a functional characteristic of RNA viruses, and, together with the accumulated high-throughput sequencing data, it could provide general insights into virus biology. Here, we profiled the RNA secondary structure of > 7000 complete viral genomes from 11 different species, focusing on viral hemorrhagic fevers, including the DENV serotypes, Ebola virus (EBOV), and yellow fever virus (YFV). RESULTS: We demonstrated that the secondary structure and the presence of protein-binding domains in the genomes can be used as an intrinsic signature to further classify the viruses. With our predictive approach, we achieved high prediction scores for the secondary structure (AUC up to 0.85 against experimental data) and computed consensus secondary structure profiles using hundreds of in silico models. We observed that viruses show different structural patterns; for example, DENV-2 and Ebola virus tend to be less structured than the other viruses. Furthermore, we observed virus-specific correlations between secondary structure and the number of interaction sites with human proteins, reaching a correlation of 0.89 in the case of Zika virus. We also found that helicase-encoding regions are more structured in several flaviviruses, while the regions encoding the contact proteins form virus-specific clusters in terms of RNA structure and potential protein-RNA interactions. We also used structural data to study the geographical distribution of DENV and found a significant difference between DENV-3 from Asia and DENV-3 from South America, where structure drives the clustering more than sequence identity does, which could imply different evolutionary routes for this serotype.
CONCLUSIONS: Our large-scale computational analysis provided novel results on the secondary structure and the interactions with human proteins, not only for the DENV serotypes but also for other flaviviruses and for viruses associated with viral hemorrhagic fevers. We showed how RNA secondary structure can be used to categorise viruses, and even to further classify them based on their interactions with proteins. We envision that these approaches can be used to further classify and characterise these complex viruses.
Subject(s)
Dengue Virus; Dengue; Hemorrhagic Fevers, Viral; Zika Virus Infection; Zika Virus; Animals; Asia; Dengue Virus/genetics; Humans; Serogroup; South America
ABSTRACT
The human transcriptome contains thousands of long non-coding RNAs (lncRNAs), and characterizing their function is a current challenge. An emerging concept is that lncRNAs serve as protein scaffolds, forming ribonucleoproteins and bringing proteins into proximity. However, only a few scaffolding lncRNAs have been characterized, and the prevalence of this function is unknown. Here, we propose the first computational approach aimed at predicting scaffolding lncRNAs at large scale. We predicted the largest human lncRNA-protein interaction network to date using the catRAPID omics algorithm. By combining these predictions with tissue expression data and statistical approaches, we identified 847 lncRNAs (~5% of the long non-coding transcriptome) predicted to scaffold half of the known protein complexes and network modules. Lastly, we show that the association of certain lncRNAs with disease may involve their scaffolding ability. Overall, our results suggest for the first time that RNA-mediated scaffolding of protein complexes and modules may be a common mechanism in human cells.
Subject(s)
Computational Biology/methods; RNA, Long Noncoding/metabolism; RNA-Binding Proteins/metabolism; Ribonucleoproteins/metabolism; Algorithms; Genetic Predisposition to Disease/genetics; Humans; Protein Binding; Protein Interaction Maps; Proteome/genetics; Proteome/metabolism; RNA, Long Noncoding/genetics; RNA-Binding Proteins/genetics; Ribonucleoproteins/genetics; Transcriptome
ABSTRACT
Here we introduce the Computational Recognition of Secondary Structure (CROSS) method, which calculates the structural profile of an RNA sequence (single- or double-stranded state) at single-nucleotide resolution and without sequence length restrictions. We trained CROSS using data from high-throughput experiments such as Selective 2′-Hydroxyl Acylation analyzed by Primer Extension (SHAPE; mouse and HIV transcriptomes) and Parallel Analysis of RNA Structure (PARS; human and yeast transcriptomes), as well as high-quality NMR/X-ray structures (PDB database). Using primary structure information alone, the algorithm predicts experimental structural profiles with >80% accuracy and performs well on large RNAs such as Xist (17 900 nucleotides; area under the ROC curve (AUC) of 0.75 on dimethyl sulfate (DMS) experiments). We integrated CROSS into thermodynamics-based methods for secondary structure prediction and observed that their predictive power increased by up to 30%.
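CROSS performance is reported above as an area under the ROC curve (AUC). The sketch below is illustrative only and is not CROSS code. It computes the AUC of per-nucleotide scores against binary experimental labels as the probability that a randomly chosen double-stranded position (label 1) scores higher than a randomly chosen single-stranded one (label 0), with ties counted as one half. The function name `roc_auc`, the label convention, and the toy scores and labels are all hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) estimate of the ROC AUC.

    labels: 1 = double-stranded, 0 = single-stranded (hypothetical convention).
    A tie between a positive and a negative score counts as 0.5.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy per-nucleotide scores (higher = more likely double-stranded); not real data.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.7]
labels = [1, 1, 0, 1, 0, 0]
print(round(roc_auc(scores, labels), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

The pairwise loop is quadratic in profile length. For transcript-scale profiles such as Xist, a rank-based formulation of the same statistic would be the practical choice.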
Subject(s)
Algorithms; Nucleic Acid Conformation; Polymorphism, Single Nucleotide; RNA/chemistry; Animals; Area Under Curve; Humans; Mice; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism; ROC Curve; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Software; Thermodynamics
ABSTRACT
MOTIVATION: Recent technological advances have revealed that an unexpectedly large number of proteins interact with transcripts even when their RNA-binding domains are not annotated. We introduce catRAPID signature to identify ribonucleoproteins based on physico-chemical features instead of sequence similarity searches. The algorithm, trained on human proteins and tested on model organisms, calculates the overall RNA-binding propensity and then predicts RNA-binding regions. catRAPID signature outperforms other algorithms in identifying RNA-binding proteins and detecting non-classical RNA-binding regions. Results are visualized on a webpage and can be downloaded or forwarded to catRAPID omics for prediction of RNA targets. AVAILABILITY AND IMPLEMENTATION: catRAPID signature can be accessed at http://s.tartaglialab.com/new_submission/signature CONTACT: gian.tartaglia@crg.es or gian@tartaglialab.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Software; Algorithms; Humans; RNA; Ribonucleoproteins
ABSTRACT
SUMMARY: Here we introduce ccSOL omics, a webserver for large-scale calculations of protein solubility. Our method allows (i) proteome-wide predictions; (ii) identification of soluble fragments within each sequence; and (iii) exhaustive single-point mutation analysis. RESULTS: Using coil/disorder, hydrophobicity, hydrophilicity, β-sheet and α-helix propensities, we built a predictor of protein solubility. Our approach shows an accuracy of 79% on the training set (36 990 Target Track entries). Validation on three independent sets indicates that ccSOL omics discriminates soluble and insoluble proteins with an accuracy of 74% on 31 760 proteins sharing <30% sequence similarity. AVAILABILITY AND IMPLEMENTATION: ccSOL omics can be freely accessed on the web at http://s.tartaglialab.com/page/ccsol_group. Documentation and tutorial are available at http://s.tartaglialab.com/static_files/shared/tutorial_ccsol_omics.html. CONTACT: gian.tartaglia@crg.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
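The summary lists the physico-chemical propensities that ccSOL omics combines, but it does not say how they are weighted. As an illustrative assumption only, the sketch below uses a single scale, the Kyte-Doolittle hydropathy scale, and does not reproduce ccSOL's model. It shows two pieces of machinery the summary mentions: a sliding-window profile along a sequence, and an exhaustive single-point mutation scan of the sequence average. The function names, the window size, and the toy sequence are hypothetical.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy scale; used here as a stand-in for one ccSOL feature.
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq: str, scale: Dict[str, float], window: int = 7) -> List[float]:
    """Average scale value over every full window of `window` residues."""
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [scale[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def mutation_scan(seq: str, scale: Dict[str, float]) -> Dict[str, float]:
    """Change in the sequence-average scale value for every single-point substitution.

    Keys use wild-type/position/mutant notation with 1-based positions, e.g. 'K2V'.
    """
    n = len(seq)
    deltas: Dict[str, float] = {}
    for i, wt in enumerate(seq):
        for mut in scale:
            if mut != wt:
                deltas[f"{wt}{i + 1}{mut}"] = (scale[mut] - scale[wt]) / n
    return deltas


profile = window_profile("MKVLA", KD, window=3)
print([round(v, 3) for v in profile])  # [0.733, 1.367, 3.267]
scan = mutation_scan("MKVLA", KD)
print(len(scan), round(scan["K2V"], 2))  # 95 1.62
```

A real solubility predictor would combine several such profiles and train a classifier on labelled data. The scan above only shows how "exhaustive single-point mutation analysis" enumerates 19 substitutions per position.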
Subject(s)
Escherichia coli Proteins/chemistry; Escherichia coli Proteins/genetics; Escherichia coli/genetics; Gene Expression Regulation, Bacterial; Internet; Proteomics/methods; Algorithms; Gene Expression; Hydrophobic and Hydrophilic Interactions; Protein Structure, Secondary; Solubility
ABSTRACT
Flaviviruses pose significant global health threats, infecting over 300 million people annually. Among their strategies for evading host defenses, the production of subgenomic flaviviral RNAs (sfRNAs) from the 3' UTR of viral genomes is particularly notable. Using a comprehensive in silico approach based on the catRAPID algorithm, we analyzed over 300,000 interactions between human proteins and sfRNAs derived from more than 8000 flavivirus genomes, including Dengue, Zika, Yellow Fever, West Nile, and Japanese Encephalitis viruses. By providing the first extensive atlas of sfRNA interactions, we offer new insights into how flaviviruses can manipulate the host cellular machinery to facilitate viral survival and persistence. Our study not only validated known interactions but also revealed novel human proteins that could be involved in sfRNA-mediated evasion of host defenses, including helicases, splicing factors, and chemokines. These findings significantly expand the known interactome of sfRNAs with human proteins, underscoring their role in modulating host cellular pathways. Intriguingly, we predict interactions with stress granule components, a critical part of the cellular response to viral infection, suggesting a mechanism by which flaviviruses inhibit stress granule formation to evade host defenses. Moreover, a set of highly interacting proteins shared among the sfRNAs showed predictive power for identifying sfRNA-forming regions, highlighting how protein signatures could be used to annotate viruses. This atlas serves as a resource for exploring therapeutic targets and also aids in the identification of sfRNA biomarkers for improved flavivirus diagnostics.
ABSTRACT
Mature tropical urban trees are susceptible to root and trunk rot caused by pathogenic fungi. A metagenomic survey of such fungi was carried out on 210 soil and tissue samples collected from 134 trees of 14 common species in Singapore. Furthermore, 121 fruiting bodies were collected and barcoded. Of the 22,067 OTUs (operational taxonomic units) identified, 10,646 OTUs had annotation information, and most of these were either ascomycetes (63.4%) or basidiomycetes (22.5%). Based on their detection in the diseased tissues and surrounding soils and/or the presence of fruiting bodies, fourteen basidiomycetes (nine Polyporales, four Hymenochaetales, one Boletales) and three ascomycetes (three species of Scytalidium) were strongly associated with the diseased trees. Fulvifomes siamensis affected the largest number of the tree species surveyed. The association of three fungi was further supported by in vitro wood decay studies. Genetic heterogeneity was common in the diseased tissues and fruiting bodies (especially in Ganoderma species). This survey identified the common pathogenic fungi of tropical urban trees and laid the foundation for early diagnosis and targeted mitigation efforts. It also illustrated the complexity of fungal ecology and pathogenicity.
ABSTRACT
The growing flow of high-throughput RNA secondary structure data from different techniques has enabled further development of machine learning approaches. We developed CROSS and CROSSalive, two algorithms trained on experimental data that predict RNA secondary structure propensity both in vitro and in vivo. Because the in vivo folding of RNA molecules depends on multiple factors arising from the crowded cellular environment, prediction is a complex problem that requires additional calculations to account for interactions with proteins and other molecules. In this chapter, we describe the differences between predicting RNA secondary structure propensity in vitro and in vivo using experimental data as input for an artificial neural network (ANN).
Subject(s)
Neural Networks, Computer; Machine Learning; Protein Structure, Secondary; RNA/genetics
ABSTRACT
Post-transcriptional methylation of N6-adenine and N1-adenine can affect transcriptome turnover and translation. Furthermore, the regulatory function of N6-methyladenine (m6A) during heat shock has been uncovered, including the enhancement of the phase separation potential of RNAs. In response to acute stress, e.g. heat shock, the orderly sequestration of mRNAs in stress granules (SGs) is considered important to protect transcripts from irreversible aggregation. Until recently, the role of N1-methyladenine (m1A) on mRNAs during the acute stress response remained largely unknown. Here we show that the methyltransferase complex TRMT6/61A, which generates the m1A tag, is involved in transcriptome protection during heat shock. Our bioinformatics analysis indicates that the occurrence of the m1A motif is increased in mRNAs known to be enriched in SGs. Accordingly, the m1A-generating methyltransferase TRMT6/61A accumulated in SGs, and mass spectrometry confirmed enrichment of m1A in SG RNAs. Inserting a single methylation motif into the untranslated region of a reporter RNA leads to more efficient recovery of protein synthesis from that transcript after the return to normal temperature. Our results demonstrate far-reaching functional consequences of a minimal RNA modification on N1-adenine during acute proteostasis stress.
Subject(s)
Adenosine/analogs & derivatives; Cytoplasmic Granules/metabolism; Cytoprotection; Stress, Physiological; Adenosine/metabolism; Arsenites/toxicity; Cytoplasmic Granules/drug effects; Cytoprotection/drug effects; HeLa Cells; Heat-Shock Response/drug effects; Humans; Methylation/drug effects; Models, Biological; Protein Conformation; RNA, Messenger/chemistry; RNA, Messenger/genetics; RNA, Messenger/metabolism; RNA-Binding Proteins/chemistry; RNA-Binding Proteins/metabolism; Stress, Physiological/drug effects; tRNA Methyltransferases/metabolism
ABSTRACT
Plants produce a vast array of chemical compounds that we use as medicines and flavors, but the biosynthetic pathways of these compounds are still poorly understood. This knowledge gap prevents us from modifying, improving, and mass-producing these specialized metabolites in suitable bioreactors. Many specialized metabolites are produced in a narrow range of organs, tissues, and cell types, suggesting tight regulation of the responsible biosynthetic pathways. Fortunately, with the unprecedented ease of generating gene expression data and with >200,000 publicly available RNA sequencing samples, we are now able to study gene expression in hundreds of plant species. This review demonstrates how gene expression data can elucidate biosynthetic pathways by mining organ-specific genes and gene expression clusters and by applying various types of co-expression analyses. To empower biologists to perform these analyses, we showcase them using recently published, user-friendly tools. Finally, we analyze the performance of co-expression networks and show that they are a valuable addition for elucidating the biosynthetic pathways of specialized metabolism.
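The review relies on co-expression analyses. The sketch below shows one common form under stated assumptions: genes become nodes, and an edge links two genes when their Pearson correlation across samples meets a hard threshold. Published tools may instead use other measures (for example, ranked correlations) and other cut-offs. The gene names, expression values, threshold, and helper names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression vectors (0.0 if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(
    expr: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Connect every gene pair whose correlation across samples is >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values: one row per gene, one column per sample (e.g. organ).
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 4, 3, 2, 1],
    "geneD": [1, 3, 2, 5, 4],
}
print(coexpression_edges(expr))  # [('geneA', 'geneB', 1.0)]
```

With the threshold lowered to 0.75, geneD joins the module through edges to geneA and geneB (r = 0.8 each). This shows how strongly the cut-off shapes the resulting network. Genes sharing a module with a known pathway enzyme then become candidate pathway members.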
ABSTRACT
The fungal kingdom comprises eukaryotic heterotrophs that help balance ecosystems and play a major role as decomposers. Fungi also produce a vast diversity of secondary metabolites with antibiotic or pharmacological properties. However, our limited knowledge of gene function in fungi prevents us from tailoring them to our needs and tapping into their metabolic diversity. To help remedy this, we gathered genomic and gene expression data for the 19 most widely researched fungi to build an online tool, fungi.guru, which provides tools for cross-species identification of conserved pathways, functional gene modules, and gene families. We exemplify how our tool can elucidate the molecular function, biological process, and cellular component of genes by identifying the gliotoxin-producing secondary metabolite pathway in Aspergillus fumigatus, the cellulose catabolic pathway in Coprinopsis cinerea, and the conserved DNA replication pathway in Fusarium graminearum and Pyricularia oryzae. The tool is available at www.fungi.guru.
ABSTRACT
To compare the secondary structure profiles of RNA molecules, we developed the CROSSalign method. CROSSalign combines the Computational Recognition Of Secondary Structure (CROSS) algorithm, which predicts the RNA secondary structure profile at single-nucleotide resolution, with the Dynamic Time Warping (DTW) method, which aligns profiles of different lengths. We applied CROSSalign to investigate the structural conservation of long non-coding RNAs such as XIST and HOTAIR, as well as of ssRNA viruses including HIV. CROSSalign performs pairwise comparisons and is able to find homologs among thousands of matches, identifying the exact regions of similarity between profiles of different lengths. In a pool of sequences with the same secondary structure, CROSSalign accurately recognizes repeat A of XIST and domain D2 of HOTAIR and outperforms other methods based on covariance modeling. The algorithm is freely available at the webpage http://service.tartaglialab.com//new_submission/crossalign.
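CROSSalign is described as aligning CROSS profiles with Dynamic Time Warping. The abstract does not specify the cost function, the step pattern, or how local regions of similarity are extracted. The following is therefore only a minimal sketch of classic global DTW with an absolute-difference cost, applied to two hypothetical profiles of different lengths. It is not the CROSSalign implementation, and the function name `dtw` is illustrative.

```python
from typing import List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic global DTW with |a_i - b_j| cost; returns total cost and 0-based warping path."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("profiles must be non-empty")
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the last cell to recover which positions were matched.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path


# Hypothetical structural profiles (1 = double-stranded propensity, 0 = single-stranded).
long_profile = [0.0, 1.0, 1.0, 0.0]
short_profile = [0.0, 1.0, 0.0]
print(dtw(long_profile, short_profile))  # (0.0, [(0, 0), (1, 1), (2, 1), (3, 2)])
```

In the recovered path, indices 1 and 2 of the longer profile both map to index 1 of the shorter one. This is how DTW absorbs length differences at zero cost here. Locating a matching sub-region inside a much longer profile, as CROSSalign does, would require an open-begin/open-end (subsequence) variant of this recurrence.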
ABSTRACT
Synchronization of mitochondrial and cytoplasmic translation rates is critical for the maintenance of cellular fitness, with cancer cells being especially vulnerable to translational uncoupling. Although alterations of cytosolic protein synthesis are common in human cancer, compensating mechanisms in mitochondrial translation remain elusive. Here we show that the malignant long non-coding RNA (lncRNA) SAMMSON promotes a balanced increase in ribosomal RNA (rRNA) maturation and protein synthesis in the cytosol and mitochondria by modulating the localization of CARF, an RNA-binding protein that sequesters the exo-ribonuclease XRN2 in the nucleoplasm, which under normal circumstances limits nucleolar rRNA maturation. SAMMSON interferes with XRN2 binding to CARF in the nucleus by favoring the formation of an aberrant cytoplasmic RNA-protein complex containing CARF and p32, a mitochondrial protein required for the processing of the mitochondrial rRNAs. These data highlight how a single oncogenic lncRNA can simultaneously modulate RNA-protein complex formation in two distinct cellular compartments to promote cell growth.