
1.

Accelerating 3D genomics data analysis with Microcket.

Zhao, Yu; Yang, Mengqi; Gong, Fanglei; Pan, Yuqi; Hu, Minghui; Peng, Qin; Lu, Leina; Lyu, Xiaowen; Sun, Kun.

Commun Biol ; 7(1): 675, 2024 Jun 01.

Article En | MEDLINE | ID: mdl-38824179

The three-dimensional (3D) organization of the genome is fundamental to cell biology. To explore the 3D genome, emerging high-throughput approaches have produced billions of sequencing reads, which are challenging and time-consuming to analyze. Here we present Microcket, a package for mapping and extracting interacting pairs from 3D genomics data, including Hi-C, Micro-C, and derivative protocols. Microcket utilizes a unique read-stitch strategy that takes advantage of the long read cycles in modern DNA sequencers; benchmark evaluations reveal that Microcket runs much faster than current tools along with improved mapping efficiency, and thus shows high potential in accelerating and enhancing biological investigations into the 3D genome. Microcket is freely available at https://github.com/hellosunking/Microcket .

Genomics , Software , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Data Analysis

2.

GATSol, an enhanced predictor of protein solubility through the synergy of 3D structure graph and large language modeling.

Li, Bin; Ming, Dengming.

BMC Bioinformatics ; 25(1): 204, 2024 Jun 01.

Article En | MEDLINE | ID: mdl-38824535

BACKGROUND: Protein solubility is a critically important physicochemical property closely related to protein expression. For example, it is one of the main factors to be considered in the design and production of antibody drugs and a prerequisite for realizing various protein functions. Although several solubility prediction models have emerged in recent years, many of these models are limited to capturing information embedded in one-dimensional amino acid sequences, resulting in unsatisfactory predictive performance. RESULTS: In this study, we introduce a novel Graph Attention network-based protein Solubility model, GATSol, which represents the 3D structure of proteins as a protein graph. In addition to the node features of amino acids extracted by the state-of-the-art protein large language model, GATSol utilizes amino acid distance maps generated using the latest AlphaFold technology. Rigorous testing on independent eSOL and the Saccharomyces cerevisiae test datasets has shown that GATSol outperforms most recently introduced models, especially with respect to the coefficient of determination R², which reaches 0.517 and 0.424, respectively. It outperforms the current state-of-the-art GraphSol by 18.4% on the S. cerevisiae_test set. CONCLUSIONS: GATSol captures 3D structural features of proteins by building protein graphs, which significantly improves the accuracy of protein solubility prediction. Recent advances in protein structure modeling allow our method to incorporate spatial structure features extracted from predicted structures into the model by relying only on the input of protein sequences, which simplifies the entire graph neural network prediction process, making it more user-friendly and efficient. As a result, GATSol may help prioritize highly soluble proteins, ultimately reducing the cost and effort of experimental work. The source code and data of the GATSol model are freely available at https://github.com/binbinbinv/GATSol .

Proteins , Solubility , Proteins/chemistry , Proteins/metabolism , Protein Conformation , Databases, Protein , Computational Biology/methods , Software , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/chemistry , Algorithms , Models, Molecular , Amino Acid Sequence
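
Note on the metric in entry 2: the coefficient of determination quoted in the abstract is conventionally computed as R² = 1 − SS_res/SS_tot. The following is a minimal sketch of that standard formula only; the function name is hypothetical and nothing here is taken from the GATSol code or its datasets.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the observations scores 0 under this definition, and a perfect model scores 1.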

3.

RNA-Seq Data Analysis: A Practical Guide for Model and Non-Model Organisms.

Pola-Sánchez, Enrique; Hernández-Martínez, Karen Magdalena; Pérez-Estrada, Rafael; Sélem-Mójica, Nelly; Simpson, June; Abraham-Juárez, María Jazmín; Herrera-Estrella, Alfredo; Villalobos-Escobedo, José Manuel.

Curr Protoc ; 4(5): e1054, 2024 May.

Article En | MEDLINE | ID: mdl-38808970

RNA sequencing (RNA-seq) has emerged as a powerful tool for assessing genome-wide gene expression, revolutionizing various fields of biology. However, analyzing large RNA-seq datasets can be challenging, especially for students or researchers lacking bioinformatics experience. To address these challenges, we present a comprehensive guide to provide step-by-step workflows for analyzing RNA-seq data, from raw reads to functional enrichment analysis, starting with considerations for experimental design. This is designed to aid students and researchers working with any organism, irrespective of whether an assembled genome is available. Within this guide, we employ various recognized bioinformatics tools to navigate the landscape of RNA-seq analysis and discuss the advantages and disadvantages of different tools for the same task. Our protocol focuses on clarity, reproducibility, and practicality to enable users to navigate the complexities of RNA-seq data analysis easily and gain valuable biological insights from the datasets. Additionally, all scripts and a sample dataset are available in a GitHub repository to facilitate the implementation of the analysis pipeline. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Analysis of data from a model plant with an available reference genome. Basic Protocol 2: Gene ontology enrichment analysis. Basic Protocol 3: De novo assembly of data from non-model plants.

RNA-Seq , RNA-Seq/methods , Computational Biology/methods , Sequence Analysis, RNA/methods , Software

4.

Improved algorithm for generating evenly-spaced streamlines from an orientation field on a triangulated surface.

Jacquemet, Vincent.

Comput Methods Programs Biomed ; 251: 108202, 2024 Jun.

Article En | MEDLINE | ID: mdl-38703718

BACKGROUND: Vector fields such as cardiac fiber orientation can be visualized on a surface using streamlines. The application of evenly-spaced streamline generation to the construction of interconnected cable structure for cardiac propagation models has more stringent requirements imperfectly fulfilled by current algorithms. METHOD: We developed an open-source C++/python package for the placement of evenly-spaced streamlines on a triangulated surface. The new algorithm improves upon previous works by more accurately handling streamline extremities, U-turns and limit cycles, by providing stronger geometrical guarantees on inter-streamline minimal distance, particularly when a high density of streamlines (up to 10µm spacing) is desired, and by making a more efficient parallel implementation available. The approach requires finding intersections between geometrical capsules and triangles to update an occupancy mask defined on the triangles. This enables fast streamline integration from thousands of seed points to identify optimal streamline placement. RESULTS: The algorithm was assessed qualitatively on different left atrial models of fiber orientation with varying mesh resolutions (up to 375k triangles) and quantitatively by measuring streamline lengths and distribution of inter-streamline minimal distance. The complexity and the computational performance of the algorithm were studied as a function of streamline spacing in relation to triangular mesh resolution. CONCLUSION: More accurate geometrical computations, attention to details and fine-tuning led to an algorithm more amenable to applications that require precise positioning of streamlines.

Algorithms , Humans , Models, Cardiovascular , Computer Simulation , Heart Atria , Software

5.

Where's Whaledo: A software toolkit for array localization of animal vocalizations.

Snyder, Eric R; Solsona-Berga, Alba; Baumann-Pickering, Simone; Frasier, Kait E; Wiggins, Sean M; Hildebrand, John A.

PLoS Comput Biol ; 20(5): e1011456, 2024 May.

Article En | MEDLINE | ID: mdl-38768239

Where's Whaledo is a software toolkit that uses a combination of automated processes and user interfaces to greatly accelerate the process of reconstructing animal tracks from arrays of passive acoustic recording devices. Passive acoustic localization is a non-invasive yet powerful way to contribute to species conservation. By tracking animals through their acoustic signals, important information on diving patterns, movement behavior, habitat use, and feeding dynamics can be obtained. This method is useful for helping to understand habitat use, observe behavioral responses to noise, and develop potential mitigation strategies. Animal tracking using passive acoustic localization requires an acoustic array to detect signals of interest, associate detections on various receivers, and estimate the most likely source location by using the time difference of arrival (TDOA) of sounds on multiple receivers. Where's Whaledo combines data from two small-aperture volumetric arrays and a variable number of individual receivers. In a case study conducted in the Tanner Basin off Southern California, we demonstrate the effectiveness of Where's Whaledo in localizing groups of Ziphius cavirostris. We reconstruct the tracks of six individual animals vocalizing concurrently and identify Ziphius cavirostris tracks despite being obscured by a large pod of vocalizing dolphins.

Software , Vocalization, Animal , Animals , Vocalization, Animal/physiology , Computational Biology/methods , Dolphins/physiology , Acoustics
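
Note on the localization principle in entry 5: the time difference of arrival (TDOA) idea described in the abstract can be sketched as follows. This is not code from the Where's Whaledo toolkit; the function names, the fixed sound speed of 1500 m/s, and the brute-force candidate search are all assumptions made for illustration.

```python
import math

SOUND_SPEED_M_PER_S = 1500.0  # assumed nominal speed of sound in seawater


def tdoa(source, receiver_a, receiver_b, c=SOUND_SPEED_M_PER_S):
    """Arrival time at receiver_a minus arrival time at receiver_b, in seconds."""
    return (math.dist(source, receiver_a) - math.dist(source, receiver_b)) / c


def locate(candidates, receivers, measured, c=SOUND_SPEED_M_PER_S):
    """Return the candidate position whose predicted TDOAs best fit the data.

    measured maps (i, j) receiver index pairs to an observed TDOA in seconds.
    """
    def misfit(position):
        # Sum of squared differences between predicted and observed TDOAs.
        return sum(
            (tdoa(position, receivers[i], receivers[j], c) - dt) ** 2
            for (i, j), dt in measured.items()
        )

    return min(candidates, key=misfit)
```

With positions given as (x, y, z) tuples in metres, a source equidistant from two receivers yields a TDOA of zero, and each measured receiver pair constrains the source to one hyperboloid sheet.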

6.

Krisp: A Python package to aid in the design of CRISPR and amplification-based diagnostic assays from whole genome sequencing data.

Foster, Zachary S L; Tupper, Andrew S; Press, Caroline M; Grünwald, Niklaus J.

PLoS Comput Biol ; 20(5): e1012139, 2024 May.

Article En | MEDLINE | ID: mdl-38768250

Recent pandemics like COVID-19 highlighted the importance of rapidly developing diagnostics to detect evolving pathogens. CRISPR-Cas technology has recently been used to develop diagnostic assays for sequence-specific recognition of DNA or RNA. These assays have similar sensitivity to the gold standard qPCR but can be deployed as easy to use and inexpensive test strips. However, the discovery of diagnostic regions of a genome flanked by conserved regions where primers can be designed requires extensive bioinformatic analyses of genome sequences. We developed the Python package krisp to aid in the discovery of primers and diagnostic sequences that differentiate groups of samples from each other, using either unaligned genome sequences or a variant call format (VCF) file as input. Krisp has been optimized to handle large datasets by using efficient algorithms that run in near linear time, use minimal RAM, and leverage parallel processing when available. The validity of krisp results has been demonstrated in the laboratory with the successful design of a CRISPR diagnostic assay to distinguish the sudden oak death pathogen Phytophthora ramorum from closely related Phytophthora species. Krisp is released open source under a permissive license with all the documentation needed to quickly design CRISPR-Cas diagnostic assays.

CRISPR-Cas Systems , SARS-CoV-2 , Software , Whole Genome Sequencing , CRISPR-Cas Systems/genetics , Humans , Whole Genome Sequencing/methods , SARS-CoV-2/genetics , Computational Biology/methods , COVID-19/diagnosis , COVID-19/virology , Algorithms

7.

A computational tool suite to facilitate single-cell lineage tracing analyses.

Waterfall, Joshua J; Midoun, Adil; Perié, Leïla.

Cell Rep Methods ; 4(5): 100780, 2024 May 20.

Article En | MEDLINE | ID: mdl-38744285

Tracking the lineage relationships of cell populations is of increasing interest in diverse biological contexts. In this issue of Cell Reports Methods, Holze et al. present a suite of computational tools to facilitate such analyses and encourage their broader application.

Cell Lineage , Computational Biology , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Software , Animals

8.

Artificial intelligence in Medicare: utilization, spending, and access to AI-enabled clinical software.

Zink, Anna; Boone, Claire; Joynt Maddox, Karen E; Chernew, Michael E; Neprash, Hannah T.

Am J Manag Care ; 30(6 Spec No.): SP473-SP477, 2024 May.

Article En | MEDLINE | ID: mdl-38820190

OBJECTIVES: In 2018, CMS established reimbursement for the first Medicare-covered artificial intelligence (AI)-enabled clinical software: CT fractional flow reserve (FFRCT) to assist in the diagnosis of coronary artery disease. This study quantified Medicare utilization of and spending on FFRCT from 2018 through 2022 and characterized adopting hospitals, clinicians, and patients. STUDY DESIGN: Analysis, using 100% Medicare fee-for-service claims data, of the hospitals, clinicians, and patients who performed or received coronary CT angiography with or without FFRCT. METHODS: We measured annual trends in utilization of and spending on FFRCT among hospitals and clinicians from 2018 through 2022. Characteristics of FFRCT-adopting and nonadopting hospitals and clinicians were compared, as well as the characteristics of patients who received FFRCT vs those who did not. RESULTS: From 2018 to 2022, FFRCT billing volume in Medicare increased more than 11-fold (from 1083 to 12,363 claims). Compared with nonbilling hospitals, FFRCT-billing hospitals were more likely to be larger, part of a health system, nonprofit, and financially profitable. FFRCT-billing clinicians worked in larger group practices and were more likely to be cardiac specialists. FFRCT-receiving patients were more likely to be male and White and less likely to be dually enrolled in Medicaid or receiving disability benefits. CONCLUSIONS: In the initial 5 years of Medicare reimbursement for FFRCT, growth was concentrated among well-resourced hospitals and clinicians. As Medicare begins to reimburse clinicians for the use of AI-enabled clinical software such as FFRCT, it is crucial to monitor the diffusion of these services to ensure equal access.

Artificial Intelligence , Coronary Artery Disease , Medicare , United States , Humans , Medicare/economics , Medicare/statistics & numerical data , Male , Female , Aged , Coronary Artery Disease/economics , Fractional Flow Reserve, Myocardial , Fee-for-Service Plans/statistics & numerical data , Computed Tomography Angiography/economics , Computed Tomography Angiography/statistics & numerical data , Software , Coronary Angiography/statistics & numerical data , Coronary Angiography/economics

9.

Classified Dynamic Programming in RNA Structure Analysis.

Voß, Björn.

Methods Mol Biol ; 2726: 125-141, 2024.

Article En | MEDLINE | ID: mdl-38780730

Analysis of the folding space of RNA generally suffers from its exponential size. With classified Dynamic Programming algorithms, it is possible to alleviate this burden and to analyse the folding space of RNA in great depth. Key to classified DP is that the search space is partitioned into classes based on an on-the-fly computed feature. A class-wise evaluation is then used to compute class-wide properties, such as the lowest free energy structure for each class, or aggregate properties, such as the class' probability. In this paper we describe the well-known shape and hishape abstraction of RNA structures, their power to help better understand RNA function and related methods that are based on these abstractions.

Algorithms , Computational Biology , Nucleic Acid Conformation , RNA Folding , RNA , RNA/chemistry , RNA/genetics , Computational Biology/methods , Software , Thermodynamics
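
Note on the class-wise evaluation in entry 9: the abstract describes computing, per class, the lowest free energy structure and the class probability. The sketch below shows only that aggregation step over an explicitly enumerated toy list. It is not the dynamic-programming algorithm itself, and the function name, class labels, and energies are hypothetical.

```python
import math
from collections import defaultdict

# Gas constant in kcal/(mol*K) times 37 degrees Celsius in kelvin.
RT_KCAL_PER_MOL = 0.0019872 * 310.15


def classify(structures):
    """Aggregate (structure, class_label, free_energy) triples per class.

    Returns, for each class, its lowest-energy member and its Boltzmann
    probability (class partition function divided by the total).
    """
    best = {}
    weight = defaultdict(float)
    for structure, label, energy in structures:
        if label not in best or energy < best[label][1]:
            best[label] = (structure, energy)
        weight[label] += math.exp(-energy / RT_KCAL_PER_MOL)
    total = sum(weight.values())
    return {
        label: {
            "mfe_structure": best[label][0],
            "mfe": best[label][1],
            "probability": weight[label] / total,
        }
        for label in best
    }


# Hypothetical toy folding space: dot-bracket strings, class label, kcal/mol.
toy = [
    ("((...))....", "[]", -3.0),
    (".((...))...", "[]", -2.5),
    ("((..))((..))", "[][]", -2.0),
]
summary = classify(toy)
```

In a classified DP the same per-class minimum and per-class partition function are accumulated during the recursion, so the exponentially large structure list is never materialized.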

10.

Classification and Identification of Non-canonical Base Pairs and Structural Motifs.

Sarrazin-Gendron, Roman; Waldispühl, Jérôme; Reinharz, Vladimir.

Methods Mol Biol ; 2726: 143-168, 2024.

Article En | MEDLINE | ID: mdl-38780731

The 3D structures of many ribonucleic acid (RNA) loops are characterized by highly organized networks of non-canonical interactions. Multiple computational methods have been developed to annotate structures with those interactions or automatically identify recurrent interaction networks. By contrast, the reverse problem that aims to retrieve the geometry of a loop from its sequence or ensemble of interactions remains much less explored. In this chapter, we will describe how to retrieve and build families of conserved structural motifs using their underlying network of non-canonical interactions. Then, we will show how to assign sequence alignments to those families and use the software BayesPairing to build statistical models of structural motifs with their associated sequence alignments. From this model, we will apply BayesPairing to identify in new sequences regions where those loop geometries can occur.

Base Pairing , Computational Biology , RNA , Software , Computational Biology/methods , RNA/chemistry , RNA/genetics , Nucleic Acid Conformation , Sequence Alignment/methods , Algorithms , Nucleotide Motifs , Bayes Theorem , Models, Molecular

11.

LocARNA 2.0: Versatile Simultaneous Alignment and Folding of RNAs.

Will, Sebastian.

Methods Mol Biol ; 2726: 235-254, 2024.

Article En | MEDLINE | ID: mdl-38780734

Generating accurate alignments of non-coding RNA sequences is indispensable in the quest for understanding RNA function. Nevertheless, aligning RNAs remains a challenging computational task. In the twilight-zone of RNA sequences with low sequence similarity, sequence homologies and compatible, favorable (a priori unknown) structures can be inferred only in dependency of each other. Thus, simultaneous alignment and folding (SA&F) remains the gold-standard of comparative RNA analysis, even if this method is computationally highly demanding. This text introduces the recent release 2.0 of the software package LocARNA, focusing on its practical application. The package enables versatile, fast and accurate analysis of multiple RNAs. For this purpose, it implements SA&F algorithms in a specific, lightweight flavor that makes them routinely applicable at large scale. Its high performance is achieved by combining ensemble-based sparsification of the structure space and banding strategies. Probabilistic banding strongly improves the performance of LocARNA 2.0 even over previous releases, while simplifying its effective use. Enabling flexible application to various use cases, LocARNA provides tools to globally and locally compare, cluster, and multiply align RNAs based on optimization and probabilistic variants of SA&F, which optionally integrate prior knowledge, expressible by anchor and structure constraints.

Algorithms , Computational Biology , RNA Folding , RNA , Software , RNA/genetics , RNA/chemistry , Computational Biology/methods , Nucleic Acid Conformation , Sequence Alignment/methods , Sequence Analysis, RNA/methods

12.

VarGibbs Usage in the Optimization of Nearest-Neighbor Parameters and Prediction of Melting Temperature of RNA Duplexes.

Ferreira, Izabela; Weber, Gerald.

Methods Mol Biol ; 2726: 15-43, 2024.

Article En | MEDLINE | ID: mdl-38780726

The nearest-neighbor (NN) model is a general tool for the evaluation of oligonucleotide thermodynamic stability. It is primarily used for the prediction of melting temperatures but has also found use in RNA secondary structure prediction and theoretical models of hybridization kinetics. One of the key problems is to obtain the NN parameters from melting temperatures, and VarGibbs was designed to obtain those parameters directly from melting temperatures. Here we will describe the basic workflow from RNA melting temperatures to NN parameters with the use of VarGibbs. We start with a brief review of the basic concepts of RNA hybridization and of the NN model and then show how to prepare the data files, run the parameter optimization, and interpret the results.

Nucleic Acid Conformation , Nucleic Acid Denaturation , Thermodynamics , Transition Temperature , RNA/chemistry , RNA/genetics , Software , Algorithms , Nucleic Acid Hybridization/methods
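
Note on the NN model in entry 12: the forward direction of the model (parameters to melting temperature) can be sketched as below. This is not VarGibbs code, which addresses the inverse problem of fitting the parameters. The function names are hypothetical and the uniform dimer parameters are placeholders rather than published values. The melting-temperature expression is the usual two-state formula for a non-self-complementary duplex, Tm = ΔH / (ΔS + R ln(Ct/4)).

```python
import math

R_CAL = 1.9872  # gas constant in cal/(mol*K)


def duplex_thermo(seq, nn_params, init=(0.0, 0.0)):
    """Sum (dH in kcal/mol, dS in cal/(mol*K)) over overlapping dimers of seq."""
    dh, ds = init
    for i in range(len(seq) - 1):
        step_dh, step_ds = nn_params[seq[i:i + 2]]
        dh += step_dh
        ds += step_ds
    return dh, ds


def melting_temperature_c(dh_kcal, ds_cal, total_strand_conc_molar):
    """Two-state Tm in degrees Celsius for a non-self-complementary duplex."""
    denominator = ds_cal + R_CAL * math.log(total_strand_conc_molar / 4.0)
    return dh_kcal * 1000.0 / denominator - 273.15


# Placeholder parameters: every dimer gets the same hypothetical (dH, dS).
placeholder_params = {a + b: (-10.0, -27.0) for a in "ACGU" for b in "ACGU"}
dh, ds = duplex_thermo("ACGU", placeholder_params)
tm = melting_temperature_c(dh, ds, 1e-4)
```

Fitting NN parameters then amounts to adjusting the entries of the parameter table until predicted melting temperatures match the measured ones across many sequences.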

13.

RNA Secondary Structure Modeling Following the IPANEMAP Workflow.

Allouche, Delphine; De Bisschop, Grégoire; Saaidi, Afaf; Hardouin, Pierre; du Moutier, Francois-Xavier Lyonnet; Ponty, Yann; Bruno, Sargueil.

Methods Mol Biol ; 2726: 85-104, 2024.

Article En | MEDLINE | ID: mdl-38780728

The structures of RNA molecules and their complexes are crucial for understanding biology at the molecular level. Resolving these structures holds the key to understanding their manifold structure-mediated functions ranging from regulating gene expression to catalyzing biochemical processes. Predicting RNA secondary structure is a prerequisite and a key step to accurately model their three-dimensional structure. Although dedicated modelling software is making fast and significant progress, predicting an accurate secondary structure from the sequence remains a challenge. Its performance can be significantly improved by the incorporation of experimental RNA structure probing data. Many different chemical and enzymatic probes have been developed; however, only one set of quantitative data can be incorporated as constraints for computer-assisted modelling. IPANEMAP is a recent workflow based on RNAfold that can take into account several quantitative or qualitative data sets to model RNA secondary structure. This chapter details the methods for popular chemical probing (DMS, CMCT, SHAPE-CE, and SHAPE-Map) and the subsequent analysis and structure prediction using IPANEMAP.

Models, Molecular , Nucleic Acid Conformation , RNA , Software , Workflow , RNA/chemistry , RNA/genetics , Computational Biology/methods

14.

Modified Nucleotides and RNA Structure Prediction.

Varenyk, Yuliia; Lorenz, Ronny.

Methods Mol Biol ; 2726: 169-207, 2024.

Article En | MEDLINE | ID: mdl-38780732

Nucleotide modifications occur in all types of RNA and play an important role in RNA structure formation and stability. Modified bases not only possess the ability to shift the RNA structure ensemble towards desired functional conformations; by changes in the base pairing partner preference, they may even enlarge or reduce the conformational space, i.e., the number and types of structures the RNA molecule can adopt. However, most methods to predict RNA secondary structure do not provide the means to include the effect of modifications on the result. With the help of a heavily modified transfer RNA (tRNA) molecule, this chapter demonstrates how to include the effect of different base modifications into secondary structure prediction using the ViennaRNA Package. The constructive approach demonstrated here allows for the calculation of minimum free energy structure and suboptimal structures at different levels of modified base support. In particular, we show how to incorporate the isomerization of uridine to pseudouridine (Ψ) and the reduction of uridine to dihydrouridine (D).

Nucleic Acid Conformation , RNA , RNA/chemistry , RNA, Transfer/chemistry , RNA, Transfer/metabolism , Nucleotides/chemistry , Base Pairing , Computational Biology/methods , Thermodynamics , Software , Uridine/chemistry , Models, Molecular , Pseudouridine/chemistry

15.

Assessing the Quality of Cotranscriptional Folding Simulations.

Kühnl, Felix; Stadler, Peter F; Findeiß, Sven.

Methods Mol Biol ; 2726: 347-376, 2024.

Article En | MEDLINE | ID: mdl-38780738

Structural changes in RNAs are an important contributor to controlling gene expression not only at the posttranscriptional stage but also during transcription. A subclass of riboswitches and RNA thermometers located in the 5' region of the primary transcript regulates the downstream functional unit - usually an ORF - through premature termination of transcription. Not only do such elements occur naturally, but they are also attractive devices in synthetic biology. The possibility to design such riboswitches or RNA thermometers is thus of considerable practical interest. Since these functional RNA elements act already during transcription, it is important to model and understand the dynamics of folding and, in particular, the formation of intermediate structures concurrently with transcription. Cotranscriptional folding simulations are therefore an important step to verify the functionality of design constructs before conducting expensive and labor-intensive wet lab experiments. For RNAs, full-fledged molecular dynamics simulations are far beyond practical reach because of both the size of the molecules and the timescales of interest. Even at the simplified level of secondary structures, further approximations are necessary. The BarMap approach is based on representing the secondary structure landscape for each individual transcription step by a coarse-grained representation that only retains a small set of low-energy local minima and the energy barriers between them. The folding dynamics between two transcriptional elongation steps is modeled as a Markov process on this representation. Maps between pairs of consecutive coarse-grained landscapes make it possible to follow the folding process as it changes in response to transcription elongation. In its original implementation, the BarMap software provides a general framework to investigate RNA folding dynamics on temporally changing landscapes. It is, however, difficult to use, in particular for specific scenarios such as cotranscriptional folding. To overcome this limitation, we developed the user-friendly BarMap-QA pipeline described in detail in this contribution. It is illustrated here by an elaborate example that emphasizes the careful monitoring of several quality measures. Using an iterative workflow, a reliable and complete kinetics simulation of a synthetic, transcription-regulating riboswitch is obtained using minimal computational resources. All programs and scripts used in this contribution are free software and available for download as a source distribution for Linux® or as a platform-independent Docker® image including support for Apple macOS® and Microsoft Windows®.

Molecular Dynamics Simulation , Nucleic Acid Conformation , RNA Folding , Transcription, Genetic , Riboswitch/genetics , RNA/chemistry , RNA/genetics , Software
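
Note on the kinetics in entry 15: the abstract describes folding between two elongation steps as a Markov process on a coarse-grained landscape of local minima and barriers. The sketch below illustrates that idea for a single landscape only. It is not BarMap or BarMap-QA code; the function names, the Arrhenius rates with unit prefactor, the explicit Euler integration, and the three-state energies are all assumptions chosen for illustration.

```python
import math

RT = 0.0019872 * 310.15  # kcal/mol at 37 degrees Celsius


def rate_matrix(minima, saddles):
    """Build rates[i][j], the transition rate from minimum i to minimum j.

    minima: list of minimum energies; saddles: {(i, j): saddle energy}.
    Rates follow an Arrhenius form with unit prefactor (an assumption).
    """
    n = len(minima)
    rates = [[0.0] * n for _ in range(n)]
    for (i, j), e_saddle in saddles.items():
        rates[i][j] = math.exp(-(e_saddle - minima[i]) / RT)
        rates[j][i] = math.exp(-(e_saddle - minima[j]) / RT)
    return rates


def propagate(p0, rates, t_total, dt):
    """Integrate the master equation dp_i/dt = inflow_i - outflow_i (Euler)."""
    n = len(p0)
    p = list(p0)
    for _ in range(int(round(t_total / dt))):
        flow = [
            sum(p[j] * rates[j][i] for j in range(n)) - p[i] * sum(rates[i])
            for i in range(n)
        ]
        p = [p[i] + dt * flow[i] for i in range(n)]
    return p


# Hypothetical landscape: three minima connected in a chain 0 - 1 - 2.
minima = [-3.0, -5.0, -4.0]
saddles = {(0, 1): -2.0, (1, 2): -3.0}
population = propagate([1.0, 0.0, 0.0], rate_matrix(minima, saddles), 200.0, 0.05)
```

In a cotranscriptional setting the final population of one landscape would be mapped onto the minima of the next, longer transcript and propagated again; that mapping step is not shown here.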

16.

How Parameters Influence SHAPE-Directed Predictions.

Greenwood, Torin; Heitsch, Christine E.

Methods Mol Biol ; 2726: 105-124, 2024.

Article En | MEDLINE | ID: mdl-38780729

The structure of an RNA sequence encodes information about its biological function. Dynamic programming algorithms are often used to predict the conformation of an RNA molecule from its sequence alone, and adding experimental data as auxiliary information improves prediction accuracy. This auxiliary data is typically incorporated into the nearest neighbor thermodynamic model by converting the data into pseudoenergies. Here, we look at how much of the space of possible structures auxiliary data allows prediction methods to explore. We find that for a large class of RNA sequences, auxiliary data shifts the predictions significantly. Additionally, we find that predictions are highly sensitive to the parameters which define the auxiliary data pseudoenergies. In fact, the parameter space can typically be partitioned into regions where different structural predictions predominate.

Algorithms , Computational Biology , Nucleic Acid Conformation , RNA , Thermodynamics , RNA/chemistry , RNA/genetics , Computational Biology/methods , Software
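
Note on the pseudoenergies in entry 16: the abstract does not state which conversion is used. One widely used choice is a log-linear function of the reactivity with a slope m and an intercept b, and the sketch below assumes that form. The function name, the default parameter values, and the handling of negative reactivities are illustrative assumptions, not details taken from the chapter.

```python
import math


def shape_pseudoenergy(reactivity, m=2.6, b=-0.8):
    """Log-linear pseudoenergy in kcal/mol: m * ln(reactivity + 1) + b."""
    if reactivity < 0:
        # Assumption: negative values mark missing data and add no energy.
        return 0.0
    return m * math.log(reactivity + 1.0) + b


def neutral_reactivity(m=2.6, b=-0.8):
    """Reactivity at which the pseudoenergy changes sign."""
    return math.exp(-b / m) - 1.0
```

Under this form, nucleotides with reactivity below the neutral point receive a bonus for pairing and those above it a penalty, so shifting m or b moves that threshold and can change which structure is predicted.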

17.

How to do RNA-RNA Interaction Prediction? A Use-Case Driven Handbook Using IntaRNA.

Raden, Martin; Miladi, Milad.

Methods Mol Biol ; 2726: 209-234, 2024.

Article En | MEDLINE | ID: mdl-38780733

Computational prediction of RNA-RNA interactions (RRI) is a central methodology for the specific investigation of inter-molecular RNA interactions and regulatory effects of non-coding RNAs like eukaryotic microRNAs or prokaryotic small RNAs. Available methods can be classified according to their underlying prediction strategies, each implicating specific capabilities and restrictions often not transparent to the non-expert user. Within this work, we review seven classes of RRI prediction strategies and discuss the advantages and limitations of respective tools, since such knowledge is essential for selecting the right tool in the first place. Among the RRI prediction strategies, accessibility-based approaches have been shown to provide the most reliable predictions. Here, we describe how IntaRNA, as one of the state-of-the-art accessibility-based tools, can be applied in various use cases for the task of computational RRI prediction. Detailed hands-on examples for individual RRI predictions as well as large-scale target prediction scenarios are provided. We illustrate the flexibility and capabilities of IntaRNA through the examples. Each example is designed using real-life data from the literature and is accompanied by instructions on interpreting the respective results from IntaRNA output. Our use-case driven instructions enable non-expert users to comprehensively understand and utilize IntaRNA's features for effective RRI predictions.

Computational Biology , Software , Computational Biology/methods , RNA/genetics , RNA/metabolism , Algorithms , Humans , MicroRNAs/genetics , MicroRNAs/metabolism

18.

Evolutionary Structure Conservation and Covariance Scores.

Eggenhofer, Florian; Höner Zu Siederdissen, Christian.

Methods Mol Biol ; 2726: 255-284, 2024.

Article En | MEDLINE | ID: mdl-38780735

Effective homology search for non-coding RNAs is frequently not possible via sequence similarity alone. Current methods leverage evolutionary information like structure conservation or covariance scores to identify homologs in organisms that are phylogenetically more distant. In this chapter, we introduce the theoretical background of evolutionary structure conservation and covariance score, and we show hands-on how current methods in the field are applied on example datasets.

Computational Biology , Evolution, Molecular , Computational Biology/methods , Phylogeny , Algorithms , RNA, Untranslated/genetics , Conserved Sequence , Humans , Animals , Software , Sequence Alignment/methods

19.

Developing Complex RNA Design Applications in the Infrared Framework.

Yao, Hua-Ting; Ponty, Yann; Will, Sebastian.

Methods Mol Biol ; 2726: 285-313, 2024.

Article En | MEDLINE | ID: mdl-38780736

Applications in biotechnology and bio-medical research call for effective strategies to design novel RNAs with very specific properties. Such advanced design tasks require support by computational tools but at the same time put high demands on their flexibility and expressivity to model the application-specific requirements. To address such demands, we present the computational framework Infrared. It supports developing advanced customized design tools, which generate RNA sequences with specific properties, often in a few lines of Python code. This text guides the reader in tutorial format through the development of complex design applications. Thanks to the declarative, compositional approach of Infrared, we can describe this development as a step-by-step extension of an elementary design task. Thus, we start with generating sequences that are compatible with a single RNA structure and go all the way to RNA design targeting complex positive and negative design objectives with respect to single or even multiple target structures. Finally, we present a "real-world" application of computational design to create an RNA device for biotechnology: we use Infrared to generate design candidates of an artificial "AND" riboswitch, which activates gene expression in the simultaneous presence of two different small metabolites. In these applications, we exploit that the system can generate, in an efficient (fixed-parameter tractable) way, multiple diverse designs that satisfy a number of constraints and have high quality with respect to an objective (by sampling from a Boltzmann distribution).

Computational Biology , Nucleic Acid Conformation , RNA , Software , RNA/genetics , RNA/chemistry , Computational Biology/methods , Riboswitch/genetics , Biotechnology/methods
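
Note on the elementary design task in entry 19: generating sequences compatible with a single RNA structure can be sketched in plain Python as below. This deliberately does not use the Infrared API, which is not described in the abstract. The function names are hypothetical, sampling is uniform rather than Boltzmann-weighted, and allowing G-U wobble pairs is an assumption.

```python
import random

# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption).
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]


def parse_pairs(structure):
    """Map each opening position to its closing position in dot-bracket."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def sample_compatible(structure, rng=random):
    """Draw one sequence whose paired positions can all form base pairs."""
    seq = [rng.choice("ACGU") for _ in structure]
    for i, j in parse_pairs(structure).items():
        seq[i], seq[j] = rng.choice(PAIRS)
    return "".join(seq)
```

Because each base pair is assigned independently of the others, a single structure needs no constraint propagation. Multiple target structures couple the positions, which is where a dedicated framework becomes useful.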

20.

The Multiscale Ernwin/SPQR RNA Structure Prediction Pipeline.

Thiel, Bernhard C; Poblete, Simón; Hofacker, Ivo L.

Methods Mol Biol ; 2726: 377-399, 2024.

Article En | MEDLINE | ID: mdl-38780739

Aside from the well-known role in protein synthesis, RNA can perform catalytic, regulatory, and other essential biological functions which are determined by its three-dimensional structure. In this regard, a great effort has been made during the past decade to develop computational tools for the prediction of the structure of RNAs from the knowledge of their sequence, incorporating experimental data to refine or guide the modeling process. Nevertheless, this task can become exceptionally challenging when dealing with long noncoding RNAs, constituted by more than 200 nucleotides, due to their large size and the specific interactions involved. In this chapter, we describe a multiscale approach to predict such structures, incorporating SAXS experimental data into a hierarchical procedure which couples two coarse-grained representations: Ernwin, a helix-based approach, which deals with the global arrangement of secondary structure elements, and SPQR, a nucleotide-centered coarse-grained model, which corrects and refines the structures predicted at the coarser level. We describe the methodology through its application on the Braveheart long noncoding RNA, starting from the SAXS and secondary structure data to propose a refined, all-atom structure.

Nucleic Acid Conformation , RNA, Long Noncoding , Scattering, Small Angle , X-Ray Diffraction , RNA, Long Noncoding/chemistry , RNA, Long Noncoding/genetics , X-Ray Diffraction/methods , Computational Biology/methods , Software , Models, Molecular , RNA/chemistry , RNA/genetics , Algorithms