Results 1 - 20 of 24
1.
Methods Mol Biol ; 2847: 17-31, 2025.
Article in English | MEDLINE | ID: mdl-39312134

ABSTRACT

RNA is present in all domains of life. It was once thought to be solely involved in protein expression, but recent advances have revealed its crucial role in catalysis and gene regulation through noncoding RNA. With a growing interest in exploring RNAs with specific structures, there is an increasing focus on designing RNA structures for in vivo and in vitro experimentation and for therapeutics. The development of RNA secondary structure prediction methods has also spurred the growth of RNA design software. However, there are challenges to designing RNA sequences that meet secondary structure requirements. One major challenge is that the secondary structure design problem is likely NP-hard, making it computationally intensive. Another issue is that objective functions need to consider the folding ensemble of RNA molecules to avoid off-target structures. In this chapter, we provide protocols for two software tools from the RNAstructure package: "Design" for structured RNA sequence design and "orega" for unstructured RNA sequence design.


Subjects
Computational Biology , Nucleic Acid Conformation , RNA , Software , RNA/chemistry , RNA/genetics , Computational Biology/methods , RNA Folding , RNA Sequence Analysis/methods , Algorithms
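
The secondary structure requirement mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the target structure is given in dot-bracket notation without pseudoknots and that only Watson-Crick and G-U wobble pairs are allowed. The function names are hypothetical and this is not the interface of the RNAstructure "Design" program. Compatibility is only a necessary condition: a compatible sequence may still fold into an off-target structure, which is why design objectives consider the folding ensemble.

```python
# Pairs allowed in this sketch: Watson-Crick plus G-U wobble (an assumption).
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return a dict mapping every paired position to its partner (0-based)."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return partner


def is_compatible(sequence, structure):
    """True if every pair required by the structure can form in the sequence."""
    if len(sequence) != len(structure):
        return False
    partner = parse_dot_bracket(structure)
    return all(
        (sequence[i], sequence[j]) in ALLOWED_PAIRS
        for i, j in partner.items()
        if i < j  # check each pair once
    )
```

For example, `is_compatible("GGGAAACCC", "(((...)))")` holds, while replacing the C at position 6 with G breaks the innermost pair.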
2.
Methods Mol Biol ; 2847: 121-135, 2025.
Article in English | MEDLINE | ID: mdl-39312140

ABSTRACT

Fundamental to the diverse biological functions of RNA are its 3D structure and conformational flexibility, which enable single sequences to adopt a variety of distinct 3D states. Currently, computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. In this tutorial, we present gRNAde, a geometric RNA design pipeline operating on sets of 3D RNA backbone structures to design sequences that explicitly account for RNA 3D structure and dynamics. gRNAde is a graph neural network that uses an SE(3)-equivariant encoder-decoder framework for generating RNA sequences conditioned on backbone structures where the identities of the bases are unknown. We demonstrate the utility of gRNAde for fixed-backbone re-design of existing RNA structures of interest from the PDB, including riboswitches, aptamers, and ribozymes. gRNAde is more accurate in terms of native sequence recovery while being significantly faster compared to existing physics-based tools for 3D RNA inverse design, such as Rosetta.


Subjects
Deep Learning , Nucleic Acid Conformation , RNA , Software , RNA/chemistry , RNA/genetics , Computational Biology/methods , Catalytic RNA/chemistry , Catalytic RNA/genetics , Molecular Models , Computer Neural Networks
3.
Bioessays ; : e2400155, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39404756

ABSTRACT

The performance of deep Neural Networks (NNs) in the text (ChatGPT) and image (DALL-E2) domains has attracted worldwide attention. Convolutional NNs (CNNs), Large Language Models (LLMs), Denoising Diffusion Probabilistic Models (DDPMs)/Noise Conditional Score Networks (NCSNs), and Graph NNs (GNNs) have impacted computer vision, language editing and translation, automated conversation, image generation, and social network management. Proteins can be viewed as texts written with the alphabet of amino acids, as images, or as graphs of interacting residues. Each of these perspectives suggests the use of tools from a different area of deep learning for protein structural biology. Here, I review how CNNs, LLMs, DDPMs/NCSNs, and GNNs have led to major advances in protein structure prediction, inverse folding, protein design, and small molecule design. This review is primarily intended as a deep learning primer for practicing experimental structural biologists. However, extensive references to the deep learning literature should also make it relevant to readers who have a background in machine learning, physics or statistics, and an interest in protein structural biology.

4.
Front Immunol ; 15: 1322712, 2024.
Article in English | MEDLINE | ID: mdl-38390326

ABSTRACT

Accurate computational identification of B-cell epitopes is crucial for the development of vaccines, therapies, and diagnostic tools. However, current structure-based prediction methods face limitations due to the dependency on experimentally solved structures. Here, we introduce DiscoTope-3.0, a markedly improved B-cell epitope prediction tool that innovatively employs inverse folding structure representations and a positive-unlabelled learning strategy, and is adapted for both solved and predicted structures. Our tool demonstrates a considerable improvement in performance over existing methods, accurately predicting linear and conformational epitopes across multiple independent datasets. Most notably, DiscoTope-3.0 maintains high predictive performance across solved, relaxed and predicted structures, alleviating the need for experimental structures and extending the general applicability of accurate B-cell epitope prediction by 3 orders of magnitude. DiscoTope-3.0 is made widely accessible on two web servers, processing over 100 structures per submission, and as a downloadable package. In addition, the servers interface with RCSB and AlphaFoldDB, facilitating large-scale prediction across over 200 million cataloged proteins. DiscoTope-3.0 is available at: https://services.healthtech.dtu.dk/service.php?DiscoTope-3.0.


Subjects
B-Lymphocyte Epitopes , Molecular Conformation
5.
Synth Syst Biotechnol ; 9(2): 217-222, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38385151

ABSTRACT

The protein inverse folding problem, designing amino acid sequences that fold into desired protein structures, is a critical challenge in biological sciences. Despite numerous data-driven and knowledge-driven methods, there remains a need for a user-friendly toolkit that effectively integrates these approaches for in-silico protein design. In this paper, we present DIProT, an interactive protein design toolkit. DIProT leverages a non-autoregressive deep generative model to solve the inverse folding problem, combined with a protein structure prediction model. This integration allows users to incorporate prior knowledge into the design process, evaluate designs in silico, and form a virtual design loop with human feedback. Our inverse folding model demonstrates competitive performance in terms of effectiveness and efficiency on TS50 and CATH4.2 datasets, with promising sequence recovery and inference time. Case studies further illustrate how DIProT can facilitate user-guided protein design.

6.
Protein Eng Des Sel ; 37, 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38157313

ABSTRACT

Deep learning methods for protein sequence design focus on modeling and sampling the many-dimensional distribution of amino acid sequences conditioned on the backbone structure. To produce physically foldable sequences, inter-residue couplings need to be considered properly. These couplings are treated explicitly in iterative methods or autoregressive methods. Non-autoregressive models treating these couplings implicitly are computationally more efficient, but still await testing in wet-lab experiments. Currently, sequence design methods are evaluated mainly using native sequence recovery rate and native sequence perplexity. These metrics can be complemented by sequence-structure compatibility metrics obtained from energy calculation or structure prediction. However, existing computational metrics have important limitations that may render the generalization of computational test results to performance in real applications unwarranted. Validation of design methods by wet-lab experiments should be encouraged.


Subjects
Deep Learning , Amino Acid Sequence , Protein Folding , Protein Conformation
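
The two evaluation metrics named in the abstract above, native sequence recovery rate and native sequence perplexity, can be sketched as follows. This is a minimal illustration under common definitions (fraction of identical positions, and the exponential of the mean negative log-probability the model assigns to the native residues). The function names and the input format are assumptions, not taken from the paper.

```python
import math


def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def native_perplexity(native_probs):
    """Perplexity from the model probability of the native residue at each position.

    native_probs[i] is assumed to be the probability (0 < p <= 1) that the
    model assigns to the native amino acid at position i.
    """
    if not native_probs:
        raise ValueError("need at least one position")
    mean_nll = -sum(math.log(p) for p in native_probs) / len(native_probs)
    return math.exp(mean_nll)
```

A model that spreads probability uniformly over the 20 amino acids would have a perplexity of 20 under this definition; lower values mean the native sequence is less surprising to the model.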
7.
BMC Bioinformatics ; 24(1): 373, 2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37789284

ABSTRACT

BACKGROUND: The relationship between the sequence of a protein, its structure, and the resulting connection between its structure and function, is a foundational principle in biological science. Only recently has the computational prediction of protein structure based only on protein sequence been addressed effectively by AlphaFold, a neural network approach that can predict the majority of protein structures with X-ray crystallographic accuracy. A question that is now of acute relevance is the "inverse protein folding problem": predicting the sequence of a protein that folds into a specified structure. This will be of immense value in protein engineering and biotechnology, and will allow the design and expression of recombinant proteins that can, for instance, fold into specified structures as a scaffold for the attachment of recombinant antigens, or enzymes with modified or novel catalytic activities. Here we describe the development of SeqPredNN, a feed-forward neural network trained with X-ray crystallographic structures from the RCSB Protein Data Bank to predict the identity of amino acids in a protein structure using only the relative positions, orientations, and backbone dihedral angles of nearby residues. RESULTS: We predict the sequence of a protein expected to fold into a specified structure and assess the accuracy of the prediction using both AlphaFold and RoseTTAFold to computationally generate the fold of the derived sequence. We show that the sequences predicted by SeqPredNN fold into a structure with a median TM-score of 0.638 when compared to the crystal structure according to AlphaFold predictions, yet these sequences are unique and only 28.4% identical to the sequence of the crystallized protein. CONCLUSIONS: We propose that SeqPredNN will be a valuable tool to generate proteins of defined structure for the design of novel biomaterials, pharmaceuticals, catalysts, and reporter systems. 
The low sequence identity of its predictions compared to the native sequence could prove useful for developing proteins with modified physical properties, such as water solubility and thermal stability. The speed and ease of use of SeqPredNN offer a significant advantage over physics-based protein design methods.


Subjects
Computer Neural Networks , Proteins , Amino Acid Sequence , Proteins/chemistry , Amino Acids/chemistry , Protein Folding
8.
RNA ; 29(6): 764-776, 2023 06.
Article in English | MEDLINE | ID: mdl-36868786

ABSTRACT

The design of new RNA sequences that retain the function of a model RNA structure is a challenge in bioinformatics because of the structural complexity of these molecules. RNA can fold into its secondary and tertiary structures by forming stem-loops and pseudoknots. A pseudoknot is a set of base pairs between a region within a stem-loop and nucleotides outside of this stem-loop; this motif is very important for numerous functional structures. It is important for any computational design algorithm to take into account these interactions to give a reliable result for any structures that include pseudoknots. In our study, we experimentally validated synthetic ribozymes designed by Enzymer, which implements algorithms allowing for the design of pseudoknots. Enzymer is a program that uses an inverse folding approach to design pseudoknotted RNAs; we used it in this study to design two types of ribozymes. The ribozymes tested were the hammerhead and the glmS, which have a self-cleaving activity that allows them to liberate the new RNA genome copy during rolling-circle replication or to control the expression of the downstream genes, respectively. We demonstrated the efficiency of Enzymer by showing that the pseudoknotted hammerhead and glmS ribozyme sequences it designed were extensively modified compared to wild-type sequences and were still active.


Subjects
Catalytic RNA , Catalytic RNA/chemistry , RNA/genetics , RNA/chemistry , Base Pairing , Algorithms , Nucleotides , Nucleic Acid Conformation
9.
BMC Bioinformatics ; 23(1): 335, 2022 Aug 13.
Article in English | MEDLINE | ID: mdl-35964008

ABSTRACT

BACKGROUND: We study in this work the inverse folding problem for RNA, which is the discovery of sequences that fold into given target secondary structures. RESULTS: We implement a Lévy mutation scheme in an updated version of aRNAque, an evolutionary inverse folding algorithm, and apply it to the design of RNAs with and without pseudoknots. We find that the Lévy mutation scheme increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. Compared to antaRNA, aRNAque requires more CPU time but is more successful in finding designed sequences that fold correctly into the target structures. CONCLUSION: We propose that a Lévy flight offers a better standard mutation scheme for optimizing RNA design. Our new version of aRNAque is available on GitHub as a Python script, and the benchmark results show improved performance on both the Pseudobase++ and Eterna100 datasets compared to existing inverse folding tools.


Subjects
Algorithms , RNA Folding , Nucleic Acid Conformation , RNA/chemistry , RNA Sequence Analysis/methods
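
The idea of a Lévy mutation scheme in the abstract above can be illustrated with a short sketch: instead of always changing one nucleotide, the number of point mutations per step is drawn from a heavy-tailed distribution, so most steps are small and a few are large jumps. Everything specific here is an assumption made for illustration (a truncated power law with exponent 1.5, uniformly chosen positions, no base-pair awareness); it is not aRNAque's actual code.

```python
import random


def sample_num_mutations(length, exponent, rng):
    """Draw k in 1..length with probability proportional to k ** -exponent."""
    ks = list(range(1, length + 1))
    weights = [k ** -exponent for k in ks]
    return rng.choices(ks, weights=weights, k=1)[0]


def levy_mutate(sequence, exponent=1.5, rng=None, alphabet="ACGU"):
    """Return a copy of the sequence with a heavy-tailed number of point mutations."""
    rng = rng or random.Random()
    k = sample_num_mutations(len(sequence), exponent, rng)
    positions = rng.sample(range(len(sequence)), k)  # k distinct positions
    bases = list(sequence)
    for pos in positions:
        # Always switch to a different base so each chosen position really changes.
        bases[pos] = rng.choice([b for b in alphabet if b != bases[pos]])
    return "".join(bases)
```

With a large exponent the scheme approaches ordinary one-point mutation; smaller exponents make multi-point jumps more frequent, which is the source of the increased sequence diversity the abstract reports.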
10.
Int J Mol Sci ; 22(21), 2021 Oct 29.
Article in English | MEDLINE | ID: mdl-34769173

ABSTRACT

Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.


Subjects
Computational Biology , Deep Learning , Protein Engineering , Proteins , Protein Domains , Proteins/chemistry , Proteins/genetics
11.
Methods Mol Biol ; 2167: 91-111, 2021.
Article in English | MEDLINE | ID: mdl-32712917

ABSTRACT

Pseudoknots are important motifs for stabilizing the structure of functional RNAs. As an example, pseudoknotted hammerhead ribozymes are highly active compared to minimal ribozymes. The design of new RNA sequences that retain the function of a model RNA structure requires taking into account the presence of pseudoknots in the structure, which is usually a challenge for bioinformatics tools. Our method uses "Enzymer," a software tool for designing RNA sequences with desired secondary structures that may include pseudoknots. Enzymer implements an efficient stochastic search and optimization algorithm that samples RNA sequences from the low-ensemble-defect mutational landscape of an initial design template to generate an RNA sequence that is predicted to fold into the desired target structure.


Subjects
Computational Biology/methods , Computer-Aided Design , Nucleic Acid Conformation , Catalytic RNA/chemistry , Catalytic RNA/genetics , Synthetic Biology/methods , Algorithms , Base Sequence , Agar Gel Electrophoresis , Polyacrylamide Gel Electrophoresis , In Vitro Techniques , Kinetics , Nucleotide Motifs/genetics , Polymerase Chain Reaction/methods , RNA/genetics , RNA Folding/genetics , Catalytic RNA/metabolism , Software , Genetic Transcription
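
The ensemble defect mentioned in the abstract above is commonly defined as the expected number of nucleotides whose pairing state differs from the target structure, averaged over the folding ensemble. The sketch below computes it from base-pair probabilities that are assumed to have been obtained elsewhere (for example from a partition function calculation); the input format and function name are hypothetical, and whether Enzymer evaluates the quantity in exactly this way is not stated in the abstract.

```python
def ensemble_defect(pair_probs, target_partner, length):
    """Expected number of incorrectly paired nucleotides.

    pair_probs:     dict {(i, j): probability} with i < j (0-based positions).
    target_partner: dict mapping each paired position in the target to its partner;
                    positions absent from the dict are unpaired in the target.
    """
    # Total probability that each position is paired with anything.
    paired_prob = [0.0] * length
    for (i, j), p in pair_probs.items():
        paired_prob[i] += p
        paired_prob[j] += p

    defect = 0.0
    for i in range(length):
        if i in target_partner:
            j = target_partner[i]
            key = (min(i, j), max(i, j))
            # Wrong whenever i is not paired with its intended partner.
            defect += 1.0 - pair_probs.get(key, 0.0)
        else:
            # Wrong whenever i is paired at all.
            defect += paired_prob[i]
    return defect
```

A value of 0 means every nucleotide adopts its target state in the whole ensemble; dividing by the sequence length gives the normalized ensemble defect.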
12.
Methods Mol Biol ; 2167: 113-143, 2021.
Article in English | MEDLINE | ID: mdl-32712918

ABSTRACT

Ribozymes are RNAs that catalyze reactions. They occur in nature, and can also be evolved in vitro to catalyze novel reactions. This chapter provides detailed protocols for using inverse folding software to design a ribozyme sequence that will fold to a known ribozyme secondary structure and for testing the catalytic activity of the sequence experimentally. This protocol is able to design sequences that include pseudoknots, which is important as all naturally occurring full-length ribozymes have pseudoknots. The starting point is the known pseudoknot-containing secondary structure of the ribozyme and knowledge of any nucleotides whose identity is required for function. The output of the protocol is a set of sequences that have been tested for function. Using this protocol, we were previously successful at designing highly active double-pseudoknotted HDV ribozymes.


Subjects
Computational Biology/methods , Hepatitis Delta Virus/genetics , Hepatitis Delta Virus/metabolism , Catalytic RNA/genetics , Catalytic RNA/metabolism , Base Sequence , G-Quadruplexes , In Vitro Techniques , Kinetics , Molecular Models , Nucleic Acid Conformation , Nucleotide Motifs/genetics , RNA Folding/genetics , Catalytic RNA/chemistry , Software , Genetic Transcription
13.
Proteins ; 88(7): 819-829, 2020 07.
Article in English | MEDLINE | ID: mdl-31867753

ABSTRACT

Designing protein sequences that fold to a given three-dimensional (3D) structure has long been a challenging problem in computational structural biology with significant theoretical and practical implications. In this study, we first formulated this problem as predicting the residue type given the 3D structural environment around the Cα atom of a residue, which is repeated for each residue of a protein. We designed a nine-layer 3D deep convolutional neural network (CNN) that takes as input a gridded box with the atomic coordinates and types around a residue. Several CNN layers were designed to capture structure information at different scales, such as bond lengths, bond angles, torsion angles, and secondary structures. Trained on a very large number of protein structures, the method, called ProDCoNN (protein design with CNN), achieved state-of-the-art performance when tested on large numbers of test proteins and benchmark datasets.


Subjects
Computer Neural Networks , Protein Engineering/statistics & numerical data , Proteins/chemistry , Software , Amino Acid Sequence , Benchmarking , Protein Databases , Datasets as Topic , Protein Engineering/methods , Secondary Protein Structure , Sequence Alignment
14.
J Comput Biol ; 25(11): 1179-1192, 2018 11.
Article in English | MEDLINE | ID: mdl-30133328

ABSTRACT

Recently, a framework considering ribonucleic acid (RNA) sequences and their RNA secondary structures as pairs has led to new information theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. In this context, the pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence in the pairing produces the RNA energy landscape, whose partition function was discovered by McCaskill. Dually, fixing the structure induces the energy landscape of sequences. The latter has been considered for designing more efficient inverse folding algorithms. In this work, we present the dual partition function filtered by Hamming distance, together with a Boltzmann sampler using novel dynamic programming routines for the loop-based energy model. The time complexity of the algorithm is [Formula: see text], where [Formula: see text] are Hamming distance and sequence length, respectively, reducing the time complexity of samplers, reported in the literature by [Formula: see text]. We then present two applications, the first in the context of the evolution of natural sequence-structure pairs of microRNAs and the second in constructing neutral paths. The former studies the inverse folding rate (IFR) of sequence-structure pairs, filtered by Hamming distance, observing that such pairs evolve toward higher levels of robustness, that is, increasing IFR. The latter is an algorithm that constructs neutral paths: given two sequences in a neutral network, we employ the sampler to construct short paths connecting them, consisting of sequences all contained in the neutral network.


Subjects
Algorithms , Computational Biology/methods , RNA/chemistry , Base Sequence , Humans , Molecular Models , Nucleic Acid Conformation
15.
Methods ; 143: 90-101, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29660485

ABSTRACT

This contribution sketches a workflow to design an RNA switch that is able to adopt two structural conformations in a ligand-dependent way. A well-characterized RNA aptamer, i.e., one with known Kd and adaptive structural features, is an essential ingredient of the described design process. We exemplify the principles using the well-known theophylline aptamer throughout this work. The aptamer in its ligand-binding competent structure represents one structural conformation of the switch while an alternative fold that disrupts the binding-competent structure forms the other conformation. To keep it simple we do not incorporate any regulatory mechanism to control transcription or translation. We elucidate a commonly used design process by explicitly dissecting and explaining the necessary steps in detail. We developed a novel objective function which specifies the mechanism of this simple, ligand-triggered riboswitch and describe an extensive in silico analysis pipeline to evaluate important kinetic properties of the designed sequences. This protocol and the developed software can be easily extended or adapted to fit novel design scenarios and thus can serve as a template for future needs.


Subjects
Nucleotide Aptamers/chemical synthesis , Computational Biology/methods , Nucleic Acid Conformation , Riboswitch/genetics , Nucleotide Aptamers/genetics , Computational Biology/instrumentation , Kinetics , Ligands , RNA Folding , Software
16.
Genes Genet Syst ; 92(6): 257-265, 2018 May 03.
Article in English | MEDLINE | ID: mdl-28757510

ABSTRACT

It has long been established that in addition to being involved in protein translation, RNA plays essential roles in numerous other cellular processes, including gene regulation and DNA replication. Such roles are known to be dictated by higher-order structures of RNA molecules. It is therefore of prime importance to find an RNA sequence that can fold to acquire a particular function that is desirable for use in pharmaceuticals and basic research. The challenge of finding an RNA sequence for a given structure is known as the RNA design problem. Although there are several algorithms to solve this problem, they mainly consider hard constraints, such as minimum free energy, to evaluate the predicted sequences. Recently, SHAPE data has emerged as a new soft constraint for RNA secondary structure prediction. To take advantage of this new experimental constraint, we report here a new method for accurate design of RNA sequences based on their secondary structures using SHAPE data as pseudo-free energy. We then compare our algorithm with four others: INFO-RNA, ERD, MODENA and RNAifold 2.0. Our algorithm precisely predicts 26 out of 29 new sequences for the structures extracted from the Rfam dataset, while the other four algorithms predict no more than 22 out of 29. The proposed algorithm is comparable to the above algorithms on RNA-SSD datasets, where they can predict up to 33 appropriate sequences for RNA secondary structures out of 34.


Subjects
RNA Folding/physiology , RNA/metabolism , RNA/physiology , Algorithms , Base Sequence , Computer Simulation , Computer-Aided Design , Nucleic Acid Conformation , RNA Folding/genetics , Software
17.
BMC Bioinformatics ; 18(1): 468, 2017 Nov 06.
Article in English | MEDLINE | ID: mdl-29110632

ABSTRACT

BACKGROUND: Artificially synthesized RNA molecules provide important ways for creating a variety of novel functional molecules. State-of-the-art RNA inverse folding algorithms can design simple and short RNA sequences of specific GC content that fold into the target RNA structure. However, their performance is not satisfactory in complicated cases. RESULT: We present a new inverse folding algorithm called MCTS-RNA, which uses Monte Carlo tree search (MCTS), a technique that has shown exceptional performance in Computer Go recently, to represent and discover the essential part of the sequence space. To obtain high accuracy, initial sequences generated by MCTS are further improved by a series of local updates. Our algorithm can control the GC content precisely and can deal with pseudoknot structures. Using common benchmark datasets for evaluation, MCTS-RNA showed a lot of promise as a standard method of RNA inverse folding. CONCLUSION: MCTS-RNA is available at https://github.com/tsudalab/MCTS-RNA.


Subjects
Algorithms , RNA/chemistry , Internet , Monte Carlo Method , Nucleic Acid Conformation , RNA Folding , RNA Sequence Analysis , User-Computer Interface
18.
J Comput Biol ; 24(9): 851-862, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28632429

ABSTRACT

Theoretical models of protein folding often make simplifying assumptions that allow analysis, yielding interesting theoretical results. In this article, we study models where folding dynamics is primarily driven by local topological features in an iterative manner. We illustrate the merit of the proposed approach through its ability to simulate realistic protein folding processes even when the sequence content information is reduced to just hydrophobic and polar. We then analyze our models and show that under our simple assumptions, certain structures are inherently unstable, and that determining whether structures can be stable is an [Formula: see text]-hard problem. Interestingly, we find that when our model has only two amino acids, the problem becomes solvable in polynomial time.


Subjects
Theoretical Models , Protein Folding , Protein Stability
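
The reduction of sequence content to "just hydrophobic and polar" described in the abstract above can be sketched as a two-letter recoding of an amino acid sequence. The hydrophobic set used here is one of several conventions and is an assumption of this sketch, as is the function name; the paper's own classification is not given in the abstract.

```python
# Assumed hydrophobic residues; other conventions classify some of these differently.
HYDROPHOBIC = set("AVLIMFWYC")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def to_hp(sequence):
    """Recode a protein sequence as a string of H (hydrophobic) and P (polar)."""
    out = []
    for aa in sequence.upper():
        if aa not in AMINO_ACIDS:
            raise ValueError(f"unknown amino acid {aa!r}")
        out.append("H" if aa in HYDROPHOBIC else "P")
    return "".join(out)
```

Under this convention, `to_hp("MKVLA")` yields the string `HPHHH`.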
19.
Methods Mol Biol ; 1529: 21-94, 2017.
Article in English | MEDLINE | ID: mdl-27914045

ABSTRACT

Computational protein design (CPD), a yet evolving field, includes computer-aided engineering for partial or full de novo designs of proteins of interest. Designs are defined by a requested structure, function, or working environment. This chapter describes the birth and maturation of the field by presenting 101 CPD examples in a chronological order emphasizing achievements and pending challenges. Integrating these aspects presents the plethora of CPD approaches with the hope of providing a "CPD 101". These reflect on the broader structural bioinformatics and computational biophysics field and include: (1) integration of knowledge-based and energy-based methods, (2) hierarchical designated approach towards local, regional, and global motifs and the integration of high- and low-resolution design schemes that fit each such region, (3) systematic differential approaches towards different protein regions, (4) identification of key hot-spot residues and the relative effect of remote regions, (5) assessment of shape-complementarity, electrostatics and solvation effects, (6) integration of thermal plasticity and functional dynamics, (7) negative design, (8) systematic integration of experimental approaches, (9) objective cross-assessment of methods, and (10) successful ranking of potential designs. Future challenges also include dissemination of CPD software to the general use of life-sciences researchers and the emphasis of success within an in vivo milieu. CPD increases our understanding of protein structure and function and the relationships between the two along with the application of such know-how for the benefit of mankind. Applied aspects range from biological drugs, via healthier and tastier food products to nanotechnology and environmentally friendly enzymes replacing toxic chemicals utilized in the industry.


Subjects
Computational Biology , Protein Engineering , Proteins , Computational Biology/history , Computational Biology/methods , Computer Simulation , Enzymes/chemistry , Enzymes/genetics , Enzymes/metabolism , 20th Century History , 21st Century History , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Protein Engineering/history , Protein Engineering/methods , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Software
20.
J Mol Biol ; 428(5 Pt A): 748-757, 2016 Feb 27.
Article in English | MEDLINE | ID: mdl-26902426

ABSTRACT

Designing RNAs that form specific secondary structures is enabling better understanding and control of living systems through RNA-guided silencing, genome editing and protein organization. Little is known, however, about which RNA secondary structures might be tractable for downstream sequence design, increasing the time and expense of design efforts due to inefficient secondary structure choices. Here, we present insights into specific structural features that increase the difficulty of finding sequences that fold into a target RNA secondary structure, summarizing the design efforts of tens of thousands of human participants and three automated algorithms (RNAInverse, INFO-RNA and RNA-SSD) in the Eterna massive open laboratory. Subsequent tests through three independent RNA design algorithms (NUPACK, DSS-Opt and MODENA) confirmed the hypothesized importance of several features in determining design difficulty, including sequence length, mean stem length, symmetry and specific difficult-to-design motifs such as zigzags. Based on these results, we have compiled an Eterna100 benchmark of 100 secondary structure design challenges that span a large range in design difficulty to help test future efforts. Our in silico results suggest new routes for improving computational RNA design methods and for extending these insights to assess "designability" of single RNA structures, as well as of switches for in vitro and in vivo applications.


Subjects
Nucleic Acid Conformation , RNA/chemistry , RNA Sequence Analysis/methods , Algorithms , Computational Biology , Humans , Molecular Models , Software
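
Two of the structural features named in the abstract above, sequence length and mean stem length, can be computed directly from a target structure. The sketch below assumes balanced, pseudoknot-free dot-bracket notation and defines a stem as a maximal run of directly stacked base pairs, so a bulge or interior loop starts a new stem. That definition and the function names are assumptions; the abstract does not spell out the one used for Eterna100.

```python
def base_pairs(structure):
    """Return the set of (i, j) pairs in a balanced dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def stem_lengths(structure):
    """Lengths of all stems, ordered by the position of their outermost pair."""
    pairs = base_pairs(structure)
    lengths = []
    for i, j in sorted(pairs):
        if (i - 1, j + 1) in pairs:
            continue  # this pair is stacked inside a stem already counted
        n = 0
        while (i + n, j - n) in pairs:
            n += 1
        lengths.append(n)
    return lengths


def mean_stem_length(structure):
    """Average stem length, or 0.0 for a structure without base pairs."""
    lengths = stem_lengths(structure)
    return sum(lengths) / len(lengths) if lengths else 0.0
```

Short mean stem lengths correspond to the harder design targets described in the abstract, so a feature like this could serve as a rough pre-screen before running a design algorithm.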