Results 1 - 20 of 54
1.
Methods Mol Biol ; 2726: 143-168, 2024.
Article in English | MEDLINE | ID: mdl-38780731

ABSTRACT

The 3D structures of many ribonucleic acid (RNA) loops are characterized by highly organized networks of non-canonical interactions. Multiple computational methods have been developed to annotate structures with those interactions or automatically identify recurrent interaction networks. By contrast, the reverse problem that aims to retrieve the geometry of a loop from its sequence or ensemble of interactions remains much less explored. In this chapter, we will describe how to retrieve and build families of conserved structural motifs using their underlying network of non-canonical interactions. Then, we will show how to assign sequence alignments to those families and use the software BayesPairing to build statistical models of structural motifs with their associated sequence alignments. From this model, we will apply BayesPairing to identify in new sequences regions where those loop geometries can occur.


Subject(s)
Base Pairing , Computational Biology , RNA , Software , Computational Biology/methods , RNA/chemistry , RNA/genetics , Nucleic Acid Conformation , Sequence Alignment/methods , Algorithms , Nucleotide Motifs , Bayes Theorem , Models, Molecular
2.
Nat Biotechnol ; 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622344

ABSTRACT

Citizen science video games are designed primarily for users already inclined to contribute to science, which severely limits their accessibility for an estimated community of 3 billion gamers worldwide. We created Borderlands Science (BLS), a citizen science activity that is seamlessly integrated within a popular commercial video game played by tens of millions of gamers. This integration is facilitated by a novel game-first design of citizen science games, in which the game design aspect has the highest priority, and a suitable task is then mapped to the game design. BLS crowdsources a multiple alignment task of 1 million 16S ribosomal RNA sequences obtained from human microbiome studies. Since its initial release on 7 April 2020, over 4 million players have solved more than 135 million science puzzles, a task unsolvable by a single individual. Leveraging these results, we show that our multiple sequence alignment simultaneously improves microbial phylogeny estimations and UniFrac effect sizes compared to state-of-the-art computational methods. This achievement demonstrates that hyper-gamified scientific tasks attract massive crowds of contributors and offers invaluable resources to the scientific community.

3.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38291894

ABSTRACT

MOTIVATION: Up to 75% of the human genome encodes RNAs. The function of many non-coding RNAs relies on their ability to fold into 3D structures. Specifically, nucleotides inside secondary structure loops form non-canonical base pairs that help stabilize complex local 3D structures. These RNA 3D motifs can promote specific interactions with other molecules or serve as catalytic sites. RESULTS: We introduce PERFUMES, a computational pipeline to identify 3D motifs that can be associated with observable features. Given a set of RNA sequences with associated binary experimental measurements, PERFUMES searches for RNA 3D motifs using BayesPairing2 and extracts those that are over-represented in the set of positive sequences. It also conducts a thermodynamics analysis of the structural context that can support the interpretation of the predictions. We illustrate PERFUMES' usage on the SNRPA protein binding site, for which the tool retrieved both previously known binder motifs and new ones. AVAILABILITY AND IMPLEMENTATION: PERFUMES is an open-source Python package (https://jwgitlab.cs.mcgill.ca/arnaud_chol/perfumes).
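
The over-representation step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which statistic PERFUMES uses, so the one-sided hypergeometric test below is an assumption, and the function name and arguments are hypothetical. It asks how surprising it is to see a motif in so many positive sequences if motif occurrences were spread at random over positives and negatives.

```python
from math import comb


def hypergeom_enrichment_pvalue(pos_with_motif, n_pos, neg_with_motif, n_neg):
    """One-sided p-value P(X >= pos_with_motif) under a hypergeometric null.

    Assumption (not taken from the abstract): motif occurrences are
    distributed at random among n_pos positive and n_neg negative sequences.
    """
    total = n_pos + n_neg
    with_motif = pos_with_motif + neg_with_motif
    upper = min(with_motif, n_pos)
    # Sum the probability mass for every count at least as extreme as observed.
    tail = sum(
        comb(with_motif, k) * comb(total - with_motif, n_pos - k)
        for k in range(pos_with_motif, upper + 1)
    )
    return tail / comb(total, n_pos)


# Hypothetical counts: the motif is found in all 3 positives and no negatives.
p_value = hypergeom_enrichment_pvalue(3, 3, 0, 3)
print(p_value)
```

A small p-value would flag the motif as a candidate for follow-up; in practice a multiple-testing correction would also be needed when many motifs are screened.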


Subject(s)
Perfume , Humans , Nucleic Acid Conformation , Nucleotide Motifs , Base Pairing , RNA/chemistry
4.
Biochem Mol Biol Educ ; 52(2): 145-155, 2024.
Article in English | MEDLINE | ID: mdl-37929794

ABSTRACT

In the last decade, video games became a common vehicle for citizen science initiatives in life science, allowing participants to contribute to real scientific data analysis while learning about it. Since 2010, our scientific discovery game (SDG) Phylo enlists participants in comparative genomic data analysis. It is frequently used as a learning tool, but the activities were difficult to aggregate to build a coherent teaching activity. Here, we describe a strategy and series of recipes to facilitate the integration of SDGs in courses and implement this approach in Phylo. We developed new roles and functionalities enabling instructors to create assignments and monitor the progress of students. A story mode progressively introduces comparative genomics concepts, allowing users to learn and contribute to the analysis of real genomic sequences. Preliminary results from a user study suggest this framework may help to boost user motivation and clarify pedagogical objectives.


Subject(s)
Citizen Science , Humans , Learning , Genomics/methods , Students , Motivation
5.
R Soc Open Sci ; 9(5): 211189, 2022 May.
Article in English | MEDLINE | ID: mdl-35620007

ABSTRACT

Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets where classical distances become uninformative and the ability of humans to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par with or better than the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture the collective wisdom for solving abstract computational problems.

6.
Bioinformatics ; 38(4): 970-976, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34791045

ABSTRACT

MOTIVATION: RNA 3D motifs are recurrent substructures, modeled as networks of base pair interactions, which are crucial for understanding structure-function relationships. The task of automatically identifying such motifs is computationally hard, and remains a key challenge in the field of RNA structural biology and network analysis. State-of-the-art methods solve special cases of the motif problem by constraining the structural variability in occurrences of a motif, and narrowing the substructure search space. RESULTS: Here, we relax these constraints by posing the motif finding problem as a graph representation learning and clustering task. This framing takes advantage of the continuous nature of graph representations to model the flexibility and variability of RNA motifs in an efficient manner. We propose a set of node similarity functions, clustering methods and motif construction algorithms to recover flexible RNA motifs. Our tool, Vernal, can be easily customized by users to desired levels of motif flexibility, abundance and size. We show that Vernal is able to retrieve and expand known classes of motifs, as well as to propose novel motifs. AVAILABILITY AND IMPLEMENTATION: The source code, data and a webserver are available at vernal.cs.mcgill.ca. We also provide a flexible interface and a user-friendly webserver to browse and download our results. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , RNA , RNA/chemistry , Nucleotide Motifs , Software , Base Pairing , Computational Biology
7.
Bioinformatics ; 38(5): 1458-1459, 2022 02 07.
Article in English | MEDLINE | ID: mdl-34908108

ABSTRACT

SUMMARY: RNA 3D architectures are stabilized by sophisticated networks of (non-canonical) base pair interactions, which can be conveniently encoded as multi-relational graphs and efficiently exploited by graph theoretical approaches and recent progress in machine learning techniques. RNAglib is a library that eases the use of this representation, by providing clean data, methods to load it in machine learning pipelines and graph-based deep learning models suited for this representation. RNAglib also offers other utilities to model RNA with 2.5D graphs, such as drawing tools, comparison functions or baseline performances on RNA applications. AVAILABILITY AND IMPLEMENTATION: The method is distributed as a pip package, RNAglib. Data are available in a repository and can be accessed on rnaglib's web page. The source code, data and documentation are available at https://rnaglib.cs.mcgill.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
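
To make the idea of a multi-relational graph concrete, here is a minimal sketch in plain Python. It does not use or reproduce the RNAglib API. The toy hairpin, the edge labels (backbone "B53", Leontis-Westhof style "cWW" and "tHS"), and both helper functions are illustrative assumptions, and edges are treated as undirected for simplicity.

```python
from collections import defaultdict

# Hypothetical 6-nucleotide hairpin: (nucleotide_i, nucleotide_j, relation).
edges = [
    (1, 2, "B53"), (2, 3, "B53"), (3, 4, "B53"), (4, 5, "B53"), (5, 6, "B53"),
    (1, 6, "cWW"),  # canonical closing pair
    (2, 5, "tHS"),  # non-canonical pair inside the loop
]


def build_multirelational_graph(edge_list):
    """Store one adjacency map per relation type: graph[label][node] -> set."""
    graph = defaultdict(lambda: defaultdict(set))
    for i, j, label in edge_list:
        graph[label][i].add(j)
        graph[label][j].add(i)
    return graph


def neighbors(graph, node, label):
    """Sorted neighbors of `node` reachable through edges of type `label`."""
    if label in graph and node in graph[label]:
        return sorted(graph[label][node])
    return []


graph = build_multirelational_graph(edges)
print(neighbors(graph, 2, "B53"))
print(neighbors(graph, 2, "tHS"))
```

Keeping a separate adjacency map per relation is one simple way to let downstream code treat backbone, canonical, and non-canonical edges differently.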


Subject(s)
Libraries , Software , Machine Learning , Documentation , Gene Library
8.
PLoS Comput Biol ; 17(5): e1008990, 2021 05.
Article in English | MEDLINE | ID: mdl-34048427

ABSTRACT

RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure composed of stems, stacked canonical base pairs, enclosing loops. While stems are precisely captured by free-energy models, loops composed of non-canonical base pairs are not. Nor are distant interactions linking together those secondary structure elements (SSEs). Databases of conserved 3D geometries (a.k.a. modules) not captured by energetic models are leveraged for structure prediction and design, but the computational complexity has limited their study to local elements, loops. Representing the RNA structure as a graph has recently made it possible to extend this work to pairs of SSEs, uncovering a hierarchical organization of these 3D modules, at great computational cost. Systematically capturing recurrent patterns on a large scale is a main challenge in the study of RNA structures. In this paper, we present an efficient algorithm to compute maximal isomorphisms in edge-colored graphs. We extend this algorithm to a framework well suited to identify RNA modules, and fast enough to considerably generalize previous approaches. To exhibit the versatility of our framework, we first reproduce results identifying all common modules spanning more than 2 SSEs, in a few hours instead of weeks. The efficiency of our new algorithm is demonstrated by computing the maximal modules between any pair of entire RNAs in the non-redundant corpus of known RNA 3D structures. We observe that the biggest modules our method uncovers compose large shared substructures spanning hundreds of nucleotides and base pairs between the ribosomes of Thermus thermophilus, Escherichia coli, and Pseudomonas aeruginosa.


Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Algorithms , Base Pairing , Computational Biology/methods
9.
Methods Mol Biol ; 2284: 17-42, 2021.
Article in English | MEDLINE | ID: mdl-33835435

ABSTRACT

Modeling the three-dimensional structure of RNAs is a milestone toward better understanding and prediction of nucleic acid molecular functions. Physics-based approaches and molecular dynamics simulations are not tractable on large molecules with all-atom models. To address this issue, coarse-grained models of RNA three-dimensional structures have been developed. In this chapter, we describe a graphical model based on the Leontis-Westhof extended base pair classification. This representation of RNA structures enables us to identify highly conserved structural motifs with complex nucleotide interactions in structure databases. We show how to take advantage of this knowledge to quickly predict three-dimensional structures of large RNA molecules and present the RNA-MoIP web server (http://rnamoip.cs.mcgill.ca) that streamlines the computational and visualization processes. Finally, we show recent advances in the prediction of local 3D motifs from sequence data with the BayesPairing software and discuss its impact toward complete 3D structure prediction.


Subject(s)
RNA/chemistry , Synthetic Biology/methods , Animals , Base Pairing , Computer-Aided Design , Humans , Models, Molecular , Molecular Conformation , Molecular Dynamics Simulation , Nucleic Acid Conformation , Software
11.
J Chem Inf Model ; 60(12): 5658-5666, 2020 12 28.
Article in English | MEDLINE | ID: mdl-32986426

ABSTRACT

Ligand-based drug design has recently benefited from the development of deep generative models. These models enable extensive explorations of the chemical space and provide a platform for molecular optimization. However, the vast majority of current methods do not leverage the structure of the binding target, which potentiates the binding of small molecules and plays a key role in the interaction. We propose an optimization pipeline that leverages complementary structure-based and ligand-based methods. Instead of performing docking on a fixed chemical library, we iteratively select promising compounds in the full chemical space using a ligand-centered generative model. Molecular docking is then used as an oracle to guide compound optimization. This allows for iterative generation of compounds that fit the target structure better and better, without prior knowledge about bioactives. For this purpose, we introduce a new graph-to-Selfies Variational Autoencoder (VAE) which benefits from an 18-fold faster decoding than the graph-to-graph state of the art, while achieving a similar performance. We then successfully optimize the generation of molecules toward high docking scores, enabling a 10-fold enrichment of high-scoring compounds found with a fixed computational cost.


Subject(s)
Drug Discovery , Timolol , Drug Design , Ligands , Molecular Docking Simulation
12.
Nucleic Acids Res ; 48(14): 7690-7699, 2020 08 20.
Article in English | MEDLINE | ID: mdl-32652015

ABSTRACT

RNA-small molecule binding is a key regulatory mechanism which can stabilize 3D structures and activate molecular functions. The discovery of RNA-targeting compounds is thus a current topic of interest for novel therapies. Our work is a first attempt at bringing the scalability and generalization abilities of machine learning methods to the problem of RNA drug discovery, as well as a step towards understanding the interactions which drive binding specificity. Our tool, RNAmigos, builds and encodes a network representation of RNA structures to predict likely ligands for novel binding sites. We subject ligand predictions to virtual screening and show that we are able to place the true ligand in the 71st-73rd percentile in two decoy libraries, showing a significant improvement over several baselines and a state-of-the-art method. Furthermore, we observe that augmenting structural networks with non-canonical base pairing data is the only representation able to uncover a significant signal, suggesting that such interactions are a necessary source of binding specificity. We also find that pre-training with an auxiliary graph representation learning task significantly boosts performance of ligand prediction. This finding can serve as a general principle for RNA structure-function prediction when data is scarce. RNAmigos shows that RNA binding data contains structural patterns with potential for drug discovery, and provides methodological insights for possible applications to other structure-function learning tasks. The source code, data and a Web server are freely available at http://rnamigos.cs.mcgill.ca.
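
The percentile figure reported above can be read as the share of decoys that a scoring function ranks below the true ligand. The sketch below shows one simple way to compute such a rank. The abstract does not give the exact definition used by RNAmigos, so the convention here (higher score is better, ties count against the true ligand) and the function name are assumptions.

```python
def percentile_rank(true_score, decoy_scores):
    """Percentage of decoys that score strictly lower than the true ligand.

    Assumed convention (not from the abstract): a higher score means a
    better predicted match, and ties do not count in favor of the ligand.
    """
    if not decoy_scores:
        raise ValueError("decoy_scores must not be empty")
    worse = sum(1 for score in decoy_scores if score < true_score)
    return 100.0 * worse / len(decoy_scores)


# Hypothetical scores for one binding site and four decoys.
print(percentile_rank(0.8, [0.1, 0.5, 0.9, 0.7]))
```

Averaging this value over many binding sites gives a single summary number, and a random scorer would be expected to land near the 50th percentile.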


Subject(s)
RNA/chemistry , Software , Base Pairing , Binding Sites , Ligands , Nucleic Acid Conformation
13.
J Comput Biol ; 27(3): 390-402, 2020 03.
Article in English | MEDLINE | ID: mdl-32160035

ABSTRACT

The growing number of RNA-mediated regulation mechanisms identified in the past decades suggests a widespread impact of RNA-RNA interactions. The efficiency of the regulation relies on highly specific and coordinated interactions while simultaneously repressing the formation of opportunistic complexes. However, the analysis of RNA interactomes is highly challenging because of the large number of potential partners, the discrepancy in the sizes of RNA families, and the inherent noise in interaction predictions. We designed a recursive two-step cross-validation pipeline to capture the specificity of noncoding RNA (ncRNA)-messenger RNA (mRNA) interactomes. Our method has been designed to detect significant loss or gain of specificity between ncRNA-mRNA interaction profiles. Applied to small nucleolar RNA-mRNA in Saccharomyces cerevisiae, our results suggest the existence of a repression of ncRNA affinities with mRNAs and thus the existence of an evolutionary pressure leveling down such interactions.


Subject(s)
Computational Biology/methods , RNA, Messenger/metabolism , RNA, Untranslated/metabolism , Saccharomyces cerevisiae/genetics , Databases, Genetic , Gene Expression Regulation, Fungal , Gene Regulatory Networks , RNA, Fungal/metabolism
14.
Bioinformatics ; 36(9): 2920-2922, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31971575

ABSTRACT

SUMMARY: RNA design has conceptually evolved from the inverse RNA folding problem. In the classical inverse RNA problem, the user inputs an RNA secondary structure and receives an output RNA sequence that folds into it. Although modern RNA design methods are based on the same principle, a finer control over the resulting sequences is sought. As an important example, a substantial number of non-coding RNA families show high preservation in specific regions, while being more flexible in others, and this information should be utilized in the design. By using the additional information, RNA design tools can help solve problems of practical interest in the growing fields of synthetic biology and nanotechnology. incaRNAfbinv 2.0 utilizes a fragment-based approach, enabling a control of specific RNA secondary structure motifs. The new version allows significantly more control over the general RNA shape, and also allows users to express specific restrictions over each motif separately, in addition to other advanced features. AVAILABILITY AND IMPLEMENTATION: incaRNAfbinv 2.0 is available through a standalone package and a web-server at https://www.cs.bgu.ac.il/incaRNAfbinv. Source code, command-line and GUI wrappers can be found at https://github.com/matandro/RNAsfbinv. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA , Software , Nucleotide Motifs , RNA/genetics , RNA Folding , Sequence Analysis, RNA
15.
Bioinformatics ; 36(5): 1420-1428, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31584628

ABSTRACT

MOTIVATION: Protein folding is a dynamic process through which polypeptide chains reach their native 3D structures. Although the importance of this mechanism is widely acknowledged, very few high-throughput computational methods have been developed to study it. RESULTS: In this paper, we report a computational platform named P3Fold that combines statistical and evolutionary information for predicting and analyzing protein folding routes. P3Fold uses coarse-grained modeling and efficient combinatorial schemes to predict residue contacts and evaluate the folding routes of a protein sequence within minutes or hours. To facilitate access to this technology, we devise graphical representations and implement an interactive web interface that allows end-users to leverage P3Fold predictions. Finally, we use P3Fold to conduct large and short scale experiments on the human proteome that reveal the broad conservation and variations of structural intermediates within protein families. AVAILABILITY AND IMPLEMENTATION: A Web server of P3Fold is freely available at http://csb.cs.mcgill.ca/P3Fold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Folding , Software , Amino Acid Sequence , Computers , Humans , Proteome
16.
RNA ; 25(12): 1579-1591, 2019 12.
Article in English | MEDLINE | ID: mdl-31467146

ABSTRACT

The RNA world hypothesis relies on the ability of ribonucleic acids to spontaneously acquire complex structures capable of supporting essential biological functions. Multiple sophisticated evolutionary models have been proposed for their emergence, but they often assume specific conditions. In this work, we explore a simple and parsimonious scenario describing the emergence of complex molecular structures at the early stages of life. We show that at specific GC content regimes, an undirected replication model is sufficient to explain the appearance of multibranched RNA secondary structures, a structural signature of many essential ribozymes. We ran a large-scale computational study to map energetically stable structures on complete mutational networks of 50-nt-long RNA sequences. Our results reveal that the sequence landscape with stable structures is enriched with multibranched structures at a length scale coinciding with the appearance of complex structures in RNA databases. A random replication mechanism preserving a 50% GC content may suffice to explain a natural enrichment of stable complex structures in populations of functional RNAs. In contrast, an evolutionary mechanism eliciting the most stable folds at each generation appears to help reach multibranched structures at the highest GC content.
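
Since GC content is the control parameter in this study, a minimal sketch of how it can be measured and used to sample random sequences may help. This is not the authors' replication model: both function names and the sampling scheme (each position is independently G/C with probability gc_prob) are assumptions made for illustration.

```python
import random


def gc_content(seq):
    """Fraction of G and C nucleotides in an RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def random_rna(length, gc_prob, rng):
    """Sample a sequence where each position is G or C with probability gc_prob."""
    return "".join(
        rng.choice("GC") if rng.random() < gc_prob else rng.choice("AU")
        for _ in range(length)
    )


rng = random.Random(0)  # fixed seed so the run is reproducible
sequence = random_rna(50, 0.5, rng)
print(sequence, gc_content(sequence))
```

With independent sampling the realized GC content of a single 50-nt sequence fluctuates around the target, so a study that needs exactly 50% would have to fix the composition and shuffle positions instead.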


Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Base Composition , Base Sequence , Evolution, Molecular , Mutation , RNA/genetics , RNA Folding , RNA Stability , Structure-Activity Relationship , Transcription, Genetic
17.
J Chem Inf Model ; 59(6): 2941-2951, 2019 06 24.
Article in English | MEDLINE | ID: mdl-30998377

ABSTRACT

Over the past two decades, interest in DNA and RNA as drug targets has been growing rapidly. Following the trends observed with protein drug targets, computational approaches for drug design have been developed for this new class of molecules. Our efforts toward the development of a universal docking program, Fitted, led us to focus on nucleic acids. Throughout the development of this docking program, efforts were directed toward displaceable water molecules which must be accurately located for optimal docking-based drug discovery. However, although there is a plethora of methods to place water molecules in and around protein structures, there is, to the best of our knowledge, no such fully automated method for nucleic acids, which are significantly more polar and solvated than proteins. We report herein a new method, Splash'Em (Solvation Potential Laid around Statistical Hydration on Entire Macromolecules) developed to place water molecules within the binding cavity of nucleic acids. This fast method was shown to have high agreement with water positions in crystal structures and will therefore provide essential information to medicinal chemists.


Subject(s)
DNA/chemistry , DNA/metabolism , RNA/chemistry , RNA/metabolism , Water/chemistry , Hydrogen Bonding , Ligands , Models, Molecular , Nucleic Acid Conformation
18.
Nucleic Acids Res ; 47(7): 3321-3332, 2019 04 23.
Article in English | MEDLINE | ID: mdl-30828711

ABSTRACT

RNA structures possess multiple levels of structural organization. A secondary structure, made of Watson-Crick helices connected by loops, forms a scaffold for the tertiary structure. The 3D structures adopted by these loops are therefore critical determinants shaping the global 3D architecture. Earlier studies showed that these local 3D structures can be described as conserved sets of ordered non-Watson-Crick base pairs called RNA structural modules. Unfortunately, the computational efficiency and scope of the current 3D module identification methods are still too limited to benefit from all the knowledge accumulated in the module databases. We present BayesPairing, an automated, efficient and customizable tool for (i) building Bayesian networks representing RNA 3D modules and (ii) rapid identification of 3D modules in sequences. BayesPairing uses a flexible definition of RNA 3D modules that allows us to consider complex architectures such as multi-branched loops and features multiple algorithmic improvements. We benchmarked our methods using cross-validation techniques on 3409 RNA chains and show that BayesPairing achieves up to ∼70% identification accuracy on module positions and base pair interactions. BayesPairing can handle a broader range of motifs (versatility) and offers considerable running time improvements (efficiency), opening the door to a broad range of large-scale applications.


Subject(s)
Base Pairing , Bayes Theorem , RNA/chemistry , Automation , Databases, Genetic , Datasets as Topic , Reproducibility of Results , Time Factors
19.
Eur J Med Chem ; 168: 414-425, 2019 Apr 15.
Article in English | MEDLINE | ID: mdl-30831409

ABSTRACT

Since the development of the first docking program in 1982, the use of docking-based in silico screening for potentially bioactive molecule discovery has become a common strategy in academia and the pharmaceutical industry. Up until recently, application of docking programs has largely focused on drugs binding to proteins. However, with the discovery of promising drug targets in nucleic acids, including RNA riboswitches, DNA G-quadruplexes, and extended repeats in RNA, there has been greater interest in developing drugs for nucleic acids. Yet, due to major biochemical and physical differences in charges, binding pockets, and solvation, existing docking programs, developed for proteins, face difficulties when adopted directly for nucleic acids. In this review, we cover the current field of in silico docking to nucleic acids, available programs, as well as challenges faced in the field.


Subject(s)
DNA/chemistry , Molecular Docking Simulation , RNA/chemistry , Small Molecule Libraries/chemistry , Molecular Structure
20.
Methods ; 142: 74-80, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29792917

ABSTRACT

The field of 3D genomics has grown at an increasing rate in the last decade. The volume and complexity of the 2D and 3D data produced are progressively outpacing the capacities of the technology previously used for distributing genome sequences. The emergence of new technologies also provides novel opportunities for the development of innovative approaches. In this paper, we review the state-of-the-art computing technology, as well as the solutions adopted by the platforms currently available.


Subject(s)
Big Data , Chromosome Mapping , Data Analysis , Genome/genetics , Imaging, Three-Dimensional , Cloud Computing , DNA/chemistry , DNA/genetics , Databases, Genetic , Genomics/methods , Nucleic Acid Conformation