ABSTRACT
MOTIVATION: Large-scale clinical proteomics datasets of infectious pathogens, combined with antimicrobial resistance outcomes, have recently opened the door to machine learning models that aim to improve clinical treatment by predicting resistance early. However, existing prediction frameworks typically train a separate model for each antimicrobial and species to predict a pathogen's resistance outcome, missing opportunities for chemical knowledge transfer and generalizability. RESULTS: We demonstrate the effectiveness of multimodal learning over proteomic and chemical features by exploring two clinically relevant tasks for our proposed deep learning models: drug recommendation and generalized resistance prediction. By adopting this multi-view representation of the pathogenic samples and leveraging the scale of the available datasets, our models outperformed the previous single-drug and single-species predictive models by statistically significant margins. We extensively validated the multi-drug setting, highlighting the challenges in generalizing beyond the training data distribution, and quantitatively demonstrated how suitable representations of antimicrobial drugs constitute a crucial tool in the development of clinically relevant predictive models. AVAILABILITY AND IMPLEMENTATION: The code used to produce the results presented in this article is available at https://github.com/BorgwardtLab/MultimodalAMR.
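The central idea of the multimodal setting is that one shared model scores (sample, drug) pairs, instead of one model being trained per drug and species. The sketch below illustrates this with a simple logistic model over concatenated features (early fusion). It is not the MultimodalAMR implementation. The feature vectors, the toy weights and the names `resistance_probability` and `recommend_drugs` are hypothetical, and the published models use learned deep encoders rather than a single linear layer.

```python
import math
from typing import Dict, List, Sequence, Tuple

def resistance_probability(
    sample_features: Sequence[float],
    drug_features: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Score one (sample, drug) pair with a shared logistic model.

    The proteomic profile and the drug representation are concatenated
    (early fusion), so the same weights are reused for every drug.
    """
    joint = list(sample_features) + list(drug_features)
    if len(joint) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    logit = bias + sum(w * x for w, x in zip(weights, joint))
    return 1.0 / (1.0 + math.exp(-logit))

def recommend_drugs(
    sample_features: Sequence[float],
    drugs: Dict[str, Sequence[float]],
    weights: Sequence[float],
    bias: float = 0.0,
) -> List[Tuple[str, float]]:
    """Rank candidate drugs by increasing predicted resistance probability."""
    scored = [
        (name, resistance_probability(sample_features, feats, weights, bias))
        for name, feats in drugs.items()
    ]
    return sorted(scored, key=lambda item: item[1])

# Toy usage with hypothetical two-dimensional sample and drug features.
ranking = recommend_drugs(
    [1.0, 0.0],
    {"drug_A": [1.0, 0.0], "drug_B": [0.0, 1.0]},
    weights=[0.5, 0.5, 2.0, -2.0],
)
```

The drug enters the model as an input vector rather than as a model index. Under these assumptions, a drug absent from training could still be scored if a representation for it can be computed.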
Subjects
Anti-Bacterial Agents, Proteomics, Bacterial Drug Resistance, Machine Learning
ABSTRACT
SUMMARY: RNA 3D architectures are stabilized by sophisticated networks of (non-canonical) base pair interactions. These networks can be conveniently encoded as multi-relational graphs and efficiently exploited by graph-theoretical approaches and recent progress in machine learning techniques. RNAglib is a library that eases the use of this representation by providing clean data, methods to load it into machine learning pipelines, and graph-based deep learning models suited to this representation. RNAglib also offers other utilities to model RNA with 2.5D graphs, such as drawing tools, comparison functions and baseline performances on RNA applications. AVAILABILITY AND IMPLEMENTATION: The method is distributed as a pip package, RNAglib. Data are available in a repository and can be accessed on rnaglib's web page. The source code, data and documentation are available at https://rnaglib.cs.mcgill.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
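A 2.5D RNA graph of the kind described above can be illustrated in plain Python. Nucleotides are nodes, and each edge carries a relation label: either a backbone link or a base-pair geometry. The sketch does not use the RNAglib API. The residue identifiers and the label strings are illustrative assumptions: `B53` stands for a backbone link, and Leontis-Westhof-style codes such as `CWW` denote a canonical cis Watson-Crick pair.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source residue, target residue, relation label)

def build_graph(edges: List[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list that keeps each edge's relation label.

    Simplification: edges are stored in both directions, although real
    backbone links are directional (5' -> 3').
    """
    adjacency: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, target, relation in edges:
        adjacency[source].append((target, relation))
        adjacency[target].append((source, relation))
    return dict(adjacency)

def relation_counts(edges: List[Edge]) -> Counter:
    """Number of edges per relation type."""
    return Counter(relation for _, _, relation in edges)

# Toy five-nucleotide fragment: a backbone chain plus two base pairs.
edges = [
    ("A.1", "A.2", "B53"),
    ("A.2", "A.3", "B53"),
    ("A.3", "A.4", "B53"),
    ("A.4", "A.5", "B53"),
    ("A.1", "A.5", "CWW"),  # canonical cis Watson-Crick pair
    ("A.2", "A.4", "TSH"),  # non-canonical trans Sugar-Hoogsteen pair
]
graph = build_graph(edges)
```

Keeping the relation label on every edge is what makes the graph multi-relational. Downstream models can then treat backbone links, canonical pairs and non-canonical pairs as distinct edge types.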
Subjects
Libraries, Software, Machine Learning, Documentation, Gene Library
ABSTRACT
MOTIVATION: RNA 3D motifs are recurrent substructures, modeled as networks of base pair interactions, which are crucial for understanding structure-function relationships. The task of automatically identifying such motifs is computationally hard and remains a key challenge in the field of RNA structural biology and network analysis. State-of-the-art methods solve special cases of the motif problem by constraining the structural variability in occurrences of a motif and narrowing the substructure search space. RESULTS: Here, we relax these constraints by posing the motif finding problem as a graph representation learning and clustering task. This framing takes advantage of the continuous nature of graph representations to model the flexibility and variability of RNA motifs efficiently. We propose a set of node similarity functions, clustering methods and motif construction algorithms to recover flexible RNA motifs. Our tool, Vernal, can be easily customized by users to desired levels of motif flexibility, abundance and size. We show that Vernal is able to retrieve and expand known classes of motifs, as well as to propose novel motifs. AVAILABILITY AND IMPLEMENTATION: The source code, data and a webserver are available at vernal.cs.mcgill.ca. We also provide a flexible interface and a user-friendly webserver to browse and download our results. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
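Vernal compares nodes through similarity functions defined on their graph neighborhoods. The sketch below is a simplified stand-in, not Vernal's actual similarity function. Each node is described by histograms of the edge labels encountered at each breadth-first depth, and two nodes are compared with a weighted Jaccard score averaged over depths. The adjacency format, the `max_depth` parameter and all function names are assumptions. The clustering and motif-construction steps are not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(neighbor, edge_label)]

def label_rings(graph: Graph, root: str, max_depth: int) -> List[Counter]:
    """Edge-label histograms per BFS depth from `root`.

    Ring d counts the labels of the edges through which nodes at distance
    d + 1 are first reached.
    """
    rings = [Counter() for _ in range(max_depth)]
    visited = {root}
    frontier = [root]
    for depth in range(max_depth):
        next_frontier = []
        for node in frontier:
            for neighbor, label in graph.get(node, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
                    rings[depth][label] += 1
        frontier = next_frontier
    return rings

def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Sum of minima over sum of maxima; 1.0 when both rings are empty."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total

def node_similarity(graph_a: Graph, u: str, graph_b: Graph, v: str,
                    max_depth: int = 2) -> float:
    """Average ring-wise weighted Jaccard similarity between nodes u and v."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    rings_u = label_rings(graph_a, u, max_depth)
    rings_v = label_rings(graph_b, v, max_depth)
    scores = [weighted_jaccard(ru, rv) for ru, rv in zip(rings_u, rings_v)]
    return sum(scores) / len(scores)
```

A histogram-based comparison tolerates small differences in neighborhood composition. This is in the spirit of the flexible, continuous motif definition described above, although Vernal's own functions may differ.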
Subjects
Algorithms, RNA, RNA/chemistry, Nucleotide Motifs, Software, Base Pairing, Computational Biology
ABSTRACT
RNA-small molecule binding is a key regulatory mechanism that can stabilize 3D structures and activate molecular functions. The discovery of RNA-targeting compounds is thus a topic of current interest for novel therapies. Our work is a first attempt at bringing the scalability and generalization abilities of machine learning methods to the problem of RNA drug discovery, as well as a step towards understanding the interactions that drive binding specificity. Our tool, RNAmigos, builds and encodes a network representation of RNA structures to predict likely ligands for novel binding sites. We subject ligand predictions to virtual screening and show that we are able to place the true ligand in the 71st-73rd percentile in two decoy libraries, a significant improvement over several baselines and a state-of-the-art method. Furthermore, we observe that structural networks augmented with non-canonical base-pairing data are the only representation able to uncover a significant signal, suggesting that such interactions are a necessary source of binding specificity. We also find that pre-training with an auxiliary graph representation learning task significantly boosts the performance of ligand prediction. This finding can serve as a general principle for RNA structure-function prediction when data are scarce. RNAmigos shows that RNA binding data contain structural patterns with potential for drug discovery, and it provides methodological insights for possible applications to other structure-function learning tasks. The source code, data and a Web server are freely available at http://rnamigos.cs.mcgill.ca.
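The virtual-screening result above is reported as the percentile at which the true ligand is placed within a decoy library. The minimal sketch below computes that percentile under two assumptions: every compound receives a score where higher means a better predicted fit, and ties count as half. These are assumptions about the general metric, not RNAmigos's exact protocol. If a model instead outputs a distance, it would be negated first.

```python
from typing import Sequence

def ligand_percentile(true_score: float, decoy_scores: Sequence[float]) -> float:
    """Percentage of decoys ranked below the true ligand (ties count half)."""
    if not decoy_scores:
        raise ValueError("need at least one decoy")
    below = sum(1 for s in decoy_scores if s < true_score)
    ties = sum(1 for s in decoy_scores if s == true_score)
    return 100.0 * (below + 0.5 * ties) / len(decoy_scores)

# Toy usage: the true ligand outscores three of four decoys.
example = ligand_percentile(0.8, [0.1, 0.5, 0.9, 0.2])
```

A scorer that ranks compounds at random places the true ligand at the 50th percentile on average. Percentiles consistently above 50 therefore indicate signal beyond chance.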
Subjects
RNA/chemistry, Software, Base Pairing, Binding Sites, Ligands, Nucleic Acid Conformation
ABSTRACT
The RNA world hypothesis relies on the ability of ribonucleic acids to spontaneously acquire complex structures capable of supporting essential biological functions. Multiple sophisticated evolutionary models have been proposed for their emergence, but they often assume specific conditions. In this work, we explore a simple and parsimonious scenario describing the emergence of complex molecular structures at the early stages of life. We show that at specific GC content regimes, an undirected replication model is sufficient to explain the appearance of multibranched RNA secondary structures, a structural signature of many essential ribozymes. We ran a large-scale computational study to map energetically stable structures on complete mutational networks of 50-nt-long RNA sequences. Our results reveal that the sequence landscape with stable structures is enriched with multibranched structures at a length scale coinciding with the appearance of complex structures in RNA databases. A random replication mechanism preserving a 50% GC content may suffice to explain a natural enrichment of stable complex structures in populations of functional RNAs. In contrast, an evolutionary mechanism favoring the most stable folds at each generation appears to help reach multibranched structures at the highest GC content.
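A mutational network connects sequences that differ by a single point substitution. The minimal sketch below shows its two basic ingredients: GC content and one-mutation neighbors. The function names are illustrative. Sequences are assumed to use the alphabet ACGU. Predicting the stable secondary structure of each sequence (for example, with an energy-based folding tool) is outside the scope of this sketch.

```python
from typing import List

ALPHABET = "ACGU"

def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in an RNA sequence."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for base in seq if base in "GC") / len(seq)

def point_mutants(seq: str) -> List[str]:
    """All sequences at Hamming distance 1 (3 substitutions per position)."""
    mutants = []
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                mutants.append(seq[:i] + alt + seq[i + 1:])
    return mutants

# Toy usage: neighbors of a 50-nt sequence.
neighbors = point_mutants("A" * 50)
```

Each position admits three substitutions, so every 50-nt sequence has 3 × 50 = 150 direct neighbors in the network.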
Subjects
Nucleic Acid Conformation, RNA/chemistry, Base Composition, Base Sequence, Molecular Evolution, Mutation, RNA/genetics, RNA Folding, RNA Stability, Structure-Activity Relationship, Genetic Transcription
ABSTRACT
RNA structures possess multiple levels of structural organization. A secondary structure, made of Watson-Crick helices connected by loops, forms a scaffold for the tertiary structure. The 3D structures adopted by these loops are therefore critical determinants of the global 3D architecture. Earlier studies showed that these local 3D structures can be described as conserved sets of ordered non-Watson-Crick base pairs called RNA structural modules. Unfortunately, the computational efficiency and scope of current 3D module identification methods are still too limited to benefit from all the knowledge accumulated in module databases. We present BayesPairing, an automated, efficient and customizable tool for (i) building Bayesian networks representing RNA 3D modules and (ii) rapidly identifying 3D modules in sequences. BayesPairing uses a flexible definition of RNA 3D modules that allows us to consider complex architectures such as multi-branched loops, and it features multiple algorithmic improvements. We benchmarked our methods using cross-validation techniques on 3409 RNA chains and show that BayesPairing achieves up to ~70% identification accuracy on module positions and base pair interactions. BayesPairing can handle a broader range of motifs (versatility) and offers considerable running time improvements (efficiency), opening the door to a broad range of large-scale applications.
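BayesPairing represents each 3D module as a Bayesian network over the module's nucleotide positions. The toy sketch below only illustrates the general principle of such a factorization: the nucleotide at one position of an interacting pair is conditioned on its partner, while the remaining positions are independent. The module layout, the probability tables and the function name `module_log_likelihood` are hypothetical. They do not reproduce BayesPairing's network structure or learned parameters.

```python
import math
from typing import Dict, List, Tuple

PairEdge = Tuple[int, int, Dict[str, float], Dict[Tuple[str, str], float]]

def module_log_likelihood(
    seq: str,
    independent: Dict[int, Dict[str, float]],
    pairs: List[PairEdge],
) -> float:
    """log P(seq) for a toy network: independent positions plus pair edges.

    Each pair edge is (i, j, P(x_i), P(x_j | x_i)), the conditional table being
    keyed by (x_i, x_j). Tables must cover every nucleotide that occurs.
    """
    logp = 0.0
    for pos, table in independent.items():
        logp += math.log(table[seq[pos]])
    for i, j, parent, conditional in pairs:
        logp += math.log(parent[seq[i]])
        logp += math.log(conditional[(seq[i], seq[j])])
    return logp

# Hypothetical three-position module: positions 0 and 2 interact, 1 is free.
independent = {1: {"A": 0.5, "C": 0.1, "G": 0.1, "U": 0.3}}
parent = {"A": 0.2, "C": 0.3, "G": 0.4, "U": 0.1}
conditional = {("G", "C"): 0.9, ("G", "G"): 0.02}
pairs = [(0, 2, parent, conditional)]
score_gac = module_log_likelihood("GAC", independent, pairs)
```

The conditional table captures covariation between partners. In this toy setting, a G-C pair at positions 0 and 2 therefore scores far higher than a G-G pair.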
Subjects
Base Pairing, Bayes Theorem, RNA/chemistry, Automation, Genetic Databases, Datasets as Topic, Reproducibility of Results, Time Factors
ABSTRACT
Ligand-based drug design has recently benefited from the development of deep generative models. These models enable extensive exploration of chemical space and provide a platform for molecular optimization. However, the vast majority of current methods do not leverage the structure of the binding target, which potentiates the binding of small molecules and plays a key role in the interaction. We propose an optimization pipeline that leverages complementary structure-based and ligand-based methods. Instead of performing docking on a fixed chemical library, we iteratively select promising compounds in the full chemical space using a ligand-centered generative model. Molecular docking is then used as an oracle to guide compound optimization. This allows for the iterative generation of compounds that fit the target structure progressively better, without prior knowledge about bioactives. For this purpose, we introduce a new graph-to-SELFIES Variational Autoencoder (VAE) that decodes 18-fold faster than the state-of-the-art graph-to-graph model while achieving similar performance. We then successfully optimize the generation of molecules toward high docking scores, enabling a 10-fold enrichment of high-scoring compounds found at a fixed computational cost.
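The optimization loop described above alternates between proposing compounds with a generative model and scoring them by docking. The minimal sketch below assumes a continuous latent space and a cheap stand-in oracle: the hypothetical `toy_oracle` replaces both the VAE decoder and the docking program. The update rule, which resamples around the best-scoring "elite" candidates, is a simple cross-entropy-style heuristic chosen for illustration. It is not necessarily the authors' method. Docking programs usually report lower-is-better energies, so a real oracle would negate them.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]

def optimize(
    oracle: Callable[[Vector], float],
    dim: int,
    rounds: int = 20,
    batch: int = 50,
    elite: int = 10,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Iteratively resample latent points around the best-scoring ones."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    best_point, best_score = mean[:], oracle(mean)
    for _ in range(rounds):
        # 1. Propose a batch of latent points from the current search distribution.
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(batch)]
        # 2. Score every candidate with the (normally expensive) oracle.
        scored = sorted(((oracle(c), c) for c in candidates),
                        key=lambda t: t[0], reverse=True)
        # 3. Refit the search distribution to the top-scoring candidates.
        top = [c for _, c in scored[:elite]]
        mean = [sum(c[d] for c in top) / elite for d in range(dim)]
        std = [max(1e-3, (sum((c[d] - mean[d]) ** 2 for c in top) / elite) ** 0.5)
               for d in range(dim)]
        if scored[0][0] > best_score:
            best_score, best_point = scored[0][0], scored[0][1]
    return best_point, best_score

def toy_oracle(z: Vector) -> float:
    """Hypothetical stand-in for 'decode z, then dock': peaks at z = (2, ..., 2)."""
    return -sum((x - 2.0) ** 2 for x in z)

point, score = optimize(toy_oracle, dim=2)
```

The expensive oracle is called only on compounds proposed by the current search distribution, never on a fixed library. This is why such a loop can find more high-scoring compounds for the same number of docking calls.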
Subjects
Drug Discovery, Timolol, Drug Design, Ligands, Molecular Docking Simulation
ABSTRACT
BACKGROUND: Natural antisense transcripts (NATs) are regulatory RNAs that contain sequence complementary to other RNAs, these other RNAs usually being messenger RNAs. In eukaryotic genomes, cis-NATs overlap the gene they complement. RESULTS: Here, our goal is to analyze the distribution and evolutionary conservation of cis-NATs for a variety of available data sets for Arabidopsis thaliana, to gain insights into cis-NAT functional mechanisms and their significance. Cis-NATs derived from traditional sequencing are largely validated by other data sets, although different cis-NAT data sets have different prevalent cis-NAT topologies with respect to overlapping protein-coding genes. A. thaliana cis-NATs show substantial conservation of expression in A. lyrata (28-35% in the three substantive data sets analyzed). We examined evolutionary sequence conservation at cis-NAT loci in Arabidopsis thaliana across nine sequenced Brassicaceae species (picked for optimal discernment of purifying selection), focusing on the parts of their sequences not overlapping protein-coding transcripts (dubbed 'NOLPs'). We found significant NOLP sequence conservation for 28-34% of NATs across different cis-NAT sets. This NAT NOLP sequence conservation versus A. lyrata is generally significantly correlated with conservation of expression. We discover a significant enrichment of transcription factor binding sites (as evidenced by ChIP-seq data) in NOLPs compared to randomly sampled near-gene NOLP-like DNA, an enrichment that is linked to significant sequence conservation. Conversely, there is no evidence for a general significant link between NOLPs and the formation of small interfering RNAs (siRNAs), with the substantial majority of unique siRNAs arising from the overlapping portions of the cis-NATs. CONCLUSIONS: In aggregate, our results suggest that many cis-NAT NOLPs function in the regulation of conserved promoter/regulatory elements that they 'over-hang'.
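A NOLP is the part of a cis-NAT locus not overlapping protein-coding transcripts, so extracting it amounts to interval subtraction. The minimal sketch below works on a single chromosome and ignores strand. It assumes half-open `[start, end)` coordinates. The function name and the example coordinates are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)

def non_overlapping_parts(nat: Interval, coding: List[Interval]) -> List[Interval]:
    """Return the pieces of `nat` not covered by any coding interval."""
    start, end = nat
    pieces: List[Interval] = []
    cursor = start  # first position of the NAT not yet accounted for
    for c_start, c_end in sorted(coding):
        if c_end <= cursor or c_start >= end:
            continue  # no overlap with the remaining part of the NAT
        if c_start > cursor:
            pieces.append((cursor, c_start))  # uncovered gap before this gene
        cursor = max(cursor, c_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))
    return pieces

# Toy usage: a NAT spanning 100-200 with a coding transcript at 150-180.
nolps = non_overlapping_parts((100, 200), [(150, 180)])
```

Sorting the coding intervals and advancing a single cursor handles overlapping transcripts without merging them first.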
Subjects
Arabidopsis/genetics, Antisense RNA/analysis, Plant RNA/analysis, Small Interfering RNA/analysis, Binding Sites, Brassica/classification, Brassica/genetics, Conserved Sequence, Molecular Evolution, Plant Gene Expression Regulation, Small Interfering RNA/chemistry, RNA Sequence Analysis/methods
ABSTRACT
Tubulins are an ancient family of eukaryotic proteins characterized by an amino-terminal globular domain and a disordered carboxyl terminus. These carboxyl termini play important roles in modulating the behavior of microtubules in living cells. However, the atomic-level basis of their function is not well understood. These regions contain multiple acidic residues, and their overall charges are modulated in vivo by post-translational modifications, for example, phosphorylation. In this study, we describe an application of NMR and Monte Carlo computer simulations to investigate how the modification of local charge alters the conformational sampling of the γ-tubulin carboxyl terminus. We compared the dynamics of two 39-residue polypeptides corresponding to the carboxyl terminus of yeast γ-tubulin. One polypeptide comprised the wild-type amino acid sequence, while the second contained a tyrosine-to-aspartate mutation at Y11 of the polypeptide (Y445 in the full protein). This mutation introduces additional negative charge at a site that is phosphorylated in vivo and produces a phenotype with perturbed microtubule function. NMR relaxation measurements show that the Y11D mutation produces dramatic changes in the millisecond-timescale motions of the entire polypeptide. This observation is supported by Monte Carlo simulations that, similar to NMR, predict that the WT γ-CT is largely unstructured and that the substitution of Tyr 11 with Asp causes the sampling of extended conformations that are unique to the Y11D polypeptide.
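Monte Carlo conformational sampling of this kind commonly relies on the Metropolis acceptance rule. A proposed move that lowers the energy is always accepted, and a move that raises it by ΔE is accepted with probability exp(-ΔE/kT). The minimal sketch below applies the rule to a toy one-dimensional energy function in reduced units (kT = 1 by default). The energy function, step size and function names are hypothetical and unrelated to the force field used in the study.

```python
import math
import random
from typing import Callable, List

def metropolis_accept(delta_e: float, kt: float, rng: random.Random) -> bool:
    """Accept downhill moves always; uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kt)

def sample(energy: Callable[[float], float], x0: float, steps: int,
           step_size: float = 0.5, kt: float = 1.0, seed: int = 0) -> List[float]:
    """Random-walk Metropolis chain over a single coordinate x."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    trajectory = [x]
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)  # symmetric trial move
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, kt, rng):
            x, e = x_new, e_new
        trajectory.append(x)  # rejected moves repeat the current state
    return trajectory

# Toy usage: a harmonic well centred at x = 0, started away from the minimum.
trajectory = sample(lambda x: x * x, 3.0, 20000)
```

Because trial moves are symmetric, this rule samples states with Boltzmann weights. Conformations are therefore visited in proportion to exp(-E/kT), which is what allows simulated ensembles to be compared with NMR observables.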
Subjects
Amino Acid Substitution, Tubulin/chemistry, Tubulin/genetics, Tyrosine/genetics, Amino Acid Sequence, Conserved Sequence, Hydrodynamics, Molecular Models, Monte Carlo Method, Biomolecular Nuclear Magnetic Resonance, Peptides/chemistry, Peptides/genetics, Phenotype, Phosphorylation, Protein Domains, Protein Secondary Structure, Time
ABSTRACT
Understanding the connection between complex structural features of RNA and biological function is a fundamental challenge in evolutionary studies and in RNA design. However, building datasets of RNA 3D structures and making appropriate modeling choices remain time-consuming and lack standardization. In this chapter, we describe how to use rnaglib to train supervised and unsupervised machine learning-based function prediction models on datasets of RNA 3D structures.
Subjects
Computational Biology, Nucleic Acid Conformation, RNA, Software, RNA/chemistry, RNA/genetics, Computational Biology/methods, Machine Learning, Molecular Models
ABSTRACT
We present the case of a 43-year-old male patient with no medical history or risk factors for coronary artery disease who sustained blunt trauma to the anterior chest from a ball and subsequently developed typical chest pain. In the emergency department, an acute coronary event with ST-segment elevation in the anteroseptal wall was diagnosed; an echocardiogram showed segmental contractility abnormalities, with no evidence of pericardial fluid. Fibrinolysis with streptokinase was performed, and the patient improved, with resolution of the chest pain and normalization of the ST segment on serial electrocardiograms (ECG)...