1 - 20 of 213
1.
Glycobiology ; 2024 Jun 05.
Article En | MEDLINE | ID: mdl-38836441

Heparan sulfate (HS), a sulfated polysaccharide abundant in the extracellular matrix, plays pivotal roles in various physiological and pathological processes by interacting with proteins. Investigating the binding selectivity of HS oligosaccharides to target proteins is essential, but the exhaustive inclusion of all possible oligosaccharides in microarray experiments is impractical. To address this challenge, we present a hybrid pipeline that integrates microarray and in silico techniques to design oligosaccharides with desired protein affinity. Using fibroblast growth factor 2 (FGF2) as a model protein, we assembled an in-house dataset of HS oligosaccharides on microarrays and developed two structural representations: a standard representation with all atoms explicit and a simplified representation with disaccharide units as "quasi-atoms." Predictive Quantitative Structure-Activity Relationship (QSAR) models for FGF2 affinity were developed using the Random Forest (RF) algorithm. The resulting models, considering the applicability domain, demonstrated high predictivity, with a correct classification rate of 0.81-0.80 and improved positive predictive values (PPV) up to 0.95. Virtual screening of 40 new oligosaccharides using the simplified model identified 15 computational hits, 11 of which were experimentally validated for high FGF2 affinity. This hybrid approach marks a significant step toward the targeted design of oligosaccharides with desired protein interactions, providing a foundation for broader applications in glycobiology.

2.
J Chem Inf Model ; 2024 May 20.
Article En | MEDLINE | ID: mdl-38768560

We introduce STOPLIGHT, a web portal to assist medicinal chemists in prioritizing hits from screening campaigns and selecting compounds for optimization. STOPLIGHT incorporates services to assess 6 physicochemical and structural properties, 6 assay liabilities, and 11 pharmacokinetic properties for any small molecule represented by its SMILES string. We briefly describe each service and illustrate the utility of this portal with a case study. The STOPLIGHT portal provides a user-friendly tool to guide hit selection in early drug discovery campaigns, whereby compounds with unfavorable properties can be quickly recognized and eliminated.

3.
Adv Inf Retr ; 14609: 34-49, 2024 Mar.
Article En | MEDLINE | ID: mdl-38585224

Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advancements for this task have generally relied on improvements to hardware or dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity searching algorithms remain relatively underexplored. However, many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality reductions of standard chemical embeddings as well as a learned, structurally-aware embedding (SmallSA) for this task. With this framework, searches on over one billion chemicals execute in less than a second on a single CPU core, five orders of magnitude faster than the brute-force approach. We also demonstrate that SmallSA achieves competitive performance on chemical similarity benchmarks.
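
As a rough illustration of the data structure named in this abstract, the following is a minimal pure-Python k-d tree with an exact nearest-neighbor query, checked against a brute-force scan. It is a sketch under stated assumptions, not the authors' implementation: the function names, the 8-dimensional random vectors standing in for low-dimensional chemical embeddings, and the Euclidean metric are all hypothetical choices made for the example, and SmallSA itself is not reproduced.

```python
import math
import random


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree; each node is (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point becomes the node
    return (
        points[mid],
        build_kdtree(points[:mid], depth + 1),
        build_kdtree(points[mid + 1:], depth + 1),
    )


def nearest(node, query, depth=0, best=None):
    """Return (distance, point) for the exact nearest neighbor of query."""
    if node is None:
        return best
    point, left, right = node
    dist = math.dist(point, query)         # Euclidean distance (assumed metric)
    if best is None or dist < best[0]:
        best = (dist, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the
    # best distance found so far; otherwise that subtree cannot improve it.
    if abs(diff) < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best


def brute_force_nearest(points, query):
    """Reference answer: scan every point."""
    return min((math.dist(p, query), p) for p in points)


# Hypothetical data: random 8-dimensional vectors in place of real embeddings.
random.seed(0)
embeddings = [tuple(random.random() for _ in range(8)) for _ in range(2000)]
tree = build_kdtree(embeddings)
query = tuple(random.random() for _ in range(8))

tree_result = nearest(tree, query)
scan_result = brute_force_nearest(embeddings, query)
```

The pruning step is what makes the tree useful in low dimensions: whole subtrees are skipped when the splitting plane is farther away than the current best match. In high-dimensional spaces that test rarely succeeds, which is consistent with the abstract's emphasis on low-dimensional embeddings.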

4.
J Med Chem ; 67(8): 6508-6518, 2024 Apr 25.
Article En | MEDLINE | ID: mdl-38568752

Computational models that predict pharmacokinetic properties are critical to deprioritize drug candidates that emerge as hits in high-throughput screening campaigns. We collected, curated, and integrated a database of compounds tested in 12 major end points comprising over 10,000 unique molecules. We then employed these data to build and validate binary quantitative structure-activity relationship (QSAR) models. All trained models achieved a correct classification rate above 0.60 and a positive predictive value above 0.50. To illustrate their utility in drug discovery, we used these models to predict the pharmacokinetic properties for drugs in the NCATS Inxight Drugs database. In addition, we employed the developed models to predict the pharmacokinetic properties of all compounds in DrugBank. All models described in this paper have been integrated and made publicly available via the PhaKinPro web portal, which can be accessed at https://phakinpro.mml.unc.edu/.


Quantitative Structure-Activity Relationship , Humans , Internet , Drug Discovery , Pharmaceutical Preparations/metabolism , Pharmaceutical Preparations/chemistry
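
The two metrics quoted in the abstract above can be computed from a binary confusion matrix. The sketch below assumes the common definitions: correct classification rate (CCR) as the mean of sensitivity and specificity (that is, balanced accuracy) and positive predictive value (PPV) as TP / (TP + FP). The abstract does not spell out its formulas, so these definitions, the function name, and the toy labels are assumptions made for illustration.

```python
def ccr_and_ppv(y_true, y_pred):
    """Return (CCR, PPV) for binary labels, where 1 = active and 0 = inactive.

    CCR is taken here as (sensitivity + specificity) / 2 and
    PPV as TP / (TP + FP); both definitions are assumptions.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    if tp + fp == 0:
        raise ValueError("PPV is undefined when nothing is predicted active")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ccr = (sensitivity + specificity) / 2
    ppv = tp / (tp + fp)
    return ccr, ppv


# Toy example (hypothetical labels): 4 actives, 6 inactives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
ccr, ppv = ccr_and_ppv(y_true, y_pred)
```

In this toy case the model finds 3 of 4 actives and mislabels 1 of 6 inactives, so sensitivity is 0.75, specificity is 5/6, and PPV is 0.75. Unlike plain accuracy, CCR is not inflated by the majority class, which matters for imbalanced screening data.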
5.
ArXiv ; 2024 Mar 15.
Article En | MEDLINE | ID: mdl-38560736

Structure-based virtual screening (SBVS) is a key workflow in computational drug discovery. SBVS models are assessed by measuring the enrichment of known active molecules over decoys in retrospective screens. However, the standard formula for enrichment cannot estimate model performance on very large libraries. Additionally, current screening benchmarks cannot easily be used with machine learning (ML) models due to data leakage. We propose an improved formula for calculating VS enrichment and introduce the BayesBind benchmarking set composed of protein targets that are structurally dissimilar to those in the BigBind training set. We assess current models on this benchmark and find that none perform appreciably better than a KNN baseline. We publicly release the BayesBind benchmark at https://github.com/molecularmodelinglab/bigbind.
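
For context, the conventional enrichment factor that the abstract refers to as the standard formula can be sketched as follows: the fraction of actives among the top-scoring compounds divided by the fraction of actives in the whole library. This is only the conventional definition; the improved formula proposed in the paper is not given in the abstract and is not reproduced here. The function name and the toy library are hypothetical.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Conventional EF at a given top fraction.

    scores: higher means predicted more likely active.
    labels: 1 for known active, 0 for decoy/inactive.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_selected = max(1, int(round(n_total * fraction)))
    # Rank compounds from best to worst score and keep the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_selected])
    return (hits / n_selected) / (n_actives / n_total)


# Toy library (hypothetical): 1000 compounds, 10 actives.
# Compound i gets score 1000 - i, so index 0 is ranked first.
scores = list(range(1000, 0, -1))
labels = [1] * 5 + [0] * 495 + [1] * 5 + [0] * 495
ef_1pct = enrichment_factor(scores, labels, fraction=0.01)

# A perfect ranking puts all 10 actives in the top 10 positions.
perfect_labels = [1] * 10 + [0] * 990
ef_perfect = enrichment_factor(scores, perfect_labels, fraction=0.01)
```

With 5 of the 10 actives in the top 1%, the toy screen gives EF1% of 50, and even a perfect ranking cannot exceed 100 here. That ceiling depends on the selected fraction and on the active-to-decoy ratio of the benchmark, which is one reason a value measured on a small retrospective set is hard to extrapolate to very large libraries.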

6.
J Am Chem Soc ; 146(12): 8016-8030, 2024 Mar 27.
Article En | MEDLINE | ID: mdl-38470819

There have been significant advances in the flexibility and power of in vitro cell-free translation systems. The increasing ability to incorporate noncanonical amino acids and complement translation with recombinant enzymes has enabled cell-free production of peptide-based natural products (NPs) and NP-like molecules. We anticipate that many more such compounds and analogs might be accessed in this way. To assess the peptide NP space that is directly accessible to current cell-free technologies, we developed a peptide parsing algorithm that breaks down peptide NPs into building blocks based on ribosomal translation logic. Using the resultant data set, we broadly analyze the biophysical properties of these privileged compounds and perform a retrobiosynthetic analysis to predict which peptide NPs could be directly synthesized in augmented cell-free translation reactions. We then tested these predictions by preparing a library of highly modified peptide NPs. Two macrocyclases, PatG and PCY1, were used to effect the head-to-tail macrocyclization of candidate NPs. This retrobiosynthetic analysis identified a collection of high-priority building blocks that are enriched throughout peptide NPs, yet they had not previously been tested in cell-free translation. To expand the cell-free toolbox into this space, we established, optimized, and characterized the flexizyme-enabled ribosomal incorporation of piperazic acids. Overall, these results demonstrate the feasibility of cell-free translation for peptide NP total synthesis while expanding the limits of the technology. This work provides a novel computational tool for exploration of peptide NP chemical space, that could be expanded in the future to allow design of ribosomal biosynthetic pathways for NPs and NP-like molecules.


Biological Products , Biological Products/chemistry , Cheminformatics , Peptides/chemistry , Peptide Biosynthesis , Amino Acids
7.
Bioinformatics ; 40(1)2024 01 02.
Article En | MEDLINE | ID: mdl-38175789

SUMMARY: Knowledge graphs are being increasingly used in biomedical research to link large amounts of heterogeneous data and facilitate reasoning across diverse knowledge sources. Wider adoption and exploration of knowledge graphs in the biomedical research community is limited by requirements to understand the underlying graph structure in terms of entity types and relationships, represented as nodes and edges, respectively, and learn specialized query languages for graph mining and exploration. We have developed a user-friendly interface dubbed ExEmPLAR (Extracting, Exploring, and Embedding Pathways Leading to Actionable Research) to aid reasoning over biomedical knowledge graphs and assist with data-driven research and hypothesis generation. We explain the key functionalities of ExEmPLAR and demonstrate its use with a case study considering the relationship of Trypanosoma cruzi, the etiological agent of Chagas disease, to frequently associated cardiovascular conditions. AVAILABILITY AND IMPLEMENTATION: ExEmPLAR is freely accessible at https://www.exemplar.mml.unc.edu/. For code and instructions for using the application, see: https://github.com/beasleyjonm/AOP-COP-Path-Extractor.


Biomedical Research , Pattern Recognition, Automated
8.
Nat Rev Drug Discov ; 23(2): 141-155, 2024 02.
Article En | MEDLINE | ID: mdl-38066301

Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications, and on the need for open-source and democratized resources to support computer-aided drug design.


Deep Learning , Quantitative Structure-Activity Relationship , Humans , Artificial Intelligence , Computing Methodologies , Quantum Theory , Drug Discovery/methods , Drug Design
9.
Mol Inform ; 43(1): e202300207, 2024 Jan.
Article En | MEDLINE | ID: mdl-37802967

Recent rapid expansion of make-on-demand, purchasable chemical libraries comprising dozens of billions or even trillions of molecules has challenged the efficient application of traditional structure-based virtual screening methods that rely on molecular docking. We present a novel computational methodology termed HIDDEN GEM (HIt Discovery using Docking ENriched by GEnerative Modeling) that greatly accelerates virtual screening. This workflow uniquely integrates machine learning, generative chemistry, massive chemical similarity searching and molecular docking of small, selected libraries at the beginning and the end of the workflow. For each target, HIDDEN GEM nominates a small number of top-scoring virtual hits prioritized from ultra-large chemical libraries. We have benchmarked HIDDEN GEM by conducting virtual screening campaigns for 16 diverse protein targets using the Enamine REAL Space library comprising 37 billion molecules. We show that HIDDEN GEM yields the highest enrichment factors as compared to state-of-the-art accelerated virtual screening methods, while requiring the least computational resources. HIDDEN GEM can be executed with any docking software and employed by users with limited computational resources.


Small Molecule Libraries , Software , Small Molecule Libraries/chemistry , Molecular Docking Simulation , Workflow
10.
J Chem Inf Model ; 64(7): 2488-2495, 2024 Apr 08.
Article En | MEDLINE | ID: mdl-38113513

Deep learning methods that predict protein-ligand binding have recently been used for structure-based virtual screening. Many such models have been trained using protein-ligand complexes with known crystal structures and activities from the PDBbind data set. However, because PDBbind only includes 20K complexes, models typically fail to generalize to new targets, and model performance is on par with models trained with only ligand information. Conversely, the ChEMBL database contains a wealth of chemical activity information but includes no information about binding poses. We introduce BigBind, a data set that maps ChEMBL activity data to proteins from the CrossDocked data set. BigBind comprises 583K ligand activities and includes 3D structures of the protein binding pockets. Additionally, we augmented the data by adding an equal number of putative inactives for each target. Using these data, we developed Banana (basic neural network for binding affinity), a neural network-based model to classify active from inactive compounds, defined by a 10 µM cutoff. Our model achieved an AUC of 0.72 on BigBind's test set, while a ligand-only model achieved an AUC of 0.59. Furthermore, Banana achieved competitive performance on the LIT-PCBA benchmark (median EF1% 1.81) while running 16,000 times faster than molecular docking with Gnina. We suggest that Banana, as well as other models trained on this data set, will significantly improve the outcomes of prospective virtual screening tasks.


Proteins , Ubiquitin-Protein Ligases , Molecular Docking Simulation , Ligands , Prospective Studies , Proteins/chemistry , Protein Binding , Ubiquitin-Protein Ligases/metabolism
11.
J Alzheimers Dis ; 96(2): 499-505, 2023.
Article En | MEDLINE | ID: mdl-37807778

Vaccine repurposing that considers individual genotype may aid personalized prevention of Alzheimer's disease (AD). In this retrospective cohort study, we used Cardiovascular Health Study data to estimate associations of pneumococcal polysaccharide vaccine and flu shots received between ages 65-75 with AD onset at age 75 or older, taking into account rs6859 polymorphism in NECTIN2 gene (AD risk factor). Pneumococcal vaccine, and total count of vaccinations against pneumonia and flu, were associated with lower odds of AD in carriers of rs6859 A allele, but not in non-carriers. We conclude that pneumococcal polysaccharide vaccine is a promising candidate for genotype-tailored AD prevention.


Alzheimer Disease , Pneumonia, Pneumococcal , Humans , Aged , Pneumonia, Pneumococcal/prevention & control , Retrospective Studies , Alzheimer Disease/genetics , Alzheimer Disease/prevention & control , Vaccination , Pneumococcal Vaccines/therapeutic use , Genotype
12.
NPJ Vaccines ; 8(1): 129, 2023 Sep 01.
Article En | MEDLINE | ID: mdl-37658087

COVID-19 vaccines have been instrumental tools in the fight against SARS-CoV-2 helping to reduce disease severity and mortality. At the same time, just like any other therapeutic, COVID-19 vaccines were associated with adverse events. Women have reported menstrual cycle irregularity after receiving COVID-19 vaccines, and this led to renewed fears concerning COVID-19 vaccines and their effects on fertility. Herein we devised an informatics workflow to explore the causal drivers of menstrual cycle irregularity in response to vaccination with mRNA COVID-19 vaccine BNT162b2. Our methods relied on gene expression analysis in response to vaccination, followed by network biology analysis to derive testable hypotheses regarding the causal links between BNT162b2 and menstrual cycle irregularity. Five high-confidence transcription factors were identified as causal drivers of BNT162b2-induced menstrual irregularity, namely: IRF1, STAT1, RelA (p65 NF-kB subunit), STAT2 and IRF3. Furthermore, some biomarkers of menstrual irregularity, including TNF, IL6R, IL6ST, LIF, BIRC3, FGF2, ARHGDIB, RPS3, RHOU, MIF, were identified as topological genes and predicted as causal drivers of menstrual irregularity. Our network-based mechanism reconstruction results indicated that BNT162b2 exerted biological effects similar to those resulting from prolactin signaling. However, these effects were short-lived and didn't raise concerns about long-term infertility issues. This approach can be applied to interrogate the functional links between drugs/vaccines and other side effects.

13.
J Med Chem ; 66(18): 12828-12839, 2023 09 28.
Article En | MEDLINE | ID: mdl-37677128

Hits from high-throughput screening (HTS) of chemical libraries are often false positives due to their interference with assay detection technology. In response, we generated the largest publicly available library of chemical liabilities and developed "Liability Predictor," a free web tool to predict HTS artifacts. More specifically, we generated, curated, and integrated HTS data sets for thiol reactivity, redox activity, and luciferase (firefly and nano) activity and developed and validated quantitative structure-interference relationship (QSIR) models to predict these nuisance behaviors. The resulting models showed 58-78% external balanced accuracy for 256 external compounds per assay. QSIR models developed and validated herein identify nuisance compounds among experimental hits more reliably than do popular PAINS filters. Both the models and the curated data sets were implemented in "Liability Predictor," publicly available at https://liability.mml.unc.edu/. "Liability Predictor" may be used as part of chemical library design or for triaging HTS hits.


Artifacts , High-Throughput Screening Assays , High-Throughput Screening Assays/methods , Small Molecule Libraries/chemistry
14.
Proteins ; 91(12): 1822-1828, 2023 Dec.
Article En | MEDLINE | ID: mdl-37697630

In the ligand prediction category of CASP15, the challenge was to predict the positions and conformations of small molecules binding to proteins that were provided as amino acid sequences or as models generated by the AlphaFold2 program. For most targets, we used our template-based ligand docking program ClusPro ligTBM, also implemented as a public server available at https://ligtbm.cluspro.org/. Since many targets had multiple chains and a number of ligands, several templates and some manual interventions were required. In a few cases, no templates were found, and we had to use direct docking using the Glide program. Nevertheless, ligTBM was shown to be a very useful tool, and by any ranking criteria, our group was ranked among the top five best-performing teams. In fact, all the best groups used template-based docking methods. Thus, it appears that the AlphaFold2-generated models, despite the high accuracy of the predicted backbone, have local differences from the x-ray structure that make the use of direct docking methods more challenging. The results of CASP15 confirm that this limitation can be frequently overcome by homology-based docking.


Proteins , Software , Protein Conformation , Molecular Docking Simulation , Ligands , Proteins/chemistry , Protein Binding , Binding Sites
15.
J Cheminform ; 15(1): 82, 2023 Sep 19.
Article En | MEDLINE | ID: mdl-37726809

We report the major highlights of the School of Cheminformatics in Latin America, Mexico City, November 24-25, 2022. Six lectures, one workshop, and one roundtable with four editors were presented during an online public event with speakers from academia, big pharma, and public research institutions. One thousand one hundred eighty-one students and academics from seventy-nine countries registered for the meeting. As part of the meeting, advances in enumeration and visualization of chemical space, applications in natural product-based drug discovery, drug discovery for neglected diseases, toxicity prediction, and general guidelines for data analysis were discussed. Experts from ChEMBL presented a workshop on how to use the resources of this major compounds database used in cheminformatics. The school also included a round table with editors of cheminformatics journals. The full program of the meeting and the recordings of the sessions are publicly available at https://www.youtube.com/@SchoolChemInfLA/featured .

16.
ArXiv ; 2023 Jul 26.
Article En | MEDLINE | ID: mdl-37547658

Molecular docking aims to predict the 3D pose of a small molecule in a protein binding site. Traditional docking methods predict ligand poses by minimizing a physics-inspired scoring function. Recently, a diffusion model has been proposed that iteratively refines a ligand pose. We combine these two approaches by training a pose scoring function in a diffusion-inspired manner. In our method, PLANTAIN, a neural network is used to develop a very fast pose scoring function. We parameterize a simple scoring function on the fly and use L-BFGS minimization to optimize an initially random ligand pose. Using rigorous benchmarking practices, we demonstrate that our method achieves state-of-the-art performance while running ten times faster than the next-best method. We release PLANTAIN publicly and hope that it improves the utility of virtual screening workflows.

17.
FEMS Microbiol Rev ; 47(5)2023 09 05.
Article En | MEDLINE | ID: mdl-37596064

Understanding the origins of past and present viral epidemics is critical in preparing for future outbreaks. Many viruses, including SARS-CoV-2, have led to significant consequences not only due to their virulence, but also because we were unprepared for their emergence. We need to learn from large amounts of data accumulated from well-studied, past pandemics and employ modern informatics and therapeutic development technologies to forecast future pandemics and help minimize their potential impacts. While acknowledging the complexity and difficulties associated with establishing reliable outbreak predictions, herein we provide a perspective on the regions of the world that are most likely to be impacted by future outbreaks. We specifically focus on viruses with epidemic potential, namely SARS-CoV-2, MERS-CoV, DENV, ZIKV, MAYV, LASV, noroviruses, influenza, Nipah virus, hantaviruses, Oropouche virus, MARV, and Ebola virus, which all require attention from both the public and scientific community to avoid societal catastrophes like COVID-19. Based on our literature review, data analysis, and outbreak simulations, we posit that these future viral epidemics are unavoidable, but that their societal impacts can be minimized by strategic investment into basic virology research, epidemiological studies of neglected viral diseases, and antiviral drug discovery.


COVID-19 , Zika Virus Infection , Zika Virus , Humans , COVID-19/epidemiology , SARS-CoV-2 , Disease Outbreaks
18.
Antiviral Res ; 217: 105620, 2023 09.
Article En | MEDLINE | ID: mdl-37169224

Diseases caused by new viruses cost thousands if not millions of human lives and trillions of dollars. We have identified, collected, curated, and integrated all chemogenomics data from ChEMBL for 13 emerging viruses that hold the greatest potential threat to global human health. By identifying and solving several challenges related to data annotation accuracy, we developed a highly curated and thoroughly annotated database of compounds tested in both phenotypic and target-based assays for these viruses that we dubbed SMACC (Small Molecule Antiviral Compound Collection). The pilot version of the SMACC database contains over 32,500 entries for 13 viruses. By analyzing data in SMACC, we have identified ∼50 compounds with a polyviral inhibition profile, mostly covering flavi- and coronaviruses. The SMACC database may serve as a reference for virologists and medicinal chemists working on the development of novel broad-spectrum antiviral (BSA) agents in preparation for future viral outbreaks. SMACC is publicly available at https://smacc.mml.unc.edu.


Coronavirus Infections , Viruses , Humans , Antiviral Agents/pharmacology , Viruses/genetics , Databases, Factual
19.
J Control Release ; 353: 903-914, 2023 01.
Article En | MEDLINE | ID: mdl-36402234

Active learning (AL) has become a subject of active recent research both in industry and academia as an efficient approach for rapid design and discovery of novel chemicals, materials, and polymers. Herein, we have assessed the applicability of AL for the discovery of polymeric micelle formulations for poorly soluble drugs. We were motivated by the key advantages of this approach, which make it a desirable strategy for rational design of drug delivery systems due to its ability to (i) employ relatively small datasets for model development, (ii) iterate between model development and model assessment using small external datasets that can be either generated in focused experimental studies or formed from subsets of the initial training data, and (iii) progressively evolve models towards increasingly more reliable predictions and the identification of novel chemicals with the desired properties. In this study, we compared various AL protocols for their effectiveness in finding biologically active molecules using synthetic datasets. We have investigated the dependency of AL performance on the size of the initial training set, the relative complexity of the task, and the choice of the initial training dataset. We found that AL techniques as applied to regression modeling offer no benefits over random search, while AL used for classification tasks performs better than models built for randomly selected training sets but is still quite far from perfect. Finally, the best performing AL approach was employed to discover and experimentally validate novel binding polymers for a case study of asialoglycoprotein receptor (ASGPR).


Polymers , Problem-Based Learning , Polymers/chemistry , Micelles , Drug Delivery Systems , Peptides
20.
Regul Toxicol Pharmacol ; 136: 105277, 2022 Dec.
Article En | MEDLINE | ID: mdl-36288772

Exogenous metal particles and ions from implant devices are known to cause severe toxic events with symptoms ranging from adverse local tissue reactions to systemic toxicities, potentially leading to the development of cancers, heart conditions, and neurological disorders. Toxicity mechanisms, also known as Adverse Outcome Pathways (AOPs), that explain these metal-induced toxicities are severely understudied. Therefore, we deployed in silico structure- and knowledge-based approaches to identify proteome-level perturbations caused by metals and pathways that link these events to human diseases. We captured 177 structure-based, 347 knowledge-based, and 402 imputed metal-gene/protein relationships for chromium, cobalt, molybdenum, nickel, and titanium. We prioritized 72 proteins hypothesized to directly contact implant surfaces and contribute to adverse outcomes. Results of this exploratory analysis were formalized as structured AOPs. We considered three case studies reflecting the following possible situations: (i) the metal-protein-disease relationship was previously known; (ii) the metal-protein, protein-disease, and metal-disease relationships were individually known but were not linked (as a unified AOP); and (iii) one of three relationships was unknown and was imputed by our methods. These situations were illustrated by case studies on nickel-induced allergy/hypersensitivity, cobalt-induced heart failure, and titanium-induced periprosthetic osteolysis, respectively. All workflows, data, and results are freely available in https://github.com/DnlRKorn/Knowledge_Based_Metallomics/. An interactive view of select data is available at the ROBOKOP Neo4j Browser at http://robokopkg.renci.org/browser/.


Adverse Outcome Pathways , Nickel , Humans , Nickel/adverse effects , Titanium/toxicity , Metals/toxicity , Cobalt , Chromium
...