Results 1 - 20 of 348
1.
PLoS One ; 19(7): e0306202, 2024.
Article in English | MEDLINE | ID: mdl-38968199

ABSTRACT

Chemical information has become increasingly ubiquitous and has outstripped the pace of analysis and interpretation. We have developed an R package, uafR, that automates a grueling retrieval process for gas-chromatography coupled mass spectrometry (GC-MS) data and allows anyone interested in chemical comparisons to quickly perform advanced structural similarity matches. Our streamlined cheminformatics workflows allow anyone with basic experience in R to pull out component areas for tentative compound identifications using the best published understanding of molecules across samples (pubchem.gov). Interpretations can now be done at a fraction of the time, cost, and effort it would typically take using a standard chemical ecology data analysis pipeline. The package was tested in two experimental contexts: (1) A dataset of purified internal standards, which showed our algorithms correctly identified the known compounds with R² values ranging from 0.827-0.999 along concentrations ranging from 1 × 10⁻⁵ to 1 × 10³ ng/µl, (2) A large, previously published dataset, where the number and types of compounds identified were comparable (or identical) to those identified with the traditional manual peak annotation process, and NMDS analysis of the compounds produced the same pattern of significance as in the original study. Both the speed and accuracy of GC-MS data processing are drastically improved with uafR because it allows users to fluidly interact with their experiment following tentative library identifications [i.e. after the m/z spectra have been matched against an installed chemical fragmentation database (e.g. NIST)]. Use of uafR will allow larger datasets to be collected and systematically interpreted quickly. Furthermore, the functions of uafR could allow backlogs of previously collected and annotated data to be processed by new personnel or students as they are being trained. This is critical as we enter the era of exposomics, metabolomics, volatilomes, and landscape level, high-throughput chemotyping. This package was developed to advance collective understanding of chemical data and is applicable to any research that benefits from GC-MS analysis. It can be downloaded for free along with sample datasets from Github at github.org/castratton/uafR or installed directly from R or RStudio using the developer tools: 'devtools::install_github("castratton/uafR")'.


Subject(s)
Algorithms , Gas Chromatography-Mass Spectrometry , Software , Gas Chromatography-Mass Spectrometry/methods , Cheminformatics/methods
2.
PLoS One ; 19(6): e0302105, 2024.
Article in English | MEDLINE | ID: mdl-38889115

ABSTRACT

The present study was focused on exploring the efficient inhibitors of closed state (form) of type III effector Xanthomonas outer protein Q (XopQ) (PDB: 4P5F) from the 44 phytochemicals of Picrasma quassioides using cutting-edge computational analysis. Among them, Kumudine B showed excellent binding energy (-11.0 kcal/mol), followed by Picrasamide A, Quassidine I and Quassidine J with the targeted closed state of XopQ protein compared to the reference standard drug (Streptomycin). The molecular dynamics (MD) simulations performed at 300 ns validated the stability of top lead ligands (Kumudine B, Picrasamide A, and Quassidine I)-bound XopQ protein complex with slightly lower fluctuation than Streptomycin. The MM-PBSA calculation confirmed the strong interactions of top lead ligands (Kumudine B and Quassidine I) with XopQ protein, as they offered the least binding energy. The results of absorption, distribution, metabolism, excretion, and toxicity (ADMET) analysis confirmed that Quassidine I, Kumudine B and Picrasamide A were found to qualify most of the drug-likeness rules with excellent bioavailability scores compared to Streptomycin. Results of the computational studies suggested that Kumudine B, Picrasamide A, and Quassidine I could be considered potential compounds to design novel antibacterial drugs against X. oryzae infection. Further in vitro and in vivo antibacterial activities of Kumudine B, Picrasamide A, and Quassidine I are required to confirm their therapeutic potentiality in controlling the X. oryzae infection.


Subject(s)
Anti-Bacterial Agents , Molecular Dynamics Simulation , Xanthomonas , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/chemistry , Xanthomonas/drug effects , Cheminformatics/methods , Molecular Docking Simulation , Bacterial Proteins/antagonists & inhibitors , Bacterial Proteins/metabolism , Bacterial Proteins/chemistry
3.
BMC Bioinformatics ; 25(1): 225, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38926641

ABSTRACT

PURPOSE: Large Language Models (LLMs) like Generative Pre-trained Transformer (GPT) from OpenAI and LLaMA (Large Language Model Meta AI) from Meta AI are increasingly recognized for their potential in the field of cheminformatics, particularly in understanding Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs also have the ability to decode SMILES strings into vector representations. METHOD: We investigate the performance of GPT and LLaMA compared to pre-trained models on SMILES in embedding SMILES strings on downstream tasks, focusing on two key applications: molecular property prediction and drug-drug interaction prediction. RESULTS: We find that SMILES embeddings generated using LLaMA outperform those from GPT in both molecular property and DDI prediction tasks. Notably, LLaMA-based SMILES embeddings show results comparable to pre-trained models on SMILES in molecular prediction tasks and outperform the pre-trained models for the DDI prediction tasks. CONCLUSION: The performance of LLMs in generating SMILES embeddings shows great potential for further investigation of these models for molecular embedding. We hope our study bridges the gap between LLMs and molecular embedding, motivating additional research into the potential of LLMs in the molecular representation field. GitHub: https://github.com/sshaghayeghs/LLaMA-VS-GPT .


Subject(s)
Cheminformatics , Cheminformatics/methods , Drug Interactions , Molecular Structure
4.
Molecules ; 29(12), 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38930871

ABSTRACT

Synthetic efforts toward complex natural product (NP) scaffolds are useful ones, particularly those aimed at expanding their bioactive chemical space. Here, we utilised an orthogonal cheminformatics-based approach to predict the potential biological activities for a series of synthetic bis-indole alkaloids inspired by elusive sponge-derived NPs, echinosulfone A (1) and echinosulfonic acids A-D (2-5). Our work includes the first synthesis of desulfato-echinosulfonic acid C, an α-hydroxy bis(3'-indolyl) alkaloid (17), and its full NMR characterisation. This synthesis provides corroborating evidence for the structure revision of echinosulfonic acids A-C. Additionally, we demonstrate a robust synthetic strategy toward a diverse range of α-methine bis(3'-indolyl) acids and acetates (11-16) without the need for silica-based purification in either one or two steps. By integrating our synthetic library of bis-indoles with bioactivity data for 2048 marine indole alkaloids (reported up to the end of 2021), we analyzed their overlap with marine natural product chemical diversity. Notably, the C-6 dibrominated α-hydroxy bis(3'-indolyl) and α-methine bis(3'-indolyl) analogues (11, 14, and 17) were found to contain significant overlap with antibacterial C-6 dibrominated marine bis-indoles, guiding our biological evaluation. Validating the results of our cheminformatics analyses, the dibrominated α-methine bis(3'-indolyl) alkaloids (11, 12, 14, and 15) were found to exhibit antibacterial activities against methicillin-sensitive and -resistant Staphylococcus aureus. Further, while investigating other synthetic approaches toward bis-indole alkaloids, 16 incorrectly assigned synthetic α-hydroxy bis(3'-indolyl) alkaloids were identified. After careful analysis of their reported NMR data, and comparison with those obtained for the synthetic bis-indoles reported herein, all of the structures have been revised to α-methine bis(3'-indolyl) alkaloids.


Subject(s)
Anti-Bacterial Agents , Cheminformatics , Indole Alkaloids , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/chemistry , Anti-Bacterial Agents/chemical synthesis , Indole Alkaloids/chemistry , Indole Alkaloids/pharmacology , Indole Alkaloids/chemical synthesis , Cheminformatics/methods , Microbial Sensitivity Tests , Molecular Structure , Structure-Activity Relationship , Biological Products/chemistry , Biological Products/pharmacology , Biological Products/chemical synthesis
5.
J Chem Inf Model ; 64(11): 4392-4409, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38815246

ABSTRACT

By accelerating time-consuming processes with high efficiency, computing has become an essential part of many modern chemical pipelines. Machine learning is a class of computing methods that can discover patterns within chemical data and utilize this knowledge for a wide variety of downstream tasks, such as property prediction or substance generation. The complex and diverse chemical space requires complex machine learning architectures with great learning power. Recently, learning models based on transformer architectures have revolutionized multiple domains of machine learning, including natural language processing and computer vision. Naturally, there have been ongoing endeavors in adopting these techniques to the chemical domain, resulting in a surge of publications within a short period. The diversity of chemical structures, use cases, and learning models necessitate a comprehensive summarization of existing works. In this paper, we review recent innovations in adapting transformers to solve learning problems in chemistry. Because chemical data is diverse and complex, we structure our discussion based on chemical representations. Specifically, we highlight the strengths and weaknesses of each representation, the current progress of adapting transformer architectures, and future directions.


Subject(s)
Cheminformatics , Machine Learning , Cheminformatics/methods
6.
Food Chem ; 454: 139794, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-38797094

ABSTRACT

Sweet potatoes are rich in cardioprotective phytochemicals with potential anti-platelet aggregation (anti-PA) activity, although this benefit may vary among cultivars/genotypes. The phenolic profile [HPLC-ESI(-)-qTOF-MS²], cheminformatics (ADMET properties, affinity toward platelet proteins) and anti-PA activity of phenolic-rich hydroalcoholic extracts obtained from orange (OSP) and purple (PSP) sweet potato storage roots were evaluated. The phenolic richness [hydroxycinnamic acids > flavonoids > benzoic acids] was PSP > OSP. Their main chlorogenic acids could interact with platelet proteins (integrins/adhesins, kinases/metalloenzymes) but their bioavailability could be poor. Only OSP exhibited a dose-dependent anti-PA activity [inductor (IC50, mg·ml⁻¹): thrombin receptor activator peptide-6 (0.55) > adenosine-5'-diphosphate (1.02) > collagen (1.56)] and reduced P-selectin expression (0.75-1.0 mg·ml⁻¹) but not glycoprotein IIb/IIIa secretion. The explored anti-PA activity of OSP/PSP seems to be inversely related to their phenolic richness. The poor first-pass bioavailability of their chlorogenic acids (documented in silico) may represent a further obstacle for their anti-PA activity in vivo.


Subject(s)
Ipomoea batatas , Phenols , Plant Extracts , Plant Roots , Platelet Aggregation Inhibitors , Platelet Aggregation , Ipomoea batatas/chemistry , Phenols/chemistry , Platelet Aggregation/drug effects , Plant Extracts/chemistry , Plant Extracts/pharmacology , Platelet Aggregation Inhibitors/chemistry , Platelet Aggregation Inhibitors/pharmacology , Plant Roots/chemistry , Humans , Cheminformatics , Animals , Blood Platelets/metabolism , Blood Platelets/drug effects
7.
J Chem Inf Model ; 64(7): 2125-2128, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38587006
8.
ACS Chem Biol ; 19(4): 938-952, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38565185

ABSTRACT

Phenotypic assays have become an established approach to drug discovery. Greater disease relevance is often achieved through cellular models with increased complexity and more detailed readouts, such as gene expression or advanced imaging. However, the intricate nature and cost of these assays impose limitations on their screening capacity, often restricting screens to well-characterized small compound sets such as chemogenomics libraries. Here, we outline a cheminformatics approach to identify a small set of compounds with likely novel mechanisms of action (MoAs), expanding the MoA search space for throughput limited phenotypic assays. Our approach is based on mining existing large-scale, phenotypic high-throughput screening (HTS) data. It enables the identification of chemotypes that exhibit selectivity across multiple cell-based assays, which are characterized by persistent and broad structure activity relationships (SAR). We validate the effectiveness of our approach in broad cellular profiling assays (Cell Painting, DRUG-seq, and Promotor Signature Profiling) and chemical proteomics experiments. These experiments revealed that the compounds behave similarly to known chemogenetic libraries, but with a notable bias toward novel protein targets. To foster collaboration and advance research in this area, we have curated a public set of such compounds based on the PubChem BioAssay dataset and made it available for use by the scientific community.


Subject(s)
Drug Discovery , High-Throughput Screening Assays , Small Molecule Libraries , Drug Discovery/methods , High-Throughput Screening Assays/methods , Cheminformatics/methods , Small Molecule Libraries/chemistry , Structure-Activity Relationship
9.
Sci Rep ; 14(1): 9801, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38684706

ABSTRACT

The Covid-19 pandemic outbreak has accelerated tremendous efforts to discover a therapeutic strategy that targets severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) to control viral infection. Various viral proteins have been identified as potential drug targets; however, to date, no specific therapeutic cure is available against SARS-CoV-2. To address this issue, the present work reports a systematic cheminformatic approach to identify potent andrographolide derivatives that can target the methyltransferases of SARS-CoV-2, i.e., nsp14 and nsp16, which are crucial for the replication of the virus and host immune evasion. A consensus of cheminformatics methodologies including virtual screening, molecular docking, ADMET profiling, molecular dynamics simulations, free-energy landscape analysis, molecular mechanics generalized born surface area (MM-GBSA), and density functional theory (DFT) was utilized. Our study reveals two new andrographolide derivatives (PubChem CID: 2734589 and 138968421) as natural bioactive molecules that can form stable complexes with both proteins via hydrophobic interactions, hydrogen bonds and electrostatic interactions. The toxicity analysis predicts class four toxicity for both compounds with LD50 values in the range of 500-700 mg/kg. MD simulation reveals the stable formation of the complex for both compounds, and their average trajectory values were found to be lower than those of the control inhibitor and the protein alone. MM-GBSA analysis corroborates the MD simulation result and showed the lowest energy for compounds 2734589 and 138968421. The DFT and MEP analysis also predicts the better reactivity and stability of both hit compounds. Overall, both andrographolide derivatives exhibit good potential as potent inhibitors of both nsp14 and nsp16 proteins; however, in vitro and in vivo assessment would be required to prove their efficacy and safety in clinical settings. Moreover, the drug discovery strategy aiming at the dual target approach might serve as a useful model for inventing novel drug molecules for various other diseases.
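
The reported LD50 range and toxicity class are consistent with the Globally Harmonized System (GHS) bands for acute oral toxicity, in which class 4 spans LD50 values above 300 and up to 2000 mg/kg. A minimal sketch of that mapping follows. The function name `toxicity_class` is hypothetical, and the class 6 bucket above 5000 mg/kg is an assumption about the six-class convention that some prediction tools use; the abstract does not say which tool or scale was applied.

```python
# Illustrative helper, not taken from the paper: map an oral LD50 (mg/kg)
# to a GHS-style acute toxicity class. Class 6 (> 5000 mg/kg) is an assumed
# extension used by some in silico predictors.
def toxicity_class(ld50_mg_per_kg: float) -> int:
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    # Inclusive upper bounds for classes 1 to 5, in mg/kg.
    upper_bounds = [5, 50, 300, 2000, 5000]
    for cls, bound in enumerate(upper_bounds, start=1):
        if ld50_mg_per_kg <= bound:
            return cls
    return 6


# Both ends of the range quoted in the abstract fall in the same class.
for ld50 in (500, 700):
    print(ld50, "mg/kg -> class", toxicity_class(ld50))
```

Under these bands, any LD50 between 500 and 700 mg/kg lands in class 4, which matches the prediction quoted above.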


Subject(s)
Antiviral Agents , Diterpenes , Methyltransferases , Molecular Docking Simulation , Molecular Dynamics Simulation , SARS-CoV-2 , Viral Nonstructural Proteins , Diterpenes/pharmacology , Diterpenes/chemistry , SARS-CoV-2/drug effects , SARS-CoV-2/enzymology , Methyltransferases/antagonists & inhibitors , Methyltransferases/chemistry , Methyltransferases/metabolism , Antiviral Agents/pharmacology , Antiviral Agents/chemistry , Humans , Viral Nonstructural Proteins/antagonists & inhibitors , Viral Nonstructural Proteins/chemistry , Viral Nonstructural Proteins/metabolism , Cheminformatics/methods , COVID-19/virology , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , COVID-19 Drug Treatment
10.
J Org Chem ; 89(7): 4932-4946, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38451837

ABSTRACT

The concise synthesis of a small library of fluorinated piperidines from readily available dihydropyridinone derivatives has been described. The effect of the fluorination on different positions has then been evaluated by chemoinformatic tools. In particular, the compounds' pKa's have been calculated, revealing that the fluorine atoms notably lowered their basicity, which is correlated to the affinity for hERG channels resulting in cardiac toxicity. The "lead-likeness" and three-dimensionality have also been evaluated to assess their ability as useful fragments for drug design. A random screening on a panel of representative proteolytic enzymes was then carried out and revealed that one scaffold is recognized by the catalytic pocket of 3CLPro (main protease of SARS-CoV-2 coronavirus).


Subject(s)
Cheminformatics , Drug Discovery , SARS-CoV-2 , Drug Design , Protease Inhibitors/pharmacology , Antiviral Agents/pharmacology
11.
SLAS Discov ; 29(4): 100155, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38518955

ABSTRACT

In June 2022, EU-OS decided to make public a solubility data set of more than 100,000 compounds obtained from several of the EU-OS proprietary screening compound collections. Given the interest of SLAS in the scientific development of screening, it was decided to launch a joint EUOS-SLAS competition within the chemoinformatics and machine learning (ML) communities. The competition was open to real-world computation experts and sought the best, most predictive classification model of compound solubility. The competition had several aims. From a practical side, the winning model should serve as a cornerstone for future solubility predictions, having used the largest training set publicly available so far. From a higher project perspective, the intent was to focus the energies and experience of participants, even those not coming professionally from Pharma R&D, on the issue of how to predict compound solubility. Here we report how the competition was conceived and the practical aspects of conducting it within the Kaggle framework, leveraging the versatility and the open-source nature of this data science platform. The results and the challenges encountered are also examined.


Subject(s)
Machine Learning , Solubility , Cheminformatics/methods , Humans , Drug Discovery/methods
12.
J Am Chem Soc ; 146(12): 8016-8030, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38470819

ABSTRACT

There have been significant advances in the flexibility and power of in vitro cell-free translation systems. The increasing ability to incorporate noncanonical amino acids and complement translation with recombinant enzymes has enabled cell-free production of peptide-based natural products (NPs) and NP-like molecules. We anticipate that many more such compounds and analogs might be accessed in this way. To assess the peptide NP space that is directly accessible to current cell-free technologies, we developed a peptide parsing algorithm that breaks down peptide NPs into building blocks based on ribosomal translation logic. Using the resultant data set, we broadly analyze the biophysical properties of these privileged compounds and perform a retrobiosynthetic analysis to predict which peptide NPs could be directly synthesized in augmented cell-free translation reactions. We then tested these predictions by preparing a library of highly modified peptide NPs. Two macrocyclases, PatG and PCY1, were used to effect the head-to-tail macrocyclization of candidate NPs. This retrobiosynthetic analysis identified a collection of high-priority building blocks that are enriched throughout peptide NPs, yet they had not previously been tested in cell-free translation. To expand the cell-free toolbox into this space, we established, optimized, and characterized the flexizyme-enabled ribosomal incorporation of piperazic acids. Overall, these results demonstrate the feasibility of cell-free translation for peptide NP total synthesis while expanding the limits of the technology. This work provides a novel computational tool for exploration of peptide NP chemical space, that could be expanded in the future to allow design of ribosomal biosynthetic pathways for NPs and NP-like molecules.


Subject(s)
Biological Products , Biological Products/chemistry , Cheminformatics , Peptides/chemistry , Peptide Biosynthesis , Amino Acids
13.
Cell ; 187(9): 2194-2208.e22, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38552625

ABSTRACT

Effective treatments for complex central nervous system (CNS) disorders require drugs with polypharmacology and multifunctionality, yet designing such drugs remains a challenge. Here, we present a flexible scaffold-based cheminformatics approach (FSCA) for the rational design of polypharmacological drugs. FSCA involves fitting a flexible scaffold to different receptors using different binding poses, as exemplified by IHCH-7179, which adopted a "bending-down" binding pose at 5-HT2AR to act as an antagonist and a "stretching-up" binding pose at 5-HT1AR to function as an agonist. IHCH-7179 demonstrated promising results in alleviating cognitive deficits and psychoactive symptoms in mice by blocking 5-HT2AR for psychoactive symptoms and activating 5-HT1AR to alleviate cognitive deficits. By analyzing aminergic receptor structures, we identified two featured motifs, the "agonist filter" and "conformation shaper," which determine ligand binding pose and predict activity at aminergic receptors. With these motifs, FSCA can be applied to the design of polypharmacological ligands at other receptors.


Subject(s)
Cheminformatics , Drug Design , Polypharmacology , Animals , Mice , Humans , Cheminformatics/methods , Ligands , Receptor, Serotonin, 5-HT2A/metabolism , Receptor, Serotonin, 5-HT2A/chemistry , Receptor, Serotonin, 5-HT1A/metabolism , Receptor, Serotonin, 5-HT1A/chemistry , Male , Binding Sites
14.
Adv Protein Chem Struct Biol ; 139: 27-55, 2024.
Article in English | MEDLINE | ID: mdl-38448138

ABSTRACT

The integration of computational resources and chemoinformatics has revolutionized translational health research. It has offered a powerful set of tools for accelerating drug discovery. This chapter overviews the computational resources and chemoinformatics methods used in translational health research. The resources and methods can be used to analyze large datasets, identify potential drug candidates, predict drug-target interactions, and optimize treatment regimens. These resources have the potential to transform the drug discovery process and foster personalized medicine research. We discuss insights into their various applications in translational health and emphasize the need for addressing challenges, promoting collaboration, and advancing the field to fully realize the potential of these tools in transforming healthcare.


Subject(s)
Cheminformatics , Drug Discovery , Precision Medicine
15.
J Chem Inf Model ; 64(6): 1966-1974, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38437714

ABSTRACT

Chemical diversity is challenging to describe objectively. Despite this, various notions of chemical diversity are used throughout the medicinal chemistry optimization process in drug discovery. In this work, we show the usefulness of considering exploited vectors during different phases of the drug design process to provide a quantitative and objective description of chemical diversity. We have developed a concise and fast approach to enumerate and analyze the exploited vector patterns (EVPs) of molecular compound series, which can then be used in archetypal compound selection tasks, from hit matter identification to hit expansion and lead optimization. We first show that EVPs can be used to assess the progressibility of compounds in a fragment library design exercise. By considering EVPs, we then show how a set of compounds can be prioritized for hit expansion using EVP-based, customizable diversity sampling approaches, reducing the time taken and mitigating human biases. We also show that EVPs are a useful tool to analyze SAR data, offering the chance to uncover correlations between different vectors without predetermining the molecular scaffold structures. The codes used to perform these tasks are presented as easy-to-use Jupyter notebooks, which can be readily adapted for further related tasks.


Subject(s)
Cheminformatics , Drug Discovery , Humans , Drug Design , Molecular Structure , Chemistry, Pharmaceutical
16.
J Chem Inf Model ; 64(8): 2948-2954, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38488634

ABSTRACT

SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. As an extension to SMILES, many SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm invariant of the line notation sequence or atomic numbering is publicly available. Here, we introduce RDCanon, an open-source Python package that can be used to standardize SMARTS queries. RDCanon is designed to ensure that the sequence of atomic queries remains consistent for all graphs representing the same substructure query and to ensure a canonical sequence of primitives within each individual atom query; furthermore, the algorithm can be applied to canonicalize the order of reactants, agents, and products and their atom map numbers in reaction SMARTS templates. As part of its canonicalization algorithm, RDCanon provides a mechanism in which the canonicalized SMARTS is optimized for speed against specific molecular databases. Several case studies are provided to showcase improved efficiency in substructure matching and retrosynthetic analysis.
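
To make the idea of "a canonical sequence of primitives within each individual atom query" concrete, here is a toy sketch in plain Python. It is not RDCanon's algorithm and does not use its API. It only sorts the operands of the explicit SMARTS logical operators (`;`, `,` and `&`, from lowest to highest precedence) inside one bracketed atom, so that reordered but equivalent queries map to the same string. The function name `canonical_atom_query` is hypothetical, and recursive SMARTS, atom map numbers and implicit (juxtaposed) AND are deliberately out of scope.

```python
def canonical_atom_query(atom_smarts: str) -> str:
    """Sort the operands of ';', ',' and '&' inside one bracketed atom query.

    Toy illustration only: lexicographic order stands in for whatever
    ordering a real canonicalizer would use.
    """
    if not (atom_smarts.startswith("[") and atom_smarts.endswith("]")):
        raise ValueError("expected a bracketed atom query such as '[C;H2]'")
    body = atom_smarts[1:-1]
    if "$(" in body or ":" in body:
        raise ValueError("recursive SMARTS and atom maps are out of scope")

    def sort_level(expr: str, operators: list) -> str:
        # Split on the lowest-precedence operator first, then recurse into
        # each operand with the remaining, tighter-binding operators.
        if not operators:
            return expr
        op, rest = operators[0], operators[1:]
        parts = [sort_level(part, rest) for part in expr.split(op)]
        return op.join(sorted(parts))

    return "[" + sort_level(body, [";", ",", "&"]) + "]"


# Two spellings of "carbon or nitrogen, with two hydrogens" collapse to one.
print(canonical_atom_query("[H2;N,C]"))
print(canonical_atom_query("[C,N;H2]"))
```

Because all three operators are commutative, reordering their operands does not change what the query matches, which is why a fixed ordering can serve as a canonical form for this restricted case.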


Subject(s)
Algorithms , Software , Programming Languages , Cheminformatics/methods , Databases, Chemical
17.
J Chem Inf Model ; 64(8): 3173-3179, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38554112

ABSTRACT

In this work, we propose a versatile molecule and reaction encoding binary data format that aims to bridge the gap between the advantages of SMILES, like local stereo- and implicit hydrogen encoding, and block-structured MDL MOL with a 2D layout and explicit bond encoding, while addressing their respective limitations. Our new format introduces a balance between size efficiency, processing speed, and comprehensive representation, making it well-suited for various applications in cheminformatics, including deep learning, data storage, and searching. By offering an explicit approach to store atom connectivity (including implicit hydrogens), electronic state, stereochemistry, and other crucial molecular attributes, our proposal seeks to enhance data storage efficiency and promote interoperability among different software tools.


Subject(s)
Cheminformatics , Software , Cheminformatics/methods , Molecular Structure
18.
Expert Opin Drug Discov ; 19(4): 403-414, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38300511

ABSTRACT

INTRODUCTION: Large chemical spaces (CSs) include traditional large compound collections, combinatorial libraries covering billions to trillions of molecules, DNA-encoded chemical libraries comprising complete combinatorial CSs in a single mixture, and virtual CSs explored by generative models. The diverse nature of these types of CSs require different chemoinformatic approaches for navigation. AREAS COVERED: An overview of different types of large CSs is provided. Molecular representations and similarity metrics suitable for large CS exploration are discussed. A summary of navigation of CSs in generative models is provided. Methods for characterizing and comparing CSs are discussed. EXPERT OPINION: The size of large CSs might restrict navigation to specialized algorithms and limit it to considering neighborhoods of structurally similar molecules. Efficient navigation of large CSs not only requires methods that scale with size but also requires smart approaches that focus on better but not necessarily larger molecule selections. Deep generative models aim to provide such approaches by implicitly learning features relevant for targeted biological properties. It is unclear whether these models can fulfill this ideal as validation is difficult as long as the covered CSs remain mainly virtual without experimental verification.
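
As a small illustration of a similarity metric and of a "neighborhood of structurally similar molecules", the sketch below computes the Tanimoto (Jaccard) coefficient on fingerprints stored as sets of feature indices and keeps library entries above a threshold. The abstract does not name a specific metric, so Tanimoto is an assumption here, chosen because it is widely used with binary fingerprints. The molecule names, feature indices and the helper `neighborhood` are made up for the example.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: empty matches empty
    return len(a & b) / len(a | b)


def neighborhood(query: Set[int], library: Dict[str, Set[int]],
                 threshold: float) -> List[str]:
    """Names of library entries whose similarity to the query meets the threshold."""
    return sorted(name for name, fp in library.items()
                  if tanimoto(query, fp) >= threshold)


# Made-up fingerprints: each set lists the indices of bits that are set.
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3},
}
query = {1, 4, 7}

print(tanimoto(query, library["mol_a"]))   # 3 shared bits out of 4 in the union
print(neighborhood(query, library, 0.5))
```

This linear scan is fine for a handful of molecules. For the billion-scale spaces the review describes, the point of the abstract is precisely that such exhaustive comparison stops being practical and specialized algorithms are needed.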


Subject(s)
Algorithms , Cheminformatics , Humans
19.
Food Chem ; 442: 138525, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38271906

ABSTRACT

Species mislabeling of commercial loliginidae squid can undermine important conservation efforts and prevent consumers from making informed decisions. A comprehensive lipidomic fingerprint of Uroteuthis singhalensis, Uroteuthis edulis, and Uroteuthis duvauceli rings was established using high-resolution mass spectrometry-based lipidomics and chemoinformatics analysis. The principal component analysis showed a clear separation of sample groups, with R²X and Q² values of 0.97 and 0.85 for ESI+ and 0.96 and 0.86 for ESI-, indicating a good model fit. The optimized OPLS-DA and PLS-DA models could discriminate the species identity of validation samples with 100% accuracy. A total of 67 and 90 lipid molecules were putatively identified as biomarkers in ESI+ and ESI-, respectively. Identified lipids, including PC(40:6), C14 sphingomyelin, PS(O-36:0), and PE(41:4), played an important role in species discrimination. For the first time, this study provides a detailed lipidomics profile of commercially important loliginidae squid and establishes a faster workflow for species authentication.


Subject(s)
Lipidomics , Tandem Mass Spectrometry , Chromatography, High Pressure Liquid , Cheminformatics
20.
J Chem Inf Model ; 64(3): 638-652, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38294781

ABSTRACT

A simple approach was developed to computationally construct a polymer dataset by combining simplified molecular-input line-entry system (SMILES) strings of a targeted polymer backbone and a variety of molecular fragments. This method was used to create 14 polymer datasets by combining seven polymer backbones and molecules from two large molecular datasets (MOSES and QM9). Polymer backbones that were studied include four polydimethylsiloxane (PDMS) based backbones, poly(ethylene oxide) (PEO), poly(allyl glycidyl ether) (PAGE), and polyphosphazene (PPZ). The generated polymer datasets can be used for various cheminformatics tasks, including high-throughput screening for gas permeability and selectivity. This study utilized machine learning (ML) models to screen the polymers for CO₂/CH₄ and CO₂/N₂ gas separation using membranes. Several polymers of interest were identified. The results highlight that employing an ML model fitted to polymer selectivities leads to higher accuracy in predicting polymer selectivity compared to using the ratio of predicted permeabilities.
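
The string-combination idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the `{R}` placeholder, the PDMS-like repeat-unit template and the three side-group fragments are all hypothetical, and the sketch assumes each fragment attaches through its first atom and shares no ring-closure digits with the backbone. The second helper shows the "ratio of permeabilities" definition of ideal selectivity that the abstract compares against a directly fitted selectivity model; the numbers passed to it are placeholders, not measured values.

```python
from typing import List


def build_polymer_smiles(backbone_template: str, fragments: List[str]) -> List[str]:
    """Insert each fragment SMILES into the single '{R}' slot of a backbone.

    '[*]' marks the repeat-unit connection points; '{R}' is a hypothetical
    placeholder for the side group.
    """
    if backbone_template.count("{R}") != 1:
        raise ValueError("template must contain exactly one '{R}' placeholder")
    return [backbone_template.replace("{R}", frag) for frag in fragments]


def ideal_selectivity(permeability_a: float, permeability_b: float) -> float:
    """Ideal selectivity of gas A over gas B as the ratio of permeabilities."""
    if permeability_b <= 0:
        raise ValueError("permeability of gas B must be positive")
    return permeability_a / permeability_b


# Hypothetical PDMS-like repeat unit with one methyl replaced by a side group.
template = "[*]O[Si](C)({R})[*]"
fragments = ["CCO", "c1ccccc1", "CC(F)(F)F"]  # illustrative side groups only

for smiles in build_polymer_smiles(template, fragments):
    print(smiles)

# Placeholder permeabilities in arbitrary but identical units.
print(ideal_selectivity(100.0, 4.0))
```

A real pipeline would also need to validate each generated string with a cheminformatics toolkit and decide which atom of a fragment serves as the attachment point, details the abstract does not specify.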


Subject(s)
Carbon Dioxide , Polymers , Polyethylene Glycols , Cheminformatics , High-Throughput Screening Assays