Results 1 - 20 of 129
1.
J Cheminform ; 16(1): 96, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39118180

ABSTRACT

An automated pipeline for comprehensive calculation of intermolecular interaction energies based on molecular force-fields using the Tinker molecular modelling package is presented. Starting with non-optimized chemically intuitive monomer structures, the pipeline allows the approximation of global minimum energy monomers and dimers, configuration sampling for various monomer-monomer distances, estimation of coordination numbers by molecular dynamics simulations, and the evaluation of differential pair interaction energies. The latter are used to derive Flory-Huggins parameters and isotropic particle-particle repulsions for Dissipative Particle Dynamics (DPD). The computational results for force fields MM3, MMFF94, OPLS-AA and AMOEBA09 are analyzed with Density Functional Theory (DFT) calculations and DPD simulations for a mixture of the non-ionic polyoxyethylene alkyl ether surfactant C10E4 with water to demonstrate the usefulness of the approach. SCIENTIFIC CONTRIBUTION: To our knowledge, there is currently no open computational pipeline for differential pair interaction energies at all. This work aims to contribute an (at least academically available, open) approach based on molecular force fields that provides a robust and efficient computational scheme for their automated calculation for small to medium-sized (organic) molecular dimers. The usefulness of the proposed new calculation scheme is demonstrated for the generation of mesoscopic particles with their mutual repulsive interactions.
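As a rough illustration of how a differential pair interaction energy could be turned into a Flory-Huggins parameter and a DPD repulsion, the minimal sketch below uses the textbook lattice definition chi = Z * dE / (R * T) with dE = E_ij - (E_ii + E_jj) / 2, and the commonly cited Groot-Warren linear relation chi = 0.286 * (a_ij - a_ii) for a DPD number density of 3. The abstract does not state the exact formulas, so the function names, the coefficient 0.286, the default a_ii = 25 and all input values are assumptions made for illustration and may differ from what the pipeline actually does.

```python
# Hypothetical sketch: names, defaults and units are illustrative assumptions,
# not taken from the published pipeline.

# Gas constant in kJ/(mol K); all energies are assumed to be given in kJ/mol.
R_KJ_PER_MOL_K = 8.314462618e-3


def differential_pair_energy(e_ij, e_ii, e_jj):
    """Energy of an unlike pair relative to the mean of the two like pairs."""
    return e_ij - 0.5 * (e_ii + e_jj)


def flory_huggins_chi(e_ij, e_ii, e_jj, coordination_number, temperature_k):
    """Textbook lattice estimate: chi = Z * dE / (R * T)."""
    delta_e = differential_pair_energy(e_ij, e_ii, e_jj)
    return coordination_number * delta_e / (R_KJ_PER_MOL_K * temperature_k)


def dpd_repulsion(chi, a_ii=25.0, chi_per_delta_a=0.286):
    """Linear Groot-Warren style mapping (assumed, for DPD density 3):
    chi = chi_per_delta_a * (a_ij - a_ii)."""
    return a_ii + chi / chi_per_delta_a


if __name__ == "__main__":
    # Made-up pair energies, coordination number and temperature.
    chi = flory_huggins_chi(-10.0, -12.0, -11.0,
                            coordination_number=6.0, temperature_k=298.0)
    print("chi  =", chi)
    print("a_ij =", dpd_repulsion(chi))
```

Keeping the conversion coefficient as a parameter makes it easy to swap in a different density or a different chi-to-repulsion mapping without touching the energy code.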

2.
J Cheminform ; 16(1): 78, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38970120

ABSTRACT

Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information in traditional laboratory notebooks or facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure Recognition (OCSR) software. To address this, we present an enhanced Deep lEarning for Chemical ImagE Recognition (DECIMER) architecture that leverages a combination of Convolutional Neural Networks (CNNs) and Transformers to improve the recognition of hand-drawn chemical structures. The model incorporates an EfficientNetV2 CNN encoder that extracts features from hand-drawn images, followed by a Transformer decoder that converts the extracted features into Simplified Molecular Input Line Entry System (SMILES) strings. Our models were trained using synthetic hand-drawn images generated by RanDepict, a tool for depicting chemical structures with different style elements. A benchmark was performed using a real-world dataset of hand-drawn chemical structures to evaluate the model's performance. The results indicate that our improved DECIMER architecture exhibits a significantly enhanced recognition accuracy compared to other approaches. SCIENTIFIC CONTRIBUTION: The new DECIMER model presented here refines our previous research efforts and is currently the only open-source model tailored specifically for the recognition of hand-drawn chemical structures. The enhanced model performs better in handling variations in handwriting styles, line thicknesses, and background noise, making it suitable for real-world applications. The DECIMER hand-drawn structure recognition model and its source code have been made available as an open-source package under a permissive license.

3.
Metabolites ; 14(2)2024 Feb 10.
Article in English | MEDLINE | ID: mdl-38393009

ABSTRACT

Scientific workflows facilitate the automation of data analysis tasks by integrating various software and tools executed in a particular order. To enable transparency and reusability in workflows, it is essential to implement the FAIR principles. Here, we describe our experiences implementing the FAIR principles for metabolomics workflows using the Metabolome Annotation Workflow (MAW) as a case study. MAW is specified using the Common Workflow Language (CWL), allowing for the subsequent execution of the workflow on different workflow engines. MAW is registered using a CWL description on WorkflowHub. During the submission process on WorkflowHub, a CWL description is used for packaging MAW using the Workflow RO-Crate profile, which includes metadata in Bioschemas. Researchers can use this narrative discussion as a guideline to commence using FAIR practices for their bioinformatics or cheminformatics workflows while incorporating necessary amendments specific to their research area.

4.
Crit Rev Microbiol ; : 1-40, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38270170

ABSTRACT

Microbial communities thrive through interactions and communication, which are challenging to study as most microorganisms are not cultivable. To address this challenge, researchers focus on the extracellular space where communication events occur. Exometabolomics and interactome analysis provide insights into the molecules involved in communication and the dynamics of their interactions. Advances in sequencing technologies and computational methods enable the reconstruction of taxonomic and functional profiles of microbial communities using high-throughput multi-omics data. Network-based approaches, including community flux balance analysis, aim to model molecular interactions within and between communities. Despite these advances, challenges remain in computer-assisted biosynthetic capacities elucidation, requiring continued innovation and collaboration among diverse scientists. This review provides insights into the current state and future directions of computer-assisted biosynthetic capacities elucidation in studying microbial communities.


Computer-assisted biosynthetic capacities elucidation accelerates our ability to interpret microbial interactions, allowing us to understand better and establish a balance within ecosystems.

5.
Magn Reson Chem ; 62(2): 74-83, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38112483

ABSTRACT

In October 2003, 20 years ago, the open-source and open-content database NMRshiftDB was announced. Since then, the database, renamed as nmrshiftdb2 later, has been continuously available and is one of the longer-running projects in the field of open data in chemistry. After 20 years, we evaluate the success of the project and present lessons learnt for similar projects.

6.
Front Microbiol ; 14: 1295994, 2023.
Article in English | MEDLINE | ID: mdl-38116530

ABSTRACT

Diatoms (Bacillariophyceae) are aquatic photosynthetic microalgae with an ecological role as primary producers in the aquatic food web. They account substantially for global carbon, nitrogen, and silicon cycling. Elucidating the chemical space of diatoms is crucial to understanding their physiology and ecology. To expand the known chemical space of a cosmopolitan marine diatom, Skeletonema marinoi, we performed High-Resolution Liquid Chromatography-Tandem Mass Spectrometry (LC-MS2) for untargeted metabolomics data acquisition. The spectral data from LC-MS2 was used as input for the Metabolome Annotation Workflow (MAW) to obtain putative annotations for all measured features. A suspect list of metabolites previously identified in the Skeletonema spp. was generated to verify the results. These known metabolites were then added to the putative candidate list from LC-MS2 data to represent an expanded catalog of 1970 metabolites estimated to be produced by S. marinoi. The most prevalent chemical superclasses, based on the ChemONT ontology in this expanded dataset, were organic acids and derivatives, organoheterocyclic compounds, lipids and lipid-like molecules, and organic oxygen compounds. The metabolic profile from this study can aid the bioprospecting of marine microalgae for medicine, biofuel production, agriculture, and environmental conservation. The proposed analysis can be applicable for assessing the chemical space of other microalgae, which can also provide molecular insights into the interaction between marine organisms and their role in the functioning of ecosystems.

7.
J Cheminform ; 15(1): 98, 2023 Oct 16.
Article in English | MEDLINE | ID: mdl-37845745

ABSTRACT

In recent years, cheminformatics has experienced significant advancements through the development of new open-source software tools based on various cheminformatics programming toolkits. However, adopting these toolkits presents challenges, including proper installation, setup, deployment, and compatibility management. In this work, we present the Cheminformatics Microservice. This open-source solution provides a unified interface for accessing commonly used functionalities of multiple cheminformatics toolkits, namely RDKit, Chemistry Development Kit (CDK), and Open Babel. In addition, more advanced functionalities like structure generation and Optical Chemical Structure Recognition (OCSR) are made available through the Cheminformatics Microservice based on pre-existing tools. The software service also enables developers to extend the functionalities easily and to seamlessly integrate them with existing workflows and applications. It is built on FastAPI and containerized using Docker, making it highly scalable. An instance of the microservice is publicly available at https://api.naturalproducts.net . The source code is publicly accessible on GitHub, accompanied by comprehensive documentation, version control, and continuous integration and deployment workflows. All resources can be found at the following link: https://github.com/Steinbeck-Lab/cheminformatics-microservice .

9.
Nat Commun ; 14(1): 5045, 2023 Aug 19.
Article in English | MEDLINE | ID: mdl-37598180

ABSTRACT

The number of publications describing chemical structures has increased steadily over the last decades. However, the majority of published chemical information is currently not available in machine-readable form in public databases. It remains a challenge to automate the process of information extraction in a way that requires less manual intervention - especially the mining of chemical structure depictions. As an open-source platform that leverages recent advancements in deep learning, computer vision, and natural language processing, DECIMER.ai (Deep lEarning for Chemical IMagE Recognition) strives to automatically segment, classify, and translate chemical structure depictions from the printed literature. The segmentation and classification tools are the only openly available packages of their kind, and the optical chemical structure recognition (OCSR) core application yields outstanding performance on all benchmark datasets. The source code, the trained models and the datasets developed in this work have been published under permissive licences. An instance of the DECIMER web application is available at https://decimer.ai .

10.
J Cheminform ; 15(1): 32, 2023 Mar 04.
Article in English | MEDLINE | ID: mdl-36871033

ABSTRACT

Mapping the chemical space of compounds to chemical structures remains a challenge in metabolomics. Despite the advancements in untargeted liquid chromatography-mass spectrometry (LC-MS) to achieve a high-throughput profile of metabolites from complex biological resources, only a small fraction of these metabolites can be annotated with confidence. Many novel computational methods and tools have been developed to enable chemical structure annotation of known and unknown compounds, such as in silico generated spectra and molecular networking. Here, we present an automated and reproducible Metabolome Annotation Workflow (MAW) for untargeted metabolomics data to further facilitate and automate the complex annotation by combining tandem mass spectrometry (MS2) input data pre-processing, spectral and compound database matching with computational classification, and in silico annotation. MAW takes the LC-MS2 spectra as input and generates a list of putative candidates from spectral and compound databases. The databases are integrated via the R package Spectra and the metabolite annotation tool SIRIUS as part of the R segment of the workflow (MAW-R). The final candidate selection is performed using the cheminformatics tool RDKit in the Python segment (MAW-Py). Furthermore, each feature is assigned a chemical structure and can be imported to a chemical structure similarity network. MAW follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles and has been made available as the Docker images maw-r and maw-py. The source code and documentation are available on GitHub ( https://github.com/zmahnoor14/MAW ). The performance of MAW is evaluated on two case studies. MAW can improve candidate ranking by integrating spectral databases with annotation tools like SIRIUS, which contributes to an efficient candidate selection procedure. The results from MAW are also reproducible and traceable, compliant with the FAIR guidelines. Taken together, MAW could greatly facilitate automated metabolite characterization in diverse fields such as clinical metabolomics and natural product discovery.

11.
Molecules ; 28(3)2023 Feb 02.
Article in English | MEDLINE | ID: mdl-36771127

ABSTRACT

The structure elucidation of small organic molecules (<1500 Dalton) through 1D and 2D nuclear magnetic resonance (NMR) data analysis is a potentially challenging, combinatorial problem. This publication presents Sherlock, a free and open-source Computer-Assisted Structure Elucidation (CASE) software where the user controls the chain of elementary operations through a versatile graphical user interface, including spectral peak picking, addition of automatically or user-defined structure constraints, structure generation, ranking and display of the solutions. A set of forty-five compounds was selected in order to illustrate the new possibilities offered to organic chemists by Sherlock for improving the reliability and traceability of structure elucidation results.

12.
Curr Opin Struct Biol ; 79: 102542, 2023 04.
Article in English | MEDLINE | ID: mdl-36805192

ABSTRACT

Recent years have seen a sharp increase in the development of deep learning and artificial intelligence-based molecular informatics. There has been a growing interest in applying deep learning to several subfields, including the digital transformation of synthetic chemistry, extraction of chemical information from the scientific literature, and AI in natural product-based drug discovery. The application of AI to molecular informatics is still constrained by the fact that most of the data used for training and testing deep learning models are not available as FAIR and open data. As open science practices continue to grow in popularity, initiatives which support FAIR and open data as well as open-source software have emerged. It is becoming increasingly important for researchers in the field of molecular informatics to embrace open science and to submit data and software in open repositories. With the advent of open-source deep learning frameworks and cloud computing platforms, academic researchers are now able to deploy and test their own deep learning models with ease. With the development of new and faster hardware for deep learning and the increasing number of initiatives towards digital research data management infrastructures, as well as a culture promoting open data, open source, and open science, AI-driven molecular informatics will continue to grow. This review examines the current state of open data and open algorithms in molecular informatics, as well as ways in which they could be improved in future.


Subjects
Artificial Intelligence, Machine Learning, Algorithms, Software, Informatics

13.
J Cheminform ; 15(1): 23, 2023 Feb 19.
Article in English | MEDLINE | ID: mdl-36803857

ABSTRACT

The influence of molecular fragmentation and parameter settings on a mesoscopic dissipative particle dynamics (DPD) simulation of lamellar bilayer formation for a C10E4/water mixture is studied. A "bottom-up" decomposition of C10E4 into the smallest fragment molecules (particles) that satisfy chemical intuition leads to convincing simulation results which agree with experimental findings for bilayer formation and thickness. For integration of the equations of motion, Shardlow's S1 scheme proves to be a favorable choice with the best overall performance. Increasing the integration time steps above the common setting of 0.04 DPD units leads to increasingly unphysical temperature drifts, but also to increasingly rapid formation of bilayer superstructures without significantly distorted particle distributions up to an integration time step of 0.12. A scaling of the mutual particle-particle repulsions that guide the dynamics has negligible influence within a considerable range of values but exhibits apparent lower thresholds beyond which a simulation fails. Repulsion parameter scaling and molecular particle decomposition show a mutual dependence. For mapping of concentrations to molecule numbers in the simulation box, particle volume scaling should be taken into account. A repulsion parameter morphing investigation suggests that repulsion parameter accuracy considerations should not be overstretched.

14.
J Cheminform ; 15(1): 1, 2023 Jan 02.
Article in English | MEDLINE | ID: mdl-36593523

ABSTRACT

Developing and implementing computational algorithms for the extraction of specific substructures from molecular graphs (in silico molecule fragmentation) is an iterative process. It involves repeated sequences of implementing a rule set, applying it to relevant structural data, checking the results, and adjusting the rules. This requires a computational workflow with data import, fragmentation algorithm integration, and result visualisation. The described workflow is normally unavailable for a new algorithm and must be set up individually. This work presents an open Java rich client Graphical User Interface (GUI) application to support the development of new in silico molecule fragmentation algorithms and make them readily available upon release. The MORTAR (MOlecule fRagmenTAtion fRamework) application visualises fragmentation results of a set of molecules in various ways and provides basic analysis features. Fragmentation algorithms can be integrated and developed within MORTAR by using a specific wrapper class. In addition, fragmentation pipelines with any combination of the available fragmentation methods can be executed. Upon release, three fragmentation algorithms are already integrated: ErtlFunctionalGroupsFinder, Sugar Removal Utility, and Scaffold Generator. These algorithms, as well as all cheminformatics functionalities in MORTAR, are implemented based on the Chemistry Development Kit (CDK).

15.
J Cheminform ; 14(1): 85, 2022 Dec 13.
Article in English | MEDLINE | ID: mdl-36510332

ABSTRACT

Homologous series are groups of related compounds that share the same core structure attached to a motif that repeats to different degrees. Compounds forming homologous series are of interest in multiple domains, including natural products, environmental chemistry, and drug design. However, many homologous compounds remain unannotated as such in compound datasets, which poses obstacles to understanding chemical diversity and their analytical identification via database matching. To overcome these challenges, an algorithm to detect homologous series within compound datasets was developed and implemented using the RDKit. The algorithm takes a list of molecules as SMILES strings and a monomer (i.e., repeating unit) encoded as SMARTS as its main inputs. In an iterative process, substructure matching of repeating units, molecule fragmentation, and core detection lead to homologous series classification through grouping of identical cores. Three open compound datasets from environmental chemistry (NORMAN Suspect List Exchange, NORMAN-SLE), exposomics (PubChemLite for Exposomics), and natural products (the COlleCtion of Open NatUral producTs, COCONUT) were subject to homologous series classification using the algorithm. Over 2000, 12,000, and 5000 series with CH2 repeating units were classified in the NORMAN-SLE, PubChemLite, and COCONUT respectively. Validation of classified series was performed using published homologous series and structure categories, including a comparison with a similar existing method for categorising PFAS compounds. The OngLai algorithm and its implementation for classifying homologues are openly available at: https://github.com/adelenelai/onglai-classify-homologues .
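The grouping idea described above (find the repeating unit, cut it out, group molecules that share the remaining core) can be illustrated with a deliberately simplified toy. The sketch below is not the OngLai algorithm and does not use RDKit: it stands in for SMARTS substructure matching with a plain regular expression over SMILES text, and it only handles molecules containing exactly one unbranched run of aliphatic carbon atoms. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Toy stand-in for a CH2 repeating-unit match: a run of two or more aliphatic
# carbon symbols. "C(?![a-z])" avoids consuming the C of two-letter symbols
# such as "Cl". Real tools match substructures on the molecular graph instead.
CHAIN = re.compile(r"(?:C(?![a-z])){2,}")


def split_core_and_chain(smiles):
    """Return (core, chain_length), or None if there is not exactly one run."""
    runs = list(CHAIN.finditer(smiles))
    if len(runs) != 1:
        return None
    run = runs[0]
    # Replace the carbon run by a "*" placeholder to obtain a comparable core.
    core = smiles[:run.start()] + "*" + smiles[run.end():]
    return core, len(run.group())


def classify_homologues(smiles_list):
    """Group molecules by identical core; keep cores seen with at least two
    different chain lengths, members ordered by chain length."""
    by_core = defaultdict(dict)
    for smiles in smiles_list:
        parts = split_core_and_chain(smiles)
        if parts is None:
            continue
        core, chain_length = parts
        by_core[core][chain_length] = smiles
    return {
        core: [members[n] for n in sorted(members)]
        for core, members in by_core.items()
        if len(members) >= 2
    }


if __name__ == "__main__":
    print(classify_homologues(["CCCCO", "CCO", "CCCl", "CCCO", "c1ccccc1"]))
```

String matching of this kind breaks on branching, rings and alternative SMILES spellings of the same molecule, which is why a graph-based toolkit is used in practice.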

16.
J Cheminform ; 14(1): 79, 2022 Nov 10.
Article in English | MEDLINE | ID: mdl-36357931

ABSTRACT

The concept of molecular scaffolds as defining core structures of organic molecules is utilised in many areas of chemistry and cheminformatics, e.g. drug design, chemical classification, or the analysis of high-throughput screening data. Here, we present Scaffold Generator, a comprehensive open library for the generation, handling, and display of molecular scaffolds, scaffold trees and networks. The new library is based on the Chemistry Development Kit (CDK) and highly customisable through multiple settings, e.g. five different structural framework definitions are available. For display of scaffold hierarchies, the open GraphStream Java library is utilised. Performance snapshots with natural products (NP) from the COCONUT (COlleCtion of Open Natural prodUcTs) database and drug molecules from DrugBank are reported. The generation of a scaffold network from more than 450,000 NP can be achieved within a single day.

17.
Angew Chem Int Ed Engl ; 61(51): e202203038, 2022 12 19.
Article in English | MEDLINE | ID: mdl-36347644

ABSTRACT

Research data management (RDM) is needed to assist experimental advances and data collection in the chemical sciences. Many funders require RDM because experiments are often paid for by taxpayers and the resulting data should be deposited sustainably for posterity. However, paper notebooks are still common in laboratories and research data is often stored in proprietary and/or dead-end file formats without experimental context. Data must mature beyond a mere supplement to a research paper. Electronic lab notebooks (ELN) and laboratory information management systems (LIMS) allow researchers to manage data better and they simplify research and publication. Thus, an agreement is needed on minimum information standards for data handling to support structured approaches to data reporting. As digitalization becomes part of curricular teaching, future generations of digital native chemists will embrace RDM and ELN as an organic part of their research.


Subjects
Data Management, Laboratories

18.
Membranes (Basel) ; 12(6)2022 Jun 14.
Article in English | MEDLINE | ID: mdl-35736327

ABSTRACT

Different charge treatment approaches are examined for cyclotide-induced plasma membrane disruption by lipid extraction studied with dissipative particle dynamics. A pure Coulomb approach with truncated forces tuned to avoid individual strong ion pairing still reveals hidden statistical pairing effects that may lead to artificial membrane stabilization or distortion of cyclotide activity depending on the cyclotide's charge state. While qualitative behavior is not affected in an apparent manner, more sensitive quantitative evaluations can be systematically biased. The findings suggest a charge smearing of point charges by an adequate charge distribution. For large mesoscopic simulation boxes, approximations for the Ewald sum to account for mirror charges due to periodic boundary conditions are of negligible influence.

19.
J Cheminform ; 14(1): 31, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35668480

ABSTRACT

The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor for the generation of deep learning systems that generalise well and are not overfit to a specific type of input. In the case of chemical structure depictions, these features are defined by the depiction parameters such as bond length, line thickness, label font style and many others. Here we present RanDepict, a toolkit for the creation of diverse sets of chemical structure depictions. The diversity of the image features is generated by making use of all available depiction parameters in the depiction functionalities of the CDK, RDKit, and Indigo. Furthermore, there is the option to enhance and augment the image with features such as curved arrows, chemical labels around the structure, or other kinds of distortions. Using depiction feature fingerprints, RanDepict ensures diversely picked image features. Here, the depiction and augmentation features are summarised in binary vectors and the MaxMin algorithm is used to pick diverse samples out of all valid options. By making all resources described herein publicly available, we hope to contribute to the development of deep learning-based OCSR systems.
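The MaxMin selection mentioned above can be sketched in a few lines. The version below is a generic, pure-Python illustration of MaxMin picking over binary feature vectors with Tanimoto distance; it is not RanDepict's code, and the function names, the choice of Tanimoto distance and the fixed starting index are assumptions made for this example.

```python
def tanimoto_distance(a, b):
    """1 - |A and B| / |A or B| for two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def maxmin_pick(vectors, n_pick, first=0):
    """Greedy MaxMin: repeatedly pick the candidate whose smallest distance
    to the already picked vectors is largest. Returns picked indices."""
    if not 0 < n_pick <= len(vectors):
        raise ValueError("n_pick must be between 1 and len(vectors)")
    picked = [first]
    # min_dist[i] = distance from vector i to its nearest picked vector.
    min_dist = [tanimoto_distance(v, vectors[first]) for v in vectors]
    while len(picked) < n_pick:
        candidate = max(
            (i for i in range(len(vectors)) if i not in picked),
            key=lambda i: min_dist[i],
        )
        picked.append(candidate)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i],
                              tanimoto_distance(v, vectors[candidate]))
    return picked


if __name__ == "__main__":
    # Each tuple is a made-up fingerprint of depiction/augmentation features.
    fingerprints = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (1, 0, 0, 0)]
    print(maxmin_pick(fingerprints, 3))
```

Caching the nearest-picked distance per candidate keeps each round linear in the number of vectors instead of recomputing all pairwise distances.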

20.
J Cheminform ; 14(1): 36, 2022 Jun 09.
Article in English | MEDLINE | ID: mdl-35681226

ABSTRACT

The translation of images of chemical structures into machine-readable representations of the depicted molecules is known as optical chemical structure recognition (OCSR). There has been a lot of progress over the last three decades in this field, but the development of systems for the recognition of complex hand-drawn structure depictions is still at the beginning. Currently, no data are available for the systematic evaluation of OCSR methods on hand-drawn structures. Here we present DECIMER - Hand-drawn molecule images, a standardised, openly available benchmark dataset of 5088 hand-drawn depictions of diversely picked chemical structures. Every structure depiction in the dataset is mapped to a machine-readable representation of the underlying molecule. The dataset is openly available and published under the CC-BY 4.0 licence, which imposes very few limitations. We hope that it will contribute to the further development of the field.
