1.
J Cheminform ; 15(1): 75, 2023 Aug 30.
Article in English | MEDLINE | ID: mdl-37649050

ABSTRACT

Siamese networks, representing a novel class of neural networks, consist of two identical subnetworks sharing weights but receiving different inputs. Here we present a similarity-based pairing method for generating compound pairs to train Siamese neural networks for regression tasks. In comparison with the conventional exhaustive pairing, it reduces the algorithm complexity from O(n²) to O(n). It also results in a better prediction performance consistently on the three physicochemical datasets, using a multilayer perceptron with the circular fingerprint as a proof of concept. We further include into a Siamese neural network the transformer-based Chemformer, which extracts task-specific features from the simplified molecular-input line-entry system representation of compounds. Additionally, we propose a means to measure the prediction uncertainty by utilizing the variance in predictions from a set of reference compounds. Our results demonstrate that the high prediction accuracy correlates with the high confidence. Finally, we investigate implications of the similarity property principle in machine learning.
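The pairing and uncertainty ideas in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes fingerprints are given as sets of on-bits, that each compound is paired only with its single most similar neighbour (which gives n pairs instead of n(n-1)/2), and that a hypothetical `delta_model` callable stands in for a trained Siamese network that predicts the property difference between a reference and a query compound. Note that the naive neighbour search used here is itself quadratic in similarity computations; only the number of training pairs is linear.

```python
from itertools import combinations
from statistics import mean, pvariance


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def exhaustive_pairs(n: int):
    """All unordered index pairs: n*(n-1)/2 of them, i.e. O(n^2) pairs."""
    return list(combinations(range(n), 2))


def similarity_pairs(fps):
    """Pair each compound with its most similar neighbour: exactly n pairs."""
    pairs = []
    for i, fp in enumerate(fps):
        others = (k for k in range(len(fps)) if k != i)
        j = max(others, key=lambda k: tanimoto(fp, fps[k]))
        pairs.append((i, j))
    return pairs


def predict_with_uncertainty(query_fp, references, delta_model):
    """Estimate a property from several reference compounds.

    references: list of (fingerprint, known_value) tuples.
    delta_model: hypothetical callable (ref_fp, query_fp) -> predicted difference.
    Returns the mean estimate and the variance across references,
    the latter serving as an uncertainty measure.
    """
    estimates = [value + delta_model(fp, query_fp) for fp, value in references]
    return mean(estimates), pvariance(estimates)
```

A low variance across reference compounds means the individual estimates agree, which is the sense in which the abstract links high confidence to high accuracy.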

4.
ACS Omega ; 7(30): 26573-26581, 2022 Aug 02.
Article in English | MEDLINE | ID: mdl-35936431

ABSTRACT

Matched molecular pairs (MMPs) are nowadays a commonly applied concept in drug design. They are used in many computational tools for structure-activity relationship analysis, biological activity prediction, or optimization of physicochemical properties. However, until now it has not been shown in a rigorous way that MMPs, that is, changing only one substituent between two molecules, can be predicted with higher accuracy and precision in contrast to any other chemical compound pair. It is expected that any model should be able to predict such a defined change with high accuracy and reasonable precision. In this study, we examine the predictability of four classical properties relevant for drug design ranging from simple physicochemical parameters (log D and solubility) to more complex cell-based ones (permeability and clearance), using different data sets and machine learning algorithms. Our study confirms that additive data are the easiest to predict, which highlights the importance of recognition of nonadditivity events and the challenging complexity of predicting properties in case of scaffold hopping. Despite deep learning being well suited to model nonlinear events, these methods do not seem to be an exception of this observation. Though they are in general performing better than classical machine learning methods, this leaves the field with a still standing challenge.

5.
J Cheminform ; 14(1): 18, 2022 Mar 28.
Article in English | MEDLINE | ID: mdl-35346368

ABSTRACT

Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although MMPs is a widely used strategy by medicinal chemists, it offers limited capability in terms of exploring the space of structural modifications, therefore does not cover the complete space of solutions. Often more general transformations beyond the nature of MMPs are feasible and/or necessary, e.g. simultaneous modifications of the starting molecule at different places including the core scaffold. This study aims to provide a general methodology that offers more general structural modifications beyond MMPs. In particular, the same Transformer architecture is trained on different datasets. These datasets consist of a set of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general structural changes are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keep the scaffold constant) respectively. We investigate how the model behavior can be altered by tailoring the dataset while using the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that it reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for the chemists to pursue different options for improving a starting molecule.

6.
J Cheminform ; 13(1): 89, 2021 Nov 17.
Article in English | MEDLINE | ID: mdl-34789335

ABSTRACT

Recently, we have released the de novo design platform REINVENT in version 2.0. This improved and extended iteration supports far more features and scoring function components, which allows bespoke and tailor-made protocols to maximize impact in small molecule drug discovery projects. A major obstacle of generative models is producing active compounds, in which predictive (QSAR) models have been applied to enrich target activity. However, QSAR models are inherently limited by their applicability domains. To overcome these limitations, we introduce a structure-based scoring component for REINVENT. DockStream is a flexible, stand-alone molecular docking wrapper that provides access to a collection of ligand embedders and docking backends. Using the benchmarking and analysis workflow provided in DockStream, execution and subsequent analysis of a variety of docking configurations can be automated. Docking algorithms vary greatly in performance depending on the target and the benchmarking and analysis workflow provides a streamlined solution to identifying productive docking configurations. We show that an informative docking configuration can inform the REINVENT agent to optimize towards improving docking scores using public data. With docking activated, REINVENT is able to retain key interactions in the binding site, discard molecules which do not fit the binding cavity, harness unused (sub-)pockets, and improve overall performance in the scaffold-hopping scenario. The code is freely available at https://github.com/MolecularAI/DockStream .

7.
ACS Omega ; 6(16): 11086-11094, 2021 Apr 27.
Article in English | MEDLINE | ID: mdl-34056263

ABSTRACT

Activity prediction plays an essential role in drug discovery by directing search of drug candidates in the relevant chemical space. Despite being applied successfully to image recognition and semantic similarity, the Siamese neural network has rarely been explored in drug discovery where modelling faces challenges such as insufficient data and class imbalance. Here, we present a Siamese recurrent neural network model (SiameseCHEM) based on bidirectional long short-term memory architecture with a self-attention mechanism, which can automatically learn discriminative features from the SMILES representations of small molecules. Subsequently, it is used to categorize bioactivity of small molecules via N-shot learning. Trained on random SMILES strings, it proves robust across five different datasets for the task of binary or categorical classification of bioactivity. Benchmarking against two baseline machine learning models which use the chemistry-rich ECFP fingerprints as the input, the deep learning model outperforms on three datasets and achieves comparable performance on the other two. The failure of both baseline methods on SMILES strings highlights that the deep learning model may learn task-specific chemistry features encoded in SMILES strings.

8.
J Cheminform ; 13(1): 26, 2021 Mar 20.
Article in English | MEDLINE | ID: mdl-33743817

ABSTRACT

A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists would use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. We seek to capture the chemist's intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism, and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: logD, solubility, and clearance, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation, with a graph-to-graph translation model HierG2G, which has shown the state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.
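As a rough illustration of how desired property changes might be supplied to a translation model alongside the starting molecule, the Python sketch below prepends condition tokens to a tokenized SMILES string. The token names, the three-way up/down/same binning, and the thresholds are all assumptions made for this example; the abstract does not specify the encoding used.

```python
import re

# Simplified SMILES tokenizer: bracket atoms, Br/Cl, two-digit ring closures,
# then any single character. This is an illustrative pattern, not a full grammar.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles: str):
    """Split a SMILES string into model tokens."""
    return SMILES_TOKEN.findall(smiles)


def change_token(name: str, delta: float, threshold: float) -> str:
    """Map a desired property change to a hypothetical condition token."""
    if delta > threshold:
        direction = "up"
    elif delta < -threshold:
        direction = "down"
    else:
        direction = "same"
    return f"<{name}_{direction}>"


def build_source(smiles: str, desired_changes: dict, thresholds: dict):
    """Source sequence = condition tokens followed by the starting molecule."""
    condition = [
        change_token(name, delta, thresholds[name])
        for name, delta in desired_changes.items()
    ]
    return condition + tokenize_smiles(smiles)
```

For instance, asking for lower logD, higher solubility, and unchanged clearance on `CC(=O)Cl` would yield three condition tokens followed by the seven SMILES tokens of the molecule.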

9.
Nucleic Acids Res ; 48(W1): W48-W53, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32297936

ABSTRACT

Due to the increasing number of publicly available protein structures, searching, enriching and investigating these data still poses a challenging task. The ProteinsPlus web service (https://proteins.plus) offers a broad range of tools addressing these challenges. The web interface to the tool collection focusing on protein-ligand interactions has been geared towards easy and intuitive access to a large variety of functionality for life scientists. Since our last publication, the ProteinsPlus web service has been extended by additional services and has undergone substantial infrastructural improvements. A keyword search functionality was added on the start page of ProteinsPlus enabling users to work on structures without knowing their PDB code. The tool collection has been augmented by three tools: StructureProfiler validates ligands and active sites using selection criteria of well-established protein-ligand benchmark data sets, WarPP places water molecules in the ligand binding sites of a protein, and METALizer calculates, predicts and scores coordination geometries of metal ions based on surrounding complex atoms. Additionally, all tools provided by ProteinsPlus are available through a REST service enabling the automated integration in structure processing and modeling pipelines.


Subject(s)
Proteins/chemistry , Software , Binding Sites , Ligands , Metals/chemistry , Models, Molecular , Proteins/metabolism , Water/chemistry
10.
J Comput Aided Mol Des ; 33(3): 307-330, 2019 03.
Article in English | MEDLINE | ID: mdl-30756207

ABSTRACT

Targeting the interaction with or displacement of the 'right' water molecule can significantly increase inhibitor potency in structure-guided drug design. Multiple computational approaches exist to predict which waters should be targeted for displacement to achieve the largest gain in potency. However, the relative success of different methods remains underexplored. Here, we present a comparison of the ability of five water prediction programs (3D-RISM, SZMAP, WaterFLAP, WaterRank, and WaterMap) to predict crystallographic water locations, calculate their binding free energies, and to relate differences in these energies to observed changes in potency. The structural cohort included nine Bruton's Tyrosine Kinase (BTK) structures, and nine bromodomain structures. Each program accurately predicted the locations of most crystallographic water molecules. However, the predicted binding free energies correlated poorly with the observed changes in inhibitor potency when solvent atoms were displaced by chemical changes in closely related compounds.


Subject(s)
Agammaglobulinaemia Tyrosine Kinase/chemistry , Computer Simulation , Models, Molecular , Protein Kinase Inhibitors/chemistry , Water/chemistry , Crystallography, X-Ray , Ligands , Protein Binding , Protein Domains , Software , Solvents/chemistry , Structure-Activity Relationship , Thermodynamics
11.
J Chem Inf Model ; 58(8): 1625-1637, 2018 08 27.
Article in English | MEDLINE | ID: mdl-30036062

ABSTRACT

Water molecules are of great importance for the correct representation of ligand binding interactions. Throughout the last years, water molecules and their integration into drug design strategies have received increasing attention. Nowadays a variety of tools are available to place and score water molecules. However, the most frequently applied software solutions require substantial computational resources. In addition, none of the existing methods has been rigorously evaluated on the basis of a large number of diverse protein complexes. Therefore, we present a novel method for placing water molecules, called WarPP, based on interaction geometries previously derived from protein crystal structures. Using a large, previously compiled, high-quality validation set of almost 1500 protein-ligand complexes containing almost 20 000 crystallographically observed water molecules in their active sites, we validated our placement strategy. We correctly placed 80% of the water molecules within 1.0 Å of a crystallographically observed one.
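The distance-based success criterion in this abstract can be sketched in a few lines of Python. The function below is an illustration only: it assumes water positions are plain (x, y, z) tuples in ångströms and counts an observed water as recovered when any predicted water lies within the cutoff. Whether the published figure is computed over observed or over predicted waters is not stated in the abstract, so that choice is an assumption of this sketch.

```python
from math import dist


def placement_recall(observed, predicted, cutoff=1.0):
    """Fraction of observed waters with a predicted water within `cutoff` Å.

    observed, predicted: lists of (x, y, z) oxygen coordinates in ångströms.
    """
    if not observed:
        return 0.0
    hits = sum(
        1 for water in observed
        if any(dist(water, placed) <= cutoff for placed in predicted)
    )
    return hits / len(observed)
```

With three observed waters and only one predicted position within 1.0 Å of any of them, the function returns one third.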


Subject(s)
Proteins/chemistry , Water/chemistry , Binding Sites , Databases, Protein , Ligands , Models, Molecular , Protein Conformation , Thermodynamics
12.
J Chem Inf Model ; 57(10): 2437-2447, 2017 10 23.
Article in English | MEDLINE | ID: mdl-28981269

ABSTRACT

Macromolecular structures resolved by X-ray crystallography are essential for life science research. While some methods exist to automatically quantify the quality of the electron density fit, none of them is without flaws. Especially the question of how well individual parts like atoms, small fragments, or molecules are supported by electron density is difficult to quantify. While taking experimental uncertainties correctly into account, they do not offer an answer on how reliable an individual atom position is. A rapid quantification of this atomic position reliability would be highly valuable in structure-based molecular design. To overcome this limitation, we introduce the electron density score EDIA for individual atoms and molecular fragments. EDIA assesses rapidly, automatically, and intuitively the fit of individual as well as multiple atoms (EDIAm) into electron density accompanied by an integrated error analysis. The computation is based on the standard 2fo - fc electron density map in combination with the model of the molecular structure. For evaluating partial structures, EDIAm shows significant advantages compared to the real-space R correlation coefficient (RSCC) and the real-space difference density Z score (RSZD) from the molecular modeler's point of view. Thus, EDIA abolishes the time-consuming step of visually inspecting the electron density during structure selection and curation. It supports daily modeling tasks of medicinal and computational chemists and enables a fully automated assembly of large-scale, high-quality structure data sets. Furthermore, EDIA scores can be applied for model validation and method development in computer-aided molecular design. In contrast to measuring the deviation from the structure model by root-mean-squared deviation, EDIA scores allow comparison to the underlying experimental data taking its uncertainty into account.


Subject(s)
Crystallography, X-Ray/methods , Electrons , Models, Molecular , Ligands , Peptide Fragments/chemistry
13.
J Chem Inf Model ; 57(9): 2132-2142, 2017 09 25.
Article in English | MEDLINE | ID: mdl-28891648

ABSTRACT

Noncovalent interactions play an important role in macromolecular complexes. The assessment of molecular interactions is often based on knowledge derived from statistics on structural data. Within the last years, the available data in the Brookhaven Protein Data Bank has increased dramatically, quantitatively as well as qualitatively. This development allows the derivation of enhanced interaction models and motivates new ways of data analysis. Here, we present a method to facilitate the analysis of noncovalent interactions enabling detailed insights into the nature of molecular interactions. The method is integrated into a highly variable framework enabling the adaption to user-specific requirements. NAOMInova, the user interface for our method, allows the generation of specific statistics with respect to the chemical environment of substructures. The substructures as well as the analyzed set of protein structures can be chosen arbitrarily. Although NAOMInova was primarily made for data exploration in protein-ligand crystal structures, it can be used in combination with any structure collection, for example, analysis of a carbonyl in the neighborhood of an aromatic ring on a set of structures resulting from a MD simulation. Additionally, a filter for different atom attributes can be applied including the experimental support by electron density for single atoms. In this publication, we present the underlying algorithmic techniques of our method and show application examples that demonstrate NAOMInova's ability to support individual analysis of noncovalent interactions in protein structures. NAOMInova is available at http://www.zbh.uni-hamburg.de/naominova .


Subject(s)
Computational Biology/methods , Macromolecular Substances/chemistry , Macromolecular Substances/metabolism , User-Computer Interface , Computer Graphics , Models, Molecular , Molecular Conformation
14.
J Biotechnol ; 261: 207-214, 2017 Nov 10.
Article in English | MEDLINE | ID: mdl-28610996

ABSTRACT

Nowadays, computational approaches are an integral part of life science research. Problems related to interpretation of experimental results, data analysis, or visualization tasks highly benefit from the achievements of the digital era. Simulation methods facilitate predictions of physicochemical properties and can assist in understanding macromolecular phenomena. Here, we will give an overview of the methods developed in our group that aim at supporting researchers from all life science areas. Based on state-of-the-art approaches from structural bioinformatics and cheminformatics, we provide software covering a wide range of research questions. Our all-in-one web service platform ProteinsPlus (http://proteins.plus) offers solutions for pocket and druggability prediction, hydrogen placement, structure quality assessment, ensemble generation, protein-protein interaction classification, and 2D-interaction visualization. Additionally, we provide a software package that contains tools targeting cheminformatics problems like file format conversion, molecule data set processing, SMARTS editing, fragment space enumeration, and ligand-based virtual screening. Furthermore, it also includes structural bioinformatics solutions for inverse screening, binding site alignment, and searching interaction patterns across structure libraries. The software package is available at http://software.zbh.uni-hamburg.de.


Subject(s)
Computational Biology , Internet , Software , Databases, Protein
15.
Nucleic Acids Res ; 45(W1): W337-W343, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28472372

ABSTRACT

With currently more than 126 000 publicly available structures and an increasing growth rate, the Protein Data Bank constitutes a rich data source for structure-driven research in fields like drug discovery, crop science and biotechnology in general. Typical workflows in these areas involve manifold computational tools for the analysis and prediction of molecular functions. Here, we present the ProteinsPlus web server that offers a unified easy-to-use interface to a broad range of tools for the early phase of structure-based molecular modeling. This includes solutions for commonly required pre-processing tasks like structure quality assessment (EDIA), hydrogen placement (Protoss) and the search for alternative conformations (SIENA). Beyond that, it also addresses frequent problems such as the generation of 2D-interaction diagrams (PoseView), protein-protein interface classification (HyPPI) as well as automatic pocket detection and druggability assessment (DoGSiteScorer). The unified ProteinsPlus interface covering all featured approaches provides various facilities for intuitive input and result visualization, case-specific parameterization and download options for further processing. Moreover, its generalized workflow allows the user a quick familiarization with the different tools. ProteinsPlus also stores the calculated results temporarily for future request and thus facilitates convenient result communication and re-access. The server is freely available at http://proteins.plus.


Subject(s)
Protein Conformation , Software , Binding Sites , Hydrogen/chemistry , Internet , Ligands , Models, Molecular , Protein Interaction Mapping , Proteins/chemistry
16.
Proteins ; 85(8): 1550-1566, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28486771

ABSTRACT

Reliable computational prediction of protein side chain conformations and the energetic impact of amino acid mutations are the key aspects for the optimization of biotechnologically relevant enzymatic reactions using structure-based design. By improving the protein stability, higher yields can be achieved. In addition, tuning the substrate selectivity of an enzymatic reaction by directed mutagenesis can lead to higher turnover rates. This work presents a novel approach to predict the conformation of a side chain mutation along with the energetic effect on the protein structure. The HYDE scoring concept applied here describes the molecular interactions primarily by evaluating the effect of dehydration and hydrogen bonding on molecular structures in aqueous solution. Here, we evaluate its capability of side-chain conformation prediction in classic remutation experiments. Furthermore, we present a new data set for evaluating "cross-mutations," a new experiment that resembles real-world application scenarios more closely. This data set consists of protein pairs with up to five point mutations. Thus, structural changes are attributed to point mutations only. In the cross-mutation experiment, the original protein structure is mutated with the aim to predict the structure of the side chain as in the paired mutated structure. The comparison of side chain conformation prediction ("remutation") showed that the performance of HYDEprotein is qualitatively comparable to state-of-the-art methods. The ability of HYDEprotein to predict the energetic effect of a mutation is evaluated in the third experiment. Herein, the effect on protein stability is predicted correctly in 70% of the evaluated cases. Proteins 2017; 85:1550-1566. © 2017 Wiley Periodicals, Inc.


Subject(s)
Amino Acids/chemistry , Point Mutation , Water/chemistry , beta-Glucosidase/chemistry , Amino Acid Substitution , Amino Acids/genetics , Desiccation , Humans , Hydrogen Bonding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Stability , Software , Solutions , Structure-Activity Relationship , Thermodynamics , beta-Glucosidase/genetics
17.
J Med Chem ; 60(10): 4245-4257, 2017 05 25.
Article in English | MEDLINE | ID: mdl-28497966

ABSTRACT

Protein-ligand interactions are the fundamental basis for molecular design in pharmaceutical research, biocatalysis, and agrochemical development. Especially hydrogen bonds are known to have special geometric requirements and therefore deserve a detailed analysis. In modeling approaches a more general description of hydrogen bond geometries, using distance and directionality, is applied. A first study of their geometries was performed based on 15 protein structures in 1982. Currently there are about 95 000 protein-ligand structures available in the PDB, providing a solid foundation for a new large-scale statistical analysis. Here, we report a comprehensive investigation of geometric and functional properties of hydrogen bonds. Out of 22 defined functional groups, eight are fully in accordance with theoretical predictions while 14 show variations from expected values. On the basis of these results, we derived interaction geometries to improve current computational models. It is expected that these observations will be useful in designing new chemical structures for biological applications.


Subject(s)
Proteins/metabolism , Animals , Databases, Protein , Drug Discovery , Humans , Hydrogen Bonding , Ligands , Molecular Docking Simulation , Proteins/chemistry
18.
J Comput Aided Mol Des ; 30(8): 583-94, 2016 08.
Article in English | MEDLINE | ID: mdl-27565795

ABSTRACT

Ligand-based virtual screening is a well established method to find new lead molecules in today's drug discovery process. In order to be applicable in day to day practice, such methods have to face multiple challenges. The most important part is the reliability of the results, which can be shown and compared in retrospective studies. Furthermore, in the case of 3D methods, they need to provide biologically relevant molecular alignments of the ligands, that can be further investigated by a medicinal chemist. Last but not least, they have to be able to screen large databases in reasonable time. Many algorithms for ligand-based virtual screening have been proposed in the past, most of them based on pairwise comparisons. Here, a new method is introduced called mRAISE. Based on structural alignments, it uses a descriptor-based bitmap search engine (RAISE) to achieve efficiency. Alignments created on the fly by the search engine get evaluated with an independent shape-based scoring function also used for ranking of compounds. The correct ranking as well as the alignment quality of the method are evaluated and compared to other state of the art methods. On the commonly used Directory of Useful Decoys dataset mRAISE achieves an average area under the ROC curve of 0.76, an average enrichment factor at 1 % of 20.2 and an average hit rate at 1 % of 55.5. With these results, mRAISE is always among the top performing methods with available data for comparison. To assess the quality of the alignments calculated by ligand-based virtual screening methods, we introduce a new dataset containing 180 prealigned ligands for 11 diverse targets. Within the top ten ranked conformations, the alignment closest to X-ray structure calculated with mRAISE has a root-mean-square deviation of less than 2.0 Å for 80.8 % of alignment pairs and achieves a median of less than 2.0 Å for eight of the 11 cases.
The dataset used to rate the quality of the calculated alignments is freely available at http://www.zbh.uni-hamburg.de/mraise-dataset.html . The table of all PDB codes contained in the ensembles can be found in the supplementary material. The software tool mRAISE is freely available for evaluation purposes and academic use (see http://www.zbh.uni-hamburg.de/raise ).
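For readers unfamiliar with the ranking metrics quoted in this abstract, the following Python sketch shows one common way to compute the area under the ROC curve and the enrichment factor at a given fraction of a ranked list. It uses the standard textbook definitions and assumes that a higher score means a better-ranked compound and that labels are 1 for actives and 0 for decoys; the exact conventions used in the mRAISE evaluation (tie handling, rounding of the top fraction) are not given in the abstract.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Active rate in the top `fraction` of the ranking divided by the overall active rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    actives_top = sum(label for _, label in ranked[:n_top])
    total_actives = sum(labels)
    return (actives_top / n_top) / (total_actives / len(labels))


def roc_auc(scores, labels):
    """Probability that a random active outscores a random decoy (ties count 0.5).

    Requires at least one active and one decoy.
    """
    actives = [s for s, label in zip(scores, labels) if label]
    decoys = [s for s, label in zip(scores, labels) if not label]
    wins = sum(
        1.0 if a > d else 0.5 if a == d else 0.0
        for a in actives
        for d in decoys
    )
    return wins / (len(actives) * len(decoys))
```

An enrichment factor of 20.2 at 1 % therefore means that actives are about 20 times more frequent in the top 1 % of the ranking than in the dataset as a whole.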


Subject(s)
Drug Discovery , Software , Algorithms , Binding Sites , Databases, Chemical , Databases, Protein , Drug Discovery/methods , Humans , Ligands , Models, Molecular , Proteins/metabolism
19.
J Chem Inf Model ; 55(4): 771-83, 2015 Apr 27.
Article in English | MEDLINE | ID: mdl-25742501

ABSTRACT

Water molecules play important roles in many biological processes, especially when mediating protein-ligand interactions. Dehydration and the hydrophobic effect are of central importance for estimating binding affinities. Due to the specific geometric characteristics of hydrogen bond functions of water molecules, meaning two acceptor and two donor functions in a tetrahedral arrangement, they have to be modeled accurately. Despite many attempts in the past years, accurate prediction of water molecules, structurally as well as energetically, remains a grand challenge. One reason is certainly the lack of experimental data, since energetic contributions of water molecules can only be measured indirectly. However, on the structural side, the electron density clearly shows the positions of stable water molecules. This information has the potential to improve models on water structure and energy in proteins and protein interfaces. On the basis of a high-resolution subset of the Protein Data Bank, we have conducted an extensive statistical analysis of 2.3 million water molecules, discriminating those water molecules that are well resolved and those without much evidence of electron density. In order to perform this classification, we introduce a new measurement of electron density around an individual atom enabling the automatic quantification of experimental support. On the basis of this measurement, we present an analysis of water molecules with a detailed profile of geometric and structural features. This data, which is freely available, can be applied to not only modeling and validation of new water models in structural biology but also in molecular design.


Subject(s)
Electrons , Models, Molecular , Proteins/chemistry , Water/chemistry , Databases, Protein , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Protein Conformation , Proteins/metabolism