ABSTRACT
Feature attribution techniques are popular choices within the explainable artificial intelligence toolbox, as they can help elucidate which parts of the inputs used by an underlying supervised-learning method are considered relevant for a specific prediction. In the context of molecular design, these approaches typically involve the coloring of molecular graphs, whose presentation to medicinal chemists can be useful for deciding which compounds to synthesize or prioritize. The consistency of the highlighted moieties with expert background knowledge is expected to contribute to the understanding of machine-learning models in drug design. Quantitative evaluation of such coloring approaches, however, has so far been limited to substructure identification tasks. We here present an approach that is based on maximum common substructure algorithms applied to experimentally determined activity cliffs. Using the proposed benchmark, we found that molecule coloring approaches in conjunction with classical machine-learning models tend to outperform more modern, graph-neural-network alternatives. The provided benchmark data are fully open sourced, which we hope will facilitate the testing of newly developed molecular feature attribution techniques.
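To make the idea of scoring a coloring against an activity cliff more concrete, the following is a minimal sketch and not the benchmark's actual metric. It assumes that the atoms outside the maximum common substructure of a compound pair have already been identified, and it uses a hypothetical scoring rule: the coloring "agrees" with the cliff when the attribution summed over those uncommon atoms differs in the same direction as the measured activities. The function name, its arguments, and the rule itself are illustrative assumptions.

```python
from typing import Dict, Set


def direction_agreement(
    attributions_a: Dict[int, float],
    attributions_b: Dict[int, float],
    uncommon_a: Set[int],
    uncommon_b: Set[int],
    activity_a: float,
    activity_b: float,
) -> bool:
    """Hypothetical rule: True if the attribution difference on the uncommon
    atoms has the same sign as the activity difference of the pair."""
    # Sum the attribution on atoms outside the shared substructure.
    score_a = sum(attributions_a[i] for i in uncommon_a)
    score_b = sum(attributions_b[i] for i in uncommon_b)
    attribution_delta = score_a - score_b
    activity_delta = activity_a - activity_b
    # Agreement means both differences point in the same direction.
    return attribution_delta * activity_delta > 0


# Made-up pair: atom 1 is the only uncommon atom in each molecule.
agrees = direction_agreement(
    {0: 0.1, 1: 0.6, 2: -0.1}, {0: 0.1, 1: -0.3}, {1}, {1}, 8.2, 6.0
)
```

Averaging such a per-pair outcome over many cliffs would give one number per attribution method, which is the kind of quantitative comparison the abstract describes.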
Subject(s)
Artificial Intelligence , Benchmarking , Algorithms , Machine Learning , Neural Networks, Computer
ABSTRACT
Graph neural networks are able to solve certain drug discovery tasks such as molecular property prediction and de novo molecule generation. However, these models are considered "black-box" and "hard-to-debug". This study aimed to improve modeling transparency for rational molecular design by applying the integrated gradients explainable artificial intelligence (XAI) approach for graph neural network models. Models were trained for predicting plasma protein binding, hERG channel inhibition, passive permeability, and cytochrome P450 inhibition. The proposed methodology highlighted molecular features and structural elements that are in agreement with known pharmacophore motifs, correctly identified property cliffs, and provided insights into unspecific ligand-target interactions. The developed XAI approach is fully open-sourced and can be used by practitioners to train new models on other clinically relevant endpoints.
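As an illustration of the integrated gradients technique named above, here is a minimal sketch on a toy model rather than a graph neural network. The logistic model, its weights, the all-zero baseline, and the step count are assumptions chosen for the example; they are not taken from the study. The attribution for each input feature is the difference between input and baseline multiplied by the average gradient along the straight path between them.

```python
import math
from typing import List, Sequence

WEIGHTS = [1.5, -2.0, 0.5]  # hypothetical model weights
BIAS = 0.1                  # hypothetical model bias


def model(x: Sequence[float]) -> float:
    """Toy differentiable model: logistic regression."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x: Sequence[float]) -> List[float]:
    """Analytic gradient of the toy model with respect to its input."""
    y = model(x)
    return [y * (1.0 - y) * w for w in WEIGHTS]


def integrated_gradients(
    x: Sequence[float], baseline: Sequence[float], steps: int = 200
) -> List[float]:
    """Approximate the path integral of the gradient with a midpoint sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            total[i] += g
    # Scale the averaged gradient by the input-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [1.0, 0.5, -1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to `model(x) - model(baseline)`.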
Subject(s)
Artificial Intelligence , Neural Networks, Computer , Drug Discovery , Ligands
ABSTRACT
SUMMARY: Virtual screening pipelines are among the most widely used tools in structure-based drug discovery, since they can reduce both the time and cost associated with experimental assays. Recent advances in deep learning methodologies have shown that these outperform classical scoring functions at discriminating binding protein-ligand complexes. Here, we present BindScope, a web application for large-scale active-inactive classification of compounds based on deep convolutional neural networks. Performance is on par with current state-of-the-art pipelines. Users can screen on the order of hundreds of compounds at once and interactively visualize the results. AVAILABILITY AND IMPLEMENTATION: BindScope is available as part of the PlayMolecule.org web application suite. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Drug Discovery , Internet , Deep Learning , Drug Discovery/methods , Ligands , Neural Networks, Computer
ABSTRACT
Motivation: Structure-based drug discovery methods exploit protein structural information to design small molecules binding to given protein pockets. This work proposes a purely data-driven, structure-based approach for imaging ligands as spatial fields in target protein pockets. We use an end-to-end deep learning framework trained on experimental protein-ligand complexes with the intention of mimicking a chemist's intuition when manually placing atoms while designing a new compound. We show that these models can generate spatial images of ligand chemical properties, such as occupancy, aromaticity and donor-acceptor character, that match the protein pocket. Results: The predicted fields considerably overlap with those of unseen ligands bound to the target pocket. Maximization of the overlap between the predicted fields and a given ligand on the Astex diverse set recovers the original ligand crystal poses in 70 out of 85 cases within a threshold of 2 Å RMSD. We expect that these models can be used for guiding structure-based drug discovery approaches. Availability and implementation: LigVoxel is available as part of the PlayMolecule.org molecular web application suite. Supplementary information: Supplementary data are available at Bioinformatics online.
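The pose-recovery criterion quoted above (2 Å RMSD) can be sketched as follows. This is a simplified illustration, not LigVoxel's evaluation code: it assumes both poses list the same heavy atoms in the same order and in the same coordinate frame, performs no alignment or symmetry correction, and treats an RMSD equal to the threshold as recovered. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two poses with matched atoms."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(pose_a, pose_b)  # matched atom pairs
        for a, b in zip(p, q)            # x, y, z components
    )
    return math.sqrt(squared / len(pose_a))


def is_recovered(
    predicted: Sequence[Coord], crystal: Sequence[Coord], threshold: float = 2.0
) -> bool:
    """Assumed convention: a pose counts as recovered if RMSD <= threshold (in Å)."""
    return rmsd(predicted, crystal) <= threshold


# Made-up coordinates in Å: the predicted pose is shifted by 1 Å along z.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(predicted_pose, crystal_pose)
```

For scale, 70 recovered poses out of 85 corresponds to a success rate of roughly 82%.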
Subject(s)
Drug Discovery , Neural Networks, Computer , Proteins/chemistry , Software , Binding Sites , Computational Biology , Ligands , Protein Binding , Protein Conformation
ABSTRACT
Sulfur dioxide is generally used as an antimicrobial in wine to counteract the activity of spoilage yeasts, including Brettanomyces bruxellensis. However, this chemical is not equally effective against all B. bruxellensis yeasts, since some strains can proliferate in the final product, leading to a negative sensory profile due to 4-ethylguaiacol and 4-ethylphenol. Thus, deciphering the general molecular mechanisms that characterize this yeast species' response to SO2 stress could be considered strategic for better management of SO2 in winemaking. An RNA-Seq approach was used to investigate the gene expression of two strains of B. bruxellensis, AWRI 1499 and CBS 2499, which have different genetic backgrounds, when exposed to a SO2 pulse. Results revealed that sulphites affected yeast culturability and metabolism, but not volatile phenol production, suggesting that phenotypical heterogeneity could be involved in cell adaptation to SO2. The transcriptomic variation in response to SO2 stress confirmed the strain-related response in B. bruxellensis, and the GO analysis of common differentially expressed genes showed that the detoxification process carried out by the SSU1 gene can be considered the principal specific adaptive response to counteract the presence of SO2. However, nonspecific mechanisms can be exploited by cells to assist SO2 tolerance; namely, the metabolisms related to sugar alcohols (polyols) and oxidative stress, and structural compounds.
Subject(s)
Brettanomyces/genetics , Brettanomyces/metabolism , Fermentation , Stress, Physiological , Sulfur Dioxide/metabolism , Wine/microbiology , Food Microbiology , Gene Expression Profiling , RNA-Seq , Transcriptome
ABSTRACT
Chemical space is impractically large, and conventional structure-based virtual screening techniques cannot be used to simply search through the entire space to discover effective bioactive molecules. To address this shortcoming, we propose a generative adversarial network to generate, rather than search, diverse three-dimensional ligand shapes complementary to the pocket. Furthermore, we show that the generated molecule shapes can be decoded using a shape-captioning network into a sequence of SMILES, directly enabling structure-based de novo drug design. We evaluate the quality of the method by both structure-based (docking) and ligand-based [quantitative structure-activity relationship (QSAR)] virtual screening methods. For both evaluation approaches, we observed enrichment compared to random sampling from the initial chemical space of ZINC drug-like compounds.
Subject(s)
Drug Design , Drug Discovery , Models, Chemical , Neural Networks, Computer , Proteins/chemistry , Small Molecule Libraries/chemistry , Humans , Ligands , Molecular Conformation , Proteins/metabolism , Quantitative Structure-Activity Relationship , Small Molecule Libraries/metabolism
ABSTRACT
In this work, we propose a machine learning approach to generate novel molecules starting from a seed compound, its three-dimensional (3D) shape, and its pharmacophoric features. The pipeline draws inspiration from generative models used in image analysis and represents a first example of the de novo design of lead-like molecules guided by shape-based features. A variational autoencoder is used to perturb the 3D representation of a compound, followed by a system of convolutional and recurrent neural networks that generate a sequence of SMILES tokens. The generative design of novel scaffolds and functional groups can cover unexplored regions of chemical space that still possess lead-like properties.
Subject(s)
Machine Learning , Pharmaceutical Preparations/chemistry , Drug Design , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Models, Molecular , Molecular Conformation , Molecular Structure , Quantitative Structure-Activity Relationship
ABSTRACT
Accurately predicting protein-ligand binding affinities is an important problem in computational chemistry since it can substantially accelerate drug discovery for virtual screening and lead optimization. We propose here a fast machine-learning approach for predicting binding affinities using state-of-the-art 3D-convolutional neural networks and compare this approach to other machine-learning and scoring methods using several diverse data sets. The results for the standard PDBbind (v.2016) core test set are state-of-the-art, with a Pearson's correlation coefficient of 0.82 and an RMSE of 1.27 in pK units between experimental and predicted affinity, but accuracy is still very sensitive to the specific protein used. KDEEP is made available via PlayMolecule.org for users to easily test their own protein-ligand complexes, with each prediction taking a fraction of a second. We believe that the speed, performance, and ease of use of KDEEP already make it an attractive scoring function for modern computational chemistry pipelines.
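The two evaluation metrics quoted above, Pearson's correlation coefficient and RMSE, can be computed as in the following minimal sketch. It uses only the Python standard library; the affinity values are made up for illustration and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def rmse(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length series."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Made-up experimental and predicted affinities in pK units.
experimental = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 7.5, 8.5, 9.5]
r_value = pearson_r(experimental, predicted)
error = rmse(experimental, predicted)
```

The example also shows why both numbers are reported: a constant offset leaves the correlation perfect while the RMSE stays at 0.5 pK units.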
Subject(s)
Computational Biology/methods , Deep Learning , Proteins/chemistry , Databases, Protein , Drug Discovery , Ligands , Models, Chemical , Protein Binding , Structure-Activity Relationship
ABSTRACT
ADME (Absorption, Distribution, Metabolism, Excretion) properties are key parameters to judge whether a drug candidate exhibits a desired pharmacokinetic (PK) profile. In this study, we tested multi-task machine learning (ML) models to predict ADME and animal PK endpoints trained on in-house data generated at Boehringer Ingelheim. Models were evaluated both at the design stage of a compound (i.e., no experimental data of test compounds available) and at the testing stage when a particular assay would be conducted (i.e., experimental data of earlier conducted assays may be available). Using realistic time-splits, we found a clear benefit in performance of multi-task graph-based neural network models over single-task models, which was even stronger when experimental data of earlier assays were available. In an attempt to explain the success of multi-task models, we found that especially endpoints with the largest numbers of data points (physicochemical endpoints, clearance in microsomes) are responsible for increased predictivity in more complex ADME and PK endpoints. In summary, our study provides insight into how data for multiple ADME/PK endpoints in a pharmaceutical company can be best leveraged to optimize predictivity of ML models.
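A time-split, as mentioned above, separates training and test compounds by date rather than at random, so the model is always evaluated on compounds measured after those it was trained on. The following is a minimal sketch under stated assumptions: the record layout (identifier plus measurement date), the single cutoff date, and the function name are hypothetical and not taken from the study.

```python
from datetime import date
from typing import List, Sequence, Tuple

Record = Tuple[str, date]  # (compound identifier, measurement date), assumed layout


def time_split(
    records: Sequence[Record], cutoff: date
) -> Tuple[List[Record], List[Record]]:
    """Put records measured before the cutoff in train, the rest in test."""
    train = [r for r in records if r[1] < cutoff]
    test = [r for r in records if r[1] >= cutoff]
    return train, test


# Made-up records for illustration only.
records = [
    ("cpd-1", date(2019, 5, 1)),
    ("cpd-2", date(2020, 3, 1)),
    ("cpd-3", date(2021, 1, 15)),
]
train_set, test_set = time_split(records, cutoff=date(2020, 1, 1))
```

Compared with a random split, this mimics prospective use, where a model built on historical data is applied to newly designed compounds.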
Subject(s)
Machine Learning , Animals , Pharmaceutical Preparations/metabolism , Pharmaceutical Preparations/chemistry , Humans , Pharmacokinetics , Neural Networks, Computer , Models, Biological
ABSTRACT
Despite the many approaches to study differential splicing from RNA-seq, many challenges remain unsolved, including computing capacity and sequencing depth requirements. Here we present SUPPA2, a new method that addresses these challenges, and enables streamlined analysis across multiple conditions taking into account biological variability. Using experimental and simulated data, we show that SUPPA2 achieves higher accuracy compared to other methods, especially at low sequencing depth and short read length. We use SUPPA2 to identify novel Transformer2-regulated exons, novel microexons induced during differentiation of bipolar neurons, and novel intron retention events during erythroblast differentiation.