Results 1 - 20 of 32
2.
Bioorg Med Chem ; 38: 116119, 2021 05 15.
Article in English | MEDLINE | ID: mdl-33831697

ABSTRACT

In response to the pandemic caused by SARS-CoV-2, we constructed a hybrid support vector machine (SVM) classification model using a set of publicly posted SARS-CoV-2 pseudotyped particle (PP) entry assay repurposing screen data to identify novel potent compounds as a starting point for drug development to treat COVID-19 patients. Two different molecular descriptor systems, atom typing descriptors and 3D fingerprints (FPs), were employed to construct the SVM classification models. Both models achieved reasonable performance, with areas under the receiver operating characteristic curve (AUC-ROC) of 0.84 and 0.82, respectively. The consensus prediction, from which compounds with inconsistent classifications were excluded, outperformed the two individual models with a significantly improved AUC-ROC of 0.91. The consensus model was then used to screen the 173,898 compounds in the NCATS annotated and diverse chemical libraries. Of the 255 compounds selected for experimental confirmation, 116 compounds exhibited inhibitory activities in the SARS-CoV-2 PP entry assay with IC50 values ranging from 0.17 µM to 62.2 µM, representing an enrichment factor of 3.2. These 116 active compounds with diverse and novel structures could potentially serve as starting points for chemistry optimization for COVID-19 drug discovery.


Subject(s)
Antiviral Agents/pharmacology , SARS-CoV-2/drug effects , Support Vector Machine/statistics & numerical data , Virus Internalization/drug effects , Area Under Curve , Databases, Chemical/statistics & numerical data , Drug Repositioning , HEK293 Cells , Humans , Microbial Sensitivity Tests , ROC Curve , Small Molecule Libraries/pharmacology
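
The consensus step and the enrichment figure in the abstract above can be illustrated with a minimal sketch. The prediction lists are hypothetical, and the definition of enrichment factor used here (hit rate among selected compounds divided by a baseline hit rate) is an assumption, since the abstract does not state how the baseline was obtained.

```python
def consensus_predict(pred_a, pred_b):
    """Keep (index, label) pairs only where both models agree;
    compounds with inconsistent classifications are excluded."""
    return [(i, a) for i, (a, b) in enumerate(zip(pred_a, pred_b)) if a == b]


def enrichment_factor(hits, tested, baseline_hit_rate):
    """Assumed definition: observed hit rate divided by a baseline hit rate."""
    return (hits / tested) / baseline_hit_rate


# Hypothetical binary predictions (1 = active, 0 = inactive) from the two models.
model_descriptors = [1, 0, 1, 1, 0]
model_fingerprints = [1, 0, 0, 1, 1]
agreed = consensus_predict(model_descriptors, model_fingerprints)

# Hit rate reported in the abstract: 116 confirmed actives out of 255 tested.
hit_rate = 116 / 255
# Baseline hit rate implied by the reported enrichment factor of 3.2,
# under the assumed definition above.
implied_baseline = hit_rate / 3.2
```

Under this reading, the confirmed hit rate is about 45.5%, which would correspond to a baseline hit rate of roughly 14%.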
3.
J Med Chem ; 64(12): 8208-8220, 2021 06 24.
Article in English | MEDLINE | ID: mdl-33770434

ABSTRACT

Epigenetic targets are of significant importance in drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. This data represents many structure-activity relationships that have not been exploited thus far to develop predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26 318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. We built predictive models with high accuracy for small molecules' epigenetic target profiling through a systematic comparison of the machine learning models trained on different molecular fingerprints. The models were thoroughly validated, showing mean precisions of up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible web application.


Subject(s)
Drug Discovery/methods , Epigenomics/methods , Machine Learning , Organic Chemicals/chemistry , Databases, Chemical/statistics & numerical data , Histone Deacetylases/metabolism , Molecular Structure , Organic Chemicals/metabolism , Proof of Concept Study , Structure-Activity Relationship , Transcription Factors/metabolism
4.
Bioorg Med Chem Lett ; 40: 127930, 2021 05 15.
Article in English | MEDLINE | ID: mdl-33711441

ABSTRACT

Delivery of compounds to the brain is critical for the development of effective therapies for multiple central nervous system diseases. Recently, a novel insect-based brain uptake model was published utilizing a locust brain ex vivo system. The goal of our study was to develop a priori, in silico cheminformatic models to describe brain uptake in this insect model, as well as to evaluate their predictive ability. The machine learning program Orange® was used to evaluate several machine learning (ML) models on a published data set of 25 known drugs, with in vitro data generated by a single laboratory group to reduce inherent inter-laboratory variability. The ML models included in this study were linear regression (LR), support vector machines (SVM), k-nearest neighbor (kNN) and neural nets (NN). The quantitative structure-property relationship models were able to correlate experimental logCtot (concentration of compound in brain) and predicted brain uptake with r² > 0.5, with the descriptors log(P·MW^-0.5) and hydrogen bond donor used in LR, SVM and kNN, while the log(P·MW^-0.5) and total polar surface area (TPSA) descriptors were used in the NN models. Our results indicate that the locust insect model is amenable to data mining chemoinformatics and in silico model development in CNS drug discovery pipelines.


Subject(s)
Brain/metabolism , Central Nervous System Agents/metabolism , Animals , Central Nervous System Agents/chemistry , Cheminformatics , Databases, Chemical/statistics & numerical data , Grasshoppers/metabolism , Linear Models , Models, Biological , Neural Networks, Computer , Support Vector Machine
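
As a rough illustration of the quantities named in the abstract above, the sketch below computes the lipophilicity/size descriptor and a goodness-of-fit value. It assumes the descriptor is the base-10 logarithm of P multiplied by MW to the power -0.5, with P a partition coefficient and MW the molecular weight, and it assumes r² means the coefficient of determination; the abstract confirms neither reading, and the input values are hypothetical.

```python
import math


def permeability_descriptor(p, mw):
    """log10(P * MW**-0.5); equivalent to log10(P) - 0.5 * log10(MW)."""
    return math.log10(p * mw ** -0.5)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical compound: P = 100 (logP = 2), MW = 400.
example_descriptor = permeability_descriptor(100.0, 400.0)
```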
5.
J Med Chem ; 63(23): 15013-15020, 2020 12 10.
Article in English | MEDLINE | ID: mdl-33253557

ABSTRACT

While bioisosteric replacements have been extensively investigated, comprehensive analyses of R-/functional groups have thus far been rare in medicinal chemistry. We introduce a new analysis concept for the exploration of chemical substituent space that is based upon bioactive analogue series as a source. From ∼24,000 analogue series, more than 19,000 substituents were isolated that were differently distributed. A subset of ∼400 substituent fragments occurred most frequently in different structural contexts. These substituents contained well-known R-groups as well as novel structures. Substitution site-specific replacement and network analysis revealed that chemically similar substituents preferentially occurred at given sites and identified intuitive substitution pathways that can be explored for compound design. Taken together, the results of our analysis provide new insights into substituent space and identify preferred substituents on the basis of analogue series. As a part of our study, all the data reported are made freely available.


Subject(s)
Organic Chemicals/chemistry , Pharmaceutical Preparations/chemistry , Algorithms , Chemistry, Pharmaceutical/methods , Databases, Chemical/statistics & numerical data , Molecular Structure
6.
Phys Chem Chem Phys ; 22(41): 23766-23772, 2020 Nov 07.
Article in English | MEDLINE | ID: mdl-33063077

ABSTRACT

Deep learning based methods have been widely applied to predict various kinds of molecular properties in the pharmaceutical industry with increasingly more success. In this study, we propose two novel models for aqueous solubility predictions, based on the Multilevel Graph Convolutional Network (MGCN) and SchNet architectures, respectively. The advantage of the MGCN lies in the fact that it could extract the graph features of the target molecules directly from the (3D) structural information; therefore, it doesn't need to rely on a lot of intra-molecular descriptors to learn the features, which are of significance for accurate predictions of the molecular properties. The SchNet performs well in modelling the interatomic interactions inside a molecule, and such a deep learning architecture is also capable of extracting structural information and further predicting the related properties. The actual accuracy of these two novel approaches was systematically benchmarked with four different independent datasets. We found that both the MGCN and SchNet models performed well for aqueous solubility predictions. In the future, we believe such promising predictive models will be applicable to enhancing the efficiency of the screening, crystallization and delivery of drug molecules, essentially as a useful tool to promote the development of molecular pharmaceutics.


Subject(s)
Deep Learning , Pharmaceutical Preparations/chemistry , Water/chemistry , Databases, Chemical/statistics & numerical data , Datasets as Topic/statistics & numerical data , Solubility
7.
Comput Biol Chem ; 89: 107398, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33059132

ABSTRACT

Theileria annulata secretes peptidyl prolyl isomerase enzyme (TaPIN1) to manipulate the host cell oncogenic signaling pathway by disrupting the tumor suppressor F-box and WD repeat domain-containing 7 (FBW7) protein level leading to an increased level of c-Jun proto-oncogene. Buparvaquone is a hydroxynaphthoquinone anti-theilerial drug and has been used to treat theileriosis. However, TaPIN1 contains the A53P mutation that causes drug resistance. In this study, potential TaPIN1 inhibitors were investigated using a library of naphthoquinone derivatives. Comparative models of mutant (m) and wild type (wt) TaPIN1 were predicted and energy minimization was followed by structure validation. A naphthoquinone (hydroxynaphthalene-1,2-dione, hydroxynaphthalene-1,4-dione) and hydroxynaphthalene-2,3-dione library was screened by Schrödinger Glide HTVS, SP and XP docking methodologies and the docked compounds were ranked by the Glide XP scoring function. The two highest ranked docked compounds, Compound 1 (4-hydroxy-3-[3,4,5-trihydroxy-6-(hydroxymethyl)oxan-2-yl]oxynaphthalene-1,2-dione) and Compound 2 (6-acetyl-1,4,5,7,8-pentahydroxynaphthalene-2,3-dione), were used for further molecular dynamics (MD) simulation studies. The MD results showed that ligand Compound 1 was located in the active site of both mTaPIN1 and wtTaPIN1 and could be proposed as a potential inhibitor by acting as a substrate antagonist. However, ligand Compound 2 was displaced away from the binding pocket of wtTaPIN1 but was located near the active site binding pocket of mTaPIN1, suggesting that it could be selectively evaluated as a potential inhibitor against mTaPIN1. Compound 1 and Compound 2 ligands are potential inhibitors, but Compound 2 is suggested as a better inhibitor for mTaPIN1. These ligands could also be further evaluated as potential inhibitors against human peptidyl prolyl isomerase, which causes cancer in humans by using the same mechanism as TaPIN1.


Subject(s)
Enzyme Inhibitors/chemistry , Naphthoquinones/chemistry , Peptidylprolyl Isomerase/antagonists & inhibitors , Protozoan Proteins/antagonists & inhibitors , Theileria annulata/enzymology , Catalytic Domain , Databases, Chemical/statistics & numerical data , Enzyme Inhibitors/metabolism , High-Throughput Screening Assays , Ligands , Molecular Docking Simulation , Molecular Dynamics Simulation , Mutation , Naphthoquinones/metabolism , Peptidylprolyl Isomerase/chemistry , Peptidylprolyl Isomerase/genetics , Peptidylprolyl Isomerase/metabolism , Protein Binding , Proto-Oncogene Mas , Protozoan Proteins/chemistry , Protozoan Proteins/genetics , Protozoan Proteins/metabolism
8.
Clin Chem ; 66(9): 1210-1218, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32870990

ABSTRACT

BACKGROUND: Plasma amino acid (PAA) profiles are used in routine clinical practice for the diagnosis and monitoring of inherited disorders of amino acid metabolism, organic acidemias, and urea cycle defects. Interpretation of PAA profiles is complex and requires substantial training and expertise to perform. Given previous demonstrations of the ability of machine learning (ML) algorithms to interpret complex clinical biochemistry data, we sought to determine if ML-derived classifiers could interpret PAA profiles with high predictive performance. METHODS: We collected PAA profiling data routinely performed within a clinical biochemistry laboratory (2084 profiles) and developed decision support classifiers with several ML algorithms. We tested the generalization performance of each classifier using a nested cross-validation (CV) procedure and examined the effect of various subsampling, feature selection, and ensemble learning strategies. RESULTS: The classifiers demonstrated excellent predictive performance, with the 3 ML algorithms tested producing comparable results. The best-performing ensemble binary classifier achieved a mean precision-recall (PR) AUC of 0.957 (95% CI 0.952, 0.962) and the best-performing ensemble multiclass classifier achieved a mean F4 score of 0.788 (0.773, 0.803). CONCLUSIONS: This work builds upon previous demonstrations of the utility of ML-derived decision support tools in clinical biochemistry laboratories. Our findings suggest that, pending additional validation studies, such tools could potentially be used in routine clinical practice to streamline and aid the interpretation of PAA profiles. This would be particularly useful in laboratories with limited resources and large workloads. We provide the necessary code for other laboratories to develop their own decision support tools.


Subject(s)
Amino Acids/blood , Machine Learning , Databases, Chemical/statistics & numerical data , Humans
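
The abstract above reports an "F4 score" for the multiclass classifier. Assuming this denotes the general F-beta score with beta = 4 (an interpretation the abstract does not confirm), the metric could be computed as in the following sketch; the precision and recall inputs are hypothetical.

```python
def f_beta(precision, recall, beta):
    """General F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# With beta = 4, a classifier with perfect recall but modest precision
# still scores highly, reflecting the emphasis on not missing true cases.
high_recall = f_beta(precision=0.5, recall=1.0, beta=4)
high_precision = f_beta(precision=1.0, recall=0.5, beta=4)
```

A recall-weighted metric of this kind is a plausible choice for a screening-style decision support tool, where a missed abnormal profile is costlier than a false alarm.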
9.
Comput Biol Chem ; 89: 107375, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32980746

ABSTRACT

Seasonal and pandemic influenza infections are serious threats to public health and the global economy. Since antigenic drift reduces the effectiveness of conventional therapies against the virus, herbal medicine has been proposed as an alternative. Fritillaria thunbergii (FT) has been traditionally used to treat airway inflammatory diseases such as coughs, bronchitis, pneumonia, and fever-based illnesses. Herein, we used a network pharmacology-based strategy to predict potential compounds from FT, target genes, and cellular pathways to better combat influenza and influenza-associated diseases. We identified five compounds and 47 target genes using a compound-target network (C-T). Two compounds (beta-sitosterol and pelargonidin) and nine target genes (BCL2, CASP3, HSP90AA1, ICAM1, JUN, NOS2, PPARG, PTGS1, PTGS2) were identified using a compound-influenza disease target network (C-D). A protein-protein interaction (PPI) network was constructed, and we found that eight proteins from the nine target genes formed a network. The compound-disease-pathway network (C-D-P) revealed three classes of pathways linked to influenza: cancer, viral diseases, and inflammation. Taken together, our systems biology data from C-T, C-D, PPI and C-D-P networks predicted potent compounds from FT and new therapeutic targets and pathways involved in influenza.


Subject(s)
Antiviral Agents/chemistry , Fritillaria/chemistry , Orthomyxoviridae/drug effects , Anthocyanins/chemistry , Anthocyanins/pharmacokinetics , Antiviral Agents/pharmacokinetics , Databases, Chemical/statistics & numerical data , Databases, Genetic/statistics & numerical data , Humans , Pharmacology/methods , Protein Interaction Maps , Sitosterols/chemistry , Sitosterols/pharmacokinetics , Systems Biology/methods
10.
Anal Chem ; 92(16): 10996-11006, 2020 08 18.
Article in English | MEDLINE | ID: mdl-32686928

ABSTRACT

An automatic approach to identification of natural products (NPid) in complex extracts by exploring pure shift HSQC (psHSQC) and H2BC spectra of the mixture is developed, which integrates information on chemical shifts (CS), adjacent relationships (AR) and peak intensities (PI) of 1H-13C groups for identification of candidate natural products in a customized NMR database. A weighted comprehensive score is calculated for each candidate from the values of CS, AR and PI to rate the likelihood of its existence in the complex mixture. Using the crude extract of crabapple (Malus fusca) as an example, a customized NMR database of natural products from plants of the genus Malus was constructed. The performance of NPid was first evaluated using simulated data in four scenarios, that is, for identification of structurally similar natural products, identification of natural products with some peaks missing in psHSQC due to low concentration, without available adjacent relationship information, or without useful peak intensity information. The false positive and false negative rates of the natural products identified by NPid were estimated by Monte Carlo simulation. The results show that AR and PI can effectively reduce the false positive rate of identification. Proof of concept of the proposed method was demonstrated on a model mixture consisting of 10 known natural products. Application of this method was then demonstrated on an authentic sample of crude extract of crabapple, and 19 known natural products were successfully identified and confirmed by standard spiking.


Subject(s)
Biological Products/analysis , Plant Extracts/analysis , Algorithms , Biological Products/chemistry , Databases, Chemical/statistics & numerical data , Magnetic Resonance Spectroscopy/statistics & numerical data , Malus/chemistry , Molecular Structure , Plant Extracts/chemistry , Proof of Concept Study
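
The weighted comprehensive score mentioned in the abstract above might look something like the following sketch. The weights, the assumption that each component score lies in [0, 1], and the candidate names are all hypothetical; the abstract does not give the actual NPid scoring formula.

```python
def comprehensive_score(cs, ar, pi, weights=(0.5, 0.3, 0.2)):
    """Weighted average of chemical-shift (CS), adjacent-relationship (AR)
    and peak-intensity (PI) match scores. Weights are illustrative only."""
    w_cs, w_ar, w_pi = weights
    total = w_cs + w_ar + w_pi
    return (w_cs * cs + w_ar * ar + w_pi * pi) / total


# Hypothetical candidates with (CS, AR, PI) match scores.
candidates = {
    "candidate_A": (0.9, 0.8, 0.7),
    "candidate_B": (0.9, 0.2, 0.3),
}

# Rank candidates from most to least likely to be present in the mixture.
ranked = sorted(
    candidates,
    key=lambda name: comprehensive_score(*candidates[name]),
    reverse=True,
)
```

In this toy case both candidates match equally well on chemical shifts alone, and the AR and PI terms separate them, which mirrors the abstract's observation that AR and PI reduce false positives.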
11.
J Comput Aided Mol Des ; 34(7): 805-815, 2020 07.
Article in English | MEDLINE | ID: mdl-31407224

ABSTRACT

Generative topographic mapping was used to investigate the possibility to diversify the in-house compounds collection of Boehringer Ingelheim (BI). For this purpose, a 2D map covering the relevant chemical space was trained, and the BI compound library was compared to the Aldrich-Market Select (AMS) database of more than 8M purchasable compounds. In order to discover new (sub)structures, the "AutoZoom" tool was developed and applied in order to analyze chemotypes of molecules residing in heavily populated zones of a map and to extract the corresponding maximum common substructures. A set of 401K new structures from the AMS database was retrieved and checked for drug-likeness and biological activity.


Subject(s)
Drug Discovery/methods , Small Molecule Libraries , Algorithms , Computer-Aided Design/statistics & numerical data , Databases, Chemical/statistics & numerical data , Databases, Pharmaceutical/statistics & numerical data , Drug Design , Drug Development/statistics & numerical data , Drug Discovery/statistics & numerical data , Humans , Molecular Structure , Software , User-Computer Interface
12.
J Phys Chem B ; 124(3): 470-478, 2020 01 23.
Article in English | MEDLINE | ID: mdl-31829591

ABSTRACT

Inspired by methods that utilize chemical-mapping data to guide secondary structure prediction, we sought to develop a framework for using assigned chemical shift data to guide ribonucleic acid (RNA) secondary structure prediction. We first used machine learning to develop classifiers that predict the base-pairing status of individual residues in an RNA based on their assigned chemical shifts. Then, we used these base-pairing status predictions as restraints to guide RNA folding algorithms. Our results showed that we could recover the correct secondary fold of most of the 108 RNAs in our data set with remarkable accuracy. Finally, we tested whether we could use the base-pairing status predictions that we obtained from assigned chemical shift data to conditionally predict the secondary structure of RNA. To achieve this, we attempted to model two distinct conformational states of the microRNA-20b and the fluoride riboswitch using assigned chemical shifts that were available for both conformational states of each of these test RNAs. For both test cases, we found that by using the base-pairing status predictions that we obtained from assigned chemical shift data as folding restraints, we could generate structures that closely resembled the known structure of the two distinct states. A command-line tool for chemical shifts to base-pairing status predictions in RNA has been incorporated into our CS2Structure Git repository and can be accessed via https://github.com/atfrank/CS2Structure .


Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Algorithms , Base Pairing , Databases, Chemical/statistics & numerical data , Machine Learning , Neural Networks, Computer , Nuclear Magnetic Resonance, Biomolecular
13.
J Comput Aided Mol Des ; 34(7): 769-782, 2020 07.
Article in English | MEDLINE | ID: mdl-31677002

ABSTRACT

We present a Focused Library Generator that is able to create from scratch new molecules with desired properties. After training the Generator on the ChEMBL database, transfer learning was used to switch the generator to producing new Mdmx inhibitors that are a promising class of anticancer drugs. Lilly medicinal chemistry filters, molecular docking, and a QSAR IC50 model were used to refine the output of the Generator. Pharmacophore screening and molecular dynamics (MD) simulations were then used to further select putative ligands. Finally, we identified five promising hits with equivalent or even better predicted binding free energies and IC50 values than known Mdmx inhibitors. The source code of the project is available on https://github.com/bigchem/online-chem.


Subject(s)
Cell Cycle Proteins/antagonists & inhibitors , Drug Design , Proto-Oncogene Proteins/antagonists & inhibitors , Small Molecule Libraries , Antineoplastic Agents/chemistry , Antineoplastic Agents/pharmacology , Binding Sites , Cell Cycle Proteins/chemistry , Computer-Aided Design/statistics & numerical data , Databases, Chemical/statistics & numerical data , Databases, Pharmaceutical , Drug Discovery/methods , Drug Discovery/statistics & numerical data , Humans , Ligands , Molecular Docking Simulation , Molecular Dynamics Simulation , Neural Networks, Computer , Protein Binding , Proto-Oncogene Proteins/chemistry , Quantitative Structure-Activity Relationship
14.
Food Chem Toxicol ; 135: 110921, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31669597

ABSTRACT

Determining chemical carcinogenicity in the early stages of drug discovery is fundamentally important to prevent the adverse effect of carcinogens on human health. There has been a recent surge of interest in developing computational approaches to predict chemical carcinogenicity. However, the predictive power of many existing approaches is limited, and there is plenty of room for improvement. Here, we develop a new deep learning architecture, termed CapsCarcino, to distinguish between carcinogens and noncarcinogens. CapsCarcino is constructed based on a dynamic routing algorithm that requires less data, extracts more comprehensive information, and does not require feature selection. We find that CapsCarcino outperforms five other machine learning models, with significantly improved predictive and generalization ability. Specifically, the best model of CapsCarcino achieves an accuracy of 85.0% on an external validation dataset. In addition, we discover that the enhanced predictive capability of CapsCarcino over that of the other methods is robust and can be achieved using sparse datasets. Trained on merely 20% of the dataset, CapsCarcino performs comparably to the other methods trained on the full training dataset. Further mechanism analysis indicates that CapsCarcino could efficiently learn the characteristics of carcinogens even if structural alerts are insufficiently represented. The results indicate that CapsCarcino should be helpful for carcinogen risk assessment.


Subject(s)
Carcinogens/chemistry , Deep Learning , Animals , Databases, Chemical/statistics & numerical data , Rats
15.
Comput Biol Chem ; 80: 79-89, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30928871

ABSTRACT

The current study was set to discover selective Plasmodium falciparum phosphatidylinositol-4-OH kinase type III beta (pfPI4KB) inhibitors as potential antimalarial agents using a combined structure-based and ligand-based drug discovery approach. A comparative model of pfPI4KB was first constructed and validated using molecular docking techniques. Performance of Autodock4.2 and Vina4 software in predicting the inhibitor-PI4KB binding mode and energy was assessed based on two Test Sets: Test Set I contained five ligands with resolved crystal structures with PI4KB, while Test Set II considered eleven compounds with known IC50 values towards PI4KB. Autodock outperformed Vina, giving correlation coefficient (R2) values of 0.87 and 0.90 for Test Set I and Test Set II, respectively. Pharmacophore-based screening was then conducted to identify drug-like molecules from the ZINC database with physicochemical similarity to two potent pfPI4KB inhibitors, namely cpa and cpb. For each query inhibitor, the best 1000 hits in terms of TanimotoCombo scores were selected and subjected to molecular docking and molecular dynamics (MD) calculations. Binding energy was then estimated using the molecular mechanics-generalized Born surface area (MM-GBSA) approach over 50 ns MD simulations of the inhibitor-pfPI4KB complexes. According to the calculated MM-GBSA binding energies, ZINC78988474 and ZINC20564116 were identified as potent pfPI4KB inhibitors with binding energies better than those of cpa and cpb, with ΔGbinding ≥ -34.56 kcal/mol. The inhibitor-pfPI4KB interaction and stability were examined over 50 ns MD simulation; the selectivity of the identified inhibitors towards pfPI4KB over PI4KB was also reported.


Subject(s)
1-Phosphatidylinositol 4-Kinase/antagonists & inhibitors , 1-Phosphatidylinositol 4-Kinase/metabolism , Antimalarials/metabolism , Plasmodium falciparum/enzymology , Protein Kinase Inhibitors/metabolism , 1-Phosphatidylinositol 4-Kinase/chemistry , Amino Acid Sequence , Antimalarials/chemistry , Catalytic Domain , Databases, Chemical/statistics & numerical data , Drug Discovery , Ligands , Molecular Docking Simulation , Molecular Dynamics Simulation , Molecular Structure , Protein Binding , Protein Kinase Inhibitors/chemistry , Sequence Alignment
16.
Comput Biol Chem ; 80: 90-101, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30939415

ABSTRACT

BACKGROUND: Traditional methods for drug discovery are time-consuming and expensive, so efforts are being made to repurpose existing drugs. To find new ways for drug repurposing, many computational approaches have been proposed to predict drug-target interactions (DTIs). However, due to the high-dimensional nature of the data sets extracted from drugs and targets, traditional machine learning approaches, such as logistic regression analysis, cannot analyze these data sets efficiently. To overcome this issue, we propose LASSO (Least absolute shrinkage and selection operator)-based regularized linear classification models and a LASSO-DNN (Deep Neural Network) model based on LASSO feature selection to predict DTIs. These methods are demonstrated for repurposing drugs for breast cancer treatment. METHODS: We collected drug descriptors, protein sequence data from Drugbank and protein domain information from NCBI. Validated DTIs were downloaded from Drugbank. A new similarity-based approach was developed to build the negative DTIs. We proposed multiple LASSO models to integrate different combinations of feature sets to explore the prediction power and predict DTIs. Furthermore, building on the features extracted from the LASSO models with the best performance, we also introduced a LASSO-DNN model to predict DTIs. The performance of our newly proposed DNN model (LASSO-DNN) was compared with the LASSO, standard logistic (SLG) regression, support vector machine (SVM), and standard DNN models. RESULTS: Experimental results showed that the LASSO-DNN outperformed the SLG, LASSO, SVM and standard DNN models. In particular, the LASSO models with protein tripeptide composition (TC) features and domain features were superior to those that contained other protein information, which may imply that TC and domain information could be better representations of proteins. Furthermore, we showed that the top ranked DTIs predicted using the LASSO-DNN model can potentially be used for repurposing existing drugs for breast cancer based on risk gene information. CONCLUSIONS: In summary, we demonstrated that the efficient representations of drug and target features are key for building learning models for predicting DTIs. The disease-associated risk genes identified from large-scale genomic studies are the potential drug targets, which can be used for drug repurposing.


Subject(s)
Antineoplastic Agents/metabolism , Deep Learning , Models, Chemical , Proteins/metabolism , Amino Acid Sequence , Antineoplastic Agents/chemistry , Breast Neoplasms/genetics , Computational Biology/methods , Databases, Chemical/statistics & numerical data , Databases, Protein/statistics & numerical data , Drug Repositioning , Genes, Neoplasm/drug effects , Molecular Structure , Protein Binding , Protein Domains , Proteins/chemistry , Support Vector Machine
17.
Anal Chem ; 90(21): 12752-12760, 2018 11 06.
Article in English | MEDLINE | ID: mdl-30350614

ABSTRACT

Liquid chromatography coupled with electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) is a major analytical technique used for nontargeted identification of metabolites in biological fluids. Typically, in LC-ESI-MS/MS based database assisted structure elucidation pipelines, the exact mass of an unknown compound is used to mine a chemical structure database to acquire an initial set of possible candidates. Subsequent matching of the collision induced dissociation (CID) spectrum of the unknown to the CID spectra of candidate structures facilitates identification. However, this approach often fails because of the large numbers of potential candidates (i.e., false positives) for which CID spectra are not available. To overcome this problem, CID fragmentation prediction programs have been developed, but these also have limited success if large numbers of isomers with similar CID spectra are present in the candidate set. In this study, we investigated the use of a retention index (RI) predictive model as an orthogonal method to help improve identification rates. The model was used to eliminate candidate structures whose predicted RI values differed significantly from the experimentally determined RI value of the unknown compound. We tested this approach using a set of ninety-one endogenous metabolites and four in silico CID fragmentation algorithms: CFM-ID, CSI:FingerID, Mass Frontier, and MetFrag. Candidate sets obtained from PubChem and the Human Metabolome Database (HMDB) were ranked with and without RI filtering followed by in silico spectral matching. Upon RI filtering, 12 of the ninety-one metabolites were eliminated from their respective candidate sets, i.e., were scored incorrectly as negatives. For the remaining seventy-nine compounds, we show that RI filtering eliminated an average of 58% from PubChem candidate sets. This resulted in an approximately 2-fold improvement in average rankings when using CFM-ID, Mass Frontier, and MetFrag. In addition, RI filtering slightly increased the occurrence of number one rankings for all 4 fragmentation algorithms. However, RI filtering did not significantly improve average rankings when HMDB was used as the candidate database, nor did it significantly improve average rankings when using CSI:FingerID. Overall, we show that the current RI model incorrectly eliminated more true positives (12) than were expected (4-5) on the basis of the filtering method. However, it slightly improved the number of correct first place rankings and improved overall average rankings when using CFM-ID, Mass Frontier, and MetFrag.


Subject(s)
Databases, Chemical/statistics & numerical data; Metabolomics/methods; Models, Chemical; Neural Networks, Computer; Algorithms; Chromatography, Liquid; Computer Simulation; Molecular Structure; Spectrometry, Mass, Electrospray Ionization
18.
ChemMedChem ; 13(20): 2189-2201, 2018 10 22.
Article in English | MEDLINE | ID: mdl-30110511

ABSTRACT

The blood-brain barrier (BBB), treated as a part of absorption, protects the central nervous system by separating brain tissue from the bloodstream. In recent years, BBB permeability has become a critical issue in chemical ADMET prediction, but almost all models were built using imbalanced data sets, which caused a high false-positive rate. We therefore addressed the problem of biased data sets and built a reliable classification model with 2358 compounds. Machine learning and resampling methods were used together to refine the models, with both 2D molecular descriptors and molecular fingerprints representing the chemicals. Through a series of evaluations, we found that resampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE + edited nearest neighbor could effectively solve the problem of imbalanced data sets, and that the MACCS fingerprint combined with a support vector machine performed best. After the final construction of a consensus model, the overall accuracy rose to 0.966 on the final external data set. The accuracy of the model on the test set was 0.919, with a well-balanced ability to predict BBB-positive compounds (sensitivity 0.925) and BBB-negative compounds (specificity 0.899). Compared with other BBB classification models, our models reduced the rate of false positives and were more robust in predicting both BBB-positive and BBB-negative compounds, which should be helpful in early drug discovery.
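
To illustrate the resampling idea mentioned in this abstract, the following is a minimal SMOTE-style sketch in plain Python. It is an assumption-laden illustration rather than the authors' pipeline: the function name, the random choice of base samples, and the default `k` are hypothetical, and the edited-nearest-neighbor cleaning step is not shown.

```python
import math
import random


def smote_like(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class points by interpolating between a
    minority sample and one of its k nearest minority neighbours.

    `minority` is a list of equal-length numeric tuples. Base samples are
    drawn at random here, which is a simplification of the original scheme.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic
```

Because every synthetic point lies on a line segment between two real minority samples, the minority class is enlarged without simply duplicating existing compounds.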


Subject(s)
Blood-Brain Barrier/metabolism; Computer Simulation; Databases, Chemical/statistics & numerical data; Organic Chemicals/pharmacokinetics; Support Vector Machine; Algorithms; Models, Chemical; Organic Chemicals/chemistry; Permeability
19.
SAR QSAR Environ Res ; 29(9): 661-674, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30160175

ABSTRACT

Prediction performance often depends on the cross-validation and test-set validation protocols applied. Several combinations of different cross-validation variants and model-building techniques were used to reveal their complexity. Two case studies (acute toxicity data) were examined, applying five-fold cross-validation (in random, contiguous and Venetian blind forms) and leave-one-out cross-validation (CV). External test sets showed the effects of, and differences between, the validation protocols. The models were generated with multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression, artificial neural networks (ANN) and support vector machines (SVM). The comparisons were made using the sum of ranking differences (SRD) and factorial analysis of variance (ANOVA). The largest bias and variance could be assigned to the MLR method and to contiguous block cross-validation. SRD can provide a unique and unambiguous ranking of methods and CV variants. Venetian blind cross-validation is a promising tool. The generated models were also compared on the basis of their basic performance parameters (r² and Q²). MLR produced the largest gap, while PCR gave the smallest. Although PCR is the best validated and balanced technique, SVM always outperformed the other methods when experimental values were the benchmark. Variable selection was advantageous, and the modelling method had a larger influence than the CV variant.
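
The three five-fold variants named in this abstract differ only in how samples are assigned to folds. The Python sketch below shows one common way to define them; the function names are hypothetical, and the abstract does not specify the exact assignment rules the authors used.

```python
import random


def contiguous_folds(n, k):
    """Assign consecutive blocks of samples to the same fold."""
    return [i * k // n for i in range(n)]


def venetian_blind_folds(n, k):
    """Assign every k-th sample to the same fold (interleaved)."""
    return [i % k for i in range(n)]


def random_folds(n, k, seed=0):
    """Assign samples to folds at random, keeping fold sizes balanced."""
    labels = venetian_blind_folds(n, k)
    random.Random(seed).shuffle(labels)
    return labels


print(contiguous_folds(10, 5))      # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(venetian_blind_folds(10, 5))  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```

Leave-one-out cross-validation is the special case `k == n`, where all three schemes place each sample in its own fold. When the data are ordered (for example by activity), contiguous blocks can leave out a whole region of the response range at once, whereas the interleaved and random schemes spread each fold across the ordering.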


Subject(s)
Drug Discovery/methods; Models, Molecular; Quantitative Structure-Activity Relationship; Analysis of Variance; Databases, Chemical/statistics & numerical data; Toxicity Tests/statistics & numerical data