Results 1 - 20 of 32
1.
Mol Pharm ; 19(7): 2151-2163, 2022 07 04.
Article in English | MEDLINE | ID: mdl-35671399

ABSTRACT

Antibacterial drugs (AD) change the metabolic status of bacteria, contributing to bacterial death. However, antibiotic resistance and the emergence of multidrug-resistant bacteria have increased interest in understanding metabolic network (MN) mutations and the interaction of AD vs MN. In this study, we applied the IFPTML algorithm (Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML)) to a large dataset from the ChEMBL database, which contains >155,000 AD assays vs >40 MNs of multiple bacterial species. We built a linear discriminant analysis (LDA) model and 17 ML models based on atom-based linear indices to predict antibacterial compounds. For the training subset (70,000 cases), the IFPTML-LDA model gave specificity (Sp) = 76%, sensitivity (Sn) = 70%, and accuracy (Acc) = 73%. For the validation subsets, the same model gave Sp = 76%, Sn = 70%, and Acc = 73.1%. Among the nonlinear IFPTML models, k-nearest neighbors (KNN) gave the best results in the training sets, with Sn = 99.2%, Sp = 95.5%, Acc = 97.4%, and area under the receiver operating characteristic curve (AUROC) = 0.998. In the validation series, random forest gave the best results: Sn = 93.96% and Sp = 87.02% (AUROC = 0.945). The linear and nonlinear IFPTML models of ADs vs MNs have good statistical parameters and could help find new metabolic mutations involved in antibiotic resistance and reduce the time and cost of antibacterial drug research.
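The Sn, Sp, and Acc values reported above are ratios taken from a binary confusion matrix. Here is a minimal Python sketch of that arithmetic. It assumes labels coded 1 = active and 0 = inactive. The function names and toy labels are illustrative and do not come from the paper:

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # actives predicted active
    fn: int  # actives predicted inactive
    tn: int  # inactives predicted inactive
    fp: int  # inactives predicted active


def confusion(y_true, y_pred):
    """Tally a binary confusion matrix (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
    )


def sn_sp_acc(c):
    """Sensitivity, specificity and accuracy, as percentages."""
    sn = 100.0 * c.tp / (c.tp + c.fn)
    sp = 100.0 * c.tn / (c.tn + c.fp)
    acc = 100.0 * (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    return sn, sp, acc


y_true = [1, 1, 1, 0, 0, 0, 0]  # toy labels, not data from the study
y_pred = [1, 1, 0, 0, 0, 1, 0]
print(sn_sp_acc(confusion(y_true, y_pred)))
```

For these toy labels the sketch gives Sn ≈ 66.7%, Sp = 75%, and Acc ≈ 71.4%.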


Subjects
Anti-Bacterial Agents , Machine Learning , Algorithms , Anti-Bacterial Agents/pharmacology , Databases, Factual , Metabolic Networks and Pathways
2.
Int J Mol Sci ; 22(21)2021 Oct 26.
Article in English | MEDLINE | ID: mdl-34768951

ABSTRACT

The theoretical prediction of drug-decorated nanoparticles (DDNPs) has become a very important task in medical applications. In this paper, Perturbation Theory Machine Learning (PTML) models were built to predict the probability that different pairs of drugs and nanoparticles form DDNP complexes with anti-glioblastoma activity. PTML models take as inputs the perturbations of the molecular descriptors of drugs and nanoparticles under given experimental conditions. The raw dataset was obtained by combining nanoparticle experimental data with drug assays from the ChEMBL database. Ten types of machine learning methods were tested, and only 41 features were selected for 855,129 drug-nanoparticle complexes. The best model was obtained with the Bagging classifier, an ensemble meta-estimator based on 20 decision trees. It had an area under the receiver operating characteristic curve (AUROC) of 0.96 and an accuracy of 87% (test subset). This model could be useful for the virtual screening of nanoparticle-drug complexes in glioblastoma. All calculations can be reproduced with the datasets and Python scripts, which are freely available in the authors' GitHub repository.
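Bagging trains each base learner on a bootstrap resample of the training set and combines the learners by majority vote. The sketch below illustrates only that idea: it bags 20 one-level decision trees (stumps) in plain Python. The stump simplification, the toy one-feature data, and all function names are assumptions made for illustration. The authors' actual scripts are the ones published in their repository.

```python
import random
from collections import Counter


def predict_stump(stump, row):
    """Apply one stump: polarity 1 predicts active at or above the threshold, -1 below it."""
    f, thr, polarity = stump
    above = row[f] >= thr
    return int(above if polarity == 1 else not above)


def fit_stump(X, y):
    """Exhaustively pick the (feature, threshold, polarity) with the fewest errors."""
    best, best_err = None, None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                preds = [predict_stump((f, thr, polarity), row) for row in X]
                err = sum(p != t for p, t in zip(preds, y))
                if best_err is None or err < best_err:
                    best, best_err = (f, thr, polarity), err
    return best


def fit_bagging(X, y, n_estimators=20, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Majority vote over the ensemble."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]  # toy feature values
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagging(X, y)
print(predict_bagging(models, [0.05]), predict_bagging(models, [0.95]))
```

A faithful reproduction would replace `fit_stump` with full decision trees. The bootstrap-and-vote structure would stay the same.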


Subjects
Antineoplastic Agents/administration & dosage , Brain Neoplasms/drug therapy , Drug Delivery Systems , Glioblastoma/drug therapy , Machine Learning , Nanoparticles , Databases, Chemical , Databases, Pharmaceutical , Drug Carriers/administration & dosage , Drug Design , Drug Screening Assays, Antitumor , Humans , Nanoparticles/administration & dosage , User-Computer Interface
3.
Mol Pharm ; 17(7): 2612-2627, 2020 07 06.
Article in English | MEDLINE | ID: mdl-32459098

ABSTRACT

Nanosystems are gaining momentum in pharmaceutical sciences because these systems can be designed in many ways to have specific functions. Specifically, studies of new cancer cotherapy drug-vitamin release nanosystems (DVRNs), which include anticancer compounds and vitamins or vitamin derivatives, have revealed encouraging results. However, the number of possible combinations of design and synthesis conditions is remarkably high. In addition, a large number of anticancer compounds and vitamin derivatives have already been assayed, but far fewer DVRNs have been assayed as a whole (with both the anticancer compound and the vitamin linked to the nanosystem). Our approach uses a perturbation theory and machine learning (PTML) model to predict the probability of obtaining an interesting DVRN. It does this by replacing the anticancer compound and/or the vitamin in an already tested DVRN with other anticancer compounds or vitamins not yet tested as part of a DVRN. In a previous work, we built a linear PTML model useful for the design of these nanosystems. In doing so, we used information fusion (IF) techniques to enrich DVRN data compiled from the literature with data from preclinical assays of vitamins in the ChEMBL database. The design features of DVRNs and the assay conditions of nanoparticles (NPs) and vitamins were included in the system as multiplicative PT operators (PTOs), which indicates the importance of these variables. However, the previous work did not explore nonlinear ML techniques or other types of PTOs, such as metric-based PTOs. More importantly, it did not consider the structure of the anticancer drug to be included in new DVRNs. In this work, we pursue three main objectives (tasks). In the first task, we found a new model, alternative to the one published before, for the rational design of DVRNs using metric-based PTOs.
The most accurate PTML model was the artificial neural network model. Its specificity, sensitivity, and accuracy ranged from 90% to 95% in training and external validation series for more than 130,000 cases (DVRNs vs ChEMBL assays). In the second task, we used IF techniques to further enrich our previous data set. The result was a new working data set of >970,000 cases with data from preclinical assays of DVRNs, vitamins, and anticancer compounds from the ChEMBL database. All these assays have multiple continuous variables or descriptors d_k and categorical variables c_j (assay conditions) for drugs (d_k^ac, c_j^ac), vitamins (d_k^v, c_j^v), and NPs (d_k^n, c_j^n). These data include >20,000 potential anticancer compounds with >270 protein targets (c_1^ac), >580 assay cell organisms (c_2^ac), and so forth. Furthermore, we include >36,000 vitamin derivative assays in >6200 types of cells (c_2^vit), >120 assay organisms (c_3^vit), >60 assay strains (c_4^vit), and so forth. The enriched data set also contains >20 types of DVRNs (c_5^n) with 9 NP core materials (c_4^n), 8 synthesis methods (c_7^n), and so forth. We expressed all this information with PTOs and developed a qualitatively new PTML model that incorporates information about the anticancer drugs. This new model achieves 96-97% accuracy in the training and external validation subsets. In the last task, we compared published ML and/or PTML models and described how the models presented here fill a knowledge gap in drug delivery. In conclusion, we present here for the first time a multipurpose PTML model able to select NPs, anticancer compounds, vitamins, and their assay conditions for DVRN design.


Subjects
Antineoplastic Agents/administration & dosage , Antineoplastic Combined Chemotherapy Protocols/administration & dosage , Drug Delivery Systems/methods , Nanoparticles/chemistry , Neoplasms/drug therapy , Vitamins/administration & dosage , Big Data , Computer Simulation , Databases, Factual , Drug Liberation , Linear Models , Machine Learning
4.
Int J Mol Sci ; 21(3)2020 Feb 05.
Article in English | MEDLINE | ID: mdl-32033398

ABSTRACT

Osteosarcoma is the most common subtype of primary bone cancer and mostly affects adolescents. In recent years, several studies have focused on elucidating the molecular mechanisms of this sarcoma; however, its molecular etiology has still not been determined precisely. Therefore, we applied a consensus strategy using several bioinformatics tools to prioritize genes involved in its pathogenesis. We then assessed the physical interactions of the selected genes and applied a communality analysis to the resulting protein-protein interaction network. The consensus strategy prioritized a total of 553 genes. Our enrichment analysis supports several studies that describe the PI3K/AKT and MAPK/ERK signaling pathways as pathogenic. The gene ontology analysis described TP53 as a principal signal transducer that chiefly mediates processes associated with the cell cycle and the DNA damage response. Notably, the communality analysis clusters several genes involved in metastasis, such as MMP2 and MMP9, and genes associated with DNA repair complexes, such as ATM, ATR, CHEK1, and RAD51. In this study, we identified well-known pathogenic genes for osteosarcoma and prioritized genes that need to be explored further.
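The abstract does not state how the outputs of the bioinformatics tools were combined. One common consensus rule keeps a gene if at least `min_tools` tools prioritize it. That rule is assumed here purely for illustration. The tool names below are hypothetical, and the gene symbols are those mentioned above.

```python
from collections import Counter


def consensus_genes(tool_outputs, min_tools):
    """Return genes prioritized by at least `min_tools` tools, sorted by name."""
    votes = Counter()
    for genes in tool_outputs.values():
        votes.update(set(genes))  # one vote per tool, even if a gene is listed twice
    return sorted(gene for gene, n in votes.items() if n >= min_tools)


tool_outputs = {  # hypothetical tool names and outputs
    "tool_a": ["TP53", "MMP2", "ATM"],
    "tool_b": ["TP53", "MMP9", "ATM"],
    "tool_c": ["TP53", "RAD51"],
}
print(consensus_genes(tool_outputs, min_tools=2))  # → ['ATM', 'TP53']
```

Raising `min_tools` trades recall for agreement. With `min_tools=3`, only TP53 survives in this toy example.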


Subjects
Bone Neoplasms/genetics , Bone Neoplasms/pathology , Osteosarcoma/genetics , Osteosarcoma/pathology , Computational Biology/methods , Consensus , DNA Repair/genetics , Gene Expression Regulation, Neoplastic/genetics , Gene Ontology , Gene Regulatory Networks/genetics , Humans , Protein Interaction Maps/genetics , Signal Transduction/genetics
5.
J Proteome Res ; 18(7): 2735-2746, 2019 07 05.
Article in English | MEDLINE | ID: mdl-31081631

ABSTRACT

Predicting enzyme function and enzyme subclasses is a key objective in fields such as biotechnology, biochemistry, medicinal chemistry, and physiology. The Protein Data Bank (PDB) is the largest archive of biological macromolecular structures, with more than 150 000 entries for proteins, nucleic acids, and complex assemblies. Among these entries, more than 4000 proteins have unknown functions because no detectable homology to proteins of known function has been found. The problem is that our ability to isolate proteins and identify their sequences far exceeds our ability to assign them a defined function. As a result, interest in this topic is growing, and several methods have been developed to predict protein function. In this work, we applied perturbation theory to an original data set of 19 187 enzymes representing all 59 subclasses present in the Protein Data Bank. We developed a series of artificial neural network models able to predict enzyme-enzyme pairs of query-template sequences. Their accuracy, specificity, and sensitivity exceeded 90% in both training and validation series. As a practical application of this methodology, and to further validate our approach, we used the new model to predict a set of enzymes from the yeast Pichia stipitis. This yeast has been widely studied because it is common in nature and gives a high ethanol yield when converting lignocellulosic biomass into bioethanol through the enzyme xylose reductase. On this basis, we tested our model on 222 enzymes, including xylose reductase, the enzyme responsible for the conversion of biomass into bioethanol.


Subjects
Biofuels/microbiology , Enzymes/classification , Proteome/analysis , Aldehyde Reductase , Ethanol/metabolism , Lignin/metabolism , Methods , Models, Theoretical , Neural Networks, Computer , Pichia/enzymology
6.
Chem Res Toxicol ; 32(9): 1811-1823, 2019 09 16.
Article in English | MEDLINE | ID: mdl-31327231

ABSTRACT

Predicting the ChEMBL biological activities of 1-(5-bromofur-2-yl)-2-bromo-2-nitroethene (G1) related to cytokine immunotoxicity is a difficult task. This study presents experimental results for the interaction of G1 with mouse Th1/Th2 and pro-inflammatory cytokines, measured with a cytometric bead array (CBA). In the in vitro CBA test, mean Th1/Th2 cytokine values did not differ significantly between samples treated with G1 and the negative control, but cytokine values differed moderately between time points (24/48 h). Likewise, mean pro-inflammatory cytokine values did not differ significantly between samples treated with G1 and the negative control. The exceptions were tumor necrosis factor (TNF) and interleukin 6 (IL6) at 48 h. These cytokines also differed between time points (24/48 h). The study confirmed that the antimicrobial G1 did not alter Th1/Th2 cytokine concentrations in vitro at either time point, but it can alter TNF and IL6. G1 promotes free radical production and activates damage processes in macrophage cultures. To predict all ChEMBL activities for drugs under other experimental conditions, a ChEMBL data set was constructed with 25 biological activities, 1366 assays, 2 assay types, 4 assay organisms, 2 organisms, and 12 cytokine targets. Molecular descriptors calculated with Rcpi and 15 machine learning methods were used to find the best model for predicting whether a drug is active against a specific cytokine under specific experimental conditions. The best model is a deep neural network based on 120 selected molecular descriptors. It has an area under the receiver operating characteristic curve of 0.904 and an accuracy of 0.832. This model predicted 1384 biological activities of G1 against cytokines under all the experimental conditions in the ChEMBL data set.
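An AUROC such as the 0.904 quoted above can be read as the probability that a randomly chosen active drug scores higher than a randomly chosen inactive one. Below is a minimal pure-Python sketch of that pairwise (Mann-Whitney) formulation, with ties counted as half a win. The function name is hypothetical:

```python
def auroc(y_true, scores):
    """AUROC as P(score of a random active > score of a random inactive).

    Ties count as half a win, i.e. the Mann-Whitney U statistic / (n_pos * n_neg).
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2]))  # 3 of 4 active/inactive pairs ranked correctly
```

This version costs O(n_pos × n_neg), which is acceptable for illustration. For data sets the size of a ChEMBL extract, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` is the practical choice.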


Subjects
Anti-Bacterial Agents/pharmacology , Antifungal Agents/pharmacology , Cytokines/metabolism , Furans/pharmacology , Th1-Th2 Balance/drug effects , Animals , Decision Trees , Deep Learning , Discriminant Analysis , Female , Mice, Inbred BALB C , Th1 Cells/drug effects , Th2 Cells/drug effects
7.
Mol Pharm ; 16(10): 4200-4212, 2019 10 07.
Article in English | MEDLINE | ID: mdl-31426639

ABSTRACT

Retroviral infections such as HIV remain incurable diseases. Identifying the target proteins of new antiretroviral compounds is a major goal for medicine and pharmaceutical chemistry. ChEMBL holds Big Data in a complex data set that is hard to organize. Because it records so many characteristics, the information is difficult to analyze when predicting new drug candidates for retroviral infections. For this reason, we propose a new predictive model that combines perturbation theory (PT) and machine learning (ML) modeling to create a tool that can take advantage of all the available information. The PTML model proposed in this work covers the preclinical experimental assays of antiretroviral compounds in the ChEMBL data set and is a linear equation with four variables. The PT operators used are based on multicondition moving averages, which combine different features and make all the data easier to manage. More than 140 000 preclinical assays for 56 105 compounds with different characteristics or experimental conditions have been carried out and are recorded in the ChEMBL database. They cover combinations of 359 biological activity parameters (c0), 55 protein accessions (c1), 83 cell lines (c2), 64 assay organisms (c3), and 773 subtypes or strains. We included 150 148 preclinical experimental assays for HIV, 1188 for HTLV, 84 for simian immunodeficiency virus, 370 for murine leukemia virus, 119 for Rous sarcoma virus, 1581 for MMTV, etc. We also included 5277 assays for hepatitis B virus. The PTML model achieved a sensitivity of 73.05% (training) and 73.10% (validation), a specificity of 86.61% (training) and 87.17% (validation), and an accuracy of 75.84% (training) and 75.98% (validation). We also compared alternative PTML models with different PT operators, such as covariance, moments, and exponential terms.
Finally, we compared our PTML model with ML models from the literature and with nonlinear artificial neural network (ANN) models. We conclude that this PTML model is the first to consider multiple characteristics of preclinical experimental antiretroviral assays in combination. It is a simple, useful, and adaptable tool that could reduce the time and cost of antiretroviral drug research.
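A multicondition moving-average PT operator can be sketched as the deviation of a compound's descriptor from the average of that descriptor over the assays that share a given condition c_j, i.e. ΔD_j = D - <D>_cj. The sketch assumes that form and averages over all rows; some PTML studies average only over reference or active compounds. The descriptor name and condition labels are hypothetical. This is not the paper's four-variable equation.

```python
from collections import defaultdict


def condition_means(rows, descriptor, condition_keys):
    """Average `descriptor` over the rows sharing each level of each condition."""
    means = {}
    for key in condition_keys:
        sums, counts = defaultdict(float), defaultdict(int)
        for row in rows:
            sums[row[key]] += row[descriptor]
            counts[row[key]] += 1
        means[key] = {level: sums[level] / counts[level] for level in sums}
    return means


def moving_average_operators(row, descriptor, means):
    """PT operators: deviation of one row's descriptor from each condition mean."""
    return {
        f"delta_{descriptor}_{key}": row[descriptor] - level_means[row[key]]
        for key, level_means in means.items()
    }


rows = [  # hypothetical assays: one descriptor plus two condition columns
    {"alogp": 1.0, "c1": "target_A", "c2": "cell_X"},
    {"alogp": 3.0, "c1": "target_A", "c2": "cell_Y"},
    {"alogp": 5.0, "c1": "target_B", "c2": "cell_X"},
]
means = condition_means(rows, "alogp", ["c1", "c2"])
print(moving_average_operators(rows[0], "alogp", means))
# → {'delta_alogp_c1': -1.0, 'delta_alogp_c2': -2.0}
```

In practice the means would be computed on the training subset only and then reused for validation rows, so that no information leaks between the series.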


Subjects
Anti-Retroviral Agents/chemistry , Chemistry, Pharmaceutical/methods , Computer Simulation , Data Mining/methods , Databases, Factual , Machine Learning , Models, Theoretical , Humans , Neural Networks, Computer
8.
J Chem Inf Model ; 59(3): 1109-1120, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30802402

ABSTRACT

Predicting the activity of new chemical compounds against pathogenic microorganisms with different metabolic reaction networks (MRNs) is an important goal because these microorganisms differ in their susceptibility to antibiotics. The ChEMBL database contains >160 000 outcomes of preclinical assays of antimicrobial activity for 55 931 compounds, with >365 activity parameters (MIC, IC50, etc.) and >90 strains of >25 bacterial species. In addition, the Jeong and Barabási data set includes >40 MRNs of microorganisms. However, no existing model can predict antibacterial activity across multiple assays while considering both drug and MRN structures at the same time. In this work, we combined perturbation theory, machine learning, and information fusion techniques to develop the first PTMLIF model. The best linear model found had specificity = 90.31%/90.40% and sensitivity = 88.14%/88.07% in training/validation series. We compared it with nonlinear artificial neural network (ANN) techniques and with previous models from the literature. Next, we illustrated the practical use of the model with an experimental case study. We report for the first time the isolation and characterization of terpenes from the plant Cissus incisa. The antibacterial activity of the terpenes was determined experimentally. The most active compounds were phytol and α-amyrin, with MIC = 100 µg/mL against vancomycin-resistant Enterococcus faecium and carbapenem-resistant Acinetobacter baumannii. These compounds are already known from other sources. However, this is the first time they have been isolated and evaluated against several strains of multidrug-resistant bacteria, including World Health Organization (WHO) priority pathogens. Last, we used the model to predict the activity of these compounds against other microorganisms with different MRNs in order to find other potential targets.


Subjects
Anti-Bacterial Agents/pharmacology , Machine Learning , Models, Biological , Acinetobacter baumannii/drug effects , Acinetobacter baumannii/metabolism , Enterococcus faecium/drug effects , Enterococcus faecium/metabolism , Metabolic Networks and Pathways , Microbial Sensitivity Tests
9.
J Chem Inf Model ; 59(6): 2538-2544, 2019 06 24.
Article in English | MEDLINE | ID: mdl-31083984

ABSTRACT

Quantitative structure-activity relationship (QSAR) modeling is a well-known computational technique with wide applications in fields such as drug design, toxicity prediction, and nanomaterials. However, QSAR researchers still face problems in developing robust classification-based QSAR models, especially when handling response data from diverse experimental and/or theoretical conditions. In the present work, we developed the open-source standalone software "QSAR-Co" (available to download at https://sites.google.com/view/qsar-co). It sets up classification-based QSAR models that can mine response data from multiple conditions. The software comprises two modules: (1) the Model development module and (2) the Screen/Predict module. This user-friendly software provides the functionality required to develop a robust multitasking or multitarget classification-based QSAR model using linear discriminant analysis or random forest techniques. It includes appropriate validation, following the principles set by the Organisation for Economic Co-operation and Development (OECD) for applying QSAR models in regulatory assessments.


Subjects
Drug Discovery , Quantitative Structure-Activity Relationship , Software , Discriminant Analysis , Drug Design , Drug Discovery/methods , Humans
10.
J Proteome Res ; 17(3): 1258-1268, 2018 03 02.
Article in English | MEDLINE | ID: mdl-29336158

ABSTRACT

The spatial distribution of genes in chromosomes seems not to be random. For instance, in humans only 10% of genes are transcribed from bidirectional promoters, but many more are organized into larger clusters. This raises intriguing questions previously asked by different authors. We would like to add a few more questions in this context, related to gene orientation inversions. Does gene orientation (inversion) follow a random pattern? Is it somehow relevant to biological activity? We define a new kind of network, the gene orientation inversion network (GOIN). A GOIN encodes short- and long-range patterns of inversion of the orientation of pairs of genes in the chromosome. We selected Plasmodium falciparum as a case study because this parasite (the causal agent of malaria) is highly relevant to public health. We constructed, for the first time, all of the GOINs for the genome of this parasite. These networks have an average of 383 nodes (genes in one chromosome) and 1314 links (pairs of genes with inverse orientation). We calculated node centralities and other parameters of these networks. We then used them to study different properties of gene inversion patterns, for example, distribution, local communities, similarity to Erdős-Rényi random networks, and randomness. We found clues indicating that gene orientation inversion does not follow a random pattern. We noted that some gene communities in the GOINs tend to group genes encoding RIFIN-related proteins in the proteome of the parasite. RIFIN-like proteins are a second family of clonally variant proteins expressed on the surface of red cells infected with Plasmodium falciparum. Consequently, we used these centralities as inputs of machine learning (ML) models to predict the RIFIN-like activity of 5365 proteins in the proteome of Plasmodium sp.
The best linear ML model found discriminates RIFIN-like proteins from other proteins with sensitivity and specificity of 70-80% in training and external validation series. These results may point to a possible biological relevance of gene orientation inversion that does not depend directly on genetic sequence information. This work opens the door to using GOINs as a tool for studying chromosome structure and protein function in proteome research.
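The abstract defines GOIN nodes as genes and links as gene pairs with inverse orientation, capturing short- and long-range patterns. The exact linking rule is not given here, so the sketch assumes a hypothetical `window` parameter. Two genes are linked when they lie on opposite strands and at most `window` positions apart in chromosome order. Node degree stands in as the simplest centrality.

```python
from itertools import combinations


def goin_edges(strands, window):
    """Link gene pairs on opposite strands that are at most `window` positions apart."""
    return [
        (i, j)
        for i, j in combinations(range(len(strands)), 2)
        if strands[i] != strands[j] and j - i <= window
    ]


def degree_centrality(n_nodes, edges):
    """Number of links per node (unnormalized degree)."""
    degree = [0] * n_nodes
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree


strands = ["+", "+", "-", "+", "-"]  # hypothetical gene orientations in chromosome order
edges = goin_edges(strands, window=2)
print(edges, degree_centrality(len(strands), edges))
# → [(0, 2), (1, 2), (2, 3), (3, 4)] [1, 1, 3, 2, 1]
```

A small `window` captures short-range inversions, and a larger one adds long-range links. Vectors of centralities per gene are the kind of input the abstract describes feeding to the ML models.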


Subjects
Chromosomes/chemistry , Gene Regulatory Networks , Genes, Protozoan , Membrane Proteins/genetics , Plasmodium falciparum/genetics , Proteome/genetics , Protozoan Proteins/genetics , Sequence Inversion , Erythrocytes/parasitology , Gene Expression Regulation , Humans , Machine Learning , Membrane Proteins/metabolism , Multigene Family , Plasmodium falciparum/metabolism , Protein Isoforms/genetics , Protein Isoforms/metabolism , Proteome/metabolism , Protozoan Proteins/metabolism , Software
11.
J Chem Inf Model ; 58(12): 2414-2419, 2018 12 24.
Article in English | MEDLINE | ID: mdl-30139249

ABSTRACT

Zeolites are important materials for research and industrial applications. Mesopores are often introduced by desilication, but this also affects other properties, which makes the process difficult to optimize. In this work, we demonstrate that Perturbation Theory and Machine Learning can be combined in a PTML multioutput model describing the effects of desilication. The PTML model achieves a notable accuracy (R² = 0.98) in external validation and can be useful for the rational design of novel materials.
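The R² quoted above is the coefficient of determination, 1 - SS_res / SS_tot. Below is a minimal sketch for one output, plus a uniform average across outputs for a multioutput model. The abstract does not state how the outputs were aggregated, so the averaging is an assumption:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination for one output: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def r_squared_multioutput(columns_true, columns_pred):
    """Uniform average of per-output R² values (assumed aggregation)."""
    scores = [r_squared(t, p) for t, p in zip(columns_true, columns_pred)]
    return sum(scores) / len(scores)


print(r_squared([1, 2, 3, 4], [1.5, 2, 3, 3.5]))  # toy values, not zeolite data
```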


Subjects
Machine Learning , Silicon/chemistry , Zeolites/chemistry , Computer Simulation , Models, Molecular , Monte Carlo Method , Surface Properties
12.
J Proteome Res ; 16(11): 4093-4103, 2017 11 03.
Article in English | MEDLINE | ID: mdl-28922600

ABSTRACT

In this work, we developed a general perturbation theory and machine learning method for data mining of proteomes to discover new B-cell epitopes useful for vaccine design. The method predicts the epitope activity εq(cqj) of a query peptide (q-peptide) under a set of experimental query conditions (cqj). It uses as input the sequence of the q-peptide, together with the sequence and epitope activity εr(crj) of a reference peptide (r-peptide) assayed under similar experimental conditions (crj). The model proposed here can classify 1 048 190 pairs of query and reference peptide sequences from the proteomes of many organisms reported in the IEDB database. These pairs differ (are perturbed) in sequence or assay conditions. The model has accuracy, sensitivity, and specificity between 71 and 80% for training and external validation series. The retrieved information covers structural changes in 83 683 peptide sequences (Seq) determined in experimental assays. The boundary conditions of these assays involve 1448 epitope organisms (Org), 323 host organisms (Host), 15 types of in vivo process (Proc), 28 experimental techniques (Tech), and 505 adjuvant additives (Adj). We then report the experimental sampling, isolation, and sequencing of 15 complete sequences of the Bm86 gene from the state of Colima, Mexico. Last, we used the model to predict the epitope immunogenic scores under different experimental conditions for the 26 112 peptides obtained from these sequences. The model may become a useful tool for epitope selection in vaccine design. The theoretical-experimental results on the Bm86 protein may help the future design of a new vaccine based on this protein.
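The query/reference pairing can be sketched as follows. The abstract only says the reference peptide is assayed under "similar" conditions. Here that is approximated, as an assumption, by exact equality on a chosen subset of condition fields. The dictionary fields (`seq`, `host`, `tech`, `epitope`) and the toy peptides are hypothetical.

```python
from itertools import product


def pair_query_reference(queries, references, match_keys):
    """Pair each query peptide with every reference peptide whose assay
    conditions equal the query's on all `match_keys`."""
    pairs = []
    for q, r in product(queries, references):
        same_conditions = all(q[k] == r[k] for k in match_keys)
        if same_conditions and q["seq"] != r["seq"]:
            # (query sequence, reference sequence, reference epitope label)
            pairs.append((q["seq"], r["seq"], r["epitope"]))
    return pairs


queries = [{"seq": "AAK", "host": "Bos taurus", "tech": "ELISA"}]
references = [
    {"seq": "AAR", "host": "Bos taurus", "tech": "ELISA", "epitope": 1},
    {"seq": "GGK", "host": "Mus musculus", "tech": "ELISA", "epitope": 0},
]
print(pair_query_reference(queries, references, ["host", "tech"]))
```

Matching on fewer keys produces more pairs. This is one way a data set of peptides can grow into the million-plus query/reference pairs mentioned above.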


Subjects
Data Mining/methods , Epitopes, B-Lymphocyte , Membrane Glycoproteins/genetics , Proteome/analysis , Recombinant Proteins/genetics , Vaccines/genetics , Amino Acid Sequence , Animals , Machine Learning , Mexico , Models, Theoretical
13.
Health Inf Sci Syst ; 12(1): 6, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38125666

ABSTRACT

Prostate cancer is the most common cancer in men worldwide and has a high mortality rate. Its complex and heterogeneous development has become a core obstacle to treatment. The clinical management of prostate cancer also raises specific concerns: overtreatment in early-stage diagnosis, the recognition of oligometastatic and dormant tumors, and personalized drug use. Some typical genetic mutations have been shown to be associated with the initiation and progression of prostate cancer. However, single-omic studies usually cannot explain the causal relationship between molecular alterations and clinical phenotypes. The field also lacks exploration from a systems genetics perspective, that is, of how gene networks, environmental factors, and even lifestyle behaviors affect disease progression. Meanwhile, the current trend emphasizes the use of artificial intelligence (AI) and machine learning techniques to process extensive multidimensional data, including multi-omics data. These technologies reveal potential patterns, correlations, and insights related to diseases, thereby aiding interpretable clinical decision making and applications, namely intelligent medicine. Therefore, there is a pressing need to integrate multidimensional data to identify molecular subtypes, predict cancer progression and aggressiveness, and personalize treatment. In this review, we systematically described the landscape from molecular mechanism discovery in prostate cancer to clinical translational applications. We discussed the molecular profiles and clinical manifestations of prostate cancer heterogeneity, the identification of different states of prostate cancer, and the corresponding precision medicine practices.
Taking multi-omics fusion, systems genetics, and intelligent medicine as the main perspectives, we summarized current research results and the knowledge-driven research path for prostate cancer.

14.
Nanoscale ; 12(25): 13471-13483, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32613998

ABSTRACT

Nanoparticles (NPs) decorated with coating agents (polymers, gels, proteins, etc.) form Nanoparticle Drug Delivery Systems (DDNS), which are of high interest in nanotechnology and biomaterials science. Experimental data sets on the biological activity, toxicity, and delivery properties of DDNS are increasingly reported. However, these data sets are still dispersed and not as large as the data sets for DDNS components (NPs and drugs). This has prompted researchers to train Machine Learning (ML) algorithms able to design new DDNS based on the properties of their components. However, most ML models reported to date predict the specific activities of NPs or drugs against a given target or cell line. In this paper, we combine Perturbation Theory and Machine Learning (PTML algorithm) to train a model able to predict the best components (NP, coating agent, and drug) for DDNS design. To do so, we downloaded a data set of >30 000 preclinical assays of drugs from ChEMBL. We also downloaded an NP data set of preclinical assays of coated Metal Oxide Nanoparticles (MONPs) from public sources. Both the drug and NP data sets cover multiple assay conditions that can be listed as two arrays, c_j^drug and c_j^NP. The c_j^drug array includes >504 biological activity parameters (c_0^drug), >340 target proteins (c_1^drug), >650 types of cells (c_2^drug), >120 assay organisms (c_3^drug), and >60 assay strains (c_4^drug). The c_j^NP array includes 3 biological activity parameters (c_0^NP), 40 types of proteins (c_1^NP), 10 nanoparticle shapes (c_2^NP), 6 assay media (c_3^NP), and 12 coating agents (c_4^NP). After downloading, we pre-processed each data set separately by calculating PT operators. These operators account for changes (perturbations) in the chemical structure and/or physicochemical properties of the drug, coating agent, and NP, as well as in the assay conditions.
Next, we carried out an information fusion process to form a final data set of more than 500 000 DDNS (drug + MONP pairs). We also trained other linear and nonlinear PTML models using RStudio scripts for comparison. To the best of our knowledge, this is the first multi-label PTML model useful for selecting the drugs, coating agents, and metal or metal-oxide nanoparticles to be assembled into new DDNS with optimal activity/toxicity profiles.
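The information fusion step pairs drug records with NP records. The simplest version is sketched here as an assumption: it takes the Cartesian product of the two tables and prefixes each column with its source. The abstract does not say how the >500 000 pairs were selected, so any sampling or filtering rule would go inside the loop. The column names are hypothetical.

```python
from itertools import product


def fuse(drug_rows, np_rows, drug_features, np_features):
    """Cartesian-product fusion: one candidate DDNS per (drug assay, NP assay) pair."""
    fused = []
    for d, n in product(drug_rows, np_rows):
        row = {f"drug_{k}": d[k] for k in drug_features}
        row.update({f"np_{k}": n[k] for k in np_features})
        fused.append(row)
    return fused


drugs = [  # hypothetical drug assays with a precomputed PT operator
    {"id": "D1", "delta_logp": 0.4},
    {"id": "D2", "delta_logp": -1.2},
]
nps = [  # hypothetical coated MONP assays
    {"core": "ZnO", "delta_size": 3.0},
    {"core": "TiO2", "delta_size": -2.0},
    {"core": "Fe3O4", "delta_size": 0.5},
]
fused = fuse(drugs, nps, ["id", "delta_logp"], ["core", "delta_size"])
print(len(fused), fused[0])
```

Because the PT operators are computed separately per data set before fusion, each fused row simply carries both sets of perturbation columns side by side.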


Subjects
Nanoparticles , Pharmaceutical Preparations , Algorithms , Drug Liberation , Machine Learning
15.
ACS Comb Sci ; 22(3): 129-141, 2020 03 09.
Article in English | MEDLINE | ID: mdl-32011854

ABSTRACT

Determining the biological activity of vitamin derivatives is necessary because the organic synthesis of vitamin analogs is an active field of interest for medicinal chemistry, pharmaceuticals, and food additives. Accordingly, scientists from different disciplines perform preclinical assays (nij) under many combinations of assay conditions (cj). Indeed, the ChEMBL platform contains results from 36 220 different biological activity bioassays of 21 240 different vitamins and vitamin derivatives. These assays are heterogeneous in terms of their combinations of cj. They cover >500 different biological activity parameters (c0), >340 different targets (c1), >6200 types of cells (c2), >120 assay organisms (c3), and >60 assay strains (c4). They include a total of >1850 niacin assays, >1580 tretinoin assays, >1580 retinol assays, 857 ascorbic acid assays, etc. Because this combinatorial data is hard for researchers to assimilate, we propose a model that combines perturbation theory (PT) and machine learning (ML). In this study, we propose a PTML (PT + ML) combinatorial model for ChEMBL results on the biological activity of vitamins and vitamin derivatives. The linear discriminant analysis (LDA) model gave the following results for the training subset: specificity (%) = 90.38, sensitivity (%) = 87.51, and accuracy (%) = 89.89. For the external validation subset, it gave specificity (%) = 90.58, sensitivity (%) = 87.72, and accuracy (%) = 90.09. Other linear and nonlinear PTML models, such as logistic regression (LR), classification tree (CT), naïve Bayes (NB), and random forest (RF), were applied to compare predictive capacity. The PTML-LDA model, which uses combinatorial descriptors, predicts more accurately.
In addition, a PCA experiment with chemical structure descriptors allowed us to characterize the high structural diversity of the chemical space studied. However, PTML models using chemical structure descriptors did not improve on the performance of the PTML-LDA model based on ALOGP and PSA. We conclude that the three-variable PTML-LDA model is a simple, adaptable tool for predicting the biological activity of vitamin derivatives across different combinations of experimental conditions.
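The specificity, sensitivity, and accuracy values reported above follow the standard confusion-matrix definitions. A minimal Python sketch for reference; the function name and the counts below are hypothetical and are not taken from the study:

```python
def classification_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, and accuracy as percentages.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives, and false negatives from a binary classifier.
    """
    sensitivity = 100.0 * tp / (tp + fn)  # share of actives correctly predicted
    specificity = 100.0 * tn / (tn + fp)  # share of inactives correctly predicted
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy}


# Hypothetical confusion-matrix counts, for illustration only.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print(metrics)  # {'Sn': 80.0, 'Sp': 90.0, 'Acc': 85.0}
```

Reporting Sn and Sp separately, rather than accuracy alone, shows whether a model favors one class, which matters when active and inactive cases are unbalanced.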


Subjects
Bayes Theorem , Combinatorial Chemistry Techniques , Machine Learning , Models, Statistical , Vitamins/chemistry , Databases, Factual , Molecular Structure , Vitamins/chemical synthesis
16.
ACS Omega ; 5(42): 27211-27220, 2020 Oct 27.
Article in English | MEDLINE | ID: mdl-33134682

ABSTRACT

Sarcomas are a group of malignant neoplasms of connective tissue whose etiology differs from that of carcinomas. Efforts to discover new drugs with antisarcoma activity have generated large datasets of multiple preclinical assays with different experimental conditions. For instance, the ChEMBL database contains outcomes of 37,919 different antisarcoma assays with 34,955 different chemical compounds. Furthermore, the experimental conditions reported in this dataset include 157 types of biological activity parameters, 36 drug targets, 43 cell lines, and 17 assay organisms. Considering this information, we propose combining perturbation theory (PT) principles with machine learning (ML) to develop a PTML model to predict antisarcoma compounds. PTML models use a reference function that measures the probability of a drug being active under certain conditions (protein, cell line, organism, etc.). In this paper, we used linear discriminant analysis and neural networks to train and compare PT and non-PT models. All the explored models have an accuracy of 89.19-95.25% in training and 89.22-95.46% in validation sets. PTML-based strategies have similar accuracy but generate simpler models. Therefore, they may become a versatile tool for predicting antisarcoma compounds.
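The reference function mentioned above can be illustrated with a short sketch. This is not the authors' code: it assumes, as is common in PTML work, that the reference probability for a condition is estimated as the fraction of assays run under that condition whose outcome was labeled active. The function name, condition labels, and outcomes are hypothetical.

```python
from collections import defaultdict


def reference_probability(assays):
    """Estimate p_ref(c): the fraction of assays under condition c labeled active.

    `assays` is an iterable of (condition, is_active) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [n_active, n_total]
    for condition, is_active in assays:
        counts[condition][0] += int(is_active)
        counts[condition][1] += 1
    return {c: n_active / n_total for c, (n_active, n_total) in counts.items()}


# Hypothetical assay outcomes grouped by cell line (illustration only).
assays = [
    ("cell_line_A", True),
    ("cell_line_A", False),
    ("cell_line_A", True),
    ("cell_line_B", False),
]
p_ref = reference_probability(assays)
print(p_ref)
```

In a PTML model, such a value typically enters the classifier as one input alongside descriptor perturbations, rather than serving as a prediction on its own.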

17.
Curr Top Med Chem ; 20(25): 2326-2337, 2020.
Article in English | MEDLINE | ID: mdl-32938352

ABSTRACT

By combining Machine Learning (ML) methods with Perturbation Theory (PT), it is possible to develop predictive models for a variety of response targets. This combination, often known as Perturbation Theory Machine Learning (PTML) modeling, comprises a set of techniques that can handle various physical and chemical properties of different organisms and of complex biological or material systems under multiple input conditions. In so doing, these techniques integrate a wide range of chemical and biological data into a single computational framework that can then be applied to screen lead chemicals and to find clues for improving the targeted response(s). PTML models have thus been extremely helpful in drug and material design efforts and have been found to be predictive and applicable across a broad space of systems. After a brief outline of the methodology, this work reviews the different uses of PTML in Medicinal Chemistry and in other applications. Finally, we cover the development of the software currently available for setting up PTML models from large datasets.


Subjects
Databases, Chemical , Machine Learning , Software , Chemistry, Pharmaceutical , Models, Molecular
18.
Biology (Basel) ; 9(8)2020 Jul 30.
Article in English | MEDLINE | ID: mdl-32751710

ABSTRACT

Drug-decorated nanoparticles (DDNPs) have important medical applications. The current work combined Perturbation Theory with Machine Learning and Information Fusion (PTMLIF). PTMLIF models were proposed to predict the probability of nanoparticle-compound/drug complexes having antimalarial activity (against Plasmodium). The aim is to save experimental resources and time by virtually screening DDNPs. The raw data were obtained by fusing experimental data for nanoparticles with compound chemical assays from the ChEMBL database. The inputs for the eight Machine Learning classifiers were transformed features of drugs/compounds and nanoparticles, expressed as perturbations of molecular descriptors under specific experimental conditions (experiment-centered features). The resulting dataset contains 107 input features and 249,992 examples. The best classification model was provided by Random Forest, with 27 selected features of drugs/compounds and nanoparticles in all experimental conditions considered. The model's high performance was demonstrated by the mean Area Under the Receiver Operating Characteristic curve (AUC) in a test subset, with a value of 0.9921 ± 0.000244 (10-fold cross-validation). The results demonstrated the power of fusing the experiment-centered features of drugs/compounds and nanoparticles to predict nanoparticle-compound antimalarial activity. The scripts and dataset for this project are available in an open GitHub repository.

19.
Curr Top Med Chem ; 20(4): 305-317, 2020.
Article in English | MEDLINE | ID: mdl-31878856

ABSTRACT

AIMS AND BACKGROUND: Cheminformatics models are able to predict different outputs (activity, property, chemical reactivity) in single molecules or complex molecular systems (catalyzed organic synthesis, metabolic reactions, nanoparticles, etc.). OBJECTIVE: Cheminformatics prediction of complex catalytic enantioselective reactions is a major goal in organic synthesis research and the chemical industry. Markov Chain Molecular Descriptors (MCDs) have been widely used to solve cheminformatics problems. There are different types of Markov chain descriptors, such as Markov-Shannon entropies (Shₖ), Markov Means (Mₖ), Markov Moments (πₖ), etc. However, other possible MCDs have not been used before. In addition, MCDs are very often calculated with specific software that is not always available to general users, and no public R library is available for calculating MCDs. This limits the availability of MCD-based cheminformatics procedures. METHODS: We studied the enantiomeric excess ee(%)[Rcat] for 324 α-amidoalkylation reactions. These reactions have a complex mechanism that depends on various factors. The model includes MCDs of the substrate, solvent, chiral catalyst, and product, along with the reaction time, temperature, catalyst load, etc. We tested several Machine Learning regression algorithms. The Random Forest regression model achieved R² > 0.90 in training and test. Second, we studied the biological activity of 5644 compounds against colorectal cancer. RESULTS: We developed a model able to predict the outcomes of preclinical assays with specificity and sensitivity of 70-82% in both training and validation series.
CONCLUSION: The work shows the potential of the new tool for computational studies in organic and medicinal chemistry.
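The Markov-Shannon entropies (Shₖ) mentioned above can be sketched as follows. This is a simplified illustration, not the R library described in the paper. It assumes a random walk on the molecular graph in which each atom passes its probability equally to its bonded neighbors, a starting distribution proportional to a per-atom property, and Shₖ = −Σⱼ pₖ(j) ln pₖ(j). The function name and the toy three-atom graph are hypothetical.

```python
import math


def markov_shannon_entropies(adjacency, weights, k_max):
    """Return [Sh_0, ..., Sh_k_max] for a molecular graph.

    adjacency: list of neighbor lists (atom i -> indices of bonded atoms);
               assumes a connected graph in which every atom has a neighbor
    weights:   positive per-atom property used for the starting distribution
    """
    n = len(adjacency)
    total = sum(weights)
    p = [w / total for w in weights]  # p_0: property-weighted start
    entropies = []
    for _ in range(k_max + 1):
        # Shannon entropy of the current atom-occupation probabilities
        entropies.append(-sum(x * math.log(x) for x in p if x > 0))
        # One step of the chain: p_{k+1}[j] = sum_i p_k[i] * Pi[i][j],
        # where Pi[i][j] = 1 / deg(i) for each neighbor j of atom i.
        nxt = [0.0] * n
        for i, neighbors in enumerate(adjacency):
            for j in neighbors:
                nxt[j] += p[i] / len(neighbors)
        p = nxt
    return entropies


# Hypothetical three-atom chain (e.g. a C-C-O heavy-atom skeleton), equal weights.
entropies = markov_shannon_entropies([[1], [0, 2], [1]], [1.0, 1.0, 1.0], k_max=2)
print(entropies)
```

In this toy chain the walk concentrates on the central atom after one step, so Sh₁ drops below Sh₀ = ln 3 before returning to ln 3 at k = 2; real descriptors would use chemically meaningful atom weights.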


Subjects
Cheminformatics , Chemistry, Pharmaceutical , Markov Chains , Algorithms , Humans , Machine Learning
20.
Nanoscale ; 11(45): 21811-21823, 2019 Nov 21.
Article in English | MEDLINE | ID: mdl-31691701

ABSTRACT

Nano-systems for cancer co-therapy that include vitamins or vitamin derivatives have shown results that justify further research to better understand them. However, the number of different combinations of drugs, vitamins, nanoparticle types, coating agents, synthesis conditions, and system types (nanocapsules, micelles, etc.) to be tested is very large, which makes experimentation costly. In this context, there are reports of large datasets of preclinical assays of compounds (as in the ChEMBL database) and a growing but still limited number of reports of experimental measurements on nano-systems per se. On the other hand, Machine Learning is gaining momentum in Nanotechnology and Pharmaceutical Sciences as a tool for the rational design of new drugs and drug-release nano-systems. In this work, we propose to combine Perturbation Theory principles and Machine Learning to develop a PTML model for the rational selection of the components of cancer co-therapy drug-vitamin release nano-systems (DVRNs). In doing so, we apply information fusion techniques to 2 datasets: (1) a large ChEMBL dataset of >36 000 preclinical assays of vitamin derivatives and (2) a new dataset of >1000 outcomes of DVRNs, collected herein from the literature for the first time. The ChEMBL dataset covers a considerable number of assay conditions (cⱼᵛⁱᵗ), each with multiple levels. These conditions include >504 biological activity parameters (c₀ᵛⁱᵗ), >340 types of proteins (c₁ᵛⁱᵗ), >650 types of cells (c₂ᵛⁱᵗ), >120 assay organisms (c₃ᵛⁱᵗ), and >60 assay strains (c₄ᵛⁱᵗ). Regarding the DVRNs, there are 25 different types of nano-systems (nⱼⁿ), with up to 16 conditions (cⱼⁿ), each also with different levels, such as 8 biological activity parameters (c₀ⁿ), 9 raw nanomaterials (c₄ⁿ), 15 assay cells (c₁₁ⁿ), etc. In the first stage, we used Moving Average operators to quantify the perturbations (deviations) in all input variables with respect to the conditions.
After that, we used multiplicative PT operators to carry out data fusion and dimension reduction, and Linear Discriminant Analysis (LDA) to obtain the PTML model. The best PTML model showed specificity, sensitivity, and accuracy values in the range of 83-88% in training and external validation series for >130 000 cases (DVRNs vs. ChEMBL data pairs) formed after data fusion. To the best of our knowledge, this is the first general-purpose model for the rational design of DVRNs for cancer co-therapy.
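The Moving Average step described above can be pictured as computing, for each case, how far an input descriptor deviates from its average over all cases that share the same condition level. A minimal sketch, assuming an arithmetic mean per level of a single condition; it is not the authors' implementation, and the function name, record fields (`ALOGP`, `cell`), and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def moving_average_deviations(records, descriptor, condition):
    """Return D_i - <D>_c for each record.

    <D>_c is the mean of `descriptor` over all records that share the
    record's level of `condition` (e.g. the same assay cell type).
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[condition]].append(record[descriptor])
    level_means = {level: mean(values) for level, values in groups.items()}
    return [record[descriptor] - level_means[record[condition]] for record in records]


# Hypothetical records: a descriptor value (ALOGP) and an assay cell type.
records = [
    {"ALOGP": 2.0, "cell": "A"},
    {"ALOGP": 4.0, "cell": "A"},
    {"ALOGP": 1.0, "cell": "B"},
]
deltas = moving_average_deviations(records, "ALOGP", "cell")
print(deltas)  # [-1.0, 1.0, 0.0]
```

In a full model this would be repeated for each descriptor and each condition cⱼ, giving one deviation term per pair; the subsequent multiplicative fusion and LDA steps are not shown.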


Subjects
Drug Delivery Systems , Machine Learning , Models, Biological , Nanoparticles , Neoplasms , Vitamins , Humans , Micelles , Nanoparticles/chemistry , Nanoparticles/therapeutic use , Neoplasms/drug therapy , Neoplasms/metabolism , Neoplasms/pathology , Vitamins/chemistry , Vitamins/pharmacokinetics , Vitamins/pharmacology