ABSTRACT
Zymomonas mobilis is an ethanologenic microbe with demonstrated potential for use in lignocellulosic biorefineries for bioethanol production. Z. mobilis exhibits a number of desirable characteristics for use as an ethanologenic microbe, with capabilities for metabolic engineering and bioprocess modification. Many advanced genetic tools, including mutation techniques, screening methods and genome editing, have been successfully applied to improve various Z. mobilis strains as potential consolidated ethanologenic microbes. Many bioprocess strategies have also been applied to this organism for bioethanol production. Z. mobilis biofilm reactors have been modified to provide various benefits, including high bacterial populations, shorter fermentation times, high productivity, high cell stability, resistance to high substrate concentrations and toxicity, and higher product recovery. We suggest that Z. mobilis biofilm reactors could be used in bioethanol production using lignocellulosic substrates under batch, continuous and repeated batch processes.
Subject(s)
Ethanol/metabolism , Zymomonas/genetics , Biofilms , Bioreactors/microbiology , Fermentation , Gene Editing , Lignin/metabolism , Metabolic Engineering , Microorganisms, Genetically-Modified/genetics , Mutagenesis , Nitrogen/metabolism , Zymomonas/metabolism
ABSTRACT
Protein kinases are an important class of enzymes that play an essential role in virtually all major disease areas. In addition, they account for approximately 50% of the current targets pursued in drug discovery research. In this work, we explore the generation of structure-based quantum mechanical (QM) quantitative structure-activity relationship (QSAR) models as a means to facilitate structure-guided optimization of protein kinase inhibitors. We explore whether more accurate, interpretable QSAR models can be generated for a series of 76 N-phenylquinazolin-4-amine inhibitors of epidermal growth factor receptor (EGFR) kinase by comparing and contrasting them to other standard QSAR methodologies. The QM-based method involved molecular docking of inhibitors followed by their QM optimization within a ~300-atom cluster model of the EGFR active site at the M062X/6-31G(d,p) level. Pairwise computations of the interaction energies with each active site residue were performed. QSAR models were generated by splitting the dataset 75:25 into training and test sets, followed by modelling using partial least squares (PLS). Additional QSAR models were generated using the alignment-dependent CoMFA and CoMSIA methods as well as alignment-independent physicochemical, e-state indices and fingerprint descriptors. The structure-based QM-QSAR model displayed good performance on the training and test sets (r² ~ 0.7) and was demonstrably more predictive than the QSAR models built using other methods. The descriptor coefficients from the QM-QSAR models allowed for a detailed rationalization of the active site SAR, which has implications for subsequent design iterations.
Subject(s)
Protein Kinase Inhibitors/chemistry , Protein Kinases/ultrastructure , Quantitative Structure-Activity Relationship , Catalytic Domain , ErbB Receptors/antagonists & inhibitors , ErbB Receptors/chemistry , ErbB Receptors/ultrastructure , Humans , Molecular Docking Simulation , Molecular Structure , Protein Kinases/chemistry , Quantum Theory
ABSTRACT
Janus kinase 2 (JAK2) inhibitors represent a promising therapeutic class of anticancer agents against many myeloproliferative disorders. Bioactivity data on pIC50 of 2229 JAK2 inhibitors were employed in the construction of quantitative structure-activity relationship (QSAR) models. The models were built from 100 data splits using decision tree (DT), support vector machine (SVM), deep neural network (DNN) and random forest (RF). The predictive power of the RF models was assessed via 10-fold cross-validation, which afforded excellent predictive performance with R² and RMSE of 0.74 ± 0.05 and 0.63 ± 0.05, respectively. Moreover, the test set showed excellent performance, with R² of 0.75 ± 0.03 and RMSE of 0.62 ± 0.04. In addition, Y-scrambling was utilized to evaluate the possibility of chance correlation of the predictive model. A thorough analysis of the substructure fingerprint count was conducted to provide insights on the inhibitory properties of JAK2 inhibitors. Molecular cluster analysis revealed that pyrazine scaffolds have nanomolar potency against JAK2.
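The Y-scrambling check mentioned in this abstract can be illustrated with a minimal sketch. It does not reproduce the authors' random forest workflow: a one-descriptor least-squares fit stands in for the model, and the toy data, function names and number of rounds are all assumptions made for illustration. The idea is only that a model refitted on shuffled response values should score far worse than the model fitted on the true values.

```python
import random


def r_squared(x, y):
    """R^2 of a one-descriptor least-squares fit (equals squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def y_scramble(x, y, n_rounds=100, seed=0):
    """Refit after shuffling y each round; return the list of scrambled R^2."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the descriptor-response pairing
        scores.append(r_squared(x, shuffled))
    return scores


# Hypothetical toy data: one descriptor with a linear, noisy response.
noise = random.Random(42)
x = [i / 10 for i in range(50)]
y = [2.0 * a + 5.0 + noise.gauss(0, 0.5) for a in x]

true_r2 = r_squared(x, y)
scrambled = y_scramble(x, y)
mean_scrambled = sum(scrambled) / len(scrambled)
print(f"true R^2 = {true_r2:.2f}, mean scrambled R^2 = {mean_scrambled:.2f}")
```

If the scrambled scores came close to the true score, the apparent performance would be suspect as a chance correlation.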
Subject(s)
Enzyme Inhibitors/chemistry , Janus Kinase 2/chemistry , Data Mining , Quantitative Structure-Activity Relationship
ABSTRACT
Green fluorescent protein (GFP) has immense utility in biomedical imaging owing to its autofluorescent nature. In efforts to broaden the spectral diversity of GFP, there have been several reports of engineered mutants via rational design and random mutagenesis. Understanding the origins of spectral properties of GFP could be achieved by means of investigating its structure-activity relationship. The first quantitative structure-property relationship study for modeling the spectral properties, particularly the excitation and emission maxima, of GFP was previously proposed by us some years ago, in which quantum chemical descriptors were used for model development. However, such a simplified model does not consider possible effects that neighboring amino acids have on the conjugated π-system of the GFP chromophore. This study describes the development of a unified proteochemometric model in which the GFP chromophore and amino acids in its vicinity are both considered in the same model. The predictive performance of the model was verified by internal and external validation as well as Y-scrambling. Our strategy provides a general solution for elucidating the contribution that specific ligand and protein descriptors have on the investigated spectral property, which may be useful in engineering novel GFP variants with desired characteristics.
Subject(s)
Amino Acids/chemistry , Green Fluorescent Proteins/chemistry , Models, Molecular , Amino Acids/analysis , Computational Biology , Green Fluorescent Proteins/genetics , Ligands , Protein Engineering , Structure-Activity Relationship
ABSTRACT
INTRODUCTION: Small-molecule Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD1/PDL1) inhibition via PDL1 dimerization has the potential to lead to inexpensive drugs with better cancer patient outcomes and milder side effects. However, this therapeutic approach has proven challenging, with only one PDL1 dimerizer reaching early clinical trials so far. There is hence a need for fast and accurate methods to develop alternative PDL1 dimerizers. OBJECTIVES: We aim to show that structure-based virtual screening (SBVS) based on PDL1-specific machine-learning (ML) scoring functions (SFs) is a powerful drug design tool for detecting PD1/PDL1 inhibitors via PDL1 dimerization. METHODS: By incorporating the latest MLSF advances, we generated and evaluated PDL1-specific MLSFs (classifiers and inactive-enriched regressors) on two demanding test sets. RESULTS: 60 PDL1-specific MLSFs (30 classifiers and 30 regressors) were generated. Our large-scale analysis provides highly predictive PDL1-specific MLSFs that benefitted from training with large volumes of docked inactives and from enabling inactive-enriched regression. CONCLUSION: PDL1-specific MLSFs strongly outperformed generic SFs of various types on this target and are released here without restrictions.
ABSTRACT
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol , can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
Subject(s)
Acetylcholinesterase , Artificial Intelligence , Ligands , Machine Learning , Algorithms , Molecular Docking Simulation
ABSTRACT
The interaction between PD1 and its ligand PDL1 has been shown to render tumor cells resistant to apoptosis and promote tumor progression. An innovative mechanism to inhibit the PD1/PDL1 interaction is PDL1 dimerization induced by small-molecule PDL1 binders. Structure-based virtual screening is a promising approach to discovering such small-molecule PD1/PDL1 inhibitors. Here we investigate which type of generic scoring functions is most suitable to tackle this problem. We consider CNN-Score, an ensemble of convolutional neural networks, as the representative of machine-learning scoring functions. We also evaluate Smina, a commonly used classical scoring function, and IFP, a top structural fingerprint similarity scoring function. These three types of scoring functions were evaluated on two test sets sharing the same set of small-molecule PD1/PDL1 inhibitors, but using different types of inactives: either true inactives (molecules with no in vitro PD1/PDL1 inhibition activity) or assumed inactives (property-matched decoy molecules generated from each active). On both test sets, CNN-Score performed much better than Smina, which in turn strongly outperformed IFP. The fact that the latter was the case, despite precluding any possibility of exploiting decoy bias, demonstrates the predictive value of CNN-Score for PDL1. These results suggest that re-scoring Smina-docked molecules with CNN-Score is a promising structure-based virtual screening method to discover new small-molecule inhibitors of this therapeutic target.
ABSTRACT
BACKGROUND: Despite continued efforts to develop new treatments, there is an urgent need to discover new drug leads to treat tumors exhibiting primary or secondary resistance to existing drugs. Cell cultures derived from patient-derived orthotopic xenografts are promising pre-clinical models to better predict drug response in cancer recurrence. OBJECTIVE: The aim of the study was to investigate the relationship between the physicochemical properties of drugs and their in vitro potency, and to identify chemical scaffolds biased towards selectivity or promiscuity of such drugs. METHODS: The bioactivities of 158 drugs screened against cell cultures derived from 30 cancer orthotopic patient-derived xenograft (O-PDX) models were considered. Drugs were represented by physicochemical descriptors and chemical structure fingerprints. Supervised learning was employed to model the relationship between features and in vitro potency. RESULTS: Drugs with in vitro potency for alveolar rhabdomyosarcoma and osteosarcoma tend to have a higher number of rings, two carbon-hetero bonds and halogens. Selective and promiscuous scaffolds for these phenotypic targets were identified. Highly predictive models of in vitro potency were obtained across these 30 targets, which can be applied to unseen molecules via a webserver (https://rnewbie.shinyapps.io/Shobek-master). CONCLUSION: It is possible to identify privileged chemical scaffolds and predict the in vitro potency of unseen molecules across these 30 targets. This information and these models should be helpful for selecting which molecules to screen against these primary cultures of pediatric solid tumors.
Subject(s)
Neoplasm Recurrence, Local , Pharmaceutical Preparations , Animals , Disease Models, Animal , Humans
ABSTRACT
The reproducibility of experiments has been a long-standing impediment to further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted utilization for data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computational drug discovery. This review explores the following topics: (1) the current state of the art in reproducible research, (2) research documentation (e.g. electronic laboratory notebook, Jupyter notebook, etc.), (3) the science of reproducible research (i.e. comparison and contrast with related concepts such as replicability, reusability and reliability), (4) model development in computational drug discovery, (5) computational issues in model development and deployment, and (6) use case scenarios for streamlining the computational drug discovery protocol. In computational disciplines, it has become common practice to share the data and programming code used for numerical calculations, not only to facilitate reproducibility but also to foster collaborations (i.e. to drive the project further by introducing new ideas, growing the data, augmenting the code, etc.). It is therefore inevitable that the field of computational drug design would adopt an open approach towards the collection, curation and sharing of data/code.
ABSTRACT
Volume of distribution (Vdss) is a measure of how effectively a drug molecule is distributed throughout the body. Along with the clearance, it determines the half-life and therefore the drug dosing interval. A number of different pre-clinical approaches are available to predict the Vdss in human, including quantitative structure-activity relationship (QSAR) models. Vdss QSAR models have been reported for human and rat, but not for important pre-clinical species including dog, mouse and monkey. In this study, we have generated Vdss QSAR models for human and commonly used pre-clinical species, each of which differs in terms of size, chemical diversity and data quality. We discuss the model performance by species, and assess the effect of the domain of applicability and the relative merits of building chemical series-specific models. In addition, we compare the intrinsic variability of the experimental logVdss data (~1.2-fold error) to in vivo interspecies differences (~2-fold error) and in silico based models (~3-fold error). This prompted us to explore whether one species could be used to predict another, particularly where little data for that species is available, i.e., does the expansion in domain of applicability prove beneficial over and above any deterioration due to the use of response values from an alternative species?
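Two quantities in this abstract lend themselves to a short worked example. The sketch below assumes the standard one-compartment approximation t1/2 = ln(2) × Vd / CL for the link between volume of distribution, clearance and half-life, and assumes that "fold error" means 10 raised to the absolute difference of the log10 values; the abstract states neither formula explicitly, and the function names and numbers are illustrative only.

```python
import math


def half_life_h(vd_l, cl_l_per_h):
    """Half-life in hours from volume of distribution (L) and clearance (L/h).

    One-compartment approximation: t1/2 = ln(2) * Vd / CL.
    """
    return math.log(2) * vd_l / cl_l_per_h


def fold_error(predicted, observed):
    """Symmetric fold error between a predicted and an observed value."""
    return 10 ** abs(math.log10(predicted) - math.log10(observed))


# Hypothetical drug: Vd = 70 L, CL = 7 L/h.
t_half = half_life_h(70.0, 7.0)
print(f"half-life = {t_half:.2f} h")

# Over- and under-prediction by the same factor give the same fold error.
print(fold_error(2.0, 1.0), fold_error(1.0, 2.0))
```

Under these assumptions, a 3-fold error in predicted Vdss carries through to a 3-fold error in the estimated half-life when clearance is held fixed.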
Subject(s)
Models, Biological , Pharmaceutical Preparations/chemistry , Quantitative Structure-Activity Relationship , Animals , Dogs , Haplorhini , Humans , Mice , Rats
ABSTRACT
Immune therapy is generally seen as the future of cancer treatment. The discovery of tumor-associated antigens and cytotoxic T lymphocyte epitope peptides spurred intensive research into effective peptide-based cancer vaccines. One of the major obstacles hindering the development of peptide-based cancer vaccines is the lack of humoral response induction. As of now, very limited work has been performed to identify epitope peptides capable of inducing both cellular and humoral anticancer responses. In addition, no research has been carried out to analyze the structure and properties of peptides responsible for such immunological activities. This study utilizes a machine learning method together with interpretable descriptors in an attempt to identify parameters determining the immunotherapeutic activity of cancer epitope peptides.
Subject(s)
B-Lymphocytes/immunology , Computational Biology , Epitopes, B-Lymphocyte/immunology , Epitopes, T-Lymphocyte/immunology , Neoplasms/immunology , Peptides/immunology , T-Lymphocytes/immunology , B-Lymphocytes/metabolism , Computational Biology/methods , Epitopes, B-Lymphocyte/chemistry , Epitopes, T-Lymphocyte/chemistry , Humans , Immunotherapy , Neoplasms/therapy , Peptides/chemistry , Peptides/therapeutic use , T-Lymphocytes/metabolism , Workflow
ABSTRACT
BACKGROUND: Currently, monomeric fluorescent proteins (FP) are ideal markers for protein tagging. The prediction of oligomeric states is helpful for enhancing live biomedical imaging. Computational prediction of FP oligomeric states can accelerate protein engineering efforts to create monomeric FPs. To the best of our knowledge, this study represents the first computational model for predicting and analyzing FP oligomerization directly from the amino acid sequence. RESULTS: After data curation, an exhaustive data set consisting of 397 non-redundant FP oligomeric states was compiled from the literature. Results from benchmarking of the protein descriptors revealed that the model built with amino acid composition descriptors was the top-performing model, with accuracy, sensitivity and specificity in excess of 80% and MCC greater than 0.6 for all three data subsets (i.e. training, tenfold cross-validation and external sets). The model provided insights on the important residues governing the oligomerization of FP. To maximize the benefit of the generated predictive model, it was implemented as a web server under the R programming environment. CONCLUSION: osFP affords a user-friendly interface that can be used to predict the oligomeric state of FP using the protein sequence. The advantage of osFP is that it is platform-independent, meaning that it can be accessed via a web browser on any operating system and device. osFP is freely accessible at http://codes.bio/osfp/ while the source code and data set are provided on GitHub at https://github.com/chaninn/osFP/.
ABSTRACT
Alzheimer's disease (AD) is a chronic neurodegenerative disease which leads to the gradual loss of neuronal cells. Several hypotheses for AD exist (e.g., cholinergic, amyloid, tau hypotheses, etc.). As per the cholinergic hypothesis, the deficiency of choline is responsible for AD; therefore, the inhibition of AChE is a lucrative therapeutic strategy for the treatment of AD. Acetylcholinesterase (AChE) is an enzyme that catalyzes the breakdown of the neurotransmitter acetylcholine that is essential for cognition and memory. A large non-redundant data set of 2,570 compounds with reported IC50 values against AChE was obtained from ChEMBL and employed in a quantitative structure-activity relationship (QSAR) study so as to gain insights on their origin of bioactivity. AChE inhibitors were described by a set of 12 fingerprint descriptors and predictive models were constructed from 100 different data splits using random forest. Generated models afforded R², [Formula: see text] and [Formula: see text] values in ranges of 0.66-0.93, 0.55-0.79 and 0.56-0.81 for the training set, 10-fold cross-validated set and external set, respectively. The best model built using the substructure count was selected according to the OECD guidelines and it afforded R², [Formula: see text] and [Formula: see text] values of 0.92 ± 0.01, 0.78 ± 0.06 and 0.78 ± 0.05, respectively. Furthermore, Y-scrambling was applied to evaluate the possibility of chance correlation of the predictive model. Subsequently, a thorough analysis of the substructure fingerprint count was conducted to provide informative insights on the inhibitory activity of AChE inhibitors. Moreover, Kennard-Stone sampling of the actives was applied to select 30 diverse compounds for further molecular docking studies in order to gain structural insights on the origin of AChE inhibition. Site-moiety mapping of compounds from the diversity set revealed three binding anchors encompassing both hydrogen bonding and van der Waals interaction. Molecular docking revealed that compounds 13, 5 and 28 exhibited the lowest binding energies of -12.2, -12.0 and -12.0 kcal/mol, respectively, against human AChE, which is modulated by hydrogen bonding, π-π stacking and hydrophobic interaction inside the binding pocket. This information may be used as guidelines for the design of novel and robust AChE inhibitors.
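The Kennard-Stone sampling step named in this abstract can be sketched as follows. This is a generic version of the algorithm, not the authors' implementation: it assumes Euclidean distance on numeric descriptor vectors, and the function name and toy 2-D points are hypothetical. The algorithm seeds the selection with the two most distant points, then repeatedly adds the candidate whose nearest already-selected neighbour is farthest away.

```python
import math


def kennard_stone(points, k):
    """Return the indices of k diverse points chosen by Kennard-Stone sampling."""
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed with the pair of points separated by the largest distance.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}

    # Add the point whose minimum distance to the selected set is largest.
    while len(selected) < k:
        candidate = max(
            sorted(remaining),
            key=lambda r: min(dist[r][s] for s in selected),
        )
        selected.append(candidate)
        remaining.remove(candidate)
    return selected


# Hypothetical 2-D descriptor vectors: two tight pairs and one central point.
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 10.0), (10.0, 9.9)]
chosen = kennard_stone(toy, 3)
print(chosen)
```

In the toy set, the near-duplicate neighbours of the two extreme points are skipped in favour of the central point, which is the behaviour that makes the method useful for picking a structurally diverse subset.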
ABSTRACT
Aromatase, the rate-limiting enzyme that catalyzes the conversion of androgen to estrogen, plays an essential role in the development of estrogen-dependent breast cancer. Side effects due to aromatase inhibitors (AIs) necessitate the pursuit of novel inhibitor candidates with high selectivity, lower toxicity and increased potency. Designing a novel therapeutic agent against aromatase could be achieved computationally by means of ligand-based and structure-based methods. For over a decade, we have utilized both approaches to design potential AIs for which quantitative structure-activity relationships and molecular docking were used to explore inhibitory mechanisms of AIs towards aromatase. However, such approaches do not consider the effects that aromatase variants have on different AIs. In this study, proteochemometrics modeling was applied to analyze the interaction space between AIs and aromatase variants as a function of their substructural and amino acid features. Good predictive performance was achieved, as rigorously verified by 10-fold cross-validation, external validation, leave-one-compound-out cross-validation, leave-one-protein-out cross-validation and Y-scrambling tests. The investigations presented herein provide important insights into the mechanisms of aromatase inhibitory activity that could aid in the design of novel potent AIs as breast cancer therapeutic agents.
ABSTRACT
Natural products have been an integral part of sustaining civilizations because of their medicinal properties. Past discoveries of bioactive natural products have relied on serendipity, and these compounds serve as inspiration for the generation of analogs with desired physicochemical properties. Bioactive natural products with therapeutic potential are abundantly available in nature and some of them are beyond exploration by conventional methods. The effectiveness of computational approaches as versatile tools for facilitating drug discovery and development has been recognized for decades, and natural products are no exception. In the post-genomic era, scientists are bombarded with data produced by advanced technologies. Thus, rendering these data into knowledge that is interpretable and meaningful becomes an essential issue. In this regard, computational approaches utilize the existing data to generate knowledge that provides valuable understanding for addressing current problems and guiding the further research and development of new naturally derived drugs. Furthermore, several medicinal plants have been continuously used in many traditional medicine systems throughout the world since antiquity, and their mechanisms have not yet been elucidated. Therefore, the utilization of computational approaches and advanced synthetic techniques would yield great benefit in improving the health and well-being of the world's population.
Subject(s)
Biological Products/chemical synthesis , Computer-Aided Design , Drug Design , Biological Products/chemistry , Models, Molecular , Quantitative Structure-Activity Relationship
ABSTRACT
In biology and chemistry, a key goal is to discover novel compounds affording potent biological activity or chemical properties. This could be achieved through a chemical intuition-driven trial-and-error process or via data-driven predictive modeling. The latter is based on the concept of quantitative structure-activity/property relationship (QSAR/QSPR) when applied in modeling the biological activity and chemical properties, respectively, of compounds. Data mining is a powerful technology underlying QSAR/QSPR as it harnesses knowledge from large volumes of high-dimensional data via multivariate analysis. Although extremely useful, the technicalities of data mining may overwhelm potential users, especially those in the life sciences. Herein, we aim to lower the barriers to access and utilization of data mining software for QSAR/QSPR studies. AutoWeka is an automated data mining software tool that is powered by the widely used machine learning package Weka. The software provides a user-friendly graphical interface along with an automated parameter search capability. It employs two robust and popular machine learning methods: artificial neural networks and support vector machines. This chapter describes the practical usage of AutoWeka and relevant tools in the development of predictive QSAR/QSPR models. AVAILABILITY: The software is freely available at http://www.mt.mahidol.ac.th/autoweka.