Results 1 - 20 of 44
1.
J Cheminform; 16(1): 35, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38528548

ABSTRACT

Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, their structural features differ from those of typical drug-like compounds, e.g., a wider range of molecular weights, multiple stereocenters and a higher fraction of sp3-hybridized carbons. This makes the encoding of natural products via molecular fingerprints difficult, thus restricting their use in cheminformatics studies. To tackle this issue, we explored over 30 years of research to systematically evaluate which molecular fingerprint provides the best performance on the natural product chemical space. We considered 20 molecular fingerprints from four different sources, which we then benchmarked on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and performance. While Extended Connectivity Fingerprints are the de facto option for encoding drug-like compounds, other fingerprints matched or outperformed them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all molecular fingerprints considered in the study, as well as the data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints.
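The point that different encodings lead to different pairwise similarities can be illustrated with a minimal sketch. It assumes fingerprints are represented as sets of on-bit indices and uses the Tanimoto coefficient, a common choice for binary fingerprints; the bit sets below are hypothetical and are not taken from the study or its package.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints
    given as collections of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(a | b)


# Hypothetical on-bit indices for the same two molecules under two encodings.
mol1_enc_a, mol2_enc_a = {1, 4, 7, 9}, {1, 4, 8, 9}
mol1_enc_b, mol2_enc_b = {2, 3}, {3, 5, 6, 11}

sim_a = tanimoto(mol1_enc_a, mol2_enc_a)  # 3 shared bits out of 5 distinct bits
sim_b = tanimoto(mol1_enc_b, mol2_enc_b)  # 1 shared bit out of 5 distinct bits
print(sim_a, sim_b)
```

The same pair of molecules looks similar under one encoding and dissimilar under the other, which is the kind of disagreement a fingerprint benchmark would need to quantify.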

2.
Food Res Int; 171: 113036, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37330849

ABSTRACT

The capacity to discriminate safe from dangerous compounds has played an important role in the evolution of species, including human beings. Highly evolved senses such as taste allow humans to navigate and survive in the environment through information that reaches the brain as electrical pulses. Specifically, taste receptors provide multiple bits of information about the substances that are introduced orally. These substances may be pleasant or not according to the taste responses that they trigger. Tastes have been classified into basic (sweet, bitter, umami, sour and salty) or non-basic (astringent, chilling, cooling, heating, pungent), while some compounds are considered multitaste, taste modifiers or tasteless. Classification-based machine learning approaches are useful tools for developing mathematical relationships that predict the taste class of new molecules from their chemical structure. This work reviews the history of multicriteria quantitative structure-taste relationship modelling, starting from the first ligand-based (LB) classifier proposed in 1980 by Lemont B. Kier and concluding with the most recent studies published in 2022.


Subject(s)
Taste Buds; Taste; Humans; Taste/physiology; Taste Perception
3.
Molecules; 27(18), 2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36144564

ABSTRACT

Mass spectrometry (MS) is widely used for the identification of chemical compounds by matching the experimentally acquired mass spectrum against a database of reference spectra. However, this approach suffers from a limited coverage of the existing databases causing a failure in the identification of a compound not present in the database. Among the computational approaches for mining metabolite structures based on MS data, one option is to predict molecular fingerprints from the mass spectra by means of chemometric strategies and then use them to screen compound libraries. This can be carried out by calibrating multi-task artificial neural networks from large datasets of mass spectra, used as inputs, and molecular fingerprints as outputs. In this study, we prepared a large LC-MS/MS dataset from an on-line open repository. These data were used to train and evaluate deep-learning-based approaches to predict molecular fingerprints and retrieve the structure of unknown compounds from their LC-MS/MS spectra. Effects of data sparseness and the impact of different strategies of data curing and dimensionality reduction on the output accuracy have been evaluated. Moreover, extensive diagnostics have been carried out to evaluate modelling advantages and drawbacks as a function of the explored chemical space.
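The screening step described above (predict a fingerprint, then use it to search a compound library) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the per-bit probabilities, the 0.5 threshold, the candidate names and the Tanimoto ranking are all hypothetical choices.

```python
def binarize(probabilities, threshold=0.5):
    """Turn per-bit probabilities (e.g. network outputs) into a set of on-bit indices."""
    return {i for i, p in enumerate(probabilities) if p >= threshold}


def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_candidates(predicted_bits, library):
    """Sort library entries by decreasing similarity to the predicted fingerprint."""
    scored = [(tanimoto(predicted_bits, bits), name) for name, bits in library.items()]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Hypothetical model output for one spectrum and a tiny hypothetical library.
predicted_probs = [0.9, 0.1, 0.8, 0.7, 0.2, 0.6]
library = {"cand_1": {0, 2, 3, 5}, "cand_2": {0, 1, 4}, "cand_3": {2, 3}}

predicted_bits = binarize(predicted_probs)
ranking = rank_candidates(predicted_bits, library)
print([name for _, name in ranking])
```

A real system would rank many thousands of candidates, so errors in individual predicted bits matter less than the overall ordering they produce.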


Subject(s)
Neural Networks, Computer; Tandem Mass Spectrometry; Chromatography, Liquid/methods; Databases, Factual; Tandem Mass Spectrometry/methods
4.
Molecules; 26(23), 2021 Nov 30.
Article in English | MEDLINE | ID: mdl-34885837

ABSTRACT

Neural networks are rapidly gaining popularity in chemical modeling and Quantitative Structure-Activity Relationship (QSAR) studies thanks to their ability to handle multitask problems. However, the outcomes of neural networks depend on the tuning of several hyperparameters, and small variations in these can strongly affect performance. Hence, optimization is a fundamental step in training neural networks, although in many cases it can be computationally very expensive. In this study, we compared four of the most widely used approaches for tuning hyperparameters, namely grid search, random search, tree-structured Parzen estimator, and genetic algorithms, on three multitask QSAR datasets. We focused on parsimonious optimization, taking into account not only the performance of the neural networks but also the computational time. Furthermore, since the optimization approaches do not directly provide information about the influence of hyperparameters, we applied experimental design strategies to determine their effects on neural network performance. We found that genetic algorithms, tree-structured Parzen estimator, and random search require on average 0.08% of the hours required by grid search; in addition, tree-structured Parzen estimator and genetic algorithms provide better results than random search.
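The cost difference between exhaustive and sampled tuning can be sketched on a toy problem. The search space, the `toy_loss` function and the trial budget below are hypothetical stand-ins for training a real network; the sketch only contrasts how many configurations each strategy evaluates.

```python
import itertools
import random


def toy_loss(lr, hidden):
    """Hypothetical stand-in for a validation loss; its minimum is at lr=0.01, hidden=64."""
    return (lr - 0.01) ** 2 * 1e4 + ((hidden - 64) / 64) ** 2


# Hypothetical discrete search space.
space = {"lr": [0.0001, 0.001, 0.01, 0.1], "hidden": [16, 32, 64, 128, 256]}


def grid_search(space):
    """Evaluate every combination in the space."""
    combos = list(itertools.product(space["lr"], space["hidden"]))
    best = min(combos, key=lambda c: toy_loss(*c))
    return best, len(combos)


def random_search(space, n_trials, seed=0):
    """Evaluate a fixed budget of randomly drawn combinations."""
    rng = random.Random(seed)
    trials = [(rng.choice(space["lr"]), rng.choice(space["hidden"]))
              for _ in range(n_trials)]
    best = min(trials, key=lambda c: toy_loss(*c))
    return best, n_trials


grid_best, grid_evals = grid_search(space)
rand_best, rand_evals = random_search(space, n_trials=8)
print(grid_best, grid_evals, rand_best, rand_evals)
```

Grid search is guaranteed to find the best point on the grid but its cost grows multiplicatively with every added hyperparameter, whereas random search fixes the budget and accepts a possibly worse optimum.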

5.
Food Chem; 315: 126248, 2020 Jun 15.
Article in English | MEDLINE | ID: mdl-32018076

ABSTRACT

Chianti is a prized red wine with a high reputation for quality in the world wine market. However, the production region is small, and the product needs efficient tools to protect its brand and prevent adulteration. In this context, ICP-MS combined with chemometrics has proved useful in food authentication. In this study, authentic Chianti/Chianti Classico wines from vineyards in the Toscana region (Italy), together with samples from 18 different geographical regions, were analyzed with the objective of differentiating them from other Italian wines. Partial Least Squares-Discriminant Analysis (PLS-DA) identified the variables that discriminate wine geographical origin. Rare Earth Elements (REE), major and trace elements all contributed to the discrimination of Chianti samples. A general model was not suited to distinguishing PDO red wines from samples with similar chemical fingerprints collected in some regions. Specific classification models enhanced the discrimination capability, emphasizing the discriminant role of some elements.


Subject(s)
Food Analysis/methods; Mass Spectrometry/methods; Wine/analysis; Discriminant Analysis; Food Analysis/statistics & numerical data; Italy; Least-Squares Analysis; Limit of Detection; Mass Spectrometry/statistics & numerical data; Metals, Rare Earth/analysis; Trace Elements/analysis
6.
Mol Inform; 38(1-2): e1800029, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30142701

ABSTRACT

Quantitative Structure-Activity Relationship (QSAR) models play a central role in medicinal chemistry, toxicology and computer-assisted molecular design, as well as in supporting regulatory decisions and the reduction of animal testing. Assessing their predictive ability is therefore an essential step for any prospective application. Many metrics have been proposed to estimate the predictive ability of QSAR models, which has created confusion about how models should be evaluated and properly compared. Recently, we showed that the metric $Q^2_{F3}$ is particularly well suited for comparing the external predictivity of different models developed on the same training dataset. However, when comparing models developed on different training data, this function becomes inadequate and only dispersion measures such as the root-mean-square error (RMSE) should be used. The intent of this work is to provide clarity on the correct and incorrect uses of $Q^2_{F3}$, discussing its behavior with respect to the training data distribution and illustrating some cases in which $Q^2_{F3}$ estimates may be misleading. We therefore encourage the use of dispersion measures when models trained on different datasets have to be compared and evaluated.
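A small numerical sketch may help show why the metric depends on the training data. It assumes the usual definition of the metric as one minus the ratio of the mean squared external prediction error to the mean squared deviation of the training responses from their mean; all response values below are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error of predictions on the external set."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def q2_f3(y_train, y_ext, y_ext_pred):
    """Q2_F3: external mean squared error scaled by the training-set variance."""
    mean_tr = sum(y_train) / len(y_train)
    mse_ext = sum((y - p) ** 2 for y, p in zip(y_ext, y_ext_pred)) / len(y_ext)
    var_tr = sum((y - mean_tr) ** 2 for y in y_train) / len(y_train)
    return 1.0 - mse_ext / var_tr


# Hypothetical external set with identical prediction errors for both models.
y_ext = [1.0, 2.0, 3.0, 4.0]
y_ext_pred = [1.5, 1.5, 3.5, 3.5]

# Two hypothetical training sets with the same mean but different spread.
narrow_train = [2.0, 2.5, 3.0]
wide_train = [0.0, 2.5, 5.0]

error = rmse(y_ext, y_ext_pred)
q_narrow = q2_f3(narrow_train, y_ext, y_ext_pred)
q_wide = q2_f3(wide_train, y_ext, y_ext_pred)
print(error, q_narrow, q_wide)
```

The prediction errors are identical in both cases, so RMSE is unchanged, yet the metric moves from a negative value to a value close to one purely because the training responses are more spread out.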


Subject(s)
Quantitative Structure-Activity Relationship; Algorithms; Drug Design; Drug Discovery/methods; Drug Discovery/standards
7.
Mol Inform; 38(8-9): e1800124, 2019 Aug.
Article in English | MEDLINE | ID: mdl-30549437

ABSTRACT

The ICCVAM Acute Toxicity Workgroup (U.S. Department of Health and Human Services), in collaboration with the U.S. Environmental Protection Agency (U.S. EPA, National Center for Computational Toxicology), coordinated the "Predictive Models for Acute Oral Systemic Toxicity" collaborative project to develop in silico models that predict acute oral systemic toxicity to fill regulatory needs. In this framework, new Quantitative Structure-Activity Relationship (QSAR) models for the prediction of very toxic (LD50 lower than 50 mg/kg) and nontoxic (LD50 greater than or equal to 2,000 mg/kg) endpoints were developed, as described in this study. Models were developed on a large set of chemicals (8992), provided by the project coordinators, considering the five OECD principles for QSAR applicability to regulatory endpoints. A Bayesian consensus approach integrating three different classification QSAR algorithms was applied as the modelling method. For both endpoints, the proposed approach proved to be robust and predictive, as determined by a blind validation on a set of external molecules provided at a later stage by the coordinators of the collaborative project. Finally, the integration of predictions obtained for the very toxic and nontoxic endpoints allowed the identification of compounds associated with medium toxicity, as well as the analysis of consistency between the predictions obtained for the two endpoints on the same molecules. Predictions of the proposed consensus approach will be integrated with those originating from models proposed by the participants of the collaborative project to facilitate the regulatory acceptance of in silico predictions and thus reduce or replace experimental tests for acute toxicity.


Subject(s)
Organic Chemicals/toxicity; Quantitative Structure-Activity Relationship; Administration, Oral; Animals; Bayes Theorem; Computer Simulation; Dose-Response Relationship, Drug; Models, Molecular; Organic Chemicals/administration & dosage; Rats; Software; United States; United States Dept. of Health and Human Services; United States Environmental Protection Agency
8.
Protein Pept Lett; 25(11): 1015-1023, 2018.
Article in English | MEDLINE | ID: mdl-30430931

ABSTRACT

BACKGROUND: Local classification models were used to establish Quantitative Structure-Activity Relationships (QSARs) for the capacity of bioactive di-, tri- and tetrapeptides to inhibit Angiotensin Converting Enzyme (ACE). These discrete models can thus predict this activity for other peptides obtained from functional foods. Peptides of this type allow some foods to be considered nutraceuticals. METHOD: A database of 313 di-, tri- and tetrapeptides was investigated. The antihypertensive activities of the peptides, expressed as log(1/IC50), were separated into two qualitative classes: low activity (inactive) was associated with experimental values under the 66th percentile, and active peptides with values above this threshold. Chemicals were divided into a training set, including 70% of the peptides, and a test set for external validation. Genetic algorithms-variable subset selection coupled with the kNN and N3 local classifiers was applied to select the best subset of molecular descriptors from a pool of 953 Dragon descriptors. Both models were validated on the test peptides. RESULTS: The N3 model turned out to be superior to the kNN model when the classification focused on identifying the most active peptides.
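The data preparation described in the METHOD section (percentile-based class labels, then a 70/30 split) can be sketched as below. The activity values are hypothetical, the nearest-rank percentile rule and the handling of values equal to the threshold (treated as inactive) are assumptions, and the random split is only one possible way of dividing the data.

```python
import random


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceiling of n * pct / 100
    return ordered[rank - 1]


def label_by_percentile(values, pct=66):
    """Label values strictly above the percentile threshold as active (1), others as inactive (0)."""
    threshold = nearest_rank_percentile(values, pct)
    return [1 if v > threshold else 0 for v in values], threshold


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle reproducibly and cut into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical log(1/IC50) values for ten peptides.
activities = [4.1, 5.3, 3.8, 6.0, 4.9, 5.7, 3.2, 6.4, 4.4, 5.0]

labels, threshold = label_by_percentile(activities)
train_idx, test_idx = split_train_test(range(len(activities)))
print(threshold, labels, len(train_idx), len(test_idx))
```

Because roughly one third of the peptides end up in the active class, the resulting classification problem is imbalanced, which is worth keeping in mind when comparing classifiers.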


Subject(s)
Angiotensin-Converting Enzyme Inhibitors/chemistry; Angiotensin-Converting Enzyme Inhibitors/pharmacology; Oligopeptides/chemistry; Oligopeptides/pharmacology; Peptidyl-Dipeptidase A/metabolism; Quantitative Structure-Activity Relationship; Databases, Protein; Inhibitory Concentration 50; Models, Statistical
9.
Methods Mol Biol; 1825: 171-209, 2018.
Article in English | MEDLINE | ID: mdl-30334206

ABSTRACT

Molecular descriptors encode a wide variety of molecular information and have become the support of many contemporary chemoinformatic and bioinformatic applications. They grasp specific molecular features (e.g., geometry, shape, pharmacophores, or atomic properties) and directly affect computational models, in terms of outcome, performance, and applicability. This chapter aims to illustrate the impact of different molecular descriptors on the structural information captured and on the perceived chemical similarity among molecules. After introducing the fundamental concepts of molecular descriptor theory and application, a step-by-step retrospective virtual screening procedure guides users through the fundamental processing steps and discusses the impact of different types of molecular descriptors.


Subject(s)
Combinatorial Chemistry Techniques/methods; Computational Biology/methods; Computer Simulation; Drug Design; Models, Molecular; Algorithms; Humans
10.
Methods Mol Biol; 1800: 3-53, 2018.
Article in English | MEDLINE | ID: mdl-29934886

ABSTRACT

Molecular descriptors capture diverse parts of the structural information of molecules and they are the support of many contemporary computer-assisted toxicological and chemical applications. After briefly introducing some fundamental concepts of structure-activity applications (e.g., molecular descriptor dimensionality, classical vs. fingerprint description, and activity landscapes), this chapter guides the readers through a step-by-step explanation of molecular descriptors rationale and application. To this end, the chapter illustrates a case study of a recently published application of molecular descriptors for modeling the activity on cytochrome P450.


Subject(s)
Models, Molecular; Quantitative Structure-Activity Relationship; Algorithms; Cytochrome P-450 Enzyme System/chemistry; Molecular Conformation; Molecular Structure; Software
11.
Front Chem; 5: 53, 2017.
Article in English | MEDLINE | ID: mdl-28791285

ABSTRACT

This work describes a novel approach based on advanced molecular similarity to predict the sweetness of chemicals. The proposed Quantitative Structure-Taste Relationship (QSTR) model is an expert system developed with the five principles defined by the Organization for Economic Co-operation and Development (OECD) for the validation of (Q)SARs in mind. The 649 sweet and non-sweet molecules were described by both conformation-independent extended-connectivity fingerprints (ECFPs) and molecular descriptors. In particular, molecular similarity in the ECFP space showed a clear association with molecular taste and was exploited for model development. Molecules lying in the subspaces where taste assignment was more difficult were modeled through a consensus between linear and local approaches (Partial Least Squares-Discriminant Analysis and N-nearest-neighbor classifier). The expert system, which was thoroughly validated through a Monte Carlo procedure and an external set, gave satisfactory results in comparison with state-of-the-art models. Moreover, the QSTR model can be leveraged to gain a greater understanding of the relationship between molecular structure and sweetness, and to design novel sweeteners.

12.
Mol Inform; 36(1-2), 2017 Jan.
Article in English | MEDLINE | ID: mdl-27650559

ABSTRACT

Molecular descriptors capture diverse structural information of molecules and are a prerequisite for ligand-based similarity searching. In this study, we introduce topological matrix-based descriptors to virtual screening for hit discovery. We evaluated the usefulness of matrix-based descriptors in a retrospective setting and compared them with topological pharmacophore descriptors. Special attention was given to the influence of data pre-processing and the applied similarity metric on the virtual screening performance. Overall, the MB descriptors showed a competitive and complementary performance to other descriptors. A prospective screen of a commercial compound library led to the discovery of a novel natural-product-derived cyclooxygenase-2 inhibitor predicted to interact differently with the target protein compared to the query compound ibuprofen. The results of our study motivate the use of matrix-based descriptors for molecular similarity-based virtual screening and scaffold hopping.


Subject(s)
Drug Design; Molecular Docking Simulation/methods; Quantitative Structure-Activity Relationship; Small Molecule Libraries/chemistry; Binding Sites; Cyclooxygenase 2/chemistry; Cyclooxygenase 2/metabolism; Cyclooxygenase 2 Inhibitors/chemistry; Cyclooxygenase 2 Inhibitors/pharmacology; Protein Binding; Small Molecule Libraries/pharmacology
14.
J Chem Inf Model; 56(10): 1905-1913, 2016 Oct 24.
Article in English | MEDLINE | ID: mdl-27633067

ABSTRACT

Validation is an essential step of QSAR modeling, and it can be performed either by internal validation techniques (e.g., cross-validation, bootstrap) or by an external set of test objects, that is, objects not used for model development and/or optimization. The evaluation of model predictive ability is then completed by comparing experimental and predicted values of test molecules. When dealing with quantitative QSAR models, validation results are generally expressed in terms of $Q^2$ metrics. In this work, four fundamental mathematical principles, which should be respected by any $Q^2$ metric, are introduced. Then, the behavior of five different metrics ($Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$, $Q^2_{CCC}$, and $Q^2_{Rm}$) is compared and critically discussed. The conclusions highlight that only the $Q^2_{F3}$ metric satisfies all the stated conditions, while the remaining metrics show different theoretical flaws.
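For orientation, the first three of the metrics named above are commonly written as follows. The notation is an assumption of this note rather than taken from the abstract: $y_i$ and $\hat{y}_i$ are the experimental and predicted responses, $\bar{y}_{TR}$ and $\bar{y}_{EXT}$ are the training and external-set means, and $n_{TR}$ and $n_{EXT}$ are the corresponding set sizes. The two remaining metrics are not reproduced here.

```latex
\begin{align}
Q^2_{F1} &= 1 - \frac{\sum_{i=1}^{n_{EXT}} \left(y_i - \hat{y}_i\right)^2}
                     {\sum_{i=1}^{n_{EXT}} \left(y_i - \bar{y}_{TR}\right)^2} \\
Q^2_{F2} &= 1 - \frac{\sum_{i=1}^{n_{EXT}} \left(y_i - \hat{y}_i\right)^2}
                     {\sum_{i=1}^{n_{EXT}} \left(y_i - \bar{y}_{EXT}\right)^2} \\
Q^2_{F3} &= 1 - \frac{\sum_{i=1}^{n_{EXT}} \left(y_i - \hat{y}_i\right)^2 / n_{EXT}}
                     {\sum_{i=1}^{n_{TR}} \left(y_i - \bar{y}_{TR}\right)^2 / n_{TR}}
\end{align}
```

The numerators are identical; the metrics differ only in the reference sum of squares, and $Q^2_{F3}$ is the only one of the three whose denominator does not depend on the external set.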


Subject(s)
Algorithms; Quantitative Structure-Activity Relationship; Computer Simulation; Models, Chemical
15.
Int J Mol Sci; 17(6), 2016 Jun 09.
Article in English | MEDLINE | ID: mdl-27294921

ABSTRACT

Cytochromes P450 (CYP) are the main actors in the oxidation of xenobiotics and play a crucial role in drug safety, persistence, bioactivation, and drug-drug/food-drug interaction. This work aims to develop Quantitative Structure-Activity Relationship (QSAR) models to predict drug interaction with two of the most important CYP isoforms, namely 2C9 and 3A4. The presented models are calibrated on 9122 drug-like compounds, using three different modelling approaches and two types of molecular description (classical molecular descriptors and binary fingerprints). For each isoform, three classification models are presented, each based on a different approach and with different advantages: (1) a very simple and interpretable classification tree; (2) a local (k-Nearest Neighbor) model based on classical descriptors; and (3) a model based on a recently proposed local classifier (N-Nearest Neighbor) applied to binary fingerprints. The salient features of the work are (1) the thorough model validation and applicability domain assessment; (2) the descriptor interpretation, which highlighted the crucial aspects of P450-drug interaction; and (3) the consensus aggregation of models, which largely increased the prediction accuracy.


Subject(s)
Cytochrome P-450 CYP2C9 Inhibitors/pharmacology; Cytochrome P-450 CYP2C9/chemistry; Cytochrome P-450 CYP3A Inhibitors/pharmacology; Cytochrome P-450 CYP3A/chemistry; Quantitative Structure-Activity Relationship; Animals; Computer Simulation; Cytochrome P-450 CYP2C9/metabolism; Cytochrome P-450 CYP2C9 Inhibitors/chemistry; Cytochrome P-450 CYP3A/metabolism; Cytochrome P-450 CYP3A Inhibitors/chemistry; Humans; Protein Binding
16.
Environ Res; 148: 507-512, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27152714

ABSTRACT

Expert systems are a rational integration of several models that generally aim to exploit their advantages and overcome their drawbacks. This work is founded on our previously published Quantitative Structure-Activity Relationship (QSAR) classification scheme, which detects compounds whose Bioconcentration Factor (BCF) is (1) well predicted by the octanol-water partition coefficient (KOW), (2) underestimated by KOW or (3) overestimated by KOW. The classification scheme served as the starting point to identify and combine the best BCF model for each class among three VEGA models and one KOW-based equation. The rationalized model integration showed stability and surprising performance on unknown data when compared with benchmark BCF models. Model simplicity, transparency and mechanistic interpretation were fostered in order to allow for its application and acceptance within the REACH framework.


Subject(s)
Models, Theoretical; Quantitative Structure-Activity Relationship; 1-Octanol/chemistry; European Union; Government Regulation; Hazardous Substances/chemistry; Water/chemistry
17.
Environ Health Perspect; 124(7): 1023-33, 2016 Jul.
Article in English | MEDLINE | ID: mdl-26908244

ABSTRACT

BACKGROUND: Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and, thus, have the potential to be endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program. OBJECTIVES: We describe a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project) and demonstrate the efficacy of using predictive computational models trained on high-throughput screening data to evaluate thousands of chemicals for ER-related activity and prioritize them for further testing. METHODS: CERAPP combined multiple models developed in collaboration with 17 groups in the United States and Europe to predict ER activity of a common set of 32,464 chemical structures. Quantitative structure-activity relationship models and docking approaches were employed, mostly using a common training set of 1,677 chemical structures provided by the U.S. EPA, to build a total of 40 categorical and 8 continuous models for binding, agonist, and antagonist ER activity. All predictions were evaluated on a set of 7,522 chemicals curated from the literature. To overcome the limitations of single models, a consensus was built by weighting models on scores based on their evaluated accuracies. RESULTS: Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. Out of the 32,464 chemicals, the consensus model predicted 4,001 chemicals (12.3%) as high priority actives and 6,742 potential actives (20.8%) to be considered for further testing. CONCLUSION: This project demonstrated the possibility to screen large libraries of chemicals using a consensus of different in silico approaches. 
This concept will be applied in future projects related to other end points. CITATION: Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, Zakharov A, Worth A, Richard AM, Grulke CM, Trisciuzzi D, Fourches D, Horvath D, Benfenati E, Muratov E, Wedebye EB, Grisoni F, Mangiatordi GF, Incisivo GM, Hong H, Ng HW, Tetko IV, Balabin I, Kancherla J, Shen J, Burton J, Nicklaus M, Cassotti M, Nikolov NG, Nicolotti O, Andersson PL, Zang Q, Politi R, Beger RD, Todeschini R, Huang R, Farag S, Rosenberg SA, Slavov S, Hu X, Judson RS. 2016. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environ Health Perspect 124:1023-1033; http://dx.doi.org/10.1289/ehp.1510267.
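The idea of building a consensus by weighting models according to their evaluated accuracy can be sketched generically. This is not the CERAPP scoring scheme, which the abstract does not spell out; the model names, scores, binary calls and the 0.5 decision threshold are hypothetical.

```python
def weighted_consensus(predictions, scores, threshold=0.5):
    """Combine binary model calls (0/1) into one call, weighting each model by its score.

    Returns the consensus label and the weighted fraction of 'active' votes.
    """
    total = sum(scores[m] for m in predictions)
    weighted = sum(scores[m] * predictions[m] for m in predictions)
    fraction = weighted / total
    return (1 if fraction >= threshold else 0), fraction


# Hypothetical evaluation scores and calls for one chemical.
scores = {"model_a": 0.85, "model_b": 0.69, "model_c": 0.75}
calls = {"model_a": 1, "model_b": 0, "model_c": 0}

label, fraction = weighted_consensus(calls, scores)
print(label, round(fraction, 3))
```

The weighted fraction can also serve as a rough prioritization score, so that chemicals with strong agreement among well-performing models are considered first.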


Subject(s)
Endocrine Disruptors/toxicity; Receptors, Estrogen/metabolism; Toxicity Tests; Computer Simulation; Endocrine Disruptors/classification; Environmental Policy; Quantitative Structure-Activity Relationship; United States
18.
Environ Int; 88: 198-205, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26760717

ABSTRACT

This paper proposes a scheme to predict whether a compound (1) is mainly stored within lipid tissues, (2) has additional storage sites (e.g., proteins), or (3) is metabolized/eliminated with a reduced bioconcentration. The approach is based on two validated QSAR (Quantitative Structure-Activity Relationship) trees, whose salient features are: (a) descriptor interpretability and (b) simplicity. Trees were developed for 779 organic compounds, the TGD approach was used to quantify the lipid-driven bioconcentration, and a refined machine-learning optimization procedure was applied. We focused on molecular descriptor interpretation, which allowed us to gather new mechanistic insights into the bioconcentration mechanisms.


Subject(s)
Environmental Monitoring/methods; Environmental Pollutants/metabolism; Organic Chemicals/metabolism; Quantitative Structure-Activity Relationship; Animals; Humans
19.
J Cheminform; 8: 49, 2016.
Article in English | MEDLINE | ID: mdl-28316647

ABSTRACT

This communication deals with the scientific problem of evaluating the similarity between two chemical systems, each described by a finite discrete set of elements/members, which are in turn p-dimensional vectors of chemical/biological descriptors. A variant of the Hausdorff measure, called Hausdorff-like similarity (Hs), is proposed. It aims to take into account information on all the elements present in the compared sets, information that is usually lost by other measures.
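As background, the classical Hausdorff distance between two finite sets of descriptor vectors can be sketched as follows. This is the standard measure, not the Hs variant proposed in the paper, whose definition is not given in the abstract; the two small systems are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two p-dimensional vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def directed_hausdorff(set_a, set_b):
    """Largest distance from an element of set_a to its nearest element in set_b."""
    return max(min(euclidean(a, b) for b in set_b) for a in set_a)


def hausdorff(set_a, set_b):
    """Classical (symmetric) Hausdorff distance between two finite sets."""
    return max(directed_hausdorff(set_a, set_b), directed_hausdorff(set_b, set_a))


# Two hypothetical systems described by 2-dimensional descriptor vectors.
system_a = [(0.0, 0.0), (1.0, 0.0)]
system_b = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]

print(directed_hausdorff(system_a, system_b), hausdorff(system_a, system_b))
```

The classical distance is determined by a single worst-matched element, which illustrates how information on the remaining elements of the sets can be lost.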

20.
J Chem Inf Model; 55(11): 2365-74, 2015 Nov 23.
Article in English | MEDLINE | ID: mdl-26479827

ABSTRACT

Two novel classification methods, called N3 (N-nearest neighbors) and BNN (binned nearest neighbors), are proposed. Both methods are inspired by the principles of the K-nearest neighbors (KNN) method and are based on pairwise similarities between objects. Their performance was evaluated in comparison with nine well-known classification methods. To obtain reliable statistics, several comparisons were performed using 32 different literature data sets, which differ in the number of objects, variables and classes. Results highlighted that N3 behaves on average as the most efficient classification method, with performance similar to that of the support vector machine based on a radial basis function kernel (SVM/RBF). The BNN method showed on average higher performance than the classical K-nearest neighbors method.
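For reference, the classical K-nearest neighbors baseline that both methods build on can be sketched in a few lines. This is plain KNN with Euclidean distance and majority voting, not N3 or BNN, whose weighting and binning rules are not described in the abstract; the training points and class names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Assign the majority class among the k training objects closest to the query."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set in two dimensions.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_x, train_y, (0.5, 0.5)))
print(knn_predict(train_x, train_y, (5.5, 5.5)))
```

Classical KNN gives every one of the k neighbors an equal vote and ignores all other objects, which is the behavior that similarity-weighted variants are designed to relax.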


Subject(s)
Artificial Intelligence; Pattern Recognition, Automated/methods; Algorithms; Animals; Databases, Factual; Humans; Software
...