Results 1 - 20 of 44
1.
Molecules ; 27(18)2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36144564

ABSTRACT

Mass spectrometry (MS) is widely used to identify chemical compounds by matching the experimentally acquired mass spectrum against a database of reference spectra. However, this approach suffers from the limited coverage of existing databases, so compounds not present in the database cannot be identified. Among the computational approaches for mining metabolite structures from MS data, one option is to predict molecular fingerprints from the mass spectra by means of chemometric strategies and then use them to screen compound libraries. This can be done by calibrating multi-task artificial neural networks on large datasets, with mass spectra as inputs and molecular fingerprints as outputs. In this study, we prepared a large LC-MS/MS dataset from an online open repository. These data were used to train and evaluate deep-learning-based approaches that predict molecular fingerprints and retrieve the structures of unknown compounds from their LC-MS/MS spectra. The effects of data sparseness, and the impact of different data curation and dimensionality reduction strategies on output accuracy, were evaluated. Moreover, extensive diagnostics were carried out to evaluate the advantages and drawbacks of the models as a function of the explored chemical space.


Subjects
Neural Networks, Computer , Tandem Mass Spectrometry , Chromatography, Liquid/methods , Databases, Factual , Tandem Mass Spectrometry/methods
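Entry 1 describes a two-step retrieval: a network predicts a molecular fingerprint from an LC-MS/MS spectrum, and that fingerprint is then used to screen a compound library. The sketch below covers only the screening step. It assumes the model outputs per-bit probabilities that are binarized at 0.5 and that candidates are ranked by Tanimoto similarity; the threshold, the similarity measure and the toy library are hypothetical choices, not the study's settings.

```python
from typing import Dict, List, Sequence, Set, Tuple


def binarize(probabilities: Sequence[float], threshold: float = 0.5) -> Set[int]:
    """Turn per-bit probabilities into the set of 'on' bit indices (threshold is an assumption)."""
    return {i for i, p in enumerate(probabilities) if p >= threshold}


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_candidates(
    predicted: Sequence[float], library: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Rank library compounds by similarity to the predicted fingerprint, best first."""
    query = binarize(predicted)
    scores = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical 8-bit fingerprints for three library compounds.
library = {
    "cand_A": {0, 2, 5},
    "cand_B": {0, 2, 5, 7},
    "cand_C": {1, 3},
}
# Hypothetical network output: one probability per fingerprint bit.
predicted = [0.9, 0.1, 0.8, 0.2, 0.3, 0.7, 0.1, 0.6]
print(rank_candidates(predicted, library))
```

In practice the library fingerprints would come from the same fingerprinting scheme the network was trained to predict; the top-ranked candidates are the proposed structures.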
2.
Molecules ; 26(23)2021 Nov 30.
Article in English | MEDLINE | ID: mdl-34885837

ABSTRACT

Neural networks are rapidly gaining popularity in chemical modeling and Quantitative Structure-Activity Relationship (QSAR) studies thanks to their ability to handle multitask problems. However, the outcomes of neural networks depend on the tuning of several hyperparameters, whose small variations can strongly affect performance. Hence, optimization is a fundamental step in training neural networks, although in many cases it can be very expensive computationally. In this study, we compared four of the most widely used approaches for tuning hyperparameters, namely grid search, random search, tree-structured Parzen estimator, and genetic algorithms, on three multitask QSAR datasets. We focused mainly on parsimonious optimization, and therefore took into account not only the performance of the neural networks but also the computational time. Furthermore, since the optimization approaches do not directly provide information about the influence of the hyperparameters, we applied experimental design strategies to determine their effects on neural network performance. We found that genetic algorithms, tree-structured Parzen estimator, and random search require on average 0.08% of the hours required by grid search; in addition, tree-structured Parzen estimator and genetic algorithms provide better results than random search.
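Much of the cost gap reported in entry 2 comes from how the two simplest strategies spend evaluations: grid search trains one network per combination, so its cost grows multiplicatively with every added hyperparameter, while random search draws a fixed budget of combinations. The sketch below only enumerates configurations; the search space and the budget of 20 draws are hypothetical, and a real run would train and score one network per configuration. Tree-structured Parzen estimators and genetic algorithms additionally adapt where they sample based on earlier results, which is not shown here.

```python
import itertools
import random
from typing import Dict, List

# Hypothetical search space for a multitask network.
space: Dict[str, List] = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "hidden_units": [64, 128, 256, 512],
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.5],
    "batch_size": [32, 64, 128],
}


def grid_configs(space: Dict[str, List]) -> List[Dict]:
    """Every combination of values: one training run each."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


def random_configs(space: Dict[str, List], budget: int, seed: int = 0) -> List[Dict]:
    """A fixed number of configurations, each value drawn uniformly (with replacement)."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(budget)]


grid = grid_configs(space)
rand = random_configs(space, budget=20)
print(len(grid), len(rand))  # 4 * 4 * 5 * 3 = 240 grid runs versus 20 random runs
```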

3.
J Chem Inf Model ; 56(10): 1905-1913, 2016 10 24.
Article in English | MEDLINE | ID: mdl-27633067

ABSTRACT

Validation is an essential step of QSAR modeling, and it can be performed either by internal validation techniques (e.g., cross-validation, bootstrap) or by an external set of test objects, that is, objects not used for model development and/or optimization. The evaluation of model predictive ability is then completed by comparing the experimental and predicted values of the test molecules. For quantitative QSAR models, validation results are generally expressed in terms of $Q^2$ metrics. In this work, four fundamental mathematical principles that any $Q^2$ metric should respect are introduced. Then, the behavior of five different metrics ($Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$, $Q^2_{CCC}$, and $Q^2_{Rm}$) is compared and critically discussed. The conclusions highlight that only the $Q^2_{F3}$ metric satisfies all the stated conditions, while the remaining metrics show different theoretical flaws.


Subjects
Algorithms , Quantitative Structure-Activity Relationship , Computer Simulation , Models, Chemical
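The first three $Q^2$ variants in entry 3 share the same numerator, the prediction error sum of squares over the test set (PRESS), and differ only in the reference used for the denominator: $Q^2_{F1}$ measures test values around the training mean, $Q^2_{F2}$ around the test mean, and $Q^2_{F3}$ divides the mean test error by the training-set variance. $Q^2_{CCC}$ is the concordance correlation coefficient between observed and predicted test values. A minimal sketch using these standard definitions follows; $Q^2_{Rm}$ is omitted, and the toy data are hypothetical.

```python
from statistics import fmean
from typing import Dict, Sequence


def q2_metrics(
    y_train: Sequence[float], y_test: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """External-validation Q2 variants from test observations and predictions."""
    n_test, n_train = len(y_test), len(y_train)
    mean_tr, mean_ext = fmean(y_train), fmean(y_test)
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    tss_ext_around_tr = sum((y - mean_tr) ** 2 for y in y_test)  # test values vs training mean
    tss_ext = sum((y - mean_ext) ** 2 for y in y_test)  # test values vs test mean
    tss_tr = sum((y - mean_tr) ** 2 for y in y_train)  # training spread

    # Concordance correlation coefficient, written with sums (the 1/n factors cancel).
    mean_pred = fmean(y_pred)
    cov = sum((y - mean_ext) * (p - mean_pred) for y, p in zip(y_test, y_pred))
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    ccc = 2 * cov / (tss_ext + ss_pred + n_test * (mean_ext - mean_pred) ** 2)

    return {
        "Q2_F1": 1 - press / tss_ext_around_tr,
        "Q2_F2": 1 - press / tss_ext,
        "Q2_F3": 1 - (press / n_test) / (tss_tr / n_train),
        "Q2_CCC": ccc,
    }


# Hypothetical data: the same predictions give different F1 and F2 values
# because the test mean (4.0) differs from the training mean (3.0).
m = q2_metrics([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 5.0], [3.5, 4.5])
print(m)
```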
4.
Environ Res ; 148: 507-512, 2016 07.
Article in English | MEDLINE | ID: mdl-27152714

ABSTRACT

Expert systems are a rational integration of several models that generally aim to exploit their advantages and overcome their drawbacks. This work builds on our previously published Quantitative Structure-Activity Relationship (QSAR) classification scheme, which detects compounds whose Bioconcentration Factor (BCF) is (1) well predicted by the octanol-water partition coefficient (KOW), (2) underestimated by KOW, or (3) overestimated by KOW. The classification scheme served as the starting point to identify and combine the best BCF model for each class among three VEGA models and one KOW-based equation. The rationalized model integration proved stable and performed surprisingly well on unknown data when compared with benchmark BCF models. Model simplicity, transparency and mechanistic interpretation were favored to facilitate its application and acceptance within the REACH framework.


Subjects
Models, Theoretical , Quantitative Structure-Activity Relationship , 1-Octanol/chemistry , European Union , Government Regulation , Hazardous Substances/chemistry , Water/chemistry
5.
Int J Mol Sci ; 17(6)2016 Jun 09.
Article in English | MEDLINE | ID: mdl-27294921

ABSTRACT

Cytochromes P450 (CYP) are the main actors in the oxidation of xenobiotics and play a crucial role in drug safety, persistence, bioactivation, and drug-drug/food-drug interactions. This work aims to develop Quantitative Structure-Activity Relationship (QSAR) models to predict drug interactions with two of the most important CYP isoforms, namely 2C9 and 3A4. The presented models are calibrated on 9122 drug-like compounds, using three different modelling approaches and two types of molecular description (classical molecular descriptors and binary fingerprints). For each isoform, three classification models are presented, each based on a different approach and with different advantages: (1) a very simple and interpretable classification tree; (2) a local (k-Nearest Neighbor) model based on classical descriptors; and (3) a model based on a recently proposed local classifier (N-Nearest Neighbor) applied to binary fingerprints. The salient features of the work are (1) the thorough model validation and applicability domain assessment; (2) the descriptor interpretation, which highlighted the crucial aspects of P450-drug interaction; and (3) the consensus aggregation of models, which substantially increased prediction accuracy.


Subjects
Cytochrome P-450 CYP2C9 Inhibitors/pharmacology , Cytochrome P-450 CYP2C9/chemistry , Cytochrome P-450 CYP3A Inhibitors/pharmacology , Cytochrome P-450 CYP3A/chemistry , Quantitative Structure-Activity Relationship , Animals , Computer Simulation , Cytochrome P-450 CYP2C9/metabolism , Cytochrome P-450 CYP2C9 Inhibitors/chemistry , Cytochrome P-450 CYP3A/metabolism , Cytochrome P-450 CYP3A Inhibitors/chemistry , Humans , Protein Binding
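Entry 5 credits part of its accuracy gain to the consensus aggregation of the three models per isoform, but the abstract does not state the aggregation rule. A minimal sketch assuming simple majority voting, where a molecule is left unassigned when no class wins more than half of the votes; the class labels and model outputs are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_vote(predictions: Sequence[str]) -> Optional[str]:
    """Return the class predicted by more than half of the models, else None."""
    if not predictions:
        return None
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else None


# Hypothetical outputs of three models (e.g. tree, kNN, N3) for three molecules.
per_molecule: List[List[str]] = [
    ["active", "active", "inactive"],
    ["inactive", "inactive", "inactive"],
    ["active", "inactive", "inactive"],
]
print([majority_vote(p) for p in per_molecule])
print(majority_vote(["active", "inactive"]))  # tie between two models: None
```

With three models and two classes a majority always exists; the `None` branch matters only for ties, which arise with an even number of models or more than two classes.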
7.
J Chem Inf Model ; 55(11): 2365-74, 2015 Nov 23.
Article in English | MEDLINE | ID: mdl-26479827

ABSTRACT

Two novel classification methods, called N3 (N-nearest neighbors) and BNN (binned nearest neighbors), are proposed. Both methods are inspired by the principles of the K-nearest neighbors (KNN) method, since both are based on pairwise similarities between objects. Their performance was evaluated in comparison with nine well-known classification methods. To obtain reliable statistics, comparisons were performed on 32 different literature data sets, which differ in the number of objects, variables and classes. The results highlighted that N3 is, on average, the most efficient classification method, with performance similar to that of support vector machines with a radial basis function kernel (SVM/RBF). BNN showed, on average, higher performance than the classical KNN method.


Subjects
Artificial Intelligence , Pattern Recognition, Automated/methods , Algorithms , Animals , Databases, Factual , Humans , Software
8.
Altern Lab Anim ; 42(1): 31-41, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24773486

ABSTRACT

In this study, a QSAR model was developed from a data set of 546 organic molecules to predict acute aquatic toxicity toward Daphnia magna. A modified k-Nearest Neighbour (kNN) strategy was used as the regression method, which provided predictions only for those molecules whose average distance from their k nearest neighbours was below a selected threshold. The final model showed good performance ($R^2$ and $Q^2_{cv}$ equal to 0.78, $Q^2_{ext}$ equal to 0.72). It comprised eight molecular descriptors that encoded information about lipophilicity, the formation of H-bonds, polar surface area, polarisability, nucleophilicity and electrophilicity.


Subjects
Daphnia/drug effects , Organic Chemicals/toxicity , Toxicity Tests, Acute/methods , Animals , Quantitative Structure-Activity Relationship , Regression Analysis
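The modified kNN regression in entry 8 predicts only when a query molecule is close enough to its neighbours. A minimal sketch, assuming Euclidean distance on already-scaled descriptors and an unweighted mean of the neighbours' responses; the value of k, the distance threshold and the toy data are hypothetical, not the model's actual settings.

```python
import math
from typing import List, Optional, Sequence, Tuple


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    query: Sequence[float],
    k: int,
    max_mean_distance: float,
) -> Optional[float]:
    """Mean response of the k nearest neighbours, or None if they are too far on average."""
    neighbours: List[Tuple[float, float]] = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    mean_distance = sum(d for d, _ in neighbours) / len(neighbours)
    if mean_distance > max_mean_distance:
        return None  # query lies too far from the training data: no prediction
    return sum(y for _, y in neighbours) / len(neighbours)


# Hypothetical two-descriptor training set with toxicity values.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [1.0, 2.0, 3.0, 10.0]
print(knn_predict(train_x, train_y, [0.2, 0.2], k=3, max_mean_distance=1.5))
print(knn_predict(train_x, train_y, [9.0, 9.0], k=3, max_mean_distance=1.5))
```

The first query sits among the three nearby training molecules and receives their mean response; the second is far from every neighbour, so the model abstains rather than extrapolate.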
9.
Int J Mol Sci ; 15(10): 18162-74, 2014 Oct 09.
Article in English | MEDLINE | ID: mdl-25302621

ABSTRACT

A series of 436 Munro database chemicals were studied with respect to their experimental LD50 values to investigate the possibility of establishing a global QSAR model for acute toxicity. Dragon molecular descriptors were used for QSAR model development, and genetic algorithms were used to select the descriptors best correlated with the toxicity data. Toxicity values were discretized into qualitative classes on the basis of the Globally Harmonized System: the 436 chemicals were divided into 3 classes based on their experimental LD50 values: highly toxic, intermediately toxic, and low to non-toxic. The k-nearest neighbor (k-NN) classification method was calibrated on 25 molecular descriptors and gave a non-error rate (NER) of 0.66 and 0.57 for the internal and external prediction sets, respectively. Even though the classification performance is not optimal, the subsequent analysis of the selected descriptors and their relationship with toxicity levels constitutes a step towards the development of a global QSAR model for acute toxicity.


Subjects
Models, Biological , Quantitative Structure-Activity Relationship , Toxicity Tests, Acute , Animals , Databases, Factual , Humans , Lethal Dose 50 , Molecular Structure , Organic Chemicals/chemistry , Organic Chemicals/toxicity
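The non-error rate (NER) reported in entry 9 is the average of the per-class sensitivities, that is, the fraction of each class's molecules that are classified correctly, so a large class cannot dominate the score the way it can with plain accuracy. A minimal sketch; the three labels echo the abstract's toxicity classes, and the predictions are hypothetical.

```python
from typing import Dict, Sequence


def non_error_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean over classes of the fraction of that class's objects predicted correctly."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (t == p)
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


# Hypothetical classes: high toxicity, intermediate, low to non-toxic.
y_true = ["high", "high", "inter", "inter", "low", "low", "low", "low"]
y_pred = ["high", "inter", "inter", "inter", "low", "low", "low", "low"]
print(non_error_rate(y_true, y_pred))
```

Here accuracy is 7/8 = 0.875, but NER is (0.5 + 1.0 + 1.0) / 3, about 0.83, because half of the small "high" class is misclassified.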
10.
J Cheminform ; 16(1): 35, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38528548

ABSTRACT

Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, their structural motifs differ from those of typical drug-like compounds, e.g., a wider range of molecular weight, multiple stereocenters and a higher fraction of sp3-hybridized carbons. This makes it difficult to encode natural products via molecular fingerprints, thus restricting their use in cheminformatics studies. To tackle this issue, we explored over 30 years of research to systematically evaluate which molecular fingerprint provides the best performance on the natural product chemical space. We considered 20 molecular fingerprints from four different sources, which we then benchmarked on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and performance. While Extended Connectivity Fingerprints are the de facto option for encoding drug-like compounds, other fingerprints were found to match or outperform them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all molecular fingerprints considered in the study, as well as the data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints .

11.
J Chem Inf Model ; 53(4): 867-78, 2013 Apr 22.
Article in English | MEDLINE | ID: mdl-23469921

ABSTRACT

The European REACH regulation requires information on ready biodegradation, which is a screening test to assess the biodegradability of chemicals. At the same time REACH encourages the use of alternatives to animal testing which includes predictions from quantitative structure-activity relationship (QSAR) models. The aim of this study was to build QSAR models to predict ready biodegradation of chemicals by using different modeling methods and types of molecular descriptors. Particular attention was given to data screening and validation procedures in order to build predictive models. Experimental values of 1055 chemicals were collected from the webpage of the National Institute of Technology and Evaluation of Japan (NITE): 837 and 218 molecules were used for calibration and testing purposes, respectively. In addition, models were further evaluated using an external validation set consisting of 670 molecules. Classification models were produced in order to discriminate biodegradable and nonbiodegradable chemicals by means of different mathematical methods: k nearest neighbors, partial least squares discriminant analysis, and support vector machines, as well as their consensus models. The proposed models and the derived consensus analysis demonstrated good classification performances with respect to already published QSAR models on biodegradation. Relationships between the molecular descriptors selected in each QSAR model and biodegradability were evaluated.


Subjects
Models, Statistical , Small Molecule Libraries/metabolism , Biodegradation, Environmental , Databases, Chemical , Molecular Structure , Quantitative Structure-Activity Relationship , Small Molecule Libraries/chemistry , Small Molecule Libraries/classification
12.
Food Res Int ; 171: 113036, 2023 09.
Article in English | MEDLINE | ID: mdl-37330849

ABSTRACT

The capacity to discriminate safe from dangerous compounds has played an important role in the evolution of species, including human beings. Highly evolved senses such as taste allow humans to navigate and survive in the environment through information that reaches the brain as electrical impulses. Specifically, taste receptors provide multiple bits of information about the substances that are introduced orally. These substances can be pleasant or not, according to the taste responses they trigger. Tastes have been classified as basic (sweet, bitter, umami, sour and salty) or non-basic (astringent, chilling, cooling, heating, pungent), while some compounds are considered multitastes, taste modifiers or tasteless. Classification-based machine learning approaches are useful tools for developing predictive mathematical relationships that predict the taste class of new molecules from their chemical structure. This work reviews the history of multicriteria quantitative structure-taste relationship modelling, from the first ligand-based (LB) classifier, proposed in 1980 by Lemont B. Kier, to the most recent studies published in 2022.


Subjects
Taste Buds , Taste , Humans , Taste/physiology , Taste Perception
13.
J Chem Inf Model ; 52(11): 2884-901, 2012 Nov 26.
Article in English | MEDLINE | ID: mdl-23078167

ABSTRACT

This paper reports an analysis and comparison of the use of 51 different similarity coefficients for computing the similarities between binary fingerprints for both simulated and real chemical data sets. Five pairs and a triplet of coefficients were found to yield identical similarity values, leading to the elimination of seven of the coefficients. The remaining 44 coefficients were then compared in two ways: by their theoretical characteristics using simple descriptive statistics, correlation analysis, multidimensional scaling, Hasse diagrams, and the recently described atemporal target diffusion model; and by their effectiveness for similarity-based virtual screening using MDDR, WOMBAT, and MUV data. The comparisons demonstrate the general utility of the well-known Tanimoto method but also suggest other coefficients that may be worthy of further attention.


Subjects
Algorithms , Enzyme Inhibitors/chemistry , Models, Chemical , Proteins/antagonists & inhibitors , Serotonin Antagonists/chemistry , Computer Simulation , Databases, Chemical , Drug Discovery , Molecular Structure , Structure-Activity Relationship
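Most binary similarity coefficients of the kind compared in entry 13 are functions of four counts: a (bits on in both fingerprints), b and c (bits on in only one of them), and d (bits off in both). A minimal sketch computing five well-known coefficients from these counts; the selection and the two toy fingerprints are illustrative and do not reproduce the paper's list of 51. Returning 0.0 when the denominator is zero is a convention chosen here.

```python
import math
from typing import Dict, Sequence


def bit_counts(x: Sequence[int], y: Sequence[int]) -> Dict[str, int]:
    """Count shared on-bits (a), bits on only in x (b) or only in y (c), shared off-bits (d)."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if j and not i)
    d = len(x) - a - b - c
    return {"a": a, "b": b, "c": c, "d": d}


def similarities(x: Sequence[int], y: Sequence[int]) -> Dict[str, float]:
    """A few classical coefficients expressed through a, b, c, d."""
    k = bit_counts(x, y)
    a, b, c, d = k["a"], k["b"], k["c"], k["d"]
    n = a + b + c + d
    return {
        "tanimoto": a / (a + b + c) if a + b + c else 0.0,
        "dice": 2 * a / (2 * a + b + c) if a + b + c else 0.0,
        "cosine": a / math.sqrt((a + b) * (a + c)) if a + b and a + c else 0.0,
        "simple_matching": (a + d) / n,  # counts shared absences as agreement
        "russell_rao": a / n,
    }


# Hypothetical 8-bit fingerprints.
x = [1, 1, 0, 1, 0, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0, 1, 0]
print(bit_counts(x, y), similarities(x, y))
```

Some coefficients differ in value but never in ranking: Dice equals 2T/(1 + T) for Tanimoto T, so the two always order candidates identically in virtual screening.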
14.
Molecules ; 17(5): 4791-810, 2012 Apr 25.
Article in English | MEDLINE | ID: mdl-22534664

ABSTRACT

One of the OECD principles for model validation requires defining the Applicability Domain (AD) of QSAR models. This is important because reliable predictions are generally limited to query chemicals structurally similar to the training compounds used to build the model. Characterizing the interpolation space is therefore central to defining the AD; in this study, several existing descriptor-based approaches that perform this task are discussed and compared by applying them to validated datasets from the literature. The algorithms adopted by the different approaches define the interpolation space in different ways, while the chosen thresholds strongly affect which predictions are treated as extrapolations. For each dataset and approach, the comparison was carried out by considering the model statistics and the position of the test set relative to the training space.


Subjects
Models, Statistical , Quantitative Structure-Activity Relationship , Algorithms , Models, Chemical
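Entry 14 compares descriptor-based ways of delimiting the interpolation space. A minimal sketch of two common families, a range-based (bounding-box) check and a distance-to-centroid check. It assumes Euclidean distance on already-scaled descriptors and a threshold equal to the largest training-object distance from the centroid; these are illustrative choices, not necessarily the approaches or thresholds compared in the study.

```python
import math
from typing import List, Sequence


def in_bounding_box(train: Sequence[Sequence[float]], query: Sequence[float]) -> bool:
    """True if every descriptor of the query lies within the training range."""
    for j, value in enumerate(query):
        column = [row[j] for row in train]
        if not min(column) <= value <= max(column):
            return False
    return True


def centroid(train: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each descriptor over the training set."""
    return [sum(col) / len(col) for col in zip(*train)]


def in_centroid_domain(train: Sequence[Sequence[float]], query: Sequence[float]) -> bool:
    """True if the query is no farther from the centroid than the farthest training object."""
    centre = centroid(train)
    threshold = max(math.dist(row, centre) for row in train)  # assumed threshold
    return math.dist(query, centre) <= threshold


# Hypothetical training set of four molecules described by two scaled descriptors.
train = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
for q in ([1.0, 1.0], [1.0, 2.3], [3.0, 1.0]):
    print(q, in_bounding_box(train, q), in_centroid_domain(train, q))
```

The middle query shows how the definitions disagree: it falls outside the training range for the second descriptor, yet lies closer to the centroid than any training molecule, so the choice of approach decides whether it counts as an extrapolation.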
15.
J Comput Aided Mol Des ; 25(6): 533-54, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21660515

ABSTRACT

The Online Chemical Modeling Environment (OCHEM) is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. The user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a wide variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. Unlike other similar systems, OCHEM is not intended to re-implement existing tools or models, but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and become members of the growing research community. Our intention is to make OCHEM a widely used platform for performing QSPR/QSAR studies online and sharing them with other users on the Web. The ultimate goal of OCHEM is to collect all possible chemoinformatics tools within one simple, reliable and user-friendly resource. OCHEM is free for web users and is available online at http://www.ochem.eu.


Subjects
Databases, Factual , Internet , Models, Chemical , Information Dissemination , Information Management , Quantitative Structure-Activity Relationship , User-Computer Interface
16.
J Chem Inf Model ; 50(12): 2094-111, 2010 Dec 27.
Article in English | MEDLINE | ID: mdl-21033656

ABSTRACT

The estimation of accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed parameter of "distance to model" (DM) is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics that have been based on the standard deviation within an ensemble of QSAR models. The current study applies such analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported within the 2009 QSAR challenge. We demonstrate that the DMs based on an ensemble (consensus) model provide systematically better performance than other DMs. The presented approach identifies 30-60% of compounds having an accuracy of prediction similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. Thus, the in silico predictions can be used to halve the cost of experimental measurements by providing a similar prediction accuracy. The developed model has been made publicly available at http://ochem.eu/models/1 .


Subjects
Benchmarking/methods , Classification/methods , Mutagenicity Tests/methods , Quantitative Structure-Activity Relationship , Mutagenicity Tests/standards , Principal Component Analysis
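The best-performing distance to model (DM) in entry 16 is based on the standard deviation of an ensemble's predictions for a compound: where the ensemble members disagree, the consensus prediction is less likely to be accurate. A minimal sketch, assuming each ensemble member outputs a probability of mutagenicity and that compounds are kept when their spread falls below a cut-off; the number of models, the cut-off of 0.1 and the values are hypothetical.

```python
from statistics import fmean, pstdev
from typing import Dict, List, Tuple


def consensus_with_dm(ensemble: Dict[str, List[float]]) -> List[Tuple[str, float, float]]:
    """Per compound: (id, consensus prediction, DM = std of model outputs), lowest DM first."""
    rows = [(cid, fmean(p), pstdev(p)) for cid, p in ensemble.items()]
    return sorted(rows, key=lambda r: r[2])


def reliable_subset(ensemble: Dict[str, List[float]], max_dm: float) -> List[str]:
    """Compounds whose ensemble spread is small enough to trust the consensus."""
    return [cid for cid, _, dm in consensus_with_dm(ensemble) if dm <= max_dm]


# Hypothetical outputs of four ensemble members for three compounds.
ensemble = {
    "cmpd1": [0.90, 0.95, 0.85, 0.90],  # members agree: likely mutagenic
    "cmpd2": [0.20, 0.80, 0.50, 0.60],  # members disagree: low confidence
    "cmpd3": [0.10, 0.05, 0.15, 0.10],  # members agree: likely non-mutagenic
}
for row in consensus_with_dm(ensemble):
    print(row)
print(reliable_subset(ensemble, max_dm=0.1))
```

Restricting predictions to the low-DM subset trades coverage for accuracy, which is the mechanism behind the abstract's claim that a fraction of compounds can be predicted with accuracy comparable to the experimental test.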
17.
Food Chem ; 315: 126248, 2020 Jun 15.
Article in English | MEDLINE | ID: mdl-32018076

ABSTRACT

Chianti is a prized red wine that enjoys a high reputation for quality on the world wine market. However, its production region is small, and the product needs efficient tools to protect its brands and prevent adulteration. In this respect, ICP-MS combined with chemometrics has proven useful for food authentication. In this study, authentic Chianti/Chianti Classico wines from vineyards in the Toscana (Tuscany) region of Italy, together with samples from 18 different geographical regions, were analyzed with the aim of differentiating them from other Italian wines. Partial Least Squares-Discriminant Analysis (PLS-DA) identified the variables that discriminate wine geographical origin. Rare Earth Elements (REE), major and trace elements all contributed to the discrimination of Chianti samples. The general model was not suited to distinguishing the PDO red wines from samples with similar chemical fingerprints collected in some regions. Specific classification models enhanced the discrimination capability, emphasizing the discriminant role of some elements.


Subjects
Food Analysis/methods , Mass Spectrometry/methods , Wine/analysis , Discriminant Analysis , Food Analysis/statistics & numerical data , Italy , Least-Squares Analysis , Limit of Detection , Mass Spectrometry/statistics & numerical data , Metals, Rare Earth/analysis , Trace Elements/analysis
19.
Mol Inform ; 38(8-9): e1800124, 2019 08.
Article in English | MEDLINE | ID: mdl-30549437

ABSTRACT

The ICCVAM Acute Toxicity Workgroup (U.S. Department of Health and Human Services), in collaboration with the U.S. Environmental Protection Agency (U.S. EPA, National Center for Computational Toxicology), coordinated the "Predictive Models for Acute Oral Systemic Toxicity" collaborative project to develop in silico models that predict acute oral systemic toxicity to meet regulatory needs. In this framework, new Quantitative Structure-Activity Relationship (QSAR) models for the prediction of the very toxic (LD50 lower than 50 mg/kg) and nontoxic (LD50 greater than or equal to 2,000 mg/kg) endpoints were developed, as described in this study. Models were developed on a large set of chemicals (8992) provided by the project coordinators, considering the five OECD principles for the regulatory use of QSAR models. A Bayesian consensus approach integrating three different classification QSAR algorithms was applied as the modelling method. For both endpoints, the proposed approach proved to be robust and predictive, as determined by a blind validation on a set of external molecules provided at a later stage by the coordinators of the collaborative project. Finally, integrating the predictions obtained for the very toxic and nontoxic endpoints allowed the identification of compounds associated with medium toxicity, as well as the analysis of consistency between the predictions obtained for the two endpoints on the same molecules. Predictions of the proposed consensus approach will be integrated with those of the models proposed by the other participants in the collaborative project, to facilitate the regulatory acceptance of in silico predictions and thus reduce or replace experimental tests for acute toxicity.


Subjects
Organic Chemicals/toxicity , Quantitative Structure-Activity Relationship , Administration, Oral , Animals , Bayes Theorem , Computer Simulation , Dose-Response Relationship, Drug , Models, Molecular , Organic Chemicals/administration & dosage , Rats , Software , United States , United States Dept. of Health and Human Services , United States Environmental Protection Agency
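Entry 19 combines two binary endpoints, very toxic (LD50 < 50 mg/kg) and nontoxic (LD50 >= 2,000 mg/kg), to flag medium toxicity and to check consistency. The abstract does not give the exact combination rule. A minimal sketch of the logic it implies, assuming each endpoint model returns a boolean; the Bayesian consensus that produces each endpoint prediction is not reproduced here.

```python
def combine_endpoints(very_toxic: bool, nontoxic: bool) -> str:
    """Merge the two binary endpoint predictions for one molecule."""
    if very_toxic and nontoxic:
        return "inconsistent"  # the two endpoints are mutually exclusive by definition
    if very_toxic:
        return "very toxic"
    if nontoxic:
        return "nontoxic"
    return "medium toxicity"  # neither endpoint fires: 50 <= LD50 < 2000 mg/kg


def class_from_ld50(ld50_mg_per_kg: float) -> str:
    """Reference class from an experimental LD50, using the thresholds in the abstract."""
    if ld50_mg_per_kg < 50:
        return "very toxic"
    if ld50_mg_per_kg >= 2000:
        return "nontoxic"
    return "medium toxicity"


print(combine_endpoints(very_toxic=False, nontoxic=False))
print(class_from_ld50(300))
```

Comparing `combine_endpoints` output with `class_from_ld50` on molecules with known LD50 values is one simple way to check the merged three-class assignment.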
20.
Mol Inform ; 38(1-2): e1800029, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30142701

ABSTRACT

Quantitative Structure-Activity Relationship (QSAR) models play a central role in medicinal chemistry, toxicology and computer-assisted molecular design, as well as supporting regulatory decisions and the reduction of animal testing. Thus, assessing their predictive ability is an essential step for any prospective application. Many metrics have been proposed to estimate the predictive ability of QSAR models, which has created confusion about how models should be evaluated and properly compared. Recently, we showed that the $Q^2_{F3}$ metric is particularly well suited for comparing the external predictivity of different models developed on the same training dataset. However, when comparing models developed on different training data, this function becomes inadequate and only dispersion measures such as the root-mean-square error (RMSE) should be used. The intent of this work is to clarify the correct and incorrect uses of $Q^2_{F3}$, discussing its behavior with respect to the training data distribution and illustrating some cases in which $Q^2_{F3}$ estimates may be misleading. We therefore encourage the use of dispersion measures when models trained on different datasets have to be compared and evaluated.


Subjects
Quantitative Structure-Activity Relationship , Algorithms , Drug Design , Drug Discovery/methods , Drug Discovery/standards
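The caveat in entry 20 follows from the definition $Q^2_{F3} = 1 - \frac{\mathrm{PRESS}/n_{EXT}}{\sum_{TR}(y_j - \bar{y}_{TR})^2 / n_{TR}}$: the denominator is the training-set variance, so identical test predictions score differently when the training sets differ, whereas RMSE depends only on the test residuals. A minimal sketch with hypothetical data:

```python
import math
from statistics import pvariance
from typing import Sequence


def rmse(y_test: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error over the test set only."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_test, y_pred)) / len(y_test))


def q2_f3(y_train: Sequence[float], y_test: Sequence[float], y_pred: Sequence[float]) -> float:
    """Q2_F3: mean squared test error relative to the training-set variance."""
    mse_ext = sum((y - p) ** 2 for y, p in zip(y_test, y_pred)) / len(y_test)
    return 1 - mse_ext / pvariance(y_train)  # pvariance = TSS_TR / n_TR


# Hypothetical test set and predictions, shared by two models.
y_test = [2.0, 4.0]
y_pred = [2.5, 3.5]
narrow_train = [2.0, 3.0, 4.0]  # model 1 trained on a narrow response range
wide_train = [0.0, 3.0, 6.0]  # model 2 trained on a wide response range

print(rmse(y_test, y_pred))
print(q2_f3(narrow_train, y_test, y_pred), q2_f3(wide_train, y_test, y_pred))
```

Both models make exactly the same errors (RMSE = 0.5), yet the model with the wider training range obtains a much higher $Q^2_{F3}$ (about 0.96 versus 0.625), which is why the abstract recommends dispersion measures when training sets differ.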