Results 1 - 19 of 19
1.
J Ethnopharmacol ; 197: 61-72, 2017 Feb 02.
Article in English | MEDLINE | ID: mdl-27452659

ABSTRACT

ETHNOPHARMACOLOGICAL RELEVANCE: Cassia auriculata (CA) is used as an antidiabetic therapy in Ayurvedic and Siddha practice. This study aimed to understand the mode-of-action of CA via combined cheminformatics and in vivo biological analysis. In particular, the effects of 10 polyphenolic constituents of CA in modulating insulin and immunoprotective pathways were studied. MATERIALS AND METHODS: In silico target prediction was first employed to predict the probability of the polyphenols interacting with key protein targets related to insulin signalling, based on a model trained on known bioactivity data and chemical similarity considerations. Next, CA was investigated in in vivo studies where induced type 2 diabetic rats were treated with CA for 28 days and the expression levels of genes regulating the insulin signalling pathway, glucose transporters of hepatic (GLUT2) and muscular (GLUT4) tissue, insulin receptor substrate (IRS), phosphorylated insulin receptor (AKT), gluconeogenesis (G6PC and PCK-1), along with inflammatory mediator genes (NF-κB, IL-6, IFN-γ and TNF-α) and peroxisome proliferator-activated receptor gamma (PPAR-γ) were determined by qPCR. RESULTS: In silico analysis shows that several of the top 20 enriched targets predicted for the constituents of CA are involved in insulin signalling pathways, e.g. PTPN1, PCK-α, AKT2, PI3K-γ. Some of the predictions were supported by scientific literature, such as the prediction of PI3K for epigallocatechin gallate. Based on the in silico and in vivo findings, we hypothesized that CA may enhance glucose uptake and glucose transporter expression via the IRS signalling pathway. This is based on AKT2 and PI3K-γ being listed in the top 20 enriched targets. In vivo analysis shows a significant increase in the expression of IRS, AKT, GLUT2 and GLUT4. CA may also affect the PPAR-γ signalling pathway. This is based on the CA-treated groups showing significant activation of PPAR-γ in the liver compared to control.
PPAR-γ was predicted by the in silico target prediction with high normalisation rate although it was not in the top 20 most enriched targets. CA may also be involved in gluconeogenesis and glycogenolysis in the liver, based on the downregulation of G6PC and PCK-1 genes seen in CA-treated groups. In addition, CA-treated groups also showed decreased cholesterol, triglyceride, glucose, CRP and HbA1c levels, and increased insulin and C-peptide levels. These findings demonstrate the insulin secretagogue and sensitizer effect of CA. CONCLUSION: Based on both in silico and in vivo analysis, we propose here that CA mediates glucose/lipid metabolism via the PI3K signalling pathway and influences AKT, thereby causing insulin secretion and insulin sensitivity in peripheral tissues. CA enhances glucose uptake and expression of glucose transporters, in particular via the upregulation of GLUT2 and GLUT4. Thus, based on its ability to modulate immunometabolic pathways, CA appears to be an attractive long-term therapy for T2DM even at relatively low doses.


Subjects
Cassia/chemistry , Diabetes Mellitus, Type 2/drug therapy , Plant Extracts/pharmacology , Animals , Diabetes Mellitus, Experimental/drug therapy , Diabetes Mellitus, Experimental/metabolism , Diabetes Mellitus, Type 2/metabolism , Glucose Transporter Type 2/metabolism , Glucose Transporter Type 4/metabolism , Insulin/metabolism , Insulin Receptor Substrate Proteins/metabolism , Liver/drug effects , Liver/metabolism , PPAR gamma/metabolism , Phosphatidylinositol 3-Kinases/metabolism , Proto-Oncogene Proteins c-akt/metabolism , Rats , Rats, Sprague-Dawley , Signal Transduction/drug effects
2.
BMC Res Notes ; 8: 744, 2015 Dec 03.
Article in English | MEDLINE | ID: mdl-26634450

ABSTRACT

BACKGROUND: We recently reported that one may be able to predict with high accuracy the chemical mechanism of an enzyme by employing a simple pattern recognition approach: a k Nearest Neighbour rule with k = 1 (k1NN) and 321 InterPro sequence signatures as enzyme features. The nearest-neighbour rule is known to be highly sensitive to errors in the training data, in particular when the available training dataset is small. This was the case in our previous study, in which our dataset comprised 248 enzymes annotated against 71 enzymatic mechanism labels from the MACiE database. In the current study, we have carefully re-analysed our dataset and prediction results to "explain" why a high variance k1NN rule exhibited such remarkable classification performance. RESULTS: We find that enzymes with different chemical mechanism labels in this dataset reside in barely overlapping subspaces in the feature space defined by the 321 features selected. These features contain the appropriate information needed to accurately classify the enzymatic mechanisms, rendering our classification problem a basic look-up exercise. This observation dovetails with the low misclassification rate we reported. CONCLUSION: Our results provide explanations for the "anomaly": a basic nearest-neighbour algorithm exhibiting remarkable prediction performance for enzymatic mechanism even though the feature space was large and sparse. Our results also dovetail well with another finding we reported, namely that InterPro signatures are critical for accurate prediction of enzyme mechanism. We also suggest simple rules that might enable one to inductively predict whether a novel enzyme possesses any of our 71 predefined mechanisms.
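
To make the k = 1 nearest-neighbour rule concrete, here is a minimal sketch on toy binary signature vectors. The Hamming distance, the function names and the example labels are assumptions for illustration; the abstract does not say which distance measure was used.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train_x: List[Sequence[int]], train_y: List[str],
                query: Sequence[int]) -> str:
    """Return the label of the single training vector closest to `query`."""
    best = min(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    return train_y[best]


# Toy data: each row marks presence (1) or absence (0) of a signature.
# Labels are hypothetical placeholders, not MACiE mechanism identifiers.
toy_x = [[1, 1, 0, 0], [0, 0, 1, 1]]
toy_y = ["mechanism_A", "mechanism_B"]
print(predict_1nn(toy_x, toy_y, [1, 0, 0, 0]))
```

When classes occupy barely overlapping regions of feature space, as the abstract reports, the nearest stored example almost always carries the right label, which is why the rule behaves like a look-up.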


Subjects
Enzymes/metabolism , Software
3.
J Cheminform ; 7: 58, 2015.
Article in English | MEDLINE | ID: mdl-26628925

ABSTRACT

It is common in cheminformatics to represent the properties of a ligand as a string of 1's and 0's, with the intention of elucidating, inter alia, the relationship between the chemical structure of a ligand and its bioactivity. In this commentary we note that, where relevant but non-redundant features are binary, they inevitably lead to a classifier capable of capturing only a linear relationship between structural features and activity. If, instead, we were to use relevant but non-redundant real-valued features, the resulting predictive model would be capable of describing a non-linear structure-activity relationship. Hence, we suggest that real-valued features, where available, are to be preferred in this scenario.

4.
Pattern Recognit Lett ; 63: 30-35, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26435560

ABSTRACT

Pattern classification methods assign an object to one of several predefined classes/categories based on features extracted from observed attributes of the object (pattern). When L discriminatory features for the pattern can be accurately determined, the pattern classification problem presents no difficulty. However, precise identification of the relevant features for a classification algorithm (classifier) to be able to categorize real world patterns without errors is generally infeasible. In this case, the pattern classification problem is often cast as devising a classifier that minimizes the misclassification rate. One way of doing this is to consider both the pattern attributes and its class label as random variables, estimate the posterior class probabilities for a given pattern and then assign the pattern to the class/category for which the estimated posterior class probability is highest. More often than not, the form of the posterior class probabilities is unknown. The so-called Parzen Window approach is widely employed to estimate class-conditional probability (class-specific probability) densities for a given pattern. These probability densities can then be utilized to estimate the appropriate posterior class probabilities for that pattern. However, the Parzen Window scheme can become computationally impractical when the size of the training dataset is in the tens of thousands and L is also large (a few hundred or more). Over the years, various schemes have been suggested to ameliorate the computational drawback of the Parzen Window approach, but the problem remains unresolved. In this paper, we revisit the Parzen Window technique and introduce a novel approach that may circumvent the aforementioned computational bottleneck. The current paper presents the mathematical aspect of our idea. Practical realizations of the proposed scheme will be given elsewhere.
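
The standard Parzen Window procedure described above can be sketched as follows. This is the textbook estimator, not the novel scheme the paper proposes; the Gaussian kernel, the bandwidth `h` and all names are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def gaussian_kernel(x: Sequence[float], xi: Sequence[float], h: float) -> float:
    """Unnormalised isotropic Gaussian kernel with bandwidth h.

    The normalising constant is omitted because it is identical for every
    class (same h, same dimension) and cancels in the posterior.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2.0 * h * h))


def parzen_posteriors(train: Dict[str, List[Sequence[float]]],
                      x: Sequence[float], h: float = 1.0) -> Dict[str, float]:
    """Estimate P(class | x) from class-conditional Parzen densities."""
    n_total = sum(len(points) for points in train.values())
    joint = {}
    for label, points in train.items():
        # Class-conditional density: average kernel value over the class.
        density = sum(gaussian_kernel(x, p, h) for p in points) / len(points)
        prior = len(points) / n_total
        joint[label] = prior * density
    evidence = sum(joint.values())
    return {label: value / evidence for label, value in joint.items()}
```

Every query touches every training vector and every one of the L features, so the cost per prediction grows with N × L. That is the bottleneck the abstract refers to.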

5.
J Cheminform ; 7: 27, 2015.
Article in English | MEDLINE | ID: mdl-26075027

ABSTRACT

BACKGROUND: In a recent paper, Mussa, Mitchell and Glen (MMG) have mathematically demonstrated that the "Laplacian Corrected Modified Naïve Bayes" (LCMNB) algorithm can be viewed as a variant of the so-called Standard Naïve Bayes (SNB) scheme, whereby the role played by absence of compound features in classifying/assigning the compound to its appropriate class is ignored. MMG have also proffered guidelines regarding the conditions under which this omission may hold. Utilising three data sets, the present paper examines the validity of these guidelines in practice. The paper also extends MMG's work and introduces a new version of the SNB classifier: "Tapered Naïve Bayes" (TNB). TNB does not discard the role of absence of a feature out of hand, nor does it fully consider its role. Hence, TNB encapsulates both SNB and LCMNB. RESULTS: LCMNB, SNB and TNB performed differently on classifying 4,658, 5,031 and 1,149 ligands (all chosen from the ChEMBL Database) distributed over 31 enzymes, 23 membrane receptors, and one ion-channel, four transporters and one transcription factor as their target proteins. When the number of features utilised was equal to or smaller than the "optimal" number of features for a given data set, SNB classifiers systematically gave better classification results than those yielded by LCMNB classifiers. The opposite was true when the number of features employed was markedly larger than the "optimal" number of features for this data set. Nonetheless, these LCMNB performances were worse than the classification performance achieved by SNB when the "optimal" number of features for the data set was utilised. TNB classifiers systematically outperformed both SNB and LCMNB classifiers. 
CONCLUSIONS: The classification results obtained in this study concur with the mathematically based guidelines given in MMG's paper: that is, ignoring the role of absence of a feature out of hand does not necessarily improve classification performance of the SNB approach; if anything, it could make the performance of the SNB method worse. The results obtained also lend support to the rationale on which the TNB algorithm rests: handled judiciously, taking into account absence of features can enhance (not impair) the discriminatory classification power of the SNB approach.

6.
J Cheminform ; 7: 24, 2015.
Article in English | MEDLINE | ID: mdl-26064191

ABSTRACT

BACKGROUND: According to Cobanoglu et al., it is now widely acknowledged that the single target paradigm (one protein/target, one disease, one drug) that has been the dominant premise in drug development in the recent past is untenable. More often than not, a drug-like compound (ligand) can be promiscuous: it can interact with more than one target protein. In recent years, in in silico target prediction methods the promiscuity issue has generally been approached computationally in three main ways: ligand-based methods; target-protein-based methods; and integrative schemes. In this study we confine attention to ligand-based target prediction machine learning approaches, commonly referred to as target-fishing. The target-fishing approaches that are currently ubiquitous in cheminformatics literature can be essentially viewed as single-label multi-classification schemes; these approaches inherently bank on the single target paradigm assumption that a ligand can zero in on one single target. In order to address the ligand promiscuity issue, one might be able to cast target-fishing as a multi-label multi-class classification problem. For illustrative and comparison purposes, single-label and multi-label Naïve Bayes classification models (denoted here by SMM and MMM, respectively) for target-fishing were implemented. The models were constructed and tested on 65,587 compounds/ligands and 308 targets retrieved from the ChEMBL17 database. RESULTS: On classifying 3,332 test multi-label (promiscuous) compounds, SMM and MMM performed differently. At the 0.05 significance level, a Wilcoxon signed rank test performed on the paired target predictions yielded by SMM and MMM for the test ligands gave a p-value < 5.1 × 10^-94 and a test statistic value of 6.8 × 10^5, in favour of MMM.
The two models performed differently when tested on four datasets comprising single-label (non-promiscuous) compounds; McNemar's test yielded χ² values of 15.657, 16.500 and 16.405 (with corresponding p-values of 7.594 × 10^-5, 4.865 × 10^-5 and 5.115 × 10^-5), respectively, for three test sets, in favour of MMM. The models performed similarly on the fourth set. CONCLUSIONS: The target prediction results obtained in this study indicate that multi-label multi-class approaches are more apt than the ubiquitous single-label multi-class schemes when it comes to the application of ligand-based classifiers to target-fishing.
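
For readers unfamiliar with McNemar's test, the sketch below shows how a χ² statistic and its p-value arise when two classifiers are compared on the same test items. The continuity correction and the function names are assumptions; the abstract does not state which variant of the test was used, and the discordant counts in the example are invented.

```python
import math


def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar statistic with continuity correction.

    b: items model 1 got right and model 2 got wrong.
    c: items model 1 got wrong and model 2 got right.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    return (abs(b - c) - 1) ** 2 / (b + c)


def chi2_p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical discordant counts, for illustration only.
stat = mcnemar_chi2(10, 30)
print(stat, chi2_p_value_1df(stat))
```

As a consistency check, `chi2_p_value_1df(15.657)` evaluates to roughly 7.6 × 10^-5, in line with the first p-value quoted above.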

7.
J Cheminform ; 6: 29, 2014.
Article in English | MEDLINE | ID: mdl-24959208

ABSTRACT

BACKGROUND: The prediction of sites and products of metabolism in xenobiotic compounds is key to the development of new chemical entities, where screening potential metabolites for toxicity or unwanted side-effects is of crucial importance. In this work 2D topological fingerprints are used to encode atomic sites and three probabilistic machine learning methods are applied: Parzen-Rosenblatt Window (PRW), Naive Bayesian (NB) and a novel approach called RASCAL (Random Attribute Subsampling Classification ALgorithm). These are implemented by randomly subsampling descriptor space to alleviate the problem often suffered by data mining methods of having to exactly match fingerprints, and in the case of PRW by measuring a distance between feature vectors rather than exact matching. The classifiers have been implemented in CUDA/C++ to exploit the parallel architecture of graphics processing units (GPUs) and are freely available in a public repository. RESULTS: It is shown that for PRW a SoM (Site of Metabolism) is identified in the top two predictions for 85%, 91% and 88% of the CYP 3A4, 2D6 and 2C9 data sets respectively, with RASCAL giving similar performance of 83%, 91% and 88%, respectively. These results put PRW and RASCAL performance ahead of NB, which gave a much lower classification performance of 51%, 73% and 74%, respectively. CONCLUSIONS: 2D topological fingerprints calculated to a bond depth of 4-6 contain sufficient information to allow the identification of SoMs using classifiers based on relatively small data sets. Thus, the machine learning methods outlined in this paper are conceptually simpler and more efficient than other methods tested, and the use of simple topological descriptors derived from 2D structure gives results competitive with other approaches using more expensive quantum chemical descriptors.
The descriptor space subsampling approach and ensemble methodology allow the methods to be applied to molecules more distant from the training data where data mining would be more likely to fail due to the lack of common fingerprints. The RASCAL algorithm is shown to give equivalent classification performance to PRW but at lower computational expense allowing it to be applied more efficiently in the ensemble scheme.

8.
J Cheminform ; 5(1): 37, 2013 Aug 23.
Article in English | MEDLINE | ID: mdl-23968281

ABSTRACT

BACKGROUND: In the last decade the standard Naive Bayes (SNB) algorithm has been widely employed in multi-class classification problems in cheminformatics. This popularity is mainly due to the fact that the algorithm is simple to implement and in many cases yields respectable classification results. Using clever heuristic arguments "anchored" by insightful cheminformatics knowledge, Xia et al. have simplified the SNB algorithm further and termed it the Laplacian Corrected Modified Naive Bayes (LCMNB) approach, which has been widely used in cheminformatics since its publication. In this note we mathematically illustrate the conditions under which Xia et al.'s simplification holds. It is our hope that this clarification could help Naive Bayes practitioners in deciding when it is appropriate to employ the LCMNB algorithm to classify large chemical datasets. RESULTS: A general formulation that subsumes the simplified Naive Bayes version is presented. Unlike the widely used NB method, the Standard Naive Bayes description presented in this work is discriminative (not generative) in nature, which may lead to possible further applications of the SNB method. CONCLUSIONS: Starting from a standard Naive Bayes (SNB) algorithm, we have derived mathematically the relationship between Xia et al.'s ingenious, but heuristic algorithm, and the SNB approach. We have also demonstrated the conditions under which Xia et al.'s crucial assumptions hold. We therefore hope that the cheminformatics community will find the new insight and recommendations useful.
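
As a rough illustration of the simplified scheme under discussion, the sketch below uses the Laplacian-corrected feature weight as it is commonly stated in the cheminformatics literature, log((A + 1) / (N · P_base + 1)). The abstract itself gives no formula, so the weight, the function names and the toy data are all assumptions here.

```python
import math
from typing import Dict, List, Set


def laplacian_corrected_weights(fingerprints: List[Set[int]],
                                labels: List[int]) -> Dict[int, float]:
    """Per-feature weights from training data.

    fingerprints: one set of feature ids per compound (features present).
    labels: 1 for active, 0 for inactive.
    """
    base_rate = sum(labels) / len(labels)
    counts: Dict[int, List[int]] = {}  # feature -> [n_with_feature, n_active]
    for fp, y in zip(fingerprints, labels):
        for f in fp:
            entry = counts.setdefault(f, [0, 0])
            entry[0] += 1
            entry[1] += y
    return {f: math.log((a + 1.0) / (n * base_rate + 1.0))
            for f, (n, a) in counts.items()}


def score(fp: Set[int], weights: Dict[int, float]) -> float:
    """Sum weights of the features present; absent features add nothing."""
    return sum(weights.get(f, 0.0) for f in fp)
```

Only features that are present contribute to `score`. That omission of the absence terms is the simplification whose validity the note examines.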

9.
J Chem Inf Model ; 53(8): 1957-66, 2013 Aug 26.
Article in English | MEDLINE | ID: mdl-23829430

ABSTRACT

In this study, two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules, namely the well-established Laplacian-modified Naïve Bayes classifier (NB) and the more recently introduced (to Cheminformatics) Parzen-Rosenblatt Window. Both classifiers were trained in conjunction with circular fingerprints on a large data set of bioactive compounds extracted from ChEMBL, covering 894 human protein targets with more than 155,000 ligand-protein pairs. This data set is also provided as a benchmark data set for future target prediction methods due to its size as well as the number of bioactivity classes it contains. In addition to evaluating the methods, different performance measures were explored. This is not as straightforward as in binary classification settings, due to the number of classes, the possibility of multiple class memberships, and the need to translate model scores into "yes/no" predictions for assessing model performance. Both algorithms achieved a recall of correct targets that exceeds 80% in the top 1% of predictions. Performance depends significantly on the underlying diversity and size of a given class of bioactive compounds, with small classes and low structural similarity affecting both algorithms to different degrees. When tested on an external test set extracted from WOMBAT covering more than 500 targets by excluding all compounds with Tanimoto similarity above 0.8 to compounds from the ChEMBL data set, the current methodologies achieved a recall of 63.3% and 66.6% among the top 1% for Naïve Bayes and Parzen-Rosenblatt Window, respectively. While those numbers seem to indicate lower performance, they are also more realistic for settings where protein targets need to be established for novel chemical substances.
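
Two quantities in this abstract lend themselves to a short sketch: the Tanimoto similarity used to filter the external test set, and recall of correct targets within the top fraction of ranked predictions. Fingerprints are modelled as plain sets of feature ids, and the recall definition is one plausible reading of the abstract rather than the paper's exact protocol; all names are illustrative.

```python
import math
from typing import Dict, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as feature sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def recall_in_top_fraction(scores: Dict[str, float], true_targets: Set[str],
                           fraction: float = 0.01) -> float:
    """Share of a compound's true targets found among the top-ranked targets."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return len(set(ranked[:k]) & true_targets) / len(true_targets)
```

Under this reading, a test compound would be excluded when `tanimoto(query, training_compound)` exceeds 0.8 for any training compound, which is what makes the external evaluation harder.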


Subjects
Algorithms , Computational Biology/methods , Bayes Theorem , Benchmarking , Drug Discovery , Humans , Ligands , Protein Binding , Proteins/metabolism , Reproducibility of Results
10.
J Chem Inf Model ; 52(10): 2494-500, 2012 Oct 22.
Article in English | MEDLINE | ID: mdl-22900941

ABSTRACT

A plethora of articles on naive Bayes classifiers, where the chemical compounds to be classified are represented by binary-valued (absent or present type) descriptors, have appeared in the cheminformatics literature over the past decade. The principal goal of this paper is to describe how a naive Bayes classifier based on binary descriptors (NBCBBD) can be employed as a feature selector in an efficient manner suitable for cheminformatics. In the process, we point out a fact well documented in other disciplines that NBCBBD is a linear classifier and is therefore intrinsically suboptimal for classifying compounds that are nonlinearly separable in their binary descriptor space. We investigate the performance of the proposed algorithm on classifying a subset of the MDDR data set, a standard molecular benchmark data set, into active and inactive compounds.
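
The linearity claim can be checked with a short derivation for the two-class case. The notation is introduced here for illustration and does not come from the paper: x_i ∈ {0, 1} is the i-th descriptor, p_{ic} = P(x_i = 1 | c) is its class-conditional probability, and c_1, c_0 are the active and inactive classes.

```latex
\begin{align*}
\log\frac{P(c_1 \mid x)}{P(c_0 \mid x)}
  &= \log\frac{P(c_1)}{P(c_0)}
   + \sum_{i=1}^{L}\left[ x_i \log\frac{p_{i1}}{p_{i0}}
   + (1 - x_i)\log\frac{1 - p_{i1}}{1 - p_{i0}} \right] \\
  &= b + \sum_{i=1}^{L} w_i x_i ,
\end{align*}
where
\begin{align*}
w_i &= \log\frac{p_{i1}\,(1 - p_{i0})}{p_{i0}\,(1 - p_{i1})}, &
b &= \log\frac{P(c_1)}{P(c_0)} + \sum_{i=1}^{L}\log\frac{1 - p_{i1}}{1 - p_{i0}} .
\end{align*}
```

The decision boundary b + Σ w_i x_i = 0 is a hyperplane in descriptor space, so no choice of the probabilities p_{ic} can separate classes that are not linearly separable there. The magnitude of w_i also gives a natural ranking of descriptors, which is one way such a classifier can double as a feature selector.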


Subjects
Algorithms , Biological Products/chemistry , Enzyme Inhibitors/chemistry , Bayes Theorem , Biological Products/pharmacology , Enzyme Inhibitors/pharmacology , Humans , Informatics , Models, Molecular , Structure-Activity Relationship
11.
J Cheminform ; 4: 2, 2012 Jan 26.
Article in English | MEDLINE | ID: mdl-22281160

ABSTRACT

The mechanism of phospholipidosis is still not well understood. Numerous different mechanisms have been proposed, varying from direct inhibition of the breakdown of phospholipids to the binding of a drug compound to the phospholipid, preventing breakdown. We have used a probabilistic method, the Parzen-Rosenblatt Window approach, to build a model from the ChEMBL dataset which can predict from a compound's structure both its primary pharmaceutical target and other targets with which it forms off-target, usually weaker, interactions. Using a small dataset of 182 phospholipidosis-inducing and non-inducing compounds, we predict their off-target activity against targets which could relate to phospholipidosis as a side-effect of a drug. We link these targets to specific mechanisms of inducing this lysosomal build-up of phospholipids in cells. Thus, we show that the induction of phospholipidosis is likely to occur by separate mechanisms when triggered by different cationic amphiphilic drugs. We find that both inhibition of phospholipase activity and enhanced cholesterol biosynthesis are likely to be important mechanisms. Furthermore, we provide evidence suggesting four specific protein targets. Sphingomyelin phosphodiesterase, phospholipase A2 and lysosomal phospholipase A1 are shown to be likely targets for the induction of phospholipidosis by inhibition of phospholipase activity, while lanosterol synthase is predicted to be associated with phospholipidosis being induced by enhanced cholesterol biosynthesis. This analysis provides the impetus for further experimental tests of these hypotheses.

12.
Mol Inform ; 31(9): 679-85, 2012 Sep.
Article in English | MEDLINE | ID: mdl-27477818

ABSTRACT

The US Food and Drug Administration (FDA) requires in vitro human ether-a-go-go related (hERG) ion channel affinity tests for all drug candidates prior to clinical trials. In this study, probabilistic methods were employed to develop models for predicting hERG inhibition. These differ from traditional QSAR models, which are mainly based on supervised 'hard point' (HP) classification approaches giving 'yes/no' answers. The obtained models can 'ascertain' whether or not a given set of compounds can block hERG ion channels. The results presented indicate that the proposed probabilistic method can be a valuable tool for ranking compounds with respect to their potential cardio-toxicity and is promising for predicting other toxic properties.

13.
J Chem Inf Model ; 51(7): 1539-44, 2011 Jul 25.
Article in English | MEDLINE | ID: mdl-21696153

ABSTRACT

The central idea of supervised classification in chemoinformatics is to design a classifying algorithm that accurately assigns a new molecule to one of a set of predefined classes. Tipping has devised a classifying scheme, the Relevance Vector Machine (RVM), which is in terms of sparsity equivalent to the Support Vector Machine (SVM). However, unlike SVM classifiers, the RVM classifiers are probabilistic in nature, which is crucial in the field of decision making and risk taking. In this work, we investigate the performance of RVM binary classifiers on classifying a subset of the MDDR data set, a standard molecular benchmark data set, into active and inactive compounds. Additionally, we present results that compare the performance of SVM and RVM binary classifiers.


Subjects
Algorithms , Computational Biology/methods , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/classification , Models, Biological , Drug Discovery
14.
J Chem Inf Model ; 51(1): 4-14, 2011 Jan 24.
Article in English | MEDLINE | ID: mdl-21155612

ABSTRACT

In recent years classifiers generated with kernel-based methods, such as support vector machines (SVM), Gaussian processes (GP), regularization networks (RN), and binary kernel discrimination (BKD) have been very popular in chemoinformatics data analysis. Aizerman et al. were the first to introduce the notion of employing kernel-based classifiers in the area of pattern recognition. Their original scheme, which they termed the potential function method (PFM), can basically be viewed as a kernel-based perceptron procedure and arguably subsumes the modern kernel-based algorithms. PFM can be computationally much cheaper than modern kernel-based classifiers; furthermore, PFM is far simpler conceptually and easier to implement than the SVM, GP, and RN algorithms. Unfortunately, unlike, e.g., SVM, GP, and RN, PFM is not endowed with both theoretical guarantees and practical strategies to safeguard it against generating overfitting classifiers. This is, in our opinion, the reason why this simple and elegant method has not been taken up in chemoinformatics. In this paper we empirically address this drawback: while maintaining its simplicity, we demonstrate that PFM combined with a simple regularization scheme may yield binary classifiers that can be, in practice, as efficient as classifiers obtained by employing state-of-the-art kernel-based methods. Using a realistic classification example, the augmented PFM was used to generate binary classifiers. Using a large chemical data set, the generalization ability of PFM classifiers was then compared with the prediction power of Laplacian-modified naive Bayesian (LmNB), Winnow (WN), and SVM classifiers.
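
Since the abstract characterises PFM as a kernel-based perceptron, a minimal kernel perceptron gives a feel for how simple the procedure is. The RBF kernel, the `gamma` value and the epoch cap are assumptions for illustration. The regularization scheme the authors add is not described in the abstract and is not reproduced here.

```python
import math
from typing import List, Sequence


def rbf(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """Radial basis function kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def decision_value(x, xs, ys, alpha, kernel=rbf) -> float:
    """Kernel expansion over training points weighted by their mistake counts."""
    return sum(a * y_j * kernel(x_j, x) for a, x_j, y_j in zip(alpha, xs, ys))


def train_kernel_perceptron(xs: List[Sequence[float]], ys: List[int],
                            kernel=rbf, epochs: int = 10) -> List[int]:
    """Labels must be -1 or +1. Returns one mistake count per training point."""
    alpha = [0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * decision_value(x, xs, ys, alpha, kernel) <= 0:
                alpha[i] += 1  # add this point's potential to the classifier
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def predict(x, xs, ys, alpha, kernel=rbf) -> int:
    """Classify x as +1 or -1 from the sign of the decision value."""
    return 1 if decision_value(x, xs, ys, alpha, kernel) > 0 else -1
```

Training only increments counters, with no optimisation problem to solve, which matches the abstract's point about conceptual simplicity. Without some form of regularisation, however, nothing stops the counters from fitting noise in the training labels.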


Subjects
Chemistry/methods , Classification/methods , Informatics/methods , Decision Making , Discriminant Analysis , Nonlinear Dynamics
15.
IEEE Trans Neural Netw ; 21(4): 680-6, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20194056

ABSTRACT

Generally, training neural networks with the global extended Kalman filter (GEKF) technique exhibits excellent performance at the expense of a large increase in computational costs which can become prohibitive even for networks of moderate size. This drawback was previously addressed by heuristically decoupling some of the weights of the networks. Inevitably, ad hoc decoupling leads to a degradation in the quality (accuracy) of the resultant neural networks. In this paper, we present an algorithm that emulates the accuracy of GEKF, but avoids the construction of the state covariance matrix, which is the source of the computational bottleneck in GEKF. In the proposed algorithm, all the synaptic weights remain connected while the amount of computer memory required is similar to (or smaller than) the memory requirements in the decoupling schemes. We also point out that the new method can be extended to derivative-free nonlinear Kalman filters, such as the unscented Kalman filter and ensemble Kalman filters.


Subjects
Algorithms , Memory/physiology , Neural Networks, Computer , Computer Simulation , Filtration , Humans , Regression Analysis
16.
J Biomol Screen ; 10(7): 658-66, 2005 Oct.
Article in English | MEDLINE | ID: mdl-16170051

ABSTRACT

A fragment-based similarity searching method, MOLPRINT 2D, was employed for virtual screening of Escherichia coli dihydrofolate reductase inhibitors. Using the original training set of 50,000 compounds, only marginal enrichment factors (between 1 and 3) could be achieved on the test library. The active structures contained in the training and test libraries represented different types of "chemistry", that is, different substructural features associated with activity. Training and test sets were pooled in a second step and randomly split into training and test sets of equal size, with the objective of smoothing out the different chemical characteristics of both libraries. In a 10-fold cross-validation study on the new training and test sets, typically 10-fold enrichment could be found in the first 96 positions, 4-fold enrichment in the first 384 positions, and 3-fold enrichment in the first 1536 positions, corresponding to 6, 10, and 28 hits, respectively (out of a total of 307; activity defined as average residual activity of less than 80%). The conclusions are twofold. On one hand, the exact fragment-matching similarity searching method employed here is not capable of finding completely novel hit structures. On the other hand, this study emphasizes the requirement for a comparable distribution of chemical features of the training and test sets. MOLPRINT 2D is freely downloadable from http://www.cheminformatics.org.
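
The enrichment figures quoted above follow the usual definition of an enrichment factor: the hit rate among the top-ranked compounds divided by the hit rate expected from random selection. A small sketch follows, with a hypothetical function name and invented numbers, since the abstract does not give the test library size.

```python
def enrichment_factor(hits_in_top: int, n_top: int,
                      total_actives: int, library_size: int) -> float:
    """Hit rate in the top n_top positions relative to the overall hit rate."""
    if n_top <= 0 or library_size <= 0 or total_actives <= 0:
        raise ValueError("counts must be positive")
    top_rate = hits_in_top / n_top
    random_rate = total_actives / library_size
    return top_rate / random_rate


# Invented example: 5 actives in the top 100 of a 10,000-compound
# library that contains 50 actives in total.
print(enrichment_factor(5, 100, 50, 10_000))
```

An enrichment factor of 1 means the ranking is no better than picking compounds at random, which puts the "between 1 and 3" result for the original split in context.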


Subjects
Computational Biology/methods , Folic Acid Antagonists/chemistry , Tetrahydrofolate Dehydrogenase/chemistry , Bayes Theorem , Escherichia coli/enzymology , Folic Acid Antagonists/metabolism , Structure-Activity Relationship , Tetrahydrofolate Dehydrogenase/metabolism
17.
J Med Chem ; 47(26): 6569-83, 2004 Dec 16.
Article in English | MEDLINE | ID: mdl-15588092

ABSTRACT

A novel method (MOLPRINT 3D) for virtual screening and the elucidation of ligand-receptor binding patterns is introduced that is based on environments of molecular surface points. The descriptor uses points relative to the molecular coordinates; thus it is translationally and rotationally invariant. Due to its local nature, conformational variations cause only minor changes in the descriptor. If surface point environments are combined with the Tanimoto coefficient and applied to virtual screening, they achieve retrieval rates comparable to those of two-dimensional (2D) fingerprints. The identification of active structures with minimal 2D similarity ("scaffold hopping") is facilitated. In combination with information-gain-based feature selection and a naive Bayesian classifier, information from multiple molecules can be combined and classification performance can be improved. Selected features are consistent with experimentally determined binding patterns. Examples are given for angiotensin-converting enzyme inhibitors, 3-hydroxy-3-methylglutaryl-coenzyme A reductase inhibitors, and thromboxane A2 antagonists.


Subjects
Ligands , Protein Binding , Quantitative Structure-Activity Relationship , Angiotensin-Converting Enzyme Inhibitors/chemistry , Bayes Theorem , Corticosterone/chemistry , Hydroxymethylglutaryl-CoA Reductase Inhibitors/chemistry , Models, Molecular , Molecular Conformation , Platelet Activating Factor/antagonists & inhibitors , Platelet Activating Factor/chemistry , Receptors, Serotonin, 5-HT3/chemistry , Serotonin 5-HT3 Receptor Antagonists , Thromboxane A2/antagonists & inhibitors , Thromboxane A2/chemistry
18.
J Chem Inf Comput Sci ; 44(5): 1708-18, 2004.
Article in English | MEDLINE | ID: mdl-15446830

ABSTRACT

A molecular similarity searching technique based on atom environments, information-gain-based feature selection, and the naive Bayesian classifier has been applied to a series of diverse datasets and its performance compared to those of alternative searching methods. Atom environments are count vectors of heavy atoms present at a topological distance from each heavy atom of a molecular structure. In this application, using a recently published dataset of more than 100000 molecules from the MDL Drug Data Report database, the atom environment approach appears to outperform fusion of ranking scores as well as binary kernel discrimination, which are both used in combination with Unity fingerprints. Overall retrieval rates among the top 5% of the sorted library are nearly 10% better (more than 14% better in relative numbers) than those of the second best method, Unity fingerprints and binary kernel discrimination. In 10 out of 11 sets of active compounds the combination of atom environments and the naive Bayesian classifier appears to be the superior method, while in the remaining dataset, data fusion and binary kernel discrimination in combination with Unity fingerprints is the method of choice. Binary kernel discrimination in combination with Unity fingerprints generally comes second in performance overall. The difference in performance can largely be attributed to the different molecular descriptors used. Atom environments outperform Unity fingerprints by a large margin if the combination of these descriptors with the Tanimoto coefficient is compared. The naive Bayesian classifier in combination with information-gain-based feature selection and selection of a sensible number of features performs about as well as binary kernel discrimination in experiments where these classification methods are compared. 
When used on a monoamine oxidase dataset, atom environments and the naive Bayesian classifier perform as well as binary kernel discrimination in the case of a 50/50 split of training and test compounds. In the case of sparse training data, binary kernel discrimination is found to be superior on this particular dataset. On a third dataset, the atom environment descriptor shows higher retrieval rates than other 2D fingerprints tested here when used in combination with the Tanimoto similarity coefficient. Feature selection is shown to be a crucial step in determining the performance of the algorithm. The representation of molecules by atom environments is found to be more effective than Unity fingerprints for the type of biological receptor similarity calculations examined here. Combining information prior to scoring and including information about inactive compounds, as in the Bayesian classifier and binary kernel discrimination, is found to be superior to posterior data fusion (in the datasets tested here).
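
Information-gain-based feature selection, which the abstract calls a crucial step, can be sketched for a single binary feature as below. This is the standard textbook definition (the reduction in class entropy after splitting on the feature), with illustrative names; the paper's exact selection procedure and the number of features retained are not given in the abstract.

```python
import math
from typing import List


def entropy(pos: int, neg: int) -> float:
    """Shannon entropy (in bits) of a two-class count."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(feature_present: List[bool], active: List[bool]) -> float:
    """Entropy of the activity labels minus their entropy given the feature."""
    n = len(active)
    n_pos = sum(active)
    h_before = entropy(n_pos, n - n_pos)
    h_after = 0.0
    for value in (True, False):
        subset = [y for f, y in zip(feature_present, active) if f == value]
        if subset:
            s_pos = sum(subset)
            h_after += len(subset) / n * entropy(s_pos, len(subset) - s_pos)
    return h_before - h_after
```

Ranking atom-environment features by this score and keeping only the top-scoring ones is one way to obtain the "sensible number of features" the abstract mentions.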

19.
J Chem Inf Comput Sci ; 44(1): 170-8, 2004.
Article in English | MEDLINE | ID: mdl-14741025

ABSTRACT

A novel technique for similarity searching is introduced. Molecules are represented by atom environments, which are fed into an information-gain-based feature selection. A naïve Bayesian classifier is then employed for compound classification. The new method is tested by its ability to retrieve five sets of active molecules seeded in the MDL Drug Data Report (MDDR). In comparison experiments, the algorithm outperforms all current retrieval methods assessed here using two- and three-dimensional descriptors and offers insight into the significance of structural components for binding.
