Results 1 - 20 of 22
1.
Philos Trans A Math Phys Eng Sci ; 381(2251): 20220051, 2023 Jul 24.
Article in English | MEDLINE | ID: mdl-37271172
2.
Mol Inform ; 34(9): 615-625, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26583052

ABSTRACT

The use of virtual screening has become increasingly central to the drug development pipeline, with ligand-based virtual screening used to screen databases of compounds to predict their bioactivity against a target. These databases can only represent a small fraction of chemical space, and this paper describes a method of exploring synthetic space by applying virtual reactions to promising compounds within a database, and generating focussed libraries of predicted derivatives. A ligand-based virtual screening tool, Investigational Novel Drug Discovery by Example (INDDEx), is used as the basis for a system of virtual reactions. The use of virtual reactions is estimated to open up a space of 1.21×10^12 potential molecules. A de novo design algorithm known as Partial Logical-Rule Reactant Selection (PLoRRS) is introduced and incorporated into the INDDEx methodology. PLoRRS uses logical rules from the INDDEx model to select reactants for the de novo generation of potentially active products. The PLoRRS method is found to significantly increase the likelihood of retrieving molecules similar to known actives (p = 0.016). Case studies demonstrate that the virtual reactions produce molecules highly similar to known actives, including known blockbuster drugs.

3.
J Mol Biol ; 425(1): 186-97, 2013 Jan 09.
Article in English | MEDLINE | ID: mdl-23103756

ABSTRACT

Increasingly, experimental data on biological systems are obtained from several sources, and computational approaches are required to integrate this information and derive models for the function of the system. Here, we demonstrate the power of a logic-based machine learning approach to propose hypotheses for gene function by integrating information from two diverse experimental approaches. Specifically, we use inductive logic programming, which automatically proposes hypotheses explaining the empirical data with respect to logically encoded background knowledge. We study the capsular polysaccharide biosynthetic pathway of the major human gastrointestinal pathogen Campylobacter jejuni. We consider several key steps in the formation of capsular polysaccharide involving 15 genes, of which 8 have an assigned function, and we explore the extent to which functions can be hypothesised for the remaining 7. Two sources of experimental data provide the information for learning: the results of knockout experiments on the genes involved in capsule formation, and the absence/presence of capsule genes in a multitude of strains of different serotypes. The machine learning uses the pathway structure as background knowledge. We propose assignments of specific genes to five previously unassigned reaction steps. For four of these steps there was an unambiguous optimal assignment of gene to reaction, and for the fifth there were three candidate genes. Several of these assignments were consistent with additional experimental results. We therefore show that the logic-based methodology provides a robust strategy to integrate results from different experimental approaches and to propose hypotheses for the behaviour of a biological system.


Subjects
Artificial Intelligence , Campylobacter jejuni/metabolism , Logic , Models, Biological , Polysaccharides, Bacterial/genetics , Systems Biology/methods , Bacterial Capsules/genetics , Bacterial Capsules/metabolism , Biosynthetic Pathways/genetics , Campylobacter jejuni/genetics , Gene Knockout Techniques , Genes, Bacterial/genetics , Genes, Bacterial/physiology , Glycomics , Metabolomics , Molecular Sequence Annotation , Mutation , Oligonucleotide Array Sequence Analysis , Phenotype , Polysaccharides, Bacterial/metabolism
4.
BMC Bioinformatics ; 13: 162, 2012 Jul 11.
Article in English | MEDLINE | ID: mdl-22783946

ABSTRACT

BACKGROUND: There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to predictions. The development of ILP systems that can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. RESULTS: The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for the residues Cys and Leu. They also specify interactions involving aromatic and hydrogen-bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. CONCLUSIONS: In addition to confirming literature results, ProGolem's model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners.


Subjects
Artificial Intelligence , Hexoses/chemistry , Protein Binding , Hexoses/metabolism , Ligands , Proteins/chemistry , Proteins/metabolism
5.
J Phys Chem B ; 116(23): 6732-9, 2012 Jun 14.
Article in English | MEDLINE | ID: mdl-22380596

ABSTRACT

The Investigational Novel Drug Discovery by Example (INDDEx) package has been developed to find active compounds by linking activity to chemical substructure and to guide the process of further drug development. INDDEx is a machine-learning technique based on forming qualitative logical rules about substructural features of active molecules, weighting the rules to form a quantitative model, and then using the model to screen a molecular database. INDDEx is shown to be able to learn from multiple active compounds and to be useful for scaffold-hopping when performing virtual screening, giving high retrieval rates even when learning from a small number of compounds. Across the data sets tested, at 1% of the data, INDDEx was found to have average enrichment factors of 69.2, 82.7, and 90.4 when learning from 2, 4, and 8 active ligands, respectively. At 0.1% of the data, INDDEx had average enrichment factors of 492, 631, and 707 when learning from 2, 4, and 8 active ligands, respectively. Excluding all ligands with a Tanimoto Maximum Common Substructure similarity greater than 0.5, INDDEx had average enrichment factors at 1% of 52.3, 63.6, and 66.9 when learning from 2, 4, and 8 active ligands, respectively. The performance of INDDEx is compared with that of eHiTS LASSO, PharmaGist, and DOCK.
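The enrichment factors quoted above follow, in their common definition, the hit rate of actives among the top-ranked fraction of a screened database divided by the hit rate expected from random selection. The following is a minimal Python sketch of that definition, assuming that a higher score means "more likely active" and that ties are broken by input order; the function name and arguments are illustrative and this is not the INDDEx implementation.

```python
def enrichment_factor(scores, is_active, fraction):
    """Enrichment factor at a given fraction of a ranked database.

    scores:    model scores; higher = predicted more likely active (assumption).
    is_active: booleans, same length as scores.
    fraction:  fraction of the database inspected, e.g. 0.01 for 1%.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("no actives in the data set")
    # Number of top-ranked compounds to inspect (at least one).
    n_top = max(1, int(round(fraction * n_total)))
    # Stable sort: compounds with equal scores keep their input order.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, active in ranked[:n_top] if active)
    # Hit rate in the selected subset divided by the hit rate of random selection.
    return (hits / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    scores = [1000 - i for i in range(1000)]   # compound 0 scores highest
    actives = [True] * 10 + [False] * 990      # 10 actives, all ranked first
    print(enrichment_factor(scores, actives, 0.01))
```

With 1,000 compounds and 10 actives, ranking every active within the top 1% gives the maximum possible EF of 100 at that cut-off, while inspecting the whole database always gives an EF of 1.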


Subjects
Databases, Factual , Drug Discovery , High-Throughput Screening Assays , Ligands , Quantitative Structure-Activity Relationship
6.
J Integr Bioinform ; 8(2): 156, 2011 Jun 27.
Article in English | MEDLINE | ID: mdl-21705808

ABSTRACT

The construction of integrated datasets from potentially hundreds of sources with bespoke formats, and their subsequent visualization and analysis, is a recurring challenge in systems biology. We present WIBL, a visualization and model development environment initially geared towards logic-based modelling of biological systems using integrated datasets. WIBL combines data integration, visualisation and modelling in a single portal-based workbench providing a comprehensive solution for interdisciplinary systems biology projects.


Subjects
Software , Systems Biology/methods , Models, Biological
7.
PLoS One ; 6(12): e29028, 2011.
Article in English | MEDLINE | ID: mdl-22242111

ABSTRACT

Networks of trophic links (food webs) are used to describe and understand mechanistic routes for translocation of energy (biomass) between species. However, a relatively low proportion of ecosystems have been studied using food web approaches due to difficulties in making observations on large numbers of species. In this paper we demonstrate that Machine Learning of food webs, using a logic-based approach called A/ILP, can generate plausible and testable food webs from field sample data. Our example data come from a national-scale Vortis suction sampling of invertebrates from arable fields in Great Britain. We found that 45 invertebrate species or taxa, representing approximately 25% of the sample and about 74% of the invertebrate individuals included in the learning, were hypothesized to be linked. As might be expected, detritivore Collembola were consistently the most important prey. Generalist and omnivorous carabid beetles were hypothesized to be the dominant predators of the system. We were, however, surprised by the importance of carabid larvae suggested by the machine learning as predators of a wide variety of prey. High probability links were hypothesized for widespread, potentially destabilizing, intra-guild predation; predictions that could be experimentally tested. Many of the high probability links in the model have already been observed or suggested for this system, supporting our contention that A/ILP learning can produce plausible food webs from sample data, independent of our preconceptions about "who eats whom." Well-characterised links in the literature correspond with links ascribed with high probability through A/ILP. We believe that this very general Machine Learning approach has great power and could be used to extend and test our current theories of agricultural ecosystem dynamics and function. In particular, we believe it could be used to support the development of a wider theory of ecosystem responses to environmental change.


Subjects
Artificial Intelligence , Food Chain , Logic , Statistics as Topic , Animals , Automation , Models, Biological , Predatory Behavior , Species Specificity
8.
Health Promot J Austr ; 21(3): 189-95, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21118065

ABSTRACT

ISSUE ADDRESSED: enhancing opportunities for all older people to be physically and mentally active is an imperative in our ageing society. Lessons learned from the use of the Nintendo Wii within Queensland aged-care and disability services were assembled by eliciting staff perceptions regarding the usefulness of Wii technology within their centres. METHODS: telephone interviews were conducted with direct care staff in 53 centres that had been using the Wii technology for at least three months. Content analysis of interview data identified the major response patterns raised by staff. RESULTS: staff noted that Wii activities were easy to master for more able clients and that there was minimal risk to clients. Staff reported that these activities provided health-promoting physical benefits (mobility, range of motion, dexterity, coordination, distraction from pain) and psychosocial gains (social engagement, self-esteem, mastery, ability to pacify challenging behaviours) and were a useful adjunct to other care practices within these aged-care and disability services. CONCLUSIONS: staff believed that Wii activities provided purposeful and meaningful opportunities to promote wellbeing for aged and disabled clients within an aged-care and disability service. However, Wii activities were less successful with clients who had significant cognitive and/or physical disabilities.


Subjects
Disabled Persons/rehabilitation , Exercise/physiology , Health Promotion/methods , Health Services for the Aged/organization & administration , Video Games , Adult , Aged , Disabled Persons/psychology , Exercise/psychology , Female , Humans , Interpersonal Relations , Male , Middle Aged , Perception , Queensland , Self Concept
9.
Biochem Soc Trans ; 38(5): 1290-3, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20863301

ABSTRACT

Bacteria produce an array of glycan-based structures including capsules, lipo-oligosaccharide and glycosylated proteins, which are invariably cell-surface-located. For pathogenic bacteria, such structures are involved in diverse roles in the life cycle of the bacterium, including adhesion, colonization, avoidance of predation and interactions with the immune system. Compared with eukaryotes, bacteria produce huge combinatorial variations of glycan structures, which, coupled to the lack of genetic data, has previously hampered studies on bacterial glycans and their role in survival and pathogenesis. The advent of genomics in tandem with rapid technological improvements in MS analysis has opened a new era in bacterial glycomics. This has resulted in a rich source of novel glycan structures and new possibilities for glycoprospecting and glycoengineering. However, assigning genetic information in predicted glycan biosynthetic pathways to the overall structural information is complex. Bioinformatic analysis is required, linked to systematic mutagenesis and functional analysis of individual genes, often from diverse biosynthetic pathways. This must then be related back to structural analysis from MS or NMR spectroscopy. To aid in this process, systems level analysis of the multiple datasets can be used to make predictions of gene function that can then be confirmed experimentally. The present paper exemplifies these advances with reference to the major gastrointestinal pathogen Campylobacter jejuni.


Subjects
Bacteria/metabolism , Computational Biology , Glycomics , Bacterial Proteins/metabolism , Campylobacter jejuni/metabolism , Polysaccharides, Bacterial/metabolism
10.
Mol Inform ; 29(8-9): 655-64, 2010 Sep 17.
Article in English | MEDLINE | ID: mdl-27463459

ABSTRACT

Toxicity prediction is essential for drug design and the development of effective therapeutics. In this paper we present an in silico strategy for identifying the mode of action of toxic compounds, based on a novel logic-based kernel method. The technique uses support vector machines in conjunction with kernels constructed from first-order rules induced by an Inductive Logic Programming system. It constructs multi-class models using a divide-and-conquer reduction strategy that splits the classes into binary groups and solves each individual problem recursively, hence generating an underlying decision-list structure. To evaluate the effectiveness of the approach for chemoinformatics problems such as predictive toxicology, we apply it to toxicity classification in aquatic systems. The method is used to identify and classify 442 compounds with respect to their mode of action. The experimental results show that the technique successfully classifies toxic compounds and can be useful in assessing environmental risks. An experimental comparison of the proposed multi-class scheme with the standard multi-class Inductive Logic Programming algorithm and a multi-class Support Vector Machine yields statistically significant results and demonstrates the potential power and benefits of the approach in identifying compounds with various toxic mechanisms.

11.
Protein Eng Des Sel ; 22(9): 561-7, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19574295

ABSTRACT

Structural genomics initiatives are rapidly generating vast numbers of protein structures. Comparative modelling is also capable of producing accurate structural models for many protein sequences. However, for many of the known structures, functions are not yet determined, and in many modelling tasks, an accurate structural model does not necessarily tell us about function. Thus, there is a pressing need for high-throughput methods for determining function from structure. The spatial arrangement of key amino acids in a folded protein, on the surface or buried in clefts, often determines its biological function. A central aim of molecular biology is to understand the relationship between such substructures or surfaces and biological function, leading both to function prediction and to function design. We present a new general method for discovering the features of binding pockets that confer specificity for particular ligands. Using a recently developed machine-learning technique that couples the rule-discovery approach of inductive logic programming with the statistical learning power of support vector machines, we are able to discriminate, with high precision (90%) and recall (86%), between pockets that bind FAD and those that bind NAD on a large benchmark set, given only the geometry and composition of the backbone of the binding pocket and without the use of docking. In addition, we learn rules governing this specificity that can feed into protein functional design protocols. An analysis of the rules found suggests that key features of the binding pocket may be tied to conformational freedom in the ligand. The representation is sufficiently general to be applicable to any discriminatory binding problem. All programs and data sets are freely available to non-commercial users at http://www.sbg.bio.ic.ac.uk/svilp_ligand/.


Subjects
Artificial Intelligence , Protein Engineering/methods , Proteins/chemistry , Amino Acid Motifs , Databases, Protein , Flavin-Adenine Dinucleotide/chemistry , Flavin-Adenine Dinucleotide/metabolism , Ligands , Models, Molecular , NAD/chemistry , NAD/metabolism , Protein Binding , Protein Conformation , Proteins/metabolism , Reproducibility of Results , Structure-Activity Relationship , Substrate Specificity
12.
J Chem Inf Model ; 48(5): 949-57, 2008 May.
Article in English | MEDLINE | ID: mdl-18457387

ABSTRACT

In chemoinformatics, searching for compounds which are structurally diverse and share a biological activity is called scaffold hopping. Scaffold hopping is important since it can be used to obtain alternative structures when the compound under development has unexpected side-effects. Pharmaceutical companies use scaffold hopping when they wish to circumvent prior patents for targets of interest. We propose a new method for scaffold hopping using inductive logic programming (ILP). ILP uses the observed spatial relationships between pharmacophore types in pretested active and inactive compounds and learns human-readable rules describing the diverse structures of active compounds. The ILP-based scaffold hopping method is compared to two previous algorithms (chemically advanced template search, CATS, and CATS3D) on 10 data sets with diverse scaffolds. The comparison shows that the ILP-based method is significantly better than random selection while the other two algorithms are not. In addition, the ILP-based method retrieves new active scaffolds which were not found by CATS and CATS3D. The results show that the ILP-based method is at least as good as the other methods in this study. ILP produces human-readable rules, which makes it possible to identify the three-dimensional features that lead to scaffold hopping. A minor variant of a rule learnt by ILP for scaffold hopping was subsequently found to cover an inhibitor identified by an independent study. This provides a successful result in a blind trial of the effectiveness of ILP to generate rules for scaffold hopping. We conclude that ILP provides a valuable new approach for scaffold hopping.


Subjects
Artificial Intelligence , Computational Biology/methods , Drug Design
13.
J Proteome Res ; 7(2): 497-503, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18179164

ABSTRACT

Multivariate metabolic profiles from biofluids such as urine and plasma are highly indicative of the biological fitness of complex organisms and can be captured analytically in order to derive top-down systems biology models. The application of currently available modeling approaches to human and animal metabolic pathway modeling is problematic because of multicompartmental cellular and tissue exchange of metabolites operating on many time scales. Hence, novel approaches are needed to analyze metabolic data obtained using minimally invasive sampling methods in order to reconstruct the patho-physiological modulations of metabolic interactions that are representative of whole-system dynamics. Here, we show that spectroscopically derived metabolic data in experimental liver injury studies (induced by hydrazine and alpha-naphthylisothiocyanate treatment) can be used to derive insightful probabilistic graphical models of metabolite dependencies, which we refer to as metabolic interactome maps. Using these, system-level mechanistic information on homeostasis can be inferred, and the degree of reversibility of induced lesions can be related to variations in the metabolic network patterns. This approach has wider application in the assessment of system-level dysfunction in animal or human studies from noninvasive measurements.


Subjects
Bayes Theorem , Disease Models, Animal , Liver Diseases/metabolism , Systems Biology , Animals , Computational Biology , Liver Diseases/blood , Liver Diseases/urine , Male , Rats , Rats, Sprague-Dawley
14.
Mach Learn ; 73(1): 55-85, 2008 Oct.
Article in English | MEDLINE | ID: mdl-19888348

ABSTRACT

We revisit an application developed originally using abductive Inductive Logic Programming (ILP) for modeling inhibition in metabolic networks. The example data was derived from studies of the effects of toxins on rats using Nuclear Magnetic Resonance (NMR) time-trace analysis of their biofluids together with background knowledge representing a subset of the Kyoto Encyclopedia of Genes and Genomes (KEGG). We now apply two Probabilistic ILP (PILP) approaches - abductive Stochastic Logic Programs (SLPs) and PRogramming In Statistical modeling (PRISM) to the application. Both approaches support abductive learning and probability predictions. Abductive SLPs are a PILP framework that provides possible worlds semantics to SLPs through abduction. Instead of learning logic models from non-probabilistic examples as done in ILP, the PILP approach applied in this paper is based on a general technique for introducing probability labels within a standard scientific experimental setting involving control and treated data. Our results demonstrate that the PILP approach provides a way of learning probabilistic logic models from probabilistic examples, and the PILP models learned from probabilistic examples lead to a significant decrease in error accompanied by improved insight from the learned results compared with the PILP models learned from non-probabilistic examples.

15.
Proteins ; 69(4): 823-31, 2007 Dec 01.
Article in English | MEDLINE | ID: mdl-17910057

ABSTRACT

Despite the recent increase in the use of protein-ligand and protein-protein docking in the drug discovery process, driven by increases in computational power, the difficulty of accurately ranking the binding affinities of a series of ligands or a series of proteins docked to a protein receptor remains largely unsolved. This problem is of major concern in lead optimization procedures and has led to the development of scoring functions tailored to rank the binding affinities of a series of ligands to a specific system. However, such methods can take a long time to develop, and their transferability to other systems remains open to question. Here we demonstrate that, given a suitable amount of background information, a new approach using support vector inductive logic programming (SVILP) can be used to produce system-specific scoring functions. Inductive logic programming (ILP) learns logic-based rules for a given dataset that can be used to describe properties of each member of the set in a qualitative manner. By combining ILP with support vector machine regression, a quantitative set of rules can be obtained. SVILP has previously been used in a biological context to examine datasets containing a series of singular molecular structures and properties. Here we describe the use of SVILP to produce binding affinity predictions of a series of ligands to a particular protein. We also, for the first time, examine the applicability of SVILP techniques to datasets consisting of protein-ligand complexes. Our results show that SVILP performs comparably with other state-of-the-art methods on five protein-ligand systems, as judged by similar cross-validated squares of their correlation coefficients. A McNemar test comparing SVILP to CoMFA and CoMSIA across the five systems indicates our method to be significantly better on one occasion. The ability to graphically display and understand the SVILP-produced rules is demonstrated, and this feature of ILP can be used to derive hypotheses for future ligand design in lead optimization procedures. The approach can readily be extended to evaluate the binding affinities of a series of protein-protein complexes.


Subjects
Computational Biology/methods , Computer Simulation , Protein Interaction Mapping , Proteins/chemistry , Proteomics/methods , Algorithms , Crystallography, X-Ray/methods , Databases, Protein , Genomics , Ligands , Molecular Conformation , Protein Binding , Protein Conformation , Software
16.
J Mol Biol ; 369(4): 1126-39, 2007 Jun 15.
Article in English | MEDLINE | ID: mdl-17466331

ABSTRACT

The increasing interest in systems biology has resulted in extensive experimental data describing networks of interactions (or associations) between molecules in metabolism, protein-protein interactions and gene regulation. Comparative analysis of these networks is central to understanding biological systems. We report a novel method (PHUNKEE: Pairing subgrapHs Using NetworK Environment Equivalence) by which similar subgraphs in a pair of networks can be identified. Like other methods, PHUNKEE explicitly considers the graphical form of the data and allows for gaps. However, it is novel in that it includes information about the context of the subgraph within the adjacent network. We also explore a new approach to quantifying the statistical significance of matching subgraphs. We report similar subgraphs in metabolic pathways and in protein-protein interaction networks. The most similar metabolic subgraphs were generally found to occur in processes central to all life, such as purine, pyrimidine and amino acid metabolism. The most similar pairs of subgraphs found in the protein-protein interaction networks of Drosophila melanogaster and Saccharomyces cerevisiae also include central processes such as cell division but, interestingly, also include protein sub-networks involved in pre-mRNA processing. The inclusion of network context information in the comparison of protein interaction networks increased the number of similar subgraphs found consisting of proteins involved in the same functional process. This could have implications for the prediction of protein function.


Subjects
Metabolic Networks and Pathways , Models, Biological , Protein Interaction Mapping , Software , Algorithms , Animals , Computer Simulation , Databases, Protein , Drosophila melanogaster/metabolism , Saccharomyces cerevisiae/metabolism
18.
J Chem Inf Model ; 47(3): 998-1006, 2007.
Article in English | MEDLINE | ID: mdl-17451225

ABSTRACT

There is a pressing need for accurate in silico methods to predict the toxicity of molecules that are being introduced into the environment or are being developed into new pharmaceuticals. Predictive toxicology is in the realm of structure-activity relationships (SAR), and many approaches have been used to derive such SAR. Previous work has shown that inductive logic programming (ILP) is a powerful approach that circumvents several major difficulties, such as molecular superposition, faced by some other SAR methods. The ILP approach reasons with chemical substructures within a relational framework and yields chemically understandable rules. Here, we report a general new approach, support vector inductive logic programming (SVILP), which extends the essentially qualitative ILP-based SAR to quantitative modeling. First, ILP is used to learn rules, the predictions of which are then used within a novel kernel to derive a support-vector generalization model. For a highly heterogeneous dataset of 576 molecules with known fathead minnow fish toxicity, the cross-validated squared correlation coefficients (R²CV) from a chemical descriptor method (CHEM) and SVILP are 0.52 and 0.66, respectively. The ILP, CHEM, and SVILP approaches correctly predict 55, 58, and 73%, respectively, of toxic molecules. In a set of 165 unseen molecules, the R² values from the commercial software TOPKAT and SVILP are 0.26 and 0.57, respectively. In all calculations, SVILP showed significant improvements in comparison with the other methods. The SVILP approach has a major advantage in that it uses ILP automatically and consistently to derive rules, mostly novel, describing fragments that are toxicity alerts. SVILP is a general machine-learning approach and has the potential to tackle many problems relevant to chemoinformatics, including in silico drug design.


Subjects
Drug-Related Side Effects and Adverse Reactions , Logic , Toxicology/methods , Algorithms , Artificial Intelligence , Combinatorial Chemistry Techniques , Drug Design , Molecular Structure , Software , Structure-Activity Relationship
19.
J Comput Aided Mol Des ; 21(5): 269-80, 2007 May.
Article in English | MEDLINE | ID: mdl-17387437

ABSTRACT

We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently while ILP combines structural fragments, and then creates new features with higher predictive power. SVILP is a very recently presented method which adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is for seven out of the 11 classes the superior method. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model on the other hand has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test which shows that SVILP performs significantly (p < 5%) better than both other methods for six out of 11 activity classes, while being superior with less significance for three of the remaining classes. While previously the Bayes Classifier was shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.
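The evaluation measures listed in this abstract have standard textbook definitions in terms of confusion-matrix counts, and McNemar's test compares two classifiers using only the items on which they disagree. The sketch below computes these in Python under stated assumptions: the function names are hypothetical, zero denominators return 0.0, and the continuity-corrected chi-square form of McNemar's test is used because the abstract does not say which variant was applied. It is not the authors' evaluation code.

```python
import math


def _safe_div(num, den):
    # Return 0.0 instead of raising when a denominator is zero (assumption).
    return num / den if den else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Common binary-classification measures from confusion-matrix counts."""
    recall = _safe_div(tp, tp + fn)          # true positive rate
    specificity = _safe_div(tn, tn + fp)     # true negative rate
    precision = _safe_div(tp, tp + fp)
    # F-measure (F1): harmonic mean of precision and recall.
    f_measure = _safe_div(2 * precision * recall, precision + recall)
    # Matthews Correlation Coefficient.
    mcc = _safe_div(tp * tn - fp * fn,
                    math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


def mcnemar_test(b, c):
    """Continuity-corrected McNemar chi-square test (1 degree of freedom).

    b: items classified correctly by method A but not by method B.
    c: items classified correctly by method B but not by method A.
    Returns (chi-square statistic, p-value).
    """
    if b + c == 0:
        return 0.0, 1.0
    chi2 = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
    print(mcnemar_test(b=15, c=5))
```

Because McNemar's test ignores the items both methods classify identically, a method can score higher on every summary measure yet still fail to differ significantly when the number of disagreements is small.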


Subjects
Bayes Theorem , Computational Biology , Drug Design , Pharmaceutical Preparations/classification , Pharmaceutical Preparations/metabolism , Software , Confidence Intervals , Pharmaceutical Preparations/chemical synthesis