Results 1 - 19 of 19
1.
J Chem Inf Model ; 55(9): 1781-803, 2015 Sep 28.
Article in English | MEDLINE | ID: mdl-26237649

ABSTRACT

Knowledge Discovery in Databases (KDD) refers to the use of methodologies from machine learning, pattern recognition, statistics, and other fields to extract knowledge from large collections of data, where the knowledge is not explicitly available as part of the database structure. In this paper, we describe four modern data mining techniques, Rough Set Theory (RST), Association Rule Mining (ARM), Emerging Pattern Mining (EP), and Formal Concept Analysis (FCA), and we have attempted to give an exhaustive list of their chemoinformatics applications. One of the main strengths of these methods is their descriptive ability. When used to derive rules, for example, in structure-activity relationships, the rules have clear physical meaning. This review has shown that there are close relationships between the methods. Often, apparent differences lie in the way in which the problem under investigation has been formulated, which can lead to the natural adoption of one method or the other. For example, the idea of a structural alert, as a structure which is present in toxic and absent in nontoxic compounds, leads to the natural formulation of an Emerging Pattern search. Despite the similarities between the methods, each has its strengths. RST is useful for dealing with uncertain and noisy data. Its main chemoinformatics applications so far have been in feature extraction and feature reduction, the latter often as input to another data mining method, such as a Support Vector Machine (SVM). ARM has mostly been used for frequent subgraph mining. EP and FCA have been used to mine both structural and nonstructural patterns for classification of both active and inactive molecules. Since their introduction in the 1980s and 1990s, RST, ARM, EP, and FCA have found wide-ranging applications, with many thousands of citations in Web of Science, but their adoption by the chemoinformatics community has been relatively slow.
Advances, both in computer power and in algorithm development, mean that there is the potential to apply these techniques to larger data sets and thus to different problems in the future.
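
The structural-alert example above can be made concrete with the usual growth-rate definition of an emerging pattern. The sketch below is a minimal illustration, assuming each compound is reduced to a set of named fragment features (the fragment names are hypothetical) and that a pattern is emerging when its support in the toxic class divided by its support in the nontoxic class reaches a chosen threshold; it is not the implementation used in any of the reviewed studies.

```python
def support(pattern, dataset):
    """Fraction of compounds (feature sets) that contain every feature in pattern."""
    if not dataset:
        return 0.0
    return sum(1 for features in dataset if pattern <= features) / len(dataset)


def growth_rate(pattern, positive, negative):
    """Support in the positive class divided by support in the negative class."""
    s_pos = support(pattern, positive)
    s_neg = support(pattern, negative)
    if s_neg == 0:
        # Present only in the positive class: a "jumping" emerging pattern.
        return float("inf") if s_pos > 0 else 0.0
    return s_pos / s_neg


def is_emerging(pattern, positive, negative, min_growth=2.0):
    """min_growth is an assumed threshold, not a value from the review."""
    return growth_rate(pattern, positive, negative) >= min_growth


# Hypothetical fragment sets, for illustration only.
toxic = [{"nitro", "aromatic"}, {"nitro", "amine"}, {"nitro", "aromatic", "halogen"}]
nontoxic = [{"aromatic"}, {"amine", "aromatic"}, {"halogen"}]

print(growth_rate({"nitro"}, toxic, nontoxic))     # inf: an ideal structural alert
print(growth_rate({"aromatic"}, toxic, nontoxic))  # 1.0: equally common in both classes
```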


Subjects
Algorithms , Informatics , Molecular Structure , Pharmacological Phenomena , Structure-Activity Relationship
2.
J Chem Inf Model ; 54(12): 3302-19, 2014 Dec 22.
Article in English | MEDLINE | ID: mdl-25379955

ABSTRACT

Spectral clustering involves placing objects into clusters based on the eigenvectors and eigenvalues of an associated matrix. The technique was first applied to molecular data by Brewer [J. Chem. Inf. Model. 2007, 47, 1727-1733] who demonstrated its use on a very small dataset of 125 COX-2 inhibitors. We have determined suitable parameters for spectral clustering using a wide variety of molecular descriptors and several datasets of a few thousand compounds and compared the results of clustering using a nonoverlapping version of Brewer's use of Sarker and Boyer's algorithm with that of Ward's and k-means clustering. We then replaced the exact eigendecomposition method with two different approximate methods and concluded that Singular Value Decomposition is the most appropriate method for clustering larger compound collections of up to 100,000 compounds. We have also used spectral clustering with the Tversky coefficient to generate two sets of clusters linked by a common set of eigenvalues and have used this novel approach to cluster sets of fragments such as those used in fragment-based drug design.
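
The Tversky coefficient mentioned above weights the features unique to each of the two compared molecules separately, so it is asymmetric unless the two weights are equal. A minimal sketch on fingerprint bit sets follows; the weights alpha and beta are free parameters here, and the values used are illustrative, not those chosen in the study.

```python
def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky similarity of bit set a to bit set b.

    alpha weights bits only in a, beta weights bits only in b.
    alpha = beta = 1 gives the Tanimoto coefficient; alpha = beta = 0.5 gives Dice.
    """
    common = len(a & b)
    denominator = alpha * len(a - b) + beta * len(b - a) + common
    return common / denominator if denominator else 0.0


query = {1, 2, 3, 4}   # bits set in a hypothetical molecule fingerprint
fragment = {3, 4, 5}   # bits set in a hypothetical fragment fingerprint

print(tversky(query, fragment, 1.0, 1.0))  # 0.4 (Tanimoto)
print(tversky(query, fragment, 0.9, 0.1))  # penalises bits unique to the query
print(tversky(fragment, query, 0.9, 0.1))  # swapping the arguments changes the value
```

With a small weight on the second argument's unique bits, a fragment compared against a larger molecule that contains most of its bits scores relatively highly, which is one reason asymmetric coefficients can be attractive for fragment sets.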


Subjects
Algorithms , Statistics as Topic/methods , Cluster Analysis , Cyclooxygenase 2 Inhibitors/pharmacology , Drug Discovery
3.
Phys Chem Chem Phys ; 15(41): 18262-73, 2013 Nov 07.
Article in English | MEDLINE | ID: mdl-24064723

ABSTRACT

A liquid is composed of an ensemble of molecules that populate a large number of different states, so calculation of the solvation energy of a molecule in solution requires a method for summing the interactions with the environment over all of these states. The surface site interaction model for the properties of liquids at equilibrium (SSIMPLE) simplifies the surface of a molecule to a discrete number of specific interaction sites (SSIPs). The thermodynamic properties of these interaction sites can be characterised experimentally, for example, through measurement of association constants for the formation of simple complexes that feature a single H-bonding interaction. Correlation of experimentally determined solution phase H-bond parameters with gas phase ab initio calculations of maxima and minima on molecular electrostatic potential surfaces (MEPS) provides a method for converting gas phase calculations on isolated molecules to parameters that can be used to estimate solution phase interaction free energies. This approach has been generalised using a footprinting technique that converts an MEPS into a discrete set of SSIPs (each described by a polar interaction parameter, εᵢ). These SSIPs represent the molecular recognition properties of the entire surface of the molecule. For example, water is described by four SSIPs, two H-bond donor sites and two H-bond acceptor sites. A liquid mixture is described as an ensemble of SSIPs that represent the components of the mixture at appropriate concentrations. Individual SSIPs are assumed to be independent, so speciation of SSIP contacts can be calculated based on properties of the individual SSIP interactions, which are given by the sum of a polar (εᵢεⱼ) and a non-polar (E_vdW) interaction term.
Results are presented for calculating the free energies of transfer of a range of organic molecules from the pure liquid into water, from the pure liquid into n-hexadecane, from n-hexadecane into water, from n-octanol into water, and for the transfer of water from pure water into a range of organic liquids. Agreement with experiment is within a root mean square difference of 1.6-3.9 kJ mol⁻¹, which suggests that the SSIMPLE approach is a promising method for estimation of solvation energies in more complex systems.


Subjects
Models, Molecular , Solvents/chemistry , Alkanes/chemistry , Hydrogen Bonding , Static Electricity , Thermodynamics , Water/chemistry
4.
Nucleic Acids Res ; 40(Web Server issue): W380-6, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22573174

ABSTRACT

Similarities in the 3D patterns of amino acid side chains can provide insights into their function despite the absence of any detectable sequence or fold similarities. Search for protein sites (SPRITE) and amino acid pattern search for substructures and motifs (ASSAM) are graph-theoretical programs that can search for 3D amino acid side-chain matches in protein structures by representing the amino acid side chains as pseudo-atoms. The geometric relationship of the pseudo-atoms to each other as a pattern can be represented as a labeled graph where the pseudo-atoms are the graph's nodes while the edges are the inter-pseudo-atomic distances. Both programs require the input file to be in the PDB format. The objective of using SPRITE is to identify matches of side chains in a query structure to patterns with characterized function. In contrast, a 3D pattern of interest can be searched for existing occurrences in available PDB structures using ASSAM. Both programs are freely accessible without any login requirement. SPRITE is available at http://mfrlab.org/grafss/sprite/ while ASSAM can be accessed at http://mfrlab.org/grafss/assam/.
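
The labeled-graph representation described above, with pseudo-atoms as nodes and inter-pseudo-atomic distances as edges, can be illustrated with a brute-force backtracking search. This is a minimal sketch, not the algorithm used by SPRITE or ASSAM: each pseudo-atom is assumed to be a (residue label, coordinate) pair, a match requires identical labels, and every pairwise distance must agree within a tolerance `tol` (in Å, an assumed parameter). The coordinates are a toy example.

```python
import math
from typing import List, Tuple

PseudoAtom = Tuple[str, Tuple[float, float, float]]


def find_matches(pattern: List[PseudoAtom], target: List[PseudoAtom], tol: float = 1.0):
    """Return tuples of target indices, one per pattern pseudo-atom, that match the pattern."""
    matches = []

    def consistent(p_idx, t_idx, assignment):
        # Every already-matched pair must have a similar inter-pseudo-atomic distance.
        for p_prev, t_prev in enumerate(assignment):
            d_pattern = math.dist(pattern[p_idx][1], pattern[p_prev][1])
            d_target = math.dist(target[t_idx][1], target[t_prev][1])
            if abs(d_pattern - d_target) > tol:
                return False
        return True

    def extend(assignment):
        p_idx = len(assignment)
        if p_idx == len(pattern):
            matches.append(tuple(assignment))
            return
        for t_idx, (label, _) in enumerate(target):
            if (label == pattern[p_idx][0] and t_idx not in assignment
                    and consistent(p_idx, t_idx, assignment)):
                assignment.append(t_idx)
                extend(assignment)
                assignment.pop()

    extend([])
    return matches


# Toy three-residue pattern and a target containing a translated copy of it.
pattern = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.0, 0.0, 0.0)), ("ASP", (3.0, 4.0, 0.0))]
target = [("GLY", (10.0, 10.0, 10.0)), ("SER", (1.0, 1.0, 1.0)),
          ("HIS", (4.0, 1.0, 1.0)), ("ASP", (4.0, 5.0, 1.0)), ("HIS", (1.0, 1.0, 9.0))]
print(find_matches(pattern, target))  # [(1, 2, 3)]
```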


Subjects
Amino Acid Motifs , Software , Amino Acids/chemistry , Archaeal Proteins/chemistry , Bacterial Proteins/chemistry , Databases, Protein , Internet , Models, Molecular , Porins/chemistry , Protein Conformation
5.
J Comput Aided Mol Des ; 26(4): 451-72, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22538643

ABSTRACT

A program for overlaying multiple flexible molecules has been developed. Candidate overlays are generated by a novel fingerprint algorithm, scored on three objective functions (union volume, hydrogen-bond match, and hydrophobic match), and ranked by constrained Pareto ranking. A diverse subset of the best ranked solutions is chosen using an overlay-dissimilarity metric. If necessary, the solutions can be optimised. A multi-objective genetic algorithm can be used to find additional overlays with a given mapping of chemical features but different ligand conformations. The fingerprint algorithm may also be used to produce constrained overlays, in which user-specified chemical groups are forced to be superimposed. The program has been tested on several sets of ligands, for each of which the true overlay is known from protein-ligand crystal structures. Both objective and subjective success criteria indicate that good results are obtained on the majority of these sets.
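
Pareto ranking, as used above to order candidate overlays on several objectives at once, can be sketched with non-dominated sorting: rank 1 holds overlays that no other overlay beats on every objective, rank 2 those dominated only by rank-1 overlays, and so on. This minimal sketch assumes each objective has already been oriented so that larger is better (for example, union volume negated, on the assumption that a smaller union volume indicates a tighter overlay) and omits the constraint handling of the program described; the scores are hypothetical.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_ranks(scores):
    """Assign rank 1 to the non-dominated front, rank 2 to the next front, and so on.

    Plain (unconstrained) non-dominated sorting; larger scores are assumed better.
    """
    remaining = set(range(len(scores)))
    ranks = [0] * len(scores)
    front = 1
    while remaining:
        current = {i for i in remaining
                   if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)}
        for i in current:
            ranks[i] = front
        remaining -= current
        front += 1
    return ranks


# (negated union volume, hydrogen-bond match, hydrophobic match) for four hypothetical overlays
scores = [(-100, 3, 5), (-120, 4, 5), (-130, 2, 4), (-100, 3, 6)]
print(pareto_ranks(scores))  # [2, 1, 3, 1]
```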


Subjects
Algorithms , Molecular Structure
6.
J Chem Inf Model ; 52(3): 757-69, 2012 Mar 26.
Article in English | MEDLINE | ID: mdl-22324299

ABSTRACT

Molecular interaction fields provide a useful description of ligand binding propensity and have found widespread use in computer-aided drug design, for example, to characterize protein binding sites and in small-molecule applications, such as three-dimensional quantitative structure-activity relationships, physicochemical property prediction, and virtual screening. However, the grids on which the field data are stored are typically very large, consisting of thousands of data points, which makes them cumbersome to store and manipulate. The wavelet transform is a commonly used data compression technique, for example, in signal processing and image compression. Here we use the wavelet transform to encode molecular interaction fields as wavelet thumbnails, which represent the original grid data in significantly reduced volumes. We describe a method for aligning wavelet thumbnails based on extracting extrema from the thumbnails and subsequently use them for virtual screening. We demonstrate that wavelet thumbnails provide an effective method of capturing the three-dimensional information encoded in a molecular interaction field.
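
The idea of a wavelet thumbnail can be illustrated with the simplest wavelet, the Haar wavelet, whose approximation coefficients at each level are, up to a constant scale factor, averages over 2×2×2 blocks of grid points. The sketch below assumes a grid with even dimensions stored as nested lists; the abstract does not name the wavelet family, so the Haar choice is an assumption, and the extrema-based alignment step is not shown.

```python
def haar_approximation_3d(grid):
    """One level of unnormalised Haar approximation (assumed wavelet): 2x2x2 block means."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    if nx % 2 or ny % 2 or nz % 2:
        raise ValueError("every grid dimension must be even")
    coarse = []
    for i in range(0, nx, 2):
        plane = []
        for j in range(0, ny, 2):
            row = []
            for k in range(0, nz, 2):
                block = [grid[i + di][j + dj][k + dk]
                         for di in (0, 1) for dj in (0, 1) for dk in (0, 1)]
                row.append(sum(block) / 8.0)
            plane.append(row)
        coarse.append(plane)
    return coarse


def thumbnail(grid, levels):
    """Apply the approximation step repeatedly; each level shrinks the grid 8-fold."""
    for _ in range(levels):
        grid = haar_approximation_3d(grid)
    return grid


# Hypothetical 8x8x8 field whose value depends only on the x index.
field = [[[float(x) for _ in range(8)] for _ in range(8)] for x in range(8)]
small = thumbnail(field, 2)
print(len(small), len(small[0]), len(small[0][0]))  # 2 2 2
print(small[0][0][0], small[1][1][1])               # 1.5 5.5
```

Two levels reduce 512 grid points to 8 here, which gives a feel for the kind of volume reduction the abstract describes.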


Subjects
Drug Evaluation, Preclinical/methods , Models, Molecular , Area Under Curve , Ligands , Molecular Conformation , Proteins/metabolism , User-Computer Interface
7.
Future Med Chem ; 3(4): 405-14, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21452977

ABSTRACT

BACKGROUND: It has been suggested that similarity searching using 2D fingerprints may not be suitable for scaffold hopping. METHODS: This article reports a detailed evaluation of the effectiveness of six common types of 2D fingerprints when they are used for scaffold-hopping similarity searches of the Molecular Design Limited Drug Data Report database, World of Molecular Bioactivity database and Maximum Unbiased Validation database. RESULTS: The results demonstrate that 2D fingerprints can be used for scaffold hopping, with novel scaffolds being identified in nearly every search that was carried out. The degree of enrichment depends on the structural diversity of the actives that are being sought, with the greatest enrichments often being obtained using the extended connectivity fingerprint encoding a circular substructure of diameter four bonds (ECFP4) fingerprint. CONCLUSION: 2D fingerprints provide a simple and computationally efficient way of identifying novel chemotypes in lead-discovery programs.
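
A scaffold-hopping evaluation of the kind described above ranks a database by fingerprint similarity to a reference active and then measures how many actives appear near the top. The sketch below assumes fingerprints are already available as sets of on-bit indices (in practice they would come from a cheminformatics toolkit, for example as ECFP4 bits), uses the Tanimoto coefficient, and reports an enrichment factor for the top fraction of the ranked list; the data and the choice of metric are illustrative, not taken from the study.

```python
def tanimoto(a: set, b: set) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_database(query: set, database: dict) -> list:
    """Database ids sorted by decreasing Tanimoto similarity to the query fingerprint."""
    return sorted(database, key=lambda mol_id: tanimoto(query, database[mol_id]), reverse=True)


def enrichment_factor(ranked_ids: list, actives: set, fraction: float) -> float:
    """Active rate in the top fraction of the list divided by the active rate overall."""
    n_top = max(1, round(len(ranked_ids) * fraction))
    hits = sum(1 for mol_id in ranked_ids[:n_top] if mol_id in actives)
    return (hits / n_top) / (len(actives) / len(ranked_ids))


# Toy fingerprints as sets of on-bit indices (stand-ins for, e.g., ECFP4 bits).
query = {1, 2, 3, 4}
database = {"a": {1, 2, 3, 4}, "b": {1, 2, 9}, "c": {7, 8, 9}, "d": {1, 2, 3, 5}}
ranked = rank_database(query, database)
print(ranked)                                      # ['a', 'd', 'b', 'c']
print(enrichment_factor(ranked, {"a", "d"}, 0.5))  # 2.0
```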


Subjects
Artificial Intelligence , Drug Design , Databases, Factual , Pharmaceutical Preparations/chemistry , Quantitative Structure-Activity Relationship
8.
J Mol Biol ; 396(2): 264-79, 2010 Feb 19.
Article in English | MEDLINE | ID: mdl-19932116

ABSTRACT

Experimental X-ray crystal structures and a database of calculated structural parameters of DNA octamers were used in combination to analyse the mechanics of DNA bending in the nucleosome core complex. The 1kx5 X-ray crystal structure of the nucleosome core complex was used to determine the relationship between local structure at the base-step level and the global superhelical conformation observed for nucleosome-bound DNA. The superhelix is characterised by a large curvature (597 degrees) in one plane and very little curvature (10 degrees) in the orthogonal plane. Analysis of the curvature at the level of 10-step segments shows that there is a uniform curvature of 30 degrees per helical turn throughout most of the structure but that there are two sharper kinks of 50 degrees at +/-2 helical turns from the central dyad base pair. The curvature is due almost entirely to the base-step parameter roll. There are large periodic variations in roll, which are in phase with the helical twist and account for 500 degrees of the total curvature. Although variations in the other base-step parameters perturb the local path of the DNA, they make minimal contributions to the total curvature. This implies that DNA bending in the nucleosome is achieved using the roll-slide-twist degree of freedom previously identified as the major degree of freedom in naked DNA oligomers. The energetics of bending into a nucleosome-bound conformation were therefore analysed using a database of structural parameters that we have previously developed for naked DNA oligomers. The minimum energy roll, the roll flexibility force constant and the maximum and minimum accessible roll values were obtained for each base step in the relevant octanucleotide context to account for the effects of conformational coupling that vary with sequence context. 
The distribution of base-step roll values and corresponding strain energy required to bend DNA into the nucleosome-bound conformation defined by the 1kx5 structure were obtained by applying a constant bending moment. When a single bending moment was applied to the entire sequence, the local details of the calculated structure did not match the experiment. However, when local 10-step bending moments were applied separately, the calculated structure showed excellent agreement with experiment. This implies that the protein applies variable bending forces along the DNA to maintain the superhelical path required for nucleosome wrapping. In particular, the 50-degree kinks are constraints imposed by the protein rather than a feature of the 1kx5 DNA sequence. The kinks coincide with a relatively flexible region of the sequence, and this is probably a prerequisite for high-affinity nucleosome binding, but the bending strain energy is significantly higher at these points than for the rest of the sequence. In the most rigid regions of the sequence, a higher strain energy is also required to achieve the standard 30-degree curvature per helical turn. We conclude that matching of the DNA sequence to the local roll periodicity required to achieve bending, together with the increased flexibility required at the kinks, determines the sequence selectivity of DNA wrapping in the nucleosome.
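
The bending calculation described above can be pictured with a per-step harmonic model. A minimal sketch, assuming each base step has a harmonic roll energy E = ½k(ρ − ρ₀)² about its minimum-energy roll ρ₀ and a hard accessible range [ρ_min, ρ_max]: under an applied bending moment M, the energy minus the work term Mρ is minimised at ρ = ρ₀ + M/k, clipped to the accessible range. The functional form and all parameter values are assumptions for illustration; the study took its parameters from its octamer database.

```python
from dataclasses import dataclass


@dataclass
class StepParameters:
    roll0: float     # minimum-energy roll (degrees); hypothetical value
    k: float         # roll force constant (energy per degree squared); hypothetical units
    roll_min: float  # smallest accessible roll (degrees)
    roll_max: float  # largest accessible roll (degrees)


def bend_step(step: StepParameters, moment: float):
    """Roll adopted under a constant bending moment, and its strain energy (harmonic model assumed)."""
    roll = step.roll0 + moment / step.k                  # unconstrained harmonic optimum
    roll = min(max(roll, step.roll_min), step.roll_max)  # stay within the accessible range
    strain = 0.5 * step.k * (roll - step.roll0) ** 2
    return roll, strain


stiff = StepParameters(roll0=0.0, k=1.0, roll_min=-10.0, roll_max=10.0)
flexible = StepParameters(roll0=2.0, k=0.25, roll_min=-5.0, roll_max=20.0)
print(bend_step(stiff, 1.0))      # (1.0, 0.5)
print(bend_step(flexible, 1.0))   # (6.0, 2.0)
print(bend_step(flexible, 10.0))  # (20.0, 40.5): clipped at roll_max
```

Under the same moment the softer step rolls further; applying separate moments to different 10-step segments, as the abstract describes, would correspond to calling this with a segment-specific moment.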


Subjects
DNA/chemistry , DNA/metabolism , Nucleic Acid Conformation , Nucleosomes/metabolism , Biomechanical Phenomena , Chromatin Assembly and Disassembly/physiology , Crystallography, X-Ray , Models, Biological , Models, Molecular , Models, Theoretical , Molecular Dynamics Simulation , Nucleosomes/chemistry , Protein Structure, Quaternary
9.
J Chem Inf Model ; 49(12): 2761-73, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19908873

ABSTRACT

Two methods are described for biasing conformational search during pharmacophore elucidation using a multiobjective genetic algorithm (MOGA). The MOGA explores conformational space on the fly while simultaneously aligning a set of molecules such that their pharmacophoric features are maximally overlaid. By using a clique detection method to generate overlays of precomputed conformations to initialize the population (rather than starting from random solutions), the speed of the algorithm has been increased by 2 orders of magnitude. This increase in speed has enabled the program to be applied to greater numbers of molecules than was previously possible. Furthermore, it was found that biasing the conformations explored during the search towards those found in the Cambridge Structural Database could also improve the quality of the results.
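
Clique detection for overlay generation is commonly set up on a correspondence graph: each node pairs a pharmacophoric feature in one conformation with a feature of the same type in another, and two nodes are joined when the corresponding intramolecular distances agree within a tolerance, so that a clique is a geometrically consistent feature mapping. The sketch below uses the basic Bron-Kerbosch algorithm; the feature types, coordinates and tolerance are hypothetical, and this is not necessarily the clique method used in the study.

```python
import itertools
import math


def correspondence_graph(feats_a, feats_b, tol):
    """Nodes pair same-type features; edges join pairs with matching distances."""
    nodes = [(i, j) for i, (type_a, _) in enumerate(feats_a)
             for j, (type_b, _) in enumerate(feats_b) if type_a == type_b]
    adj = {node: set() for node in nodes}
    for (i1, j1), (i2, j2) in itertools.combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # a feature may be used only once in a mapping
        d_a = math.dist(feats_a[i1][1], feats_a[i2][1])
        d_b = math.dist(feats_b[j1][1], feats_b[j2][1])
        if abs(d_a - d_b) <= tol:
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj


def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration of all maximal cliques (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical pharmacophoric features (type, coordinate) for two conformations.
feats_a = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (4.0, 0.0, 0.0)), ("aromatic", (0.0, 3.0, 0.0))]
feats_b = [("acceptor", (5.0, 0.0, 0.0)), ("donor", (1.0, 0.0, 0.0)),
           ("aromatic", (1.0, 3.0, 0.0)), ("donor", (9.0, 9.0, 9.0))]
best = max(maximal_cliques(correspondence_graph(feats_a, feats_b, tol=0.5)), key=len)
print(sorted(best))  # [(0, 1), (1, 0), (2, 2)]
```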


Subjects
Algorithms , Drug Discovery/methods , Molecular Conformation , Cyclin-Dependent Kinase 2/antagonists & inhibitors , Cyclin-Dependent Kinase 2/chemistry , Cyclin-Dependent Kinase 2/genetics , Databases, Factual , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , Humans , Ligands , Models, Molecular , Mutation , Tetrahydrofolate Dehydrogenase/chemistry , Tetrahydrofolate Dehydrogenase/genetics , Thermodynamics , Thrombin/antagonists & inhibitors , Thrombin/chemistry , Thrombin/genetics
10.
J Chem Inf Model ; 47(2): 354-66, 2007.
Article in English | MEDLINE | ID: mdl-17309248

ABSTRACT

Chemical databases are routinely clustered, with the aim of grouping molecules which share similar structural features. Ideally, medicinal chemists are then able to browse a few representatives of the cluster in order to interpret the shared activity of the cluster members. However, when molecules are clustered using fingerprints, it may be difficult to decipher the structural commonalities which are present. Here, we seek to represent a cluster by means of a maximum common substructure based on the shared functionality of the cluster members. Previously, we have used reduced graphs, where each node corresponds to a generalized functional group, as topological molecular descriptors for virtual screening. In this work, we precluster a database using any clustering method. We then represent the molecules in a cluster as reduced graphs. By repeated application of a maximum common edge substructure (MCES) algorithm, we obtain one or more reduced graph cluster representatives. The sparsity of the reduced graphs means that the MCES calculations can be performed in real time. The reduced graph cluster representatives are readily interpretable in terms of functional activity and can be mapped directly back to the molecules to which they correspond, giving the chemist a rapid means of assessing potential activities contained within the cluster. Clusters of interest are then subject to a detailed R-group analysis using the same iterated MCES algorithm applied to the molecular graphs.


Subjects
Algorithms , Cluster Analysis , Computational Biology , Molecular Structure
11.
J Chem Inf Model ; 46(2): 503-11, 2006.
Article in English | MEDLINE | ID: mdl-16562978

ABSTRACT

Similarity-based methods for virtual screening are widely used. However, conventional searching using 2D chemical fingerprints or 2D graphs may retrieve only compounds which are structurally very similar to the original target molecule. Of particular current interest then is scaffold hopping, that is, the ability to identify molecules that belong to different chemical series but which could form the same interactions with a receptor. Reduced graphs provide summary representations of chemical structures and, therefore, offer the potential to retrieve compounds that are similar in terms of their gross features rather than at the atom-bond level. Using only a fingerprint representation of such graphs, we have previously shown that actives retrieved were more diverse than those found using Daylight fingerprints. Maximum common substructures give an intuitively reasonable view of the similarity between two molecules. However, their calculation using graph-matching techniques is too time-consuming for use in practical similarity searching in larger data sets. In this work, we exploit the low cardinality of the reduced graph in graph-based similarity searching. We reinterpret the reduced graph as a fully connected graph using the bond-distance information of the original graph. We describe searches, using both the maximum common induced subgraph and maximum common edge subgraph formulations, on the fully connected reduced graphs and compare the results with those obtained using both conventional chemical and reduced graph fingerprints. We show that graph matching using fully connected reduced graphs is an effective retrieval method and that the actives retrieved are likely to be topologically different from those retrieved using conventional 2D methods.
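
Reinterpreting a reduced graph as a fully connected graph means labelling every pair of reduced-graph nodes with a distance taken from the original molecular graph. The sketch below assumes each reduced-graph node is given as the set of atom indices it summarises and takes the node-to-node distance as the smallest number of bonds between any atom of one node and any atom of the other, found by a multi-source breadth-first search; the exact distance definition in the study may differ, node labels (such as donor or aromatic) are omitted for brevity, and the toy molecule is hypothetical.

```python
from collections import deque


def bond_distances(atom_adjacency, sources):
    """Breadth-first search: fewest bonds from any source atom to every reachable atom."""
    dist = {atom: 0 for atom in sources}
    queue = deque(sources)
    while queue:
        atom = queue.popleft()
        for neighbour in atom_adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def fully_connected_reduced_graph(atom_adjacency, nodes):
    """nodes maps a reduced-graph node id to the set of atom indices it summarises.

    Returns {(u, v): bond distance} for every pair of nodes in the same molecule,
    using an assumed minimum atom-to-atom bond-count definition.
    """
    edges = {}
    ids = sorted(nodes)
    for pos, u in enumerate(ids):
        dist = bond_distances(atom_adjacency, nodes[u])
        for v in ids[pos + 1:]:
            reachable = [dist[atom] for atom in nodes[v] if atom in dist]
            if reachable:
                edges[(u, v)] = min(reachable)
    return edges


# Toy chain of six heavy atoms 0-1-2-3-4-5 summarised as three hypothetical reduced-graph nodes.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
reduced = {"A": {0}, "B": {1, 2}, "C": {5}}
print(fully_connected_reduced_graph(chain, reduced))
# {('A', 'B'): 1, ('A', 'C'): 5, ('B', 'C'): 3}
```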


Subjects
Algorithms , Computer Graphics , Drug Design , Enzyme Inhibitors , Receptors, Serotonin , Databases as Topic , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/classification , Informatics/methods , Molecular Structure , Receptors, Serotonin/chemistry , Receptors, Serotonin/classification
12.
J Chem Inf Model ; 46(2): 743-52, 2006.
Article in English | MEDLINE | ID: mdl-16563005

ABSTRACT

Structural DNA profiles use the structural properties of the constituent octamers either to observe any characteristics of a single sequence that are unusual (a single sequence query) or to visualize a pattern common to a set of sequences (a multiple sequence query). They are an aid in understanding structural reasons for functional DNA activity. Profiles that answer single sequence queries are introduced and Profile Manager (a software application developed to automate profile generation) is presented. Two sequences that are similar by their nucleotide composition but are known to be very different by structure are analyzed, resulting in useful illustrations that agree with the experimental nuclear magnetic resonance structures.
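
A single-sequence structural profile of the kind described above slides an octamer window along the sequence and records a structural property for each window. The sketch below assumes the property is supplied as a lookup function (in the study it would come from the precomputed octamer database; here a hypothetical stand-in, A/T content, is used) and flags windows whose value lies more than z population standard deviations from the sequence mean as unusual; the flagging rule is an assumption, not Profile Manager's behaviour.

```python
import statistics


def octamer_profile(sequence, property_of, window=8):
    """Property value for each overlapping window of length `window`."""
    seq = sequence.upper()
    return [property_of(seq[i:i + window]) for i in range(len(seq) - window + 1)]


def unusual_windows(values, z=2.0):
    """Indices of windows more than z standard deviations from the mean (assumed rule)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > z]


def at_content(octamer):
    """Hypothetical stand-in for a database property: fraction of A or T bases."""
    return sum(base in "AT" for base in octamer) / len(octamer)


print(octamer_profile("AAAAAAAAGC", at_content))         # [1.0, 0.875, 0.75]
print(unusual_windows([0, 0, 0, 0, 0, 0, 0, 0, 0, 10]))  # [9]
```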


Subjects
Base Sequence , DNA/chemistry , DNA/genetics , Models, Genetic , Software , Animals , Humans , Molecular Sequence Data , Promoter Regions, Genetic , Sequence Analysis, DNA/methods , TATA Box/genetics
13.
J Chem Inf Model ; 46(2): 753-61, 2006.
Article in English | MEDLINE | ID: mdl-16563006

ABSTRACT

Recent comparative studies of the human and mouse genomes have revealed sets of conserved nongenic sequences (CNGs) and sets of ultraconserved elements (UCEs). Both sets of sequences, which exhibit extremely high levels of conservation, extend over hundreds of bases and have no known function. Since there is no detectable sequence homology between paralogous CNGs or UCEs in either of the species, an alignment-free technique is needed for their analysis. We have previously compiled a database of the structural properties of all 32,896 unique DNA octamers, including information on stability, the minimum energy conformation, and flexibility. We have used Fourier techniques to analyze the UCEs and CNGs in terms of their octamer structural properties, to reveal structural correlations which may indicate possible functions for some of these sequences.
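
Fourier analysis of a structural-property profile, as described above, looks for periodicities along the sequence. The minimal sketch below computes a plain discrete Fourier transform power spectrum of a mean-centred profile and reports the period with the most power; it assumes a profile already computed window by window (the cosine signal here is synthetic, not a UCE or CNG profile) and is not the specific Fourier procedure used in the study.

```python
import cmath
import math


def power_spectrum(values):
    """|DFT|^2 / n of the mean-centred signal for frequencies k = 0 .. n // 2."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centred))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def dominant_period(values):
    """Period (in windows) of the strongest non-zero frequency."""
    spectrum = power_spectrum(values)
    k_best = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return len(values) / k_best


# Synthetic profile with a period of 10 windows.
profile = [math.cos(2 * math.pi * t / 10) for t in range(100)]
print(dominant_period(profile))  # 10.0
```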


Subjects
Computational Biology/methods , Conserved Sequence/genetics , DNA, Intergenic/chemistry , DNA, Intergenic/genetics , Genomics/methods , Interspersed Repetitive Sequences/genetics , Animals , Base Sequence , Fourier Analysis , Humans , Molecular Sequence Data
14.
J Chem Inf Model ; 45(3): 696-707, 2005.
Article in English | MEDLINE | ID: mdl-15921459

ABSTRACT

A crucial enabling technology for structural genomics is the development of algorithms that can predict the putative function of novel protein structures: the proposed functions can subsequently be experimentally tested by functional studies. Testable assignments of function can be made if it is possible to attribute a putative, or indeed probable, function on the basis of the shapes of the binding sites on the surface of a protein structure. However, the comparison of the surfaces of 3D protein structures is a computationally demanding task. Here we present four surface representations that can be used locally to describe the global shape of specifically bounded local region models. The most successful of these representations is obtained by a Fourier analysis of the distribution of surface curvature on concentric spheres around a surface point and summarizes a 24 Å diameter spherically clipped region of protein surface by a fingerprint of 18 Fourier amplitude values. Searching experiments using these fingerprints on a set of 366 proteins demonstrate that this provides an effective and efficient technique for the matching of protein surfaces.


Subjects
Proteins/chemistry , Fourier Analysis , Models, Molecular
15.
J Mol Biol ; 343(4): 879-89, 2004 Oct 29.
Article in English | MEDLINE | ID: mdl-15476807

ABSTRACT

A database of the structural properties of all 32,896 unique DNA octamer sequences has been calculated, including information on stability, the minimum energy conformation and flexibility. The contents of the database have been analysed using a variety of Euclidean distance similarity measures. A global comparison of sequence similarity with structural similarity shows that the structural properties of DNA are much less diverse than the sequences, and that DNA sequence space is larger and more diverse than DNA structure space. Thus, there are many very different sequences that have very similar structural properties, and this may be useful for identifying DNA motifs that have similar functional properties that are not apparent from the sequences. On the other hand, there are also small numbers of almost identical sequences that have very different structural properties, and these could give rise to false-positives in methods used to identify function based on sequence alignment. A simple validation test demonstrates that structural similarity can differentiate between promoter and non-promoter DNA. Combining structural and sequence similarity improves promoter recall beyond that possible using either similarity measure alone, demonstrating that there is indeed information available in the structure of double-helical DNA that is not readily apparent from the sequence.
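
A Euclidean distance similarity search over structural properties, as described above, can be sketched as nearest-neighbour ranking of property vectors. The abstract does not say how the properties were scaled; standardising each property column to z-scores before taking Euclidean distances is an assumption made here so that properties measured in different units contribute comparably. The property vectors are hypothetical.

```python
import math
import statistics


def standardise(vectors):
    """Convert each property column to z-scores (assumed scaling; zero-spread columns are only centred)."""
    columns = list(zip(*vectors))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(vec, means, sds)] for vec in vectors]


def nearest_structures(query_index, vectors, k):
    """Indices of the k vectors closest to the query in standardised Euclidean distance."""
    z = standardise(vectors)
    others = [i for i in range(len(vectors)) if i != query_index]
    return sorted(others, key=lambda i: math.dist(z[query_index], z[i]))[:k]


# Hypothetical (twist in degrees, flexibility score) vectors for four octamers.
vectors = [[36.0, 2.0], [35.5, 2.2], [20.0, 9.0], [36.2, 1.9]]
print(nearest_structures(0, vectors, 2))  # [3, 1]
```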


Subjects
DNA/chemistry , Promoter Regions, Genetic
16.
J Mol Biol ; 332(5): 1025-35, 2003 Oct 03.
Article in English | MEDLINE | ID: mdl-14499606

ABSTRACT

We have constructed the potential energy surfaces for all unique tetramers, hexamers and octamers in double-helical DNA, as a function of the two principal degrees of freedom, slide and shift at the central step. From these potential energy maps, we have calculated a database of structural and flexibility properties for each of these sequences. These properties include: the values of each of the six step parameters (twist, roll, tilt, rise, slide and shift), for each step of the sequence; flexibility measures for both decrease and increase in each property value from the minimum energy conformation for the central step; and the deviation from the path of a hypothetical straight octamer. In an analysis of structural change as a function of sequence length, we observe that almost all DNA tends to B-DNA and becomes less flexible. A more detailed analysis of octamer properties has allowed us to determine the structural preferences of particular sequence elements. GGC and GCC sequences tend to confer bistability, low stability and a predisposition to A-form DNA, whereas AA steps strongly prefer B-DNA and inhibit A-structures. There is no correlation between flexibility and intrinsic curvature, but bent DNA is less stable than straight. The most difficult deformation is undertwisting. The TA step stands out as the most flexible sequence element with respect to decreasing twist and increasing roll. However, as with the structural properties, this behavior is highly context-dependent and some TA steps are very straight.
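
The number of "unique" oligomers follows from treating a sequence and its reverse complement as the same double helix, since reading the other strand gives the same duplex. Under that assumption the sketch below enumerates canonical sequences and gives 136 tetramers, 2,080 hexamers and 32,896 octamers, the last matching the figure quoted in two other abstracts in this list. For even n the closed form is (4^n + 4^(n/2)) / 2, because the 4^(n/2) self-complementary sequences are counted once rather than twice.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """One representative per duplex (assumes a sequence and its reverse complement are equivalent)."""
    return min(seq, reverse_complement(seq))


def unique_oligomers(n: int) -> set:
    return {canonical("".join(bases)) for bases in product("ACGT", repeat=n)}


for n in (4, 6, 8):
    closed_form = (4 ** n + 4 ** (n // 2)) // 2
    print(n, len(unique_oligomers(n)), closed_form)
# 4 136 136
# 6 2080 2080
# 8 32896 32896
```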


Subjects
DNA/analysis , DNA/chemistry , Nucleic Acid Conformation , DNA, A-Form/chemistry , Databases as Topic , Models, Statistical
17.
Proteins ; 52(1): 10-4, 2003 Jul 01.
Article in English | MEDLINE | ID: mdl-12784360

ABSTRACT

As part of the first Critical Assessment of PRotein Interactions, round 1, we predict the structure of two protein-protein complexes, by using a genetic algorithm, GAPDOCK, in combination with surface complementarity, buried surface area, biochemical information, and human intervention. Among the five models submitted for target 1, HPr phosphocarrier protein (B. subtilis) and the hexameric HPr kinase (L. lactis), the best correctly predicts 17 of 52 interprotein contacts, whereas for target 2, bovine rotavirus VP6 protein-monoclonal antibody, the best model predicts 27 of 52 correct contacts. Given the difficult nature of the targets, these predictions are very encouraging and compare well with those obtained by other methods. Nevertheless, it is clear that there is a need for improved methods for distinguishing between "correct" and "plausible but incorrect" complexes.
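
The contact counts quoted above (for example, 17 of 52 interprotein contacts) are a recovery fraction: the share of native interface contacts that also appear in a model. A minimal sketch, assuming each residue is reduced to a single representative coordinate and that two residues from different chains are in contact when those points lie within a chosen cutoff; CAPRI's own contact definition is not reproduced here, and the coordinates are hypothetical.

```python
import math
from itertools import product


def interface_contacts(chain_a, chain_b, cutoff):
    """Residue pairs (one from each chain) whose representative points lie within an assumed cutoff."""
    return {(res_a, res_b)
            for (res_a, xyz_a), (res_b, xyz_b) in product(chain_a.items(), chain_b.items())
            if math.dist(xyz_a, xyz_b) <= cutoff}


def contact_recovery(predicted, native):
    """Fraction of native contacts reproduced by the model."""
    return len(predicted & native) / len(native) if native else 0.0


# Hypothetical representative coordinates for two residues per chain.
native_a = {"A1": (0.0, 0.0, 0.0), "A2": (10.0, 0.0, 0.0)}
native_b = {"B1": (3.0, 0.0, 0.0), "B2": (20.0, 0.0, 0.0)}
print(interface_contacts(native_a, native_b, cutoff=5.0))  # {('A1', 'B1')}

# With 52 native contacts of which a model reproduces 17, recovery is 17/52.
native = {(f"A{i}", f"B{i}") for i in range(52)}
model = {(f"A{i}", f"B{i}") for i in range(17)} | {("A99", "B99")}
print(round(contact_recovery(model, native), 3))  # 0.327
```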


Subjects
Algorithms , Antigens, Viral , Bacterial Proteins , Models, Molecular , Proteins/chemistry , Proteins/metabolism , Binding Sites , Capsid Proteins/chemistry , Capsid Proteins/metabolism , Immunoglobulin Fab Fragments/chemistry , Immunoglobulin Fab Fragments/metabolism , Macromolecular Substances , Models, Genetic , Phosphoenolpyruvate Sugar Phosphotransferase System/chemistry , Phosphoenolpyruvate Sugar Phosphotransferase System/metabolism , Protein Interaction Mapping , Protein Serine-Threonine Kinases/chemistry , Protein Serine-Threonine Kinases/metabolism , Proteins/genetics
18.
J Chem Inf Comput Sci ; 43(2): 346-56, 2003.
Article in English | MEDLINE | ID: mdl-12653496

ABSTRACT

Reduced graphs provide summary representations of chemical structures. Here, a variety of different types of reduced graphs are compared in similarity searches. The reduced graphs are found to give comparable performance to Daylight fingerprints in terms of the number of active compounds retrieved. However, no one type of reduced graph is found to be consistently superior across a variety of different data sets. Consequently, a representative set of reduced graphs was chosen and used together with Daylight fingerprints in data fusion experiments. The results show improved performance in 10 out of 11 data sets compared to using Daylight fingerprints alone. Finally, the potential of using reduced graphs to build SAR models is demonstrated using recursive partitioning. An SAR model consistent with a published model is found following just two splits in the decision tree.
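
Data fusion combines the rankings produced by different similarity measures into one list. The abstract does not state which fusion rule was used, so the sketch below is an assumption: it fuses rank positions either by their sum or by taking each compound's best (minimum) rank, with ties broken by identifier. The rankings are hypothetical.

```python
def fuse_rankings(rankings, rule="sum"):
    """Fuse ranked lists of the same compound ids (best first) into one list.

    Both rules are illustrative assumptions, not the study's fusion method.
    """
    positions = [{mol: rank for rank, mol in enumerate(ranking, start=1)} for ranking in rankings]
    ids = rankings[0]
    if rule == "sum":
        score = {mol: sum(pos[mol] for pos in positions) for mol in ids}
    elif rule == "min":
        score = {mol: min(pos[mol] for pos in positions) for mol in ids}
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    return sorted(ids, key=lambda mol: (score[mol], mol))


daylight = ["a", "b", "c", "d"]       # hypothetical ranking from a fingerprint search
reduced_graph = ["d", "c", "a", "b"]  # hypothetical ranking from a reduced-graph search
print(fuse_rankings([daylight, reduced_graph], "sum"))  # ['a', 'c', 'd', 'b']
print(fuse_rankings([daylight, reduced_graph], "min"))  # ['a', 'd', 'b', 'c']
```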


Subjects
Information Management , Pharmaceutical Preparations/chemistry , Pharmacology/methods , Data Display , Drug Design , Molecular Structure , Structure-Activity Relationship
19.
J Chem Inf Comput Sci ; 42(2): 305-16, 2002.
Article in English | MEDLINE | ID: mdl-11911700

ABSTRACT

Recently a method (RASCAL) for determining graph similarity using a maximum common edge subgraph algorithm has been proposed which has proven to be very efficient when used to calculate the relative similarity of chemical structures represented as graphs. This paper describes heuristics which simplify a RASCAL similarity calculation by taking advantage of certain properties specific to chemical graph representations of molecular structure. These heuristics are shown experimentally to increase the efficiency of the algorithm, especially at more distant values of chemical graph similarity.
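
RASCAL scores two chemical graphs from the size of their maximum common edge subgraph; the Johnson similarity coefficient, (|V(Gc)| + |E(Gc)|)² / ((|V(G1)| + |E(G1)|)(|V(G2)| + |E(G2)|)), where Gc is the common subgraph, is the form usually associated with it and is assumed here. Cheap upper bounds on that score let a search skip pairs that cannot reach a similarity threshold. The bound below, which counts matching bond labels, is a simple illustrative assumption and not one of the heuristics described in the paper; the bond-label encoding is hypothetical.

```python
from collections import Counter


def johnson_similarity(v_common, e_common, v1, e1, v2, e2):
    """Assumed Johnson coefficient from vertex/edge counts of the common subgraph and both graphs."""
    return (v_common + e_common) ** 2 / ((v1 + e1) * (v2 + e2))


def bond_label(atom1, atom2, order):
    """Hypothetical order-independent bond label, e.g. ('C', 'O', 2) for a C=O bond."""
    return (min(atom1, atom2), max(atom1, atom2), order)


def similarity_upper_bound(bonds1, n_atoms1, bonds2, n_atoms2):
    """No common edge subgraph can use more bonds of a label than either graph has."""
    e_bound = sum((Counter(bonds1) & Counter(bonds2)).values())
    v_bound = min(n_atoms1, n_atoms2)
    return johnson_similarity(v_bound, e_bound, n_atoms1, len(bonds1), n_atoms2, len(bonds2))


# Heavy-atom graphs of ethanol (C-C-O) and acetic acid (C-C(=O)-O).
ethanol = [bond_label("C", "C", 1), bond_label("C", "O", 1)]
acetic_acid = [bond_label("C", "C", 1), bond_label("C", "O", 2), bond_label("O", "C", 1)]
bound = similarity_upper_bound(ethanol, 3, acetic_acid, 4)
print(round(bound, 4))  # 0.7143
threshold = 0.8
print(bound < threshold)  # True: this pair can be skipped without computing the MCES
```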
