Búsqueda | Portal Regional de la BVS

Efficient substructure searching of large chemical libraries: the ABCD chemical cartridge.

Agrafiotis, Dimitris K; Lobanov, Victor S; Shemanarev, Maxim; Rassokhin, Dmitrii N; Izrailev, Sergei; Jaeger, Edward P; Alex, Simson; Farnum, Michael.

J Chem Inf Model ; 51(12): 3113-30, 2011 Dec 27.

Artículo en Inglés | MEDLINE | ID: mdl-22035187

RESUMEN

Efficient substructure searching is a key requirement for any chemical information management system. In this paper, we describe the substructure search capabilities of ABCD, an integrated drug discovery informatics platform developed at Johnson & Johnson Pharmaceutical Research & Development, L.L.C. The solution consists of several algorithmic components: 1) a pattern mapping algorithm for solving the subgraph isomorphism problem, 2) an indexing scheme that enables very fast substructure searches on large structure files, 3) the incorporation of that indexing scheme into an Oracle cartridge to enable querying large relational databases through SQL, and 4) a cost estimation scheme that allows the Oracle cost-based optimizer to generate a good execution plan when a substructure search is combined with additional constraints in a single SQL query. The algorithm was tested on a public database comprising nearly 1 million molecules using 4,629 substructure queries, the vast majority of which were submitted by discovery scientists over the last 2.5 years of user acceptance testing of ABCD. 80.7% of these queries were completed in less than a second and 96.8% in less than ten seconds on a single CPU, while on eight processing cores these numbers increased to 93.2% and 99.7%, respectively. The slower queries involved extremely generic patterns that returned the entire database as screening hits and required extensive atom-by-atom verification.

Asunto(s)

Algoritmos , Descubrimiento de Drogas , Informática/métodos , Bibliotecas de Moléculas Pequeñas/química , Bases de Datos Factuales , Descubrimiento de Drogas/economía , Informática/economía , Factores de Tiempo

Library enhancement through the wisdom of crowds.

Hack, Michael D; Rassokhin, Dmitrii N; Buyck, Christophe; Seierstad, Mark; Skalkin, Andrew; ten Holte, Peter; Jones, Todd K; Mirzadegan, Taraneh; Agrafiotis, Dimitris K.

J Chem Inf Model ; 51(12): 3275-86, 2011 Dec 27.

Artículo en Inglés | MEDLINE | ID: mdl-22035213

RESUMEN

We present a novel approach for enhancing the diversity of a chemical library rooted on the theory of the wisdom of crowds. Our approach was motivated by a desire to tap into the collective experience of our global medicinal chemistry community and involved four basic steps: (1) Candidate compounds for acquisition were screened using various structural and property filters in order to eliminate clearly nondrug-like matter. (2) The remaining compounds were clustered together with our in-house collection using a novel fingerprint-based clustering algorithm that emphasizes common substructures and works with millions of molecules. (3) Clusters populated exclusively by external compounds were identified as "diversity holes," and representative members of these clusters were presented to our global medicinal chemistry community, who were asked to specify which ones they liked, disliked, or were indifferent to using a simple point-and-click interface. (4) The resulting votes were used to rank the clusters from most to least desirable, and to prioritize which ones should be targeted for acquisition. Analysis of the voting results reveals interesting voter behaviors and distinct preferences for certain molecular property ranges that are fully consistent with lead-like profiles established through systematic analysis of large historical databases.

Asunto(s)

Bibliotecas de Moléculas Pequeñas/química , Química Farmacéutica/métodos , Análisis por Conglomerados , Estructura Molecular

Power keys: a novel class of topological descriptors based on exhaustive subgraph enumeration and their application in substructure searching.

Liu, Pu; Agrafiotis, Dimitris K; Rassokhin, Dmitrii N.

J Chem Inf Model ; 51(11): 2843-51, 2011 Nov 28.

Artículo en Inglés | MEDLINE | ID: mdl-21955134

RESUMEN

We present a novel class of topological molecular descriptors, which we call power keys. Power keys are computed by enumerating all possible linear, branch, and cyclic subgraphs up to a given size, encoding the connected atoms and bonds into two separate components, and recording the number of occurrences of each subgraph. We have applied these new descriptors for the screening stage of substructure searching on a relational database of about 1 million compounds using a diverse set of reference queries. The new keys can eliminate the vast majority (>99.9% on average) of nonmatching molecules within a fraction of a second. More importantly, for many of the queries the screening efficiency is 100%. A common feature was identified for the molecules for which power keys have perfect discriminative ability. This feature can be exploited to obviate the need for expensive atom-by-atom matching in situations where some ambiguity can be tolerated (fuzzy substructure searching). Other advantages over commonly used molecular keys are also discussed.

Asunto(s)

Biología Computacional/métodos , Descubrimiento de Drogas/métodos , Programas Informáticos , Algoritmos , Biología Computacional/estadística & datos numéricos , Bases de Datos Factuales , Descubrimiento de Drogas/estadística & datos numéricos , Lógica Difusa , Modelos Moleculares , Relación Estructura-Actividad

Accelerating chemical database searching using graphics processing units.

Liu, Pu; Agrafiotis, Dimitris K; Rassokhin, Dmitrii N; Yang, Eric.

J Chem Inf Model ; 51(8): 1807-16, 2011 Aug 22.

Artículo en Inglés | MEDLINE | ID: mdl-21696144

RESUMEN

The utility of chemoinformatics systems depends on the accurate computer representation and efficient manipulation of chemical compounds. In such systems, a small molecule is often digitized as a large fingerprint vector, where each element indicates the presence/absence or the number of occurrences of a particular structural feature. Since in theory the number of unique features can be exceedingly large, these fingerprint vectors are usually folded into much shorter ones using hashing and modulo operations, allowing fast "in-memory" manipulation and comparison of molecules. There is increasing evidence that lossless fingerprints can substantially improve retrieval performance in chemical database searching (substructure or similarity), which have led to the development of several lossless fingerprint compression algorithms. However, any gains in storage and retrieval afforded by compression need to be weighed against the extra computational burden required for decompression before these fingerprints can be compared. Here we demonstrate that graphics processing units (GPU) can greatly alleviate this problem, enabling the practical application of lossless fingerprints on large databases. More specifically, we show that, with the help of a ~$500 ordinary video card, the entire PubChem database of ~32 million compounds can be searched in ~0.2-2 s on average, which is 2 orders of magnitude faster than a conventional CPU. If multiple query patterns are processed in batch, the speedup is even more dramatic (less than 0.02-0.2 s/query for 1000 queries). In the present study, we use the Elias gamma compression algorithm, which results in a compression ratio as high as 0.097.

Asunto(s)

Química Farmacéutica/métodos , Minería de Datos/métodos , Compuestos Orgánicos/análisis , Algoritmos , Química Farmacéutica/estadística & datos numéricos , Gráficos por Computador , Compresión de Datos , Bases de Datos Factuales , Modelos Químicos , Estructura Molecular , Programas Informáticos

A self-organizing algorithm for modeling protein loops.

Liu, Pu; Zhu, Fangqiang; Rassokhin, Dmitrii N; Agrafiotis, Dimitris K.

PLoS Comput Biol ; 5(8): e1000478, 2009 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-19696883

RESUMEN

Protein loops, the flexible short segments connecting two stable secondary structural units in proteins, play a critical role in protein structure and function. Constructing chemically sensible conformations of protein loops that seamlessly bridge the gap between the anchor points without introducing any steric collisions remains an open challenge. A variety of algorithms have been developed to tackle the loop closure problem, ranging from inverse kinematics to knowledge-based approaches that utilize pre-existing fragments extracted from known protein structures. However, many of these approaches focus on the generation of conformations that mainly satisfy the fixed end point condition, leaving the steric constraints to be resolved in subsequent post-processing steps. In the present work, we describe a simple solution that simultaneously satisfies not only the end point and steric conditions, but also chirality and planarity constraints. Starting from random initial atomic coordinates, each individual conformation is generated independently by using a simple alternating scheme of pairwise distance adjustments of randomly chosen atoms, followed by fast geometric matching of the conformationally rigid components of the constituent amino acids. The method is conceptually simple, numerically stable and computationally efficient. Very importantly, additional constraints, such as those derived from NMR experiments, hydrogen bonds or salt bridges, can be incorporated into the algorithm in a straightforward and inexpensive way, making the method ideal for solving more complex multi-loop problems. The remarkable performance and robustness of the algorithm are demonstrated on a set of protein loops of length 4, 8, and 12 that have been used in previous studies.

Asunto(s)

Algoritmos , Biología Computacional/métodos , Modelos Químicos , Proteínas/química , Cristalografía por Rayos X , Bases de Datos de Proteínas , Modelos Moleculares , Conformación Proteica

Advanced biological and chemical discovery (ABCD): centralizing discovery knowledge in an inherently decentralized world.

Agrafiotis, Dimitris K; Alex, Simson; Dai, Heng; Derkinderen, An; Farnum, Michael; Gates, Peter; Izrailev, Sergei; Jaeger, Edward P; Konstant, Paul; Leung, Albert; Lobanov, Victor S; Marichal, Patrick; Martin, Douglas; Rassokhin, Dmitrii N; Shemanarev, Maxim; Skalkin, Andrew; Stong, John; Tabruyn, Tom; Vermeiren, Marleen; Wan, Jackson; Xu, Xiang Yang; Yao, Xiang.

J Chem Inf Model ; 47(6): 1999-2014, 2007.

Artículo en Inglés | MEDLINE | ID: mdl-17973472

RESUMEN

We present ABCD, an integrated drug discovery informatics platform developed at Johnson & Johnson Pharmaceutical Research & Development, L.L.C. ABCD is an attempt to bridge multiple continents, data systems, and cultures using modern information technology and to provide scientists with tools that allow them to analyze multifactorial SAR and make informed, data-driven decisions. The system consists of three major components: (1) a data warehouse, which combines data from multiple chemical and pharmacological transactional databases, designed for supreme query performance; (2) a state-of-the-art application suite, which facilitates data upload, retrieval, mining, and reporting, and (3) a workspace, which facilitates collaboration and data sharing by allowing users to share queries, templates, results, and reports across project teams, campuses, and other organizational units. Chemical intelligence, performance, and analytical sophistication lie at the heart of the new system, which was developed entirely in-house. ABCD is used routinely by more than 1000 scientists around the world and is rapidly expanding into other functional areas within the J&J organization.

Asunto(s)

Biología , Biología Computacional , Computadores , Imagenología Tridimensional

A modified update rule for stochastic proximity embedding.

Rassokhin, Dmitrii N; Agrafiotis, Dimitris K.

J Mol Graph Model ; 22(2): 133-40, 2003 Nov.

Artículo en Inglés | MEDLINE | ID: mdl-12932784

RESUMEN

Recently, we described a fast self-organizing algorithm for embedding a set of objects into a low-dimensional Euclidean space in a way that preserves the intrinsic dimensionality and metric structure of the data [Proc. Natl. Acad. Sci. U.S.A. 99 (2002) 15869-15872]. The method, called stochastic proximity embedding (SPE), attempts to preserve the geodesic distances between the embedded objects, and scales linearly with the size of the data set. SPE starts with an initial configuration, and iteratively refines it by repeatedly selecting pairs of objects at random, and adjusting their coordinates so that their distances on the map match more closely their respective proximities. Here, we describe an alternative update rule that drastically reduces the number of calls to the random number generator and thus improves the efficiency of the algorithm.

Asunto(s)

Algoritmos , Procesos Estocásticos , Cómputos Matemáticos , Estructura Molecular , Distribución Aleatoria

A fractal approach for selecting an appropriate bin size for cell-based diversity estimation.

Agrafiotis, Dimitris K; Rassokhin, Dmitrii N.

J Chem Inf Comput Sci ; 42(1): 117-22, 2002.

Artículo en Inglés | MEDLINE | ID: mdl-11855975

RESUMEN

A novel approach for selecting an appropriate bin size for cell-based diversity assessment is presented. The method measures the sensitivity of the diversity index as a function of grid resolution, using a box-counting algorithm that is reminiscent of those used in fractal analysis. It is shown that the relative variance of the diversity score (sum of squared cell occupancies) of several commonly used molecular descriptor sets exhibits a bell-shaped distribution, whose exact characteristics depend on the distribution of the data set, the number of points considered, and the dimensionality of the feature space. The peak of this distribution represents the optimal bin size for a given data set and sample size. Although box counting can be performed in an algorithmically efficient manner, the ability of cell-based methods to distinguish between subsets of different spread falls sharply with dimensionality, and the method becomes useless beyond a few dimensions.

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA