Cell-integral-diversity criterion: a proposal for minimizing cluster artifact in cell-based selections.
Rabal, Obdulia; Pascual, Rosalia; Borrell, José I; Teixidó, Jordi.
Affiliation
  • Rabal O; Grup d'Enginyeria Molecular, Institut Químic de Sarrià, Universitat Ramon Llull, Via Augusta 390, E-08017 Barcelona, Spain.
J Chem Inf Model; 47(5): 1886-96, 2007.
Article in En | MEDLINE | ID: mdl-17824683
ABSTRACT
Cell-based methods and the diversity integral criterion (a distance-based technique) are commonly used to assess the diversity of compound collections in terms of space coverage. The main deficiency of cell-based methods is the arbitrariness of cell boundaries, which leads to edge effects or cluster artifacts, i.e., situations in which similar molecules separated by a cell boundary yield a higher diversity score than less similar molecules that fall within the same cell. We describe a straightforward diversity metric, based on the distance to the centers of the bins that result from partitioning the descriptor space, that aims to bypass these artifacts. These criteria are compared in the diversity assessment of a set of selections carried out on three combinatorial libraries of different cardinalities. For each method, the influence of its parameters (reference partition and number of points) on its efficacy is examined. The proposed metric is also applied to the design of diverse libraries for three test cases. We show that full arrays selected by minimizing the sum of distances to the cell centers are formed by compounds spaced further apart than selections obtained by maximizing the degree of cell occupancy.
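The cluster artifact, and why a distance-to-cell-center score avoids it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: descriptors are assumed to be scaled to the unit hypercube [0, 1]^d and partitioned by a regular grid with `bins` cells per axis, and the center-based score is assumed to sum, over every cell center, the Euclidean distance to the nearest selected compound (lower means better coverage). All function names are hypothetical.

```python
from __future__ import annotations

import itertools
import math
from typing import Sequence

Point = Sequence[float]


def cell_index(x: Point, bins: int) -> tuple[int, ...]:
    """Map a point in [0, 1]^d to its grid cell (one integer index per axis)."""
    # min(...) keeps points lying exactly on the upper edge (v == 1.0) in the last bin
    return tuple(min(int(v * bins), bins - 1) for v in x)


def cell_occupancy(selection: Sequence[Point], bins: int) -> int:
    """Classic cell-based diversity: number of distinct occupied cells (higher = more diverse)."""
    return len({cell_index(x, bins) for x in selection})


def cell_centers(dim: int, bins: int) -> list[tuple[float, ...]]:
    """Centers of all bins**dim cells of a regular grid over [0, 1]^dim."""
    axis = [(i + 0.5) / bins for i in range(bins)]
    return list(itertools.product(axis, repeat=dim))


def center_distance_score(selection: Sequence[Point], bins: int) -> float:
    """ASSUMED form of the center-based metric: for each cell center, the distance
    to its nearest selected compound, summed over all cells (lower = better coverage)."""
    dim = len(selection[0])
    return sum(min(math.dist(c, x) for x in selection) for c in cell_centers(dim, bins))


# 1-D toy example with two cells, [0, 0.5) and [0.5, 1], centered at 0.25 and 0.75
a = [(0.49,), (0.51,)]  # near-duplicates (0.02 apart) straddling the boundary at 0.5
c = [(0.30,), (0.45,)]  # further apart (0.15), but inside the same cell
b = [(0.25,), (0.75,)]  # one compound at each cell center

for name, sel in [("A", a), ("C", c), ("B", b)]:
    print(name, cell_occupancy(sel, bins=2), round(center_distance_score(sel, bins=2), 2))
# A 2 0.48
# C 1 0.35
# B 2 0.0
```

Under cell occupancy, selection A beats C even though its two compounds are far more similar; this is the cluster artifact. The center-based score changes continuously as compounds move, so it ranks C ahead of A and the evenly spread B ahead of both.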
Subject(s)
Database: MEDLINE
Main subject: Cluster Analysis / Cells
Type of study: Prognostic_studies
Language: En
Journal: J Chem Inf Model
Journal subject: Medical Informatics / Chemistry
Year: 2007
Type: Article
Affiliation country: Spain