ABSTRACT
Several modifications are introduced into the previously described centroid diversity sorting algorithm, which uses the cosine similarity metric. The modified algorithm is suitable for working with large databases on personal computers. For example, diversity sorting of a database of more than one million records takes less than 9 h (Pentium III, 800 MHz). The problem of selecting new compounds to add to an existing collection so as to maximize the diversity of that collection is examined. The article describes the new algorithm for the selection of heterocyclic compounds.
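The general idea behind centroid-based diversity sorting with the cosine metric can be sketched as follows. This is a minimal illustration, not the modified algorithm of the article: the function names `normalize` and `centroid_diversity_sort`, the greedy selection rule, the choice of the first record as seed, and the index-based tie-breaking are all assumptions made for the example. It relies on the fact that, for unit-length vectors, the summed cosine similarity of a candidate to all records already chosen equals one dot product with the running sum (the "centroid") of those records.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return [x / norm for x in vec]


def centroid_diversity_sort(fingerprints, seed_index=0):
    """Return record indices in a greedy diversity order (illustrative sketch).

    At each step the record with the smallest summed cosine similarity to the
    records already chosen is appended. The sum is obtained from a single dot
    product with the running sum of the chosen unit vectors.
    """
    unit = [normalize(v) for v in fingerprints]
    centroid = list(unit[seed_index])  # running sum of chosen unit vectors
    order = [seed_index]
    remaining = set(range(len(unit))) - {seed_index}

    while remaining:
        # Smallest dot product with the centroid = least similar to the chosen set.
        # The index is a secondary key so that ties are broken deterministically.
        best = min(
            remaining,
            key=lambda i: (sum(a * b for a, b in zip(unit[i], centroid)), i),
        )
        order.append(best)
        remaining.remove(best)
        centroid = [c + u for c, u in zip(centroid, unit[best])]
    return order


# Hypothetical two-dimensional descriptor vectors, for illustration only.
example = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [1.0, 1.0]]
print(centroid_diversity_sort(example))  # [0, 2, 1, 3]
```

In this sketch each step costs one pass over the remaining records, so a full sort is quadratic in the number of records. Under the same assumptions, initializing the centroid with the vectors of an existing collection instead of a single seed would be one way to rank candidate compounds for addition to that collection.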
ABSTRACT
The development of a scoring scheme for the classification of molecules into serine protease (SP) actives and inactives is described. The method employs a set of pre-selected descriptors for encoding the molecular structures and a trained neural network for classifying the molecules. The molecular requirements were profiled and validated using sensitivity analysis and available databases of SP-active and non-SP-active agents [1,439 diverse SP-active molecules and 5,131 diverse non-SP-active molecules from the Ensemble Database (Prous Science, 2002)]. The method enables efficient qualification or disqualification of a molecule as a potential serine protease ligand. It is a useful tool for constraining the size of virtual libraries and should help accelerate the development of new serine protease-active drugs.