Search | BVS España Search Portal

Diversity and Chemical Library Networks of Large Data Sets.

Dunn, Timothy B; Seabra, Gustavo M; Kim, Taewon David; Juárez-Mercado, K Eurídice; Li, Chenglong; Medina-Franco, José L; Miranda-Quintana, Ramón Alain.

J Chem Inf Model ; 62(9): 2186-2201, 2022 05 09.

Article in English | MEDLINE | ID: mdl-34723537

ABSTRACT

The quantification of chemical diversity has many applications in drug discovery, organic chemistry, food chemistry, and natural product chemistry, to name a few. As chemical space expands rapidly, it is imperative to develop efficient methods to quantify the diversity of large and ultralarge chemical libraries and to visualize their mutual relationships in chemical space. Herein, we apply our recently introduced extended similarity indices to measure the fingerprint-based diversity of 19 chemical libraries commonly used in drug discovery and natural products research, comprising over 18 million compounds. Building on this concept, we introduce Chemical Library Networks (CLNs) as a general and efficient framework for visually representing the chemical space of large chemical libraries, providing a global perspective on the relationships between libraries. For the 19 compound libraries explored in this work, the (extended) Tanimoto index combined with RDKit fingerprints was found to offer the best description of extended similarity. CLNs are general and can be used with any structure representation and similarity coefficient for large chemical libraries.
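The extended-similarity idea behind this diversity measure can be sketched in a few lines. The sketch below assumes the non-weighted form of the extended Tanimoto index: each bit column is classified from its column sum as a 1-similarity, 0-similarity, or dissimilarity counter using a coincidence threshold `gamma` (defaulting here to N mod 2), and the index is the number of 1-similarity counters divided by 1-similarity plus dissimilarity counters. The published indices also apply weighting functions, which are omitted here. Lower values indicate a more diverse set. The `library_network` helper is a hypothetical illustration of a CLN-style graph: pooling two libraries and keeping an edge when their pooled extended Tanimoto reaches `threshold` is an assumption of this sketch, not necessarily how the published CLNs are built.

```python
from __future__ import annotations

from itertools import combinations

import numpy as np


def extended_tanimoto(fps: np.ndarray, gamma: int | None = None) -> float:
    """Non-weighted extended (n-ary) Tanimoto index of N binary fingerprints.

    fps has shape (N, M) with 0/1 entries. gamma is the coincidence
    threshold; it defaults to N % 2 (an assumption of this sketch).
    """
    n = fps.shape[0]
    if gamma is None:
        gamma = n % 2
    # A single pass over the rows: O(N*M), versus O(N^2 * M) for all pairs.
    col_sums = fps.sum(axis=0)
    delta = 2 * col_sums - n          # > 0: mostly ones, < 0: mostly zeros
    similar = np.abs(delta) > gamma   # columns where the set "agrees"
    a = np.count_nonzero(similar & (delta > 0))  # 1-similarity counters
    d = np.count_nonzero(~similar)               # dissimilarity counters
    return a / (a + d) if a + d else 0.0


def library_network(
    libraries: dict[str, np.ndarray], threshold: float
) -> list[tuple[str, str, float]]:
    """Hypothetical CLN-style edge list: one node per library, and an edge
    whenever the pooled extended Tanimoto of two libraries reaches threshold."""
    edges = []
    for (name_a, fps_a), (name_b, fps_b) in combinations(libraries.items(), 2):
        weight = extended_tanimoto(np.vstack([fps_a, fps_b]))
        if weight >= threshold:
            edges.append((name_a, name_b, weight))
    return edges
```

For two fingerprints the index reduces to the classical Tanimoto coefficient: `[1, 1, 0, 0]` and `[1, 0, 1, 0]` share one on-bit out of three on-bit positions, giving 1/3.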

Subject(s)

Biological Products, Small Molecule Libraries, Biological Products/chemistry, Drug Discovery/methods, Small Molecule Libraries/chemistry

Extended continuous similarity indices: theory and application for QSAR descriptor selection.

Rácz, Anita; Dunn, Timothy B; Bajusz, Dávid; Kim, Taewon D; Miranda-Quintana, Ramón Alain; Héberger, Károly.

J Comput Aided Mol Des ; 36(3): 157-173, 2022 03.

Article in English | MEDLINE | ID: mdl-35288838

ABSTRACT

Extended (or n-ary) similarity indices have recently been proposed to extend the comparative analysis of binary strings. Going beyond the traditional notion of pairwise comparisons, these indices allow any number of objects to be compared at the same time. This yields a remarkable efficiency gain over other approaches, since N molecules can now be compared in O(N) time instead of the usual quadratic O(N²) time. This favorable scaling has motivated the application of these indices to diversity selection, clustering, phylogenetic analysis, chemical space visualization, and post-processing of molecular dynamics simulations. However, the current formulation of the n-ary indices is limited to vectors with binary or categorical inputs. Here, we further generalize this formalism so that it can be applied to numerical data, i.e., to vectors with continuous components. We discuss several ways to achieve this extension and present their analytical properties. As a practical example, we apply this formalism to feature selection in QSAR and demonstrate that the extended continuous similarity indices provide a convenient way to discriminate between several sets of descriptors.
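The abstract does not spell out the continuous formulas, so the sketch below does not reproduce them. Instead it illustrates the same linear-scaling principle on continuous descriptor vectors with a standard identity, swapped in here as an analogy rather than the paper's index: the mean squared Euclidean distance over all N(N-1)/2 distinct pairs can be obtained from column sums and squared norms in a single pass, and it equals twice the sum of the per-descriptor sample variances. The function names and the min-max scaling step are illustrative assumptions.

```python
import numpy as np


def minmax_scale(x: np.ndarray) -> np.ndarray:
    """Scale each descriptor column to [0, 1]; constant columns become 0."""
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant descriptors
    return (x - lo) / span


def mean_pairwise_sq_dist_linear(x: np.ndarray) -> float:
    """O(N*M): mean squared Euclidean distance over all distinct row pairs."""
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two rows")
    col_sums = x.sum(axis=0)               # one pass over the N rows
    sq_norm_sum = float(np.sum(x * x))     # sum of ||x_i||^2
    # Identity: sum_{i<j} ||x_i - x_j||^2 = N * sum_i ||x_i||^2 - ||sum_i x_i||^2
    total = n * sq_norm_sum - float(col_sums @ col_sums)
    return total / (n * (n - 1) / 2)


def mean_pairwise_sq_dist_quadratic(x: np.ndarray) -> float:
    """O(N^2 * M) reference: explicit loop over all distinct pairs."""
    n = x.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = x[i] - x[j]
            total += float(diff @ diff)
    return total / (n * (n - 1) / 2)
```

Computing the linear-time value for two min-max-scaled descriptor subsets of the same molecules gives a quick comparison of how widely each subset spreads the data; whether such a spread criterion matches the paper's descriptor-selection criterion is not stated in the abstract.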

Subject(s)

Drug Design, Quantitative Structure-Activity Relationship, Phylogeny

Exploring activity landscapes with extended similarity: is Tanimoto enough?

Dunn, Timothy B; López-López, Edgar; Kim, Taewon David; Medina-Franco, José L; Miranda-Quintana, Ramón Alain.

Mol Inform ; 42(7): e2300056, 2023 07.

Article in English | MEDLINE | ID: mdl-37202375

ABSTRACT

Understanding structure-activity landscapes is essential in drug discovery. It has also been shown that the presence of activity cliffs in compound data sets can substantially affect not only design progress but also the predictive ability of machine learning models. With the continued expansion of chemical space and the large and ultra-large libraries now available, it is imperative to implement efficient tools that rapidly analyze the activity landscapes of compound data sets. The goal of this study is to show that the n-ary indices can quantify the structure-activity landscapes of large compound data sets rapidly and efficiently, using different types of structural representation. We also discuss how a recently introduced medoid algorithm provides the foundation for finding optimal correlations between similarity measures and structure-activity rankings. The applicability of the n-ary indices and the medoid algorithm is demonstrated by analyzing the activity landscapes of 10 pharmaceutically relevant compound data sets using three fingerprints of different designs, 16 extended similarity indices, and 11 coincidence thresholds.
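The abstract does not describe the medoid algorithm itself. One linear-scaling approach from the extended-similarity framework, assumed here, ranks molecules by complementary similarity: the extended similarity of the set with that molecule removed. Removing the most central molecule lowers the set's similarity the most, so the medoid is taken as the molecule with the lowest complementary similarity, while outliers score highest. Because each complementary column sum is the total column sum minus one row, all N values cost O(N·M) rather than O(N²·M). The non-weighted extended Tanimoto and the default threshold `gamma = (N - 1) mod 2` are simplifying assumptions of this sketch.

```python
import numpy as np


def _ext_tanimoto_from_sums(col_sums: np.ndarray, n: int, gamma: int) -> float:
    """Non-weighted extended Tanimoto computed from precomputed column sums."""
    delta = 2 * col_sums - n
    similar = np.abs(delta) > gamma
    a = np.count_nonzero(similar & (delta > 0))  # 1-similarity counters
    d = np.count_nonzero(~similar)               # dissimilarity counters
    return a / (a + d) if a + d else 0.0


def complementary_similarities(fps: np.ndarray, gamma=None) -> np.ndarray:
    """Extended Tanimoto of the set with each molecule left out, in O(N*M)."""
    n = fps.shape[0]
    m = n - 1                      # size of each leave-one-out set
    if gamma is None:
        gamma = m % 2              # assumed default coincidence threshold
    total = fps.sum(axis=0)        # column sums computed once
    return np.array(
        [_ext_tanimoto_from_sums(total - fps[i], m, gamma) for i in range(n)]
    )


def medoid_index(fps: np.ndarray, gamma=None) -> int:
    """Index of the molecule with the lowest complementary similarity."""
    return int(np.argmin(complementary_similarities(fps, gamma)))
```

For the five toy fingerprints `[1,1,1,0,0,0]`, `[1,1,1,1,0,0]`, `[1,1,0,0,0,0]`, `[1,1,1,0,1,0]` and `[0,0,0,1,1,1]`, the complementary similarities are 0.4, 0.5, 0.6, 0.5 and 1.0, so row 0 is the medoid and row 4, which shares few bits with the rest, is the clearest outlier.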

Subject(s)

Algorithms, Drug Discovery, Structure-Activity Relationship, Machine Learning
