Utilizing Low-Dimensional Molecular Embeddings for Rapid Chemical Similarity Search.
Kirchoff, Kathryn E; Wellnitz, James; Hochuli, Joshua E; Maxfield, Travis; Popov, Konstantin I; Gomez, Shawn; Tropsha, Alexander.
Affiliation
  • Kirchoff KE; Department of Computer Science, UNC Chapel Hill.
  • Wellnitz J; Eshelman School of Pharmacy, UNC Chapel Hill.
  • Hochuli JE; Eshelman School of Pharmacy, UNC Chapel Hill.
  • Maxfield T; Eshelman School of Pharmacy, UNC Chapel Hill.
  • Popov KI; Eshelman School of Pharmacy, UNC Chapel Hill.
  • Gomez S; Department of Pharmacology, UNC Chapel Hill.
  • Tropsha A; Joint Department of Biomedical Engineering at UNC Chapel Hill and NCSU.
Adv Inf Retr ; 14609: 34-49, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38585224
ABSTRACT
Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advancements for this task have generally relied on improvements to hardware or dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity searching algorithms remain relatively underexplored. However, many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality reductions of standard chemical embeddings as well as a learned, structurally-aware embedding, SmallSA, for this task. With this framework, searches on over one billion chemicals execute in less than a second on a single CPU core, five orders of magnitude faster than the brute-force approach. We also demonstrate that SmallSA achieves competitive performance on chemical similarity benchmarks.
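The core idea in the abstract, exact nearest neighbor lookup in a k-d tree built over low-dimensional embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the 8-dimensional random vectors stand in for molecular embeddings (the abstract does not state the dimensionality used, and SmallSA itself is not reproduced here), and all function names are hypothetical. The sketch only shows how the tree prunes branches that cannot contain a closer neighbor, and checks the result against a brute-force scan.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree as nested tuples: (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point splits the space
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn_kdtree(node, query, k, heap=None):
    """Exact k-NN search. The heap stores (-sq_dist, point) for the best k so far."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    point, axis, left, right = node
    d = sq_dist(point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, point))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, point))
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn_kdtree(near, query, k, heap)
    # Descend into the far side only if the splitting plane is closer
    # than the current k-th best neighbor; otherwise prune it.
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn_kdtree(far, query, k, heap)
    return heap


def knn_brute_force(points, query, k):
    """Reference answer: sort every point by distance to the query."""
    return sorted(points, key=lambda p: sq_dist(p, query))[:k]


random.seed(0)
DIM = 8  # assumed embedding dimension, chosen only for illustration
embeddings = [tuple(random.random() for _ in range(DIM)) for _ in range(2000)]
query = tuple(random.random() for _ in range(DIM))

tree = build_kdtree(embeddings)
tree_hits = [p for _, p in sorted((-neg_d, p) for neg_d, p in knn_kdtree(tree, query, k=5))]
brute_hits = knn_brute_force(embeddings, query, k=5)
print(tree_hits == brute_hits)
```

The pruning step is what makes low dimensionality matter: in high-dimensional spaces the splitting plane is almost always closer than the current best neighbor, so few branches are skipped and the search degrades toward a full scan.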
Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: Adv Inf Retr Year: 2024 Document type: Article

...