Improving Measures of Chemical Structural Similarity Using Machine Learning on Chemical-Genetic Interactions.
Safizadeh, Hamid; Simpkins, Scott W; Nelson, Justin; Li, Sheena C; Piotrowski, Jeff S; Yoshimura, Mami; Yashiroda, Yoko; Hirano, Hiroyuki; Osada, Hiroyuki; Yoshida, Minoru; Boone, Charles; Myers, Chad L.
Affiliation
  • Safizadeh H; Department of Electrical and Computer Engineering, University of Minnesota-Twin Cities, Minneapolis, Minnesota 55455, United States.
  • Simpkins SW; Department of Computer Science and Engineering, University of Minnesota-Twin Cities, Minneapolis, Minnesota 55455, United States.
  • Nelson J; Bioinformatics and Computational Biology Graduate Program, University of Minnesota-Twin Cities, Minneapolis, Minnesota 55455, United States.
  • Li SC; Bioinformatics and Computational Biology Graduate Program, University of Minnesota-Twin Cities, Minneapolis, Minnesota 55455, United States.
  • Piotrowski JS; The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada.
  • Yoshimura M; RIKEN Center for Sustainable Resource Science (CSRS), Wako, Saitama 351-0198, Japan.
  • Yashiroda Y; RIKEN Center for Sustainable Resource Science (CSRS), Wako, Saitama 351-0198, Japan.
  • Hirano H; RIKEN Center for Sustainable Resource Science (CSRS), Wako, Saitama 351-0198, Japan.
  • Osada H; RIKEN Center for Sustainable Resource Science (CSRS), Wako, Saitama 351-0198, Japan.
  • Yoshida M; RIKEN Center for Sustainable Resource Science (CSRS), Wako, Saitama 351-0198, Japan.
  • Boone C; RIKEN Center for Sustainable Resource Science (CSRS), Wako, Saitama 351-0198, Japan.
  • Myers CL; RIKEN Center for Sustainable Resource Science (CSRS), Wako, Saitama 351-0198, Japan.
J Chem Inf Model; 61(9): 4156-4172, 2021-09-27.
Article in English | MEDLINE | ID: mdl-34318674
ABSTRACT
A common strategy for identifying molecules likely to possess a desired biological activity is to search large databases of compounds for high structural similarity to a query molecule that demonstrates this activity, under the assumption that structural similarity is predictive of similar biological activity. However, efforts to systematically benchmark the diverse array of available molecular fingerprints and similarity coefficients have been limited by a lack of large-scale datasets that reflect biological similarities of compounds. To elucidate the relative performance of these alternatives, we systematically benchmarked 11 different molecular fingerprint encodings, each combined with 13 different similarity coefficients, using a large set of chemical-genetic interaction data from the yeast Saccharomyces cerevisiae as a systematic proxy for biological activity. We found that the performance of different molecular fingerprints and similarity coefficients varied substantially and that the all-shortest path fingerprints paired with the Braun-Blanquet similarity coefficient provided superior performance that was robust across several compound collections. We further proposed a machine learning pipeline based on support vector machines that offered a fivefold improvement relative to the best unsupervised approach. Our results generally suggest that using high-dimensional chemical-genetic data as a basis for refining molecular fingerprints can be a powerful approach for improving prediction of biological functions from chemical structures.
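As a rough illustration of the similarity-search idea described in the abstract, the sketch below compares the Braun-Blanquet coefficient (shared on-bits divided by the larger of the two on-bit counts) with the more familiar Tanimoto coefficient and ranks a small library against a query. The fingerprints here are toy sets of on-bit indices, not the all-shortest path fingerprints used in the study, and the function names and compound labels are hypothetical.

```python
from typing import Callable, Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Shared on-bits divided by the size of the union of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def braun_blanquet(a: Set[int], b: Set[int]) -> float:
    """Shared on-bits divided by the larger of the two on-bit counts."""
    larger = max(len(a), len(b))
    return len(a & b) / larger if larger else 0.0


def rank_by_similarity(
    query: Set[int],
    library: Dict[str, Set[int]],
    coefficient: Callable[[Set[int], Set[int]], float],
) -> List[str]:
    """Return library compound names, most similar to the query first."""
    return sorted(
        library,
        key=lambda name: coefficient(query, library[name]),
        reverse=True,
    )


# Toy data (assumed for illustration only): each set lists the on-bit
# positions of a binary fingerprint.
query = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 9, 12, 15, 20},
    "cmpd_C": {2, 3},
}

ranking = rank_by_similarity(query, library, braun_blanquet)
```

For the query and `cmpd_B`, three bits are shared, so Braun-Blanquet gives 3/6 while Tanimoto gives 3/7; the two coefficients can therefore order candidates differently when fingerprint sizes vary, which is one reason a benchmark across coefficients is informative.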

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Support Vector Machine / Machine Learning Language: English Journal: J Chem Inf Model Journal subject: MEDICAL INFORMATICS / CHEMISTRY Year of publication: 2021 Document type: Article Country of affiliation: United States
