ABSTRACT
Integration of flexible data-analysis tools with cheminformatics methods is a prerequisite for successful identification and validation of "hits" in high-throughput screening (HTS) campaigns. We have designed, developed, and implemented a suite of robust yet flexible cheminformatics tools to support HTS activities at the Broad Institute, three of which are described herein. The "hit-calling" tool allows a researcher to set a hit threshold that can be varied during downstream analysis. The results from the hit-calling exercise are reported to a database for record keeping and further data analysis. The "cherry-picking" tool enables creation of an optimized list of hits for confirmatory and follow-up assays from an HTS hit list. This tool allows filtering by computed chemical property and by substructure. In addition, similarity searches can be performed on hits of interest, and sets of related compounds can be selected. The third tool, an "S/SAR viewer," has been designed specifically for the Broad Institute's diversity-oriented synthesis (DOS) collection. The compounds in this collection are rich in chiral centers, and the full complement of possible stereoisomers of a given compound is present in the collection. The S/SAR viewer allows rapid identification of both structure/activity relationships and stereo-structure/activity relationships present in HTS data from the DOS collection. Together, these tools enable the prioritization and analysis of hits from diverse compound collections and support informed decisions for follow-up biology and chemistry efforts.
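The idea of a hit threshold that can be varied during downstream analysis can be illustrated with a minimal sketch. This is not the Broad Institute tool: the `WellResult` record, the `call_hits` function, the compound identifiers, and the score values are all hypothetical, and the abstract does not say how activity is scored or in which direction the threshold applies. The sketch assumes a single normalized score per compound, where a higher score means stronger activity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WellResult:
    """One screening result (hypothetical record layout)."""
    compound_id: str
    score: float  # assumed normalized activity score; higher = more active


def call_hits(results, threshold):
    """Return IDs of compounds whose score meets or exceeds the threshold."""
    return [r.compound_id for r in results if r.score >= threshold]


# Illustrative data only; these are not values from the study.
results = [
    WellResult("CPD-001", 0.5),
    WellResult("CPD-002", 3.2),
    WellResult("CPD-003", 2.1),
    WellResult("CPD-004", 4.8),
]

# The same results can be re-called at different thresholds
# without re-running the screen.
strict_hits = call_hits(results, threshold=3.0)
relaxed_hits = call_hits(results, threshold=2.0)
```

Because the raw scores are kept separate from the hit call, lowering the threshold later only widens the hit list. It does not require new measurements.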
Subjects
Drug Design; High-Throughput Screening Assays; Structure-Activity Relationship; Algorithms; Combinatorial Chemistry Techniques; Databases, Factual; Humans
ABSTRACT
DNA-encoded small molecule libraries (DELs) have enabled discovery of novel inhibitors for many distinct protein targets of therapeutic value. We demonstrate a new approach applying machine learning to DEL selection data by identifying active molecules from large libraries of commercial and easily synthesizable compounds. We train models using only DEL selection data and apply automated or automatable filters to the predictions. We perform a large prospective study (~2000 compounds) across three diverse protein targets: sEH (a hydrolase), ERα (a nuclear receptor), and c-KIT (a kinase). The approach is effective, with an overall hit rate of ~30% at 30 µM and discovery of potent compounds (IC50 < 10 nM) for every target. The system makes useful predictions even for molecules dissimilar to the original DEL, and the compounds identified are diverse, predominantly drug-like, and different from known ligands. This work demonstrates a powerful new approach to hit-finding.