Results 1 - 3 of 3
1.
PLoS One ; 16(7): e0253612, 2021.
Article in English | MEDLINE | ID: mdl-34283864

ABSTRACT

The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published 'in-house' efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
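The meta-ensemble described above, a linear combination of top predictions, can be illustrated with a minimal sketch. The abstract does not say how the combination weights were obtained, so ordinary least squares is an assumption here, and the function names and synthetic data are hypothetical. Weights are fitted and scored on the same samples for brevity; in practice they would be fitted on held-out predictions.

```python
import numpy as np


def fit_linear_ensemble(predictions, targets):
    """Return least-squares weights for combining model predictions.

    predictions: array of shape (n_samples, n_models), one column per model.
    targets: array of shape (n_samples,), the reference values.
    """
    weights, *_ = np.linalg.lstsq(predictions, targets, rcond=None)
    return weights


def ensemble_predict(predictions, weights):
    """Combine per-model predictions into one prediction per sample."""
    return predictions @ weights


# Synthetic stand-in data (not from the paper): three "models" that
# each approximate the same target with a different noise level.
rng = np.random.default_rng(0)
truth = rng.normal(size=500)
preds = np.column_stack(
    [truth + rng.normal(scale=s, size=500) for s in (0.3, 0.4, 0.5)]
)

weights = fit_linear_ensemble(preds, truth)
blend = ensemble_predict(preds, weights)

# Mean squared error of each individual model and of the blend.
mse_models = ((preds - truth[:, None]) ** 2).mean(axis=0)
mse_blend = ((blend - truth) ** 2).mean()
print("per-model MSE:", mse_models)
print("ensemble MSE:", mse_blend)
```

On the fitting data the least-squares blend cannot have a higher squared error than any single model, because each model is itself one possible choice of weights. Whether that advantage carries over to unseen data depends on how correlated the models' errors are.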


Subject(s)
Citizen Science/methods; Citizen Science/trends; Forecasting/methods; Algorithms; Community Participation; Humans; Machine Learning/trends; Magnetic Resonance Imaging/methods; Magnetic Resonance Spectroscopy/methods; Models, Statistical
2.
J Chem Inf Model ; 60(10): 4629-4639, 2020 10 26.
Article in English | MEDLINE | ID: mdl-32786700

ABSTRACT

Deep learning has demonstrated significant potential in advancing state of the art in many problem domains, especially those benefiting from automated feature extraction. Yet, the methodology has seen limited adoption in the field of ligand-based virtual screening (LBVS) as traditional approaches typically require large, target-specific training sets, which limits their value in most prospective applications. Here, we report the development of a neural network architecture and a learning framework designed to yield a generally applicable tool for LBVS. Our approach uses the molecular graph as input and involves learning a representation that places compounds of similar biological profiles in close proximity within a hyperdimensional feature space; this is achieved by simultaneously leveraging historical screening data against a multitude of targets during training. Cosine distance between molecules in this space becomes a general similarity metric and can readily be used to rank order database compounds in LBVS workflows. We demonstrate the resulting model generalizes exceptionally well to compounds and targets not used in its training. In three commonly employed LBVS benchmarks, our method outperforms popular fingerprinting algorithms without the need for any target-specific training. Moreover, we show the learned representation yields superior performance in scaffold hopping tasks and is largely orthogonal to existing fingerprints. Summarily, we have developed and validated a framework for learning a molecular representation that is applicable to LBVS in a target-agnostic fashion, with as few as one query compound. Our approach can also enable organizations to generate additional value from large screening data repositories, and to this end we are making its implementation freely available at https://github.com/totient-bio/gatnn-vs.
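The ranking step described above, ordering database compounds by cosine distance to a query in the learned feature space, might look like the following sketch. It is not taken from the linked repository: the function names are hypothetical, and the three-dimensional vectors are toy stand-ins for embeddings that the trained network would produce.

```python
import numpy as np


def cosine_distance(query, database):
    """Cosine distance from one query vector to each row of database.

    query: array of shape (n_features,).
    database: array of shape (n_compounds, n_features).
    """
    q = query / np.linalg.norm(query)
    d = database / np.linalg.norm(database, axis=1, keepdims=True)
    return 1.0 - d @ q


def rank_by_similarity(query, database):
    """Return database row indices ordered from most to least similar."""
    return np.argsort(cosine_distance(query, database))


# Toy embeddings (assumed values, for illustration only).
query = np.array([1.0, 0.0, 0.0])
database = np.array(
    [
        [0.0, 1.0, 0.0],   # orthogonal to the query
        [2.0, 0.1, 0.0],   # nearly parallel to the query
        [-1.0, 0.0, 0.0],  # opposite direction
    ]
)

ranking = rank_by_similarity(query, database)
print(ranking)
```

Because cosine distance ignores vector length, the second compound ranks first even though its embedding has a larger norm than the query.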


Subject(s)
Algorithms; Databases, Factual; Ligands; Prospective Studies
3.
Pac Symp Biocomput ; 22: 154-165, 2017.
Article in English | MEDLINE | ID: mdl-27896971

ABSTRACT

As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines which interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and allow for easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.


Subject(s)
Software; Workflow; Computational Biology; Humans; Models, Statistical; Reproducibility of Results