1.
Nature ; 620(7972): 47-60, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37532811

ABSTRACT

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.


Subjects
Artificial Intelligence , Research Design , Artificial Intelligence/standards , Artificial Intelligence/trends , Datasets as Topic , Deep Learning , Research Design/standards , Research Design/trends , Unsupervised Machine Learning
3.
J Chem Inf Model ; 60(4): 1955-1968, 2020 04 27.
Article in English | MEDLINE | ID: mdl-32243153

ABSTRACT

One of the key requirements for incorporating machine learning (ML) into the drug discovery process is complete traceability and reproducibility of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing ML models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine, or AMPL, extends the functionality of the open source library DeepChem and supports an array of ML and molecular featurization tools. We have benchmarked AMPL on a large collection of pharmaceutical data sets covering a wide range of parameters. Our key findings indicate that traditional molecular fingerprints underperform other feature representation methods. We also find that data set size correlates directly with prediction performance, which points to the need to expand public data sets. Uncertainty quantification can help predict model error, but correlation with error varies considerably between data sets and model types. Our findings point to the need for an extensible pipeline that can be shared to make model building more widely accessible and reproducible. This software is open source and available at: https://github.com/ATOMconsortium/AMPL.


Subjects
Drug Discovery , Software , Machine Learning , Reproducibility of Results
4.
PLoS Comput Biol ; 14(6): e1006176, 2018 06.
Article in English | MEDLINE | ID: mdl-29927936

ABSTRACT

We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture allowing a single model to be applied to arbitrary target structures of any length. After training it on randomly generated targets, we test it on the Eterna100 benchmark and find it outperforms all previous algorithms. Analysis of its solutions shows it has successfully learned some advanced strategies identified by players of the game Eterna, allowing it to solve some very difficult structures. On the other hand, it has failed to learn other strategies, possibly because they were not required for the targets in the training set. This suggests the possibility that future improvements to the training protocol may yield further gains in performance.
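As a rough illustration of the design task in this abstract, the sketch below checks whether a candidate sequence could form the base pairs of a target structure. It assumes the target is written in dot-bracket notation, which the abstract does not state, and the function names `parse_pairs` and `is_compatible` are hypothetical. Pairing compatibility is only a necessary condition: confirming that a sequence actually folds to the target in silico requires a folding program, which is not shown here.

```python
# Assumption: targets are given in dot-bracket notation, where "(" and ")"
# mark paired positions and "." marks unpaired positions.

# Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_pairs(structure):
    """Return a dict mapping every paired index to its partner index."""
    stack = []
    pairs = {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def is_compatible(sequence, structure):
    """True if every paired position holds bases that are able to pair."""
    if len(sequence) != len(structure):
        return False
    pairs = parse_pairs(structure)
    return all(
        (sequence[i], sequence[j]) in ALLOWED_PAIRS for i, j in pairs.items()
    )
```

A design agent of the kind described would still need a folding step to score its candidates; a check like this one only filters out sequences that cannot match the target at all.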


Subjects
Computer-Aided Design/instrumentation , RNA/chemistry , Algorithms , Computer Simulation , Learning , Machine Learning , Nucleic Acid Conformation , Problem Solving , RNA Folding/physiology
5.
J Chem Inf Model ; 57(8): 2068-2076, 2017 08 28.
Article in English | MEDLINE | ID: mdl-28692267

ABSTRACT

Multitask deep learning has emerged as a powerful tool for computational drug discovery. However, despite a number of preliminary studies, multitask deep networks have yet to be widely deployed in the pharmaceutical and biotech industries. This lack of acceptance stems from both software difficulties and lack of understanding of the robustness of multitask deep networks. Our work aims to resolve both of these barriers to adoption. We introduce a high-quality open-source implementation of multitask deep networks as part of the DeepChem open-source platform. Our implementation enables simple Python scripts to construct, fit, and evaluate sophisticated deep models. We use our implementation to analyze the performance of multitask deep networks and related deep models on four collections of pharmaceutical data (three of which have not previously been analyzed in the literature). We split these data sets into train/valid/test using time and neighbor splits to test multitask deep learning performance under challenging conditions. Our results demonstrate that multitask deep networks are surprisingly robust and can offer strong improvement over random forests. Our analysis and open-source implementation in DeepChem provide an argument that multitask deep networks are ready for widespread use in commercial drug discovery.
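The time split mentioned in this abstract can be sketched in a few lines: order the compounds by date, then train on the oldest and test on the newest, so the model is always evaluated on compounds measured after those it was trained on. This is a plain-Python illustration, not the DeepChem implementation; the record layout (a `date` field holding an ISO date string), the function name `time_split`, and the 80/10/10 fractions are assumptions made for the example.

```python
def time_split(records, frac_train=0.8, frac_valid=0.1):
    """Split records chronologically into train, valid and test lists.

    Each record is assumed to be a dict with a "date" key holding an ISO
    date string (YYYY-MM-DD), which sorts correctly as plain text.
    """
    ordered = sorted(records, key=lambda record: record["date"])
    n_train = int(frac_train * len(ordered))
    n_valid = int(frac_valid * len(ordered))
    train = ordered[:n_train]
    valid = ordered[n_train:n_train + n_valid]
    test = ordered[n_train + n_valid:]
    return train, valid, test


# Hypothetical data: ten compounds registered in ten consecutive months.
compounds = [
    {"smiles": "C" * (i + 1), "date": "2016-%02d-01" % (i + 1)}
    for i in range(10)
]
train, valid, test = time_split(compounds)
```

Compared with a random split, this usually gives a harsher and more realistic estimate of performance, because later compounds often come from chemical series the model has not seen.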


Subjects
Drug Discovery/methods , Machine Learning , Radiation Absorption , Inhibitory Concentration 50 , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Serine Proteinase Inhibitors/chemistry , Serine Proteinase Inhibitors/pharmacology , Software , Ultraviolet Rays
6.
J Chem Inf Model ; 56(10): 1936-1949, 2016 10 24.
Article in English | MEDLINE | ID: mdl-27689393

ABSTRACT

The binding affinities (IC50) reported for diverse structural and chemical classes of human β-secretase 1 (BACE-1) inhibitors in the literature were modeled using multiple in silico ligand-based modeling approaches and statistical techniques. The descriptor space encompasses simple binary molecular fingerprints, one- and two-dimensional constitutional, physicochemical, and topological descriptors, and sophisticated three-dimensional molecular fields that require appropriate structural alignments of varied chemical scaffolds in one universal chemical space. The affinities were modeled using qualitative classification or quantitative regression schemes involving linear, nonlinear, and deep neural network (DNN) machine-learning methods used in the scientific literature for quantitative structure-activity relationships (QSAR). In a departure from tradition, ∼20% of the chemically diverse data set (205 compounds) was used to train the model with the remaining ∼80% of the structural and chemical analogs used as part of an external validation (1273 compounds) and prospective test (69 compounds) sets respectively to ascertain the model performance. The machine-learning methods investigated herein performed well in both the qualitative classification (∼70% accuracy) and quantitative IC50 predictions (RMSE ∼ 1 log). The success of the 2D descriptor based machine learning approach when compared against the 3D field based technique pursued for hBACE-1 inhibitors provides a strong impetus for systematically applying such methods during the lead identification and optimization efforts for other protein families as well.
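To make the "RMSE ∼ 1 log" figure concrete, the sketch below converts IC50 values to a log scale and computes a root-mean-square error on that scale. It assumes the common pIC50 convention (negative base-10 logarithm of the IC50 in molar units) and IC50 values given in nanomolar; the abstract does not say which units or convention the authors used, and the numbers are invented for illustration.

```python
import math


def pic50_from_nanomolar(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); for nanomolar input this is 9 - log10."""
    return 9.0 - math.log10(ic50_nm)


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Invented example: three compounds, each prediction off by one log unit.
observed = [6.0, 7.0, 8.0]
predicted = [7.0, 6.0, 9.0]
error_in_log_units = rmse(predicted, observed)

# An error of one log unit corresponds to a tenfold error in the IC50 itself.
fold_error = 10 ** error_in_log_units
```

Read this way, an RMSE near one log unit means a typical prediction lands within about an order of magnitude of the measured IC50.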


Subjects
Amyloid Precursor Protein Secretases/antagonists & inhibitors , Aspartic Acid Endopeptidases/antagonists & inhibitors , Drug Discovery , Amyloid Precursor Protein Secretases/chemistry , Amyloid Precursor Protein Secretases/metabolism , Aspartic Acid Endopeptidases/chemistry , Aspartic Acid Endopeptidases/metabolism , Computer Simulation , Drug Discovery/methods , Humans , Ligands , Machine Learning , Molecular Models , Computer Neural Networks , Quantitative Structure-Activity Relationship , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology
7.
mSystems ; 6(6): e0023321, 2021 Dec 21.
Article in English | MEDLINE | ID: mdl-34726496

ABSTRACT

After emerging in China in late 2019, the novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread worldwide, and as of mid-2021, it remains a significant threat globally. Only a few coronaviruses are known to infect humans, and only two cause infections similar in severity to SARS-CoV-2: Severe acute respiratory syndrome-related coronavirus, a species closely related to SARS-CoV-2 that emerged in 2002, and Middle East respiratory syndrome-related coronavirus, which emerged in 2012. Unlike the current pandemic, previous epidemics were controlled rapidly through public health measures, but the body of research investigating severe acute respiratory syndrome and Middle East respiratory syndrome has proven valuable for identifying approaches to treating and preventing novel coronavirus disease 2019 (COVID-19). Building on this research, the medical and scientific communities have responded rapidly to the COVID-19 crisis and identified many candidate therapeutics. The approaches used to identify candidates fall into four main categories: adaptation of clinical approaches to diseases with related pathologies, adaptation based on virological properties, adaptation based on host response, and data-driven identification (ID) of candidates based on physical properties or on pharmacological compendia. To date, a small number of therapeutics have already been authorized by regulatory agencies such as the Food and Drug Administration (FDA), while most remain under investigation. The scale of the COVID-19 crisis offers a rare opportunity to collect data on the effects of candidate therapeutics. This information provides insight not only into the management of coronavirus diseases but also into the relative success of different approaches to identifying candidate therapeutics against an emerging disease.

IMPORTANCE: The COVID-19 pandemic is a rapidly evolving crisis. With the worldwide scientific community shifting focus onto the SARS-CoV-2 virus and COVID-19, a large number of possible pharmaceutical approaches for treatment and prevention have been proposed. What was known about each of these potential interventions evolved rapidly throughout 2020 and 2021. This fast-paced area of research provides important insight into how the ongoing pandemic can be managed and also demonstrates the power of interdisciplinary collaboration to rapidly understand a virus and match its characteristics with existing or novel pharmaceuticals. As illustrated by the continued threat of viral epidemics during the current millennium, a rapid and strategic response to emerging viral threats can save lives. In this review, we explore how different modes of identifying candidate therapeutics have borne out during COVID-19.

8.
ArXiv ; 2021 Mar 03.
Article in English | MEDLINE | ID: mdl-33688554

ABSTRACT

After emerging in China in late 2019, the novel coronavirus SARS-CoV-2 spread worldwide and as of mid-2021 remains a significant threat globally. Only a few coronaviruses are known to infect humans, and only two cause infections similar in severity to SARS-CoV-2: Severe acute respiratory syndrome-related coronavirus, a closely related species of SARS-CoV-2 that emerged in 2002, and Middle East respiratory syndrome-related coronavirus, which emerged in 2012. Unlike the current pandemic, previous epidemics were controlled rapidly through public health measures, but the body of research investigating severe acute respiratory syndrome and Middle East respiratory syndrome has proven valuable for identifying approaches to treating and preventing novel coronavirus disease 2019 (COVID-19). Building on this research, the medical and scientific communities have responded rapidly to the COVID-19 crisis to identify many candidate therapeutics. The approaches used to identify candidates fall into four main categories: adaptation of clinical approaches to diseases with related pathologies, adaptation based on virological properties, adaptation based on host response, and data-driven identification of candidates based on physical properties or on pharmacological compendia. To date, a small number of therapeutics have already been authorized by regulatory agencies such as the Food and Drug Administration (FDA), while most remain under investigation. The scale of the COVID-19 crisis offers a rare opportunity to collect data on the effects of candidate therapeutics. This information provides insight not only into the management of coronavirus diseases, but also into the relative success of different approaches to identifying candidate therapeutics against an emerging disease.

9.
Nat Med ; 25(1): 24-29, 2019 01.
Article in English | MEDLINE | ID: mdl-30617335

ABSTRACT

Here we present deep-learning techniques for healthcare, centering our discussion on deep learning in computer vision, natural language processing, reinforcement learning, and generalized methods. We describe how these computational techniques can impact a few key areas of medicine and explore how to build end-to-end systems. Our discussion of computer vision focuses largely on medical imaging, and we describe the application of natural language processing to domains such as electronic health record data. Similarly, reinforcement learning is discussed in the context of robotic-assisted surgery, and generalized deep-learning methods for genomics are reviewed.


Subjects
Deep Learning , Delivery of Health Care , Diagnostic Imaging , Electronic Health Records , Humans , Natural Language Processing
10.
Chem Sci ; 9(2): 513-530, 2018 Jan 14.
Article in English | MEDLINE | ID: mdl-29629118

ABSTRACT

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

11.
ACS Cent Sci ; 4(11): 1520-1530, 2018 Nov 28.
Article in English | MEDLINE | ID: mdl-30555904

ABSTRACT

The arc of drug discovery entails a multiparameter optimization problem spanning vast length scales. The key parameters range from solubility (angstroms) to protein-ligand binding (nanometers) to in vivo toxicity (meters). Through feature learning (instead of feature engineering), deep neural networks promise to outperform both traditional physics-based and knowledge-based machine learning models for predicting molecular properties pertinent to drug discovery. To this end, we present the PotentialNet family of graph convolutions. These models are specifically designed for and achieve state-of-the-art performance for protein-ligand binding affinity. We further validate these deep neural networks by setting new standards of performance in several ligand-based tasks. In parallel, we introduce a new metric, the Regression Enrichment Factor EFχ(R), to measure the early enrichment of computational models for chemical data. Finally, we introduce a cross-validation strategy based on structural homology clustering that can more accurately measure model generalizability, which crucially distinguishes the aims of machine learning for drug discovery from standard machine learning tasks.

12.
ACS Cent Sci ; 3(4): 283-293, 2017 Apr 26.
Article in English | MEDLINE | ID: mdl-28470045

ABSTRACT

Recent advances in machine learning have made significant contributions to drug discovery. Deep neural networks in particular have been demonstrated to provide significant boosts in predictive power when inferring the properties and activities of small-molecule compounds (Ma, J. et al. J. Chem. Inf. Model. 2015, 55, 263-274). However, the applicability of these techniques has been limited by the requirement for large amounts of training data. In this work, we demonstrate how one-shot learning can be used to significantly lower the amounts of data required to make meaningful predictions in drug discovery applications. We introduce a new architecture, the iterative refinement long short-term memory, that, when combined with graph convolutional neural networks, significantly improves learning of meaningful distance metrics over small molecules. We open source all models introduced in this work as part of DeepChem, an open-source framework for deep learning in drug discovery (Ramsundar, B. deepchem.io. https://github.com/deepchem/deepchem, 2016).

13.
ACS Cent Sci ; 3(10): 1103-1113, 2017 Oct 25.
Article in English | MEDLINE | ID: mdl-29104927

ABSTRACT

We describe a fully data-driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture that consists of two recurrent neural networks, an architecture that has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation. The model is trained on 50,000 experimental reaction examples from the United States patent literature, which span 10 broad reaction types that are commonly used by medicinal chemists. We find that our model performs comparably with a rule-based expert system baseline model, and also overcomes certain limitations associated with rule-based expert systems and with any machine learning approach that contains a rule-based expert system component. Our model provides an important first step toward solving the challenging problem of computational retrosynthetic analysis.
