MoleculeNet: a benchmark for molecular machine learning.

Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S; Leswing, Karl; Pande, Vijay

Affiliation

Wu Z; Department of Chemistry, Stanford University, Stanford, CA 94305, USA. Email: pande@stanford.edu.
Ramsundar B; Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
Feinberg EN; Program in Biophysics, Stanford School of Medicine, Stanford, CA 94305, USA.
Gomes J; Department of Chemistry, Stanford University, Stanford, CA 94305, USA. Email: pande@stanford.edu.
Geniesse C; Program in Biophysics, Stanford School of Medicine, Stanford, CA 94305, USA.
Pappu AS; Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
Leswing K; Schrodinger Inc., USA.
Pande V; Department of Chemistry, Stanford University, Stanford, CA 94305, USA. Email: pande@stanford.edu.

Chem Sci; 9(2): 513-530, 2018 Jan 14.

Article in En | MEDLINE | ID: mdl-29629118

ABSTRACT

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the availability of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited by the lack of a standard benchmark for comparing the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle with complex tasks under data scarcity and with highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm.

Full text: 1 Database: MEDLINE Language: En Journal: Chem Sci Publication year: 2018 Document type: Article
