MISATO: machine learning dataset of protein-ligand complexes for structure-based drug discovery.
Siebenmorgen, Till; Menezes, Filipe; Benassou, Sabrina; Merdivan, Erinc; Didi, Kieran; Mourão, André Santos Dias; Kitel, Radoslaw; Liò, Pietro; Kesselheim, Stefan; Piraud, Marie; Theis, Fabian J; Sattler, Michael; Popowicz, Grzegorz M.
Affiliation
  • Siebenmorgen T; Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany.
  • Menezes F; TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany.
  • Benassou S; Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany.
  • Merdivan E; TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany.
  • Didi K; Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany.
  • Mourão ASD; Helmholtz AI, Helmholtz Munich, Neuherberg, Germany.
  • Kitel R; Computer Laboratory, Cambridge University, Cambridge, UK.
  • Liò P; Molecular Targets and Therapeutics Center, Institute of Structural Biology, Helmholtz Munich, Neuherberg, Germany.
  • Kesselheim S; TUM School of Natural Sciences, Department of Bioscience, Bayerisches NMR Zentrum, Technical University of Munich, Garching, Germany.
  • Piraud M; Faculty of Chemistry, Jagiellonian University, Krakow, Poland.
  • Theis FJ; Computer Laboratory, Cambridge University, Cambridge, UK.
  • Sattler M; Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany.
  • Popowicz GM; Helmholtz AI, Helmholtz Munich, Neuherberg, Germany.
Nat Comput Sci; 4(5): 367-378, 2024 May.
Article in En | MEDLINE | ID: mdl-38730184
ABSTRACT
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 µs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.

Full text: 1 | Database: MEDLINE | Main subject: Proteins / Drug Discovery / Molecular Dynamics Simulation / Machine Learning | Language: En | Publication year: 2024 | Document type: Article
