Folding the human proteome using BioNeMo: A fused dataset of structural models for machine learning purposes.
Hetmann, Michael; Parigger, Lena; Sirelkhatim, Hassan; Stern, Abraham; Krassnigg, Andreas; Gruber, Karl; Steinkellner, Georg; Ruau, David; Gruber, Christian C.
Affiliations
  • Hetmann M; Innophore, San Francisco, CA, USA.
  • Parigger L; Innophore, San Francisco, CA, USA.
  • Sirelkhatim H; NVIDIA, Santa Clara, CA, USA.
  • Stern A; NVIDIA, Santa Clara, CA, USA.
  • Krassnigg A; Innophore, San Francisco, CA, USA.
  • Gruber K; Innophore, San Francisco, CA, USA.
  • Steinkellner G; Innophore, San Francisco, CA, USA.
  • Ruau D; NVIDIA, Santa Clara, CA, USA. druau@nvidia.com.
  • Gruber CC; Innophore, San Francisco, CA, USA. christian.gruber@innophore.com.
Sci Data; 11(1): 591, 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38844754
ABSTRACT
Human proteins are crucial players in both health and disease. Understanding their molecular landscape is a central topic in biological research. Here, we present an extensive dataset of predicted protein structures for 42,042 distinct human proteins, including splicing variants, derived from the UniProt reference proteome UP000005640. To ensure high quality and comparability, the dataset was generated by combining the state-of-the-art modeling tools AlphaFold 2, OpenFold, and ESMFold, provided within NVIDIA's BioNeMo platform, with homology modeling using Innophore's CavitomiX platform. Our dataset is offered in both unedited and edited formats to suit diverse research requirements. The unedited version contains structures as generated by the different prediction methods, whereas the edited version contains refinements, including a dataset of structures without low prediction-confidence regions and structures in complex with predicted ligands based on homologs in the PDB. We are confident that this dataset represents the most comprehensive collection of human protein structures available today, facilitating diverse applications such as structure-based drug design and the prediction of protein function and interactions.
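The idea of a structure "without low prediction-confidence regions" can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that per-residue confidence (pLDDT, on a 0 to 100 scale) is stored in the B-factor column of a PDB-format file, as AlphaFold output does, and it uses a hypothetical cutoff of 70 that the abstract does not specify. The function name `strip_low_confidence` is likewise illustrative.

```python
def strip_low_confidence(pdb_text: str, min_plddt: float = 70.0) -> str:
    """Drop atom records whose B-factor (assumed to hold pLDDT) is below a cutoff.

    Assumptions (not from the source): pLDDT is on a 0-100 scale in the
    B-factor field, and 70.0 is only an illustrative threshold.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # PDB is a fixed-width format: columns 61-66 hold the B-factor.
            confidence = float(line[60:66])
            if confidence < min_plddt:
                continue  # skip atoms in low-confidence regions
        # Non-coordinate records (HEADER, TER, END, ...) pass through unchanged.
        kept.append(line)
    return "\n".join(kept)
```

Because predictors of this kind typically assign one confidence value per residue, filtering atom by atom removes whole residues. A call such as `strip_low_confidence(open("model.pdb").read())` would return the trimmed structure as text, where `model.pdb` is a placeholder file name.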
Full text: 1 Database: MEDLINE Main subject: Proteome / Machine Learning Language: En Publication year: 2024 Document type: Article