End-to-end sequence-structure-function meta-learning predicts genome-wide chemical-protein interactions for dark proteins.
Cai, Tian; Xie, Li; Zhang, Shuo; Chen, Muge; He, Di; Badkul, Amitesh; Liu, Yang; Namballa, Hari Krishna; Dorogan, Michael; Harding, Wayne W; Mura, Cameron; Bourne, Philip E; Xie, Lei.
Affiliation
  • Cai T; Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America.
  • Xie L; Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America.
  • Zhang S; Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America.
  • Chen M; Master Program in Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America.
  • He D; Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America.
  • Badkul A; Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America.
  • Liu Y; Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America.
  • Namballa HK; Department of Chemistry, Hunter College, The City University of New York, New York, New York, United States of America.
  • Dorogan M; Department of Chemistry, Hunter College, The City University of New York, New York, New York, United States of America.
  • Harding WW; Department of Chemistry, Hunter College, The City University of New York, New York, New York, United States of America.
  • Mura C; School of Data Science & Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America.
  • Bourne PE; School of Data Science & Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America.
  • Xie L; Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, United States of America.
PLoS Comput Biol; 19(1): e1010851, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36652496
ABSTRACT
Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain "dark", i.e., their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy to reduce the impact of inaccuracy of predicted structures on function predictions by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step that uses gene families in the test data different from those in the training and development data sets, to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art techniques of machine learning and protein-ligand docking when applied to dark gene families, and demonstrated its generalization power for target identification and compound screening under out-of-distribution (OOD) scenarios. Furthermore, in an external validation for multi-target compound screening, the performance of PortalCG surpassed the rational design from medicinal chemists.
Our results also suggest that a differentiable sequence-structure-function deep learning framework, where protein structural information serves as an intermediate layer, could be superior to conventional methodology where predicted protein structures are used for compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggest that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space.
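The family-disjoint evaluation described in component (iv) can be illustrated with a minimal sketch. This is not the PortalCG code: the function names, the record layout `(protein, family, ligand)`, and the toy identifiers are all hypothetical, and keeping the development families disjoint from the training families as well is an assumption made here for illustration.

```python
def split_by_family(pairs, dev_families, test_families):
    """Partition (protein, family, ligand) records so that no gene
    family appears in more than one of train / dev / test.

    Hypothetical helper for illustration only; not the authors' code.
    """
    dev_families, test_families = set(dev_families), set(test_families)
    if dev_families & test_families:
        raise ValueError("dev and test families must be disjoint")

    splits = {"train": [], "dev": [], "test": []}
    for record in pairs:
        family = record[1]
        if family in test_families:
            splits["test"].append(record)    # held-out "dark" families
        elif family in dev_families:
            splits["dev"].append(record)     # families used for model selection
        else:
            splits["train"].append(record)   # everything else
    return splits


def families_in(records):
    """Return the set of family labels present in a list of records."""
    return {family for _, family, _ in records}


# Toy placeholder data (made-up identifiers, not from the paper).
pairs = [
    ("P1", "famA", "L1"),
    ("P2", "famA", "L2"),
    ("P3", "famB", "L3"),
    ("P4", "famC", "L4"),
]
splits = split_by_family(pairs, dev_families={"famB"}, test_families={"famC"})
```

Splitting on the family label rather than on individual proteins is what makes the test set out-of-distribution: a random split over protein-ligand pairs would let close homologs of the test proteins leak into training.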
Full text: 1 | Databases: MEDLINE | Main subject: Algorithms / Proteins | Study type: Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Journal: PLoS Comput Biol | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year: 2023 | Document type: Article | Affiliation country: United States
