ABSTRACT
Predicting drug-target interactions (DTIs) is a crucial task in drug discovery, but traditional wet-lab experiments are costly and time-consuming. Recently, deep learning has emerged as a promising tool for accelerating DTI prediction owing to its strong predictive performance. However, models trained on the limited known DTI data struggle to generalize to novel drug-target pairs. In this work, we propose a strategy to train an ensemble of models by capturing both domain-generic and domain-specific features (E-DIS) to learn diverse domain features and adapt them to out-of-distribution data. Multiple experts were trained on different domains to capture and align domain-specific information from various distributions without accessing any data from unseen domains. E-DIS provides a comprehensive representation of proteins and ligands by capturing diverse features. Experimental results on four benchmark data sets in both in-domain and cross-domain settings demonstrated that E-DIS significantly improved model performance and domain generalization compared to existing methods. By combining domain-generic and domain-specific features, our approach enhances the generalization ability of DTI prediction models.
Subjects
Deep Learning , Drug Discovery , Proteins , Drug Discovery/methods , Proteins/chemistry , Proteins/metabolism , Ligands , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Protein Domains
ABSTRACT
Molecular simulations, which model the motions of particles according to fundamental laws of physics, have been applied to a wide range of fields from physics and materials science to biochemistry and drug discovery. Developed for computationally intensive applications, most molecular simulation software involves significant use of hard-coded derivatives and code reuse across various programming languages. In this Review, we first examine the relationship between molecular simulations and artificial intelligence (AI) and reveal the coherence between the two. We then discuss how AI platforms can create new possibilities and deliver new solutions for molecular simulations, from the perspective of algorithms, programming paradigms, and even hardware. Rather than focusing solely on increasingly complex neural network models, we introduce various concepts and techniques brought about by modern AI and explore how they can be transferred to molecular simulations. To this end, we summarize several representative applications of molecular simulations enhanced by AI, including differentiable programming and high-throughput simulations. Finally, we look ahead to promising directions that may help address existing issues in the current framework of AI-enhanced molecular simulations.
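As an illustration of the contrast between hard-coded derivatives and differentiable programming mentioned above, the following minimal sketch derives a force from a potential by forward-mode automatic differentiation instead of a hand-written derivative. It is not taken from the Review: the Lennard-Jones potential, the `Dual` class, and the function names `lennard_jones` and `force` are assumptions chosen for illustration, and a real AI framework would supply the differentiation machinery itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative (forward-mode autodiff)."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = _lift(other)
        # Quotient rule.
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)

    def __rtruediv__(self, other):
        return _lift(other) / self

    def __pow__(self, n: int):
        # Power rule for a constant integer exponent.
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.der)


def _lift(x):
    """Wrap a plain number as a constant (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Illustrative pair potential U(r); only the energy is written by hand."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def force(r, potential=lennard_jones):
    """Force F(r) = -dU/dr, obtained by differentiating the energy code itself."""
    return -potential(Dual(r, 1.0)).der


if __name__ == "__main__":
    print(force(1.0))           # repulsive at short range
    print(force(2 ** (1 / 6)))  # approximately zero at the potential minimum
```

Only the energy expression is written explicitly; swapping in a different `potential` yields its force without deriving or coding a new derivative, which is the design point the differentiable-programming paradigm relies on.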
ABSTRACT
Data-driven predictive methods that can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and medical development. Determining an accurate folding landscape using coevolutionary information is fundamental to the success of modern protein structure prediction methods. As the state of the art, AlphaFold2 has dramatically raised the accuracy without performing explicit coevolutionary analysis. Nevertheless, its performance still depends strongly on the available sequence homologues. Based on an investigation into the cause of this dependence, we present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 for targets with poor multiple sequence alignments (MSAs). By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in the low-data regime and even achieve encouraging performance with single-sequence predictions. The ability to make accurate predictions with few-shot MSAs not only generalizes AlphaFold2 better to orphan sequences but also democratizes its use for high-throughput applications. In addition, EvoGen combined with AlphaFold2 yields a probabilistic structure generation method that could explore alternative conformations of protein sequences, and the task-aware differentiable algorithm for sequence generation will benefit other related tasks, including protein design.