ABSTRACT
The proteins within the human epidermal growth factor receptor (EGFR) family, members of the tyrosine kinase receptor family, play a pivotal role in the molecular mechanisms driving the development of various tumors. Tyrosine kinase inhibitors, key compounds in targeted therapy, encounter challenges in cancer treatment due to emerging drug resistance mutations. Consequently, machine learning has undergone significant evolution to address the challenges of cancer drug discovery related to EGFR family proteins. However, the application of deep learning in this area is hindered by inherent difficulties associated with small-scale data, particularly the risk of overfitting. Moreover, the design of a model architecture that facilitates learning through multi-task and transfer learning, coupled with appropriate molecular representation, poses substantial challenges. In this study, we introduce GraphEGFR, a deep learning regression model designed to enhance molecular representation and model architecture for predicting the bioactivity of inhibitors against both wild-type and mutant EGFR family proteins. GraphEGFR integrates a graph attention mechanism for molecular graphs with deep and convolutional neural networks for molecular fingerprints. We observed that GraphEGFR models employing multi-task and transfer learning strategies generally achieve predictive performance comparable to existing competitive methods. The integration of molecular graphs and fingerprints adeptly captures relationships between atoms and enables both global and local pattern recognition. We further validated potential multi-targeted inhibitors for wild-type and mutant HER1 kinases, exploring key amino acid residues through molecular dynamics simulations to understand molecular interactions. This predictive model offers a robust strategy that could significantly contribute to overcoming the challenges of developing deep learning models for drug discovery with limited data and exploring new frontiers in multi-targeted kinase drug discovery for EGFR family proteins.
Subjects
Deep Learning , ErbB Receptors , Protein Kinase Inhibitors , ErbB Receptors/antagonists & inhibitors , ErbB Receptors/metabolism , ErbB Receptors/chemistry , Protein Kinase Inhibitors/pharmacology , Protein Kinase Inhibitors/chemistry , Humans , Machine Learning , Drug Discovery , Neural Networks, Computer
ABSTRACT
Graph neural networks (GNNs) offer an alternative approach to boost screening effectiveness in drug discovery. However, their efficacy is often hindered by limited datasets. To address this limitation, we introduced a robust GNN training framework and applied it to various chemical databases to identify potent non-nucleoside reverse transcriptase inhibitors (NNRTIs) against the challenging K103N-mutated HIV-1 RT. Leveraging self-supervised learning (SSL) pre-training to tackle data scarcity, we screened 1,824,367 compounds using a multi-step approach that incorporated machine learning (ML)-based screening, prediction of absorption, distribution, metabolism, and excretion (ADME), drug-likeness properties, and molecular docking. Ultimately, 45 compounds remained as potential candidates, 17 of which had previously been identified as NNRTIs, exemplifying the model's efficacy. The remaining 28 compounds are anticipated to be repurposed for new uses. Molecular dynamics (MD) simulations on the repurposed candidates unveiled two promising preclinical drugs: one designed against Plasmodium falciparum and the other serving as an antibacterial agent. Both have superior binding affinity compared to anti-HIV drugs. This conceptual framework could be adapted for other disease-specific therapeutics, facilitating the identification of potent compounds effective against both wild-type (WT) and mutant targets while revealing novel scaffolds for drug design and discovery.
ABSTRACT
In the pursuit of novel antiretroviral therapies targeting human immunodeficiency virus type-1 (HIV-1) proteases (PRs), recent improvements in drug discovery have embraced machine learning (ML) techniques to guide the design process. This study employs ensemble learning models to identify crucial substructures as significant features for drug development. A collection of 160 darunavir (DRV) analogs was designed based on these key substructures and subsequently screened using molecular docking techniques. Chemical structures with high fitness scores were selected, combined, and subjected to one-dimensional (1D) screening based on beyond Lipinski's rule of five (bRo5) and ADME (absorption, distribution, metabolism, and excretion) prediction, as implemented in the Combined Analog generator Tool (CAT) program. A total of 473 screened analogs were subjected to docking analysis with a convolutional neural network scoring function against both the wild-type (WT) and 12 major mutated PRs. DRV analogs with negative changes in binding free energy (ΔΔG_bind) compared to DRV could be categorized into four attractive groups based on their interactions with the majority of vital PRs. The analysis of interaction profiles revealed that potent designed analogs, targeting both WT and mutant PRs, exhibited interactions with common key amino acid residues. This observation further confirms that the ML model-guided approach effectively identified the substructures that play a crucial role in potent analogs. The approach is expected to function as a powerful computational tool, offering valuable guidance in the identification of chemical substructures for synthesis and subsequent experimental testing.
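The selection criterion mentioned above (a negative change in binding free energy relative to DRV) can be sketched in a few lines. This is only an illustration, assuming ΔΔG_bind is taken as the analog's binding free energy minus that of DRV against the same protease; the function names, target labels, and numbers are hypothetical and are not taken from the study.

```python
def ddg_bind(dg_analog, dg_drv):
    """Change in binding free energy of an analog relative to DRV."""
    return dg_analog - dg_drv


def targets_where_analog_wins(dg_analog_by_target, dg_drv_by_target):
    """Return the targets for which the analog has a negative ddG_bind."""
    return [
        target
        for target, dg_analog in dg_analog_by_target.items()
        if ddg_bind(dg_analog, dg_drv_by_target[target]) < 0
    ]


# Hypothetical binding free energies (kcal/mol) for illustration only.
drv = {"WT": -10.0, "mutant_A": -9.0, "mutant_B": -8.5}
analog = {"WT": -10.5, "mutant_A": -8.0, "mutant_B": -9.25}

better = targets_where_analog_wins(analog, drv)
print(better)
```

An analog that appears in this list for most targets would fall into the kind of group the abstract describes as interacting favorably with the majority of PRs.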
Subjects
HIV Infections , HIV Protease Inhibitors , HIV-1 , Humans , Darunavir/pharmacology , HIV Protease Inhibitors/pharmacology , HIV Protease Inhibitors/chemistry , Peptide Hydrolases/pharmacology , Molecular Docking Simulation , HIV Protease/chemistry , Drug Discovery
ABSTRACT
The urgent demand for chemical safety necessitates the real-time detection of carbon monoxide (CO), a highly toxic gas. MXene, a 2D material, has shown potential for gas sensing applications (e.g., NH3, NO, SO2, CO2) due to its high surface accessibility, electrical conductivity, stability, and flexibility in surface functionalization. However, pristine MXene generally interacts poorly with CO, whereas transition metal decoration can strengthen the interaction between CO and MXene. This study presents a high-throughput screening of 450 combinations of transition-metal (TM) decorated MXene (TM@MXene) for CO sensing applications using an integrated active learning (AL) and density functional theory (DFT) screening pipeline. Our AL pipeline, adopting a crystal graph convolutional neural network (CGCNN) as a surrogate model, successfully accelerates the screening of CO sensor candidates with minimal computational resources. This study identifies Sc@Zr3C2O2 and Y@Zr3C2O2 as the optimal TM@MXene candidates with promising CO sensing performance regarding the screening criteria of recovery time, surface stability, charge transfer, and sensitivity to CO. The proposed AL framework can be extended to property fine-tuning in combinatorial chemical space.
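The general shape of an active learning loop with a surrogate model can be sketched as follows. This is a toy illustration under stated assumptions, not the pipeline from the study: `expensive_oracle` is a hypothetical stand-in for a DFT calculation, the surrogate is a simple nearest-neighbour lookup rather than a CGCNN, candidates are reduced to a single made-up numeric feature, and the acquisition rule is purely greedy.

```python
import random


def expensive_oracle(x):
    """Hypothetical stand-in for a DFT calculation: a hidden score to maximise."""
    return -(x - 0.7) ** 2


def surrogate_predict(labelled, x):
    """Toy surrogate: reuse the score of the nearest already-labelled candidate."""
    nearest = min(labelled, key=lambda known: abs(known - x))
    return labelled[nearest]


def active_learning(pool, n_initial=5, batch_size=5, n_rounds=4, seed=0):
    """Label a small random seed set, then repeatedly label the batch
    that the surrogate currently ranks highest."""
    rng = random.Random(seed)
    labelled = {x: expensive_oracle(x) for x in rng.sample(pool, n_initial)}
    for _ in range(n_rounds):
        unlabelled = [x for x in pool if x not in labelled]
        # Rank unlabelled candidates by the surrogate's prediction.
        unlabelled.sort(key=lambda x: surrogate_predict(labelled, x), reverse=True)
        # Spend the expensive calculation only on the top-ranked batch.
        for x in unlabelled[:batch_size]:
            labelled[x] = expensive_oracle(x)
    return labelled


pool = [i / 449 for i in range(450)]  # 450 hypothetical candidates
labelled = active_learning(pool)
best = max(labelled, key=labelled.get)
print(len(labelled), best)
```

With these settings only 25 of the 450 candidates are ever passed to the expensive function, which is the source of the computational savings. A practical loop would usually also account for surrogate uncertainty when choosing the next batch.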
ABSTRACT
A multitargeted therapeutic approach with hybrid drugs is a promising strategy to enhance anticancer efficiency and overcome drug resistance in non-small cell lung cancer (NSCLC) treatment. Estimating the affinities of small molecules against targets of interest is typically a preliminary step in modern drug discovery in the pharmaceutical industry. In this investigation, we employed machine learning models to provide a computationally affordable means of computer-aided screening to accelerate the discovery of potential drug compounds. In particular, we introduced a quantitative structure-activity relationship (QSAR)-based multitask learning model to facilitate an in silico screening system for multitargeted drug development. Our method combines a recently developed graph-based neural network architecture, principal neighborhood aggregation (PNA), with a descriptor-based deep neural network, supporting synergistic utilization of molecular graph and fingerprint features. The model was built from more than ten thousand ligands with reported affinities for seven crucial receptor tyrosine kinases in NSCLC, drawn from two public data sources. As a result, our multitask model demonstrated better performance than all other benchmark models and achieved satisfactory predictive ability with respect to applicable QSAR criteria for most tasks within the model's applicability domain. Since our model could potentially be a screening tool for practical use, we have provided a freely accessible model implementation platform with a tutorial, offering a first step in the long journey of cancer drug development.
Subjects
Drug Discovery/methods , Ligands , Protein Kinase Inhibitors/chemistry , Receptor Protein-Tyrosine Kinases/chemistry , Algorithms , Carcinoma, Non-Small-Cell Lung , Databases, Pharmaceutical , Humans , Lung Neoplasms , Machine Learning , Protein Kinase Inhibitors/pharmacology , Quantitative Structure-Activity Relationship , Receptor Protein-Tyrosine Kinases/antagonists & inhibitors , Reproducibility of Results , Small Molecule Libraries , Workflow
ABSTRACT
Deep metric learning is a supervised learning paradigm for constructing a meaningful vector space to represent complex objects. A successful application of deep metric learning to pointsets means that we can avoid expensive retrieval operations on objects such as documents and can significantly facilitate many machine learning and data mining tasks involving pointsets. We propose a self-supervised deep metric learning solution for pointsets. The novelty of our proposed solution lies in a self-supervision mechanism that uses a distribution distance for set ranking, the Earth Mover's Distance (EMD), to generate pseudo labels, together with a pointset augmentation method that supports the learning solution. Our experimental studies on document, graph, and point cloud datasets show that our proposed solutions outperform baselines and state-of-the-art approaches in unsupervised settings. The learned self-supervised representation can also be used as a pre-trained model, which can boost downstream tasks with a fine-tuning step and outperform state-of-the-art language models.
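The idea of ranking pointsets by EMD to obtain pseudo labels can be illustrated with a small sketch. It assumes equal-size pointsets with uniform weights, in which case the EMD reduces to the mean cost of the best one-to-one matching. The brute-force search, the function names, the example sets, and the rule of treating the nearest and farthest sets as a pseudo positive and negative are all illustrative assumptions, not the authors' implementation.

```python
from itertools import permutations
from math import dist


def emd_equal_size(a, b):
    """EMD between two equal-size pointsets with uniform weights.

    Under these assumptions an optimal transport plan is a one-to-one
    matching, so a brute-force search over permutations is enough for
    tiny sets (the cost grows factorially with set size).
    """
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-size pointsets")
    n = len(a)
    best_total = min(
        sum(dist(a[i], b[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )
    return best_total / n


def rank_by_emd(anchor, candidates):
    """Sort candidate names from closest to farthest from the anchor set."""
    return sorted(candidates, key=lambda name: emd_equal_size(anchor, candidates[name]))


anchor = [(0.0, 0.0), (1.0, 0.0)]
candidates = {
    "shifted": [(0.0, 1.0), (1.0, 1.0)],
    "far": [(5.0, 0.0), (6.0, 0.0)],
    "swapped": [(1.0, 0.0), (0.0, 0.0)],
}

ranking = rank_by_emd(anchor, candidates)
pseudo_positive, pseudo_negative = ranking[0], ranking[-1]
print(ranking)
```

Because EMD ignores the order of points, the reordered copy of the anchor is at distance zero and ranks first, which is the property that makes it a natural distance for sets.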
ABSTRACT
The flexible tuning ability of dual-atom catalysts (DACs) makes them an ideal system for a wide range of electrochemical applications. However, the large design space of DACs and the complexity of the binding motifs of electrochemical intermediates hinder the efficient determination of DAC combinations with desirable catalytic properties. A crystal graph convolutional neural network (CGCNN) was adopted for DACs to accelerate the high-throughput screening of hydrogen evolution reaction (HER) catalysts. From a pool of 435 dual-atom combinations in N-doped graphene (N6Gr), we identified two high-performance HER catalysts (AuCo@N6Gr and NiNi@N6Gr) with excellent HER activity, electronic conductivity, and stability using the combination of CGCNN and density functional theory (DFT). Furthermore, comprehensive DFT studies were conducted on these two catalysts to confirm their outstanding reaction kinetics and to understand the cooperative effect between the metal pair for HER. To obtain ideal hydrogen binding in AuCo, the inert Au weakens the strong hydrogen binding of Co, while for NiNi, the two weakly binding Ni atoms cooperate. The present protocol was able to select two catalysts with different physical origins of HER activity and can be applied to other DACs, which should hasten catalyst discovery.
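The size of such a design space is easy to check by enumeration. The sketch below counts unordered metal pairs, including homonuclear pairs such as NiNi. The abstract does not list the metals considered, so the 29 placeholder labels are an assumption: 29 is simply the value of n for which n(n+1)/2 equals 435.

```python
from itertools import combinations_with_replacement


def dual_atom_pairs(metals):
    """Enumerate unordered metal pairs, including homonuclear ones."""
    return list(combinations_with_replacement(metals, 2))


# Placeholder labels only; the actual element set is not given in the abstract.
metals = [f"M{i}" for i in range(1, 30)]
pairs = dual_atom_pairs(metals)
print(len(pairs))
```

Each pair appears once regardless of order, so a heteronuclear site such as AuCo is not counted a second time as CoAu.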