ABSTRACT
The development of STING inhibitors for the treatment of STING-related inflammatory diseases continues to encounter significant challenges. The activation of STING is a multi-step process that includes binding with cGAMP, self-oligomerization, and translocation from the endoplasmic reticulum to the Golgi apparatus, ultimately inducing the expression of IRF3- and NF-κB-mediated interferons and inflammatory cytokines. It has been demonstrated that disruption of any of these steps can effectively inhibit STING activation. Traditional structure-based drug screening methodologies generally focus on specific binding sites. In this study, a TransformerCPI model based on protein primary sequences and independent of binding sites is employed to identify compounds capable of binding to the STING protein. The natural product Licochalcone D (LicoD) is identified as a potent and selective STING inhibitor. LicoD does not bind to the classical ligand-binding pocket; instead, it covalently modifies the Cys148 residue of STING. This modification inhibits STING oligomerization, consequently suppressing the recruitment of TBK1 and the nuclear translocation of IRF3 and NF-κB. LicoD treatment ameliorates the inflammatory phenotype in Trex1-/- mice and inhibits the progression of DSS-induced colitis and AOM/DSS-induced colitis-associated colon cancer (CAC). In summary, this study reveals the potential of LicoD in treating STING-driven inflammatory diseases. It also demonstrates the utility of the TransformerCPI model in discovering allosteric compounds beyond the conventional binding pockets.
ABSTRACT
Extracting knowledge from complex and diverse chemical texts is a pivotal task for both experimental and computational chemists. The task remains extremely challenging due to the complexity of the chemical language and scientific literature. This study explored the power of fine-tuned large language models (LLMs) on five intricate chemical text mining tasks: compound entity recognition, reaction role labelling, metal-organic framework (MOF) synthesis information extraction, nuclear magnetic resonance spectroscopy (NMR) data extraction, and the conversion of reaction paragraphs to action sequences. The fine-tuned LLMs demonstrated impressive performance, significantly reducing the need for repetitive and extensive prompt engineering experiments. For comparison, we guided ChatGPT (GPT-3.5-turbo) and GPT-4 with prompt engineering and fine-tuned GPT-3.5-turbo as well as other open-source LLMs such as Mistral, Llama3, Llama2, T5, and BART. The results showed that the fine-tuned ChatGPT models excelled in all tasks. They achieved exact accuracy levels ranging from 69% to 95% on these tasks with minimal annotated data. They even outperformed task-adaptive pre-trained and fine-tuned models that were trained on a significantly larger amount of in-domain data. Notably, fine-tuned Mistral and Llama3 showed competitive abilities. Given their versatility, robustness, and low-code capability, leveraging fine-tuned LLMs as flexible and effective toolkits for automated data acquisition could revolutionize chemical knowledge extraction.
ABSTRACT
Ensuring drug safety in the early stages of drug development is crucial to avoid costly failures in subsequent phases. However, the economic burden associated with detecting drug off-targets and potential side effects through in vitro safety screening and animal testing is substantial. Drug off-target interactions, along with the adverse drug reactions they induce, are significant factors affecting drug safety. To assess the liability of candidate drugs, we developed an artificial intelligence model for the precise prediction of compound off-target interactions, leveraging multi-task graph neural networks. The outcomes of off-target predictions can serve as representations for compounds, enabling the differentiation of drugs under various ATC codes and the classification of compound toxicity. Furthermore, the predicted off-target profiles are employed in adverse drug reaction (ADR) enrichment analysis, facilitating the inference of potential ADRs for a drug. Using the withdrawn drug pergolide as an example, we elucidate the mechanisms underlying ADRs at the target level, contributing to the exploration of the potential clinical relevance of newly predicted off-target interactions. Overall, our work facilitates the early assessment of compound safety/toxicity based on off-target identification, deduces potential ADRs of drugs, and ultimately supports safer drug development.
ABSTRACT
Small-molecule glucagon-like peptide-1 receptor (GLP-1R) agonists are recognized as promising therapeutics for type 2 diabetes mellitus (T2DM) and obesity. Danuglipron, an investigational small-molecule agonist, has demonstrated high efficacy in clinical trials. However, further development of danuglipron is challenged by a high rate of gastrointestinal adverse events. While these effects may be target-related, it is plausible that the carboxylic acid group present in danuglipron may also play a role in these outcomes by affecting the pharmacokinetic properties and dosing regimen of danuglipron, as well as by exerting direct gastrointestinal irritation. Therefore, this study aims to replace the problematic carboxylic acid group by exploring the internal binding cavity of danuglipron bound to GLP-1R using a water molecule displacement strategy. A series of novel triazole-containing compounds have been designed and synthesized during the structure-activity relationship (SAR) study. These efforts resulted in the discovery of compound 2j with high potency (EC50 = 0.065 nM). Moreover, docking simulations revealed that compound 2j directly interacts with the residue Glu387 within the internal cavity of GLP-1R, effectively displacing the structural water previously bound to Glu387. Subsequent in vitro and in vivo experiments demonstrated that compound 2j had comparable efficacy to danuglipron in enhancing insulin secretion and improving glycemic control. Collectively, this study offers a practicable approach for the discovery of novel small-molecule GLP-1R agonists based on danuglipron, and compound 2j may serve as a lead compound to further exploit the unoccupied internal cavity of danuglipron's binding pocket.
Subjects
Glucagon-Like Peptide-1 Receptor Agonists , Animals , Humans , Male , Mice , Diabetes Mellitus, Type 2/drug therapy , Diabetes Mellitus, Type 2/metabolism , Dose-Response Relationship, Drug , Glucagon-Like Peptide-1 Receptor/agonists , Glucagon-Like Peptide-1 Receptor/metabolism , Glucagon-Like Peptide-1 Receptor Agonists/chemistry , Glucagon-Like Peptide-1 Receptor Agonists/pharmacology , Hypoglycemic Agents/pharmacology , Hypoglycemic Agents/chemistry , Hypoglycemic Agents/chemical synthesis , Molecular Docking Simulation , Molecular Structure , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Small Molecule Libraries/chemical synthesis , Structure-Activity Relationship , Triazoles/chemistry , Triazoles/pharmacology , Triazoles/chemical synthesis
ABSTRACT
Lipophilicity is a fundamental physical property that significantly affects various aspects of drug behavior, including solubility, permeability, metabolism, distribution, protein binding, and toxicity. Accurate prediction of lipophilicity, measured by the logD7.4 value (the distribution coefficient between n-octanol and buffer at physiological pH 7.4), is crucial for successful drug discovery and design. However, the limited availability of data for logD modeling poses a significant challenge to achieving satisfactory generalization capability. To address this challenge, we have developed a novel logD7.4 prediction model called RTlogD, which leverages knowledge from multiple sources. RTlogD is pre-trained on a chromatographic retention time (RT) dataset, since RT is influenced by lipophilicity. Additionally, microscopic pKa values are incorporated as atomic features, providing valuable insights into ionizable sites and ionization capacity. Furthermore, logP is integrated as an auxiliary task within a multitask learning framework. We conducted ablation studies and presented a detailed analysis, showcasing the effectiveness and interpretability of RT, pKa, and logP in the RTlogD model. Notably, our RTlogD model demonstrated superior performance compared to commonly used algorithms and prediction tools. These results underscore the potential of the RTlogD model to improve the accuracy and generalization of logD prediction in drug discovery and design. In summary, the RTlogD model addresses the challenge of limited data availability in logD modeling by leveraging knowledge from RT, microscopic pKa, and logP. Incorporating these factors enhances the predictive capabilities of our model, and it holds promise for real-world applications in drug discovery and design scenarios.
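As an illustration of why pKa and logP carry information about logD, the sketch below uses the textbook relationship for a compound with a single ionizable site, under the assumption that only the neutral species partitions into n-octanol. It is not the RTlogD model, which is a learned predictor; the function name `logd_monoprotic` and its parameters are hypothetical and chosen only for this example.

```python
import math


def logd_monoprotic(logp: float, pka: float, ph: float = 7.4, acid: bool = True) -> float:
    """Estimate logD at a given pH for a compound with one ionizable site.

    Assumption: only the neutral form partitions into n-octanol, so
    logD = logP - log10(1 + 10**delta), where delta is (pH - pKa) for an
    acid and (pKa - pH) for a base.
    """
    delta = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** delta)


if __name__ == "__main__":
    # Illustrative inputs only, not measured data: an acid with pKa 4.4
    # is mostly ionized at pH 7.4, so its logD falls well below its logP.
    print(round(logd_monoprotic(logp=3.0, pka=4.4), 3))
    # A base with pKa 9.4 is also mostly ionized at pH 7.4.
    print(round(logd_monoprotic(logp=3.0, pka=9.4, acid=False), 3))
```

For compounds with several ionizable sites this single-site formula no longer applies, which is one reason site-level (microscopic) pKa information is useful as a model input.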
ABSTRACT
Meteorological factors, which are periodic and regular in the long run, have a considerable impact on human health. Accurate health risk prediction based on meteorological factors is essential for optimal allocation of resources in healthcare units. However, due to the non-stationary and non-linear nature of the original hospitalization sequence, traditional methods are less robust in predicting it. This study aims to investigate hospital admission prediction models using time series pre-processing algorithms and a deep learning approach based on meteorological factors. Using the electronic medical record data from Panyu Central Hospital and meteorological data of Panyu district from 2003 to 2019, 46,089 eligible patients with lower respiratory tract infections (LRTIs) and four meteorological factors were identified to build and evaluate the prediction models. A novel hybrid model, the Cascade GAM-CEEMDAN-LSTM Model (CGCLM), was established by combining a generalized additive model (GAM), complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), and long short-term memory (LSTM) networks for predicting daily admissions of patients with LRTIs. The experimental results show that the CGCLM multistep method proposed in this paper outperforms a single LSTM model in the prediction of health risk time series at different time window sizes. Moreover, our results also indicate that CGCLM has the best prediction performance when the time window is set to 61 days (RMSE = 1.12, MAE = 0.87, R² = 0.93). Adequate extraction of exposure-response relationships between meteorological factors and diseases and suitable handling of sequence pre-processing play an important role in time series prediction. This hybrid climate-based model for predicting LRTIs can also be extended to time series prediction of other epidemic diseases.
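The reported figures (RMSE, MAE, R²) and the notion of a time window can be made concrete with a small sketch. It assumes that "time window" means the number of preceding daily values fed to the model as input, which the abstract does not state explicitly. The helper names `make_windows` and `regression_metrics` are hypothetical, and the code is not the CGCLM pipeline.

```python
import math


def make_windows(series, window):
    """Split a daily series into (input window, next-day target) pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    return [
        (list(series[i:i + window]), series[i + window])
        for i in range(len(series) - window)
    ]


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for two equal-length sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


if __name__ == "__main__":
    # Toy admission counts, made up for illustration only.
    admissions = [1, 2, 3, 4, 5]
    print(make_windows(admissions, window=2))
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

In the study's setting the window would be 61 days rather than 2, and the predictions would come from the trained model rather than from hand-written lists.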