Search | VHL Regional Portal

1.

GenoM7GNet: An Efficient N⁷-Methylguanosine Site Prediction Approach Based on a Nucleotide Language Model.

Li, Chuang; Wang, Heshi; Wen, Yanhua; Yin, Rui; Zeng, Xiangxiang; Li, Keqin.

IEEE/ACM Trans Comput Biol Bioinform ; PP, 2024 Sep 20.

Article in English | MEDLINE | ID: mdl-39302806

ABSTRACT

N7-methylguanosine (m7G), one of the mainstream post-transcriptional RNA modifications, occupies an exceedingly significant place in medical treatments. However, classic approaches for identifying m7G sites are costly in both time and equipment. Meanwhile, existing machine learning methods extract limited hidden information from RNA sequences, which makes it difficult to improve accuracy. Therefore, we put forward a deep learning network, called "GenoM7GNet," for m7G site identification. This model utilizes Bidirectional Encoder Representations from Transformers (BERT) and is pretrained on nucleotide sequence data to capture hidden patterns from RNA sequences for m7G site prediction. Moreover, through detailed comparative experiments with various deep learning models, we discovered that the one-dimensional convolutional neural network (CNN) exhibits outstanding performance in sequence feature learning and classification. The proposed GenoM7GNet model achieved 0.953 in accuracy, 0.932 in sensitivity, 0.976 in specificity, 0.907 in Matthews Correlation Coefficient and 0.984 in Area Under the receiver operating characteristic Curve on performance evaluation. Extensive experimental results further prove that our GenoM7GNet model markedly surpasses other state-of-the-art models in predicting m7G sites, exhibiting high computing performance.
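The evaluation metrics named in this abstract (accuracy, sensitivity, specificity, and Matthews Correlation Coefficient) are all derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions; the function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    # The MCC denominator is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the GenoM7GNet results).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is often reported alongside accuracy because it stays informative when positive and negative classes are imbalanced.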

2.

SSR-DTA: Substructure-aware multi-layer graph neural networks for drug-target binding affinity prediction.

Liu, Yuansheng; Xia, Xinyan; Gong, Yongshun; Song, Bosheng; Zeng, Xiangxiang.

Artif Intell Med ; 157: 102983, 2024 Sep 17.

Article in English | MEDLINE | ID: mdl-39321746

ABSTRACT

Accurate prediction of drug-target binding affinity (DTA) is essential in the field of drug discovery. Recently, scientists have been attempting to utilize artificial intelligence prediction to screen out a significant number of ineffective compounds, thereby mitigating labor and financial losses. While graph neural networks (GNNs) have been applied to DTA, existing GNNs have limitations in effectively extracting substructural features across various sizes. Functional groups play a crucial role in modulating molecular properties, but existing GNNs struggle with feature extraction from certain motifs due to scale mismatches. Additionally, sequence-based models for target proteins lack the integration of structural information. To address these limitations, we present SSR-DTA, a multi-layer graph network capable of adapting to diverse structural sizes, which can extract richer biological features, thereby improving the robustness and accuracy of predictions. Multi-layer GNNs enable the capture of molecular motifs across different scales, ranging from atomic to macrocyclic motifs. Furthermore, we introduce BiGNN to simultaneously learn sequence and structural information. Sequence information corresponds to the primary structure of proteins, while graph information represents the tertiary structure. BiGNN assimilates richer information compared to sequence-based methods while mitigating the impact of errors from predicted structures, resulting in more accurate predictions. Through rigorous experimental evaluations conducted on four benchmark datasets, we demonstrate the superiority of SSR-DTA over state-of-the-art models. Particularly, in comparison to state-of-the-art models, SSR-DTA demonstrates an impressive 20% reduction in mean squared error on the Davis dataset and a 5% reduction on the KIBA dataset, underscoring its potential as a valuable tool for advancing DTA prediction.

3.

Attribute-guided prototype network for few-shot molecular property prediction.

Hou, Linlin; Xiang, Hongxin; Zeng, Xiangxiang; Cao, Dongsheng; Zeng, Li; Song, Bosheng.

Brief Bioinform ; 25(5), 2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39133096

ABSTRACT

Molecular property prediction (MPP) plays a crucial role in the drug discovery process, providing valuable insights for molecule evaluation and screening. Although deep learning has achieved numerous advances in this area, its success often depends on the availability of substantial labeled data. Few-shot MPP is a more challenging scenario, which aims to identify unseen properties with only a few available molecules. In this paper, we propose an attribute-guided prototype network (APN) to address the challenge. APN first introduces a molecular attribute extractor, which can not only extract three different types of fingerprint attributes (single fingerprint attributes, dual fingerprint attributes, triplet fingerprint attributes) by considering seven circular-based, five path-based, and two substructure-based fingerprints, but also automatically extract deep attributes from self-supervised learning methods. Furthermore, APN designs the Attribute-Guided Dual-channel Attention module to learn the relationship between the molecular graphs and attributes and refine the local and global representation of the molecules. Compared with existing works, APN leverages high-level human-defined attributes and helps the model to explicitly generalize knowledge in molecular graphs. Experiments on benchmark datasets show that APN can achieve state-of-the-art performance in most cases and demonstrate that the attributes are effective for improving few-shot MPP performance. In addition, the strong generalization ability of APN is verified by conducting experiments on data from different domains.

Subjects

Deep Learning , Drug Discovery , Drug Discovery/methods , Humans , Algorithms , Neural Networks, Computer

4.

Glypred: Lysine Glycation Site Prediction via CCU-LightGBM-BiLSTM Framework with Multi-Head Attention Mechanism.

Zuo, Yun; Zhang, Bangyi; Dong, Yinkang; He, Wenying; Bi, Yue; Liu, Xiangrong; Zeng, Xiangxiang; Deng, Zhaohong.

J Chem Inf Model ; 64(16): 6699-6711, 2024 Aug 26.

Article in English | MEDLINE | ID: mdl-39121059

ABSTRACT

Glycation, a type of posttranslational modification, preferentially occurs on lysine and arginine residues, impairing protein functionality and altering characteristics. This process is linked to diseases such as Alzheimer's, diabetes, and atherosclerosis. Traditional wet lab experiments are time-consuming, whereas machine learning has significantly streamlined the prediction of protein glycation sites. Despite promising results, challenges remain, including data imbalance, feature redundancy, and suboptimal classifier performance. This research introduces Glypred, a lysine glycation site prediction model combining ClusterCentroids Undersampling (CCU), LightGBM, and bidirectional long short-term memory network (BiLSTM) methodologies, with an additional multihead attention mechanism integrated into the BiLSTM. To achieve this, the study undertakes several key steps: selecting diverse feature types to capture comprehensive protein information, employing a cluster-based undersampling strategy to balance the data set, using LightGBM for feature selection to enhance model performance, and implementing a bidirectional LSTM network for accurate classification. Together, these approaches ensure that Glypred effectively identifies glycation sites with high accuracy and robustness. For feature encoding, five distinct feature types (AAC, KMER, DR, PWAA, and EBGW) were selected to capture a broad spectrum of protein sequence and biological information. These encoded features were integrated and validated to ensure comprehensive protein information acquisition. To address the issue of highly imbalanced positive and negative samples, various undersampling algorithms, including random undersampling, NearMiss, edited nearest neighbor rule, and CCU, were evaluated. CCU was ultimately chosen to remove redundant nonglycated training data, establishing a balanced data set that enhances the model's accuracy and robustness.
For feature selection, the LightGBM ensemble learning algorithm was employed to reduce feature dimensionality by identifying the most significant features. This approach accelerates model training, enhances generalization capabilities, and ensures good transferability of the model. Finally, a bidirectional long short-term memory network was used as the classifier, with a network structure designed to capture glycation modification site features from both forward and backward directions. To prevent overfitting, appropriate regularization parameters and dropout rates were introduced, achieving efficient classification. Experimental results show that Glypred achieved optimal performance. This model provides new insights for bioinformatics and encourages the application of similar strategies in other fields. A lysine glycation site prediction software tool was also developed using the PyQt5 library, offering researchers an auxiliary screening tool to reduce workload and improve efficiency. The software and data sets are available on GitHub: https://github.com/ZBYnb/Glypred.
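As an illustration of the feature-encoding step, the sketch below computes an amino acid composition vector, assuming that "AAC" in the abstract refers to this common encoding (the relative frequency of each of the 20 standard residues in a sequence window). The function name `aac` and the example peptide are hypothetical; this is not the Glypred implementation.

```python
# The 20 standard amino acids, in a fixed order so that vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Return the amino acid composition of a sequence as a 20-dimensional vector.

    Non-standard symbols (for example padding characters) are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            valid += 1
    if valid == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / valid for aa in AMINO_ACIDS]


# Hypothetical peptide window around a candidate lysine (K) site.
window = "AKKLGKA"
vector = aac(window)
print(dict(zip(AMINO_ACIDS, vector))["K"])  # fraction of lysine residues
```

In a pipeline like the one described, vectors from several encodings would typically be concatenated before undersampling and feature selection.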

Subjects

Lysine , Glycosylation , Lysine/chemistry , Lysine/metabolism , Proteins/chemistry , Proteins/metabolism , Machine Learning , Computational Biology/methods , Humans , Neural Networks, Computer , Databases, Protein

5.

ECD-CDGI: An efficient energy-constrained diffusion model for cancer driver gene identification.

Wang, Tao; Zhuo, Linlin; Chen, Yifan; Fu, Xiangzheng; Zeng, Xiangxiang; Zou, Quan.

PLoS Comput Biol ; 20(8): e1012400, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39213450

ABSTRACT

The identification of cancer driver genes (CDGs) poses challenges due to the intricate interdependencies among genes and the influence of measurement errors and noise. We propose a novel energy-constrained diffusion (ECD)-based model for identifying CDGs, termed ECD-CDGI. This model is the first to design an ECD-Attention encoder by combining the ECD technique with an attention mechanism. The ECD-Attention encoder excels at generating robust gene representations that reveal the complex interdependencies among genes while reducing the impact of data noise. We concatenate topological embeddings extracted from gene-gene networks through graph transformers to these gene representations. Extensive experiments across three testing scenarios show that the ECD-CDGI model not only identifies known CDGs proficiently but also efficiently uncovers potential unknown CDGs. Furthermore, compared to the GNN-based approach, the ECD-CDGI model is less constrained by existing gene-gene networks, thereby enhancing its capability to identify CDGs. Additionally, ECD-CDGI is open-source and freely available. We have also launched the model as a complimentary online tool specifically crafted to expedite research efforts focused on CDG identification.

Subjects

Computational Biology , Gene Regulatory Networks , Neoplasms , Humans , Computational Biology/methods , Gene Regulatory Networks/genetics , Neoplasms/genetics , Models, Genetic , Algorithms , Oncogenes/genetics , Genes, Neoplasm/genetics , Databases, Genetic

6.

A Foundation Model Identifies Broad-Spectrum Antimicrobial Peptides against Drug-Resistant Bacterial Infection.

Li, Tingting; Ren, Xuanbai; Luo, Xiaoli; Wang, Zhuole; Li, Zhenlu; Luo, Xiaoyan; Shen, Jun; Li, Yun; Yuan, Dan; Nussinov, Ruth; Zeng, Xiangxiang; Shi, Junfeng; Cheng, Feixiong.

Nat Commun ; 15(1): 7538, 2024 Aug 30.

Article in English | MEDLINE | ID: mdl-39214978

ABSTRACT

Development of potent and broad-spectrum antimicrobial peptides (AMPs) could help overcome the antimicrobial resistance crisis. We develop a peptide language-based deep generative framework (deepAMP) for identifying potent, broad-spectrum AMPs. Using deepAMP to reduce antimicrobial resistance and enhance the membrane-disrupting abilities of AMPs, we identify, synthesize, and experimentally test 18 T1-AMP (Tier 1) and 11 T2-AMP (Tier 2) candidates in a two-round design and by employing cross-optimization-validation. More than 90% of the designed AMPs show a better inhibition than penetratin in both Gram-positive (i.e., S. aureus) and Gram-negative bacteria (i.e., K. pneumoniae and P. aeruginosa). T2-9 shows the strongest antibacterial activity, comparable to FDA-approved antibiotics. We show that three AMPs (T1-2, T1-5 and T2-10) significantly reduce resistance to S. aureus compared to ciprofloxacin and are effective against skin wound infection in a female wound mouse model infected with P. aeruginosa. In summary, deepAMP expedites discovery of effective, broad-spectrum AMPs against drug-resistant bacteria.

Subjects

Anti-Bacterial Agents , Antimicrobial Peptides , Microbial Sensitivity Tests , Animals , Mice , Female , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/therapeutic use , Antimicrobial Peptides/pharmacology , Antimicrobial Peptides/chemistry , Drug Resistance, Bacterial/drug effects , Staphylococcus aureus/drug effects , Pseudomonas aeruginosa/drug effects , Disease Models, Animal , Wound Infection/drug therapy , Wound Infection/microbiology , Humans , Bacterial Infections/drug therapy , Bacterial Infections/microbiology , Gram-Negative Bacteria/drug effects , Antimicrobial Cationic Peptides/pharmacology

7.

Relationship Between Temporomandibular Joint Effusion, Pain, and Jaw Function Limitation: A 2D and 3D Comparative Study.

Lau Rui Han, Sophie; Xiang, Jie; Zeng, Xiang-Xiang; Fan, Pei-Di; Cheng, Qiao-Yu; Zhou, Xue-Man; Ye, Zheng; Xiong, Xin; Wang, Jun.

J Pain Res ; 17: 2051-2062, 2024.

Article in English | MEDLINE | ID: mdl-38881762

ABSTRACT

Purpose: This study aimed to investigate the relationship between temporomandibular joint (TMJ) effusion and TMJ pain, as well as jaw function limitation in patients via two-dimensional (2D) and three-dimensional (3D) magnetic resonance imaging (MRI) evaluation. Patients and Methods: 121 patients diagnosed with temporomandibular disorder (TMD) were included. TMJ effusion was assessed qualitatively using MRI and quantified with 3D Slicer software, then graded accordingly. In addition, a visual analogue scale (VAS) was employed for pain reporting and an 8-item Jaw Functional Limitations Scale (JFLS-8) was utilized to evaluate jaw function limitation. Statistical analyses were performed appropriately for group comparisons and association determination. A probability of p<0.05 was considered statistically significant. Results: 2D qualitative and 3D quantitative strategies were in high agreement for TMJ effusion grades (κ = 0.766). No significant associations were found between joint effusion and TMJ pain, nor with disc displacement and JFLS-8 scores. Moreover, the binary logistic regression analysis showed a significant association between sex and the presence of TMJ effusion, exhibiting an Odds Ratio of 5.168 for females (p = 0.008). Conclusion: 2D qualitative evaluation was as effective as 3D quantitative assessment for TMJ effusion diagnosis. No significant associations were found between TMJ effusion and TMJ pain, disc displacement or jaw function limitation. However, it was suggested that female patients suffering from TMD may be at risk for TMJ effusion. Further prospective research is needed for validation.
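The agreement statistic reported in this abstract (κ) is presumably Cohen's kappa, which corrects the observed agreement between two rating methods for the agreement expected by chance. The following is a minimal sketch of the unweighted statistic; the grade lists are invented for illustration and are not the study data, and the study may have used a weighted variant.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two methods agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each method's marginal grade frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both methods always give the same single grade
    return (observed - expected) / (1.0 - expected)


# Hypothetical effusion grades (0, 1, 2) for ten joints under two methods.
grades_2d = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
grades_3d = [0, 0, 1, 2, 2, 2, 0, 1, 1, 1]
kappa = cohen_kappa(grades_2d, grades_3d)
print(f"kappa = {kappa:.3f}")
```

Here the two methods agree on 8 of 10 joints, but chance alone would give 0.34 agreement, so kappa is lower than the raw agreement rate.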

8.

Evolutionary Multiobjective Molecule Optimization in an Implicit Chemical Space.

Xia, Xin; Liu, Yiping; Zheng, Chunhou; Zhang, Xingyi; Wu, Qingwen; Gao, Xin; Zeng, Xiangxiang; Su, Yansen.

J Chem Inf Model ; 64(13): 5161-5174, 2024 Jul 08.

Article in English | MEDLINE | ID: mdl-38870455

ABSTRACT

Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.
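The Pareto-based evaluation mentioned in this abstract rests on the general notion of Pareto dominance: one candidate dominates another if it is at least as good on every objective and strictly better on at least one. The sketch below filters a set of candidates down to its non-dominated front. It illustrates the generic concept only, not the MOMO strategy itself; the molecule names and property scores are hypothetical, and all objectives are assumed to be maximized.

```python
def dominates(a, b):
    """True if score tuple a Pareto-dominates b (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (property_1, property_2) scores for four molecules.
candidates = {
    "mol_a": (0.90, 0.40),
    "mol_b": (0.70, 0.80),
    "mol_c": (0.60, 0.70),  # worse than mol_b on both properties
    "mol_d": (0.95, 0.30),
}
front = pareto_front(candidates)
print(front)
```

This pairwise check is quadratic in the number of candidates, which is acceptable for small populations; evolutionary frameworks usually rely on faster non-dominated sorting.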

Subjects

Drug Discovery , Drug Discovery/methods , Drug Design , Algorithms

9.

HydrogelFinder: A Foundation Model for Efficient Self-Assembling Peptide Discovery Guided by Non-Peptidal Small Molecules.

Ren, Xuanbai; Wei, Jiaying; Luo, Xiaoli; Liu, Yuansheng; Li, Kenli; Zhang, Qiang; Gao, Xin; Yan, Sizhe; Wu, Xia; Jiang, Xingyue; Liu, Mingquan; Cao, Dongsheng; Wei, Leyi; Zeng, Xiangxiang; Shi, Junfeng.

Adv Sci (Weinh) ; 11(26): e2400829, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38704695

ABSTRACT

Self-assembling peptides have numerous applications in medicine, food chemistry, and nanotechnology. However, their discovery has traditionally been serendipitous rather than driven by rational design. Here, HydrogelFinder, a foundation model, is developed for the rational design of self-assembling peptides from scratch. This model explores the self-assembly properties by molecular structure, leveraging 1,377 self-assembling non-peptidal small molecules to navigate chemical space and improve structural diversity. Utilizing HydrogelFinder, 111 peptide candidates are generated and 17 peptides are synthesized; the self-assembly and biophysical characteristics of nine peptides ranging from 1 to 10 amino acids are then experimentally validated, all within a 19-day workflow. Notably, the two de novo-designed self-assembling peptides demonstrated biocompatibility and low cytotoxicity, as confirmed by live/dead assays. This work highlights the capacity of HydrogelFinder to diversify the design of self-assembling peptides through non-peptidal small molecules, offering a powerful toolkit and paradigm for future peptide discovery endeavors.

Subjects

Peptides , Peptides/chemistry

10.

Exploring the Conformational Ensembles of Protein-Protein Complex with Transformer-Based Generative Model.

Wang, Jianmin; Wang, Xun; Chu, Yanyi; Li, Chunyan; Li, Xue; Meng, Xiangyu; Fang, Yitian; No, Kyoung Tai; Mao, Jiashun; Zeng, Xiangxiang.

J Chem Theory Comput ; 20(11): 4469-4480, 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38816696

ABSTRACT

Protein-protein interactions are the basis of many protein functions, and understanding the contact and conformational changes of protein-protein interactions is crucial for linking the protein structure to biological function. Because these changes are difficult to detect experimentally, molecular dynamics (MD) simulations are widely used to study the conformational ensembles and dynamics of protein-protein complexes, but they have significant limitations in sampling efficiency and computational cost. In this study, a generative neural network was trained on protein-protein complex conformations obtained from molecular simulations to directly generate novel conformations with physical realism. We demonstrated the use of a deep learning model based on the transformer architecture to explore the conformational ensembles of protein-protein complexes through MD simulations. The results showed that the learned latent space can be used to generate unsampled conformations of protein-protein complexes for obtaining new conformations complementing pre-existing ones, which can be used as an exploratory tool for the analysis and enhancement of molecular simulations of protein-protein complexes.

Subjects

Molecular Dynamics Simulation , Protein Conformation , Proteins , Proteins/chemistry , Neural Networks, Computer , Protein Binding

11.

Enhancing Molecular Property Prediction through Task-Oriented Transfer Learning: Integrating Universal Structural Insights and Domain-Specific Knowledge.

Duan, Yanjing; Yang, Xixi; Zeng, Xiangxiang; Wang, Wenxuan; Deng, Youchao; Cao, Dongsheng.

J Med Chem ; 67(11): 9575-9586, 2024 Jun 13.

Article in English | MEDLINE | ID: mdl-38748846

ABSTRACT

Precisely predicting molecular properties is crucial in drug discovery, but the scarcity of labeled data poses a challenge for applying deep learning methods. While large-scale self-supervised pretraining has proven an effective solution, it often neglects domain-specific knowledge. To tackle this issue, we introduce Task-Oriented Multilevel Learning based on BERT (TOML-BERT), a dual-level pretraining framework that considers both structural patterns and domain knowledge of molecules. TOML-BERT achieved state-of-the-art prediction performance on 10 pharmaceutical datasets. It has the capability to mine contextual information within molecular structures and extract domain knowledge from massive pseudo-labeled data. The dual-level pretraining accomplished significant positive transfer, with its two components making complementary contributions. Interpretive analysis elucidated that the effectiveness of the dual-level pretraining lies in the prior learning of a task-related molecular representation. Overall, TOML-BERT demonstrates the potential of combining multiple pretraining tasks to extract task-oriented knowledge, advancing molecular property prediction in drug discovery.

Subjects

Drug Discovery , Drug Discovery/methods , Deep Learning , Molecular Structure

12.

Monodirectional tissue P systems with proteins on cells.

Song, Bosheng; Hu, Chuanlong; Zeng, Xiangxiang.

IEEE Trans Nanobioscience ; PP, 2024 May 23.

Article in English | MEDLINE | ID: mdl-38781071

ABSTRACT

A variant of tissue-like P systems is known as monodirectional tissue P systems, where objects can move in only one direction between two regions. In this article, a special kind of object, named proteins, is added to monodirectional tissue P systems to control the movement of objects between regions; such computational models are named monodirectional tissue P systems with proteins on cells (PMT P systems). We discuss the computational properties of PMT P systems. In more detail, PMT P systems employing two cells, one protein controlling a rule, and at most one object used in each symport rule can achieve Turing universality. In addition, PMT P systems using one protein controlling a rule, and at most one object used in each symport rule can effectively solve the Boolean satisfiability problem (SAT).

13.

ChemFH: an integrated tool for screening frequent false positives in chemical biology and drug discovery.

Shi, Shaohua; Fu, Li; Yi, Jiacai; Yang, Ziyi; Zhang, Xiaochen; Deng, Youchao; Wang, Wenxuan; Wu, Chengkun; Zhao, Wentao; Hou, Tingjun; Zeng, Xiangxiang; Lyu, Aiping; Cao, Dongsheng.

Nucleic Acids Res ; 52(W1): W439-W449, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38783035

ABSTRACT

High-throughput screening rapidly tests an extensive array of chemical compounds to identify hit compounds for specific biological targets in drug discovery. However, false-positive results disrupt hit compound screening, leading to wastage of time and resources. To address this, we propose ChemFH, an integrated online platform facilitating rapid virtual evaluation of potential false positives, including colloidal aggregators, spectroscopic interference compounds, firefly luciferase inhibitors, chemical reactive compounds, promiscuous compounds, and other assay interferences. By leveraging a dataset containing 823 391 compounds, we constructed high-quality prediction models using multi-task directed message-passing network (DMPNN) architectures combining uncertainty estimation, yielding an average AUC value of 0.91. Furthermore, ChemFH incorporated 1441 representative alert substructures derived from the collected data and ten commonly used frequent hitter screening rules. ChemFH was validated with an external set of 75 compounds. Subsequently, the virtual screening capability of ChemFH was successfully confirmed through its application to five virtual screening libraries. Furthermore, ChemFH underwent additional validation on two natural products and FDA-approved drugs, yielding reliable and accurate results. ChemFH is a comprehensive, reliable, and computationally efficient screening pipeline that facilitates the identification of true positive results in assays, contributing to enhanced efficiency and success rates in drug discovery. ChemFH is freely available via https://chemfh.scbdd.com/.

Subjects

Drug Discovery , High-Throughput Screening Assays , Software , Drug Discovery/methods , High-Throughput Screening Assays/methods , Drug Evaluation, Preclinical/methods , False Positive Reactions , Small Molecule Libraries/pharmacology , Small Molecule Libraries/chemistry , Humans

14.

Editorial: Artificial intelligence in drug discovery and development.

Wei, Leyi; Zou, Quan; Zeng, Xiangxiang.

Methods ; 226: 133-137, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38582311

Subjects

Artificial Intelligence , Drug Discovery , Drug Discovery/methods , Humans , Drug Development/methods , Drug Development/trends

15.

ADMETlab 3.0: an updated comprehensive online ADMET prediction platform enhanced with broader coverage, improved performance, API functionality and decision support.

Fu, Li; Shi, Shaohua; Yi, Jiacai; Wang, Ningning; He, Yuanhang; Wu, Zhenxing; Peng, Jinfu; Deng, Youchao; Wang, Wenxuan; Wu, Chengkun; Lyu, Aiping; Zeng, Xiangxiang; Zhao, Wentao; Hou, Tingjun; Cao, Dongsheng.

Nucleic Acids Res ; 52(W1): W422-W431, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38572755

ABSTRACT

ADMETlab 3.0 is the second updated version of the web server that provides a comprehensive and efficient platform for evaluating ADMET-related parameters as well as physicochemical properties and medicinal chemistry characteristics involved in the drug discovery process. This new release addresses the limitations of the previous version and offers broader coverage, improved performance, API functionality, and decision support. For supporting data and endpoints, this version includes 119 features, an increase of 31 compared to the previous version. The updated number of entries is 1.5 times larger than the previous version with over 400 000 entries. ADMETlab 3.0 incorporates a multi-task DMPNN architecture coupled with molecular descriptors, a method that not only guaranteed calculation speed for each endpoint simultaneously, but also achieved superior performance in terms of accuracy and robustness. In addition, an API has been introduced to meet the growing demand for programmatic access to large amounts of data in ADMETlab 3.0. Moreover, this version includes uncertainty estimates in the prediction results, aiding in the confident selection of candidate compounds for further studies and experiments. ADMETlab 3.0 is publicly accessible without the need for registration at: https://admetlab3.scbdd.com.

Subjects

Drug Discovery , Internet , Software , Drug Discovery/methods , Humans , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism

16.

The present state and challenges of active learning in drug discovery.

Wang, Lei; Zhou, Zhenran; Yang, Xixi; Shi, Shaohua; Zeng, Xiangxiang; Cao, Dongsheng.

Drug Discov Today ; 29(6): 103985, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38642700

ABSTRACT

Active learning (AL) is an iterative feedback process that efficiently identifies valuable data within vast chemical space, even with limited labeled data. This characteristic renders it a valuable approach to tackle the ongoing challenges faced in drug discovery, such as the ever-expanding exploration space and the limitations of labeled data. Consequently, AL is increasingly gaining prominence in the field of drug development. In this paper, we comprehensively review the application of AL at all stages of drug discovery, including compound-target interaction prediction, virtual screening, molecular generation and optimization, as well as molecular property prediction. Additionally, we discuss the challenges and prospects associated with the current applications of AL in drug discovery.

Subjects

Drug Discovery , Drug Discovery/methods , Humans , Problem-Based Learning , Drug Development/methods

17.

scCAN: Clustering With Adaptive Neighbor-Based Imputation Method for Single-Cell RNA-Seq Data.

Dong, Shujie; Liu, Yuansheng; Gong, Yongshun; Dong, Xiangjun; Zeng, Xiangxiang.

IEEE/ACM Trans Comput Biol Bioinform ; 21(1): 95-105, 2024.

Article in English | MEDLINE | ID: mdl-38285569

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is widely used to study cellular heterogeneity in different samples. However, due to technical deficiencies, dropout events often result in zero gene expression values in the gene expression matrix. In this paper, we propose a new imputation method called scCAN, based on adaptive neighborhood clustering, to estimate the zero values caused by dropouts. Our method continuously updates cell-cell similarity information by simultaneously learning similarity relationships, clustering structures, and imposing new rank constraints on the Laplacian matrix of the similarity matrix, improving the imputation of dropout zero values. To evaluate the performance of this method, we used four simulated and eight real scRNA-seq datasets for downstream analyses, including cell clustering, recovered gene expression, and reconstructed cell trajectories. Our method improves the performance of the downstream analysis and is better than other imputation methods.

Subjects

Gene Expression Profiling , Single-Cell Gene Expression Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis

18.

Comprehensive evaluation of molecule property prediction with ChatGPT.

Cai, Xibao; Lai, Houtim; Wang, Xing; Wang, Longyue; Liu, Wei; Wang, Yijun; Wang, Zixu; Cao, Dongsheng; Zeng, Xiangxiang.

Methods ; 222: 133-141, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38242382

ABSTRACT

The versatility of ChatGPT in performing a diverse range of tasks has elicited considerable interest on its potential applications within professional fields. Taking drug discovery as a testbed, this paper provides a comprehensive evaluation of ChatGPT's ability on molecule property prediction. The study focuses on three aspects: 1) Effects of different prompt settings, where we investigate the impact of varying prompts on the prediction outcomes of ChatGPT; 2) Comprehensive evaluation on molecule property prediction, where we conduct a comprehensive evaluation on 53 ADMET-related endpoints; 3) Analysis of ChatGPT's potential and limitations, where we make comparisons with models tailored for molecule property prediction, thus gaining a more accurate understanding of ChatGPT's capabilities and limitations in this area. Through comprehensive evaluation, we find that 1) With appropriate prompt settings, ChatGPT can attain satisfactory prediction outcomes that are competitive with specialized models designed for those tasks. 2) Prompt settings significantly affect ChatGPT's performance. Among all prompt settings, the strategy of selecting examples in few-shot has the greatest impact on results. Scaffold sampling greatly outperforms random sampling. 3) The capacity of ChatGPT to accomplish high-precision predictions is significantly influenced by the quality of examples provided, which may constrain its practical applicability in real-world scenarios. This work highlights ChatGPT's potential and limitations on molecule property prediction, which we hope can inspire future design and evaluation of Large Language Models within scientific domains.

Subjects

Drug Discovery, Research Design

19.

OptADMET: a web-based tool for substructure modifications to improve ADMET properties of lead compounds.

Yi, Jiacai; Shi, Shaohua; Fu, Li; Yang, Ziyi; Nie, Pengfei; Lu, Aiping; Wu, Chengkun; Deng, Yafeng; Hsieh, Changyu; Zeng, Xiangxiang; Hou, Tingjun; Cao, Dongsheng.

Nat Protoc ; 19(4): 1105-1121, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38263521

ABSTRACT

Lead optimization is a crucial step in the drug discovery process, which aims to design potential drug candidates from biologically active hits. During lead optimization, active hits undergo modifications to improve their absorption, distribution, metabolism, excretion and toxicity (ADMET) profiles. Medicinal chemists face key questions regarding which compound(s) should be synthesized next and how to balance multiple ADMET properties. Reliable transformation rules from multiple experimental analyses are critical to improve this decision-making process. We developed OptADMET (https://cadd.nscc-tj.cn/deploy/optadmet/), an integrated web-based platform that provides chemical transformation rules for 32 ADMET properties and leverages prior experimental data for lead optimization. The multiproperty transformation rule database contains a total of 41,779 validated transformation rules generated from the analysis of 177,191 reliable experimental datasets. Additionally, 146,450 rules were generated by analyzing 239,194 molecular data predictions. OptADMET provides the ADMET profiles of all optimized molecules from the queried molecule and enables the prediction of desirable substructure transformations and subsequent validation of drug candidates. OptADMET is based on matched molecular pairs analysis derived from synthetic chemistry, thus providing improved practicality over other methods. OptADMET is designed for use by both experimental and computational scientists.

Subjects

Drug Discovery, Internet, Factual Databases

20.

Deep Generative Models in De Novo Drug Molecule Generation.

Pang, Chao; Qiao, Jianbo; Zeng, Xiangxiang; Zou, Quan; Wei, Leyi.

J Chem Inf Model ; 64(7): 2174-2194, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-37934070

ABSTRACT

The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which are time-consuming and high-cost. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.

Subjects

Artificial Intelligence, Benchmarking, Humans, Factual Databases, Drug Discovery, Drug Design

