Results 1 - 20 of 133
1.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39133096

ABSTRACT

Molecular property prediction (MPP) plays a crucial role in the drug discovery process, providing valuable insights for molecule evaluation and screening. Although deep learning has achieved numerous advances in this area, its success often depends on the availability of substantial labeled data. Few-shot MPP is a more challenging scenario, which aims to identify unseen properties with only a few available molecules. In this paper, we propose an attribute-guided prototype network (APN) to address the challenge. APN first introduces a molecular attribute extractor, which can not only extract three different types of fingerprint attributes (single fingerprint attributes, dual fingerprint attributes, triplet fingerprint attributes) by considering seven circular-based, five path-based, and two substructure-based fingerprints, but also automatically extract deep attributes from self-supervised learning methods. Furthermore, APN designs the Attribute-Guided Dual-channel Attention module to learn the relationship between the molecular graphs and attributes and refine the local and global representation of the molecules. Compared with existing works, APN leverages high-level human-defined attributes and helps the model to explicitly generalize knowledge in molecular graphs. Experiments on benchmark datasets show that APN can achieve state-of-the-art performance in most cases and demonstrate that the attributes are effective for improving few-shot MPP performance. In addition, the strong generalization ability of APN is verified by conducting experiments on data from different domains.
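
The prototype idea that APN builds on can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the attribute extractor and the attention module, and the embeddings, function names, and choice of Euclidean distance are assumptions made for illustration only.

```python
import math

def prototype(vectors):
    """Mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(query, support):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    protos = {label: prototype(vecs) for label, vecs in support.items()}
    return min(protos, key=lambda label: math.dist(query, protos[label]))

# Hypothetical 2-D embeddings for a 2-way, 2-shot task.
support = {
    "active": [[1.0, 1.0], [1.2, 0.8]],
    "inactive": [[-1.0, -1.0], [-0.8, -1.2]],
}
prediction = classify([0.9, 1.1], support)
```

In a few-shot setting the support set holds only a handful of labeled molecules per property, which is why a class mean is used instead of a trained classifier head.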


Subject(s)
Deep Learning , Drug Discovery , Drug Discovery/methods , Humans , Algorithms , Neural Networks, Computer
2.
J Chem Inf Model ; 64(16): 6699-6711, 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39121059

ABSTRACT

Glycation, a type of posttranslational modification, preferentially occurs on lysine and arginine residues, impairing protein functionality and altering characteristics. This process is linked to diseases such as Alzheimer's, diabetes, and atherosclerosis. Traditional wet lab experiments are time-consuming, whereas machine learning has significantly streamlined the prediction of protein glycation sites. Despite promising results, challenges remain, including data imbalance, feature redundancy, and suboptimal classifier performance. This research introduces Glypred, a lysine glycation site prediction model combining ClusterCentroids Undersampling (CCU), LightGBM, and bidirectional long short-term memory network (BiLSTM) methodologies, with an additional multihead attention mechanism integrated into the BiLSTM. To achieve this, the study undertakes several key steps: selecting diverse feature types to capture comprehensive protein information, employing a cluster-based undersampling strategy to balance the data set, using LightGBM for feature selection to enhance model performance, and implementing a bidirectional LSTM network for accurate classification. Together, these approaches ensure that Glypred effectively identifies glycation sites with high accuracy and robustness. For feature encoding, five distinct feature types (AAC, KMER, DR, PWAA, and EBGW) were selected to capture a broad spectrum of protein sequence and biological information. These encoded features were integrated and validated to ensure comprehensive protein information acquisition. To address the issue of highly imbalanced positive and negative samples, various undersampling algorithms, including random undersampling, NearMiss, edited nearest neighbor rule, and CCU, were evaluated. CCU was ultimately chosen to remove redundant nonglycated training data, establishing a balanced data set that enhances the model's accuracy and robustness.
For feature selection, the LightGBM ensemble learning algorithm was employed to reduce feature dimensionality by identifying the most significant features. This approach accelerates model training, enhances generalization capabilities, and ensures good transferability of the model. Finally, a bidirectional long short-term memory network was used as the classifier, with a network structure designed to capture glycation modification site features from both forward and backward directions. To prevent overfitting, appropriate regularization parameters and dropout rates were introduced, achieving efficient classification. Experimental results show that Glypred achieved optimal performance. This model provides new insights for bioinformatics and encourages the application of similar strategies in other fields. A lysine glycation site prediction software tool was also developed using the PyQt5 library, offering researchers an auxiliary screening tool to reduce workload and improve efficiency. The software and data sets are available on GitHub: https://github.com/ZBYnb/Glypred.
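
Of the five feature types named in the abstract, amino acid composition (AAC) is the simplest to illustrate. The sketch below assumes the standard definition (the fraction of each of the 20 canonical residues in a sequence window); the Glypred encoding may differ in details such as window length or padding, and the function name and example window are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues

def aac(window):
    """Return the 20-dimensional amino acid composition of a peptide window."""
    # Ignore anything that is not a canonical residue (e.g. padding symbols).
    residues = [r for r in window.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]

# Hypothetical window around a candidate lysine (K).
features = aac("AKKLA")
```

Vectors like this one would be concatenated with the other encodings before feature selection and classification.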


Subject(s)
Lysine , Glycosylation , Lysine/chemistry , Lysine/metabolism , Proteins/chemistry , Proteins/metabolism , Machine Learning , Computational Biology/methods , Humans , Neural Networks, Computer , Databases, Protein
3.
J Pain Res ; 17: 2051-2062, 2024.
Article in English | MEDLINE | ID: mdl-38881762

ABSTRACT

Purpose: This study aimed to investigate the relationship between temporomandibular joint (TMJ) effusion and TMJ pain, as well as jaw function limitation in patients via two-dimensional (2D) and three-dimensional (3D) magnetic resonance imaging (MRI) evaluation. Patients and Methods: 121 patients diagnosed with temporomandibular disorder (TMD) were included. TMJ effusion was assessed qualitatively using MRI and quantified with 3D Slicer software, then graded accordingly. In addition, a visual analogue scale (VAS) was employed for pain reporting and an 8-item Jaw Functional Limitations Scale (JFLS-8) was utilized to evaluate jaw function limitation. Statistical analyses were performed appropriately for group comparisons and association determination. A probability of p<0.05 was considered statistically significant. Results: 2D qualitative and 3D quantitative strategies were in high agreement for TMJ effusion grades (κ = 0.766). No significant associations were found between joint effusion and TMJ pain, nor with disc displacement and JFLS-8 scores. Moreover, the binary logistic regression analysis showed significant association between sex and the presence of TMJ effusion, exhibiting an Odds Ratio of 5.168 for females (p = 0.008). Conclusion: 2D qualitative evaluation was as effective as 3D quantitative assessment for TMJ effusion diagnosis. No significant associations were found between TMJ effusion and TMJ pain, disc displacement or jaw function limitation. However, it was suggested that female patients suffering from TMD may be at risk for TMJ effusion. Further prospective research is needed for validation.
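
The agreement statistic reported above can be illustrated with a short calculation. The sketch assumes the reported κ is unweighted Cohen's kappa (the abstract does not state the variant), and the grade lists are made-up values, not study data.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters over the same items."""
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same grade.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal grade frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical effusion grades (0-2) from the 2D and 3D assessments.
grades_2d = [0, 0, 1, 1, 2, 2, 1, 0]
grades_3d = [0, 0, 1, 2, 2, 2, 1, 0]
kappa = cohen_kappa(grades_2d, grades_3d)
```

Kappa corrects raw agreement for the agreement expected by chance, so it is lower than the simple fraction of matching grades.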

4.
J Chem Inf Model ; 64(13): 5161-5174, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38870455

ABSTRACT

Optimization techniques play a pivotal role in advancing drug development, serving as the foundation of numerous generative methods tailored to efficiently design optimized molecules derived from existing lead compounds. However, existing methods often encounter difficulties in generating diverse, novel, and high-property molecules that simultaneously optimize multiple drug properties. To overcome this bottleneck, we propose a multiobjective molecule optimization framework (MOMO). MOMO employs a specially designed Pareto-based multiproperty evaluation strategy at the molecular sequence level to guide the evolutionary search in an implicit chemical space. A comparative analysis of MOMO with five state-of-the-art methods across two benchmark multiproperty molecule optimization tasks reveals that MOMO markedly outperforms them in terms of diversity, novelty, and optimized properties. The practical applicability of MOMO in drug discovery has also been validated on four challenging tasks in the real-world discovery problem. These results suggest that MOMO can provide a useful tool to facilitate molecule optimization problems with multiple properties.
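
The Pareto-based evaluation mentioned above rests on the notion of dominance, which can be sketched as follows. This is a generic non-dominated filter, not MOMO's actual strategy; the molecule names, the two property scores, and the assumption that higher is better are all illustrative.

```python
def dominates(a, b):
    """True if a is at least as good as b on every property and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return {
        name: score
        for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    }

# Hypothetical (property_1, property_2) scores, both to be maximized.
scores = {
    "mol_a": (0.9, 0.2),
    "mol_b": (0.6, 0.6),
    "mol_c": (0.5, 0.5),  # dominated by mol_b
    "mol_d": (0.2, 0.9),
}
front = pareto_front(scores)
```

An evolutionary search would typically select parents from this front so that no single property is optimized at the expense of the others.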


Subject(s)
Drug Discovery , Drug Discovery/methods , Drug Design , Algorithms
5.
Article in English | MEDLINE | ID: mdl-38781071

ABSTRACT

A variant of tissue-like P systems is known as monodirectional tissue P systems, where objects only have one direction to move between two regions. In this article, a special kind of objects named proteins are added to monodirectional tissue P systems, which can control objects moving between regions, and such computational models are named monodirectional tissue P systems with proteins on cells (PMT P systems). We discuss the computational properties of PMT P systems. In more detail, PMT P systems employing two cells, one protein controlling a rule, and at most one object used in each symport rule are capable of achieving Turing universality. In addition, PMT P systems using one protein controlling a rule, and at most one object used in each symport rule can effectively solve the Boolean satisfiability problem (simply SAT).

6.
Nucleic Acids Res ; 52(W1): W439-W449, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38783035

ABSTRACT

High-throughput screening rapidly tests an extensive array of chemical compounds to identify hit compounds for specific biological targets in drug discovery. However, false-positive results disrupt hit compound screening, leading to wastage of time and resources. To address this, we propose ChemFH, an integrated online platform facilitating rapid virtual evaluation of potential false positives, including colloidal aggregators, spectroscopic interference compounds, firefly luciferase inhibitors, chemical reactive compounds, promiscuous compounds, and other assay interferences. By leveraging a dataset containing 823 391 compounds, we constructed high-quality prediction models using multi-task directed message-passing network (DMPNN) architectures combining uncertainty estimation, yielding an average AUC value of 0.91. Furthermore, ChemFH incorporated 1441 representative alert substructures derived from the collected data and ten commonly used frequent hitter screening rules. ChemFH was validated with an external set of 75 compounds. Subsequently, the virtual screening capability of ChemFH was successfully confirmed through its application to five virtual screening libraries. Furthermore, ChemFH underwent additional validation on two natural products and FDA-approved drugs, yielding reliable and accurate results. ChemFH is a comprehensive, reliable, and computationally efficient screening pipeline that facilitates the identification of true positive results in assays, contributing to enhanced efficiency and success rates in drug discovery. ChemFH is freely available via https://chemfh.scbdd.com/.


Subject(s)
Drug Discovery , High-Throughput Screening Assays , Software , Drug Discovery/methods , High-Throughput Screening Assays/methods , Drug Evaluation, Preclinical/methods , False Positive Reactions , Small Molecule Libraries/pharmacology , Small Molecule Libraries/chemistry , Humans
7.
J Chem Theory Comput ; 20(11): 4469-4480, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38816696

ABSTRACT

Protein-protein interactions are the basis of many protein functions, and understanding the contact and conformational changes of protein-protein interactions is crucial for linking the protein structure to biological function. Because these changes are difficult to detect experimentally, molecular dynamics (MD) simulations are widely used to study the conformational ensembles and dynamics of protein-protein complexes, but there are significant limitations in sampling efficiency and computational costs. In this study, a generative neural network was trained on protein-protein complex conformations obtained from molecular simulations to directly generate novel conformations with physical realism. We demonstrated the use of a deep learning model based on the transformer architecture to explore the conformational ensembles of protein-protein complexes through MD simulations. The results showed that the learned latent space can be used to generate unsampled conformations of protein-protein complexes for obtaining new conformations complementing pre-existing ones, which can be used as an exploratory tool for the analysis and enhancement of molecular simulations of protein-protein complexes.


Subject(s)
Molecular Dynamics Simulation , Protein Conformation , Proteins , Proteins/chemistry , Neural Networks, Computer , Protein Binding
8.
Adv Sci (Weinh) ; 11(26): e2400829, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38704695

ABSTRACT

Self-assembling peptides have numerous applications in medicine, food chemistry, and nanotechnology. However, their discovery has traditionally been serendipitous rather than driven by rational design. Here, HydrogelFinder, a foundation model, is developed for the rational design of self-assembling peptides from scratch. This model explores the self-assembly properties by molecular structure, leveraging 1,377 self-assembling non-peptidal small molecules to navigate chemical space and improve structural diversity. Utilizing HydrogelFinder, 111 peptide candidates are generated and 17 peptides are synthesized, and the self-assembly and biophysical characteristics of nine peptides ranging from 1-10 amino acids are subsequently validated experimentally, all within a 19-day workflow. Notably, the two de novo-designed self-assembling peptides demonstrated low cytotoxicity and biocompatibility, as confirmed by live/dead assays. This work highlights the capacity of HydrogelFinder to diversify the design of self-assembling peptides through non-peptidal small molecules, offering a powerful toolkit and paradigm for future peptide discovery endeavors.


Subject(s)
Peptides , Peptides/chemistry
9.
J Med Chem ; 67(11): 9575-9586, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38748846

ABSTRACT

Precisely predicting molecular properties is crucial in drug discovery, but the scarcity of labeled data poses a challenge for applying deep learning methods. While large-scale self-supervised pretraining has proven an effective solution, it often neglects domain-specific knowledge. To tackle this issue, we introduce Task-Oriented Multilevel Learning based on BERT (TOML-BERT), a dual-level pretraining framework that considers both structural patterns and domain knowledge of molecules. TOML-BERT achieved state-of-the-art prediction performance on 10 pharmaceutical datasets. It has the capability to mine contextual information within molecular structures and extract domain knowledge from massive pseudo-labeled data. The dual-level pretraining accomplished significant positive transfer, with its two components making complementary contributions. Interpretive analysis elucidated that the effectiveness of the dual-level pretraining lies in the prior learning of a task-related molecular representation. Overall, TOML-BERT demonstrates the potential of combining multiple pretraining tasks to extract task-oriented knowledge, advancing molecular property prediction in drug discovery.


Subject(s)
Drug Discovery , Drug Discovery/methods , Deep Learning , Molecular Structure
10.
Nucleic Acids Res ; 52(W1): W422-W431, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38572755

ABSTRACT

ADMETlab 3.0 is the second updated version of the web server that provides a comprehensive and efficient platform for evaluating ADMET-related parameters as well as physicochemical properties and medicinal chemistry characteristics involved in the drug discovery process. This new release addresses the limitations of the previous version and offers broader coverage, improved performance, API functionality, and decision support. For supporting data and endpoints, this version includes 119 features, an increase of 31 compared to the previous version. The updated number of entries is 1.5 times larger than the previous version with over 400 000 entries. ADMETlab 3.0 incorporates a multi-task DMPNN architecture coupled with molecular descriptors, a method that not only guaranteed calculation speed for each endpoint simultaneously, but also achieved a superior performance in terms of accuracy and robustness. In addition, an API has been introduced to meet the growing demand for programmatic access to large amounts of data in ADMETlab 3.0. Moreover, this version includes uncertainty estimates in the prediction results, aiding in the confident selection of candidate compounds for further studies and experiments. ADMETlab 3.0 is publicly accessible without the need for registration at: https://admetlab3.scbdd.com.


Subject(s)
Drug Discovery , Internet , Software , Drug Discovery/methods , Humans , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism
11.
Drug Discov Today ; 29(6): 103985, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38642700

ABSTRACT

Active learning (AL) is an iterative feedback process that efficiently identifies valuable data within vast chemical space, even with limited labeled data. This characteristic renders it a valuable approach to tackle the ongoing challenges faced in drug discovery, such as the ever-expanding exploration space and the limitations of labeled data. Consequently, AL is increasingly gaining prominence in the field of drug development. In this paper, we comprehensively review the application of AL at all stages of drug discovery, including compound-target interaction prediction, virtual screening, molecular generation and optimization, as well as molecular property prediction. Additionally, we discuss the challenges and prospects associated with the current applications of AL in drug discovery.
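
The iterative feedback process described above can be sketched as a toy uncertainty-sampling loop. Everything here is an assumption for illustration: the "model" is a single activity threshold on an integer descriptor, the oracle stands in for an experiment, and the cutoff of 62 is arbitrary. Real applications replace both with a learned predictor and an assay or simulation.

```python
def oracle(x):
    """Stand-in for an experiment: hypothetical ground truth, active if x >= 62."""
    return x >= 62

def estimate_boundary(labeled):
    """Midpoint between the largest known inactive and smallest known active."""
    lo = max(x for x, active in labeled.items() if not active)
    hi = min(x for x, active in labeled.items() if active)
    return (lo + hi) / 2

def active_learning(pool, labeled, rounds):
    """Repeatedly label the unlabeled point the current model is least sure about."""
    labeled = dict(labeled)
    for _ in range(rounds):
        boundary = estimate_boundary(labeled)
        candidates = [x for x in pool if x not in labeled]
        if not candidates:
            break
        # The point closest to the decision boundary is the most uncertain one.
        query = min(candidates, key=lambda x: abs(x - boundary))
        labeled[query] = oracle(query)
    return estimate_boundary(labeled), labeled

pool = list(range(101))
boundary, labeled = active_learning(pool, {0: False, 100: True}, rounds=4)
```

Four queries narrow the boundary estimate from 50 to 59 in this toy setting, whereas labeling the pool in order would need dozens of experiments to get as close.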


Subject(s)
Drug Discovery , Drug Discovery/methods , Humans , Problem-Based Learning , Drug Development/methods
13.
Nat Protoc ; 19(4): 1105-1121, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38263521

ABSTRACT

Lead optimization is a crucial step in the drug discovery process, which aims to design potential drug candidates from biologically active hits. During lead optimization, active hits undergo modifications to improve their absorption, distribution, metabolism, excretion and toxicity (ADMET) profiles. Medicinal chemists face key questions regarding which compound(s) should be synthesized next and how to balance multiple ADMET properties. Reliable transformation rules from multiple experimental analyses are critical to improve this decision-making process. We developed OptADMET ( https://cadd.nscc-tj.cn/deploy/optadmet/ ), an integrated web-based platform that provides chemical transformation rules for 32 ADMET properties and leverages prior experimental data for lead optimization. The multiproperty transformation rule database contains a total of 41,779 validated transformation rules generated from the analysis of 177,191 reliable experimental datasets. Additionally, 146,450 rules were generated by analyzing 239,194 molecular data predictions. OptADMET provides the ADMET profiles of all optimized molecules from the queried molecule and enables the prediction of desirable substructure transformations and subsequent validation of drug candidates. OptADMET is based on matched molecular pairs analysis derived from synthetic chemistry, thus providing improved practicality over other methods. OptADMET is designed for use by both experimental and computational scientists.


Subject(s)
Drug Discovery , Internet , Databases, Factual
14.
Article in English | MEDLINE | ID: mdl-38285569

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is widely used to study cellular heterogeneity in different samples. However, due to technical deficiencies, dropout events often result in zero gene expression values in the gene expression matrix. In this paper, we propose a new imputation method called scCAN, based on adaptive neighborhood clustering, to estimate the zero values caused by dropouts. Our method continuously updates cell-cell similarity information by simultaneously learning similarity relationships, clustering structures, and imposing new rank constraints on the Laplacian matrix of the similarity matrix, improving the imputation of dropout zero values. To evaluate the performance of this method, we used four simulated and eight real scRNA-seq datasets for downstream analyses, including cell clustering, recovered gene expression, and reconstructed cell trajectories. Our method improves the performance of the downstream analysis and is better than other imputation methods.


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis
15.
Methods ; 222: 133-141, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38242382

ABSTRACT

The versatility of ChatGPT in performing a diverse range of tasks has elicited considerable interest in its potential applications within professional fields. Taking drug discovery as a testbed, this paper provides a comprehensive evaluation of ChatGPT's ability in molecule property prediction. The study focuses on three aspects: 1) Effects of different prompt settings, where we investigate the impact of varying prompts on the prediction outcomes of ChatGPT; 2) Comprehensive evaluation on molecule property prediction, where we conduct a comprehensive evaluation on 53 ADMET-related endpoints; 3) Analysis of ChatGPT's potential and limitations, where we make comparisons with models tailored for molecule property prediction, thus gaining a more accurate understanding of ChatGPT's capabilities and limitations in this area. Through comprehensive evaluation, we find that 1) With appropriate prompt settings, ChatGPT can attain satisfactory prediction outcomes that are competitive with specialized models designed for those tasks. 2) Prompt settings significantly affect ChatGPT's performance. Among all prompt settings, the strategy of selecting examples in few-shot prompting has the greatest impact on results. Scaffold sampling greatly outperforms random sampling. 3) The capacity of ChatGPT to accomplish high-precision predictions is significantly influenced by the quality of examples provided, which may constrain its practical applicability in real-world scenarios. This work highlights ChatGPT's potential and limitations in molecule property prediction, which we hope can inspire future design and evaluation of Large Language Models within scientific domains.


Subject(s)
Drug Discovery , Research Design
16.
J Chem Inf Model ; 64(7): 2174-2194, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37934070

ABSTRACT

The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which are time-consuming and high-cost. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.


Subject(s)
Artificial Intelligence , Benchmarking , Humans , Databases, Factual , Drug Discovery , Drug Design
17.
IEEE J Biomed Health Inform ; 28(1): 569-579, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37991904

ABSTRACT

Adverse drug-drug interactions (DDIs) pose potential risks in polypharmacy due to unknown physicochemical incompatibilities between co-administered drugs. Recent studies have utilized multi-layer graph neural network architectures to model hierarchical molecular substructures of drugs, achieving excellent DDI prediction performance. While extant substructural frameworks effectively encode interactions from atom-level features, they overlook valuable chemical bond representations within molecular graphs. More critically, given the multifaceted nature of DDI prediction tasks involving both known and novel drug combinations, previous methods lack tailored strategies to address these distinct scenarios. The resulting lack of adaptability impedes further improvements to model performance. To tackle these challenges, we propose PEB-DDI, a DDI prediction learning framework with enhanced substructure extraction. First, the information of chemical bonds is integrated and synchronously updated with the atomic nodes. Then, different dual-view strategies are selected based on whether novel drugs are present in the prediction task. In particular, we constructed a Molecular fingerprint-Molecular graph view for the transductive task, and a Bipartite graph-Molecular graph view for the inductive task. Rigorous evaluations on benchmark datasets underscore PEB-DDI's superior performance. Notably, on DrugBank, it achieves an accuracy of 98.18% when predicting previously unknown interactions among approved drugs. Even when faced with novel drugs, PEB-DDI generalizes well, with an accuracy of 88.06%, which is attributed to the proper transfer of basic molecular structure learning.


Subject(s)
Neural Networks, Computer , Humans , Drug Interactions
18.
IEEE J Biomed Health Inform ; 28(3): 1564-1574, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38153823

ABSTRACT

The prediction of molecular properties remains a challenging task in the field of drug design and development. Recently, there has been a growing interest in the analysis of biological images. Molecular images, as a novel representation, have proven to be competitive, yet they lack explicit information and detailed semantic richness. Conversely, semantic information in SMILES sequences is explicit but lacks spatial structural details. Therefore, in this study, we focus on and explore the relationship between these two types of representations, proposing a novel multimodal architecture named ISMol. ISMol relies on a cross-attention mechanism to extract information representations of molecules from both images and SMILES strings, thereby predicting molecular properties. Evaluation results on 14 small molecule ADMET datasets indicate that ISMol outperforms machine learning (ML) and deep learning (DL) models based on single-modal representations. In addition, we analyze our method through a large number of experiments to test the superiority, interpretability and generalizability of the method. In summary, ISMol offers a powerful deep learning toolbox for drug discovery in a variety of molecular properties.


Subject(s)
Drug Design , Drug Discovery , Humans , Machine Learning , Semantics
19.
J Chem Inf Model ; 64(1): 238-249, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38103039

ABSTRACT

Drug repositioning plays a key role in disease treatment. With the growth of large-scale chemical data, many computational methods are utilized for drug-disease association prediction. However, most of the existing models neglect the positive influence of non-Euclidean data and multisource information, and there is still a critical issue for graph neural networks regarding how to set the feature diffusion distance. To solve these problems, we propose SiSGC, which makes full use of the biological knowledge information as initial features and learns the structure information from the constructed heterogeneous graph with the adaptive selection of the information diffusion distance. Then, the structural features are fused with the denoised similarity information and fed to the CatBoost classifier to make predictions. Three different data sets are used to confirm the robustness and generalization of SiSGC under two splitting strategies. Experimental results demonstrate that the proposed model achieves superior performance compared with the six leading methods and four variants. Our case study on breast neoplasms further indicates that SiSGC is trustworthy and robust yet simple. We also present four drugs for breast cancer treatment with high confidence and give an explanation demonstrating their rationality. These results suggest that SiSGC can be used as a beneficial supplement for drug repositioning.


Subject(s)
Drug Repositioning , Neural Networks, Computer