DTI-MLCD: predicting drug-target interactions using multi-label learning with community detection method.

Chu, Yanyi; Shan, Xiaoqi; Chen, Tianhang; Jiang, Mingming; Wang, Yanjing; Wang, Qiankun; Salahub, Dennis Russell; Xiong, Yi; Wei, Dong-Qing.

Brief Bioinform ; 22(3), 2021 05 20.

Article in English | MEDLINE | ID: mdl-32964234

ABSTRACT

Identifying drug-target interactions (DTIs) is an important step in drug discovery and drug repositioning. To reduce experimental cost, a large number of computational approaches have been proposed for this task. Machine learning-based models, especially binary classification models, have been developed to predict whether a drug-target pair interacts. However, there is still much room for improvement in the performance of current methods. Multi-label learning can overcome some difficulties caused by single-label learning and thereby improve predictive performance. The key challenge faced by multi-label learning is the exponentially sized output space, and considering label correlations can help to overcome this challenge. In this paper, we facilitate multi-label classification for DTI prediction by introducing community detection methods, in an approach named DTI-MLCD. Moreover, we updated the gold standard data set, which has been widely used by most previously published DTI prediction methods since 2008, by adding 15,000 more positive DTI samples. The proposed DTI-MLCD is applied to both data sets, demonstrating its superiority over other machine learning methods and several existing methods. The data sets and source code of this study are freely available at https://github.com/a96123155/DTI-MLCD.
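
To illustrate how grouping correlated labels can shrink the multi-label output space described in the abstract, the following Python sketch builds a label co-occurrence graph and splits it into label groups. It is not the DTI-MLCD implementation: connected components of a thresholded graph serve as a deliberately simple stand-in for a real community detection algorithm, and the names `label_cooccurrence`, `partition_labels`, `min_weight` and the toy data are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def label_cooccurrence(label_sets):
    """Count how often each pair of labels is assigned to the same sample."""
    weights = defaultdict(int)
    for labels in label_sets:
        for a, b in combinations(sorted(labels), 2):
            weights[(a, b)] += 1
    return weights


def partition_labels(label_sets, min_weight=2):
    """Group labels that co-occur at least `min_weight` times.

    Assumption: connected components of the thresholded graph stand in
    for the communities a real community detection method would return.
    """
    all_labels = sorted(set().union(*label_sets))
    neighbours = {label: set() for label in all_labels}
    for (a, b), weight in label_cooccurrence(label_sets).items():
        if weight >= min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)

    clusters, seen = [], set()
    for start in all_labels:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy data: each set lists the targets one drug interacts with.
drug_targets = [
    {"T1", "T2"}, {"T1", "T2"}, {"T1", "T2", "T3"},
    {"T3", "T4"}, {"T3", "T4"}, {"T5"},
]
clusters = partition_labels(drug_targets, min_weight=2)
print(clusters)
```

Each resulting label group could then be handed to its own multi-label classifier, so that no single model has to cover the full label space.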

Subject(s)

Algorithms , Computational Biology/methods , Machine Learning , Pharmaceutical Preparations/metabolism , Proteins/metabolism , Computer Simulation , Drug Discovery/methods , Drug Repositioning/methods , Internet , Molecular Targeted Therapy/methods , Pharmaceutical Preparations/administration & dosage , Pharmaceutical Preparations/chemistry , Protein Binding , Proteins/antagonists & inhibitors , Proteins/chemistry , Reproducibility of Results

DTI-CDF: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features.

Chu, Yanyi; Kaushik, Aman Chandra; Wang, Xiangeng; Wang, Wei; Zhang, Yufang; Shan, Xiaoqi; Salahub, Dennis Russell; Xiong, Yi; Wei, Dong-Qing.

Brief Bioinform ; 22(1): 451-462, 2021 01 18.

Article in English | MEDLINE | ID: mdl-31885041

ABSTRACT

Drug-target interactions (DTIs) play a crucial role in target-based drug discovery and development. Computational prediction of DTIs can effectively complement experimental wet-lab techniques for identifying DTIs, which are typically time- and resource-consuming. However, current DTI prediction approaches suffer from low precision and a high false-positive rate. In this study, we aim to improve prediction performance with a novel DTI prediction method based on a cascade deep forest (CDF) model, named DTI-CDF, which uses multiple similarity-based features between drugs and similarity-based features between target proteins extracted from the heterogeneous graph that contains known DTIs. In the experiments, we ran five replicates of 10-fold cross-validation under three different experimental settings, in which the DTI values of certain drugs (SD), targets (ST), or drug-target pairs (SP) are missing from the training sets but present in the test sets. The experimental results demonstrate that our proposed approach, DTI-CDF, achieves significantly higher performance than traditional ensemble learning-based methods such as random forest and XGBoost, a deep neural network, and state-of-the-art methods such as DDR. Furthermore, 1352 newly predicted DTIs are confirmed by the KEGG and DrugBank databases. The data sets and source code are freely available at https://github.com//a96123155/DTI-CDF.
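
The three evaluation settings named in the abstract can be pictured with a small Python sketch. This is one reading of the abstract, not the authors' code: SD is taken to mean that whole drugs are held out, ST that whole targets are held out, and SP that individual pairs are held out. The function `split_pairs`, its parameters and the toy pairs are hypothetical.

```python
import random


def split_pairs(pairs, setting, test_fraction=0.1, seed=0):
    """Split (drug, target) pairs into train and test lists.

    Assumed semantics: "SD" holds out drugs, "ST" holds out targets,
    "SP" holds out individual drug-target pairs.
    """
    if setting not in {"SD", "ST", "SP"}:
        raise ValueError(f"unknown setting: {setting!r}")
    rng = random.Random(seed)

    if setting == "SP":
        shuffled = list(pairs)
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        return shuffled[n_test:], shuffled[:n_test]

    # Position 0 of a pair is the drug, position 1 is the target.
    index = 0 if setting == "SD" else 1
    entities = sorted({pair[index] for pair in pairs})
    rng.shuffle(entities)
    n_test = max(1, int(len(entities) * test_fraction))
    held_out = set(entities[:n_test])
    train = [pair for pair in pairs if pair[index] not in held_out]
    test = [pair for pair in pairs if pair[index] in held_out]
    return train, test


# Hypothetical toy interaction list.
pairs = [
    ("D1", "T1"), ("D1", "T2"), ("D2", "T1"), ("D2", "T3"),
    ("D3", "T2"), ("D3", "T3"), ("D4", "T1"), ("D4", "T2"),
]
train, test = split_pairs(pairs, "SD", test_fraction=0.25)
print("held-out drugs:", sorted({drug for drug, _ in test}))
```

Under SD and ST, no held-out drug or target ever appears in the training list, which is what makes these settings harder than a random split over pairs.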

Subject(s)

Drug Development/methods , Proteomics/methods , Software , Humans , Molecular Docking Simulation/methods , Sequence Analysis, Protein/methods

Prediction of CYP450 Enzyme-Substrate Selectivity Based on the Network-Based Label Space Division Method.

Shan, Xiaoqi; Wang, Xiangeng; Li, Cheng-Dong; Chu, Yanyi; Zhang, Yufang; Xiong, Yi; Wei, Dong-Qing.

J Chem Inf Model ; 59(11): 4577-4586, 2019 11 25.

Article in English | MEDLINE | ID: mdl-31603319

ABSTRACT

A drug may be metabolized by multiple cytochrome P450 (CYP450) isoforms. Predicting the metabolic fate of drugs is very important for preventing drug-drug interactions in the development of novel pharmaceuticals. Prediction of CYP450 enzyme-substrate selectivity is formulated as a multilabel learning task in this study. First, we compared the modeling performance of feature combinations based on four different categories of features: physicochemical property descriptors, mol2vec descriptors, extended connectivity fingerprints, and molecular access system (MACCS) key fingerprints. After identifying the best combination of features, we applied seven different multilabel models: multilabel k-nearest neighbor (ML-kNN), multilabel twin support vector machine, and five network-based label space division (NLSD)-based methods (NLSD-MLP, NLSD-XGB, NLSD-EXT, NLSD-RF, and NLSD-SVM). All of the six models (ML-kNN, NLSD-MLP, NLSD-XGB, NLSD-EXT, NLSD-RF, and NLSD-SVM) in this paper exhibit better performance than the previous work. In addition, NLSD-XGB achieves the best performance, with an average top-1 prediction success of 91.1%, an average top-2 prediction success of 96.2%, and an average top-3 prediction success of 98.2%. Compared with the previous work, NLSD-XGB shows a significant improvement of over 11% on top-1 in the 10 times repeated 5-fold cross-validation test and of over 14% on top-1 in the 10 times repeated hold-out method. To the best of our knowledge, this is the first time the network-based label space division model has been introduced in drug metabolism, and it performs well in this task.
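
The top-1, top-2 and top-3 figures quoted above can be made concrete with a short Python sketch. The abstract does not define the metric, so the definition used here is an assumption: a sample counts as a top-k success when at least one of its k highest-scoring isoforms is a true label. The function name `top_k_success` and all scores are hypothetical.

```python
def top_k_success(scores, true_labels, k):
    """Fraction of samples with a true label among the k best-scored labels.

    `scores` is a list of {label: score} dicts, one per sample;
    `true_labels` is a parallel list of sets of correct labels.
    """
    hits = 0
    for sample_scores, truth in zip(scores, true_labels):
        ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
        if truth & set(ranked[:k]):
            hits += 1
    return hits / len(scores)


# Hypothetical predicted scores for four substrates over three isoforms.
scores = [
    {"CYP3A4": 0.7, "CYP2D6": 0.2, "CYP2C9": 0.1},
    {"CYP3A4": 0.5, "CYP2D6": 0.4, "CYP2C9": 0.1},
    {"CYP3A4": 0.2, "CYP2D6": 0.3, "CYP2C9": 0.5},
    {"CYP3A4": 0.1, "CYP2D6": 0.8, "CYP2C9": 0.1},
]
true_labels = [{"CYP3A4"}, {"CYP2D6"}, {"CYP3A4"}, {"CYP2D6", "CYP2C9"}]

for k in (1, 2, 3):
    print(f"top-{k} success: {top_k_success(scores, true_labels, k):.2f}")
```

Because a larger k can only add candidate labels, the success rate never decreases as k grows, which matches the ordering of the reported values.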

Subject(s)

Cytochrome P-450 Enzyme System/metabolism , Pharmaceutical Preparations/metabolism , Humans , Models, Biological , Neural Networks, Computer , Pharmaceutical Preparations/chemistry , Substrate Specificity , Support Vector Machine

Effect of Cholesterol on C99 Dimerization: Revealed by Molecular Dynamics Simulations.

Li, Cheng-Dong; Junaid, Muhammad; Shan, Xiaoqi; Wang, Yanjing; Wang, Xiangeng; Khan, Abbas; Wei, Dong-Qing.

Front Mol Biosci ; 9: 872385, 2022.

Article in English | MEDLINE | ID: mdl-35928227

ABSTRACT

C99 is the immediate precursor for amyloid beta (Aβ) and therefore is a central intermediate in the pathway that is believed to result in Alzheimer's disease (AD). It has been suggested that cholesterol is associated with C99, but the dynamic details of how cholesterol affects C99 assembly and the Aβ formation remain unclear. To investigate this question, we employed coarse-grained and all-atom molecular dynamics simulations to study the effect of cholesterol and membrane composition on C99 dimerization. We found that although the existence of cholesterol delays C99 dimerization, there is no direct competition between C99 dimerization and cholesterol association. In contrast, the existence of cholesterol makes the C99 dimer more stable, which presents a cholesterol binding C99 dimer model. Cholesterol and membrane composition change the dimerization rate and conformation distribution of C99, which will subsequently influence the production of Aβ. Our results provide insights into the potential influence of the physiological environment on the C99 dimerization, which will help us understand Aβ formation and AD's etiology.

SPVec: A Word2vec-Inspired Feature Representation Method for Drug-Target Interaction Prediction.

Zhang, Yu-Fang; Wang, Xiangeng; Kaushik, Aman Chandra; Chu, Yanyi; Shan, Xiaoqi; Zhao, Ming-Zhu; Xu, Qin; Wei, Dong-Qing.

Front Chem ; 7: 895, 2019.

Article in English | MEDLINE | ID: mdl-31998687

ABSTRACT

Drug discovery is an academic and commercial process of global importance. Accurate identification of drug-target interactions (DTIs) can significantly facilitate the drug discovery process. Compared to costly, labor-intensive and time-consuming experimental methods, machine learning (ML) plays an increasingly important role in effective, efficient and high-throughput identification of DTIs. However, upstream feature extraction methods require tremendous human resources and expert insight, which limits the application of ML approaches. Inspired by unsupervised representation learning methods like Word2vec, we propose SPVec, a novel way to automatically represent raw data such as SMILES strings and protein sequences as continuous, information-rich and lower-dimensional vectors, so as to avoid the sparseness and bit collisions of cumbersome, manually extracted features. Visualization of SPVec illustrated that similar compounds or proteins occupy similar regions of the vector space, which indicated that SPVec not only encodes compound substructures or protein sequences efficiently, but also implicitly reveals some important biophysical and biochemical patterns. Compared with manually designed features like MACCS fingerprints and amino acid composition (AAC), SPVec showed better performance with several state-of-the-art machine learning classifiers such as Gradient Boosting Decision Tree, Random Forest and Deep Neural Network on BindingDB. The performance and robustness of SPVec were also confirmed on independent test sets obtained from the DrugBank database. In addition, based on the whole DrugBank dataset, we predicted the probabilities of all unlabeled DTIs, where two of the top five predicted novel DTIs were supported by external evidence. These results indicated that SPVec can provide an effective and efficient way to discover reliable DTIs, which would be beneficial for drug reprofiling.
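
To show what treating a sequence like text can look like in a Word2vec-style setting, the Python sketch below cuts a protein sequence into overlapping k-mer "words" and lists the (centre, context) pairs a skip-gram model would train on. The abstract does not describe SPVec's actual tokenization, so the k-mer scheme, the function names `kmers` and `skipgram_pairs`, the window size and the example sequence are all assumptions, and no embedding is trained here.

```python
def kmers(sequence, k=3):
    """Split a sequence into overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(tokens, window=1):
    """List (centre, context) pairs within `window` positions of each token."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        pairs.extend((centre, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical five-residue fragment, used only to keep the output short.
words = kmers("MKTAY", k=3)
pairs = skipgram_pairs(words, window=1)
print(words)
print(pairs)
```

A skip-gram model trained on such pairs learns one vector per word, and the word vectors of a sequence could then be summed or averaged into a fixed-length feature vector for a downstream classifier.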
