Search | VHL Regional Portal

DNA protein binding recognition based on lifelong learning.

Liu, Yongsan; Guan, ShiXuan; Jiang, TengSheng; Fu, Qiming; Ma, Jieming; Cui, Zhiming; Ding, Yijie; Wu, Hongjie.

Comput Biol Med ; 164: 107094, 2023 09.

Article in English | MEDLINE | ID: mdl-37459792

ABSTRACT

In recent years, bioinformatics research has focused on prediction from raw protein sequences, and some scholars treat DNA-binding protein prediction as a classification task. Many statistical and machine learning-based methods have been widely used in DNA-binding protein research. These methods are more efficient than manual classification, but there is still room for improvement in prediction accuracy and speed. In this study, the researchers used Average Blocks, Discrete Cosine Transform, Discrete Wavelet Transform, Global encoding, Normalized Moreau-Broto Autocorrelation and Pseudo position-specific scoring matrix to extract evolutionary features. A dynamic deep network based on a lifelong learning architecture was then proposed to fuse the six features and thus classify DNA-binding proteins more efficiently. Multi-feature fusion describes the desired protein information more accurately than single features do. This model offers a fresh perspective on binary classification problems in bioinformatics and broadens the application field of lifelong learning. To show the model's effectiveness, the researchers ran experiments on three datasets and compared it with other classification techniques. The findings demonstrated that the model was superior to other approaches in single-sample specificity (81.0%, 83.0%) and single-sample sensitivity (82.4%, 90.7%), and achieved high accuracy on the benchmark datasets (88.4%, 80.0%, and 76.6%).
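Of the six evolutionary descriptors listed above, Average Blocks (AB) is the simplest to illustrate. The abstract gives no settings, so the following is a minimal sketch under stated assumptions: the input is an L × 20 position-specific scoring matrix (PSSM), the sequence axis is split into a hypothetical `n_blocks = 20` contiguous blocks, and each block is averaged column-wise, giving a fixed-length vector regardless of protein length. It does not reproduce the paper's lifelong-learning fusion network.

```python
import numpy as np


def average_blocks(pssm: np.ndarray, n_blocks: int = 20) -> np.ndarray:
    """Average Blocks (AB) descriptor for an L x 20 PSSM.

    The sequence axis is split into `n_blocks` roughly equal contiguous
    segments (n_blocks is an assumed setting, not taken from the paper);
    each segment's rows are averaged, giving n_blocks * n_cols values.
    """
    length, _ = pssm.shape
    if length < n_blocks:
        raise ValueError("sequence is shorter than the number of blocks")
    # np.array_split tolerates lengths that are not divisible by n_blocks
    blocks = np.array_split(pssm, n_blocks, axis=0)
    return np.concatenate([block.mean(axis=0) for block in blocks])


# Usage: a random matrix stands in for a real PSSM of a 137-residue protein.
rng = np.random.default_rng(0)
features = average_blocks(rng.normal(size=(137, 20)))
print(features.shape)  # (400,)
```

Because every protein maps to the same 400-dimensional vector, descriptors like this can be stacked or concatenated before being passed to a classifier.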

Subject(s)

DNA-Binding Proteins , Machine Learning , Protein Binding , DNA-Binding Proteins/metabolism , Computational Biology/methods , DNA

MV-H-RKM: A Multiple View-Based Hypergraph Regularized Restricted Kernel Machine for Predicting DNA-Binding Proteins.

Guan, Shixuan; Qian, Yuqing; Jiang, Tengsheng; Jiang, Min; Ding, Yijie; Wu, Hongjie.

IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1246-1256, 2023.

Article in English | MEDLINE | ID: mdl-35731758

ABSTRACT

DNA-binding proteins (DBPs) have a significant impact on many life activities, so identifying DBPs is a crucial issue, and doing so greatly helps in understanding the mechanism of protein-DNA interactions. Identifying DBPs with traditional experimental methods is time-consuming and labor-intensive. In recent years, researchers have proposed many DBP identification methods based on machine learning algorithms to overcome these shortcomings. However, most existing methods do not achieve satisfactory results. In this paper, we focus on developing a new DBP predictor, called Multi-View Hypergraph Restricted Kernel Machines (MV-H-RKM). In this method, we extract five features from three views of the proteins. To fuse these features, we couple them through a shared hidden vector. In addition, we employ hypergraph regularization to enforce structural consistency between the original features and the hidden vector. Experimental results show that MV-H-RKM achieves accuracies of 84.09% and 85.48% on the PDB1075 and PDB186 datasets, respectively, and demonstrate that our proposed method performs better than other state-of-the-art approaches. The code is publicly available at https://github.com/ShixuanGG/MV-H-RKM.
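The hypergraph regularization mentioned above can be sketched generically. This is not the MV-H-RKM implementation (see the linked repository for that); it assumes one hyperedge per sample made of that sample and its `k` nearest neighbours, unit hyperedge weights, and the normalized hypergraph Laplacian L = I − Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2). The penalty tr(ZᵀLZ), where `z` stands in for the shared hidden vectors, is small when samples that share hyperedges receive similar hidden representations.

```python
from typing import Optional

import numpy as np


def knn_incidence(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Incidence matrix H (n_vertices x n_edges), assuming one hyperedge per
    sample containing that sample and its k nearest neighbours (Euclidean)."""
    n = x.shape[0]
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    h = np.zeros((n, n))
    for e in range(n):
        members = np.argsort(dists[e])[: k + 1]  # includes the sample itself
        h[members, e] = 1.0
    return h


def hypergraph_laplacian(h: np.ndarray, w: Optional[np.ndarray] = None) -> np.ndarray:
    """Normalized hypergraph Laplacian L = I - Dv^-1/2 H W De^-1 H^T Dv^-1/2."""
    n_vertices, n_edges = h.shape
    w = np.ones(n_edges) if w is None else w  # unit weights are an assumption
    dv = h @ w            # weighted vertex degrees
    de = h.sum(axis=0)    # hyperedge degrees
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    theta = dv_inv_sqrt @ h @ np.diag(w / de) @ h.T @ dv_inv_sqrt
    return np.eye(n_vertices) - theta


def hypergraph_penalty(z: np.ndarray, lap: np.ndarray) -> float:
    """tr(Z^T L Z): small when rows of Z vary smoothly over the hypergraph."""
    return float(np.trace(z.T @ lap @ z))
```

In a full model this penalty would be added to the training objective with a tunable weight; how MV-H-RKM builds its hyperedges and weights is not stated in the abstract.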

Subject(s)

DNA-Binding Proteins , Support Vector Machine , DNA-Binding Proteins/chemistry , Algorithms , DNA/chemistry , Machine Learning

Protein-DNA Binding Residues Prediction Using a Deep Learning Model With Hierarchical Feature Extraction.

Guan, Shixuan; Zou, Quan; Wu, Hongjie; Ding, Yijie.

IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2619-2628, 2023.

Article in English | MEDLINE | ID: mdl-35834447

ABSTRACT

Biologically important effects occur when proteins bind to other substances, and binding to DNA is among the most crucial. Accurate identification of protein-DNA binding residues is therefore important for further understanding the mechanism of protein-DNA interaction. Although wet-lab methods can accurately locate binding residues, they require significant human, financial and time costs, so there is an urgent need for efficient computational methods. Most current state-of-the-art methods are two-step approaches: the first step uses a sliding window technique to extract residue features, and the second step feeds each residue into the model as a separate input for prediction. This hurts both prediction efficiency and ease of use. In this study, we propose a sequence-to-sequence (seq2seq) model that takes the entire variable-length protein sequence as input and performs hierarchical feature extraction with two modules: a Transformer Encoder Block extracts global features, and a Feature Extracting Block then extracts local features to further improve the model's recognition capability. Comparison results on two benchmark datasets, PDNA-543 and PDNA-41, demonstrate the effectiveness of our method in identifying protein-DNA binding residues.
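To make the contrast concrete, here is a minimal sketch of the first step of the generic two-step approach the abstract criticizes, not of the proposed seq2seq model. It assumes one-hot residue encoding and a hypothetical `half_width` of 3; each residue gets its own zero-padded window, and in the second step every row would be a separate model input.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence: str) -> np.ndarray:
    """L x 20 one-hot encoding; unknown residues become all-zero rows."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    enc = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        if aa in index:
            enc[pos, index[aa]] = 1.0
    return enc


def sliding_windows(features: np.ndarray, half_width: int = 3) -> np.ndarray:
    """One flattened window per residue, shape (L, (2*half_width+1) * n_features).

    Positions beyond either end of the sequence are zero-padded.
    """
    length, _ = features.shape
    padded = np.pad(features, ((half_width, half_width), (0, 0)))  # zero padding
    win = 2 * half_width + 1
    return np.stack([padded[i : i + win].ravel() for i in range(length)])


windows = sliding_windows(one_hot("MKRIEQLAK"), half_width=3)
print(windows.shape)  # (9, 140)
```

A seq2seq model instead consumes the whole L × 20 matrix in one pass and emits one label per position, avoiding both the per-residue inputs and the fixed window size.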

DNA-binding protein prediction based on deep transfer learning.

Yan, Jun; Jiang, Tengsheng; Liu, Junkai; Lu, Yaoyao; Guan, Shixuan; Li, Haiou; Wu, Hongjie; Ding, Yijie.

Math Biosci Eng ; 19(8): 7719-7736, 2022 05 24.

Article in English | MEDLINE | ID: mdl-35801442

ABSTRACT

The study of DNA-binding proteins (DBPs) is of great importance and plays a key role in the biomedical field. At present, many researchers are working on the prediction and detection of DBPs. Traditional DBP prediction mainly uses machine learning methods. Although these methods can achieve relatively high prediction accuracy, they consume large quantities of human effort and material resources. Transfer learning has certain advantages in dealing with such prediction problems. Therefore, in the present study, two features were extracted from protein sequences, and a transfer learning approach was adopted in which two classical transfer learning algorithms were compared for transferring samples and constructing datasets. In the final step, DBPs are detected by a deep neural network model that uses attention mechanisms.
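The abstract does not specify which attention mechanism the network uses. One common choice, assumed here purely for illustration, is additive attention pooling: each sequence position gets a score, the scores are softmax-normalized, and the features are averaged with those weights. The projection `w` and scoring vector `v` would be learned in practice; random stand-ins are used below.

```python
from typing import Tuple

import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - scores.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def attention_pool(h: np.ndarray, w: np.ndarray, v: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Additive attention pooling over sequence positions.

    h: (L, d) per-position features; w: (d, a) projection; v: (a,) scoring vector.
    score_i = v . tanh(h_i @ w); alpha = softmax(scores); pooled = sum_i alpha_i * h_i
    """
    scores = np.tanh(h @ w) @ v
    alpha = softmax(scores)
    return alpha @ h, alpha


rng = np.random.default_rng(0)
h = rng.normal(size=(50, 16))  # stand-in for per-residue features
pooled, alpha = attention_pool(h, rng.normal(size=(16, 8)), rng.normal(size=8))
```

The weights `alpha` also indicate which positions the model relied on, which is one reason attention pooling is popular for variable-length protein inputs.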

Subject(s)

DNA-Binding Proteins , Neural Networks, Computer , Algorithms , Humans , Machine Learning

Research on DNA-Binding Protein Identification Method Based on LSTM-CNN Feature Fusion.

Lu, Weizhong; Chen, Xiaoyi; Zhang, Yu; Wu, Hongjie; Ding, Yijie; Shen, Jiawei; Guan, Shixuan; Li, Haiou.

Comput Math Methods Med ; 2022: 9705275, 2022.

Article in English | MEDLINE | ID: mdl-35693256

ABSTRACT

Proteins are closely related to life activities. DNA-binding proteins, one class of proteins, play an irreplaceable role in these activities, so studying them is very important. Although traditional biotechnology offers high precision, its cost and efficiency increasingly fail to meet the needs of modern society. Machine learning methods can make up for the deficiencies of biological experimental techniques to a certain extent, but they are not as simple and fast as deep learning for data processing. In this paper, a deep learning framework based on parallel long short-term memory (LSTM) and convolutional neural networks (CNN) was proposed to identify DNA-binding proteins. The model extracts features not only from protein sequences but also from evolutionary information, and the two kinds of features are then combined for training and testing. On the PDB2272 dataset, compared with the PDBP_Fusion model, accuracy (ACC) and Matthews correlation coefficient (MCC) increased by 3.82% and 7.98%, respectively. The experimental results show that the model has certain advantages.
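For reference, the metrics reported in these abstracts (sensitivity, specificity, ACC and MCC) follow the standard confusion-matrix definitions. The sketch below assumes binary labels with 1 = DNA-binding and 0 = non-binding; it only computes the metrics and says nothing about how the reported improvements were measured.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Sensitivity (Sn), specificity (Sp), accuracy (ACC) and Matthews
    correlation coefficient (MCC) for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


print(confusion_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 0]))
# {'Sn': 0.75, 'Sp': 0.75, 'ACC': 0.75, 'MCC': 0.5}
```

Unlike accuracy, MCC uses all four confusion-matrix cells, so it stays informative when positive and negative classes are imbalanced.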

Subject(s)

DNA-Binding Proteins , Neural Networks, Computer , Amino Acid Sequence , Humans , Machine Learning

G Protein-Coupled Receptor Interaction Prediction Based on Deep Transfer Learning.

Jiang, Tengsheng; Chen, Yuhui; Guan, Shixuan; Hu, Zhongtian; Lu, Weizhong; Fu, Qiming; Ding, Yijie; Li, Haiou; Wu, Hongjie.

IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3126-3134, 2022.

Article in English | MEDLINE | ID: mdl-34780331

ABSTRACT

G protein-coupled receptors (GPCRs) account for about 40% to 50% of drug targets, and many human diseases are related to them. Accurate prediction of GPCR interactions is not only essential for understanding their structural role but also helps in designing more effective drugs. At present, GPCR interaction prediction mainly uses machine learning methods, which generally require a large number of independent and identically distributed samples to achieve good results. However, few labeled GPCR samples are available. Transfer learning has a strong advantage in dealing with such small-sample problems. Therefore, this paper proposes a transfer learning method based on sample similarity: XGBoost serves as the weak classifier, and the TrAdaBoost algorithm, with data weights initialized using Jensen-Shannon (JS) divergence, transfers samples to construct a dataset. A deep neural network based on the attention mechanism is then trained, and existing GPCRs are used for prediction. In short-distance contact prediction, the accuracy of our method is 0.26 higher than that of similar methods.
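The two ingredients named above can be sketched separately. `js_divergence` computes the Jensen-Shannon divergence between two discrete distributions; the abstract does not say how it is turned into initial sample weights, so that mapping is left to the caller. `tradaboost_update` is one re-weighting round of standard TrAdaBoost (Dai et al., 2007) with labels in {0, 1}: misclassified source samples are down-weighted by a fixed β, misclassified target samples are up-weighted by β_t⁻¹. The XGBoost weak learner is not included, and clamping the target error below 0.5 is an assumption added here to keep β_t in (0, 1).

```python
import math

import numpy as np


def js_divergence(p, q) -> float:
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a_i = 0 contribute nothing; m_i > 0 wherever a_i > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def tradaboost_update(weights, errors, n_source, n_iterations):
    """One TrAdaBoost re-weighting step.

    weights: current weights, source-domain samples first, then target samples.
    errors:  |h_t(x_i) - y_i| in {0, 1} for this round's weak learner.
    Returns (new_weights, eps_t), where eps_t is the weighted target-domain error.
    """
    weights = np.asarray(weights, dtype=float)
    errors = np.asarray(errors, dtype=float)
    src, tgt = slice(0, n_source), slice(n_source, None)
    eps = float(np.sum(weights[tgt] * errors[tgt]) / np.sum(weights[tgt]))
    eps = min(max(eps, 1e-10), 0.499)  # assumed clamp so that beta_t stays in (0, 1)
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_iterations))
    new = weights.copy()
    new[src] *= beta ** errors[src]        # misclassified source samples shrink
    new[tgt] *= beta_t ** (-errors[tgt])   # misclassified target samples grow
    return new, eps
```

Repeating this update lets source-domain samples that disagree with the target data fade out, which is how TrAdaBoost selects transferable samples.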

Subject(s)

Algorithms , Receptors, G-Protein-Coupled , Humans , Receptors, G-Protein-Coupled/chemistry , Neural Networks, Computer , Machine Learning
