Search | Virtual Health Library

1.

UsIL-6: An unbalanced learning strategy for identifying IL-6 inducing peptides by undersampling technique.

Liao, Yan-Hong; Chen, Shou-Zhi; Bin, Yan-Nan; Zhao, Jian-Ping; Feng, Xin-Long; Zheng, Chun-Hou.

Comput Methods Programs Biomed ; 250: 108176, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38677081

ABSTRACT

BACKGROUND AND OBJECTIVE: Interleukin-6 (IL-6) is a critical factor for early warning, monitoring, and prognosis in the inflammatory storm of COVID-19 cases. IL-6 inducing peptides, which can induce production of the cytokine IL-6, are very important for the development of diagnostics and immunotherapy. Although existing methods have had some success in predicting IL-6 inducing peptides, there is still room to improve the performance of these models in practical applications. METHODS: In this study, we proposed UsIL-6, a high-performance bioinformatics tool for identifying IL-6 inducing peptides. First, we extracted five groups of physicochemical properties and sequence structural information from IL-6 inducing peptide sequences and obtained a 636-dimensional feature vector. We also employed the NearMiss3 undersampling method and the StandardScaler normalization method to process the data. Then, a 40-dimensional optimal feature vector was obtained by the Boruta feature selection method. Finally, we combined this feature vector with an extremely randomized trees classifier to build the final model, UsIL-6. RESULTS: The AUC value of UsIL-6 on the independent test dataset was 0.87, and the BACC value was 0.808, which indicated that UsIL-6 performed better than the existing methods in IL-6 inducing peptide recognition. CONCLUSIONS: The performance comparison on the independent test dataset confirmed that UsIL-6 achieved the highest performance, the best robustness, and the best generalization ability among the compared methods. We hope that UsIL-6 will become a valuable method to identify, annotate and characterize new IL-6 inducing peptides.
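The BACC value reported above is presumably balanced accuracy. As a minimal sketch, assuming the usual definition (the mean of sensitivity and specificity for binary labels), it can be computed as follows. The function name `balanced_accuracy` and the toy labels are illustrative and do not come from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive).

    Assumed definition; the abstract only reports the BACC value.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Toy, imbalanced example (made-up labels, not data from the study).
toy_true = [1, 1, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 1]
toy_bacc = balanced_accuracy(toy_true, toy_pred)
```

Unlike plain accuracy, this measure weights both classes equally, which is why it is commonly reported alongside AUC for imbalanced datasets.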

Subjects

Computational Biology , Interleukin-6 , Peptides , Humans , Peptides/chemistry , Computational Biology/methods , COVID-19 , Algorithms , Machine Learning , SARS-CoV-2

2.

AMGDTI: drug-target interaction prediction based on adaptive meta-graph learning in heterogeneous network.

Su, Yansen; Hu, Zhiyang; Wang, Fei; Bin, Yannan; Zheng, Chunhou; Li, Haitao; Chen, Haowen; Zeng, Xiangxiang.

Brief Bioinform ; 25(1)2023 11 22.

Article in English | MEDLINE | ID: mdl-38145949

ABSTRACT

Prediction of drug-target interactions (DTIs) is essential in the medical field, since it helps identify molecular structures that potentially interact with drugs and facilitates the discovery and repositioning of drugs. Recently, network representation learning has attracted much attention as a way to learn rich information from heterogeneous data. Although network representation learning algorithms have achieved success in predicting DTIs, their reliance on a few manually designed meta-graphs limits their capability to extract complex semantic information. To address this problem, we introduce an adaptive meta-graph-based method, termed AMGDTI, for DTI prediction. In the proposed AMGDTI, the semantic information is automatically aggregated from a heterogeneous network by training an adaptive meta-graph, thereby achieving efficient information integration without requiring domain knowledge. The effectiveness of the proposed AMGDTI is verified on two benchmark datasets. Experimental results demonstrate that the AMGDTI method overall outperforms eight state-of-the-art methods in predicting DTIs and accurately identifies novel DTIs. It is also verified that the adaptive meta-graph is flexible and effectively captures complex fine-grained semantic information, enabling the learning of intricate heterogeneous network topology and the inference of potential drug-target relationships.

Subjects

Algorithms , Medicine , Benchmarking , Drug Delivery Systems , Semantics

3.

FFMAVP: a new classifier based on feature fusion and multitask learning for identifying antiviral peptides and their subclasses.

Cao, Ruifen; Hu, Weiling; Wei, Pijing; Ding, Yun; Bin, Yannan; Zheng, Chunhou.

Brief Bioinform ; 24(6)2023 09 22.

Article in English | MEDLINE | ID: mdl-37861174

ABSTRACT

Antiviral peptides (AVPs) are widely found in animals and plants, with high specificity and strong sensitivity to drug-resistant viruses. However, due to the great heterogeneity of different viruses, most of the AVPs have specific antiviral activities. Therefore, it is necessary to identify the specific activities of AVPs on virus types. Most existing studies only identify AVPs, with only a few studies identifying subclasses by training multiple binary classifiers. We develop a two-stage prediction tool named FFMAVP that can simultaneously predict AVPs and their subclasses. In the first stage, we identify whether a peptide is AVP or not. In the second stage, we predict the six virus families and eight species specifically targeted by AVPs based on two multiclass tasks. Specifically, the feature extraction module in the two-stage task of FFMAVP adopts the same neural network structure, in which one branch extracts features based on amino acid feature descriptors and the other branch extracts sequence features. Then, the two types of features are fused for the following task. Considering the correlation between the two tasks of the second stage, a multitask learning model is constructed to improve the effectiveness of the two multiclass tasks. In addition, to improve the effectiveness of the second stage, the network parameters trained through the first-stage data are used to initialize the network parameters in the second stage. As a demonstration, the cross-validation results, independent test results and visualization results show that FFMAVP achieves great advantages in both stages.

Subjects

Algorithms , Peptides , Peptides/chemistry , Neural Networks, Computer , Machine Learning , Antiviral Agents/pharmacology , Antiviral Agents/chemistry

4.

Deep learning-based multi-functional therapeutic peptides prediction with a multi-label focal dice loss function.

Fan, Henghui; Yan, Wenhui; Wang, Lihua; Liu, Jie; Bin, Yannan; Xia, Junfeng.

Bioinformatics ; 39(6)2023 06 01.

Article in English | MEDLINE | ID: mdl-37216900

ABSTRACT

MOTIVATION: With the great number of peptide sequences produced in the postgenomic era, it is highly desirable to identify the various functions of therapeutic peptides quickly. Furthermore, it is a great challenge to accurately predict multi-functional therapeutic peptides (MFTP) via sequence-based computational tools. RESULTS: Here, we propose a novel multi-label-based method, named ETFC, to predict 21 categories of therapeutic peptides. The method utilizes a deep learning-based model architecture, which consists of four blocks: embedding, text convolutional neural network, feed-forward network, and classification blocks. This method also adopts an imbalanced learning strategy with a novel multi-label focal dice loss function. The multi-label focal dice loss is applied in the ETFC method to address the inherent imbalance problem in the multi-label dataset and achieve competitive performance. The experimental results show that the ETFC method is significantly better than the existing methods for MFTP prediction. With the established framework, we use teacher-student-based knowledge distillation to obtain the attention weights from the self-attention mechanism in the MFTP prediction and quantify their contributions toward each of the investigated activities. AVAILABILITY AND IMPLEMENTATION: The source code and dataset are available via: https://github.com/xialab-ahu/ETFC.

Subjects

Deep Learning , Humans , Neural Networks, Computer , Peptides/therapeutic use , Software

5.

PACVP: Prediction of Anti-Coronavirus Peptides Using a Stacking Learning Strategy With Effective Feature Representation.

Chen, Shouzhi; Liao, Yanhong; Zhao, Jianping; Bin, Yannan; Zheng, Chunhou.

IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3106-3116, 2023.

Article in English | MEDLINE | ID: mdl-37022025

ABSTRACT

Due to the global outbreak of COVID-19 and its variants, antiviral peptides with anti-coronavirus activity (ACVPs) represent promising new drug candidates for the treatment of coronavirus infection. At present, several computational tools have been developed to identify ACVPs, but the overall prediction performance is still not sufficient for actual therapeutic application. In this study, we constructed an efficient and reliable prediction model, PACVP (Prediction of Anti-CoronaVirus Peptides), for identifying ACVPs based on effective feature representation and a two-layer stacking learning framework. In the first layer, we use nine feature encoding methods with different feature representation angles to characterize the rich sequence information and fuse them into a feature matrix. Secondly, data normalization and unbalanced data processing are carried out. Next, 12 baseline models are constructed by combining three feature selection methods and four machine learning classification algorithms. In the second layer, we input the optimal probability features into the logistic regression (LR) algorithm to train the final model, PACVP. The experiments show that PACVP achieves favorable prediction performance on the independent test dataset, with an ACC of 0.9208 and an AUC of 0.9465. We hope that PACVP will become a useful method for identifying, annotating and characterizing novel ACVPs.

Subjects

COVID-19 , Peptides , Humans , Algorithms , Machine Learning , Probability

6.

PhaGAA: an integrated web server platform for phage genome annotation and analysis.

Wu, Jiawei; Liu, Qingrui; Li, Min; Xu, Jiliang; Wang, Chen; Zhang, Junyin; Xiao, Minfeng; Bin, Yannan; Xia, Junfeng.

Bioinformatics ; 39(3)2023 03 01.

Article in English | MEDLINE | ID: mdl-36882183

ABSTRACT

MOTIVATION: Phage genome annotation plays a key role in the design of phage therapy. To date, there have been various genome annotation tools for phages, but most of these tools focus on mono-functional annotation and have complex operational processes. Accordingly, comprehensive and user-friendly platforms for phage genome annotation are needed. RESULTS: Here, we propose PhaGAA, an online integrated platform for phage genome annotation and analysis. By incorporating several annotation tools, PhaGAA is constructed to annotate the prophage genome at DNA and protein levels and provide the analytical results. Furthermore, PhaGAA could mine and annotate phage genomes from bacterial genome or metagenome. In summary, PhaGAA will be a useful resource for experimental biologists and help advance the phage synthetic biology in basic and application research. AVAILABILITY AND IMPLEMENTATION: PhaGAA is freely available at http://phage.xialab.info/.

Subjects

Bacteriophages , Bacteriophages/genetics , Software , Computers , Metagenome , Genome, Bacterial , Molecular Sequence Annotation

7.

PrMFTP: Multi-functional therapeutic peptides prediction based on multi-head self-attention mechanism and class weight optimization.

Yan, Wenhui; Tang, Wending; Wang, Lihua; Bin, Yannan; Xia, Junfeng.

PLoS Comput Biol ; 18(9): e1010511, 2022 09.

Article in English | MEDLINE | ID: mdl-36094961

ABSTRACT

Prediction of therapeutic peptides is a significant step in the discovery of promising therapeutic drugs. Most existing studies have focused on mono-functional therapeutic peptide prediction. However, the number of multi-functional therapeutic peptides (MFTP) is growing rapidly, which calls for new computational schemes to facilitate MFTP discovery. In this study, based on a multi-head self-attention mechanism and a class weight optimization algorithm, we propose a novel model called PrMFTP for MFTP prediction. PrMFTP exploits a multi-scale convolutional neural network, bi-directional long short-term memory, and multi-head self-attention mechanisms to fully extract and learn informative features of peptide sequences to predict MFTP. In addition, we design a class weight optimization scheme to address the problem of label-imbalanced data. Comprehensive evaluation demonstrates that PrMFTP is superior to other state-of-the-art computational methods for predicting MFTP. We provide a user-friendly web server of PrMFTP, which is available at http://bioinfo.ahu.edu.cn/PrMFTP.

Subjects

Algorithms , Peptides , Peptides/therapeutic use

8.

NeuroPred-CLQ: incorporating deep temporal convolutional networks and multi-head attention mechanism to predict neuropeptides.

Chen, Shouzhi; Li, Qing; Zhao, Jianping; Bin, Yannan; Zheng, Chunhou.

Brief Bioinform ; 23(5)2022 09 20.

Article in English | MEDLINE | ID: mdl-35988921

ABSTRACT

Neuropeptides (NPs) are a particular class of informative substances in the immune system and physiological regulation. They play a crucial role in regulating physiological functions in various biological growth and developmental stages. In addition, NPs are crucial for developing new drugs for the treatment of neurological diseases. With the development of molecular biology techniques, some data-driven tools have emerged to predict NPs. However, the predictive performance of these tools for NPs still needs to be improved. In this study, we developed a deep learning model (NeuroPred-CLQ) based on the temporal convolutional network (TCN) and multi-head attention mechanism to identify NPs effectively, and we translated the internal relationships of peptide sequences into numerical features with the Word2vec algorithm. The experimental results show that NeuroPred-CLQ learns data information effectively, achieving 93.6% accuracy and 98.8% AUC on the independent test set. The model performs better in identifying NPs than the state-of-the-art predictors. Visualization of features using t-distributed stochastic neighbor embedding shows that NeuroPred-CLQ can clearly distinguish the positive NPs from the negative ones. We believe NeuroPred-CLQ can facilitate drug development and clinical trial studies to treat neurological disorders.

Subjects

Algorithms , Neuropeptides , Neuropeptides/genetics , Peptides/chemistry

9.

DPProm: A Two-Layer Predictor for Identifying Promoters and Their Types on Phage Genome Using Deep Learning.

Wang, Chen; Zhang, Junyin; Cheng, Li; Wu, Jiawei; Xiao, Minfeng; Xia, Junfeng; Bin, Yannan.

IEEE J Biomed Health Inform ; 26(10): 5258-5266, 2022 10.

Article in English | MEDLINE | ID: mdl-35867364

ABSTRACT

With the number of phage genomes increasing, it is urgent to develop new bioinformatics methods for phage genome annotation. A promoter is a DNA region that is important for gene transcriptional regulation. In the post-genomic era, the availability of data makes it possible to establish robust computational models for promoter identification. In this work, we introduce DPProm, a two-layer model composed of DPProm-1L and DPProm-2L, to predict promoters and their types for phages. On the first layer, the model DPProm-1L, a dual-channel deep neural network ensemble method fusing multi-view features (a sequence feature and a handcrafted feature), is proposed to identify whether a DNA sequence is a promoter or a non-promoter. The sequence feature is extracted with a convolutional neural network (CNN). The handcrafted feature is the combination of free energy, GC content, cumulative skew, and Z curve features. On the second layer, DPProm-2L, based on a CNN, is trained to predict the promoters' types (host or phage). To enable prediction on whole genomes, DPProm is combined with a novel sequence data processing workflow, which contains sliding window and sequence merging modules. Experimental results show that DPProm outperforms the state-of-the-art methods and effectively decreases the false positive rate in whole-genome prediction. Furthermore, we provide a user-friendly web server at http://bioinfo.ahu.edu.cn/DPProm. We expect that DPProm can serve as a useful tool for the identification of promoters and their types.
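The sliding-window and sequence-merging steps mentioned above can be illustrated with a small sketch. It is not the DPProm implementation: the window size, the step, the half-open interval convention and all function names are assumptions chosen for illustration, and GC content is shown only as the simplest of the handcrafted features named in the abstract.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sliding_windows(seq, size, step):
    """Yield (start, window) pairs of fixed-size windows along seq."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy genome fragment (made up). A real pipeline would score every window
# with a trained classifier and merge only the windows predicted positive.
toy_genome = "ACGTACGT"
windows = list(sliding_windows(toy_genome, size=4, step=2))
merged_hits = merge_intervals([(0, 4), (2, 6), (10, 14)])
```

Merging adjacent positive windows into one region is one plausible way to avoid reporting the same promoter several times when scanning a whole genome.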

Subjects

Bacteriophages , Deep Learning , Bacteriophages/genetics , DNA , Genomics/methods , Humans , Promoter Regions, Genetic/genetics

10.

PredAPP: Predicting Anti-Parasitic Peptides with Undersampling and Ensemble Approaches.

Zhang, Wei; Xia, Enhua; Dai, Ruyu; Tang, Wending; Bin, Yannan; Xia, Junfeng.

Interdiscip Sci ; 14(1): 258-268, 2022 Mar.

Article in English | MEDLINE | ID: mdl-34608613

ABSTRACT

Anti-parasitic peptides (APPs) have been regarded as promising therapeutic candidate drugs against parasitic diseases. Because the experimental techniques for identifying APPs are expensive and time-consuming, there is an urgent need to develop a computational approach to predict APPs on a large scale. In this study, we provided a computational method, termed PredAPP (Prediction of Anti-Parasitic Peptides), that could effectively identify APPs using an ensemble of well-performing machine learning (ML) classifiers. Firstly, to solve the class imbalance problem, a balanced training dataset was generated by undersampling. We found that the balanced dataset based on cluster centroids achieved the best performance. Then, nine groups of features and six ML algorithms were combined to generate 54 classifiers, and the outputs of these classifiers formed 54 feature representations. In each feature group, we selected the feature representation with the best performance for classification. Finally, the selected feature representations were integrated using the logistic regression algorithm to construct the prediction model PredAPP. On the independent dataset, PredAPP achieved an accuracy and an AUC of 0.880 and 0.922, respectively, compared to 0.739 and 0.873 for AMPfun, a state-of-the-art method to predict APPs. The web server of PredAPP is freely accessible at http://predapp.xialab.info and https://github.com/xialab-ahu/PredAPP.
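The combination step described above (nine feature groups times six algorithms giving 54 classifiers, then one best representation kept per group) can be sketched as below. The group and algorithm names are hypothetical placeholders, since the abstract does not list them, and `toy_score` is a deterministic stand-in for a real cross-validated score, not data from the study.

```python
from itertools import product

# Hypothetical placeholder names; the abstract does not name the actual
# feature groups or ML algorithms.
FEATURE_GROUPS = [f"feature_group_{i}" for i in range(1, 10)]  # 9 groups
ALGORITHMS = [f"algorithm_{j}" for j in range(1, 7)]           # 6 algorithms


def toy_score(group, algorithm):
    """Deterministic stand-in for a validation score (not real results)."""
    i = FEATURE_GROUPS.index(group)
    j = ALGORITHMS.index(algorithm)
    return ((7 * i + 3 * j) % 10) / 10


def select_best_per_group(scores):
    """Keep, for each feature group, the (algorithm, score) with the top score."""
    best = {}
    for (group, algorithm), score in scores.items():
        if group not in best or score > best[group][1]:
            best[group] = (algorithm, score)
    return best


candidates = list(product(FEATURE_GROUPS, ALGORITHMS))  # 9 x 6 combinations
scores = {(g, a): toy_score(g, a) for g, a in candidates}
selected = select_best_per_group(scores)  # one entry per feature group
```

In a stacking setup of this kind, the outputs of the selected classifiers would then serve as inputs to the final logistic regression model.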

Subjects

Machine Learning , Peptides , Algorithms , Computers , Logistic Models

11.

Identifying multi-functional bioactive peptide functions using multi-label deep learning.

Tang, Wending; Dai, Ruyu; Yan, Wenhui; Zhang, Wei; Bin, Yannan; Xia, Enhua; Xia, Junfeng.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34651655

ABSTRACT

Bioactive peptides have a wide range of functions, such as lowering blood glucose levels and reducing inflammation. Meanwhile, computational methods such as machine learning are becoming more and more important for peptide function prediction. Most previous studies concentrate on the prediction of single-functional bioactive peptides. However, the number of multi-functional peptides is increasing; therefore, novel computational methods are needed. In this study, we develop a method, MLBP (Multi-Label deep learning approach for determining the multi-functionalities of Bioactive Peptides), which can simultaneously predict multiple functions including anti-cancer, anti-diabetic, anti-hypertensive, anti-inflammatory and anti-microbial. The MLBP model takes the peptide sequence vector as input to replace the biological and physicochemical features used in other peptide predictors. Using the embedding layer, the dense continuous feature vector is learnt from the sequence vector. Then, we extract convolution features from the feature vector through the convolutional neural network layer and combine them with the bidirectional gated recurrent unit layer to improve the prediction performance. The 5-fold cross-validation experiments are conducted on the training dataset, and the results show that Accuracy and Absolute true are 0.695 and 0.685, respectively. On the test dataset, Accuracy and Absolute true of MLBP are 0.709 and 0.697, which are 5.0% and 4.7% higher than those of the second-best method, respectively. The results indicate that MLBP has superior prediction performance in multi-functional peptide identification. MLBP is available at https://github.com/xialab-ahu/MLBP and http://bioinfo.ahu.edu.cn/MLBP/.
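The abstract does not define the two multi-label metrics it reports. A minimal sketch, assuming the definitions commonly used in multi-label peptide prediction (Accuracy as the mean ratio of intersection to union of true and predicted label sets, Absolute true as the fraction of exact matches), is given below. The function names and the toy labels are illustrative.

```python
def multilabel_accuracy(true_sets, pred_sets):
    """Mean of |true AND pred| / |true OR pred| over samples (assumed definition)."""
    total = 0.0
    for true, pred in zip(true_sets, pred_sets):
        union = true | pred
        # Two empty label sets count as a perfect match by convention here.
        total += len(true & pred) / len(union) if union else 1.0
    return total / len(true_sets)


def absolute_true(true_sets, pred_sets):
    """Fraction of samples whose predicted label set matches exactly."""
    hits = sum(1 for true, pred in zip(true_sets, pred_sets) if true == pred)
    return hits / len(true_sets)


# Toy labels (made up), using function names from the abstract.
toy_true = [{"anti-cancer"}, {"anti-cancer", "anti-microbial"}, {"anti-diabetic"}]
toy_pred = [{"anti-cancer"}, {"anti-cancer"}, {"anti-microbial"}]
toy_accuracy = multilabel_accuracy(toy_true, toy_pred)
toy_absolute_true = absolute_true(toy_true, toy_pred)
```

Absolute true is the stricter of the two, because a partially correct label set scores zero, which is consistent with it being the lower figure in the reported results.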

Subjects

Deep Learning , Machine Learning , Neural Networks, Computer , Peptides

12.

DLFF-ACP: prediction of ACPs based on deep learning and multi-view features fusion.

Cao, Ruifen; Wang, Meng; Bin, Yannan; Zheng, Chunhou.

PeerJ ; 9: e11906, 2021.

Article in English | MEDLINE | ID: mdl-34414035

ABSTRACT

An emerging type of therapeutic agent, anticancer peptides (ACPs), has attracted attention because of its lower risk of toxic side effects. However, the process of identifying ACPs using experimental methods is both time-consuming and laborious. In this study, we developed a new and efficient algorithm that predicts ACPs by fusing multi-view features based on a dual-channel deep neural network ensemble model. In the model, one channel used a convolutional neural network (CNN) to automatically extract the potential spatial features of a sequence. Another channel was used to process and extract more effective features from handcrafted features. Additionally, an effective feature fusion method was explored for the mutual fusion of different features. Finally, we adopted the neural network to predict ACPs based on the fused features. The performance comparisons across the single and fused features showed that the fusion of multi-view features could effectively improve the model's predictive ability. Among these, the fusion of the features extracted by the CNN and the composition of k-spaced amino acid group pairs achieved the best performance. To further validate the performance of our model, we compared it with other existing methods using two independent test sets. The results showed that our model's area under the curve was 0.90, which was higher than that of the other existing methods on the first test set and higher than most of the other existing methods on the second test set. The source code and datasets are available at https://github.com/wame-ng/DLFF-ACP.
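The handcrafted descriptor named above, composition of k-spaced amino acid group pairs, can be sketched as follows. The five-group partition of the amino acids used here is an assumption based on a grouping commonly used for this descriptor; the abstract does not state the grouping, and the function name `cksaagp` is illustrative.

```python
from itertools import product

# Assumed residue grouping (not given in the abstract).
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
RESIDUE_TO_GROUP = {aa: name for name, residues in GROUPS.items() for aa in residues}


def cksaagp(seq, k):
    """Frequencies of the 25 ordered group pairs separated by k residues."""
    counts = {pair: 0 for pair in product(GROUPS, repeat=2)}
    total = 0
    for i in range(len(seq) - k - 1):
        first = RESIDUE_TO_GROUP.get(seq[i])
        second = RESIDUE_TO_GROUP.get(seq[i + k + 1])
        if first is None or second is None:
            continue  # skip non-standard residue symbols
        counts[(first, second)] += 1
        total += 1
    return {pair: (n / total if total else 0.0) for pair, n in counts.items()}


# Toy peptide (made up): adjacent pairs (k = 0) and pairs one residue apart (k = 1).
features_k0 = cksaagp("GKFD", k=0)
features_k1 = cksaagp("GKFD", k=1)
```

Each value of k yields a 25-dimensional vector, so concatenating several gap sizes gives a fixed-length feature regardless of peptide length.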

13.

Sequence-Based Prediction of Transmembrane Protein Crystallization Propensity.

Zhu, Qizhi; Wang, Lihua; Dai, Ruyu; Zhang, Wei; Tang, Wending; Bin, Yannan; Wang, Zeliang; Xia, Junfeng.

Interdiscip Sci ; 13(4): 693-702, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34143353

ABSTRACT

Transmembrane proteins play a vital role in cell life activities. There are several techniques to determine transmembrane protein structures, and X-ray crystallography is the primary methodology. However, due to the special properties of transmembrane proteins, it is still hard to determine their structures by X-ray crystallography. To reduce experimental cost and improve experimental efficiency, it is of great significance to develop computational methods for predicting the crystallization propensity of transmembrane proteins. In this work, we proposed a sequence-based machine learning method, namely Prediction of TransMembrane protein Crystallization propensity (PTMC), to predict the propensity of transmembrane protein crystallization. First, we obtained several general sequence features and the specific encoded features of relative solvent accessibility and hydrophobicity. Second, feature selection was employed to filter out redundant and irrelevant features, and the optimal feature subset was composed of hydrophobicity, amino acid composition and relative solvent accessibility. Finally, we chose extreme gradient boosting after comparing it with several other machine learning methods. Comparative results on the independent test set indicate that PTMC outperforms state-of-the-art sequence-based methods in terms of sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic curve (AUC). In comparison with two competitors, Bcrystal and TMCrys, PTMC achieves an improvement of 0.132 and 0.179 for sensitivity, 0.014 and 0.127 for specificity, 0.037 and 0.192 for accuracy, 0.128 and 0.362 for MCC, and 0.027 and 0.125 for AUC, respectively.
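Of the metrics listed above, MCC is the least self-explanatory. A minimal sketch of its standard definition from confusion-matrix counts follows; returning 0.0 when the denominator is zero is a common convention adopted here as an assumption, and the counts in the example are made up.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Made-up counts: perfect, inverted and chance-level predictions.
mcc_perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)
mcc_inverted = matthews_corrcoef(tp=0, tn=0, fp=5, fn=5)
mcc_chance = matthews_corrcoef(tp=2, tn=2, fp=2, fn=2)
```

The value ranges from -1 to 1, so the reported MCC improvements of 0.128 and 0.362 are differences on that scale rather than percentages.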

Subjects

Computational Biology , Membrane Proteins , Crystallization , Crystallography, X-Ray , Hydrophobic and Hydrophilic Interactions

14.

An improved DNA-binding hot spot residues prediction method by exploring interfacial neighbor properties.

Zhang, Sijia; Wang, Lihua; Zhao, Le; Li, Menglu; Liu, Mengya; Li, Ke; Bin, Yannan; Xia, Junfeng.

BMC Bioinformatics ; 22(Suppl 3): 253, 2021 May 17.

Article in English | MEDLINE | ID: mdl-34000983

ABSTRACT

BACKGROUND: DNA-binding hot spots are dominant and fundamental residues that contribute most of the binding free energy yet account for a small portion of protein-DNA interfaces. As experimental methods for identifying hot spots are time-consuming and costly, high-efficiency computational approaches are emerging as alternatives to experimental methods. RESULTS: Herein, we present a new computational method, termed inpPDH, for hot spot prediction. To improve the prediction performance, we extract hybrid features which incorporate traditional features and new interfacial neighbor properties. To remove redundant and irrelevant features, a two-step feature selection strategy is employed. Finally, a subset of 7 optimal features is chosen to construct the predictor using a support vector machine. The results on the benchmark dataset show that the proposed method yields significantly better prediction accuracy than previously published methods. Moreover, a user-friendly web server for inpPDH has been established and is freely available at http://bioinfo.ahu.edu.cn/inpPDH. CONCLUSIONS: We have developed an accurate, improved prediction model, inpPDH, for hot spot residues in protein-DNA binding interfaces, given the structure of a protein-DNA complex. Moreover, we identify a comprehensive and useful feature subset, including the proposed interfacial neighbor features, that is particularly effective for identifying hot spot residues. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of interfacial neighbor features and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spot residues in protein-DNA complexes.

Subjects

Computational Biology , Support Vector Machine , Databases, Protein , Protein Binding

15.

BBPpred: Sequence-Based Prediction of Blood-Brain Barrier Peptides with Feature Representation Learning and Logistic Regression.

Dai, Ruyu; Zhang, Wei; Tang, Wending; Wynendaele, Evelien; Zhu, Qizhi; Bin, Yannan; De Spiegeleer, Bart; Xia, Junfeng.

J Chem Inf Model ; 61(1): 525-534, 2021 01 25.

Article in English | MEDLINE | ID: mdl-33426873

ABSTRACT

Blood-brain barrier peptides (BBPs) have a large range of biomedical applications since they can cross the blood-brain barrier based on different mechanisms. As experimental methods for the identification of BBPs are laborious and expensive, computational approaches need to be developed for predicting BBPs. In this work, we describe a computational method, BBPpred (blood-brain barrier peptides prediction), that can efficiently identify BBPs using logistic regression. We investigate a wide variety of features from amino acid sequence information, and then a feature learning method is adopted to represent the informative features. To improve the prediction performance, seven informative features are selected for classification by eliminating redundant and irrelevant features. In addition, we specifically create two benchmark data sets (training and independent test), which contain a total of 119 BBPs from public databases and the literature. On the training data set, BBPpred shows promising performance, with an AUC score of 0.8764 and an AUPR score of 0.8757 using 10-fold cross-validation. We also test our new method on the independent test data set and obtain a favorable performance. We envision that BBPpred will be a useful tool for identifying, annotating, and characterizing BBPs. BBPpred is freely available at http://BBPpred.xialab.info.
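The AUC score reported above has a simple probabilistic reading: it is the chance that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The function name and the toy scores are illustrative, and the quadratic pairwise loop is only suitable for small data sets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy scores (made up): three of the four positive/negative pairs are ordered correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

This pairwise form is equivalent to the area under the ROC curve, which makes it a convenient check on values produced by a plotting-based implementation.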

Subjects

Blood-Brain Barrier , Peptides , Amino Acid Sequence , Logistic Models

16.

A Deep Learning-Based Method for Identification of Bacteriophage-Host Interaction.

Li, Menglu; Wang, Yanan; Li, Fuyi; Zhao, Yun; Liu, Mengya; Zhang, Sijia; Bin, Yannan; Smith, A Ian; Webb, Geoffrey I; Li, Jian; Song, Jiangning; Xia, Junfeng.

IEEE/ACM Trans Comput Biol Bioinform ; 18(5): 1801-1810, 2021.

Article in English | MEDLINE | ID: mdl-32813660

ABSTRACT

Multi-drug resistance (MDR) has become one of the greatest threats to human health worldwide, and novel treatment methods of infections caused by MDR bacteria are urgently needed. Phage therapy is a promising alternative to solve this problem, to which the key is correctly matching target pathogenic bacteria with the corresponding therapeutic phage. Deep learning is powerful for mining complex patterns to generate accurate predictions. In this study, we develop PredPHI (Predicting Phage-Host Interactions), a deep learning-based tool capable of predicting the host of phages from sequence data. We collect >3000 phage-host pairs along with their protein sequences from PhagesDB and GenBank databases and extract a set of features. Then we select high-quality negative samples based on the K-Means clustering method and construct a balanced training set. Finally, we employ a deep convolutional neural network to build the predictive model. The results indicate that PredPHI can achieve a predictive performance of 81 percent in terms of the area under the receiver operating characteristic curve on the test set, and the clustering-based method is significantly more robust than that based on randomly selecting negative samples. These results highlight that PredPHI is a useful and accurate tool for identifying phage-host interactions from sequence data.

Subjects

Bacteriophages/genetics , Computational Biology/methods , Deep Learning , Microbial Interactions/genetics , Sequence Analysis, DNA/methods , Algorithms , Bacteria/genetics , DNA, Bacterial/genetics , DNA, Viral/genetics , Drug Resistance, Bacterial/genetics

17.

Predicting Hot Spot Residues at Protein-DNA Binding Interfaces Based on Sequence Information.

Yao, Lingsong; Wang, Huadong; Bin, Yannan.

Interdiscip Sci ; 13(1): 1-11, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33068261

ABSTRACT

Hot spot residues at protein-DNA binding interfaces are hugely important for investigating the underlying mechanism of molecular recognition. Currently, only a few tools are available for identifying the hot spot residues in protein-DNA complexes. In addition, these tools require three-dimensional protein structures. However, it is well known that three-dimensional structures are unavailable for most proteins. Considering this limitation, we proposed a method, named SPDH, for predicting hot spot residues based only on protein sequences. Firstly, we obtained 133 features from physicochemical properties, conservation, predicted solvent accessible surface area and structure. Then, we systematically assessed these features with various feature selection methods to obtain the optimal feature subset and compared the models using four classical machine learning algorithms (support vector machine, random forest, logistic regression, and k-nearest neighbor) on the training dataset. We found that the variability of physicochemical property features between wild and mutant types was important for improving the performance of the prediction model. On the independent test set, our method achieved an AUC of 0.760 and a sensitivity of 0.808, and outperformed other methods. The data and source code can be downloaded at https://github.com/xialab-ahu/SPDH.

Subjects

Algorithms , Computational Biology , DNA , Databases, Protein , Protein Binding , Proteins/metabolism

18.

Prediction of Radiosensitivity in Head and Neck Squamous Cell Carcinoma Based on Multiple Omics Data.

Liu, Jie; Han, Mengmeng; Yue, Zhenyu; Dong, Chao; Wen, Pengbo; Zhao, Guoping; Wu, Lijun; Xia, Junfeng; Bin, Yannan.

Front Genet ; 11: 960, 2020.

Article in English | MEDLINE | ID: mdl-33014019

ABSTRACT

Head and neck squamous cell carcinoma (HNSCC) is a malignant tumor. Radiotherapy (RT) is an important treatment for HNSCC, but not all patients derive a survival benefit from RT because of individual differences in radiosensitivity. A prediction model of radiosensitivity based on multiple omics data might solve this problem. Compared with single omics data, multiple omics data can illuminate more systematic associations between complex molecular characteristics and cancer phenotypes. In this study, we obtained 122 differentially expressed genes by analyzing the gene expression data of HNSCC patients with RT (N = 287) and without RT (N = 189) downloaded from The Cancer Genome Atlas. Then, HNSCC patients with RT were randomly divided into a training set (N = 149) and a test set (N = 138). Finally, we combined multiple omics data of the 122 differential genes with clinical outcomes on the training set to establish a 12-gene signature using two-stage regularization and multivariable Cox regression models. Using the median score of the 12-gene signature on the training set as the cutoff value, the patients were divided into high- and low-score groups. The analysis revealed that patients in the low-score group had higher radiosensitivity and would benefit from RT. Furthermore, we developed a nomogram to predict the overall survival of HNSCC patients with RT. We compared the prognostic value of the 12-gene signature with those of gene signatures based on single omics data. The comparison suggested that the 12-gene signature based on multiple omics data had the best ability to predict radiosensitivity. In conclusion, the proposed 12-gene signature is a promising biomarker for estimating RT options in HNSCC patients.
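The median-cutoff step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the gene names, coefficients, and expression values are made up, the score is assumed to be a weighted sum of gene values, and the handling of scores exactly equal to the cutoff (assigned to the low group here) is not specified in the abstract.

```python
from statistics import median


def signature_score(expression, coefficients):
    """Weighted sum of gene values (assumed form of the signature score)."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def split_by_median(train_scores):
    """Use the training-set median as the cutoff; scores above it are 'high'."""
    cutoff = median(train_scores.values())
    groups = {pid: ("high" if s > cutoff else "low")
              for pid, s in train_scores.items()}
    return cutoff, groups


# Hypothetical two-gene signature and four training patients.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
train_expression = {
    "P1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "P2": {"GENE_A": 4.0, "GENE_B": 0.0},
    "P3": {"GENE_A": 2.0, "GENE_B": 0.0},
    "P4": {"GENE_A": 8.0, "GENE_B": 4.0},
}

train_scores = {pid: signature_score(expr, coefficients)
                for pid, expr in train_expression.items()}
cutoff, groups = split_by_median(train_scores)
```

Fixing the cutoff on the training set means that test-set patients are assigned to groups without re-estimating the threshold from their own scores.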

19.

Prediction of hot spots in protein-DNA binding interfaces based on supervised isometric feature mapping and extreme gradient boosting.

Li, Ke; Zhang, Sijia; Yan, Di; Bin, Yannan; Xia, Junfeng.

BMC Bioinformatics ; 21(Suppl 13): 381, 2020 Sep 17.

Article in English | MEDLINE | ID: mdl-32938395

ABSTRACT

BACKGROUND: Identification of hot spots in protein-DNA interfaces provides crucial information for research on protein-DNA interactions and drug design. As experimental methods for determining hot spots are time-consuming, labor-intensive, and expensive, there is a need to develop reliable computational methods that predict hot spots on a large scale. RESULTS: Here, we proposed a new method named sxPDH, based on supervised isometric feature mapping (S-ISOMAP) and extreme gradient boosting (XGBoost), to predict hot spots in protein-DNA complexes. We obtained 114 features from a combination of protein sequence, structure, network, and solvent accessibility information, and systematically assessed various feature selection methods and manifold-learning-based dimensionality reduction methods. The results show that the S-ISOMAP method is superior to the other feature selection and manifold learning methods. XGBoost was then used to develop the hot spot prediction model sxPDH based on the three dimensionality-reduced features obtained from S-ISOMAP. CONCLUSION: Our method sxPDH boosts prediction performance using S-ISOMAP and XGBoost. The AUC of the model is 0.773, and the F1 score is 0.713. Experimental results on the benchmark dataset indicate that sxPDH generally achieves better performance in predicting hot spots than state-of-the-art methods.

Subjects

DNA-Binding Proteins/metabolism , Protein Interaction Mapping/methods , Humans , Models, Molecular
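The F1 score quoted in the abstract above is the harmonic mean of precision and recall. A minimal Python sketch of that formula follows; the labels and predictions are invented for illustration and have no connection to the sxPDH benchmark.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: treat F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = hot spot, 0 = non-hot spot).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
toy_f1 = f1_score(y_true, y_pred)
```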

20.

Prediction of Neuropeptides from Sequence Information Using Ensemble Classifier and Hybrid Features.

Bin, Yannan; Zhang, Wei; Tang, Wending; Dai, Ruyu; Li, Menglu; Zhu, Qizhi; Xia, Junfeng.

J Proteome Res ; 19(9): 3732-3740, 2020 Sep 4.

Article in English | MEDLINE | ID: mdl-32786686

ABSTRACT

As hormones in the endocrine system and neurotransmitters in the immune system, neuropeptides (NPs) provide many opportunities for the discovery of new drugs and targets for nervous system disorders. Despite their importance in hormonal regulation and immune responses, a bioinformatics predictor for identifying NPs is lacking. In this study, we develop a predictor for the identification of NPs, named PredNeuroP, based on a two-layer stacking method. In this ensemble predictor, 45 models are introduced as base-learners by combining nine feature descriptors with five machine learning algorithms. Then, we select eight base-learners according to the sum of accuracy and Pearson correlation coefficient of base-learner pairs in the first-layer learning. In the second-layer learning, the outputs of these selected base-learners are fed into a logistic regression classifier to train the final model, whose outputs are the final predictions. The accuracy of PredNeuroP is 0.893 and 0.872 on the training and test data sets, respectively. The consistent performance on these data sets supports the practicability of our predictor. Therefore, we expect that PredNeuroP will provide an important advancement in the discovery of NPs as new drugs for the treatment of nervous system disorders. The data sets and Python code are available at https://github.com/xialab-ahu/PredNeuroP.

Subjects

Machine Learning , Neuropeptides , Algorithms , Computational Biology , Neuropeptides/genetics
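The two-layer stacking idea in the abstract above can be sketched in plain Python. This is a rough illustration, not the PredNeuroP implementation: the descriptor and algorithm names are placeholders (the abstract does not list them), the first-layer outputs are invented numbers, only three base-learners are shown instead of eight, and the logistic regression is a simple gradient-descent version written for this example.

```python
import math
from itertools import product

# Placeholder names: nine descriptors x five algorithms = 45 candidate base-learners.
descriptors = [f"descriptor_{i}" for i in range(1, 10)]
algorithms = [f"algorithm_{j}" for j in range(1, 6)]
base_learners = list(product(descriptors, algorithms))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Second-layer probability from one row of base-learner outputs."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_meta_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on base-learner outputs by batch gradient descent."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = predict_proba(x, weights, bias) - y
            grad_b += err
            for k in range(d):
                grad_w[k] += err * x[k]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical first-layer outputs: each row holds the predicted
# probabilities of three selected base-learners for one peptide.
meta_features = [[0.9, 0.8, 0.7], [0.8, 0.6, 0.9],
                 [0.2, 0.3, 0.1], [0.1, 0.4, 0.2]]
meta_labels = [1, 1, 0, 0]

weights, bias = fit_meta_logistic(meta_features, meta_labels)
final_predictions = [int(predict_proba(x, weights, bias) >= 0.5)
                     for x in meta_features]
```

In practice, the first-layer outputs used to train the second layer are usually produced on held-out folds so that the meta-learner does not see predictions made on a base-learner's own training data; the abstract does not say how this was handled.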

