Results 1 - 20 of 29
1.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37861174

ABSTRACT

Antiviral peptides (AVPs) are widely found in animals and plants, with high specificity and strong sensitivity to drug-resistant viruses. However, due to the great heterogeneity of different viruses, most of the AVPs have specific antiviral activities. Therefore, it is necessary to identify the specific activities of AVPs on virus types. Most existing studies only identify AVPs, with only a few studies identifying subclasses by training multiple binary classifiers. We develop a two-stage prediction tool named FFMAVP that can simultaneously predict AVPs and their subclasses. In the first stage, we identify whether a peptide is AVP or not. In the second stage, we predict the six virus families and eight species specifically targeted by AVPs based on two multiclass tasks. Specifically, the feature extraction module in the two-stage task of FFMAVP adopts the same neural network structure, in which one branch extracts features based on amino acid feature descriptors and the other branch extracts sequence features. Then, the two types of features are fused for the following task. Considering the correlation between the two tasks of the second stage, a multitask learning model is constructed to improve the effectiveness of the two multiclass tasks. In addition, to improve the effectiveness of the second stage, the network parameters trained through the first-stage data are used to initialize the network parameters in the second stage. As a demonstration, the cross-validation results, independent test results and visualization results show that FFMAVP achieves great advantages in both stages.


Subjects
Algorithms; Peptides; Peptides/chemistry; Neural Networks, Computer; Machine Learning; Antiviral Agents/pharmacology; Antiviral Agents/chemistry
2.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38145949

ABSTRACT

Prediction of drug-target interactions (DTIs) is essential in the medical field, since it benefits the identification of molecular structures potentially interacting with drugs and facilitates the discovery and repositioning of drugs. Recently, much attention has been paid to network representation learning as a way to learn rich information from heterogeneous data. Although network representation learning algorithms have achieved success in predicting DTIs, several manually designed meta-graphs limit the capability of extracting complex semantic information. To address the problem, we introduce an adaptive meta-graph-based method, termed AMGDTI, for DTI prediction. In the proposed AMGDTI, the semantic information is automatically aggregated from a heterogeneous network by training an adaptive meta-graph, thereby achieving efficient information integration without requiring domain knowledge. The effectiveness of the proposed AMGDTI is verified on two benchmark datasets. Experimental results demonstrate that the AMGDTI method overall outperforms eight state-of-the-art methods in predicting DTIs and achieves accurate identification of novel DTIs. It is also verified that the adaptive meta-graph exhibits flexibility and effectively captures complex fine-grained semantic information, enabling the learning of intricate heterogeneous network topology and the inference of potential drug-target relationships.


Subjects
Algorithms; Medicine; Benchmarking; Drug Delivery Systems; Semantics
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34651655

ABSTRACT

The bioactive peptide has wide functions, such as lowering blood glucose levels and reducing inflammation. Meanwhile, computational methods such as machine learning are becoming more and more important for peptide functions prediction. Most of the previous studies concentrate on the single-functional bioactive peptides prediction. However, the number of multi-functional peptides is on the increase; therefore, novel computational methods are needed. In this study, we develop a method MLBP (Multi-Label deep learning approach for determining the multi-functionalities of Bioactive Peptides), which can predict multiple functions including anti-cancer, anti-diabetic, anti-hypertensive, anti-inflammatory and anti-microbial simultaneously. MLBP model takes the peptide sequence vector as input to replace the biological and physiochemical features used in other peptides predictors. Using the embedding layer, the dense continuous feature vector is learnt from the sequence vector. Then, we extract convolution features from the feature vector through the convolutional neural network layer and combine with the bidirectional gated recurrent unit layer to improve the prediction performance. The 5-fold cross-validation experiments are conducted on the training dataset, and the results show that Accuracy and Absolute true are 0.695 and 0.685, respectively. On the test dataset, Accuracy and Absolute true of MLBP are 0.709 and 0.697, with 5.0 and 4.7% higher than those of the suboptimum method, respectively. The results indicate MLBP has superior prediction performance on the multi-functional peptides identification. MLBP is available at https://github.com/xialab-ahu/MLBP and http://bioinfo.ahu.edu.cn/MLBP/.


Subjects
Deep Learning; Machine Learning; Neural Networks, Computer; Peptides
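
The MLBP abstract reports two multi-label metrics, Accuracy and Absolute true, without defining them. The sketch below assumes the definitions commonly used in multi-label peptide prediction: Accuracy as the mean per-sample ratio of intersection to union of the true and predicted label sets, and Absolute true as the fraction of samples whose predicted label set matches exactly. The function names and the label abbreviations in the usage note are illustrative and do not come from the MLBP code base.

```python
from typing import List, Set


def multilabel_accuracy(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Mean of |true & pred| / |true | pred| over all samples (assumed definition)."""
    total = 0.0
    for true, pred in zip(true_sets, pred_sets):
        union = true | pred
        # Two empty label sets are treated as a perfect match.
        total += len(true & pred) / len(union) if union else 1.0
    return total / len(true_sets)


def absolute_true(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Fraction of samples whose predicted label set equals the true set exactly."""
    exact = sum(1 for true, pred in zip(true_sets, pred_sets) if true == pred)
    return exact / len(true_sets)
```

For example, with true label sets `[{"ACP"}, {"ADP", "AHP"}, {"AMP"}]` and predictions `[{"ACP"}, {"ADP"}, {"AIP"}]`, the per-sample ratios are 1, 0.5 and 0, so Accuracy is 0.5 while Absolute true is 1/3, which is why Absolute true is the stricter of the two.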
4.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35988921

ABSTRACT

Neuropeptides (NPs) are a particular class of informative substances in the immune system and physiological regulation. They play a crucial role in regulating physiological functions in various biological growth and developmental stages. In addition, NPs are crucial for developing new drugs for the treatment of neurological diseases. With the development of molecular biology techniques, some data-driven tools have emerged to predict NPs. However, it is necessary to improve the predictive performance of these tools for NPs. In this study, we developed a deep learning model (NeuroPred-CLQ) based on the temporal convolutional network (TCN) and multi-head attention mechanism to identify NPs effectively and translate the internal relationships of peptide sequences into numerical features by the Word2vec algorithm. The experimental results show that NeuroPred-CLQ learns data information effectively, achieving 93.6% accuracy and 98.8% AUC on the independent test set. The model has better performance in identifying NPs than the state-of-the-art predictors. Visualization of features using t-distribution random neighbor embedding shows that the NeuroPred-CLQ can clearly distinguish the positive NPs from the negative ones. We believe the NeuroPred-CLQ can facilitate drug development and clinical trial studies to treat neurological disorders.


Subjects
Algorithms; Neuropeptides; Neuropeptides/genetics; Peptides/chemistry
5.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37216900

ABSTRACT

MOTIVATION: With the great number of peptide sequences produced in the postgenomic era, it is highly desirable to identify the various functions of therapeutic peptides quickly. Furthermore, it is a great challenge to accurately predict multi-functional therapeutic peptides (MFTP) via sequence-based computational tools. RESULTS: Here, we propose a novel multi-label-based method, named ETFC, to predict 21 categories of therapeutic peptides. The method utilizes a deep learning-based model architecture, which consists of four blocks: embedding, text convolutional neural network, feed-forward network, and classification blocks. This method also adopts an imbalanced learning strategy with a novel multi-label focal dice loss function. The multi-label focal dice loss is applied in the ETFC method to solve the inherent imbalance problem in the multi-label dataset and achieve competitive performance. The experimental results show that the ETFC method is significantly better than the existing methods for MFTP prediction. With the established framework, we use teacher-student-based knowledge distillation to obtain the attention weights from the self-attention mechanism in the MFTP prediction and quantify their contributions toward each of the investigated activities. AVAILABILITY AND IMPLEMENTATION: The source code and dataset are available via: https://github.com/xialab-ahu/ETFC.


Subjects
Deep Learning; Humans; Neural Networks, Computer; Peptides/therapeutic use; Software
6.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36882183

ABSTRACT

MOTIVATION: Phage genome annotation plays a key role in the design of phage therapy. To date, there have been various genome annotation tools for phages, but most of these tools focus on mono-functional annotation and have complex operational processes. Accordingly, comprehensive and user-friendly platforms for phage genome annotation are needed. RESULTS: Here, we propose PhaGAA, an online integrated platform for phage genome annotation and analysis. By incorporating several annotation tools, PhaGAA is constructed to annotate the prophage genome at DNA and protein levels and provide the analytical results. Furthermore, PhaGAA could mine and annotate phage genomes from bacterial genome or metagenome. In summary, PhaGAA will be a useful resource for experimental biologists and help advance the phage synthetic biology in basic and application research. AVAILABILITY AND IMPLEMENTATION: PhaGAA is freely available at http://phage.xialab.info/.


Subjects
Bacteriophages; Bacteriophages/genetics; Software; Computers; Metagenome; Genome, Bacterial; Molecular Sequence Annotation
7.
PLoS Comput Biol ; 18(9): e1010511, 2022 09.
Article in English | MEDLINE | ID: mdl-36094961

ABSTRACT

Prediction of therapeutic peptides is a significant step for the discovery of promising therapeutic drugs. Most of the existing studies have focused on mono-functional therapeutic peptide prediction. However, the number of multi-functional therapeutic peptides (MFTP) is growing rapidly, which requires new computational schemes to facilitate MFTP discovery. In this study, based on a multi-head self-attention mechanism and a class weight optimization algorithm, we propose a novel model called PrMFTP for MFTP prediction. PrMFTP exploits a multi-scale convolutional neural network, bi-directional long short-term memory, and multi-head self-attention mechanisms to fully extract and learn informative features of the peptide sequence to predict MFTP. In addition, we design a class weight optimization scheme to address the problem of label-imbalanced data. Comprehensive evaluations demonstrate that PrMFTP is superior to other state-of-the-art computational methods for predicting MFTP. We provide a user-friendly web server of PrMFTP, which is available at http://bioinfo.ahu.edu.cn/PrMFTP.


Subjects
Algorithms; Peptides; Peptides/therapeutic use
8.
BMC Bioinformatics ; 22(Suppl 3): 253, 2021 May 17.
Article in English | MEDLINE | ID: mdl-34000983

ABSTRACT

BACKGROUND: DNA-binding hot spots are dominant and fundamental residues that contribute most of the binding free energy yet accounting for a small portion of protein-DNA interfaces. As experimental methods for identifying hot spots are time-consuming and costly, high-efficiency computational approaches are emerging as alternative pathways to experimental methods. RESULTS: Herein, we present a new computational method, termed inpPDH, for hot spot prediction. To improve the prediction performance, we extract hybrid features which incorporate traditional features and new interfacial neighbor properties. To remove redundant and irrelevant features, feature selection is employed using a two-step feature selection strategy. Finally, a subset of 7 optimal features are chosen to construct the predictor using support vector machine. The results on the benchmark dataset show that this proposed method yields significantly better prediction accuracy than those previously published methods in the literature. Moreover, a user-friendly web server for inpPDH is well established and is freely available at http://bioinfo.ahu.edu.cn/inpPDH . CONCLUSIONS: We have developed an accurate improved prediction model, inpPDH, for hot spot residues in protein-DNA binding interfaces by given the structure of a protein-DNA complex. Moreover, we identify a comprehensive and useful feature subset including the proposed interfacial neighbor features that has an important strength for identifying hot spot residues. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of interfacial neighbor features and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spot residues in protein-DNA complexes.


Subjects
Computational Biology; Support Vector Machine; Databases, Protein; Protein Binding
9.
J Chem Inf Model ; 61(1): 525-534, 2021 01 25.
Article in English | MEDLINE | ID: mdl-33426873

ABSTRACT

Blood-brain barrier peptides (BBPs) have a large range of biomedical applications since they can cross the blood-brain barrier based on different mechanisms. As experimental methods for the identification of BBPs are laborious and expensive, computational approaches are necessary to be developed for predicting BBPs. In this work, we describe a computational method, BBPpred (blood-brain barrier peptides prediction), that can efficiently identify BBPs using logistic regression. We investigate a wide variety of features from amino acid sequence information, and then a feature learning method is adopted to represent the informative features. To improve the prediction performance, seven informative features are selected for classification by eliminating redundant and irrelevant features. In addition, we specifically create two benchmark data sets (training and independent test), which contain a total of 119 BBPs from public databases and the literature. On the training data set, BBPpred shows promising performances with an AUC score of 0.8764 and an AUPR score of 0.8757 using the 10-fold cross-validation. We also test our new method on the independent test data set and obtain a favorable performance. We envision that BBPpred will be a useful tool for identifying, annotating, and characterizing BBPs. BBPpred is freely available at http://BBPpred.xialab.info.


Subjects
Blood-Brain Barrier; Peptides; Amino Acid Sequence; Logistic Models
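
BBPpred is evaluated by its AUC score. As a reminder of what that number measures, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. It is a generic illustration with a hypothetical function name, not code from BBPpred, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

With labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four positive-negative pairs are ordered correctly, so the function returns 0.75.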
10.
BMC Bioinformatics ; 21(Suppl 13): 381, 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32938395

ABSTRACT

BACKGROUND: Identification of hot spots in protein-DNA interfaces provides crucial information for the research on protein-DNA interaction and drug design. As experimental methods for determining hot spots are time-consuming, labor-intensive and expensive, there is a need for developing reliable computational method to predict hot spots on a large scale. RESULTS: Here, we proposed a new method named sxPDH based on supervised isometric feature mapping (S-ISOMAP) and extreme gradient boosting (XGBoost) to predict hot spots in protein-DNA complexes. We obtained 114 features from a combination of the protein sequence, structure, network and solvent accessible information, and systematically assessed various feature selection methods and feature dimensionality reduction methods based on manifold learning. The results show that the S-ISOMAP method is superior to other feature selection or manifold learning methods. XGBoost was then used to develop hot spots prediction model sxPDH based on the three dimensionality-reduced features obtained from S-ISOMAP. CONCLUSION: Our method sxPDH boosts prediction performance using S-ISOMAP and XGBoost. The AUC of the model is 0.773, and the F1 score is 0.713. Experimental results on benchmark dataset indicate that sxPDH can achieve generally better performance in predicting hot spots compared to the state-of-the-art methods.


Subjects
DNA-Binding Proteins/metabolism; Protein Interaction Mapping/methods; Humans; Models, Molecular
11.
J Proteome Res ; 19(9): 3732-3740, 2020 09 04.
Article in English | MEDLINE | ID: mdl-32786686

ABSTRACT

As hormones in the endocrine system and neurotransmitters in the immune system, neuropeptides (NPs) provide many opportunities for the discovery of new drugs and targets for nervous system disorders. In spite of their importance in the hormonal regulations and immune responses, the bioinformatics predictor for the identification of NPs is lacking. In this study, we develop a predictor for the identification of NPs, named PredNeuroP, based on a two-layer stacking method. In this ensemble predictor, 45 models are introduced as base-learners by combining nine feature descriptors with five machine learning algorithms. Then, we select eight base-learners referring to the sum of accuracy and Pearson correlation coefficient of base-learner pairs on the first-layer learning. On the second-layer learning, the outputs of these advisable base-learners are imported into logistic regression classifier to train the final model, and the outputs are the final predicting results. The accuracy of PredNeuroP is 0.893 and 0.872 on the training and test data sets, respectively. The consistent performance on these data sets approves the practicability of our predictor. Therefore, we expect that PredNeuroP would provide an important advancement in the discovery of NPs as new drugs for the treatment of nervous system disorders. The data sets and Python code are available at https://github.com/xialab-ahu/PredNeuroP.


Subjects
Machine Learning; Neuropeptides; Algorithms; Computational Biology; Neuropeptides/genetics
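
The PredNeuroP abstract describes 45 base-learners formed by pairing nine feature descriptors with five machine learning algorithms, eight of which are kept for the second layer. The sketch below only illustrates that bookkeeping. The descriptor and algorithm names are placeholders, and the selection step ranks by a single supplied score, which is a simplification: the paper's criterion combines accuracy with the Pearson correlation of base-learner pairs, and that is not reproduced here.

```python
from itertools import product
from typing import Dict, List, Tuple

# Placeholder names; the abstract does not list the actual descriptors or algorithms.
DESCRIPTORS = [f"descriptor_{i}" for i in range(1, 10)]   # nine feature descriptors
ALGORITHMS = [f"algorithm_{j}" for j in range(1, 6)]      # five learning algorithms

BaseLearner = Tuple[str, str]


def enumerate_base_learners(descriptors: List[str], algorithms: List[str]) -> List[BaseLearner]:
    """Every (descriptor, algorithm) pairing is one candidate base-learner."""
    return list(product(descriptors, algorithms))


def select_top(scores: Dict[BaseLearner, float], k: int) -> List[BaseLearner]:
    """Keep the k highest-scoring base-learners (simplified selection rule)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In a stacking setup, the outputs of the selected base-learners would then serve as input features for the second-layer logistic regression classifier mentioned in the abstract.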
12.
BMC Med Genet ; 20(Suppl 2): 190, 2019 12 09.
Article in English | MEDLINE | ID: mdl-31815613

ABSTRACT

BACKGROUND: Synonymous mutations have been identified to play important roles in cancer development, although they do not modify the protein sequences. However, relatively little research has specifically delineated the functionality of synonymous mutations in cancer. RESULTS: We investigated the nucleotide-based and amino acid-based features of synonymous mutations across 15 cancer types from The Cancer Genome Atlas (TCGA), and revealed novel driver candidates by identifying hotspot mutations. Firstly, synonymous mutations were compared between TCGA and the 1000 Genomes Project at nucleotide and amino acid levels. We found that C:G → T:A transitions were the most frequent single-base substitutions, and leucine underwent the largest number of synonymous mutations in TCGA due to the prevalent C → T transition, which induced the transformation between optimal and non-optimal codons. Next, 97 synonymous hotspot mutations in 86 genes were nominated as candidate drivers with potential cancer risk by considering the mutational rates across different sequence contexts. We observed that the non-CpG-island GC transition sequence context was positively selected across most cancer types, and that the different sequence contexts under which hotspot mutations occur could be significant for genetic differences and functional features. We also found that the hotspots were more conserved than neutral mutations of hotspot-mutation-containing genes and frequently occurred at leucine. In addition, we mapped hotspot, neutral and non-hotspot mutations of hotspot-mutation-containing genes to their respective protein domains and found that the ion transport domain was the most frequent one, which could mediate cell interaction and had relevant implications for tumor therapy. The signatures of synonymous hotspots were also qualitatively similar to those of harmful missense variants.
CONCLUSIONS: We illustrated the preferences of cancer-associated synonymous mutations, especially hotspots, and laid the groundwork for understanding how synonymous mutations act as drivers in cancer.


Subjects
Mutation; Neoplasms/genetics; Amino Acids/analysis; Datasets as Topic; Humans; Mutation Rate; Neoplasms/classification
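
The abstract above turns on two definitions: a synonymous mutation changes a codon without changing the encoded amino acid, and a transition swaps a purine for a purine or a pyrimidine for a pyrimidine (such as C → T). The sketch below checks both for a single-base substitution using the standard genetic code. The function names are illustrative and are not taken from the study's analysis pipeline.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)


def is_synonymous(codon: str, position: int, alt: str) -> bool:
    """True if replacing codon[position] with alt leaves the amino acid unchanged."""
    mutated = codon[:position] + alt + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]
```

For instance, the leucine codon CTC mutated at its third base to CTT still encodes leucine, so `is_synonymous("CTC", 2, "T")` is true, whereas CCG to CTG changes proline to leucine and is not synonymous.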
13.
Acta Biochim Biophys Sin (Shanghai) ; 49(2): 170-178, 2017 Feb 06.
Article in English | MEDLINE | ID: mdl-28069584

ABSTRACT

With their capability to inhibit the formation of amyloid-β peptide (Aβ) fibril, norepinephrine (NE), and other catechol derivatives have been considered for the potential treatment of Alzheimer's disease (AD). Such treatment, however, remains debatable because of the diverse functions of Aβ and NE in AD pathology. Moreover, the complicated oxidation accompanying NE has caused the majority of the previous research to focus on the binding of NE oxides onto Aβ. The molecular mechanism by which Aβ interacts with the reduction state of NE, which is correlated with the brain function, should be urgently explored. In this work, by controlling rigorous anaerobic experimental conditions, the molecular mechanism of the Aβ/NE interaction was investigated, and two binding sites were revealed. Tyr10 was identified as the strong binding site of NE, and SNK(26-28) segment was the weak binding segment. Furthermore, thioflavin T fluorescence confirmed NE's positive function of inhibiting Aβ aggregation through its weak binding with SNK(26-28) segment. Meanwhile, 7-OHCCA fluorescence exhibited NE's negative function of enhancing ·OH generation through inhibiting the Aβ/Cu2+ coordination. The viability tests of the neuroblastoma SH-SY5Y cells displayed that the coexistence of NE, Cu2+, and Aβ induced lower cell viability than free Cu2+, indicating the significant negative effect of excessive NE on AD progression. These data revealed the possible pathway of NE-induced damage in AD brain, which is significant for understanding the function of NE in Aβ-involved AD neuropathology and for designing an NE-related therapeutic strategy for AD.


Subjects
Amyloid beta-Peptides/metabolism; Norepinephrine/metabolism; Peptide Fragments/metabolism; Tyrosine/metabolism; Amino Acid Motifs/genetics; Amino Acid Sequence; Amyloid beta-Peptides/genetics; Amyloid beta-Peptides/pharmacology; Binding Sites/genetics; Binding, Competitive; Cell Line, Tumor; Cell Survival/drug effects; Copper/metabolism; Copper/pharmacology; Humans; Kinetics; Molecular Structure; Neuroblastoma/pathology; Norepinephrine/chemistry; Norepinephrine/pharmacology; Peptide Fragments/genetics; Peptide Fragments/pharmacology; Protein Binding; Surface Plasmon Resonance; Tyrosine/genetics
14.
Comput Methods Programs Biomed ; 250: 108176, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38677081

ABSTRACT

BACKGROUND AND OBJECTIVE: Interleukin-6 (IL-6) is the critical factor of early warning, monitoring, and prognosis in the inflammatory storm of COVID-19 cases. IL-6 inducing peptides, which can induce cytokine IL-6 production, are very important for the development of diagnosis and immunotherapy. Although the existing methods have some success in predicting IL-6 inducing peptides, there is still room for improvement in the performance of these models in practical application. METHODS: In this study, we proposed UsIL-6, a high-performance bioinformatics tool for identifying IL-6 inducing peptides. First, we extracted five groups of physicochemical properties and sequence structural information from IL-6 inducing peptide sequences, and obtained a 636-dimensional feature vector, we also employed NearMiss3 undersampling method and normalization method StandardScaler to process the data. Then, a 40-dimensional optimal feature vector was obtained by Boruta feature selection method. Finally, we combined this feature vector with extreme randomization tree classifier to build the final model UsIL-6. RESULTS: The AUC value of UsIL-6 on the independent test dataset was 0.87, and the BACC value was 0.808, which indicated that UsIL-6 had better performance than the existing methods in IL-6 inducing peptide recognition. CONCLUSIONS: The performance comparison on independent test dataset confirmed that UsIL-6 could achieve the highest performance, best robustness, and most excellent generalization ability. We hope that UsIL-6 will become a valuable method to identify, annotate and characterize new IL-6 inducing peptides.


Subjects
Computational Biology; Interleukin-6; Peptides; Humans; Peptides/chemistry; Computational Biology/methods; COVID-19; Algorithms; Machine Learning; SARS-CoV-2
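
UsIL-6 reports a BACC value, which is relevant because the training data are class-imbalanced (hence the undersampling step). Assuming BACC denotes the usual balanced accuracy, it is the mean of sensitivity and specificity, so the majority class cannot dominate the score. The sketch below is a plain-Python illustration with a hypothetical function name, not the authors' evaluation code.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0
```

On an imbalanced toy set with three positives and five negatives, missing one positive and mislabelling one negative gives a sensitivity of 2/3 and a specificity of 4/5, hence a balanced accuracy of about 0.733, even though plain accuracy would be 0.75.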
15.
Acta Biochim Biophys Sin (Shanghai) ; 45(7): 570-7, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23747389

ABSTRACT

It is well known that the aggregation of amyloid-β peptide (Aβ) induced by Cu²⁺ is related to incubation time, solution pH, and temperature. In this work, the aggregation of Aβ1-42 in the presence of Cu²⁺ under acidic conditions was studied at different incubation time and temperature (e.g. 25 and 37°C). Incubation temperature, pH, and the presence of Cu²⁺ in Aβ solution were confirmed to alter the morphology of aggregation (fibrils or amorphous aggregates), and the morphology is pivotal for Aβ neurotoxicity and Alzheimer disease (AD) development. The results of atomic force microscopy (AFM) indicated that the formation of Aβ fibrous morphology is preferred at lower pH, but Cu²⁺ induced the formation of amorphous aggregates. The aggregation rate of Aβ was increased with the elevation of temperature. These results were further confirmed by fluorescence spectroscopy and circular dichroism spectroscopy and it was found that the formation of β-sheet structure was inhibited by Cu²⁺ binding to Aβ. The result was consistent with AFM observation and the fibrillation process was restrained. We believe that the local charge state in hydrophilic domain of Aβ may play a dominant role in the aggregate morphology due to the strong steric hindrance. This research will be valuable for understanding of Aβ toxicity in AD.


Subjects
Acids/metabolism; Amyloid beta-Peptides/metabolism; Copper/metabolism; Peptide Fragments/metabolism; Amyloid beta-Peptides/chemistry; Circular Dichroism; Hydrogen-Ion Concentration; Isoelectric Point; Microscopy, Atomic Force; Peptide Fragments/chemistry; Protein Conformation; Temperature
16.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3106-3116, 2023.
Article in English | MEDLINE | ID: mdl-37022025

ABSTRACT

Due to the global outbreak of COVID-19 and its variants, antiviral peptides with anti-coronavirus activity (ACVPs) represent a promising new drug candidate for the treatment of coronavirus infection. At present, several computational tools have been developed to identify ACVPs, but the overall prediction performance is still not enough to meet the actual therapeutic application. In this study, we constructed an efficient and reliable prediction model PACVP (Prediction of Anti-CoronaVirus Peptides) for identifying ACVPs based on effective feature representation and a two-layer stacking learning framework. In the first layer, we use nine feature encoding methods with different feature representation angles to characterize the rich sequence information and fuse them into a feature matrix. Secondly, data normalization and unbalanced data processing are carried out. Next, 12 baseline models are constructed by combining three feature selection methods and four machine learning classification algorithms. In the second layer, we input the optimal probability features into the logistic regression algorithm (LR) to train the final model PACVP. The experiments show that PACVP achieves favorable prediction performance on independent test dataset, with ACC of 0.9208 and AUC of 0.9465. We hope that PACVP will become a useful method for identifying, annotating and characterizing novel ACVPs.


Subjects
COVID-19; Peptides; Humans; Algorithms; Machine Learning; Probability
17.
Interdiscip Sci ; 14(1): 258-268, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34608613

ABSTRACT

Anti-parasitic peptides (APPs) have been regarded as promising therapeutic candidate drugs against parasitic diseases. Due to the fact that the experimental techniques for identifying APPs are expensive and time-consuming, there is an urgent need to develop a computational approach to predict APPs on a large scale. In this study, we provided a computational method, termed PredAPP (Prediction of Anti-Parasitic Peptides) that could effectively identify APPs using an ensemble of well-performed machine learning (ML) classifiers. Firstly, to solve the class imbalance problem, a balanced training dataset was generated by the undersampling method. We found that the balanced dataset based on cluster centroid achieved the best performance. Then, nine groups of features and six ML algorithms were combined to generate 54 classifiers and the output of these classifiers formed 54 feature representations, and in each feature group, we selected the feature representation with best performance for classification. Finally, the selected feature representations were integrated using logistic regression algorithm to construct the prediction model PredAPP. On the independent dataset, PredAPP achieved accuracy and AUC of 0.880 and 0.922, respectively, compared to 0.739 and 0.873 of AMPfun, a state-of-the-art method to predict APPs. The web server of PredAPP is freely accessible at http://predapp.xialab.info and https://github.com/xialab-ahu/PredAPP .


Subjects
Machine Learning; Peptides; Algorithms; Computers; Logistic Models
18.
IEEE J Biomed Health Inform ; 26(10): 5258-5266, 2022 10.
Article in English | MEDLINE | ID: mdl-35867364

ABSTRACT

With the number of phage genomes increasing, it is urgent to develop new bioinformatics methods for phage genome annotation. The promoter, a DNA region, is important for gene transcriptional regulation. In the era of post-genomics, the availability of data makes it possible to establish robust computational models for promoter identification. In this work, we introduce DPProm, a two-layer model composed of DPProm-1L and DPProm-2L, to predict promoters and their types for phages. On the first layer, as a dual-channel deep neural network ensemble method fusing multi-view features (sequence feature and handcrafted feature), the model DPProm-1L is proposed to identify whether a DNA sequence is a promoter or non-promoter. The sequence feature is extracted with a convolutional neural network (CNN), and the handcrafted feature is the combination of free energy, GC content, cumulative skew, and Z curve features. On the second layer, DPProm-2L based on CNN is trained to predict the promoters' types (host or phage). To enable prediction on whole genomes, DPProm is combined with a novel sequence data processing workflow, which contains sliding window and sequence merging modules. Experimental results show that DPProm outperforms the state-of-the-art methods, and decreases the false positive rate effectively on whole genome prediction. Furthermore, we provide a user-friendly web server at http://bioinfo.ahu.edu.cn/DPProm. We expect that DPProm can serve as a useful tool for identification of promoters and their types.


Subjects
Bacteriophages; Deep Learning; Bacteriophages/genetics; DNA; Genomics/methods; Humans; Promoter Regions, Genetic/genetics
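
Two of the handcrafted features named in the DPProm abstract (GC content and cumulative skew) and its sliding-window and merging workflow can be sketched in a few lines. Everything below is an assumption made for illustration: the cumulative skew is taken to be a running count of G minus C, the window size and step are free parameters, and the merge step simply joins overlapping predicted intervals. The actual DPProm definitions and parameters may differ.

```python
from typing import Iterator, List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cumulative_gc_skew(seq: str) -> List[int]:
    """Running total of (+1 for G, -1 for C) along the sequence (simplified skew)."""
    total, profile = 0, []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        profile.append(total)
    return profile


def sliding_windows(seq: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window across the genome."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def merge_intervals(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching (start, end) windows predicted as promoters."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

In a whole-genome scan, each window would be scored by the classifier, and windows predicted as promoters would be passed to `merge_intervals` so that a run of overlapping hits is reported as one region rather than many.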
19.
PeerJ ; 9: e11906, 2021.
Article in English | MEDLINE | ID: mdl-34414035

ABSTRACT

An emerging type of therapeutic agent, anticancer peptides (ACPs), has attracted attention because of its lower risk of toxic side effects. However, the process of identifying ACPs using experimental methods is both time-consuming and laborious. In this study, we developed a new and efficient algorithm that predicts ACPs by fusing multi-view features based on a dual-channel deep neural network ensemble model. In the model, one channel used a convolutional neural network (CNN) to automatically extract the potential spatial features of a sequence. Another channel was used to process and extract more effective features from handcrafted features. Additionally, an effective feature fusion method was explored for the mutual fusion of different features. Finally, we adopted the neural network to predict ACPs based on the fusion features. The performance comparisons across the single and fusion features showed that the fusion of multi-view features could effectively improve the model's predictive ability. Among these, the fusion of the features extracted by the CNN and the composition of k-spaced amino acid group pairs achieved the best performance. To further validate the performance of our model, we compared it with other existing methods using two independent test sets. The results showed that our model's area under curve was 0.90, which was higher than that of the other existing methods on the first test set and higher than most of the other existing methods on the second test set. The source code and datasets are available at https://github.com/wame-ng/DLFF-ACP.
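
The handcrafted feature singled out in this abstract, the composition of k-spaced amino acid group pairs, can be sketched as follows. The five-group partition of the amino acids used here (aliphatic, aromatic, positively charged, negatively charged, uncharged) is a commonly used grouping and is an assumption; the abstract does not state which grouping or which values of k the authors used, and the function name is illustrative.

```python
from itertools import product
from typing import Dict, Tuple

# Assumed physicochemical grouping of the 20 standard amino acids.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def cksaagp(seq: str, k: int) -> Dict[Tuple[str, str], float]:
    """Frequency of each ordered group pair among residues separated by k positions."""
    counts = {pair: 0 for pair in product(GROUPS, repeat=2)}  # 5 x 5 = 25 features
    total = 0
    for i in range(len(seq) - k - 1):
        first = AA_TO_GROUP.get(seq[i])
        second = AA_TO_GROUP.get(seq[i + k + 1])
        if first is None or second is None:
            continue  # skip non-standard residues
        counts[(first, second)] += 1
        total += 1
    return {pair: (n / total if total else 0.0) for pair, n in counts.items()}
```

Calling the function for several gap values (for example k = 0, 1, 2) and concatenating the 25 frequencies from each call yields one fixed-length vector per peptide, regardless of peptide length, which is what makes the descriptor usable as input to a dense network channel.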

20.
Interdiscip Sci ; 13(1): 1-11, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33068261

ABSTRACT

Hot spot residues at protein-DNA binding interfaces are hugely important for investigating the underlying mechanism of molecular recognition. Currently, there are a few tools available for identifying the hot spot residues in the protein-DNA complexes. In addition, the three-dimensional protein structures are needed in these tools. However, it is well known that the three-dimensional structures are unavailable for most proteins. Considering the limitation, we proposed a method, named SPDH, for predicting hot spot residues only based on protein sequences. Firstly, we obtained 133 features from physicochemical property, conservation, predicted solvent accessible surface area and structure. Then, we systematically assessed these features based on various feature selection methods to obtain the optimal feature subset and compared the models using four classical machine learning algorithms (support vector machine, random forest, logistic regression, and k-nearest neighbor) on the training dataset. We found that the variability of physicochemical property features between wild and mutative types was important on improving the performance of the prediction model. On the independent test set, our method achieved the performance with AUC of 0.760 and sensitivity of 0.808, and outperformed other methods. The data and source code can be downloaded at https://github.com/xialab-ahu/SPDH .


Subjects
Algorithms; Computational Biology; DNA; Databases, Protein; Protein Binding; Proteins/metabolism