Results 1 - 14 of 14
1.
Article in English | MEDLINE | ID: mdl-38587961

ABSTRACT

Viruses pose a great threat to human life and production, so the research and development of antiviral drugs is urgently needed. Antiviral peptides play an important role in drug design and development. Compared with time-consuming and laborious wet-lab experimental methods, computational methods that predict antiviral peptides accurately and rapidly are critical. However, because data are limited, accurate prediction of antiviral peptides remains challenging, and extracting effective feature representations from sequences is crucial for building accurate models. This study introduces HybAVPnet, a novel two-step approach that predicts antiviral peptides with a hybrid network architecture combining neural networks and traditional machine learning methods. We adopted a stacking-like structure that captures both long-term dependencies and local evolutionary information and makes a comprehensive and diverse prediction from the predicted labels and probabilities. Using an ensemble technique over different kinds of features can reduce variance without increasing bias. The experimental results show that HybAVPnet achieves better and more robust performance than state-of-the-art methods, which makes it useful for the research and development of antiviral drugs. Because of its generalization ability, it can also be extended to other peptide recognition problems.
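The stacking-like idea described above can be illustrated generically: base learners are trained on different feature sets, and their out-of-fold predicted probabilities and labels become the inputs of a second-level model. The sketch below is a minimal pure-Python illustration under our own assumptions, not HybAVPnet's actual architecture; `centroid_trainer` is a hypothetical toy base learner and `out_of_fold_meta_features` a hypothetical helper name.

```python
import math
from typing import Callable, List

Vector = List[float]
Predictor = Callable[[Vector], float]
# Hypothetical interface: a trainer fits on (X, y) and returns a predictor
# mapping one feature vector to P(positive class).
Trainer = Callable[[List[Vector], List[int]], Predictor]


def centroid_trainer(feature: int) -> Trainer:
    """Toy base learner (illustrative only): a logistic curve centred halfway
    between the class means of one feature. Assumes both classes are present
    in every training split."""
    def train(X: List[Vector], y: List[int]) -> Predictor:
        pos = [x[feature] for x, t in zip(X, y) if t == 1]
        neg = [x[feature] for x, t in zip(X, y) if t == 0]
        mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        mid = (mu_pos + mu_neg) / 2
        sign = 1.0 if mu_pos >= mu_neg else -1.0
        return lambda x: 1.0 / (1.0 + math.exp(-sign * (x[feature] - mid)))
    return train


def out_of_fold_meta_features(X: List[Vector], y: List[int],
                              trainers: List[Trainer], k: int = 5,
                              threshold: float = 0.5) -> List[List[float]]:
    """Level-2 inputs: for each sample, the probability and the thresholded
    label from every base learner, each predicted by a model that never saw
    that sample (k folds, interleaved split)."""
    n = len(X)
    meta = [[0.0] * (2 * len(trainers)) for _ in range(n)]
    for fold in range(k):
        held_out = list(range(fold, n, k))
        held = set(held_out)
        X_tr = [X[i] for i in range(n) if i not in held]
        y_tr = [y[i] for i in range(n) if i not in held]
        for m, train in enumerate(trainers):
            predict = train(X_tr, y_tr)
            for i in held_out:
                p = predict(X[i])
                meta[i][2 * m] = p                                   # probability
                meta[i][2 * m + 1] = 1.0 if p >= threshold else 0.0  # label
    return meta
```

In practice the returned rows would be fed to a second-level classifier; producing them out-of-fold keeps that meta-learner from seeing base-model predictions on the base models' own training data.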

2.
Proc Natl Acad Sci U S A ; 121(13): e2308788121, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38507445

ABSTRACT

Protein structure prediction has been greatly improved by deep learning in the past few years. However, the most successful methods rely on multiple sequence alignment (MSA) of the sequence homologs of the protein under prediction. In nature, a protein folds in the absence of its sequence homologs, and thus an MSA-free structure prediction method is desired. Here, we develop a single-sequence-based protein structure prediction method, RaptorX-Single, by integrating several protein language models and a structure generation module, and then study its advantage over MSA-based methods. Our experimental results indicate that, in addition to running much faster than MSA-based methods such as AlphaFold2, RaptorX-Single outperforms AlphaFold2 and other MSA-free methods in predicting the structure of antibodies (after fine-tuning on antibody data), proteins with very few sequence homologs, and single mutation effects. By comparing different protein language models, our results show that not only the scale but also the training data of protein language models impact performance. RaptorX-Single also compares favorably to MSA-based AlphaFold2 when the protein under prediction has a large number of sequence homologs.


Subject(s)
Antibodies , Proteins , Proteins/genetics , Proteins/chemistry , Antibodies/genetics , Sequence Alignment , Algorithms
3.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36355462

ABSTRACT

MOTIVATION: Protein structure prediction has been greatly improved by deep learning, but the contribution of different kinds of information is yet to be fully understood. This article studies the impact of two kinds of information on structure prediction: templates and multiple sequence alignment (MSA) embedding. Templates have been used by some methods, such as AlphaFold2, RoseTTAFold and RaptorX. AlphaFold2 and RoseTTAFold only used templates detected by HHsearch, which may not perform well on some targets. In addition, sequence embedding generated by pre-trained protein language models has not been fully explored for structure prediction. In this article, we study the impact of templates (including the number of templates, the template quality and how the templates are generated) on protein structure prediction accuracy, especially when the templates are detected by methods other than HHsearch. We also study the impact of sequence embedding (generated by MSATransformer and ESM-1b) on structure prediction. RESULTS: We have implemented a deep learning method for protein structure prediction that can take templates and MSA embedding as extra inputs, and we study the contribution of each to structure prediction accuracy. Our experimental results show that templates improve structure prediction on 71 of 110 CASP13 (13th Critical Assessment of Structure Prediction) targets and 47 of 91 CASP14 (14th Critical Assessment of Structure Prediction) targets, and are particularly useful for targets with similar templates. MSA embedding improves structure prediction on 63 of 91 CASP14 targets and 87 of 183 Continuous Automated Model Evaluation (CAMEO) targets, and is particularly useful for proteins with shallow MSAs. When both templates and MSA embedding are used, our method predicts correct folds (TM-score > 0.5) for 16 of 23 CASP14 FM targets and 14 of 18 CAMEO targets, outperforming RoseTTAFold by 5% and 7%, respectively.
AVAILABILITY AND IMPLEMENTATION: Available at https://github.com/xluo233/RaptorXFold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Proteins , Proteins/chemistry , Sequence Alignment , Computational Biology/methods , Protein Conformation
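For reference, the TM-score used above as the correct-fold criterion is conventionally defined as follows (standard definition, not stated in this abstract). Here d_i is the distance between the i-th pair of aligned residues after superposition, L_target is the target length, L_ali is the number of aligned residues, and the maximum is taken over superpositions; values lie in (0, 1], and scores above 0.5 generally indicate the same fold.

```latex
\mathrm{TM\text{-}score} = \max\left[\frac{1}{L_{\text{target}}}\sum_{i=1}^{L_{\text{ali}}}\frac{1}{1+\left(d_i / d_0(L_{\text{target}})\right)^{2}}\right],
\qquad d_0(L) = 1.24\,\sqrt[3]{L-15} - 1.8
```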
4.
Nat Comput Sci ; 1(7): 462-469, 2021 Jul.
Article in English | MEDLINE | ID: mdl-35321360

ABSTRACT

Protein model refinement is the last step applied to improve the quality of a predicted protein model. Currently, the most successful refinement methods rely on extensive conformational sampling and thus take hours or days to refine even a single protein model. Here we propose a fast and effective model refinement method that applies GNNs (graph neural networks) to predict a refined inter-atom distance probability distribution from an initial model and then rebuilds 3D models from the predicted distance distribution. Tested on the CASP (Critical Assessment of Structure Prediction) refinement targets, our method achieves accuracy comparable to that of two leading human groups, Feig and Baker, but runs substantially faster. Our method can refine one protein model within ~11 minutes on 1 CPU, whereas Baker needs ~30 hours on 60 CPUs and Feig needs ~16 hours on 1 GPU. Finally, our study shows that GNNs outperform ResNet (convolutional residual neural networks) for model refinement when very limited conformational sampling is allowed.
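The reported run times can be put on a common footing with a quick back-of-the-envelope calculation (our own arithmetic on the figures quoted above, ignoring differences in hardware speed). Feig's GPU run is not comparable in CPU time, so only its wall-clock ratio is shown:

```latex
\frac{30\ \text{h} \times 60\ \text{CPUs}}{11\ \text{min} \times 1\ \text{CPU}}
  = \frac{108\,000\ \text{CPU-min}}{11\ \text{CPU-min}} \approx 9.8 \times 10^{3},
\qquad
\frac{16\ \text{h}}{11\ \text{min}} = \frac{960\ \text{min}}{11\ \text{min}} \approx 87
```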

5.
Bioinformatics ; 36(22-23): 5361-5367, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33325480

ABSTRACT

MOTIVATION: Accurately estimating protein model quality in the absence of an experimental structure is important not only for model evaluation and selection but also for model refinement. Steady progress has been made by introducing new features and algorithms (especially deep neural networks), but the accuracy of quality assessment (QA) is still not very satisfactory, especially for local QA on hard protein targets. RESULTS: We propose a new single-model-based QA method, ResNetQA, for both local and global quality assessment. Our method predicts model quality by integrating sequential and pairwise features using a deep neural network composed of both 1D and 2D convolutional residual neural networks (ResNet). The 2D ResNet module extracts useful information from pairwise features such as model-derived distance maps, co-evolution information, and distance potentials predicted from sequences. The 1D ResNet predicts local (global) model quality from sequential features and the pooled pairwise information generated by the 2D ResNet. On the CASP12 and CASP13 datasets, our experimental results show that our method greatly outperforms existing state-of-the-art methods. Our ablation studies indicate that the 2D ResNet module and pairwise features play an important role in improving model quality assessment. AVAILABILITY AND IMPLEMENTATION: https://github.com/AndersJing/ResNetQA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

6.
Comput Math Methods Med ; 2020: 8894478, 2020.
Article in English | MEDLINE | ID: mdl-33029195

ABSTRACT

Heat shock proteins (HSPs) are ubiquitous in living organisms. HSPs are an essential component for cell growth and survival; their main function is controlling the folding and unfolding of proteins. According to molecular function and mass, HSPs are categorized into six families: HSP20 (small HSPs), HSP40 (J-proteins), HSP60, HSP70, HSP90, and HSP100. In this paper, an improved method for HSP prediction is proposed: the split amino acid composition (SAAC), the dipeptide composition (DC), the conjoint triad feature (CTF), and the pseudo-average chemical shift (PseACS) were selected to predict HSPs with a support vector machine (SVM). To overcome the imbalanced-data classification problem, the synthetic minority oversampling technique (SMOTE) was used to balance the dataset. With the balanced dataset and the optimized feature combination SAAC+DC+CTF+PseACS, the overall accuracy in the jackknife test was 99.72%, which was 4.81% higher than with the imbalanced dataset and the same feature combination. The Sn, Sp, Acc, and MCC of HSP families in our predictive model were higher than those of existing methods. This improved method may be helpful for protein function prediction.


Subject(s)
Algorithms , Heat-Shock Proteins/classification , Support Vector Machine , Amino Acid Sequence , Amino Acids/analysis , Animals , Computational Biology , Databases, Protein , Dipeptides/chemistry , Heat-Shock Proteins/genetics , Heat-Shock Proteins/metabolism , Humans , Mathematical Concepts
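The composition features named above have simple standard definitions. As a sketch: AAC is the 20-dimensional residue frequency vector, DC the 400-dimensional frequency vector of adjacent residue pairs, and SAAC computes AAC separately over N-terminal, middle and C-terminal segments. The segment lengths (25 residues) and the skipping of non-standard letters are our assumptions, not values from the paper, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq: str) -> List[float]:
    """20-dim vector: fraction of each standard residue (others ignored)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """400-dim vector: fraction of each adjacent residue pair."""
    s = seq.upper()
    counts: Dict[str, int] = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(s) - 1):
        pair = s[i:i + 2]
        if pair in counts:  # pairs containing non-standard letters are skipped
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def split_aac(seq: str, n_term: int = 25, c_term: int = 25) -> List[float]:
    """60-dim vector: AAC of the N-terminal, middle and C-terminal segments.
    Segment lengths are assumed; for short sequences the middle segment is
    empty (all zeros) and the terminal segments may overlap."""
    s = seq.upper()
    n_part = s[:n_term]
    c_part = s[-c_term:] if c_term > 0 else ""
    middle = s[n_term:len(s) - c_term]
    return (amino_acid_composition(n_part)
            + amino_acid_composition(middle)
            + amino_acid_composition(c_part))
```

Concatenating such vectors (e.g. SAAC followed by DC) gives the kind of combined feature vector that is then passed to the SVM.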
7.
Front Genet ; 11: 760, 2020.
Article in English | MEDLINE | ID: mdl-32903636

ABSTRACT

As cancer remains one of the main threats to human life, developing efficient cancer treatments is urgent. Anticancer peptides, which could overcome the significant side effects and poor results of traditional cancer treatments, have emerged in recent years as a potential new alternative. However, identifying anticancer peptides by experimental methods is time- and resource-consuming, so it is of great significance to develop effective computational tools that quickly and accurately identify potential anticancer peptides from amino acid sequences. For most current computational methods, feature representation plays a key role in their final success. This study proposes a novel, fast and accurate approach to identify anticancer peptides using diversified feature representations and an ensemble learning method. For the feature representations, the information is encoded from multidimensional feature spaces, including sequence composition, sequence order, physicochemical properties, etc. To better model the potential relationships of peptides, multiple ensemble classifiers (LightGBMs) are first applied to the different feature sets. The resulting outputs are then used as inputs to a support vector machine classifier, which effectively identifies anticancer peptides. Experimental results on cross-validation and independent test sets demonstrate that our method achieves better or comparable performance compared with other state-of-the-art methods.

8.
IEEE/ACM Trans Comput Biol Bioinform ; 17(6): 1918-1931, 2020.
Article in English | MEDLINE | ID: mdl-30998480

ABSTRACT

As the first step of machine-learning-based protein structure and function prediction, amino acid encoding plays a fundamental role in the final success of those methods. Unlike protein sequence encoding, amino acid encoding can be used in both residue-level and sequence-level prediction of protein properties by combining it with different algorithms. However, it has not attracted enough attention in the past decades, and there have been no comprehensive reviews or assessments of encoding methods so far. In this article, we make a systematic classification and present a comprehensive review and assessment of various amino acid encoding methods. These methods are grouped into five categories according to their information sources and information extraction methodologies: binary encoding, physicochemical properties encoding, evolution-based encoding, structure-based encoding, and machine-learning encoding. Then, 16 representative methods from the five categories are selected and compared on protein secondary structure prediction and protein fold recognition tasks using large-scale benchmark datasets. The results show that the evolution-based position-dependent encoding method PSSM achieved the best performance. The structure-based and machine-learning encoding methods also show some potential for further application; in particular, neural-network-based distributed representations of amino acids may shed new light on this area. We hope that the review and assessment are useful for future studies in amino acid encoding.


Subject(s)
Amino Acid Sequence/genetics , Amino Acids/chemistry , Computational Biology/methods , Proteins , Sequence Analysis, Protein/methods , Algorithms , Protein Folding , Protein Structure, Secondary/genetics , Proteins/chemistry , Proteins/genetics , Proteins/physiology
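The simplest of the five categories above, binary encoding, is usually realized as one-hot encoding: each residue becomes a 20-dimensional vector with a single 1. A minimal sketch follows; the alphabet order and mapping non-standard residues to an all-zero row are our assumptions, and `one_hot` is a hypothetical helper name.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet order
INDEX: Dict[str, int] = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> List[List[int]]:
    """Residue-level binary encoding: one 20-dim row per residue.
    Non-standard residues (e.g. 'X') map to an all-zero row (assumption)."""
    rows = []
    for aa in seq.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    return rows
```

Because the result has one row per residue, it serves residue-level tasks directly; for sequence-level tasks it is typically pooled or fed to a model that handles variable length.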
9.
Methods Mol Biol ; 2074: 67-80, 2020.
Article in English | MEDLINE | ID: mdl-31583631

ABSTRACT

Identifying residue-residue contacts in protein-protein interactions or complexes is crucial for understanding protein and cell functions. DCA (direct-coupling analysis) methods shed some light on this, but they need many sequence homologs to yield accurate predictions. Inspired by the success of our deep-learning method for intraprotein contact prediction, we have developed RaptorX-ComplexContact, a web server for interprotein residue-residue contact prediction. Given a pair of interacting protein sequences, RaptorX-ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSAs), based on genomic distance and phylogeny information, respectively. Then, RaptorX-ComplexContact uses two deep convolutional residual neural networks (ResNets) to predict interprotein contacts from sequential features and coevolution information of the paired MSAs. RaptorX-ComplexContact should be useful for protein docking, protein-protein interaction prediction, and protein interaction network construction.


Subject(s)
Deep Learning , Proteins/chemistry , Computational Biology/methods , Protein Conformation , Protein Interaction Maps , Sequence Analysis, Protein
10.
Front Bioeng Biotechnol ; 8: 627335, 2020.
Article in English | MEDLINE | ID: mdl-33585423

ABSTRACT

Because of the overuse of antibiotics, there is concern that existing antibiotics will become ineffective against pathogens as antibiotic-resistant strains rapidly rise. Using cell wall lytic enzymes to destroy bacteria has become a viable alternative for avoiding the crisis of antimicrobial resistance. In this paper, an improved method for cell wall lytic enzyme prediction is proposed: the amino acid composition (AAC), the dipeptide composition (DC), the position-specific scoring matrix auto-covariance (PSSM-AC), and the auto-covariance average chemical shift (acACS) were selected to predict cell wall lytic enzymes with a support vector machine (SVM). To overcome the imbalanced-data classification problem, the synthetic minority over-sampling technique (SMOTE) was used to balance the dataset, and to remove redundant or irrelevant features, the F-score was used for feature selection. With the jackknife test and the optimized feature combination AAC+DC+acACS+PSSM-AC, Sn, Sp, MCC, and Acc were 99.35%, 99.02%, 0.98, and 99.19%, respectively. The Sn, Sp, MCC, and Acc of our predictive model for cell wall lytic enzymes were higher than those of existing methods. This improved method may be helpful for protein function prediction.
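Both SMOTE balancing and F-score feature selection mentioned above are short to state. The sketch below shows the textbook forms under our own assumptions: SMOTE interpolates between a minority sample and one of its k nearest minority neighbors, and the F-score is the common two-class ratio of between-class to within-class spread. Function names are hypothetical, and the paper's exact settings (k, selection cutoff) are not given in the abstract.

```python
import math
import random
from statistics import mean, variance
from typing import List, Optional

Vector = List[float]


def smote_samples(minority: List[Vector], n_new: int, k: int = 5,
                  rng: Optional[random.Random] = None) -> List[Vector]:
    """Create n_new synthetic minority samples on line segments between a
    random minority sample and one of its k nearest minority neighbors
    (Euclidean distance). Requires at least 2 minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        x = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nn)])
    return synthetic


def f_score(X: List[Vector], y: List[int], j: int) -> float:
    """Two-class F-score of feature j (labels: 1 positive, 0 negative).
    Each class needs at least 2 samples for the sample variances."""
    pos = [x[j] for x, t in zip(X, y) if t == 1]
    neg = [x[j] for x, t in zip(X, y) if t == 0]
    m_all, m_pos, m_neg = mean([x[j] for x in X]), mean(pos), mean(neg)
    numerator = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    denominator = variance(pos) + variance(neg)  # sample variances (n - 1)
    if numerator == 0:
        return 0.0
    return numerator / denominator if denominator > 0 else math.inf


def rank_features(X: List[Vector], y: List[int]) -> List[int]:
    """Feature indices sorted from most to least discriminative."""
    return sorted(range(len(X[0])), key=lambda j: f_score(X, y, j), reverse=True)
```

A top-ranked subset from `rank_features` would then be kept before training the SVM; oversampling is normally applied to training data only, so that synthetic samples do not leak into evaluation.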

11.
BMC Bioinformatics ; 18(1): 390, 2017 Sep 02.
Article in English | MEDLINE | ID: mdl-28865433

ABSTRACT

BACKGROUND: In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Researchers have found that predicted residue-residue contacts can effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, various methods have been developed to predict residue-residue contacts, and fusion methods in particular have achieved significant performance in recent years. In this work, a novel fusion method based on a ranking strategy is proposed to predict contacts. Unlike traditional regression or classification strategies, it treats contact prediction as a ranking task. First, two kinds of features are extracted from correlated mutation methods and ensemble machine-learning classifiers; then the proposed method uses a learning-to-rank algorithm to predict the contact probability of each residue pair. RESULTS: First, we perform two benchmark tests of the proposed fusion method (RRCRank) on the CASP11 and CASP12 datasets. The results show that RRCRank outperforms other well-developed methods, especially for medium- and short-range contacts. Second, to verify the advantage of the ranking strategy, we predict contacts with traditional regression and classification strategies based on the same features. Compared with these two traditional strategies, the ranking strategy shows better performance for all three contact types, in particular for long-range contacts. Third, RRCRank is compared with several state-of-the-art methods in CASP11 and CASP12. The results show that RRCRank achieves comparable prediction precision and is better than three methods in most assessment metrics.
CONCLUSIONS: The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.


Subject(s)
Algorithms , Proteins/chemistry , Machine Learning
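RRCRank's results are reported as precision per contact range. A common way to compute such numbers is top-L/k precision within a sequence-separation range; the sketch below assumes the usual CASP convention (short 6-11, medium 12-23, long 24 or more residues apart) and L/5 by default. `top_l_precision` is a hypothetical helper, not part of RRCRank.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Sequence-separation ranges as commonly used in CASP contact assessment.
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "short": (6, 11), "medium": (12, 23), "long": (24, None)}


def top_l_precision(predictions: Iterable[Tuple[int, int, float]],
                    true_contacts: Set[Tuple[int, int]],
                    length: int, seq_range: str = "long",
                    divisor: int = 5) -> float:
    """Precision of the top length//divisor predicted pairs (by score) whose
    separation |i - j| falls in seq_range. true_contacts holds (i, j), i < j."""
    lo, hi = RANGES[seq_range]
    candidates = [(i, j, p) for i, j, p in predictions
                  if abs(i - j) >= lo and (hi is None or abs(i - j) <= hi)]
    candidates.sort(key=lambda t: t[2], reverse=True)
    top = candidates[:max(1, length // divisor)]
    if not top:
        return 0.0
    hits = sum((min(i, j), max(i, j)) in true_contacts for i, j, _ in top)
    return hits / len(top)
```

Because only the order of scores matters here, a ranking objective (as in learning-to-rank) optimizes something close to what this metric rewards, which is one way to read the abstract's motivation.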
12.
BMC Bioinformatics ; 18(1): 275, 2017 May 25.
Article in English | MEDLINE | ID: mdl-28545390

ABSTRACT

BACKGROUND: Protein structure prediction has made a lot of progress during the last few decades, and a growing number of models can be predicted for a given sequence. Consequently, assessing the quality of predicted protein models is one of the key components of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, which can be roughly divided into three categories: single-model methods, quasi-single-model methods and clustering (or consensus) methods. Although these methods achieve much success at different levels, accurate protein model quality assessment is still an open problem. RESULTS: Here, we present MQAPRank, a global protein model quality assessment program based on learning-to-rank. MQAPRank first sorts the decoy models with a single-model method based on a learning-to-rank algorithm to indicate their relative qualities for the target protein. It then takes the top five models as references and predicts the quality of each other model from its average GDT_TS score against the reference models. Benchmarked on the CASP11 and 3DRobot datasets, MQAPRank achieved better performance than other leading protein model quality assessment methods. MQAPRank also participated in CASP12 under the group name FDUBio and achieved state-of-the-art performance. CONCLUSIONS: MQAPRank provides a convenient and powerful tool for protein model quality assessment with state-of-the-art performance, and it is useful for protein structure prediction and model quality assessment.


Subject(s)
Models, Molecular , Proteins/chemistry , User-Computer Interface , Algorithms , Cluster Analysis , Internet , Protein Conformation , Proteins/metabolism
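The quasi-single-model step described above (top five models as references, average GDT_TS against them) can be sketched as follows. The learning-to-rank ordering is taken as given, `gdt_ts` stands for any pairwise model-to-model GDT_TS function (hypothetical here), and scoring only the non-reference models is our reading of the abstract.

```python
from typing import Callable, Dict, Hashable, Sequence

# Hypothetical callable: GDT_TS between two models (assumed to lie in [0, 1]).
PairScore = Callable[[Hashable, Hashable], float]


def quasi_single_scores(ranked_models: Sequence[Hashable], gdt_ts: PairScore,
                        n_ref: int = 5) -> Dict[Hashable, float]:
    """Use the n_ref best-ranked models as references and score every other
    model by its mean GDT_TS against those references."""
    refs = list(ranked_models[:n_ref])
    if not refs:
        raise ValueError("need at least one ranked model as a reference")
    return {m: sum(gdt_ts(m, r) for r in refs) / len(refs)
            for m in ranked_models[n_ref:]}
```

Averaging over several references rather than using only the top-ranked model makes the score less sensitive to a single mis-ranked reference.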
13.
Sci Rep ; 6: 31571, 2016 08 17.
Article in English | MEDLINE | ID: mdl-27530967

ABSTRACT

Much progress has been made in protein structure prediction during the last few decades. Because predicted models can span a broad accuracy spectrum, accurate quality estimation has become one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods can be roughly divided into three categories: single-model methods, clustering-based methods and quasi-single-model methods. In this study, we first develop a single-model method, MQAPRank, based on the learning-to-rank algorithm, and then implement a quasi-single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. Five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms the other methods whose outputs are taken as its features, and that the quasi-single-model method can further enhance performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in the corresponding categories. In particular, the Quasi-MQAPRank method achieves considerable performance on the CASP11 Best150 dataset.


Subject(s)
Machine Learning , Protein Transport , Proteins/chemistry , Algorithms , Models, Molecular
14.
Zhongguo Zhong Yao Za Zhi ; 41(3): 484-489, 2016 Feb.
Article in Chinese | MEDLINE | ID: mdl-28868868

ABSTRACT

This study observed the effect of extracts of ginseng, notoginseng, and Chuanxiong Rhizome on the cytoskeletal proteins F-actin and G-actin in replicatively senescent vascular smooth muscle cells. Human aortic smooth muscle cells (HASMCs) were used as the research object, with replicatively senescent 9th-generation cells as the senescence model. The experiment comprised a young group (5th-generation cells), a model group (9th-generation cells), Chinese medicine low-dose (100 mg·L⁻¹), middle-dose (200 mg·L⁻¹), and high-dose (400 mg·L⁻¹) groups, and a resveratrol group (10 µmol·L⁻¹). The intervention time was 48 h. β-galactosidase-specific staining was used to calculate the proportion of blue-stained cells. The CCK-8 assay was used to detect cell proliferation, flow cytometry to analyze the cell cycle, immunofluorescent staining to observe morphological changes of F-actin and G-actin, and western blot to determine the expression of F-actin protein. Compared with the model group, the Chinese medicine groups and the resveratrol group significantly reduced the number of blue-stained cells, improved cell proliferation, reduced the number of cells in G0/G1 phase, increased the number of cells in S phase, and reduced F-actin protein expression and stress fiber formation, with obvious intervention effects and statistically significant differences. Therefore, replicatively senescent vascular smooth muscle cells can serve as models for senescence research, showing significant changes in the morphology and expression of the cytoskeletal proteins F-actin and G-actin during cell aging. The extracts of ginseng, notoginseng, and Chuanxiong Rhizome have an obvious intervention effect on F-actin and G-actin, which might be indirectly associated with delaying vascular aging.


Subject(s)
Cytoskeleton/metabolism , Drugs, Chinese Herbal/pharmacology , Muscle, Smooth, Vascular/drug effects , Panax notoginseng/chemistry , Panax/chemistry , Actins/metabolism , Cell Cycle/drug effects , Cell Proliferation/drug effects , Cellular Senescence , Humans , Muscle, Smooth, Vascular/cytology , Muscle, Smooth, Vascular/metabolism , Myocytes, Smooth Muscle/cytology , Myocytes, Smooth Muscle/drug effects , Rhizome/chemistry