Results 1 - 20 of 47
1.
J Cell Biochem ; : e30642, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39164870

ABSTRACT

The Type III secretion effectors (T3SEs) are bacterial proteins synthesized by Gram-negative pathogens and delivered into host cells via the Type III secretion system (T3SS). These effectors usually play a pivotal role in the interactions between bacteria and hosts. Hence, the precise identification of T3SEs aids researchers in exploring the pathogenic mechanisms of bacterial infections. Since the diversity and complexity of T3SE sequences often make traditional experimental methods time-consuming, it is imperative to explore more efficient and convenient computational approaches for T3SE prediction. Inspired by the promising potential exhibited by pre-trained language models in protein recognition tasks, we proposed a method called PLM-T3SE that utilizes protein language models (PLMs) for effective recognition of T3SEs. First, we utilized PLM embeddings and evolutionary features from the position-specific scoring matrix (PSSM) profiles to transform protein sequences into fixed-length vectors for model training. Second, we employed the extreme gradient boosting (XGBoost) algorithm to rank these features based on their importance. Finally, an MLP neural network model was used to predict T3SEs based on the selected optimal feature set. Experimental results from the cross-validation and independent test demonstrated that our model exhibited superior performance compared to the existing models. Specifically, our model achieved an accuracy of 98.1%, which is 1.8%-42.4% higher than the state-of-the-art predictors based on the same independent data set test. These findings highlight the superiority of the PLM-T3SE and the remarkable characterization ability of PLM embeddings for T3SE prediction.
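The ranking-then-selection step described in this abstract can be illustrated with a minimal sketch. It assumes the importance scores have already been produced by some ranker (XGBoost in the abstract); the function name `select_top_features`, the cut-off `k`, and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def select_top_features(
    matrix: Sequence[Sequence[float]],
    importances: Sequence[float],
    k: int,
) -> Tuple[List[List[float]], List[int]]:
    """Keep the k columns with the highest importance scores.

    Returns the reduced matrix and the kept column indices (in column order).
    """
    if not 0 < k <= len(importances):
        raise ValueError("k must be between 1 and the number of features")
    # Rank column indices from most to least important, then keep the top k.
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)[:k]
    keep = sorted(ranked)
    reduced = [[row[i] for i in keep] for row in matrix]
    return reduced, keep


# Toy feature matrix (2 proteins x 4 features) with made-up importances.
features = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
scores = [0.10, 0.50, 0.05, 0.35]
reduced, kept = select_top_features(features, scores, k=2)
```

In practice the cut-off would be tuned, for example by adding features in rank order until validation accuracy stops improving; the abstract does not say how the optimal set was chosen.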

2.
Int J Mol Sci ; 25(15)2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39125602

ABSTRACT

The benzofuran core inhibitors HCV-796, BMS-929075, MK-8876, compound 2, and compound 9B exhibit good pan-genotypic activity against various genotypes of NS5B polymerase. To elucidate their mechanism of action, multiple molecular simulation methods were used to investigate the complex systems of these inhibitors binding to GT1a, 1b, 2a, and 2b NS5B polymerases. The calculation results indicated that these five inhibitors can not only interact with the residues in the palm II subdomain of NS5B polymerase, but also with the residues in the palm I subdomain or the palm I/III overlap region. Interestingly, the binding of inhibitors with longer substituents at the C5 position (BMS-929075, MK-8876, compound 2, and compound 9B) to the GT1a and 2b NS5B polymerases exhibits different binding patterns compared to the binding to the GT1b and 2a NS5B polymerases. The interactions between the para-fluorophenyl groups at the C2 positions of the inhibitors and the residues at the binding pockets, together with the interactions between the substituents at the C5 positions and the residues at the reverse β-fold (residues 441-456), play a key role in recognition and the induction of the binding. The relevant studies could provide valuable information for further research and development of novel anti-HCV benzofuran core pan-genotypic inhibitors.


Subject(s)
Antiviral Agents , Benzofurans , Genotype , Hepacivirus , Viral Nonstructural Proteins , Viral Nonstructural Proteins/antagonists & inhibitors , Viral Nonstructural Proteins/metabolism , Viral Nonstructural Proteins/chemistry , Benzofurans/chemistry , Benzofurans/pharmacology , Hepacivirus/drug effects , Hepacivirus/enzymology , Hepacivirus/genetics , Antiviral Agents/pharmacology , Antiviral Agents/chemistry , Molecular Dynamics Simulation , Molecular Docking Simulation , Binding Sites , Protein Binding , Humans , Enzyme Inhibitors/pharmacology , Enzyme Inhibitors/chemistry , RNA-Dependent RNA Polymerase
3.
Anal Biochem ; 694: 115603, 2024 Nov.
Article in English | MEDLINE | ID: mdl-38986796

ABSTRACT

The recognition of DNA-binding proteins (DBPs) is a crucial step to understanding their roles in various biological processes such as genetic regulation, gene expression, cell cycle control, DNA repair, and replication within cells. However, conventional experimental methods for identifying DBPs are usually time-consuming and expensive. Therefore, there is an urgent need to develop rapid and efficient computational methods for the prediction of DBPs. In this study, we proposed a novel predictor named PreDBP-PLMs to further improve the identification accuracy of DBPs by fusing the pre-trained protein language model (PLM) ProtT5 embedding with evolutionary features as input to the classic convolutional neural network (CNN) model. Firstly, the ProtT5 embedding was combined with different evolutionary features derived from the position-specific scoring matrix (PSSM) to represent protein sequences. Then, the optimal feature combination was selected and input to the CNN classifier for the prediction of DBPs. Finally, the 5-fold cross-validation (CV), the leave-one-out CV (LOOCV), and the independent set test were adopted to examine the performance of PreDBP-PLMs on the benchmark datasets. Compared to the existing state-of-the-art predictors, PreDBP-PLMs exhibits an accuracy improvement of 0.5% and 5.2% on the PDB186 and PDB2272 datasets, respectively. These results demonstrate that the proposed method could serve as a useful tool for the recognition of DBPs.


Subject(s)
DNA-Binding Proteins , Neural Networks, Computer , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/chemistry , Computational Biology/methods , Databases, Protein , Humans
4.
Molecules ; 29(11)2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38893524

ABSTRACT

The stimulator of interferon genes (STING) plays a significant role in immune defense and protection against tumor proliferation. Many cyclic dinucleotide (CDN) analogues have been reported to regulate its activity, but the dynamic process involved when the ligands activate STING remains unclear. In this work, all-atom molecular dynamics simulations were performed to explore the binding mode between human STING (hSTING) and four cyclic adenosine-inosine monophosphate analogs (cAIMPs), as well as 2',3'-cGMP-AMP (2',3'-cGAMP). The results indicate that these cAIMPs adopt a U-shaped configuration within the binding pocket, forming extensive non-covalent interaction networks with hSTING. These interactions play a significant role in augmenting the binding, particularly in interactions with Tyr167, Arg238, Thr263, and Thr267. Additionally, the presence of hydrophobic interactions between the ligand and the receptor further contributes to the overall stability of the binding. In this work, the conformational changes in hSTING upon binding these cAIMPs were also studied and a significant tendency for hSTING to shift from open to closed state was observed after binding some of the cAIMP ligands.


Subject(s)
Membrane Proteins , Molecular Dynamics Simulation , Protein Binding , Humans , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Binding Sites , Nucleotides, Cyclic/chemistry , Nucleotides, Cyclic/metabolism , Ligands , Hydrophobic and Hydrophilic Interactions
5.
Int J Mol Sci ; 25(8)2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38674091

ABSTRACT

Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods. These alternatives support drug discovery by creating advanced predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned evolutionary scale modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we made a careful comparison to examine the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on the generative pre-trained transformers 2 (GPT-2) with modifications. To our knowledge, this is the first time a large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model.


Subject(s)
Proteins , Proteins/metabolism , Computational Biology/methods , Drug Discovery/methods , Position-Specific Scoring Matrices , Databases, Protein , Humans , Algorithms
6.
Math Biosci Eng ; 21(1): 1472-1488, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38303473

ABSTRACT

Non-classical secreted proteins (NCSPs) refer to a group of proteins that are located in the extracellular environment despite the absence of signal peptides and motifs. They usually play different roles in intercellular communication. Therefore, the accurate prediction of NCSPs is a critical step to understanding in depth their associated secretion mechanisms. Since the experimental recognition of NCSPs is often costly and time-consuming, computational methods are desired. In this study, we proposed an ensemble learning framework, termed NCSP-PLM, for the identification of NCSPs by extracting feature embeddings from pre-trained protein language models (PLMs) as input to several fine-tuned deep learning models. First, we compared the performance of nine PLM embeddings by training three neural networks, namely a multi-layer perceptron (MLP), an attention mechanism, and a bidirectional long short-term memory network (BiLSTM), and selected the best network model for each PLM embedding. Then, four models were excluded due to their below-average accuracies, and the remaining five models were integrated to perform the prediction of NCSPs based on the weighted voting. Finally, the 5-fold cross validation and the independent test were conducted to evaluate the performance of NCSP-PLM on the benchmark datasets. Based on the same independent dataset, the sensitivity and specificity of NCSP-PLM were 91.18% and 97.06%, respectively. In particular, the overall accuracy of our model reached 94.12%, which was 7% to 16% higher than that of the existing state-of-the-art predictors. This indicates that NCSP-PLM could serve as a useful tool for the annotation of NCSPs.
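The weighted-voting step that combines the five retained models can be sketched as follows. This is only one plausible form of weighted voting: the abstract does not state how the weights were set, so weighting each model by a validation accuracy, the 0.5 threshold, and all numbers below are assumptions made for illustration.

```python
from typing import Sequence


def weighted_vote(probabilities: Sequence[float],
                  weights: Sequence[float],
                  threshold: float = 0.5) -> int:
    """Combine per-model positive-class probabilities by a weighted average.

    Returns 1 (predicted NCSP) if the weighted mean reaches the threshold,
    otherwise 0.
    """
    if len(probabilities) != len(weights) or not probabilities:
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(probabilities, weights)) / total
    return 1 if score >= threshold else 0


# Hypothetical outputs of five models for one protein, weighted by
# made-up validation accuracies.
probs = [0.9, 0.4, 0.7, 0.6, 0.3]
accs = [0.93, 0.90, 0.92, 0.91, 0.89]
label = weighted_vote(probs, accs)
```

Averaging probabilities (soft voting) keeps information about each model's confidence; a hard-voting variant would threshold each model first and then weight the binary votes.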


Subject(s)
Deep Learning , Neural Networks, Computer , Proteins , Language , Sensitivity and Specificity
7.
Molecules ; 29(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38276629

ABSTRACT

Lysine-specific demethylase 1 (LSD1/KDM1A) has emerged as a promising therapeutic target for treating various cancers (such as breast cancer, liver cancer, etc.) and other diseases (blood diseases, cardiovascular diseases, etc.), owing to its observed overexpression, thereby presenting significant opportunities in drug development. Since its discovery in 2004, extensive research has been conducted on LSD1 inhibitors, with notable contributions from computational approaches. This review systematically summarizes LSD1 inhibitors investigated through computer-aided drug design (CADD) technologies since 2010, showcasing a diverse range of chemical scaffolds, including phenelzine derivatives, tranylcypromine (abbreviated as TCP or 2-PCPA) derivatives, nitrogen-containing heterocyclic (pyridine, pyrimidine, azole, thieno[3,2-b]pyrrole, indole, quinoline and benzoxazole) derivatives, natural products (including sanguinarine, phenolic compounds and resveratrol derivatives, flavonoids and other natural products) and others (including thiourea compounds, Fenoldopam and Raloxifene, (4-cyanophenyl)glycine derivatives, propargylamine and benzohydrazide derivatives and inhibitors discovered through AI techniques). Computational techniques, such as virtual screening, molecular docking and 3D-QSAR models, have played a pivotal role in elucidating the interactions between these inhibitors and LSD1. Moreover, the integration of cutting-edge technologies such as artificial intelligence holds promise in facilitating the discovery of novel LSD1 inhibitors. The comprehensive insights presented in this review aim to provide valuable information for advancing further research on LSD1 inhibitors.


Subject(s)
Biological Products , Enzyme Inhibitors , Enzyme Inhibitors/pharmacology , Enzyme Inhibitors/chemistry , Lysine , Molecular Docking Simulation , Artificial Intelligence , Drug Design , Histone Demethylases/metabolism , Structure-Activity Relationship
8.
J Phys Chem B ; 127(22): 4989-4997, 2023 Jun 08.
Article in English | MEDLINE | ID: mdl-37243666

ABSTRACT

CRISPR (clustered regularly interspaced short palindromic repeats)/CRISPR-associated protein (Cas9) has been widely used for gene editing. However, the fact that not all guide RNAs can cleave the DNA efficiently remains a major challenge to CRISPR/Cas9-mediated genome engineering. Therefore, understanding how the Cas9 complex successfully and efficiently identifies specific functional targets through base-pairing has great implications for such applications. The 10-nt seed sequence at the 3' end of the guide RNA is critical to target recognition and cleavage. Here, through stretching molecular dynamics simulation, we studied the thermodynamics and kinetics of the binding-dissociation process of the seed base and the target DNA base with the Cas9 protein. The results showed that in the presence of Cas9 protein, the enthalpy change and entropy change in binding-dissociation of the seed base with the target are smaller than those without the Cas9 protein. The reduction of entropy penalty upon association with the protein resulted from the pre-organization of the seed base in an A-form helix, and the reduction of enthalpy change was due to the electrostatic attraction of the positively charged channel with the negative target DNA. The binding barrier coming from the entropy loss and the dissociation barrier resulting from the destruction of the base pair in the presence of Cas9 protein were lower than those without protein, which indicates that the seed region is crucial for efficiently searching the correct target by accelerating the binding rate and dissociating fast from the wrong target.


Subject(s)
CRISPR-Associated Protein 9 , CRISPR-Cas Systems , CRISPR-Associated Protein 9/genetics , CRISPR-Associated Protein 9/metabolism , Base Pairing , Gene Editing/methods , DNA/chemistry
9.
Molecules ; 28(5)2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36903531

ABSTRACT

The subcellular localization of messenger RNA (mRNA) precisely controls where protein products are synthesized and where they function. However, obtaining an mRNA's subcellular localization through wet-lab experiments is time-consuming and expensive, and many existing mRNA subcellular localization prediction algorithms need to be improved. In this study, a deep neural network-based eukaryotic mRNA subcellular location prediction method, DeepmRNALoc, was proposed, utilizing a two-stage feature extraction strategy that featured bimodal information splitting and fusing for the first stage and a VGGNet-like CNN module for the second stage. The five-fold cross-validation accuracies of DeepmRNALoc in the cytoplasm, endoplasmic reticulum, extracellular region, mitochondria, and nucleus were 0.895, 0.594, 0.308, 0.944, and 0.865, respectively, demonstrating that it outperforms existing models and techniques.


Subject(s)
Deep Learning , Eukaryota , Eukaryota/metabolism , Proteins/metabolism , Endoplasmic Reticulum/metabolism , RNA, Messenger , Computational Biology/methods
10.
Phys Rev E ; 107(2-1): 024404, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36932572

ABSTRACT

Mechanical force has been widely used to study RNA folding and unfolding. Understanding how the force affects the opening and closing of a single base pair, which is a basic step for RNA folding and unfolding and a fundamental behavior in some important biological activities, is crucial to understanding the mechanism of RNA folding and unfolding under mechanical force. In this work, we investigated the opening and closing process of an RNA base pair under mechanical force with constant-force stretching molecular dynamics simulations. It was found that high mechanical force results in overstretching, and the open state is a high-energy state. The enthalpy and entropy change of the base-pair opening-closing transition were obtained and the results at low forces were in good agreement with the nearest-neighbor model. The temperature and force dependence of the opening and closing rates were also obtained. The position of the transition state for the base-pair opening-closing transition under mechanical force was determined. The free energy barrier of opening a base pair without force is the enthalpy increase, and the work done by the force from the closed state to the transition state decreases the barrier and increases the opening rate. The free energy barrier of closing the base pair without force results from the entropy loss, and the work done by the force from the open state to the transition state increases the barrier and decreases the closing rate. The transition rates are strongly dependent on the temperature and force, while the transition path times are weakly dependent on force and temperature.
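The barrier picture in this abstract can be written compactly as a worked relation. The form below is a Bell-type sketch and not the authors' stated model: it assumes the work done by the force is linear, $W = F\,\Delta x$, with force-independent distances to the transition state, and a common prefactor $k_0$. The symbols $\Delta x_{\mathrm{c}\to\mathrm{ts}}$, $\Delta x_{\mathrm{o}\to\mathrm{ts}}$, $\Delta H^{\ddagger}_{\mathrm{open}}$, $\Delta S^{\ddagger}_{\mathrm{close}}$ (negative, an entropy loss), and $k_0$ are introduced here for illustration.

```latex
\begin{aligned}
\Delta G^{\ddagger}_{\mathrm{open}}(F) &\approx \Delta H^{\ddagger}_{\mathrm{open}} - F\,\Delta x_{\mathrm{c}\to\mathrm{ts}},
& k_{\mathrm{open}}(F) &= k_0 \exp\!\left(-\frac{\Delta G^{\ddagger}_{\mathrm{open}}(F)}{k_B T}\right),\\
\Delta G^{\ddagger}_{\mathrm{close}}(F) &\approx -T\,\Delta S^{\ddagger}_{\mathrm{close}} + F\,\Delta x_{\mathrm{o}\to\mathrm{ts}},
& k_{\mathrm{close}}(F) &= k_0 \exp\!\left(-\frac{\Delta G^{\ddagger}_{\mathrm{close}}(F)}{k_B T}\right).
\end{aligned}
```

Under these assumptions, increasing $F$ lowers the opening barrier and raises the closing barrier, which matches the reported trend of a faster opening rate and a slower closing rate under force.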


Subject(s)
Molecular Dynamics Simulation , RNA , Base Pairing , Thermodynamics , Mechanical Phenomena , Kinetics
11.
Molecules ; 27(23)2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36500451

ABSTRACT

Lysine-specific demethylase 1 (LSD1) is a histone-modifying enzyme, which is a significant target for anticancer drug research. In this work, 40 reported tetrahydroquinoline-derivative inhibitors targeting LSD1 were studied to establish the three-dimensional quantitative structure-activity relationship (3D-QSAR). The established models CoMFA (Comparative Molecular Field Analysis; q² = 0.778, R²pred = 0.709) and CoMSIA (Comparative Molecular Similarity Index Analysis; q² = 0.764, R²pred = 0.713) yielded good statistical and predictive properties. Based on the corresponding contour maps, seven novel tetrahydroquinoline derivatives were designed. For further insight, three of the compounds (D1, D4, and Z17) and the template molecule 18x were explored with molecular dynamics simulations, binding free energy calculations by the MM/PBSA method, and ADME (absorption, distribution, metabolism, and excretion) prediction. The results suggested that D1, D4, and Z17 performed better than template molecule 18x due to the introduction of the amino and hydrophobic groups, especially for the D1 and D4, which will provide guidance for the design of LSD1 inhibitors.
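For readers unfamiliar with the cross-validated coefficient reported for these 3D-QSAR models, the sketch below computes it using the commonly used definition, one minus the predictive residual sum of squares (PRESS) divided by the total sum of squares. The abstract does not spell out its formula, so this definition is an assumption, and the activity values are made up.

```python
from typing import Sequence


def q_squared(observed: Sequence[float], cv_predicted: Sequence[float]) -> float:
    """Cross-validated q2 = 1 - PRESS / SS_total.

    `cv_predicted[i]` is the prediction for compound i from a model that
    was fitted without compound i (e.g. leave-one-out).
    """
    if len(observed) != len(cv_predicted) or len(observed) < 2:
        raise ValueError("need two or more paired values")
    mean_obs = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, cv_predicted))
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values must not all be equal")
    return 1.0 - press / ss_total


# Made-up activities (e.g. pIC50) and their leave-one-out predictions.
q2 = q_squared([5.0, 6.0, 7.0, 8.0], [5.5, 5.5, 7.5, 7.5])
```

A value close to 1 means the held-out predictions track the observed activities closely; values above 0.5 are conventionally treated as acceptable in QSAR work.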


Subject(s)
Antineoplastic Agents , Quantitative Structure-Activity Relationship , Molecular Docking Simulation , Molecular Dynamics Simulation , Hydrophobic and Hydrophilic Interactions , Antineoplastic Agents/pharmacology , Drug Design
12.
BMC Biol ; 20(1): 231, 2022 Oct 13.
Article in English | MEDLINE | ID: mdl-36224580

ABSTRACT

BACKGROUND: Antarctica harbors the bulk of the species diversity of the dominant teleost fish suborder Notothenioidei. However, the forces that shape their evolution are still under debate. RESULTS: We sequenced the genome of an icefish, Chionodraco hamatus, and used population genomics and demographic modelling of sequenced genomes of 52 C. hamatus individuals collected mainly from two East Antarctic regions to investigate the factors driving speciation. Results revealed that four icefish populations with clear reproductive separation were established 15 to 50 kya (kilo years ago) during the last glacial maxima (LGM). Selection sweeps in genes involving immune responses, cardiovascular development, and photoperception occurred differentially among the populations and were correlated with population-specific microbial communities and acquisition of distinct morphological features in the icefish taxa. Population and species-specific antifreeze glycoprotein gene expansion and glacial cycle-paced duplication/degeneration of the zona pellucida protein gene families indicated fluctuating thermal environments and periodic influence of glacial cycles on notothenioid divergence. CONCLUSIONS: We revealed several lines of genomic evidence indicating differential adaptation of C. hamatus populations and notothenioid species divergence in the extreme and unique marine environment. We conclude that geographic separation and adaptation to heterogeneous pathogen, oxygen, and light conditions of local habitats, periodically shaped by the glacial cycles, were the key drivers propelling species diversity in Antarctica.


Subject(s)
Ice Cover , Perciformes , Animals , Antarctic Regions , Fishes/genetics , Genome , Metagenomics , Oxygen , Phylogeny
13.
Molecules ; 26(24)2021 Dec 07.
Article in English | MEDLINE | ID: mdl-34946497

ABSTRACT

An important cause of cancer proliferation is the change in DNA methylation patterns, characterized by the localized hypermethylation of the promoters of tumor-suppressor genes together with an overall decrease in the level of 5-methylcytosine (5mC). Therefore, identifying the 5mC sites in the promoters is a critical step towards further understanding the diverse functions of DNA methylation in genetic diseases such as cancers and aging. However, most wet-lab experimental techniques for detecting 5mC sites are time-consuming and laborious. In this study, we proposed a deep learning-based approach, called BiLSTM-5mC, for accurately identifying 5mC sites in genome-wide DNA promoters. First, we randomly divided the negative samples into 11 subsets of equal size, each of which can form a balanced subset when combined with an equal number of positive samples. Then, two types of feature vectors, encoded by the one-hot method and the nucleotide property and frequency (NPF) method, were fed into a bidirectional long short-term memory (BiLSTM) network and a fully connected layer to train the 22 submodels. Finally, the outputs of these models were integrated to predict 5mC sites by using the majority vote strategy. Our experimental results demonstrated that BiLSTM-5mC outperformed existing methods based on the same independent dataset.
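The data-balancing and voting steps in this abstract can be sketched as below. The helper names, the random seed, and the tie-breaking rule (ties count as negative) are assumptions; the abstract specifies only that the negatives were split into 11 equal subsets and that 22 submodel outputs were combined by majority vote, presumably one submodel per subset and encoding.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_negatives(negatives: Sequence[T], n_subsets: int = 11,
                    seed: int = 0) -> List[List[T]]:
    """Shuffle the negatives and deal them into n_subsets near-equal parts."""
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def majority_vote(votes: Sequence[int]) -> int:
    """Return 1 if more than half of the binary votes are 1, else 0."""
    return 1 if 2 * sum(votes) > len(votes) else 0


# Toy example: 110 negative sample IDs split into 11 subsets of 10.
subsets = split_negatives(range(110))
# Hypothetical binary outputs of 22 submodels for one candidate site.
prediction = majority_vote([1] * 12 + [0] * 10)
```

Each subset would then be paired with the positive samples to train one submodel per encoding, so no single model sees the full class imbalance.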


Subject(s)
5-Methylcytosine/analysis , Aging/metabolism , DNA/genetics , Deep Learning , Neoplasms/metabolism , 5-Methylcytosine/metabolism , Aging/genetics , DNA Methylation , Humans , Memory, Short-Term , Neoplasms/genetics , Promoter Regions, Genetic/genetics
14.
Comput Math Methods Med ; 2021: 5770981, 2021.
Article in English | MEDLINE | ID: mdl-34413898

ABSTRACT

Antioxidant proteins (AOPs) play important roles in the management and prevention of several human diseases due to their ability to neutralize excess free radicals. However, the identification of AOPs by using wet-lab experimental techniques is often time-consuming and expensive. In this study, we proposed an accurate computational model, called AOP-HMM, to predict AOPs by extracting discriminatory evolutionary features from hidden Markov model (HMM) profiles. First, auto cross-covariance (ACC) variables were applied to transform the HMM profiles into fixed-length feature vectors. Then, we performed the analysis of variance (ANOVA) method to reduce the dimensionality of the raw feature space. Finally, a support vector machine (SVM) classifier was adopted to conduct the prediction of AOPs. To comprehensively evaluate the performance of the proposed AOP-HMM model, the 10-fold cross-validation (CV), the jackknife CV, and the independent test were carried out on two widely used benchmark datasets. The experimental results demonstrated that AOP-HMM outperformed most of the existing methods and could be used to quickly annotate AOPs and guide the experimental process.
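The auto cross-covariance (ACC) step, which turns a variable-length profile into a fixed-length vector, can be sketched as follows. The formula is the commonly used ACC definition (covariance between profile columns at positions separated by a lag); the abstract does not give its exact variant or the maximum lag, so both are assumptions, as is the function name `acc_transform`.

```python
from typing import List, Sequence


def acc_transform(profile: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Auto cross-covariance features of an L x N profile.

    For every lag 1..max_lag and every ordered column pair (a, b), average
    (P[j][a] - mean_a) * (P[j + lag][b] - mean_b) over positions j.
    Pairs with a == b are auto-covariances; the rest are cross-covariances.
    The output length is N * N * max_lag, independent of L.
    """
    length = len(profile)
    if length == 0 or not 0 < max_lag < length:
        raise ValueError("max_lag must satisfy 0 < max_lag < sequence length")
    n_cols = len(profile[0])
    means = [sum(row[c] for row in profile) / length for c in range(n_cols)]
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum(
                    (profile[j][a] - means[a]) * (profile[j + lag][b] - means[b])
                    for j in range(length - lag)
                )
                features.append(total / (length - lag))
    return features


# Toy 5 x 3 profile; a real HMM profile would have 20 columns per residue.
toy = [[0.1, 0.7, 0.2], [0.3, 0.3, 0.4], [0.6, 0.2, 0.2],
       [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]
vector = acc_transform(toy, max_lag=2)
```

Because the vector length depends only on the number of columns and the maximum lag, proteins of different lengths map to vectors of the same size, which is what a downstream classifier such as an SVM requires.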


Subject(s)
Antioxidants/chemistry , Machine Learning , Peroxiredoxins/chemistry , Proteins/chemistry , Algorithms , Amino Acids/analysis , Antioxidants/classification , Computational Biology , Databases, Protein/statistics & numerical data , Evolution, Molecular , Humans , Markov Chains , Peroxiredoxins/classification , Proteins/classification
15.
Phys Rev E ; 103(4-1): 042409, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34005973

ABSTRACT

Double stranded DNA can adopt different forms, the so-called A-, B-, and Z-DNA, which play different biological roles. In this work, the thermodynamic and the kinetic parameters for the base-pair closing and opening in A-DNA and B-DNA were calculated by all-atom molecular dynamics simulations at different temperatures. The thermodynamic parameters of the base pair in B-DNA were in good agreement with the experimental results. The free energy barrier of breaking a single base stack results from the enthalpy increase ΔH caused by the disruption of hydrogen bonding and base-stacking interactions, as well as water and base interactions. The free energy barrier of base pair closing comes from the unfavorable entropy loss ΔS caused by the restriction of torsional angles and hydration. It was found that the enthalpy change ΔH and the entropy change ΔS for the base pair in A-DNA are much larger than those in B-DNA, and the transition rates between the opening and the closing state for the base pair in A-DNA are much slower than those in B-DNA. The large difference of the enthalpy and entropy change for forming the base pair in A-DNA and B-DNA results from different hydration in A-DNA and B-DNA. The hydration pattern observed around DNA is an accompanying process for forming the base pair, rather than a follow-up of the conformation.


Subject(s)
DNA, A-Form , DNA, B-Form , Base Pairing , Molecular Dynamics Simulation , Thermodynamics
16.
Molecules ; 26(9)2021 Apr 24.
Article in English | MEDLINE | ID: mdl-33923273

ABSTRACT

Many gram-negative bacteria use type IV secretion systems to deliver effector molecules to a wide range of target cells. These substrate proteins, which are called type IV secreted effectors (T4SE), manipulate host cell processes during infection, often resulting in severe diseases or even death of the host. Therefore, identification of putative T4SEs has become a very active research topic in bioinformatics due to its vital role in understanding host-pathogen interactions. PSI-BLAST profiles have been experimentally validated to provide important and discriminatory evolutionary information for various protein classification tasks. In the present study, an accurate computational predictor termed iT4SE-EP was developed for identifying T4SEs by extracting evolutionary features from the position-specific scoring matrix and the position-specific frequency matrix profiles. First, four types of encoding strategies were designed to transform protein sequences into fixed-length feature vectors based on the two profiles. Then, the feature selection technique based on the random forest algorithm was utilized to reduce redundant or irrelevant features without much loss of information. Finally, the optimal features were input into a support vector machine classifier to carry out the prediction of T4SEs. Our experimental results demonstrated that iT4SE-EP outperformed most existing methods based on the independent dataset test.


Subject(s)
Evolution, Molecular , Gram-Negative Bacteria/genetics , Host-Pathogen Interactions/genetics , Type IV Secretion Systems/genetics , Amino Acid Sequence/genetics , Bacterial Infections/drug therapy , Bacterial Infections/genetics , Bacterial Infections/microbiology , Computational Biology , Gram-Negative Bacteria/pathogenicity , Humans , Type IV Secretion Systems/chemistry
17.
Comput Math Methods Med ; 2021: 6690299, 2021.
Article in English | MEDLINE | ID: mdl-33505516

ABSTRACT

Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in the field of bioinformatics due to its crucial role in understanding host-pathogen interaction and developing better therapeutic targets against the pathogens. However, the recognition of all effector proteins by using traditional experimental approaches is often time-consuming and laborious. Therefore, development of computational methods to accurately predict putative novel effectors is important in reducing the number of biological experiments for validation. In this study, we proposed a method, called iT3SE-PX, to identify T3SEs solely based on protein sequences. First, three kinds of features were extracted from the position-specific scoring matrix (PSSM) profiles to help train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm was performed to rank these features based on their classification ability. Finally, the optimal features were selected as inputs to a support vector machine (SVM) classifier to predict T3SEs. Based on the two benchmark datasets, we conducted a 100-time randomized 5-fold cross validation (CV) and an independent test, respectively. The experimental results demonstrated that the proposed method achieved superior performance compared to most of the existing methods and could serve as a useful tool for identifying putative T3SEs, given only the sequence information.
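The 100-time randomized 5-fold cross-validation mentioned in this abstract can be sketched as an index generator. This is a generic illustration, not the authors' code: the function name, the seed, and the absence of class stratification are assumptions.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_samples: int, k: int = 5, repeats: int = 100,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for `repeats` reshuffled k-fold splits."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            test = folds[held_out]
            train = [i for f, fold in enumerate(folds)
                     if f != held_out for i in fold]
            yield train, test


# Small demonstration: 23 samples, 5 folds, 3 repeats give 15 splits.
splits = list(repeated_kfold(23, k=5, repeats=3, seed=1))
```

Averaging a metric over all repeats reduces the variance that comes from any single random partition, which is the usual motivation for repeating the procedure.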


Subject(s)
Position-Specific Scoring Matrices , Support Vector Machine , Type III Secretion Systems/classification , Type III Secretion Systems/genetics , Algorithms , Amino Acid Sequence , Computational Biology , Databases, Protein , Machine Learning
18.
Biomed Res Int ; 2020: 7297631, 2020.
Article in English | MEDLINE | ID: mdl-32352006

ABSTRACT

DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, the identification of DBPs by using wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely based on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden Markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, the four base classifiers are trained with the features of HMM-based compositions. In the second stage, the prediction probabilities of these base classifiers are used as inputs to the meta-classifier to perform the final prediction of DBPs. Based on the PDB1075 benchmark dataset, we conduct a jackknife cross validation with the proposed PredDBP-Stack predictor and obtain a balanced sensitivity and specificity of 92.47% and 92.36%, respectively. This outcome outperforms most of the existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that the PredDBP-Stack is an effective classifier for accurately identifying DBPs based on protein sequence information alone.


Subject(s)
DNA-Binding Proteins/chemistry , Databases, Protein , Software , Support Vector Machine
19.
Comput Math Methods Med ; 2020: 1384749, 2020.
Article in English | MEDLINE | ID: mdl-32300371

ABSTRACT

Prediction of DNA-binding proteins (DBPs) has become a popular research topic in protein science due to its crucial role in all aspects of biological activities. Even though considerable efforts have been devoted to developing powerful computational methods to solve this problem, it is still a challenging task in the field of bioinformatics. A hidden Markov model (HMM) profile has been proved to provide important clues for improving the prediction performance of DBPs. In this paper, we propose a method, called HMMPred, which extracts the features of amino acid composition and auto- and cross-covariance transformation from the HMM profiles, to help train a machine learning model for identification of DBPs. Then, a feature selection technique is performed based on the extreme gradient boosting (XGBoost) algorithm. Finally, the selected optimal features are fed into a support vector machine (SVM) classifier to predict DBPs. The experimental results tested on two benchmark datasets show that the proposed method is superior to most of the existing methods and could serve as an alternative tool to identify DBPs.


Subject(s)
Algorithms , DNA-Binding Proteins/chemistry , Machine Learning , Amino Acid Sequence , Amino Acids/analysis , Computational Biology , DNA-Binding Proteins/genetics , Databases, Protein/statistics & numerical data , Humans , Markov Chains , ROC Curve , Support Vector Machine
20.
RNA ; 26(4): 470-480, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31988191

ABSTRACT

Due to the polyanionic nature of RNAs, the structural folding of RNAs is sensitive to solution salt conditions, yet a deep understanding of the salt effect on the thermodynamics and kinetics of RNAs at the single base-pair level is still lacking. In this work, the thermodynamic and the kinetic parameters for the base-pair AU closing/opening at different salt concentrations were calculated by 3-µsec all-atom molecular dynamics (MD) simulations at different temperatures. It was found that for the base-pair formation, the enthalpy change ΔH is nearly independent of salt concentration, while the entropy change ΔS exhibits a linear dependence on the logarithm of salt concentration, verifying the empirical assumption based on thermodynamic experiments. Our analyses revealed that such salt concentration dependence of the entropy change mainly results from the dependence of ion translational entropy change for the base pair closing/opening on salt concentration. Furthermore, the closing rate increases with increasing salt concentration, while the opening rate is nearly independent of salt concentration. Additionally, our analyses revealed that the free energy surface for describing the base-pair opening and closing dynamics becomes more rugged with the decrease of salt concentration.
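The salt dependence reported in this abstract can be summarized in a short worked relation. The symbols below are introduced here for illustration and are not from the paper: $c$ is the salt concentration, $c_0$ an arbitrary reference concentration, $\Delta H_0$ and $\Delta S_0$ the enthalpy and entropy changes of base-pair formation at $c_0$, and $b$ a fitted slope whose value and sign the abstract does not report.

```latex
\begin{aligned}
\Delta H(c) &\approx \Delta H_0, \\
\Delta S(c) &\approx \Delta S_0 + b \,\ln\frac{c}{c_0}, \\
\Delta G(c, T) &= \Delta H(c) - T\,\Delta S(c) \approx \Delta H_0 - T\,\Delta S_0 - b\,T \,\ln\frac{c}{c_0}.
\end{aligned}
```

In this form the whole salt dependence of the base-pair stability enters through the entropic term, so at a fixed temperature $\Delta G$ is itself linear in $\ln c$.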


Subject(s)
Molecular Dynamics Simulation , RNA/chemistry , Base Pairing , Osmolar Concentration , Sodium Chloride/chemistry