Results 1 - 20 of 49
1.
Int J Biol Macromol ; : 136147, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39357703

ABSTRACT

Protein-DNA interactions play critical roles in various biological processes and are essential for drug discovery. However, traditional experimental methods are labor-intensive and unable to keep pace with the increasing volume of protein sequences, leading to a substantial number of proteins lacking DNA-binding annotations. Therefore, developing an efficient computational method to identify protein-DNA binding sites is crucial. Unfortunately, most existing computational methods rely on manually selected features or protein structure information, making these methods inapplicable to large-scale prediction tasks. In this study, we introduced PDNAPred, a sequence-based method that combines two pre-trained protein language models with a designed CNN-GRU network to identify DNA-binding sites. Additionally, to tackle the issue of imbalanced dataset samples, we employed focal loss. Our comprehensive experiments demonstrated that PDNAPred significantly improved the accuracy of DNA-binding site prediction, outperforming existing state-of-the-art sequence-based methods. Remarkably, PDNAPred also achieved results comparable to advanced structure-based methods. The designed CNN-GRU network enhances its capability to detect DNA-binding sites accurately. Furthermore, we validated the versatility of PDNAPred by training it on RNA-binding site datasets, showing its potential as a general framework for amino acid binding site prediction. Finally, we conducted model interpretability analysis to elucidate the reasons behind PDNAPred's outstanding performance.
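
The focal loss mentioned in the abstract above down-weights easy examples so that the rare binding residues contribute more to training. The snippet below is a minimal sketch of the binary form for a single prediction. The function name and the default values `gamma=2.0` and `alpha=0.25` are assumptions for illustration; the abstract does not state which settings PDNAPred uses.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one residue.

    p: predicted probability of the positive (binding) class.
    y: true label, 1 for binding and 0 for non-binding.
    gamma, alpha: illustrative defaults, not taken from the abstract.
    """
    # Probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Class-balancing weight for the true class.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0).
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half of the ordinary cross-entropy, which is a convenient sanity check.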

2.
Int J Biol Macromol ; 280(Pt 3): 135762, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39322150

ABSTRACT

Allergy is a prevalent phenomenon, involving allergens such as nuts and milk. Avoiding exposure to allergens is the most effective preventive measure against allergic reactions. However, current homology-based methods for identifying allergenic proteins encounter challenges when dealing with non-homologous data. Traditional machine learning approaches rely on manually extracted features, which lack important protein functional characteristics, including evolutionary information. Consequently, there is still considerable room for improvement in existing methods. In this study, we present PreAlgPro, a method for identifying allergenic proteins based on pre-trained protein language models and deep learning techniques. Specifically, we employed the ProtT5 model to extract protein embedding features, replacing the manual feature extraction step. Furthermore, we devised an Attention-CNN neural network architecture to identify potential features that contribute to the classification of allergenic proteins. The performance of our model was evaluated on four independent test sets, and the experimental results demonstrate that PreAlgPro surpasses existing state-of-the-art methods. Additionally, we collected allergenic protein samples to validate the robustness of the model and conducted an analysis of model interpretability.

3.
Int J Mol Sci ; 25(15)2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39125602

ABSTRACT

The benzofuran core inhibitors HCV-796, BMS-929075, MK-8876, compound 2, and compound 9B exhibit good pan-genotypic activity against various genotypes of NS5B polymerase. To elucidate their mechanism of action, multiple molecular simulation methods were used to investigate the complex systems of these inhibitors binding to GT1a, 1b, 2a, and 2b NS5B polymerases. The calculation results indicated that these five inhibitors interact not only with the residues in the palm II subdomain of NS5B polymerase, but also with the residues in the palm I subdomain or the palm I/III overlap region. Interestingly, the binding of inhibitors with longer substituents at the C5 position (BMS-929075, MK-8876, compound 2, and compound 9B) to the GT1a and 2b NS5B polymerases exhibits different binding patterns compared to the binding to the GT1b and 2a NS5B polymerases. The interactions between the para-fluorophenyl groups at the C2 positions of the inhibitors and the residues at the binding pockets, together with the interactions between the substituents at the C5 positions and the residues at the reverse β-fold (residues 441-456), play a key role in recognition and the induction of the binding. These studies could provide valuable information for further research and development of novel anti-HCV benzofuran core pan-genotypic inhibitors.


Subject(s)
Antiviral Agents; Benzofurans; Genotype; Hepacivirus; Viral Nonstructural Proteins; Viral Nonstructural Proteins/antagonists & inhibitors; Viral Nonstructural Proteins/metabolism; Viral Nonstructural Proteins/chemistry; Benzofurans/chemistry; Benzofurans/pharmacology; Hepacivirus/drug effects; Hepacivirus/enzymology; Hepacivirus/genetics; Antiviral Agents/pharmacology; Antiviral Agents/chemistry; Molecular Dynamics Simulation; Molecular Docking Simulation; Binding Sites; Protein Binding; Humans; Enzyme Inhibitors/pharmacology; Enzyme Inhibitors/chemistry; RNA-Dependent RNA Polymerase
4.
J Cell Biochem ; : e30642, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39164870

ABSTRACT

The Type III secretion effectors (T3SEs) are bacterial proteins synthesized by Gram-negative pathogens and delivered into host cells via the Type III secretion system (T3SS). These effectors usually play a pivotal role in the interactions between bacteria and hosts. Hence, the precise identification of T3SEs aids researchers in exploring the pathogenic mechanisms of bacterial infections. Since the diversity and complexity of T3SE sequences often make traditional experimental methods time-consuming, it is imperative to explore more efficient and convenient computational approaches for T3SE prediction. Inspired by the promising potential exhibited by pre-trained language models in protein recognition tasks, we proposed a method called PLM-T3SE that utilizes protein language models (PLMs) for effective recognition of T3SEs. First, we utilized PLM embeddings and evolutionary features from the position-specific scoring matrix (PSSM) profiles to transform protein sequences into fixed-length vectors for model training. Second, we employed the extreme gradient boosting (XGBoost) algorithm to rank these features based on their importance. Finally, an MLP neural network model was used to predict T3SEs based on the selected optimal feature set. Experimental results from the cross-validation and independent test demonstrated that our model exhibited superior performance compared to the existing models. Specifically, our model achieved an accuracy of 98.1%, which is 1.8%-42.4% higher than the state-of-the-art predictors on the same independent test data set. These findings highlight the superiority of PLM-T3SE and the remarkable characterization ability of PLM embeddings for T3SE prediction.
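
The rank-then-select step described in the abstract above can be sketched in a few lines. This is a simplified illustration, assuming the importance scores have already been produced by some ranker (XGBoost in the paper); the function names and the toy numbers are hypothetical, and the abstract does not say how many features were kept.

```python
def top_k_features(importances, k):
    """Return the column indices of the k highest-scoring features."""
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i],
                    reverse=True)
    # Sort the kept indices so the column order of the matrix is preserved.
    return sorted(ranked[:k])


def select_columns(rows, columns):
    """Keep only the chosen feature columns of a row-major feature matrix."""
    return [[row[c] for c in columns] for row in rows]


# Toy example: four features, keep the two most important ones.
scores = [0.10, 0.50, 0.05, 0.35]          # hypothetical importance scores
kept = top_k_features(scores, 2)
reduced = select_columns([[1, 2, 3, 4], [5, 6, 7, 8]], kept)
```

In practice the value of `k` would be chosen by comparing cross-validated performance across several candidate subset sizes before training the final classifier.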

5.
Anal Biochem ; 694: 115603, 2024 Nov.
Article in English | MEDLINE | ID: mdl-38986796

ABSTRACT

The recognition of DNA-binding proteins (DBPs) is a crucial step in understanding their roles in various biological processes such as genetic regulation, gene expression, cell cycle control, DNA repair, and replication within cells. However, conventional experimental methods for identifying DBPs are usually time-consuming and expensive. Therefore, there is an urgent need to develop rapid and efficient computational methods for the prediction of DBPs. In this study, we proposed a novel predictor named PreDBP-PLMs to further improve the identification accuracy of DBPs by fusing the pre-trained protein language model (PLM) ProtT5 embedding with evolutionary features as input to the classic convolutional neural network (CNN) model. First, the ProtT5 embedding was combined with different evolutionary features derived from the position-specific scoring matrix (PSSM) to represent protein sequences. Then, the optimal feature combination was selected and input to the CNN classifier for the prediction of DBPs. Finally, the 5-fold cross-validation (CV), the leave-one-out CV (LOOCV), and the independent set test were adopted to examine the performance of PreDBP-PLMs on the benchmark datasets. Compared to the existing state-of-the-art predictors, PreDBP-PLMs exhibits an accuracy improvement of 0.5% and 5.2% on the PDB186 and PDB2272 datasets, respectively. This demonstrates that the proposed method could serve as a useful tool for the recognition of DBPs.


Subject(s)
DNA-Binding Proteins; Neural Networks, Computer; DNA-Binding Proteins/metabolism; DNA-Binding Proteins/chemistry; Computational Biology/methods; Databases, Protein; Humans
6.
Molecules ; 29(11)2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38893524

ABSTRACT

The stimulator of interferon genes (STING) plays a significant role in immune defense and protection against tumor proliferation. Many cyclic dinucleotide (CDN) analogues have been reported to regulate its activity, but the dynamic process involved when the ligands activate STING remains unclear. In this work, all-atom molecular dynamics simulations were performed to explore the binding mode between human STING (hSTING) and four cyclic adenosine-inosine monophosphate analogs (cAIMPs), as well as 2',3'-cGMP-AMP (2',3'-cGAMP). The results indicate that these cAIMPs adopt a U-shaped configuration within the binding pocket, forming extensive non-covalent interaction networks with hSTING. These interactions play a significant role in augmenting the binding, particularly in interactions with Tyr167, Arg238, Thr263, and Thr267. Additionally, the presence of hydrophobic interactions between the ligand and the receptor further contributes to the overall stability of the binding. In this work, the conformational changes in hSTING upon binding these cAIMPs were also studied and a significant tendency for hSTING to shift from open to closed state was observed after binding some of the cAIMP ligands.


Subject(s)
Membrane Proteins; Molecular Dynamics Simulation; Protein Binding; Humans; Membrane Proteins/chemistry; Membrane Proteins/metabolism; Binding Sites; Nucleotides, Cyclic/chemistry; Nucleotides, Cyclic/metabolism; Ligands; Hydrophobic and Hydrophilic Interactions
7.
Int J Mol Sci ; 25(8)2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38674091

ABSTRACT

Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods. These alternatives support drug discovery by creating advanced predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned evolutionary scale modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we made a careful comparison to examine the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on the generative pre-trained transformers 2 (GPT-2) with modifications. To our knowledge, this is the first time a large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model.


Subject(s)
Proteins; Proteins/metabolism; Computational Biology/methods; Drug Discovery/methods; Position-Specific Scoring Matrices; Databases, Protein; Humans; Algorithms
8.
Math Biosci Eng ; 21(1): 1472-1488, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38303473

ABSTRACT

Non-classical secreted proteins (NCSPs) refer to a group of proteins that are located in the extracellular environment despite the absence of signal peptides and motifs. They usually play different roles in intercellular communication. Therefore, the accurate prediction of NCSPs is a critical step towards understanding their associated secretion mechanisms in depth. Since the experimental recognition of NCSPs is often costly and time-consuming, computational methods are desired. In this study, we proposed an ensemble learning framework, termed NCSP-PLM, for the identification of NCSPs by extracting feature embeddings from pre-trained protein language models (PLMs) as input to several fine-tuned deep learning models. First, we compared the performance of nine PLM embeddings by training three neural networks, namely a multi-layer perceptron (MLP), an attention mechanism, and a bidirectional long short-term memory network (BiLSTM), and selected the best network model for each PLM embedding. Then, four models were excluded due to their below-average accuracies, and the remaining five models were integrated to perform the prediction of NCSPs based on weighted voting. Finally, the 5-fold cross validation and the independent test were conducted to evaluate the performance of NCSP-PLM on the benchmark datasets. On the same independent dataset, the sensitivity and specificity of NCSP-PLM were 91.18% and 97.06%, respectively. In particular, the overall accuracy of our model reached 94.12%, which was 7-16% higher than that of the existing state-of-the-art predictors. This indicates that NCSP-PLM could serve as a useful tool for the annotation of NCSPs.
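
The two ensemble steps in the abstract above (dropping below-average models, then weighted voting) can be sketched as follows. This is an illustration under stated assumptions: the abstract does not say how the voting weights were set, so using each model's accuracy as its weight is a hypothetical choice, and the model names and numbers in the usage lines are made up.

```python
def select_above_average(accuracies):
    """Keep the models whose accuracy is at least the mean accuracy."""
    mean_acc = sum(accuracies.values()) / len(accuracies)
    return {name: acc for name, acc in accuracies.items() if acc >= mean_acc}


def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-model positive-class probabilities into one decision.

    probabilities, weights: dicts keyed by model name.
    Returns (weighted average probability, predicted label).
    """
    total = sum(weights[name] for name in probabilities)
    score = sum(probabilities[name] * weights[name]
                for name in probabilities) / total
    return score, int(score >= threshold)


# Hypothetical validation accuracies for three candidate models.
accs = {"model_a": 0.90, "model_b": 0.70, "model_c": 0.86}
kept = select_above_average(accs)           # model_b falls below the mean
score, label = weighted_vote({"model_a": 0.8, "model_c": 0.3}, kept)
```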


Subject(s)
Deep Learning; Neural Networks, Computer; Proteins; Language; Sensitivity and Specificity
9.
Molecules ; 29(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38276629

ABSTRACT

Lysine-specific demethylase 1 (LSD1/KDM1A) has emerged as a promising therapeutic target for treating various cancers (such as breast cancer, liver cancer, etc.) and other diseases (blood diseases, cardiovascular diseases, etc.), owing to its observed overexpression, thereby presenting significant opportunities in drug development. Since its discovery in 2004, extensive research has been conducted on LSD1 inhibitors, with notable contributions from computational approaches. This review systematically summarizes LSD1 inhibitors investigated through computer-aided drug design (CADD) technologies since 2010, showcasing a diverse range of chemical scaffolds, including phenelzine derivatives, tranylcypromine (abbreviated as TCP or 2-PCPA) derivatives, nitrogen-containing heterocyclic (pyridine, pyrimidine, azole, thieno[3,2-b]pyrrole, indole, quinoline and benzoxazole) derivatives, natural products (including sanguinarine, phenolic compounds and resveratrol derivatives, flavonoids and other natural products) and others (including thiourea compounds, Fenoldopam and Raloxifene, (4-cyanophenyl)glycine derivatives, propargylamine and benzohydrazide derivatives and inhibitors discovered through AI techniques). Computational techniques, such as virtual screening, molecular docking and 3D-QSAR models, have played a pivotal role in elucidating the interactions between these inhibitors and LSD1. Moreover, the integration of cutting-edge technologies such as artificial intelligence holds promise in facilitating the discovery of novel LSD1 inhibitors. The comprehensive insights presented in this review aim to provide valuable information for advancing further research on LSD1 inhibitors.


Subject(s)
Biological Products; Enzyme Inhibitors; Enzyme Inhibitors/pharmacology; Enzyme Inhibitors/chemistry; Lysine; Molecular Docking Simulation; Artificial Intelligence; Drug Design; Histone Demethylases/metabolism; Structure-Activity Relationship
10.
J Phys Chem B ; 127(22): 4989-4997, 2023 Jun 08.
Article in English | MEDLINE | ID: mdl-37243666

ABSTRACT

CRISPR (clustered regularly interspaced short palindromic repeats)/CRISPR-associated protein 9 (Cas9) has been widely used for gene editing. That not all guide RNAs can cleave the DNA efficiently remains a major challenge to CRISPR/Cas9-mediated genome engineering. Therefore, understanding how the Cas9 complex successfully and efficiently identifies specific functional targets through base-pairing has great implications for such applications. The 10-nt seed sequence at the 3' end of the guide RNA is critical to target recognition and cleavage. Here, through stretching molecular dynamics simulation, we studied the thermodynamics and kinetics of the binding-dissociation process of the seed base and the target DNA base with the Cas9 protein. The results showed that in the presence of Cas9 protein, the enthalpy change and entropy change in binding-dissociation of the seed base with the target are smaller than those without the Cas9 protein. The reduction of the entropy penalty upon association with the protein resulted from the pre-organization of the seed base in an A-form helix, and the reduction of the enthalpy change was due to the electrostatic attraction between the positively charged channel and the negatively charged target DNA. The binding barrier arising from the entropy loss and the dissociation barrier resulting from the destruction of the base pair were lower in the presence of Cas9 protein than without it, which indicates that the seed region is crucial for efficiently searching for the correct target by accelerating binding and dissociating quickly from the wrong target.


Subject(s)
CRISPR-Associated Protein 9; CRISPR-Cas Systems; CRISPR-Associated Protein 9/genetics; CRISPR-Associated Protein 9/metabolism; Base Pairing; Gene Editing/methods; DNA/chemistry
11.
Molecules ; 28(5)2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36903531

ABSTRACT

The subcellular localization of messenger RNA (mRNA) precisely controls where protein products are synthesized and where they function. However, obtaining an mRNA's subcellular localization through wet-lab experiments is time-consuming and expensive, and many existing mRNA subcellular localization prediction algorithms need to be improved. In this study, a deep neural network-based eukaryotic mRNA subcellular location prediction method, DeepmRNALoc, was proposed, utilizing a two-stage feature extraction strategy that featured bimodal information splitting and fusing for the first stage and a VGGNet-like CNN module for the second stage. The five-fold cross-validation accuracies of DeepmRNALoc in the cytoplasm, endoplasmic reticulum, extracellular region, mitochondria, and nucleus were 0.895, 0.594, 0.308, 0.944, and 0.865, respectively, demonstrating that it outperforms existing models and techniques.


Subject(s)
Deep Learning; Eukaryota; Eukaryota/metabolism; Proteins/metabolism; Endoplasmic Reticulum/metabolism; RNA, Messenger; Computational Biology/methods
12.
Phys Rev E ; 107(2-1): 024404, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36932572

ABSTRACT

Mechanical force has been widely used to study RNA folding and unfolding. Understanding how the force affects the opening and closing of a single base pair, which is a basic step for RNA folding and unfolding and a fundamental behavior in some important biological activities, is crucial to understanding the mechanism of RNA folding and unfolding under mechanical force. In this work, we investigated the opening and closing process of an RNA base pair under mechanical force with constant-force stretching molecular dynamics simulations. It was found that high mechanical force results in overstretching, and the open state is a high-energy state. The enthalpy and entropy change of the base-pair opening-closing transition were obtained and the results at low forces were in good agreement with the nearest-neighbor model. The temperature and force dependence of the opening and closing rates were also obtained. The position of the transition state for the base-pair opening-closing transition under mechanical force was determined. The free energy barrier of opening a base pair without force is the enthalpy increase, and the work done by the force from the closed state to the transition state decreases the barrier and increases the opening rate. The free energy barrier of closing the base pair without force results from the entropy loss, and the work done by the force from the open state to the transition state increases the barrier and decreases the closing rate. The transition rates are strongly dependent on the temperature and force, while the transition path times are weakly dependent on force and temperature.
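
The qualitative force dependence stated in the abstract above (force lowers the opening barrier and raises the closing barrier by the work done up to the transition state) is commonly written in a Bell-type form. The expressions below are a generic textbook sketch, not necessarily the ones used by the authors. The symbols $k^{0}$ (zero-force rate) and $\Delta x^{\ddagger}$ (positive distance along the pulling direction from the starting state to the transition state) are introduced here for illustration.

```latex
k_{\mathrm{open}}(F) = k_{\mathrm{open}}^{0}\,
  \exp\!\left(\frac{F\,\Delta x_{\mathrm{open}}^{\ddagger}}{k_{B}T}\right),
\qquad
k_{\mathrm{close}}(F) = k_{\mathrm{close}}^{0}\,
  \exp\!\left(-\frac{F\,\Delta x_{\mathrm{close}}^{\ddagger}}{k_{B}T}\right)
```

Here $F\,\Delta x^{\ddagger}$ is the work done by the force between the starting state and the transition state, so a larger force speeds up opening and slows down closing, consistent with the trend reported in the abstract.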


Subject(s)
Molecular Dynamics Simulation; RNA; Base Pairing; Thermodynamics; Mechanical Phenomena; Kinetics
13.
Molecules ; 27(23)2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36500451

ABSTRACT

Lysine-specific demethylase 1 (LSD1) is a histone-modifying enzyme and a significant target for anticancer drug research. In this work, 40 reported tetrahydroquinoline-derivative inhibitors targeting LSD1 were studied to establish the three-dimensional quantitative structure-activity relationship (3D-QSAR). The established CoMFA (Comparative Molecular Field Analysis; q² = 0.778, R²pred = 0.709) and CoMSIA (Comparative Molecular Similarity Index Analysis; q² = 0.764, R²pred = 0.713) models yielded good statistical and predictive properties. Based on the corresponding contour maps, seven novel tetrahydroquinoline derivatives were designed. For further evaluation, three of the compounds (D1, D4, and Z17) and the template molecule 18x were explored with molecular dynamics simulations, binding free energy calculations by the MM/PBSA method, and ADME (absorption, distribution, metabolism, and excretion) prediction. The results suggested that D1, D4, and Z17, especially D1 and D4, performed better than the template molecule 18x due to the introduction of amino and hydrophobic groups, which will provide guidance for the design of LSD1 inhibitors.
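
The cross-validated coefficient reported for the CoMFA and CoMSIA models in the abstract above is conventionally defined as one minus the ratio of the predictive residual sum of squares (PRESS) to the total sum of squares. The function below is a minimal sketch of that standard definition; the function and argument names are hypothetical, and the values in the usage line are toy numbers, not data from the paper.

```python
def q_squared(observed, cv_predicted):
    """Cross-validated q^2 = 1 - PRESS / SS_total.

    observed: measured activities (e.g. pIC50 values).
    cv_predicted: predictions for the same compounds, each made by a model
                  that did not see that compound during fitting.
    """
    mean_obs = sum(observed) / len(observed)
    # Predictive residual sum of squares.
    press = sum((y - p) ** 2 for y, p in zip(observed, cv_predicted))
    # Total sum of squares around the mean of the observed values.
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - press / ss_total


# Toy numbers only.
example = q_squared([5.0, 6.0, 7.0, 8.0], [5.5, 5.5, 7.5, 7.5])
```

A model that always predicts the mean gives a value of 0, and perfect out-of-sample predictions give 1.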


Subject(s)
Antineoplastic Agents; Quantitative Structure-Activity Relationship; Molecular Docking Simulation; Molecular Dynamics Simulation; Hydrophobic and Hydrophilic Interactions; Antineoplastic Agents/pharmacology; Drug Design
14.
BMC Biol ; 20(1): 231, 2022 Oct 13.
Article in English | MEDLINE | ID: mdl-36224580

ABSTRACT

BACKGROUND: Antarctica harbors the bulk of the species diversity of the dominant teleost fish suborder Notothenioidei. However, the forces that shape their evolution are still under debate. RESULTS: We sequenced the genome of an icefish, Chionodraco hamatus, and used population genomics and demographic modelling of sequenced genomes of 52 C. hamatus individuals collected mainly from two East Antarctic regions to investigate the factors driving speciation. Results revealed that four icefish populations with clear reproductive separation were established 15 to 50 kya (kilo years ago) during the last glacial maxima (LGM). Selective sweeps in genes involved in immune responses, cardiovascular development, and photoperception occurred differentially among the populations and were correlated with population-specific microbial communities and acquisition of distinct morphological features in the icefish taxa. Population- and species-specific antifreeze glycoprotein gene expansion and glacial cycle-paced duplication/degeneration of the zona pellucida protein gene families indicated fluctuating thermal environments and periodic influence of glacial cycles on notothenioid divergence. CONCLUSIONS: We present a body of genomic evidence indicating differential adaptation of C. hamatus populations and notothenioid species divergence in the extreme and unique marine environment. We conclude that geographic separation and adaptation to heterogeneous pathogen, oxygen, and light conditions of local habitats, periodically shaped by the glacial cycles, were the key drivers propelling species diversity in Antarctica.


Subject(s)
Ice Cover; Perciformes; Animals; Antarctic Regions; Fishes/genetics; Genome; Metagenomics; Oxygen; Phylogeny
15.
Molecules ; 26(24)2021 Dec 07.
Article in English | MEDLINE | ID: mdl-34946497

ABSTRACT

An important cause of cancer proliferation is the change in DNA methylation patterns, characterized by the localized hypermethylation of the promoters of tumor-suppressor genes together with an overall decrease in the level of 5-methylcytosine (5mC). Therefore, identifying the 5mC sites in the promoters is a critical step towards further understanding the diverse functions of DNA methylation in genetic diseases such as cancers and aging. However, most wet-lab experimental techniques for detecting 5mC sites are time-consuming and laborious. In this study, we proposed a deep learning-based approach, called BiLSTM-5mC, for accurately identifying 5mC sites in genome-wide DNA promoters. First, we randomly divided the negative samples into 11 subsets of equal size, each of which can be combined with the same number of positive samples to form a balanced subset. Then, two types of feature vectors, encoded by the one-hot method and by the nucleotide property and frequency (NPF) method, were fed into a bidirectional long short-term memory (BiLSTM) network and a fully connected layer to train the 22 submodels. Finally, the outputs of these models were integrated to predict 5mC sites by using the majority vote strategy. Our experimental results demonstrated that BiLSTM-5mC outperformed existing methods on the same independent dataset.
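
The data preparation and voting steps in the abstract above can be illustrated without any deep learning library. The sketch below shows one-hot encoding of a DNA sequence, splitting the negatives into equal-sized subsets that are each paired with the positives, and a majority vote over submodel outputs. All function names are hypothetical, the channel order A, C, G, T is an assumption, and resolving a tie as negative is a choice made here rather than something stated in the abstract.

```python
import random

# Assumed channel order: A, C, G, T.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element indicator vectors."""
    return [ONE_HOT[base] for base in sequence.upper()]


def balanced_subsets(positives, negatives, n_subsets=11, seed=0):
    """Shuffle the negatives, cut them into n_subsets equal parts, and pair
    each part with the full positive set. Leftover negatives are dropped."""
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_subsets
    return [(list(positives), shuffled[i * size:(i + 1) * size])
            for i in range(n_subsets)]


def majority_vote(labels):
    """Return 1 if more than half of the submodels predict 1, else 0."""
    return int(sum(labels) > len(labels) / 2)
```

With 11 subsets and two encodings, one submodel per (subset, encoding) pair gives the 22 submodels mentioned in the abstract.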


Subject(s)
5-Methylcytosine/analysis; Aging/metabolism; DNA/genetics; Deep Learning; Neoplasms/metabolism; 5-Methylcytosine/metabolism; Aging/genetics; DNA Methylation; Humans; Memory, Short-Term; Neoplasms/genetics; Promoter Regions, Genetic/genetics
16.
Comput Math Methods Med ; 2021: 5770981, 2021.
Article in English | MEDLINE | ID: mdl-34413898

ABSTRACT

Antioxidant proteins (AOPs) play important roles in the management and prevention of several human diseases due to their ability to neutralize excess free radicals. However, the identification of AOPs by using wet-lab experimental techniques is often time-consuming and expensive. In this study, we proposed an accurate computational model, called AOP-HMM, to predict AOPs by extracting discriminatory evolutionary features from hidden Markov model (HMM) profiles. First, auto cross-covariance (ACC) variables were applied to transform the HMM profiles into fixed-length feature vectors. Then, we performed the analysis of variance (ANOVA) method to reduce the dimensionality of the raw feature space. Finally, a support vector machine (SVM) classifier was adopted to conduct the prediction of AOPs. To comprehensively evaluate the performance of the proposed AOP-HMM model, the 10-fold cross-validation (CV), the jackknife CV, and the independent test were carried out on two widely used benchmark datasets. The experimental results demonstrated that AOP-HMM outperformed most of the existing methods and could be used to quickly annotate AOPs and guide the experimental process.
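
The auto cross-covariance (ACC) step in the abstract above turns a profile whose length depends on the protein into a vector whose length depends only on the number of profile columns and the maximum lag. The sketch below follows the usual ACC definition (covariance between column `a` at position `j` and column `b` at position `j + lag`, averaged over positions); the function name and the toy three-column profile used in the usage lines are assumptions, and the paper's exact normalization may differ.

```python
def acc_features(profile, max_lag):
    """Auto cross-covariance transform of a profile.

    profile: list of rows (one per residue), each a list of column scores.
    max_lag: largest lag; must be smaller than the number of rows.
    Returns a flat list of length max_lag * n_cols * n_cols.
    """
    n_rows = len(profile)
    n_cols = len(profile[0])
    # Mean of each column over the whole sequence.
    means = [sum(row[c] for row in profile) / n_rows for c in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for a in range(n_cols):
            for b in range(n_cols):
                # a == b gives the auto-covariance terms,
                # a != b gives the cross-covariance terms.
                total = sum(
                    (profile[j][a] - means[a]) * (profile[j + lag][b] - means[b])
                    for j in range(n_rows - lag)
                )
                features.append(total / (n_rows - lag))
    return features


# Two toy profiles of different length map to vectors of equal length.
short = [[0.1, 0.2, 0.7], [0.3, 0.3, 0.4], [0.6, 0.1, 0.3], [0.2, 0.5, 0.3]]
long_ = short + [[0.4, 0.4, 0.2], [0.1, 0.8, 0.1]]
```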


Subject(s)
Antioxidants/chemistry; Machine Learning; Peroxiredoxins/chemistry; Proteins/chemistry; Algorithms; Amino Acids/analysis; Antioxidants/classification; Computational Biology; Databases, Protein/statistics & numerical data; Evolution, Molecular; Humans; Markov Chains; Peroxiredoxins/classification; Proteins/classification
17.
Phys Rev E ; 103(4-1): 042409, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34005973

ABSTRACT

Double stranded DNA can adopt different forms, the so-called A-, B-, and Z-DNA, which play different biological roles. In this work, the thermodynamic and the kinetic parameters for the base-pair closing and opening in A-DNA and B-DNA were calculated by all-atom molecular dynamics simulations at different temperatures. The thermodynamic parameters of the base pair in B-DNA were in good agreement with the experimental results. The free energy barrier of breaking a single base stack results from the enthalpy increase ΔH caused by the disruption of hydrogen bonding and base-stacking interactions, as well as water and base interactions. The free energy barrier of base pair closing comes from the unfavorable entropy loss ΔS caused by the restriction of torsional angles and hydration. It was found that the enthalpy change ΔH and the entropy change ΔS for the base pair in A-DNA are much larger than those in B-DNA, and the transition rates between the opening and the closing state for the base pair in A-DNA are much slower than those in B-DNA. The large difference of the enthalpy and entropy change for forming the base pair in A-DNA and B-DNA results from different hydration in A-DNA and B-DNA. The hydration pattern observed around DNA is an accompanying process for forming the base pair, rather than a follow-up of the conformation.
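
One standard route from simulations at several temperatures to an enthalpy and entropy change, as discussed in the abstract above, is a van't Hoff fit of ln K against 1/T, using ln K = -ΔH/(RT) + ΔS/R. The sketch below is an illustration under that assumption; the abstract does not say that this is the procedure the authors used. The function name is hypothetical, and the equilibrium constant K is assumed to be available from the simulated populations of the two states.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def vant_hoff_fit(temperatures, equilibrium_constants):
    """Least-squares fit of ln K versus 1/T.

    Returns (delta_H in kJ/mol, delta_S in kJ/(mol K)), assuming both are
    constant over the temperature range.
    """
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(k) for k in equilibrium_constants]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # slope = -delta_H / R and intercept = delta_S / R
    return -slope * R, intercept * R
```

The free energy change at a given temperature then follows from ΔG = ΔH - TΔS.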


Subject(s)
DNA, A-Form; DNA, B-Form; Base Pairing; Molecular Dynamics Simulation; Thermodynamics
18.
Molecules ; 26(9)2021 Apr 24.
Article in English | MEDLINE | ID: mdl-33923273

ABSTRACT

Many Gram-negative bacteria use type IV secretion systems to deliver effector molecules to a wide range of target cells. These substrate proteins, which are called type IV secreted effectors (T4SEs), manipulate host cell processes during infection, often resulting in severe diseases or even death of the host. Therefore, identification of putative T4SEs has become a very active research topic in bioinformatics due to its vital role in understanding host-pathogen interactions. PSI-BLAST profiles have been experimentally validated to provide important and discriminatory evolutionary information for various protein classification tasks. In the present study, an accurate computational predictor termed iT4SE-EP was developed for identifying T4SEs by extracting evolutionary features from the position-specific scoring matrix and the position-specific frequency matrix profiles. First, four types of encoding strategies were designed to transform protein sequences into fixed-length feature vectors based on the two profiles. Then, the feature selection technique based on the random forest algorithm was utilized to reduce redundant or irrelevant features without much loss of information. Finally, the optimal features were input into a support vector machine classifier to carry out the prediction of T4SEs. Our experimental results demonstrated that iT4SE-EP outperformed most existing methods on the independent dataset test.


Subject(s)
Evolution, Molecular; Gram-Negative Bacteria/genetics; Host-Pathogen Interactions/genetics; Type IV Secretion Systems/genetics; Amino Acid Sequence/genetics; Bacterial Infections/drug therapy; Bacterial Infections/genetics; Bacterial Infections/microbiology; Computational Biology; Gram-Negative Bacteria/pathogenicity; Humans; Type IV Secretion Systems/chemistry
19.
Comput Math Methods Med ; 2021: 6690299, 2021.
Article in English | MEDLINE | ID: mdl-33505516

ABSTRACT

Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in the field of bioinformatics due to its crucial role in understanding host-pathogen interaction and developing better therapeutic targets against the pathogens. However, the recognition of all effector proteins by using traditional experimental approaches is often time-consuming and laborious. Therefore, development of computational methods to accurately predict putative novel effectors is important in reducing the number of biological experiments for validation. In this study, we proposed a method, called iT3SE-PX, to identify T3SEs solely based on protein sequences. First, three kinds of features were extracted from the position-specific scoring matrix (PSSM) profiles to help train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm was performed to rank these features based on their classification ability. Finally, the optimal features were selected as inputs to a support vector machine (SVM) classifier to predict T3SEs. Based on the two benchmark datasets, we conducted a 100-time randomized 5-fold cross validation (CV) and an independent test, respectively. The experimental results demonstrated that the proposed method achieved superior performance compared to most of the existing methods and could serve as a useful tool for identifying putative T3SEs, given only the sequence information.
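
The "100-time randomized 5-fold cross validation" in the abstract above means that the 5-fold split is redrawn with a fresh shuffle in every repetition and the scores are averaged over all runs. The generator below is a minimal standard-library sketch of the index bookkeeping only; the function name and the fixed seed are assumptions, and the model training and scoring are left out.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=100, seed=0):
    """Yield (train_indices, test_indices) for repeated randomized k-fold CV.

    Each repetition reshuffles the sample indices, so the folds differ
    between repetitions. A total of repeats * k splits is produced.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Striding assigns samples to k folds whose sizes differ by at most 1.
        folds = [indices[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test
```

A typical use is to fit the classifier on `train`, score it on `test`, and report the mean and spread of the scores across all splits.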


Subject(s)
Position-Specific Scoring Matrices; Support Vector Machine; Type III Secretion Systems/classification; Type III Secretion Systems/genetics; Algorithms; Amino Acid Sequence; Computational Biology; Databases, Protein; Machine Learning
20.
Biomed Res Int ; 2020: 7297631, 2020.
Article in English | MEDLINE | ID: mdl-32352006

ABSTRACT

DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, the identification of DBPs by using wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely based on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden Markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, the four base classifiers are trained with the features of HMM-based compositions. In the second stage, the prediction probabilities of these base classifiers are used as inputs to the meta-classifier to perform the final prediction of DBPs. Based on the PDB1075 benchmark dataset, we conduct a jackknife cross validation with the proposed PredDBP-Stack predictor and obtain a balanced sensitivity and specificity of 92.47% and 92.36%, respectively. This outcome outperforms most of the existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that PredDBP-Stack is an effective classifier for accurately identifying DBPs based on protein sequence information alone.
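
The sensitivity and specificity quoted in the abstract above are computed from the confusion-matrix counts of a binary classifier. The helper below shows the standard definitions; the function name is hypothetical and the counts in the usage line are toy values, not the paper's results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts.

    tp: DBPs predicted as DBPs        fn: DBPs predicted as non-DBPs
    tn: non-DBPs predicted as non-DBPs fp: non-DBPs predicted as DBPs
    """
    sensitivity = tp / (tp + fn)   # fraction of true positives recovered
    specificity = tn / (tn + fp)   # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Toy counts only.
sens, spec, acc = classification_metrics(tp=9, fn=1, tn=8, fp=2)
```

When the two values are close to each other, as reported for PredDBP-Stack, the classifier is not favoring one class at the expense of the other.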


Subject(s)
DNA-Binding Proteins/chemistry; Databases, Protein; Software; Support Vector Machine