Results 1 - 20 of 43
1.
Int J Mol Sci ; 25(8), 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38674091

ABSTRACT

Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods. These alternatives support drug discovery by creating advanced predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned evolutionary scale modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we made a careful comparison to examine the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on the generative pre-trained transformers 2 (GPT-2) with modifications. To our knowledge, this is the first time a large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model.


Subjects
Proteins , Proteins/metabolism , Computational Biology/methods , Drug Discovery/methods , Position-Specific Scoring Matrices , Databases, Protein , Humans , Algorithms
2.
Math Biosci Eng ; 21(1): 1472-1488, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38303473

ABSTRACT

Non-classical secreted proteins (NCSPs) refer to a group of proteins that are located in the extracellular environment despite the absence of signal peptides and motifs. They usually play different roles in intercellular communication. Therefore, the accurate prediction of NCSPs is a critical step to understanding in depth their associated secretion mechanisms. Since the experimental recognition of NCSPs is often costly and time-consuming, computational methods are desired. In this study, we proposed an ensemble learning framework, termed NCSP-PLM, for the identification of NCSPs by extracting feature embeddings from pre-trained protein language models (PLMs) as input to several fine-tuned deep learning models. First, we compared the performance of nine PLM embeddings by training three neural networks: Multi-layer perceptron (MLP), attention mechanism and bidirectional long short-term memory network (BiLSTM) and selected the best network model for each PLM embedding. Then, four models were excluded due to their below-average accuracies, and the remaining five models were integrated to perform the prediction of NCSPs based on the weighted voting. Finally, the 5-fold cross validation and the independent test were conducted to evaluate the performance of NCSP-PLM on the benchmark datasets. Based on the same independent dataset, the sensitivity and specificity of NCSP-PLM were 91.18% and 97.06%, respectively. Particularly, the overall accuracy of our model achieved 94.12%, which was 7~16% higher than that of the existing state-of-the-art predictors. It indicated that NCSP-PLM could serve as a useful tool for the annotation of NCSPs.
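
The weighted-voting step and the reported metrics can be illustrated with a minimal sketch. The model weights, the probabilities, and the 34/34 split of the test set are assumptions chosen for illustration (the split is simply one confusion matrix that reproduces the reported percentages); none of them is taken from the paper.

```python
def weighted_vote(probs, weights, threshold=0.5):
    """Combine per-model positive-class probabilities by a weighted average."""
    if len(probs) != len(weights):
        raise ValueError("one weight per model is required")
    score = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
    return score, score >= threshold


def metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Five hypothetical base models; the weights are illustrative only
# (for example, validation accuracies could play this role).
score, is_ncsp = weighted_vote([0.9, 0.7, 0.4, 0.8, 0.6],
                               [0.93, 0.91, 0.88, 0.92, 0.90])

# One confusion matrix consistent with the reported percentages,
# assuming 34 positive and 34 negative test samples.
sn, sp, acc = metrics(tp=31, fn=3, tn=33, fp=1)
print(f"{sn:.2%} {sp:.2%} {acc:.2%}")
```

When the test set is balanced, accuracy is the mean of sensitivity and specificity, which is why 91.18% and 97.06% are compatible with an overall accuracy of 94.12%.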


Subjects
Deep Learning , Neural Networks, Computer , Proteins , Language , Sensitivity and Specificity
3.
Molecules ; 29(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38276629

ABSTRACT

Lysine-specific demethylase 1 (LSD1/KDM1A) has emerged as a promising therapeutic target for treating various cancers (such as breast cancer, liver cancer, etc.) and other diseases (blood diseases, cardiovascular diseases, etc.), owing to its observed overexpression, thereby presenting significant opportunities in drug development. Since its discovery in 2004, extensive research has been conducted on LSD1 inhibitors, with notable contributions from computational approaches. This review systematically summarizes LSD1 inhibitors investigated through computer-aided drug design (CADD) technologies since 2010, showcasing a diverse range of chemical scaffolds, including phenelzine derivatives, tranylcypromine (abbreviated as TCP or 2-PCPA) derivatives, nitrogen-containing heterocyclic (pyridine, pyrimidine, azole, thieno[3,2-b]pyrrole, indole, quinoline and benzoxazole) derivatives, natural products (including sanguinarine, phenolic compounds and resveratrol derivatives, flavonoids and other natural products) and others (including thiourea compounds, Fenoldopam and Raloxifene, (4-cyanophenyl)glycine derivatives, propargylamine and benzohydrazide derivatives and inhibitors discovered through AI techniques). Computational techniques, such as virtual screening, molecular docking and 3D-QSAR models, have played a pivotal role in elucidating the interactions between these inhibitors and LSD1. Moreover, the integration of cutting-edge technologies such as artificial intelligence holds promise in facilitating the discovery of novel LSD1 inhibitors. The comprehensive insights presented in this review aim to provide valuable information for advancing further research on LSD1 inhibitors.


Subjects
Biological Products , Enzyme Inhibitors , Enzyme Inhibitors/pharmacology , Enzyme Inhibitors/chemistry , Lysine , Molecular Docking Simulation , Artificial Intelligence , Drug Design , Histone Demethylases/metabolism , Structure-Activity Relationship
4.
J Phys Chem B ; 127(22): 4989-4997, 2023 06 08.
Article in English | MEDLINE | ID: mdl-37243666

ABSTRACT

CRISPR (clustered regularly interspaced short palindromic repeats)/CRISPR-associated protein 9 (Cas9) has been widely used for gene editing. However, not all guide RNAs cleave the DNA efficiently, which remains a major challenge to CRISPR/Cas9-mediated genome engineering. Therefore, understanding how the Cas9 complex successfully and efficiently identifies specific functional targets through base-pairing has great implications for such applications. The 10-nt seed sequence at the 3' end of the guide RNA is critical to target recognition and cleavage. Here, through stretching molecular dynamics simulation, we studied the thermodynamics and kinetics of the binding-dissociation process of the seed base and the target DNA base with the Cas9 protein. The results showed that in the presence of Cas9 protein, the enthalpy change and entropy change in binding-dissociation of the seed base with the target are smaller than those without the Cas9 protein. The reduction of the entropy penalty upon association with the protein resulted from the pre-organization of the seed base in an A-form helix, and the reduction of the enthalpy change was due to the electrostatic attraction between the positively charged channel and the negatively charged target DNA. The binding barrier coming from the entropy loss and the dissociation barrier resulting from the destruction of the base pair in the presence of Cas9 protein were lower than those without protein, which indicates that the seed region is crucial for efficiently searching for the correct target by accelerating the binding rate and dissociating quickly from the wrong target.


Subjects
CRISPR-Associated Protein 9 , CRISPR-Cas Systems , CRISPR-Associated Protein 9/genetics , CRISPR-Associated Protein 9/metabolism , Base Pairing , Gene Editing/methods , DNA/chemistry
5.
Phys Rev E ; 107(2-1): 024404, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36932572

ABSTRACT

Mechanical force has been widely used to study RNA folding and unfolding. Understanding how the force affects the opening and closing of a single base pair, which is a basic step for RNA folding and unfolding and a fundamental behavior in some important biological activities, is crucial to understanding the mechanism of RNA folding and unfolding under mechanical force. In this work, we investigated the opening and closing process of an RNA base pair under mechanical force with constant-force stretching molecular dynamics simulations. It was found that high mechanical force results in overstretching, and the open state is a high-energy state. The enthalpy and entropy change of the base-pair opening-closing transition were obtained and the results at low forces were in good agreement with the nearest-neighbor model. The temperature and force dependence of the opening and closing rates were also obtained. The position of the transition state for the base-pair opening-closing transition under mechanical force was determined. The free energy barrier of opening a base pair without force is the enthalpy increase, and the work done by the force from the closed state to the transition state decreases the barrier and increases the opening rate. The free energy barrier of closing the base pair without force results from the entropy loss, and the work done by the force from the open state to the transition state increases the barrier and decreases the closing rate. The transition rates are strongly dependent on the temperature and force, while the transition path times are weakly dependent on force and temperature.
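
The statement that the work done by the force lowers the opening barrier and raises the closing barrier can be sketched with a Bell-type rate expression. This is a standard approximation assumed here for illustration, not necessarily the analysis used in the paper; the zero-force rates and transition-state distances are hypothetical.

```python
import math

KB = 0.0138065  # Boltzmann constant in pN*nm/K


def rate_under_force(k0, force_pn, dx_nm, temperature_k=300.0):
    """Bell-type rate: k(F) = k0 * exp(F * dx / (kB * T)).

    dx_nm is the signed distance from the starting state to the transition
    state along the pulling direction: positive for opening (the force does
    positive work and lowers the barrier), negative for closing (the force
    opposes the motion and raises the barrier).
    """
    return k0 * math.exp(force_pn * dx_nm / (KB * temperature_k))


# Hypothetical zero-force rates (arbitrary units) and distances (nm).
k_open = rate_under_force(k0=1.0, force_pn=10.0, dx_nm=0.1)
k_close = rate_under_force(k0=1.0, force_pn=10.0, dx_nm=-0.3)
print(k_open, k_close)
```

In this picture the opening rate grows and the closing rate shrinks exponentially with force, which matches the qualitative trend described in the abstract.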


Subjects
Molecular Dynamics Simulation , RNA , Base Pairing , Thermodynamics , Mechanical Phenomena , Kinetics
6.
Molecules ; 28(5), 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36903531

ABSTRACT

The subcellular localization of messenger RNA (mRNA) precisely controls where protein products are synthesized and where they function. However, obtaining an mRNA's subcellular localization through wet-lab experiments is time-consuming and expensive, and many existing mRNA subcellular localization prediction algorithms need to be improved. In this study, a deep neural network-based eukaryotic mRNA subcellular location prediction method, DeepmRNALoc, was proposed, utilizing a two-stage feature extraction strategy that featured bimodal information splitting and fusing for the first stage and a VGGNet-like CNN module for the second stage. The five-fold cross-validation accuracies of DeepmRNALoc in the cytoplasm, endoplasmic reticulum, extracellular region, mitochondria, and nucleus were 0.895, 0.594, 0.308, 0.944, and 0.865, respectively, demonstrating that it outperforms existing models and techniques.


Subjects
Deep Learning , Eukaryota , Eukaryota/metabolism , Proteins/metabolism , Endoplasmic Reticulum/metabolism , RNA, Messenger , Computational Biology/methods
7.
Molecules ; 27(23), 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36500451

ABSTRACT

Lysine-specific demethylase 1 (LSD1) is a histone-modifying enzyme, which is a significant target for anticancer drug research. In this work, 40 reported tetrahydroquinoline-derivative inhibitors targeting LSD1 were studied to establish the three-dimensional quantitative structure-activity relationship (3D-QSAR). The established models CoMFA (Comparative Molecular Field Analysis; q² = 0.778, R²pred = 0.709) and CoMSIA (Comparative Molecular Similarity Index Analysis; q² = 0.764, R²pred = 0.713) yielded good statistical and predictive properties. Based on the corresponding contour maps, seven novel tetrahydroquinoline derivatives were designed. For further insight, three of the compounds (D1, D4, and Z17) and the template molecule 18x were explored with molecular dynamics simulations, binding free energy calculations by the MM/PBSA method, and ADME (absorption, distribution, metabolism, and excretion) prediction. The results suggested that D1, D4, and Z17 performed better than the template molecule 18x owing to the introduction of amino and hydrophobic groups, especially D1 and D4, which will provide guidance for the design of LSD1 inhibitors.


Subjects
Antineoplastic Agents , Quantitative Structure-Activity Relationship , Molecular Docking Simulation , Molecular Dynamics Simulation , Hydrophobic and Hydrophilic Interactions , Antineoplastic Agents/pharmacology , Drug Design
8.
BMC Biol ; 20(1): 231, 2022 10 13.
Article in English | MEDLINE | ID: mdl-36224580

ABSTRACT

BACKGROUND: Antarctica harbors the bulk of the species diversity of the dominant teleost fish suborder-Notothenioidei. However, the forces that shape their evolution are still under debate. RESULTS: We sequenced the genome of an icefish, Chionodraco hamatus, and used population genomics and demographic modelling of sequenced genomes of 52 C. hamatus individuals collected mainly from two East Antarctic regions to investigate the factors driving speciation. Results revealed four icefish populations with clear reproduction separation were established 15 to 50 kya (kilo years ago) during the last glacial maxima (LGM). Selection sweeps in genes involving immune responses, cardiovascular development, and photoperception occurred differentially among the populations and were correlated with population-specific microbial communities and acquisition of distinct morphological features in the icefish taxa. Population and species-specific antifreeze glycoprotein gene expansion and glacial cycle-paced duplication/degeneration of the zona pellucida protein gene families indicated fluctuating thermal environments and periodic influence of glacial cycles on notothenioid divergence. CONCLUSIONS: We revealed a series of genomic evidence indicating differential adaptation of C. hamatus populations and notothenioid species divergence in the extreme and unique marine environment. We conclude that geographic separation and adaptation to heterogeneous pathogen, oxygen, and light conditions of local habitats, periodically shaped by the glacial cycles, were the key drivers propelling species diversity in Antarctica.


Subjects
Ice Cover , Perciformes , Animals , Antarctic Regions , Fishes/genetics , Genome , Metagenomics , Oxygen , Phylogeny
9.
Molecules ; 26(24), 2021 Dec 07.
Article in English | MEDLINE | ID: mdl-34946497

ABSTRACT

An important cause of cancer proliferation is the change in DNA methylation patterns, characterized by the localized hypermethylation of the promoters of tumor-suppressor genes together with an overall decrease in the level of 5-methylcytosine (5mC). Therefore, identifying the 5mC sites in the promoters is a critical step towards further understanding the diverse functions of DNA methylation in genetic diseases such as cancers and in aging. However, most wet-lab experimental techniques for detecting 5mC sites are time-consuming and laborious. In this study, we proposed a deep learning-based approach, called BiLSTM-5mC, for accurately identifying 5mC sites in genome-wide DNA promoters. First, we randomly divided the negative samples into 11 subsets of equal size, each of which forms a balanced subset when combined with the same number of positive samples. Then, two types of feature vectors, encoded by the one-hot method and by the nucleotide property and frequency (NPF) method, were fed into a bidirectional long short-term memory (BiLSTM) network and a fully connected layer to train the 22 submodels. Finally, the outputs of these models were integrated to predict 5mC sites by using the majority vote strategy. Our experimental results demonstrated that BiLSTM-5mC outperformed existing methods based on the same independent dataset.
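
The data preparation and voting steps can be sketched as follows. This is a minimal illustration that assumes standard one-hot encoding and a simple plurality vote; the function names are hypothetical, the BiLSTM submodels themselves are omitted, and the way the original method resolves a tie among its 22 submodels is not stated in the abstract.

```python
import random
from collections import Counter

ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
           "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}


def one_hot(seq):
    """Encode a DNA string as 4-bit tuples; unknown bases map to all zeros."""
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in seq.upper()]


def balanced_subsets(positives, negatives, n_subsets=11, seed=0):
    """Shuffle the negatives, cut them into n_subsets equal chunks, and pair
    each chunk with all positives. Leftover negatives are dropped."""
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_subsets
    return [(positives, shuffled[i * size:(i + 1) * size])
            for i in range(n_subsets)]


def majority_vote(labels):
    """Return the most common label; a tie goes to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Toy data: 10 positives and 110 negatives give 11 balanced subsets of 10 + 10.
subsets = balanced_subsets(list(range(10)), list(range(100, 210)))
print(len(subsets), one_hot("ACGT"), majority_vote([1] * 12 + [0] * 10))
```

Training one submodel per subset and per encoding (11 subsets times 2 encodings) gives the 22 submodels mentioned in the abstract.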


Subjects
5-Methylcytosine/analysis , Aging/metabolism , DNA/genetics , Deep Learning , Neoplasms/metabolism , 5-Methylcytosine/metabolism , Aging/genetics , DNA Methylation , Humans , Memory, Short-Term , Neoplasms/genetics , Promoter Regions, Genetic/genetics
10.
Comput Math Methods Med ; 2021: 5770981, 2021.
Article in English | MEDLINE | ID: mdl-34413898

ABSTRACT

Antioxidant proteins (AOPs) play important roles in the management and prevention of several human diseases due to their ability to neutralize excess free radicals. However, the identification of AOPs by using wet-lab experimental techniques is often time-consuming and expensive. In this study, we proposed an accurate computational model, called AOP-HMM, to predict AOPs by extracting discriminatory evolutionary features from hidden Markov model (HMM) profiles. First, auto cross-covariance (ACC) variables were applied to transform the HMM profiles into fixed-length feature vectors. Then, we performed the analysis of variance (ANOVA) method to reduce the dimensionality of the raw feature space. Finally, a support vector machine (SVM) classifier was adopted to conduct the prediction of AOPs. To comprehensively evaluate the performance of the proposed AOP-HMM model, the 10-fold cross-validation (CV), the jackknife CV, and the independent test were carried out on two widely used benchmark datasets. The experimental results demonstrated that AOP-HMM outperformed most of the existing methods and could be used to quickly annotate AOPs and guide the experimental process.
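
The auto cross-covariance (ACC) step turns a profile whose length depends on the protein into a vector of fixed length. The sketch below follows the commonly used ACC definition and is an assumption about the details; the function name and the toy profile are hypothetical, and a real HMM profile would have 20 columns.

```python
def acc_features(profile, max_lag):
    """Auto cross-covariance transform of an L x N profile (list of rows).

    Returns N*max_lag auto-covariance terms followed by N*(N-1)*max_lag
    cross-covariance terms, so the output length does not depend on L.
    """
    length, n_cols = len(profile), len(profile[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    means = [sum(row[c] for row in profile) / length for c in range(n_cols)]

    def cov(c1, c2, lag):
        # Covariance between column c1 at position j and column c2 at j + lag.
        total = sum(
            (profile[j][c1] - means[c1]) * (profile[j + lag][c2] - means[c2])
            for j in range(length - lag)
        )
        return total / (length - lag)

    lags = range(1, max_lag + 1)
    auto = [cov(c, c, lag) for c in range(n_cols) for lag in lags]
    cross = [
        cov(c1, c2, lag)
        for c1 in range(n_cols) for c2 in range(n_cols) if c1 != c2
        for lag in lags
    ]
    return auto + cross


# Toy 3-column profile; sequences of different length give equal-length vectors.
toy = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(len(acc_features(toy, 2)), len(acc_features(toy[:4], 2)))
```

For a 20-column profile and a maximum lag LG, this yields 20·LG auto-covariance and 380·LG cross-covariance terms, that is, 400·LG features before any ANOVA-based selection.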


Subjects
Antioxidants/chemistry , Machine Learning , Peroxiredoxins/chemistry , Proteins/chemistry , Algorithms , Amino Acids/analysis , Antioxidants/classification , Computational Biology , Databases, Protein/statistics & numerical data , Evolution, Molecular , Humans , Markov Chains , Peroxiredoxins/classification , Proteins/classification
11.
Phys Rev E ; 103(4-1): 042409, 2021 Apr.
Article in English | MEDLINE | ID: mdl-34005973

ABSTRACT

Double stranded DNA can adopt different forms, the so-called A-, B-, and Z-DNA, which play different biological roles. In this work, the thermodynamic and the kinetic parameters for the base-pair closing and opening in A-DNA and B-DNA were calculated by all-atom molecular dynamics simulations at different temperatures. The thermodynamic parameters of the base pair in B-DNA were in good agreement with the experimental results. The free energy barrier of breaking a single base stack results from the enthalpy increase ΔH caused by the disruption of hydrogen bonding and base-stacking interactions, as well as water and base interactions. The free energy barrier of base pair closing comes from the unfavorable entropy loss ΔS caused by the restriction of torsional angles and hydration. It was found that the enthalpy change ΔH and the entropy change ΔS for the base pair in A-DNA are much larger than those in B-DNA, and the transition rates between the opening and the closing state for the base pair in A-DNA are much slower than those in B-DNA. The large difference of the enthalpy and entropy change for forming the base pair in A-DNA and B-DNA results from different hydration in A-DNA and B-DNA. The hydration pattern observed around DNA is an accompanying process for forming the base pair, rather than a follow-up of the conformation.


Subjects
DNA, A-Form , DNA, B-Form , Base Pairing , Molecular Dynamics Simulation , Thermodynamics
12.
Molecules ; 26(9), 2021 Apr 24.
Article in English | MEDLINE | ID: mdl-33923273

ABSTRACT

Many Gram-negative bacteria use type IV secretion systems to deliver effector molecules to a wide range of target cells. These substrate proteins, which are called type IV secreted effectors (T4SEs), manipulate host cell processes during infection, often resulting in severe diseases or even death of the host. Therefore, identification of putative T4SEs has become a very active research topic in bioinformatics due to its vital role in understanding host-pathogen interactions. PSI-BLAST profiles have been experimentally validated to provide important and discriminatory evolutionary information for various protein classification tasks. In the present study, an accurate computational predictor termed iT4SE-EP was developed for identifying T4SEs by extracting evolutionary features from the position-specific scoring matrix and the position-specific frequency matrix profiles. First, four types of encoding strategies were designed to transform protein sequences into fixed-length feature vectors based on the two profiles. Then, the feature selection technique based on the random forest algorithm was utilized to reduce redundant or irrelevant features without much loss of information. Finally, the optimal features were input into a support vector machine classifier to carry out the prediction of T4SEs. Our experimental results demonstrated that iT4SE-EP outperformed most existing methods on the independent dataset test.


Subjects
Evolution, Molecular , Gram-Negative Bacteria/genetics , Host-Pathogen Interactions/genetics , Type IV Secretion Systems/genetics , Amino Acid Sequence/genetics , Bacterial Infections/drug therapy , Bacterial Infections/genetics , Bacterial Infections/microbiology , Computational Biology , Gram-Negative Bacteria/pathogenicity , Humans , Type IV Secretion Systems/chemistry
13.
Comput Math Methods Med ; 2021: 6690299, 2021.
Article in English | MEDLINE | ID: mdl-33505516

ABSTRACT

Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in the field of bioinformatics due to its crucial role in understanding host-pathogen interaction and developing better therapeutic targets against the pathogens. However, the recognition of all effector proteins by using traditional experimental approaches is often time-consuming and laborious. Therefore, development of computational methods to accurately predict putative novel effectors is important in reducing the number of biological experiments for validation. In this study, we proposed a method, called iT3SE-PX, to identify T3SEs solely based on protein sequences. First, three kinds of features were extracted from the position-specific scoring matrix (PSSM) profiles to help train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm was performed to rank these features based on their classification ability. Finally, the optimal features were selected as inputs to a support vector machine (SVM) classifier to predict T3SEs. Based on the two benchmark datasets, we conducted a 100-time randomized 5-fold cross validation (CV) and an independent test, respectively. The experimental results demonstrated that the proposed method achieved superior performance compared to most of the existing methods and could serve as a useful tool for identifying putative T3SEs, given only the sequence information.
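
A 100-time randomized 5-fold cross validation reshuffles the data before each of the 100 repetitions and then rotates through the five folds. The index generator below is a minimal sketch of that protocol; the function name and the seeding scheme are assumptions, and model training is left out.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated shuffled k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repetition
        folds = [order[f::k] for f in range(k)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train_idx, test_idx


# 100 repetitions of 5 folds give 500 train/test splits in total.
n_splits = sum(1 for _ in repeated_kfold(23))
print(n_splits)
```

Averaging a metric over all splits reduces the dependence of the estimate on any single random partition.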


Subjects
Position-Specific Scoring Matrices , Support Vector Machine , Type III Secretion Systems/classification , Type III Secretion Systems/genetics , Algorithms , Amino Acid Sequence , Computational Biology , Databases, Protein , Machine Learning
14.
Biomed Res Int ; 2020: 7297631, 2020.
Article in English | MEDLINE | ID: mdl-32352006

ABSTRACT

DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activities. However, the identification of DBPs by using wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely based on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model to identify DBPs, which involves two stages of learning. In the first stage, the four base classifiers are trained with the features of HMM-based compositions. In the second stage, the prediction probabilities of these base classifiers are used as inputs to the meta-classifier to perform the final prediction of DBPs. Based on the PDB1075 benchmark dataset, we conduct a jackknife cross validation with the proposed PredDBP-Stack predictor and obtain a balanced sensitivity and specificity of 92.47% and 92.36%, respectively. This outcome outperforms most of the existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that the PredDBP-Stack is an effective classifier for accurately identifying DBPs based on protein sequence information alone.
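
The jackknife (leave-one-out) cross validation mentioned above can be sketched as follows. The nearest-centroid classifier is only a stand-in so that the example runs on its own; it is not the stacked ensemble of the paper, and the toy feature vectors are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Toy classifier (a stand-in, not the paper's stacked model)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife(xs, ys, predict=nearest_centroid_predict):
    """Leave-one-out: each sample is predicted by a model built on all others.

    Returns (sensitivity, specificity) with label 1 as the positive class.
    """
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(predict(train_x, train_y, xs[i]))
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, ys))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, ys))
    return tp / ys.count(1), tn / ys.count(0)


# Hypothetical two-dimensional features for three negatives and three positives.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
ys = [0, 0, 0, 1, 1, 1]
print(jackknife(xs, ys))
```

Because every sample is tested exactly once and the procedure involves no random split, the jackknife gives a unique result for a given dataset, at the cost of training as many models as there are samples.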


Subjects
DNA-Binding Proteins/chemistry , Databases, Protein , Software , Support Vector Machine
15.
Comput Math Methods Med ; 2020: 1384749, 2020.
Article in English | MEDLINE | ID: mdl-32300371

ABSTRACT

Prediction of DNA-binding proteins (DBPs) has become a popular research topic in protein science due to its crucial role in all aspects of biological activities. Even though considerable efforts have been devoted to developing powerful computational methods to solve this problem, it is still a challenging task in the field of bioinformatics. A hidden Markov model (HMM) profile has been proved to provide important clues for improving the prediction performance of DBPs. In this paper, we propose a method, called HMMPred, which extracts the features of amino acid composition and auto- and cross-covariance transformation from the HMM profiles, to help train a machine learning model for identification of DBPs. Then, a feature selection technique is performed based on the extreme gradient boosting (XGBoost) algorithm. Finally, the selected optimal features are fed into a support vector machine (SVM) classifier to predict DBPs. The experimental results tested on two benchmark datasets show that the proposed method is superior to most of the existing methods and could serve as an alternative tool to identify DBPs.


Subjects
Algorithms , DNA-Binding Proteins/chemistry , Machine Learning , Amino Acid Sequence , Amino Acids/analysis , Computational Biology , DNA-Binding Proteins/genetics , Databases, Protein/statistics & numerical data , Humans , Markov Chains , ROC Curve , Support Vector Machine
16.
RNA ; 26(4): 470-480, 2020 04.
Article in English | MEDLINE | ID: mdl-31988191

ABSTRACT

Due to the polyanionic nature of RNAs, the structural folding of RNAs is sensitive to solution salt conditions, yet a deep understanding of the salt effect on the thermodynamics and kinetics of RNAs at the single base-pair level is still lacking. In this work, the thermodynamic and the kinetic parameters for the closing/opening of an AU base pair at different salt concentrations were calculated by 3-µs all-atom molecular dynamics (MD) simulations at different temperatures. It was found that for the base-pair formation, the enthalpy change ΔH is nearly independent of salt concentration, while the entropy change ΔS exhibits a linear dependence on the logarithm of salt concentration, verifying the empirical assumption based on thermodynamic experiments. Our analyses revealed that such salt concentration dependence of the entropy change mainly results from the dependence of the ion translational entropy change for the base pair closing/opening on salt concentration. Furthermore, the closing rate increases with increasing salt concentration, while the opening rate is nearly independent of salt concentration. Additionally, our analyses revealed that the free energy surface describing the base-pair opening and closing dynamics becomes more rugged as the salt concentration decreases.
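
The reported trends (an enthalpy change that is nearly independent of salt and an entropy change that is linear in the logarithm of the salt concentration) can be written compactly as below. The symbols c (salt concentration), c_0 (a reference concentration) and a (the fitted slope) are notation assumed here for illustration, not taken from the paper.

```latex
\Delta G(T, c) = \Delta H - T\,\Delta S(c),
\qquad
\Delta S(c) \approx \Delta S(c_0) + a \,\ln\frac{c}{c_0},
\qquad
\frac{\partial\, \Delta G}{\partial \ln c} \approx -\,a\,T
```

Under these assumptions, the whole salt dependence of the base-pair stability enters through the entropic term.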


Subjects
Molecular Dynamics Simulation , RNA/chemistry , Base Pairing , Osmolar Concentration , Sodium Chloride/chemistry
17.
Int J Mol Sci ; 20(9), 2019 May 11.
Article in English | MEDLINE | ID: mdl-31083553

ABSTRACT

To reveal the working pattern of programmed cell death, knowledge of the subcellular location of apoptosis proteins is essential. Besides the costly and time-consuming method of experimental determination, research into computational locating schemes, focusing mainly on the innovation of representation techniques on protein sequences and the selection of classification algorithms, has become popular in recent decades. In this study, a novel tri-gram encoding model is proposed, which is based on using the protein overlapping property matrix (POPM) for predicting apoptosis protein subcellular location. Next, a 1000-dimensional feature vector is built to represent a protein. Finally, with the help of support vector machine-recursive feature elimination (SVM-RFE), we select the optimal features and put them into a support vector machine (SVM) classifier for predictions. The results of jackknife tests on two benchmark datasets demonstrate that our proposed method can achieve satisfactory prediction performance level with less computing capacity required and could work as a promising tool to predict the subcellular locations of apoptosis proteins.
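
One way a tri-gram encoding can yield a 1000-dimensional vector is to describe each residue by 10 overlapping physicochemical properties and to count every ordered triple of properties over consecutive positions (10 × 10 × 10 = 1000). The sketch below assumes that construction; the property groups are illustrative placeholders and are not the table used in the paper.

```python
from itertools import product

# Illustrative, overlapping property groups (an assumption, not the paper's table).
GROUPS = [
    set("AVLIMFWC"),    # hydrophobic
    set("STNQYHKRDE"),  # polar
    set("AGSTCPDNV"),   # small
    set("AGS"),         # tiny
    set("ILV"),         # aliphatic
    set("FWYH"),        # aromatic
    set("KRH"),         # positive
    set("DE"),          # negative
    set("KRHDE"),       # charged
    set("P"),           # proline
]


def property_matrix(seq):
    """Rows are property groups, columns are residues; 1 marks membership."""
    return [[1 if aa in group else 0 for aa in seq] for group in GROUPS]


def trigram_features(seq):
    """For every ordered triple of groups (a, b, c), sum the products of the
    entries at three consecutive positions. The result has len(GROUPS)**3 items."""
    m = property_matrix(seq)
    k, n = len(m), len(seq)
    return [
        sum(m[a][i] * m[b][i + 1] * m[c][i + 2] for i in range(n - 2))
        for a, b, c in product(range(k), repeat=3)
    ]


features = trigram_features("MKVLAAGDE")
print(len(features))
```

Because a residue may belong to several groups, one window of three residues can contribute to many features at once, which is what distinguishes this encoding from a tri-gram over a reduced, non-overlapping alphabet.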


Subjects
Algorithms , Apoptosis Regulatory Proteins/metabolism , Apoptosis , Amino Acids/metabolism , Databases, Protein , Protein Transport , Support Vector Machine
18.
RNA ; 25(5): 620-629, 2019 05.
Article in English | MEDLINE | ID: mdl-30770397

ABSTRACT

The small interfering RNAs (siRNA) or microRNAs (miRNA) incorporated into the RNA-induced silencing complex with the Argonaute (Ago) protein associates with target mRNAs through base-pairing, which leads to the cleavage or knockdown of the target mRNA. The seed region of the s(m)iRNA is crucial for target recognition. In this work, a molecular dynamic simulation was utilized to study the thermodynamics and kinetic properties of the third seed base binding to the target in the presence of the PIWI/MID domain of Ago. The results showed that in the presence of the PIWI/MID domain, the entropy and enthalpy changes for the association of the seed base with the target were smaller than those in the absence of protein. The binding affinity was increased due to the reduced entropy penalty, which resulted from the preorganization of the seed base into the A-helix form. In the presence of the protein, the association barrier resulting from the unfavorable entropy loss and the dissociation barrier coming from the destruction of hydrogen bonding and base-stacking interactions were lower than those in the absence of the protein. These results indicate that the seed region is crucial for fast recognition and association with the correct target.


Subjects
Argonaute Proteins/chemistry , Eukaryotic Initiation Factors/chemistry , MicroRNAs/chemistry , Argonaute Proteins/genetics , Argonaute Proteins/metabolism , Binding Sites , Crystallography, X-Ray , Eukaryotic Initiation Factors/genetics , Eukaryotic Initiation Factors/metabolism , Humans , Hydrogen Bonding , Kinetics , MicroRNAs/genetics , MicroRNAs/metabolism , Molecular Dynamics Simulation , Nucleic Acid Conformation , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Thermodynamics
19.
RNA ; 24(9): 1229-1240, 2018 09.
Article in English | MEDLINE | ID: mdl-29954950

ABSTRACT

Hepatitis delta virus (HDV) ribozyme performs the self-cleavage activity through folding to a double pseudoknot structure. The folding of functional RNA structures is often coupled with the transcription process. In this work, we developed a new approach for predicting the cotranscriptional folding kinetics of RNA secondary structures with pseudoknots. We theoretically studied the cotranscriptional folding behavior of the 99-nucleotide (nt) HDV sequence, two upstream flanking sequences, and one downstream flanking sequence. During transcription, the 99-nt HDV can effectively avoid the trap intermediates and quickly fold to the cleavage-active state. It is different from its refolding kinetics, which folds into an intermediate trap state. For all the sequences, the ribozyme regions (from 1 to 73) all fold to the same structure during transcription. However, the existence of the 30-nt upstream flanking sequence can inhibit the ribozyme region folding into the active native state through forming an alternative helix Alt1 with the segments 70-90. The longer upstream flanking sequence of 54 nt itself forms a stable hairpin structure, which sequesters the formation of the Alt1 helix and leads to rapid formation of the cleavage-active structure. Although the 55-nt downstream flanking sequence could invade the already folded active structure during transcription by forming a more stable helix with the ribozyme region, the slow transition rate could keep the structure in the cleavage-active structure to perform the activity.


Subjects
Hepatitis Delta Virus/genetics , RNA, Catalytic/chemistry , RNA, Catalytic/genetics , Transcription, Genetic , Catalytic Domain , Hepatitis Delta Virus/chemistry , Kinetics , Models, Molecular , Nucleic Acid Conformation , RNA Folding , RNA, Viral/chemistry , RNA, Viral/genetics
20.
BMC Genomics ; 19(1): 315, 2018 May 02.
Article in English | MEDLINE | ID: mdl-29720106

ABSTRACT

BACKGROUND: Temperature adaptation of biological molecules is fundamental in evolutionary studies but remains unsolved. Fishes living in cold water are adapted to low temperatures through adaptive modification of their biological molecules, which enables their functioning in extreme cold. To study nucleotide and amino acid preference in cold-water fishes, we investigated the substitution asymmetry of codons and amino acids in protein-coding DNA sequences between cold-water fishes and tropical fishes. The former include two Antarctic fishes, Dissostichus mawsoni (Antarctic toothfish) and Gymnodraco acuticeps (Antarctic dragonfish), and two temperate fishes, Gadus morhua (Atlantic cod) and Gasterosteus aculeatus (stickleback); the latter include three tropical fishes, Danio rerio (zebrafish), Oreochromis niloticus (Nile tilapia) and Xiphophorus maculatus (platyfish). RESULTS: Cold-water fishes showed a preference for guanines and cytosines (GCs) in both synonymous and nonsynonymous codon substitutions when compared with tropical fishes. Amino acids coded by GC-rich codons are favored in the temperate fishes, while those coded by AT-rich codons are disfavored. Similar trends were discovered in Antarctic fishes but were statistically weaker. The preference for GC-rich codons in nonsynonymous substitutions tends to increase the ratio of small amino acids in proteins, which was demonstrated by biased small amino acid substitutions in the cold-water species when compared with the tropical species, especially in the temperate species. Prediction and comparison of the secondary structure of the proteomes showed that the frequency of random coils is significantly higher in the cold-water fish proteomes than in those of the tropical fishes. CONCLUSIONS: Our results suggested that natural selection in cold temperature might favor biased GC content in the coding DNA sequences, which leads to an increased frequency of small amino acids and consequently increased random coils in the proteomes of cold-water fishes.
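
GC content in a coding sequence, overall and at each codon position, is the basic quantity behind the comparison above. The sketch below is a minimal illustration with hypothetical function names; it does not reproduce the substitution-asymmetry analysis of the study, which compares aligned orthologous codons between species.

```python
def gc_content(seq):
    """Fraction of G or C among unambiguous bases; 0.0 for an empty sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def gc_by_codon_position(cds):
    """GC fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return tuple(gc_content(cds[pos:usable:3]) for pos in range(3))


# Toy in-frame sequence with two codons: ATG GCC.
print(gc_content("ATGGCC"), gc_by_codon_position("ATGGCC"))
```

Changes at the third codon position are mostly synonymous, so GC content at that position is often examined separately from the first two positions.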


Subjects
Cold Temperature , Fish Proteins/chemistry , Fish Proteins/genetics , Fishes/genetics , GC Rich Sequence , Amino Acid Sequence , Amino Acid Substitution , Animals , Protein Structure, Secondary/genetics , Sequence Alignment , Sequence Analysis, RNA