Results 1 - 20 of 404
1.
Appl Bionics Biomech ; 2022: 5483115, 2022.
Article in English | MEDLINE | ID: mdl-35465187

ABSTRACT

In genome annotation, the identification of DNA-binding proteins is one of the crucial challenges. DNA is considered the blueprint of the cell: it contains all the information necessary for building and maintaining the traits of an organism. It is DNA that makes a living thing a living thing. Protein interaction with DNA plays an essential role in regulating DNA functions such as DNA repair, transcription, and regulation. Identifying these proteins is therefore crucial for understanding gene regulation. Several methods have been developed to identify DNA-protein binding sites from structures and sequences, but they are costly and time-consuming. Therefore, we propose a methodology named "DNAPred_Prot", which uses various position- and frequency-dependent features from protein sequences for efficient and effective prediction of DNA-binding proteins. Under 10-fold cross-validation and jackknife testing, accuracies of 94.95% and 95.11% were obtained, respectively. The results of SVM and ANN were also compared with those of a random forest classifier. The robustness of the proposed model was evaluated on the independent dataset PDB186, on which it achieved an accuracy of 91.47%. These results suggest that the proposed methodology performs better than other existing methods for the identification of DNA-binding proteins.
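
The two validation schemes named in this abstract differ only in how the samples are partitioned. The following is a minimal sketch in Python, not the authors' code: the function names `k_fold_splits` and `jackknife_splits` are illustrative, and a real study would usually shuffle (and often stratify) the samples before splitting.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_splits(n_samples: int, k: int) -> Iterator[Split]:
    """Yield (train_indices, test_indices) for k contiguous folds.

    Assumption: samples are already in random order; no shuffling is done here.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def jackknife_splits(n_samples: int) -> Iterator[Split]:
    """Jackknife (leave-one-out) testing is k-fold with k equal to n_samples."""
    return k_fold_splits(n_samples, n_samples)
```

Each sample is tested exactly once in both schemes; the jackknife simply trains one model per sample, which is why it is far more expensive than 10-fold cross-validation on large datasets.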

2.
IEEE/ACM Trans Comput Biol Bioinform ; 18(5): 2045-2056, 2021.
Article in English | MEDLINE | ID: mdl-31985438

ABSTRACT

Glycosylation of proteins in eukaryotic cells is an important and complicated post-translational modification, owing to its pivotal role in, and association with, crucial physiological functions of most proteins. Identification of glycosylation sites in a polypeptide chain is not an easy task due to multiple impediments. Analytical identification of these sites is expensive and laborious. There is a pressing need for a reliable computational method for precise determination of such sites, which can save researchers time and effort. Herein, we propose a novel predictor, iGlycoS-PseAAC, which integrates Chou's Pseudo Amino Acid Composition (PseAAC) with relative/absolute position-based features. In self-consistency testing on the benchmark dataset, the model predicts O-linked glycosylation at serine sites with an accuracy of 98.8 percent. The overall accuracy achieved through 10-fold cross-validation, combining the positive and negative results, is 97.2 percent. The overall accuracy achieved through the jackknife test, aggregating all the prediction results, is 96.195 percent. Thus, the proposed predictor can help predict O-linked glycosylated serine sites efficiently and accurately. The overall results show that the accuracy of iGlycoS-PseAAC is higher than that of the existing tools.


Subjects
Computational Biology/methods , Glycoproteins , Serine , Algorithms , Glycoproteins/chemistry , Glycoproteins/metabolism , Glycosylation , Protein Processing, Post-Translational/physiology , Serine/chemistry , Serine/metabolism
3.
Article in English | MEDLINE | ID: mdl-31144645

ABSTRACT

Protein phosphorylation is one of the key mechanisms in prokaryotes and eukaryotes and is responsible for various biological functions such as protein degradation, intracellular localization, a multitude of cellular processes, molecular association, cytoskeletal dynamics, and enzymatic inhibition/activation. Phosphohistidine (PhosH) has a key role in a number of biological processes, from central metabolism to signalling, in eukaryotes and bacteria. Thus, identification of phosphohistidine sites in a protein sequence is crucial, and experimental identification can be expensive, time-consuming, and laborious. To address this problem, we propose a novel computational model, iPhosH-PseAAC, for prediction of phosphohistidine sites in a given protein sequence using pseudo amino acid composition (PseAAC), statistical moments, and position-relative features. The results of the proposed predictor are validated through self-consistency testing, 10-fold cross-validation, and jackknife testing. Self-consistency validation gave 100 percent accuracy, whereas cross-validation gave 94.26 percent accuracy. Moreover, jackknife testing gave 97.07 percent accuracy for the proposed model. Thus, the proposed model iPhosH-PseAAC shows a strong ability to predict PhosH sites in given proteins.


Subjects
Computational Biology/methods , Histidine/analogs & derivatives , Neural Networks, Computer , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acid Sequence , Histidine/chemistry , Models, Statistical , Phosphorylation
4.
Curr Genomics ; 21(7): 536-545, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33214770

ABSTRACT

INTRODUCTION: Hydroxylation is one of the most important post-translational modifications (PTMs) in cellular functions and is linked to various diseases. The addition of a hydroxyl group (OH) to a lysine residue produces hydroxylysine. METHODS: The method used in this study identifies hydroxylysine sites using a mathematical and statistical methodology that incorporates the sequence-order effect and the composition of each object within protein sequences. This predictor is called "iHyd-LysSite (EPSV)" (identifying hydroxylysine sites by extracting enhanced position and sequence variant technique). The identification of hydroxylysine sites by experimental methods is difficult, laborious and highly expensive. In silico techniques are an alternative approach to identifying hydroxylysine sites in proteins. RESULTS: A useful predictive model should have high sensitivity and specificity values and be more accurate than earlier ones. Self-consistency, independent, 10-fold cross-validation and jackknife tests were performed for validation purposes. These tests were run using three well-known classifiers: Neural Networks (NN), Random Forest (RF) and Support Vector Machine (SVM). The overall predictive outcomes are superior to the results obtained by previous predictors. The proposed model yielded an excellent prediction rate with the NN, RF, and SVM classifiers. For the jackknife test with these classifiers, the sensitivity results are 96.08%, 94.99% and 98.16%, and the specificity results are 97.52%, 98.52% and 80.95%. CONCLUSION: The results obtained by the proposed tool show that this method may meet the future demand for hydroxylysine site identification with a better prediction rate than the existing methods.

7.
Mol Genet Genomics ; 295(2): 261-274, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31894399

ABSTRACT

Facing the explosive growth of biological sequences unearthed in the post-genomic age, one of the most important but also most difficult problems in computational biology is how to express a biological sequence as a discrete model or a vector while still retaining considerable sequence-order information or its special pattern. To deal with such a challenging problem, the ideas of "pseudo amino acid components" and "pseudo K-tuple nucleotide composition" have been proposed. These ideas and their approaches have further stimulated the birth of the "distorted key theory" and the "wenxing diagram", substantially strengthened the power to treat multi-label systems, and led to the establishment of the famous "5-steps rule". All these logical developments are quite natural and are very useful not only for theoretical scientists but also for experimental scientists conducting genetics/genomics analysis and drug development. This review paper also presents their future perspectives; i.e., their impacts will become even more significant and profound.


Subjects
Drug Development/trends , Genome, Human/genetics , Genomics/trends , Algorithms , Computational Biology/trends , Humans , Software
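
The idea of expressing a sequence as a fixed-length vector that still carries sequence-order information can be illustrated with a simplified, single-property variant of pseudo amino acid composition. This is a hedged sketch, not the formulation used in any of the listed papers: it assumes the Kyte-Doolittle hydropathy scale as a stand-in physicochemical property, and the function name `pseaac_sketch` and the defaults `lam=2` and `w=0.05` are illustrative choices.

```python
from statistics import mean, pstdev
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as an example property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Standardise the property to zero mean and unit standard deviation.
_values = list(KD.values())
_mu, _sd = mean(_values), pstdev(_values)
H = {aa: (v - _mu) / _sd for aa, v in KD.items()}


def pseaac_sketch(seq: str, lam: int = 2, w: float = 0.05) -> List[float]:
    """Return a (20 + lam)-dimensional vector for a protein sequence.

    The first 20 components reflect amino acid composition; the last `lam`
    components are sequence-order correlation factors at lags 1..lam.
    """
    seq = seq.upper()
    if any(aa not in H for aa in seq):
        raise ValueError("sequence contains a non-standard residue")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    thetas = []
    for lag in range(1, lam + 1):
        n_pairs = len(seq) - lag
        # Mean squared property difference between residues `lag` apart.
        total = sum((H[seq[i]] - H[seq[i + lag]]) ** 2 for i in range(n_pairs))
        thetas.append(total / n_pairs)

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Two sequences with identical composition but different residue order, such as `"AKAK"` and `"AAKK"`, produce different vectors under this scheme, which is the property a plain composition vector lacks.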
8.
Genomics ; 112(1): 837-847, 2020 01.
Article in English | MEDLINE | ID: mdl-31150762

ABSTRACT

BACKGROUND: Glioma is the most lethal nervous system cancer. Recent studies have made great efforts to study the occurrence and development of glioma, but the molecular mechanisms are still unclear. This study was designed to reveal the molecular mechanisms of glioma based on protein-protein interaction (PPI) networks combined with machine learning methods. Key differentially expressed genes (DEGs) were screened and selected by using the PPI networks. RESULTS: As a result, 19 genes were selected between grade I and grade II, 21 genes between grade II and grade III, and 20 genes between grade III and grade IV. Then, five machine learning methods were employed to predict glioma grades based on the selected key genes. After comparison, a Complement Naive Bayes classifier was employed to build the prediction model for grade II-III, with an accuracy of 72.8%, and Random Forest was employed to build the prediction models for grade I-II and grade III-IV, with accuracies of 97.1% and 83.2%, respectively. Finally, the selected genes were analyzed by PPI networks, Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and the results improve our understanding of the biological functions of the selected DEGs involved in glioma growth. We expect that the key genes identified have a guiding significance for the occurrence of gliomas or, at the very least, that they are useful for tumor researchers. CONCLUSION: Machine learning combined with PPI network, GO and KEGG analyses of selected DEGs improves our understanding of the biological functions involved in glioma growth.


Subjects
Brain Neoplasms/genetics , Brain Neoplasms/metabolism , Glioma/genetics , Glioma/metabolism , Machine Learning , Protein Interaction Mapping , Brain Neoplasms/diagnosis , Gene Expression , Gene Ontology , Glioma/diagnosis , Neoplasm Staging
9.
Brief Bioinform ; 21(3): 1047-1057, 2020 05 21.
Article in English | MEDLINE | ID: mdl-31067315

ABSTRACT

With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this to date; however, all these tools have their limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit, integrating the functionality of feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users that only want to upload their data set and select the functions they need calculated from it, while all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and four feature output formats are supported so as to facilitate direct output usage or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is made freely available via an online web server and a stand-alone toolkit.


Subjects
DNA/chemistry , Machine Learning , Proteins/chemistry , RNA/chemistry , Sequence Analysis/methods , Algorithms , Internet
10.
Protein Pept Lett ; 27(3): 178-186, 2020.
Article in English | MEDLINE | ID: mdl-31577193

ABSTRACT

BACKGROUND: N-glycosylation is one of the most important post-translational mechanisms in eukaryotes. N-glycosylation predominantly occurs in the N-X-[S/T] sequon, where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Therefore, accurate prediction of N-glycosylation sites is essential to understand the N-glycosylation mechanism. OBJECTIVE: Our motivation is to develop a computational method to predict N-glycosylation sites in eukaryotic protein sequences. METHODS: We report a random forest method, Nglyc, to predict N-glycosylation sites from protein sequence using 315 sequence features. The method was trained on a dataset of 600 N-glycosylation sites and 600 non-glycosylation sites and tested on a dataset containing 295 N-glycosylation sites and 253 non-glycosylation sites. Nglyc predictions were compared with the NetNGlyc, EnsembleGly and GPP methods. Further, the performance of Nglyc was evaluated using human and mouse N-glycosylation sites. RESULT: Nglyc achieved an overall training accuracy of 0.8033 with all 315 features. Performance comparison with the NetNGlyc, EnsembleGly and GPP methods shows that Nglyc performs better than the other methods, with high sensitivity and specificity. CONCLUSION: Our method achieved an overall accuracy of 0.8248 with 0.8305 sensitivity and 0.8182 specificity. The comparison study shows that our method performs better than the other methods. The applicability and success of our method were further evaluated using human and mouse N-glycosylation sites. Nglyc is freely available at https://github.com/bioinformaticsML/ Ngly.


Subjects
Computational Biology/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Animals , Databases, Protein , Glycosylation , Humans , Mice , Software
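
The sequon rule and the evaluation metrics quoted in this abstract can be made concrete with a short Python sketch. It is illustrative only and is not the Nglyc implementation: `find_sequons` lists candidate sites (which, as the abstract notes, are not all glycosylated), and `confusion_metrics` uses the standard definitions. The confusion counts in the usage note (245 true positives, 207 true negatives) are an assumption back-calculated from the reported rates and test-set sizes; they do not appear in the abstract.

```python
import re
from typing import Dict, List

# N, then any residue except proline, then S or T.
# The lookahead keeps the match zero-width so overlapping sequons are found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> List[int]:
    """Return 0-based positions of the asparagine in each N-X-[S/T] sequon."""
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }
```

For example, `find_sequons("MNGSANPTNKT")` returns `[1, 8]`: the `NPT` triplet at position 5 is excluded because X is proline. With the assumed counts on the 295 positive and 253 negative test sites, `confusion_metrics(245, 50, 207, 46)` gives values that round to 0.8305, 0.8182 and 0.8248, matching the reported sensitivity, specificity and accuracy.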
11.
Anal Biochem ; 588: 113477, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31654612

ABSTRACT

Proteases are enzymes that perform proteolysis. Proteolysis normally refers to protein and peptide degradation, which is crucial for the survival, growth and wellbeing of a cell. Moreover, proteases have a strong association with therapeutics and drug development. Proteases are classified into five different types according to their nature and physicochemical characteristics. Most methods used to differentiate proteases from other proteins and identify their class require a clinical test, which is usually time-consuming and operator dependent. Herein, we report a classifier named iProtease-PseAAC (2L) for identifying proteases and their classes. The predictor is developed following the 5-step rule, starting from the collection of a benchmark dataset and ending with the development of the predictor. Rigorous verification and validation tests are performed and metrics are collected to assess the reliability of the trained model. Self-consistency validation gives 98.32% accuracy, cross-validation gives 90.71% accuracy and the jackknife test gives 96.07% accuracy. The average accuracy for level 2, i.e. protease classification, is 95.77%. Based on these results, it is concluded that iProtease-PseAAC (2L) has a strong ability to identify proteases and their classes from a given protein sequence.


Subjects
Algorithms , Computational Biology/methods , Peptide Hydrolases/classification , Proteins/classification , Software , Databases, Protein
12.
Curr Pharm Des ; 25(40): 4223-4234, 2019.
Article in English | MEDLINE | ID: mdl-31782354

ABSTRACT

OBJECTIVE: One of the most challenging problems is how to formulate a biological sequence as a vector while still retaining considerable sequence-order information. METHODS: To address this problem, the approach of Pseudo Amino Acid Components, or PseAAC, has been developed. RESULTS AND CONCLUSION: It has become increasingly clear from this 10-year recollection that the aforementioned proposal has indeed been very powerful.


Subjects
Algorithms , Computational Biology , Sequence Analysis, Protein/methods , Amino Acids , Protein Conformation
13.
Curr Top Med Chem ; 19(25): 2283-2300, 2019.
Article in English | MEDLINE | ID: mdl-31648642

ABSTRACT

Stimulated by the 5-steps rule during the last decade or so, computational proteomics has achieved remarkable progress in the following three areas: (1) protein structural class prediction; (2) protein subcellular location prediction; (3) post-translational modification (PTM) site prediction. The results obtained by these predictions are very useful not only for an in-depth study of the functions of proteins and their biological processes in a cell, but also for developing novel drugs against major diseases such as cancers, Alzheimer's, and Parkinson's. Moreover, since the targets to be predicted may have the multi-label feature, two sets of metrics are introduced: one for inspecting the global prediction quality, the other for the local prediction quality. All the predictors covered in this review have a user-friendly web-server, through which the majority of experimental scientists can easily obtain their desired data without the need to go through the complicated mathematics.


Subjects
Amino Acids/analysis , Computational Biology , Proteome , Proteomics , Amino Acids/metabolism , Animals , Humans , Protein Processing, Post-Translational , Software
14.
Genomics ; 2019 Aug 30.
Article in English | MEDLINE | ID: mdl-31476433

ABSTRACT

This article has been withdrawn at the request of the Editor-in-Chief. After a thorough investigation, the Editor has concluded that the acceptance of this article was partly based upon the positive advice of two illegitimate reviewer reports. The reports were submitted from email accounts which were provided to the journal as suggested reviewers during the submission of the article. Although purportedly real reviewer accounts, the Editor has concluded that these were not of appropriate, independent reviewers. Also, the article duplicates significant parts of a paper that had already appeared in Current Medicinal Chemistry 26 (2019) 4918-4943 https://doi.org/10.2174/0929867326666190507082559. This represents a clear violation of the fundamentals of peer review, our publishing policies, and publishing ethics standards. Apologies are offered to the readers of the journal that this was not detected during the submission process. The full Elsevier Policy on Article Withdrawal can be found at https://www.elsevier.com/about/our-business/policies/article-withdrawal.

15.
Curr Genomics ; 20(2): 124-133, 2019 Feb.
Article in English | MEDLINE | ID: mdl-31555063

ABSTRACT

BACKGROUND: Post-translational modifications (PTMs) are critically important in various biological processes and cell functions. Hydroxylation of proline residues is one kind of PTM, which occurs following protein synthesis. The experimental determination of hydroxyproline sites in an uncharacterized protein sequence requires extensive, time-consuming and expensive tests. METHODS: With the flood of protein sequences produced in the post-genomic age, effective computational strategies are needed to overcome this issue. Taking into account the composition and sequence-order effect within polypeptide chains, an innovative in silico predictor based on a mathematical model is proposed. RESULTS: The predictor was stringently verified using self-consistency, cross-validation and jackknife tests on benchmark datasets. A rigorous jackknife test established that the new predictor's values are superior to those predicted by previous methodologies. CONCLUSION: Compared with the existing models, this new mathematical technique is the most appropriate and encouraging.

16.
Genomics ; 2019 Sep 05.
Article in English | MEDLINE | ID: mdl-31494196

ABSTRACT

During the last three decades or so, many efforts have been made to study protein cleavage sites for disease-causing enzymes, such as HIV (human immunodeficiency virus) protease and SARS (severe acute respiratory syndrome) coronavirus main proteinase. It becomes increasingly clear from this minireview that the motivation driving these studies is quite wise, and that the results acquired through them are very rewarding, particularly for developing peptide drugs.

17.
Genomics ; 2019 Aug 28.
Article in English | MEDLINE | ID: mdl-31472241

ABSTRACT

This article has been withdrawn at the request of the Editor-in-Chief. After a thorough investigation, the Editor has concluded that the acceptance of this article was partly based upon the positive advice of two illegitimate reviewer reports. The reports were submitted from email accounts which were provided to the journal as suggested reviewers during the submission of the article. Although purportedly real reviewer accounts, the Editor has concluded that these were not of appropriate, independent reviewers. Also, the article duplicates significant parts of a paper that had already appeared in Current Medicinal Chemistry 26 (2019) 4918-4943 https://doi.org/10.2174/0929867326666190507082559. This represents a clear violation of the fundamentals of peer review, our publishing policies, and publishing ethics standards. Apologies are offered to the readers of the journal that this was not detected during the submission process. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.

18.
Curr Med Chem ; 26(26): 4918-4943, 2019.
Article in English | MEDLINE | ID: mdl-31060481

ABSTRACT

The smallest unit of life is a cell, which contains numerous protein molecules. Most of the functions critical to the cell's survival are performed by these proteins located in its different organelles, usually called "subcellular locations". Information on the subcellular localization of a protein can provide useful clues about its function. To reveal the intricate pathways at the cellular level, knowledge of the subcellular localization of proteins in a cell is a prerequisite. Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing and selecting the right targets for drug development. Unfortunately, it is both time-consuming and costly to determine the subcellular locations of proteins purely by experiment. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying the subcellular locations of uncharacterized proteins based on their sequence information alone. Considerable progress has been achieved in this regard. This review focuses on those methods that can deal with multi-label proteins, which may simultaneously exist in two or more subcellular location sites. Protein molecules with this kind of characteristic are vitally important for finding multi-target drugs, a current hot trend in drug development. This review also focuses on those methods that have user-friendly web-servers established, so that the majority of experimental scientists can use them to get the desired results without the need to go through the detailed mathematics involved.

19.
J Theor Biol ; 471: 74-81, 2019 06 21.
Article in English | MEDLINE | ID: mdl-30928350

ABSTRACT

The humanized cytotoxic T lymphocyte-associated antigen 4 immunoglobulin (CTLA-4-Ig) has been used to treat lupus nephritis (LN) based on CTLA-4's negative regulation of T-cell activation through competitive binding to CD80/CD86, but the inherent genetic factors influencing CTLA-4-Ig treatment efficacy are largely unknown. Here, 62 nonsynonymous single nucleotide variants (nsSNVs) of the CTLA-4 gene, 184 of CD80 and 201 of CD86 were identified and validated within both the EMBL-EBI and dbSNP databases. Next, the nsSNVs rs1466152724 in CTLA-4; rs1196816748, rs765515058, rs1157880125, rs1022857991, and rs142547094 in CD80; and rs1203132714 in CD86 were consistently suggested to be deleterious by SIFT, PolyPhen-2, PROVEAN and meta LR. Based on the 3D structure stability analysis, the variant rs765515058, causing G167V in CD80, was found to reduce the protein's stability by substantially changing the characteristics of the constructed structure of the complete CD80 apo form and the stabilizing amino acid residues of the CD80 holo form. Furthermore, the interaction energy analysis suggested that rs1022857991, causing C50F, may reduce the binding energy of CTLA-4 with CD80. As the number of variants increases, the effects of these nsSNVs on the interaction of CTLA-4 with CD80/CD86 will increase, and thus influence the efficacy of CTLA-4-Ig treatment against LN.


Subjects
Abatacept , B7-1 Antigen , B7-2 Antigen , CTLA-4 Antigen , Computer Simulation , Lupus Nephritis/drug therapy , Abatacept/chemistry , Abatacept/genetics , Abatacept/therapeutic use , B7-1 Antigen/chemistry , B7-1 Antigen/genetics , B7-2 Antigen/chemistry , B7-2 Antigen/genetics , CTLA-4 Antigen/chemistry , CTLA-4 Antigen/genetics , Humans
20.
BMC Bioinformatics ; 20(1): 112, 2019 Mar 06.
Article in English | MEDLINE | ID: mdl-30841845

ABSTRACT

BACKGROUND: As an important type of post-translational modification (PTM), protein glycosylation plays a crucial role in protein stability and protein function. The abundance and ubiquity of protein glycosylation across three domains of life involving Eukarya, Bacteria and Archaea demonstrate its roles in regulating a variety of signalling and metabolic pathways. Mutations on and in the proximity of glycosylation sites are highly associated with human diseases. Accordingly, accurate prediction of glycosylation can complement laboratory-based methods and greatly benefit experimental efforts for characterization and understanding of functional roles of glycosylation. For this purpose, a number of supervised-learning approaches have been proposed to identify glycosylation sites, demonstrating a promising predictive performance. To train a conventional supervised-learning model, both reliable positive and negative samples are required. However, in practice, a large portion of negative samples (i.e. non-glycosylation sites) are mislabelled due to the limitation of current experimental technologies. Moreover, supervised algorithms often fail to take advantage of large volumes of unlabelled data, which can aid in model learning in conjunction with positive samples (i.e. experimentally verified glycosylation sites). RESULTS: In this study, we propose a positive unlabelled (PU) learning-based method, PA2DE (V2.0), based on the AlphaMax algorithm for protein glycosylation site prediction. The predictive performance of this proposed method was evaluated by a range of glycosylation data collected over a ten-year period based on an interval of three years. Experiments using both benchmarking and independent tests show that our method outperformed the representative supervised-learning algorithms (including support vector machines and random forests) and one-class learners, as well as currently available prediction methods in terms of F1 score, accuracy and AUC measures. In addition, we developed an online web server as an implementation of the optimized model (available at http://glycomine.erc.monash.edu/Lab/GlycoMine_PU/ ) to facilitate community-wide efforts for accurate prediction of protein glycosylation sites. CONCLUSION: The proposed PU learning approach achieved a competitive predictive performance compared with currently available methods. This PU learning schema may also be effectively employed and applied to address the prediction problems of other important types of protein PTM site and functional sites.


Subjects
Computational Biology/methods , Proteome/metabolism , Staining and Labeling , Databases, Protein , Glycosylation , Humans , Protein Processing, Post-Translational , ROC Curve , Support Vector Machine