Search | BVS - Ministry of Health

ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins.

Pradhan, Upendra Kumar; Meher, Prabina Kumar; Naha, Sanchita; Das, Ritwika; Gupta, Ajit; Parsad, Rajender.

Protein Sci ; 33(6): e5015, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38747369

ABSTRACT

Prokaryotic DNA binding proteins (DBPs) play pivotal roles in governing gene regulation, DNA replication, and various cellular functions. Accurate computational models for predicting prokaryotic DBPs hold immense promise in accelerating the discovery of novel proteins, fostering a deeper understanding of prokaryotic biology, and facilitating the development of targeted therapeutics for potential disease interventions. However, existing generic prediction models often exhibit lower accuracy in predicting prokaryotic DBPs. To address this gap, we introduce ProkDBP, a novel machine learning-driven computational model for prediction of prokaryotic DBPs. For prediction, a total of nine shallow learning algorithms and five deep learning models were utilized, with the shallow learning models demonstrating higher performance metrics compared to their deep learning counterparts. The light gradient boosting machine (LGBM), coupled with evolutionarily significant features selected via random forest variable importance measure (RF-VIM), yielded the highest five-fold cross-validation accuracy. The model achieved the highest auROC (0.9534) and auPRC (0.9575) among the 14 machine learning models evaluated. Additionally, ProkDBP performed well on an independent dataset, achieving an auROC of 0.9332 and an auPRC of 0.9371. Notably, when benchmarked against several cutting-edge existing models, ProkDBP showcased superior predictive accuracy. Furthermore, to promote accessibility and usability, ProkDBP (https://iasri-sg.icar.gov.in/prokdbp/) is available as an online prediction tool, enabling free access to interested users. This tool stands as a significant contribution, enhancing the repertoire of resources for accurate and efficient prediction of prokaryotic DBPs.
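The abstracts on this page report performance as auROC and auPRC. As a reading aid, the following is a minimal sketch of how these two quantities can be computed from binary labels and classifier scores. It is not the authors' code: the function names `au_roc` and `au_prc` are hypothetical, auROC is computed through the pairwise-ranking identity, and auPRC is approximated by average precision, which assumes that tied scores are rare.

```python
from typing import Sequence


def au_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise-ranking identity.

    Equals the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def au_prc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a step-wise estimate of the area under the
    precision-recall curve (assumption: tied scores are rare)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy data, invented for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(au_roc(toy_labels, toy_scores), au_prc(toy_labels, toy_scores))
```

The quadratic pair loop is chosen for clarity; for large datasets a rank-based formulation or a library routine would be the usual choice.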

Subjects

Bacterial Proteins; DNA-Binding Proteins; Machine Learning; Algorithms; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Bacterial Proteins/genetics; Computational Biology/methods; DNA-Binding Proteins/chemistry; DNA-Binding Proteins/metabolism

RBProkCNN: Deep learning on appropriate contextual evolutionary information for RNA binding protein discovery in prokaryotes.

Pradhan, Upendra Kumar; Naha, Sanchita; Das, Ritwika; Gupta, Ajit; Parsad, Rajender; Meher, Prabina Kumar.

Comput Struct Biotechnol J ; 23: 1631-1640, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38660008

ABSTRACT

RNA-binding proteins (RBPs) are central to key functions such as post-transcriptional regulation, mRNA stability, and adaptation to varied environmental conditions in prokaryotes. While the majority of research has concentrated on eukaryotic RBPs, recent developments underscore the crucial involvement of prokaryotic RBPs. Although computational methods have emerged in recent years to identify RBPs, they have fallen short in accurately identifying prokaryotic RBPs due to their generic nature. To bridge this gap, we introduce RBProkCNN, a novel machine learning-driven computational model meticulously designed for the accurate prediction of prokaryotic RBPs. The prediction process involves the utilization of eight shallow learning algorithms and four deep learning models, incorporating PSSM-based evolutionary features. By leveraging a convolutional neural network (CNN) and evolutionarily significant features selected through extreme gradient boosting variable importance measure, RBProkCNN achieved the highest accuracy in five-fold cross-validation, yielding 98.04% auROC and 98.19% auPRC. Furthermore, RBProkCNN demonstrated robust performance with an independent dataset, showcasing a commendable 95.77% auROC and 95.78% auPRC. Noteworthy is its superior predictive accuracy when compared to several state-of-the-art existing models. RBProkCNN is available as an online prediction tool (https://iasri-sg.icar.gov.in/rbprokcnn/), offering free access to interested users. This tool represents a substantial contribution, enriching the array of resources available for the accurate and efficient prediction of prokaryotic RBPs.

ASPTF: A computational tool to predict abiotic stress-responsive transcription factors in plants by employing machine learning algorithms.

Pradhan, Upendra Kumar; Mahapatra, Anuradha; Naha, Sanchita; Gupta, Ajit; Parsad, Rajender; Gahlaut, Vijay; Rath, Surya Narayan; Meher, Prabina Kumar.

Biochim Biophys Acta Gen Subj ; 1868(6): 130597, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38490467

ABSTRACT

BACKGROUND: Abiotic stresses pose a serious threat to the growth and yield of crop plants. Several studies suggest that in plants, transcription factors (TFs) are important regulators of gene expression, especially when it comes to coping with abiotic stresses. Therefore, it is crucial to identify TFs associated with abiotic stress response for breeding abiotic stress-tolerant crop cultivars. METHODS: Based on a machine learning framework, a computational model was developed to predict TFs associated with abiotic stress response in plants. To numerically encode TF sequences, four distinct sequence-derived features were generated. The prediction was performed using ten shallow learning and four deep learning algorithms. To base predictions on more pertinent and informative features, feature selection techniques were also employed. RESULTS: Using the features chosen by the light gradient boosting machine variable importance measure (LGBM-VIM), the LGBM achieved the highest cross-validation performance metrics (accuracy: 86.81%, auROC: 92.98%, and auPRC: 94.03%). The proposed model (LGBM prediction method + LGBM-VIM selected features) was further evaluated using an independent test dataset, where the accuracy, auROC and auPRC were 81.98%, 90.65% and 91.30%, respectively. CONCLUSIONS: To facilitate the adoption of the proposed strategy by users, the approach was implemented as a prediction server called ASPTF, accessible at https://iasri-sg.icar.gov.in/asptf/. The developed approach and the corresponding web application are anticipated to supplement experimental methods in the identification of transcription factors (TFs) responsive to abiotic stress in plants.

Subjects

Machine Learning; Stress, Physiological; Transcription Factors; Transcription Factors/metabolism; Transcription Factors/genetics; Algorithms; Gene Expression Regulation, Plant; Computational Biology/methods; Plant Proteins/genetics; Plant Proteins/metabolism; Plants/metabolism; Plants/genetics

DBPMod: a supervised learning model for computational recognition of DNA-binding proteins in model organisms.

Pradhan, Upendra K; Meher, Prabina K; Naha, Sanchita; Sharma, Nitesh K; Agarwal, Aarushi; Gupta, Ajit; Parsad, Rajender.

Brief Funct Genomics ; 2023 Aug 31.

Article in English | MEDLINE | ID: mdl-37651627

ABSTRACT

DNA-binding proteins (DBPs) play critical roles in many biological processes, including gene expression, DNA replication, recombination and repair. Understanding the molecular mechanisms underlying these processes depends on the precise identification of DBPs. In recent times, several computational methods have been developed to identify DBPs. However, because these models are generic, they are unable to identify species-specific DBPs with high accuracy. Therefore, a species-specific computational model is needed to predict species-specific DBPs. In this paper, we introduce the computational DBPMod method, which makes use of a machine learning approach to identify species-specific DBPs. For prediction, both shallow learning algorithms and deep learning models were used, with shallow learning models achieving higher accuracy. Additionally, the evolutionary features outperformed sequence-derived features in terms of accuracy. Five model organisms, namely Caenorhabditis elegans, Drosophila melanogaster, Escherichia coli, Homo sapiens and Mus musculus, were used to assess the performance of DBPMod. Five-fold cross-validation and independent test set analyses were used to evaluate the prediction accuracy in terms of area under receiver operating characteristic curve (auROC) and area under precision-recall curve (auPRC), which were found to be ~89-92% and ~89-95%, respectively. The comparative results demonstrate that DBPMod outperforms 12 current state-of-the-art computational approaches in identifying the DBPs for all five model organisms. We further developed a DBPMod web server to make it easier for researchers to detect DBPs; it is publicly available at https://iasri-sg.icar.gov.in/dbpmod/. DBPMod is expected to be an invaluable tool for discovering DBPs, supplementing the current experimental and computational methods.
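Five-fold cross-validation is the evaluation protocol used throughout these abstracts. The sketch below shows one way such folds could be built so that each fold keeps roughly the same class balance. Stratification, the random seed, and the function names `stratified_folds` and `train_test_splits` are assumptions made for illustration; the abstract does not say how the folds were drawn.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], n_folds: int = 5,
                     seed: int = 0) -> List[List[int]]:
    """Assign sample indices to n_folds folds, dealing each class out
    round-robin after shuffling so class proportions stay similar."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def train_test_splits(labels: Sequence[int], n_folds: int = 5,
                      seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = stratified_folds(labels, n_folds, seed)
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


# Toy labels, invented for illustration: 10 binding and 10 non-binding proteins.
toy_labels = [1] * 10 + [0] * 10
for train_idx, test_idx in train_test_splits(toy_labels):
    print(len(train_idx), len(test_idx))
```

A model would be fitted on each training index set and scored on the matching test set, with the per-fold metrics then averaged.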

RBPLight: a computational tool for discovery of plant-specific RNA-binding proteins using light gradient boosting machine and ensemble of evolutionary features.

Pradhan, Upendra K; Meher, Prabina K; Naha, Sanchita; Pal, Soumen; Gupta, Sagar; Gupta, Ajit; Parsad, Rajender.

Brief Funct Genomics ; 22(5): 401-410, 2023 Nov 10.

Article in English | MEDLINE | ID: mdl-37158175

ABSTRACT

RNA-binding proteins (RBPs) are essential for post-transcriptional gene regulation in eukaryotes, including splicing control, mRNA transport and decay. Thus, accurate identification of RBPs is important to understand gene expression and regulation of cell state. A number of computational models have been developed to detect RBPs. These methods made use of datasets from several eukaryotic species, specifically from mice and humans. Although some models have been tested on Arabidopsis, these techniques fall short of correctly identifying RBPs for other plant species. Therefore, the development of a powerful computational model for identifying plant-specific RBPs is needed. In this study, we present a novel computational model for identifying RBPs in plants. Five deep learning models and ten shallow learning algorithms were utilized for prediction with 20 sequence-derived and 20 evolutionary feature sets. The highest repeated five-fold cross-validation accuracy, 91.24% AU-ROC and 91.91% AU-PRC, was achieved by the light gradient boosting machine. When evaluated using an independent dataset, the developed approach achieved 94.00% AU-ROC and 94.50% AU-PRC. The proposed model achieved significantly higher accuracy for predicting plant-specific RBPs as compared to the currently available state-of-the-art RBP prediction models. Although certain models have already been trained and assessed on the model organism Arabidopsis, this is the first comprehensive computational model for the discovery of plant-specific RBPs. The web server RBPLight was also developed, which is publicly accessible at https://iasri-sg.icar.gov.in/rbplight/, for the convenience of researchers to identify RBPs in plants.

Subjects

Arabidopsis; Humans; Animals; Mice; Arabidopsis/genetics; Arabidopsis/metabolism; Algorithms; Biological Evolution; RNA-Binding Proteins/genetics; RNA-Binding Proteins/chemistry; RNA-Binding Proteins/metabolism; Computational Biology/methods; Binding Sites

ASLncR: a novel computational tool for prediction of abiotic stress-responsive long non-coding RNAs in plants.

Pradhan, Upendra Kumar; Meher, Prabina Kumar; Naha, Sanchita; Rao, Atmakuri Ramakrishna; Gupta, Ajit.

Funct Integr Genomics ; 23(2): 113, 2023 Mar 31.

Article in English | MEDLINE | ID: mdl-37000299

ABSTRACT

Abiotic stresses are detrimental to plant growth and development and have a major negative impact on crop yields. A growing body of evidence indicates that a large number of long non-coding RNAs (lncRNAs) are key to many abiotic stress responses. Thus, identifying abiotic stress-responsive lncRNAs is essential in crop breeding programs in order to develop crop cultivars resistant to abiotic stresses. In this study, we have developed the first machine learning-based computational model for predicting abiotic stress-responsive lncRNAs. The lncRNA sequences which were responsive and non-responsive to abiotic stresses served as the two classes of the dataset for binary classification using the machine learning algorithms. The training dataset was created using 263 stress-responsive and 263 non-stress-responsive sequences, whereas the independent test set consists of 101 sequences from both classes. Because machine learning models accept only numeric data, Kmer features of sizes 1 to 6 were utilized to represent lncRNAs in numeric form. To select important features, four different feature selection strategies were utilized. Among the seven learning algorithms, the support vector machine (SVM) achieved the highest cross-validation accuracy with the selected feature sets. The observed 5-fold cross-validation accuracy, AU-ROC, and AU-PRC were found to be 68.84, 72.78, and 75.86%, respectively. Furthermore, the robustness of the developed model (SVM with the selected features) was evaluated using an independent test dataset, where the overall accuracy, AU-ROC, and AU-PRC were found to be 76.23, 87.71, and 88.49%, respectively. The developed computational approach was also implemented in an online prediction tool, ASLncR, accessible at https://iasri-sg.icar.gov.in/aslncr/. The proposed computational model and the developed prediction tool are believed to supplement the existing effort for the identification of abiotic stress-responsive lncRNAs in plants.
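The Kmer encoding mentioned in this abstract can be illustrated with a short sketch. The code below turns an RNA sequence into a fixed-length numeric vector by concatenating the relative frequencies of all k-mers for k = 1 to 6. The function name `kmer_features`, the normalization to relative frequencies, the conversion of T to U, and the skipping of windows that contain ambiguous bases are all assumptions made for illustration; the abstract does not specify these details.

```python
from itertools import product
from typing import List

ALPHABET = "ACGU"  # assumed RNA alphabet; T is converted to U below


def kmer_features(seq: str, max_k: int = 6) -> List[float]:
    """Concatenate relative k-mer frequencies for k = 1 .. max_k.

    For max_k = 6 the vector has 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 entries.
    """
    seq = seq.upper().replace("T", "U")
    features: List[float] = []
    for k in range(1, max_k + 1):
        # One counter per possible k-mer, in lexicographic order.
        counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if kmer in counts:  # skips windows with ambiguous bases such as N
                counts[kmer] += 1
        total = sum(counts.values())
        features.extend(c / total if total else 0.0 for c in counts.values())
    return features


# Toy sequence, invented for illustration.
vector = kmer_features("AUGCAUGGCUA")
print(len(vector))
```

Every sequence maps to a vector of the same length regardless of its own length, which is what lets a classifier such as an SVM consume sequences of varying size.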

Subjects

RNA, Long Noncoding; RNA, Long Noncoding/genetics; Computational Biology; Plant Breeding; Algorithms; Plants/genetics; Stress, Physiological/genetics

ASmiR: a machine learning framework for prediction of abiotic stress-specific miRNAs in plants.

Pradhan, Upendra Kumar; Meher, Prabina Kumar; Naha, Sanchita; Rao, Atmakuri Ramakrishna; Kumar, Upendra; Pal, Soumen; Gupta, Ajit.

Funct Integr Genomics ; 23(2): 92, 2023 Mar 20.

Article in English | MEDLINE | ID: mdl-36939943

ABSTRACT

Abiotic stresses have become a major challenge in recent years due to their pervasive nature and severe impacts on plant growth, development, and quality. MicroRNAs (miRNAs) play a significant role in plant response to different abiotic stresses. Thus, identification of specific abiotic stress-responsive miRNAs holds immense importance in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational model for prediction of miRNAs associated with four specific abiotic stresses: cold, drought, heat and salt. The pseudo K-tuple nucleotide compositional features of Kmer size 1 to 5 were used to represent miRNAs in numeric form. A feature selection strategy was employed to select important features. With the selected feature sets, the support vector machine (SVM) achieved the highest cross-validation accuracy in all four abiotic stress conditions. The highest cross-validated prediction accuracies in terms of area under precision-recall curve were found to be 90.15, 90.09, 87.71, and 89.25% for cold, drought, heat and salt, respectively. Overall prediction accuracies for the independent dataset were 84.57, 80.62, 80.38 and 82.78%, respectively, for the four abiotic stresses. The SVM was also seen to outperform different deep learning models for prediction of abiotic stress-responsive miRNAs. To make the method easy to use, an online prediction server, "ASmiR", has been established at https://iasri-sg.icar.gov.in/asmir/. The proposed computational model and the developed prediction tool are believed to supplement the existing effort for identification of specific abiotic stress-responsive miRNAs in plants.

Subjects

MicroRNAs; MicroRNAs/genetics; Plant Breeding; Plants/genetics; Machine Learning; Sodium Chloride; Stress, Physiological/genetics; Gene Expression Regulation, Plant

PlDBPred: a novel computational model for discovery of DNA binding proteins in plants.

Pradhan, Upendra Kumar; Meher, Prabina Kumar; Naha, Sanchita; Pal, Soumen; Gupta, Ajit; Parsad, Rajender.

Brief Bioinform ; 24(1), 2023 Jan 19.

Article in English | MEDLINE | ID: mdl-36416116

ABSTRACT

DNA-binding proteins (DBPs) play crucial roles in numerous cellular processes including nucleotide recognition, transcriptional control and the regulation of gene expression. The majority of the existing computational techniques for identifying DBPs are mainly applicable to human and mouse datasets. Even though some models have been tested on Arabidopsis, they produce poor accuracy when applied to other plant species. Therefore, it is imperative to develop an effective computational model for predicting plant DBPs. In this study, we developed a comprehensive computational model for the identification of plant-specific DBPs. Five shallow learning and six deep learning models were initially used for prediction, where shallow learning methods outperformed deep learning algorithms. In particular, the support vector machine achieved the highest repeated 5-fold cross-validation accuracy of 94.0% area under receiver operating characteristic curve (AUC-ROC) and 93.5% area under precision recall curve (AUC-PR). With an independent dataset, the developed approach secured 93.8% AUC-ROC and 94.6% AUC-PR. When compared with the state-of-the-art existing tools using an independent dataset, the proposed model achieved much higher accuracy. Overall results suggest that the developed computational model is more efficient and reliable as compared to the existing models for the prediction of DBPs in plants. For the convenience of the majority of experimental scientists, the developed prediction server PlDBPred is publicly accessible at https://iasri-sg.icar.gov.in/pldbpred/. The source code is also provided at https://iasri-sg.icar.gov.in/pldbpred/source_code.php for prediction using a large-size dataset.

Subjects

Arabidopsis; DNA-Binding Proteins; Algorithms; Arabidopsis/genetics; Arabidopsis/metabolism; Computational Biology/methods; Computer Simulation; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; ROC Curve; Software
