Search | VHL Regional Portal

DeepPRMS: advanced deep learning model to predict protein arginine methylation sites.

Khandelwal, Monika; Kumar Rout, Ranjeet.

Brief Funct Genomics ; 23(4): 452-463, 2024 Jul 19.

Article in English | MEDLINE | ID: mdl-38267081

ABSTRACT

Protein methylation is a post-translational modification that is crucial for various cellular processes, including transcriptional activity and DNA repair. Correctly predicting protein methylation sites is fundamental for research and drug discovery. Experimental techniques such as methyl-specific antibodies, chromatin immunoprecipitation and mass spectrometry can identify protein methylation sites, but they are time-consuming and costly. Predicting methylation sites in silico may help researchers identify candidate sites for further examination and make site-specific investigations and downstream characterization easier to carry out. In this research, we propose a novel deep learning-based predictor, named DeepPRMS, to identify protein methylation sites in primary sequences. DeepPRMS uses a gated recurrent unit (GRU) to extract sequential information and a convolutional neural network (CNN) to extract spatial information from the primary sequences. The latent representations of the GRU and CNN models are combined so that the two kinds of information can interact. On the independent test data set, DeepPRMS obtained an accuracy of 85.32%, a specificity of 84.94%, a sensitivity of 85.80% and a Matthews correlation coefficient of 0.71. The results indicate that DeepPRMS predicts protein methylation sites with high accuracy and outperforms the state-of-the-art models. DeepPRMS is expected to effectively guide future experiments aimed at identifying potential methylated protein sites. The web server is available at http://deepprms.nitsri.ac.in/.

Subject(s)

Arginine , Deep Learning , Methylation , Arginine/metabolism , Protein Processing, Post-Translational , Proteins/metabolism , Proteins/chemistry , Neural Networks, Computer , Algorithms , Computational Biology/methods , Humans
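
The four figures reported for DeepPRMS (accuracy, sensitivity, specificity and Matthews correlation coefficient) are all derived from the confusion matrix of a binary site/non-site classifier. The short Python sketch below shows the standard formulas; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: methylation sites predicted as sites / missed.
    tn/fp: non-sites correctly rejected / wrongly predicted as sites.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): fraction of true sites that are recovered.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Specificity: fraction of non-sites that are correctly rejected.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Matthews correlation coefficient; set to 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=8, tn=9, fp=1, fn=2))
# accuracy 0.85, sensitivity 0.8, specificity 0.9, MCC about 0.704
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for site-prediction tasks.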

PRMxAI: protein arginine methylation sites prediction based on amino acid spatial distribution using explainable artificial intelligence.

Khandelwal, Monika; Rout, Ranjeet Kumar.

BMC Bioinformatics ; 24(1): 376, 2023 Oct 04.

Article in English | MEDLINE | ID: mdl-37794362

ABSTRACT

BACKGROUND: Protein methylation, a post-translational modification, is crucial in regulating various cellular functions. Studying arginine methylation is necessary for understanding crucial biochemical activities and biological functions, such as gene regulation and signal transduction. Experimental methods, including ChIP-chip, mass spectrometry, and methylation-specific antibodies, exist for identifying methylated proteins, but they are expensive and tedious. As a result, computational methods based on machine learning offer an efficient way to predict arginine methylation sites. RESULTS: In this research, a novel method called PRMxAI is proposed to predict arginine methylation sites. PRMxAI extracts sequence-based features, namely dipeptide composition, physicochemical properties, amino acid composition, and information theory-based features (Arimoto, Havrda-Charvat, Renyi, and Shannon entropy), to represent protein sequences in numerical form. Several machine learning algorithms, namely decision trees, naive Bayes, random forest, support vector machines, and k-nearest neighbors, were implemented to select the best classifier, and random forest was chosen as the underlying classifier for PRMxAI. The performance of PRMxAI was evaluated using 10-fold cross-validation, yielding accuracies of 87.17% and 90.40% on the mono-methylarginine and di-methylarginine data sets, respectively. This research also uses explainable artificial intelligence to examine the impact of the various features on both data sets. CONCLUSIONS: PRMxAI demonstrates the effectiveness of these features for predicting arginine methylation sites. Additionally, the SHapley Additive exPlanations (SHAP) method is used to interpret the predictive mechanism of the model. The results indicate that PRMxAI outperforms other state-of-the-art predictors.

Subject(s)

Amino Acids , Arginine , Amino Acids/metabolism , Arginine/chemistry , Arginine/metabolism , Methylation , Artificial Intelligence , Bayes Theorem , Protein Processing, Post-Translational , Algorithms
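
PRMxAI turns each protein sequence into numbers using composition and entropy features. The following is a minimal Python sketch of three of those feature groups (amino acid composition, dipeptide composition and the four entropies) computed over a single peptide window. It is not the authors' code: the function names, the choice of alpha = 2, the base-2 logarithms, and the particular Havrda-Charvat and Arimoto formulas (the literature uses several normalizations) are assumptions. The physicochemical-property features are omitted because they depend on lookup tables the abstract does not specify.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq: str) -> list[float]:
    """Amino acid composition: relative frequency of each of the 20 residues."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: relative frequency of each adjacent residue pair."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_pairs = max(len(seq) - 1, 1)
    return [pairs[d] / n_pairs for d in DIPEPTIDES]


def residue_probs(seq: str) -> list[float]:
    """Empirical probability of every residue observed in the window."""
    return [c / len(seq) for c in Counter(seq).values()]


def shannon(p: list[float]) -> float:
    return -sum(x * math.log2(x) for x in p if x > 0)  # in bits


def renyi(p: list[float], alpha: float) -> float:
    # Renyi entropy of order alpha (alpha > 0, alpha != 1), in bits.
    return math.log2(sum(x ** alpha for x in p)) / (1 - alpha)


def havrda_charvat(p: list[float], alpha: float) -> float:
    # Assumed normalization: tends to Shannon entropy in bits as alpha -> 1.
    return (sum(x ** alpha for x in p) - 1) / (2 ** (1 - alpha) - 1)


def arimoto(p: list[float], alpha: float) -> float:
    # Assumed form: tends to Shannon entropy in nats as alpha -> 1.
    return alpha / (1 - alpha) * (sum(x ** alpha for x in p) ** (1 / alpha) - 1)


def feature_vector(seq: str, alpha: float = 2.0) -> list[float]:
    """Concatenate AAC (20) + DPC (400) + four entropies (4) = 424 values."""
    if not seq:
        raise ValueError("empty sequence window")
    p = residue_probs(seq)
    entropies = [shannon(p), renyi(p, alpha), havrda_charvat(p, alpha), arimoto(p, alpha)]
    return aac(seq) + dpc(seq) + entropies
```

For a hypothetical window such as `"GRGGFRGG"`, `feature_vector` returns 424 values; vectors of this fixed length can then be passed to any classifier, such as the random forest the abstract selects.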

Identification of discriminant features from stationary pattern of nucleotide bases and their application to essential gene classification.

Rout, Ranjeet Kumar; Umer, Saiyed; Khandelwal, Monika; Pati, Smitarani; Mallik, Saurav; Balabantaray, Bunil Kumar; Qin, Hong.

Front Genet ; 14: 1154120, 2023.

Article in English | MEDLINE | ID: mdl-37152988

ABSTRACT

Introduction: Essential genes are indispensable for the survival of a species. They are linked to critical cellular activities and encode proteins that regulate central metabolism, gene translation, deoxyribonucleic acid replication, and fundamental cellular structure, and that facilitate intracellular and extracellular transport. Essential genes preserve crucial genomic information that may hold the key to a detailed understanding of life and evolution. Because of this relevance, the study of essential genes has long been regarded as a vital topic in computational biology. An essential gene is a sequence of the nucleotides adenine, guanine, cytosine, and thymine in various combinations. Methods: This paper presents a novel method of extracting information on the stationary patterns of the nucleotides adenine, guanine, cytosine, and thymine in each gene. For this purpose, co-occurrence matrices are derived that provide the statistical distribution of stationary patterns of nucleotides in the genes, which helps establish the relationships between nucleotides. Energy, entropy, homogeneity, contrast, and dissimilarity features are computed from every co-occurrence matrix and concatenated to form a feature vector representing each essential gene. Finally, supervised machine learning algorithms are applied to classify essential genes based on the extracted fixed-dimensional feature vectors. Results: For comparison, existing state-of-the-art feature representation techniques such as Shannon entropy (SE), Hurst exponent (HE), fractal dimension (FD), and their combinations have been utilized. Discussion: Extensive experiments on classifying the essential genes of five species show the robustness and effectiveness of the proposed methodology.
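
The co-occurrence step described in the Methods resembles a gray-level co-occurrence matrix (GLCM) computed over a one-dimensional sequence. A minimal Python sketch follows. It assumes that a "stationary pattern" is a pair of bases separated by a fixed offset d, that bases are indexed A=0, C=1, G=2, T=3, and that offsets 1 to 3 are used; the abstract does not state these choices, and the helper names are hypothetical. The five features use their usual GLCM definitions.

```python
import math

# Assumed base ordering; contrast and dissimilarity depend on it via |i - j|.
NUCLEOTIDE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def cooccurrence(seq: str, d: int) -> list[list[float]]:
    """Normalized 4x4 matrix of (base at i, base at i + d) pair frequencies."""
    m = [[0.0] * 4 for _ in range(4)]
    total = 0
    for i in range(len(seq) - d):
        a, b = seq[i], seq[i + d]
        if a in NUCLEOTIDE_INDEX and b in NUCLEOTIDE_INDEX:  # skip N and other symbols
            m[NUCLEOTIDE_INDEX[a]][NUCLEOTIDE_INDEX[b]] += 1
            total += 1
    if total:
        m = [[v / total for v in row] for row in m]
    return m


def texture_features(p: list[list[float]]) -> list[float]:
    """Energy, entropy, homogeneity, contrast and dissimilarity of one matrix."""
    cells = [(i, j, p[i][j]) for i in range(4) for j in range(4)]
    # Energy taken as the angular second moment (sum of P^2); some libraries
    # define energy as the square root of this quantity instead.
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    dissimilarity = sum(abs(i - j) * v for i, j, v in cells)
    return [energy, entropy, homogeneity, contrast, dissimilarity]


def gene_feature_vector(seq: str, offsets: tuple[int, ...] = (1, 2, 3)) -> list[float]:
    """Concatenate the five features for every offset (5 * len(offsets) values)."""
    seq = seq.upper()
    features: list[float] = []
    for d in offsets:
        features.extend(texture_features(cooccurrence(seq, d)))
    return features
```

Because every gene yields the same number of values regardless of its length, the resulting fixed-dimensional vectors can be fed directly to a supervised classifier.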

Multifactorial feature extraction and site prognosis model for protein methylation data.

Khandelwal, Monika; Kumar Rout, Ranjeet; Umer, Saiyed; Mallik, Saurav; Li, Aimin.

Brief Funct Genomics ; 22(1): 20-30, 2023 Jan 20.

Article in English | MEDLINE | ID: mdl-36310537

ABSTRACT

Integrated studies (multi-omics studies) combining genetic, proteomic and epigenetic data analyses have become an emerging topic in biomedical research. Protein methylation is a post-translational modification that plays an essential role in various cellular activities. Predicting methylation sites (arginine and lysine) is vital for understanding the molecular processes of protein methylation. However, current experimental techniques for identifying methylation sites are tedious and expensive, so computational techniques for predicting methylation sites in proteins are necessary. Various computational methods have been proposed for this task in recent years. Most existing methods require structural and evolutionary information to derive features, but this information is not always convenient to acquire. We therefore propose a novel method, called the multi-factorial feature extraction and site prognosis model (MufeSPM), for predicting protein methylation sites based on information theory features (Renyi, Shannon, Havrda-Charvat and Arimoto entropy), amino acid composition and physicochemical properties derived from protein methylation data. A random forest algorithm was used to predict methylation sites in protein sequences. This paper also studies the impact of different features and classifiers on the arginine and lysine methylation data sets. On the arginine (R) methylation data set, MufeSPM yielded an accuracy of 82.45% (± 3.47), and on the lysine (K) methylation data set it achieved an average accuracy of 71.94% (± 2.12). The area under the receiver operating characteristic curve of the different classifiers for methylation site prediction is also reported. The experimental results indicate that MufeSPM performs better than state-of-the-art predictors.

Subject(s)

Lysine , Proteomics , Lysine/chemistry , Lysine/metabolism , Methylation , Protein Processing, Post-Translational , Arginine/chemistry , Arginine/metabolism , Prognosis , Algorithms , Computational Biology/methods
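
MufeSPM compares classifiers using the area under the receiver operating characteristic curve (AUC). The AUC equals the probability that a randomly chosen positive (methylated) site receives a higher score than a randomly chosen negative site, with ties counted as one half. The Python sketch below computes it directly from that definition; `roc_auc` is a hypothetical helper, not part of MufeSPM, and its pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) definition.

    labels: 1 for a methylated site, 0 for a non-site.
    scores: classifier outputs, higher meaning "more likely a site".
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0  # positive ranked above negative
            elif sp == sn:
                wins += 0.5  # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for two sites and two non-sites.
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

The double loop costs O(n·m) comparisons for n positives and m negatives; for large data sets, a rank-based computation of the same statistic is preferable.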
