Search | BVS Regional Portal

1.

Meta-2OM: A multi-classifier meta-model for the accurate prediction of RNA 2'-O-methylation sites in human RNA.

Harun-Or-Roshid, Md; Pham, Nhat Truong; Manavalan, Balachandran; Kurata, Hiroyuki.

PLoS One ; 19(6): e0305406, 2024.

Article in English | MEDLINE | ID: mdl-38924058

ABSTRACT

2'-O-methylation (2-OM or Nm) is a widespread RNA modification observed in various RNA types like tRNA, mRNA, rRNA, miRNA, piRNA, and snRNA, which plays a crucial role in several biological functional mechanisms and innate immunity. To comprehend its modification mechanisms and potential epigenetic regulation, it is necessary to accurately identify 2-OM sites. However, biological experiments can be tedious, time-consuming, and expensive. Furthermore, currently available computational methods face challenges due to inadequate datasets and limited classification capabilities. To address these challenges, we proposed Meta-2OM, a cutting-edge predictor that can accurately identify 2-OM sites in human RNA. In brief, we applied a meta-learning approach that considered eight conventional machine learning algorithms, including tree-based classifiers and decision boundary-based classifiers, and eighteen different feature encoding algorithms that cover physicochemical, compositional, position-specific and natural language processing information. The predicted probabilities of 2-OM sites from the baseline models are then combined and trained using logistic regression to generate the final prediction. Consequently, Meta-2OM achieved excellent performance in both 5-fold cross-validation training and independent testing, outperforming all existing state-of-the-art methods. Specifically, on the independent test set, Meta-2OM achieved an overall accuracy of 0.870, sensitivity of 0.836, specificity of 0.904, and Matthews correlation coefficient of 0.743. To facilitate its use, a user-friendly web server and standalone program have been developed and are freely available at http://kurata35.bio.kyutech.ac.jp/Meta-2OM and https://github.com/kuratahiroyuki/Meta-2OM.
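The final step described in this abstract, in which baseline-model probabilities are combined and trained with logistic regression, can be illustrated with a minimal sketch. This is not the authors' implementation: the three baseline models, their toy probabilities, the learning rate, and the epoch count are all assumptions chosen for illustration, and the gradient-descent fit stands in for whatever solver the published tool uses. In practice the meta-features would normally be out-of-fold predictions so that the meta-learner does not see probabilities produced on each model's own training data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_meta_learner(meta_features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    meta_features: one row per sample; each column holds the probability
    predicted by one baseline model (hypothetical toy values below).
    """
    n_models = len(meta_features[0])
    weights = [0.0] * n_models
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(meta_features, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y  # gradient of the log-loss with respect to the logit
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(weights, bias, row):
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

# Toy probabilities from three hypothetical baseline models (not study data)
meta_X = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.1],
    [0.3, 0.1, 0.4],
    [0.1, 0.2, 0.3],
]
meta_y = [1, 1, 1, 0, 0, 0]

weights, bias = train_meta_learner(meta_X, meta_y)
final = [predict_proba(weights, bias, row) for row in meta_X]
predicted = [1 if p >= 0.5 else 0 for p in final]
```

The learned weights indicate how much the meta-learner relies on each baseline model, which is one reason a linear meta-classifier is a common choice for stacking.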

Subjects

Algorithms , RNA , Humans , RNA/genetics , RNA/chemistry , Methylation , Machine Learning , Software , Computational Biology/methods

2.

APLpred: A machine learning-based tool for accurate prediction and characterization of asparagine peptide lyases using sequence-derived optimal features.

Malik, Adeel; Kamli, Majid Rasool; Sabir, Jamal S M; Rather, Irfan Ahmad; Phan, Le Thi; Kim, Chang-Bae; Manavalan, Balachandran.

Methods ; 2024 Jun 27.

Article in English | MEDLINE | ID: mdl-38944134

ABSTRACT

Asparagine peptide lyase (APL) is among the seven groups of proteases, also known as proteolytic enzymes, which are classified according to their catalytic residue. APLs are synthesized as precursors or propeptides that undergo self-cleavage through an autoproteolytic reaction. At present, APLs are grouped into 10 families belonging to six different clans of proteases. Given their critical roles in many biological processes, including virus maturation and virulence, accurate identification and characterization of APLs is indispensable. Experimental identification and characterization of APLs is laborious and time-consuming. Here, we developed APLpred, a novel support vector machine (SVM) based predictor that can predict APLs from the primary sequences. APLpred was developed using Boruta-based optimal features derived from seven encodings and subsequently trained using five machine learning algorithms. After evaluating each model on an independent dataset, we selected APLpred (an SVM-based model) due to its consistent performance during cross-validation and independent evaluation. We anticipate APLpred will be an effective tool for identifying APLs. This could aid in designing inhibitors against these enzymes and exploring their functions. The APLpred web server is freely available at https://procarb.org/APLpred/.

3.

SEP-AlgPro: An efficient allergen prediction tool utilizing traditional machine learning and deep learning techniques with protein language model features.

Basith, Shaherin; Pham, Nhat Truong; Manavalan, Balachandran; Lee, Gwang.

Int J Biol Macromol ; 273(Pt 2): 133085, 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38871100

ABSTRACT

Allergy is a hypersensitive condition in which individuals develop objective symptoms when exposed to harmless substances at a dose that would cause no harm to a "normal" person. Most current computational methods for allergen identification rely on homology or conventional machine learning using a limited set of feature descriptors or validation on specific datasets, making them inefficient and inaccurate. Here, we propose SEP-AlgPro for the accurate identification of allergen proteins from sequence information. We analyzed 10 conventional protein-based features and 14 different features derived from protein language models to gauge their effectiveness in differentiating allergens from non-allergens using 15 different classifiers. However, the final optimized model employs the top 10 feature descriptors with the top seven machine learning classifiers. Results show that the features derived from protein language models exhibit superior discriminative capabilities compared to traditional feature sets. This enabled us to select the most discriminatory baseline models, whose predicted outputs were aggregated and used as input to a deep neural network for the final allergen prediction. Extensive case studies showed that SEP-AlgPro outperforms state-of-the-art predictors in accurately identifying allergens. A user-friendly web server was developed and made freely available at https://balalab-skku.org/SEP-AlgPro/, making it a powerful tool for identifying potential allergens.

4.

MLm5C: A high-precision human RNA 5-methylcytosine sites predictor based on a combination of hybrid machine learning models.

Kurata, Hiroyuki; Harun-Or-Roshid, Md; Mehedi Hasan, Md; Tsukiyama, Sho; Maeda, Kazuhiro; Manavalan, Balachandran.

Methods ; 227: 37-47, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38729455

ABSTRACT

RNA modification serves as a pivotal component in numerous biological processes. Among the prevalent modifications, 5-methylcytosine (m5C) significantly influences mRNA export, translation efficiency and cell differentiation and is also associated with human diseases, including Alzheimer's disease, autoimmune disease, cancer, and cardiovascular diseases. Identification of m5C is critical for understanding the RNA modification mechanisms and the epigenetic regulation of associated diseases. However, the large-scale experimental identification of m5C presents significant challenges due to labor intensity and time requirements. Several computational tools, using machine learning, have been developed to supplement experimental methods, but they lack accuracy and efficiency in identifying these sites. In this study, we introduce a new predictor, MLm5C, for precise prediction of m5C sites using sequence data. Briefly, we evaluated eleven RNA sequence-derived features with four basic machine learning algorithms to generate baseline models. We ranked these 44 models based on their performance and subsequently stacked the Top 20 baseline models as the best model, named MLm5C. MLm5C outperformed the state-of-the-art predictors. Notably, the optimization of the sequence length surrounding the modification sites significantly improved the prediction performance. MLm5C is an invaluable tool in accelerating the detection of m5C sites within the human genome, thereby facilitating the characterization of their roles in post-transcriptional regulation.
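The model-building scheme in this abstract (eleven features crossed with four algorithms, giving 44 baseline models, of which the top 20 are stacked) can be sketched as a simple enumerate-rank-select step. The feature and algorithm names below are placeholders, since the abstract reports only the counts, and `placeholder_cv_score` is a hypothetical stand-in for a cross-validated performance measure rather than any result from the study.

```python
from itertools import product

# Placeholder names: the abstract gives only the counts (11 features, 4 algorithms)
features = [f"feature_{i}" for i in range(1, 12)]
algorithms = [f"algorithm_{j}" for j in range(1, 5)]

def placeholder_cv_score(feature, algorithm):
    """Hypothetical stand-in for a cross-validated score; not real results."""
    i = int(feature.split("_")[1])
    j = int(algorithm.split("_")[1])
    return round(0.60 + 0.02 * i + 0.01 * j, 3)

# One baseline model per (feature, algorithm) pair: 11 x 4 = 44
baseline_models = list(product(features, algorithms))

# Rank by score, best first, and keep the 20 highest-ranked models for stacking
ranked = sorted(baseline_models, key=lambda m: placeholder_cv_score(*m), reverse=True)
top_20 = ranked[:20]
```

The same pattern would describe other grids in this listing, such as 6 classifiers with 11 encodings (66 models) or 11 classifiers with 16 descriptors (176 models).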

Subjects

5-Methylcytosine , Machine Learning , RNA , Humans , 5-Methylcytosine/metabolism , 5-Methylcytosine/chemistry , RNA/genetics , RNA/chemistry , RNA/metabolism , Computational Biology/methods , RNA Processing, Post-Transcriptional , Algorithms

5.

Computational prediction of phosphorylation sites of SARS-CoV-2 infection using feature fusion and optimization strategies.

Sabir, Mumdooh J; Kamli, Majid Rasool; Atef, Ahmed; Alhibshi, Alawiah M; Edris, Sherif; Hajarah, Nahid H; Bahieldin, Ahmed; Manavalan, Balachandran; Sabir, Jamal S M.

Methods ; 229: 1-8, 2024 May 18.

Article in English | MEDLINE | ID: mdl-38768932

ABSTRACT

SARS-CoV-2's global spread has instigated a critical health and economic emergency, impacting countless individuals. Understanding the virus's phosphorylation sites is vital to unravel the molecular intricacies of the infection and subsequent changes in host cellular processes. Several computational methods have been proposed to identify phosphorylation sites, typically focusing on specific residue (S/T) or Y phosphorylation sites. Unfortunately, current predictive tools perform best on these specific residues and may not extend their efficacy to other residues, emphasizing the urgent need for enhanced methodologies. In this study, we developed a novel predictor that integrates phosphorylation site information for all three residues (S, T, and Y). We extracted ten different feature descriptors, primarily derived from composition, evolutionary, and position-specific information, and assessed their discriminative power through five classifiers. Our results indicated that Light Gradient Boosting (LGB) showed superior performance, and five descriptors displayed excellent discriminative capabilities. Subsequently, we found that the top two integrated features have high discriminative capability and trained them with LGB to develop the final prediction model, LGB-IPs. The proposed approach shows excellent performance on 10-fold cross-validation, with ACC, MCC, and AUC values of 0.831, 0.662, and 0.907, respectively. Notably, these performances are replicated in the independent evaluation. Consequently, our approach may provide valuable insights into the phosphorylation mechanisms in SARS-CoV-2 infection for biomedical researchers.
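For readers unfamiliar with the three metrics reported here, the sketch below computes accuracy (ACC), the Matthews correlation coefficient (MCC), and the area under the ROC curve (AUC) from their standard definitions. The labels and scores are hypothetical toy values, not data from the study, and the 0.5 decision threshold is an assumption.

```python
import math

def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy labels and scores, not data from the study
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), mcc(tp, tn, fp, fn), auc(y_true, scores))
```

ACC and MCC depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the three are usually reported together.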

6.

ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning.

Pham, Nhat Truong; Terrance, Annie Terrina; Jeon, Young-Jun; Rakkiyappan, Rajan; Manavalan, Balachandran.

Mol Ther Nucleic Acids ; 35(2): 102192, 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38779332

ABSTRACT

RNA N4-acetylcytidine (ac4C) is a highly conserved RNA modification that plays a crucial role in controlling mRNA stability, processing, and translation. Consequently, accurate identification of ac4C sites across the genome is critical for understanding gene expression regulation mechanisms. In this study, we have developed ac4C-AFL, a bioinformatics tool that precisely identifies ac4C sites from primary RNA sequences. In ac4C-AFL, we identified the optimal sequence length for model building and implemented an adaptive feature representation strategy that is capable of extracting the most representative features from RNA. To identify the most relevant features, we proposed a novel ensemble feature importance scoring strategy to rank features effectively. We then used this information to conduct the sequential forward search, which individually determines the optimal feature set from the 16 sequence-derived feature descriptors. Utilizing these optimal feature descriptors, we constructed 176 baseline models using 11 popular classifiers. The most efficient baseline models were identified using the two-step feature selection approach, whose predicted scores were integrated and trained with the appropriate classifier to develop the final prediction model. Our rigorous cross-validations and independent tests demonstrate that ac4C-AFL surpasses contemporary tools in predicting ac4C sites. Moreover, we have developed a publicly accessible web server at https://balalab-skku.org/ac4C-AFL/.
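The combination of feature ranking followed by a sequential forward search can be illustrated with a short sketch. It assumes one common reading of the procedure: features are added one at a time in ranked order, each growing subset is evaluated, and the best-scoring prefix is kept. The abstract does not give these details, so the feature names, the `useful` contributions, and `toy_evaluate` are all hypothetical, and in a real pipeline the evaluation function would be a cross-validated model score.

```python
def sequential_forward_search(ranked_features, evaluate):
    """Grow the feature set one ranked feature at a time; keep the best prefix."""
    best_subset, best_score = [], float("-inf")
    current = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score

# Hypothetical importance-ranked features and their toy contributions
ranked = ["f1", "f2", "f3", "f4", "f5"]
useful = {"f1": 0.30, "f2": 0.20, "f3": 0.10, "f4": 0.0, "f5": 0.0}

def toy_evaluate(subset):
    # Reward informative features and slightly penalize every extra feature
    return sum(useful[f] for f in subset) - 0.02 * len(subset)

subset, score = sequential_forward_search(ranked, toy_evaluate)
```

With these toy values the search keeps the first three features, because adding the uninformative ones only incurs the size penalty. Repeating the search once per descriptor would match the abstract's statement that the optimal set is determined individually for each of the 16 descriptors.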

7.

CODENET: A deep learning model for COVID-19 detection.

Ju, Hong; Cui, Yanyan; Su, Qiaosen; Juan, Liran; Manavalan, Balachandran.

Comput Biol Med ; 171: 108229, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38447500

ABSTRACT

Conventional COVID-19 testing methods have some flaws: they are expensive and time-consuming. Chest X-ray (CXR) diagnostic approaches can alleviate these flaws to some extent. However, there is no accurate and practical automatic diagnostic framework with good interpretability. The application of artificial intelligence (AI) technology to medical radiography can help to accurately detect the disease, reduce the burden on healthcare organizations, and provide good interpretability. Therefore, this study proposes a new deep neural network (CNN) based on CXR for COVID-19 diagnosis - CodeNet. This method uses contrastive learning to make full use of latent image data to enhance the model's ability to extract features and generalize across different data domains. On the evaluation dataset, the proposed method achieves an accuracy as high as 94.20%, outperforming several other existing methods used for comparison. Ablation studies validate the efficacy of the proposed method, while interpretability analysis shows that the method can effectively guide clinical professionals. This work demonstrates the superior detection performance of a CNN using contrastive learning techniques on CXR images, paving the way for computer vision and artificial intelligence technologies to leverage massive medical data for disease diagnosis.

Subjects

COVID-19 , Deep Learning , Humans , COVID-19/diagnostic imaging , COVID-19 Testing , Artificial Intelligence , Neural Networks, Computer

8.

Unveiling local and global conformational changes and allosteric communications in SOD1 systems using molecular dynamics simulation and network analyses.

Basith, Shaherin; Manavalan, Balachandran; Lee, Gwang.

Comput Biol Med ; 168: 107688, 2024 01.

Article in English | MEDLINE | ID: mdl-37988788

ABSTRACT

BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a serious neurodegenerative disorder affecting nerve cells in the brain and spinal cord that is caused by mutations in the superoxide dismutase 1 (SOD1) enzyme. ALS-related mutations cause misfolding, dimerisation instability, and increased formation of aggregates. The underlying allosteric mechanisms, however, remain obscure as far as details of their fundamental atomistic structure are concerned. Hence, this gap in knowledge limits the development of novel SOD1 inhibitors and the understanding of how disease-associated mutations in distal sites affect enzyme activity. METHODS: We combined microsecond-scale unbiased molecular dynamics (MD) simulation with network analysis to elucidate the local and global conformational changes and allosteric communications in SOD1 Apo (unmetallated form), Holo, Apo_CallA (mutant and unmetallated form), and Holo_CallA (mutant form) systems. To identify hotspot residues involved in SOD1 signalling and allosteric communications, we performed network centrality, community network, and path analyses. RESULTS: Structural analyses showed that unmetallated SOD1 systems and cysteine mutations displayed large structural variations in the catalytic sites, affecting structural stability. Inter- and intra-H-bond analyses identified several important residues crucial for maintaining interfacial stability, structural stability, and enzyme catalysis. Dynamic motion analysis demonstrated more balanced atomic displacement and highly correlated motions in the Holo system. The rationale for the structural disparity observed in the disulfide bond formation and R143 configuration in Apo and Holo systems was elucidated using distance and dihedral probability distribution analyses. CONCLUSION: Our study highlights the efficiency of combining extensive MD simulations with network analyses to unravel the features of protein allostery.

Subjects

Amyotrophic Lateral Sclerosis , Molecular Dynamics Simulation , Humans , Superoxide Dismutase-1/genetics , Superoxide Dismutase-1/metabolism , Superoxide Dismutase/chemistry , Superoxide Dismutase/genetics , Superoxide Dismutase/metabolism , Amyotrophic Lateral Sclerosis/genetics , Mutation , Protein Folding

9.

Stack-DHUpred: Advancing the accuracy of dihydrouridine modification sites detection via stacking approach.

Harun-Or-Roshid, Md; Maeda, Kazuhiro; Phan, Le Thi; Manavalan, Balachandran; Kurata, Hiroyuki.

Comput Biol Med ; 169: 107848, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38145601

ABSTRACT

Dihydrouridine (DHU, D) is one of the most abundant post-transcriptional uridine modifications found in tRNA, mRNA, and snoRNA, closely associated with disease pathogenesis and various biological processes in eukaryotes. Identifying D sites is important for understanding the modification mechanisms and/or epigenetic regulation. However, biological experiments for detecting D sites are time-consuming and expensive. Given these challenges, computational methods have been developed for accurately identifying the D sites in genome-wide datasets. However, existing methods have some limitations, and their prediction performance needs to be improved. In this work, we have developed a new computational predictor for accurately identifying D sites called Stack-DHUpred. Briefly, we trained 66 baseline models or single-feature models by connecting six machine learning classifiers with eleven different feature encoding methods and stacked different baseline models to build stacked ensemble learning models. Subsequently, the optimal combination of the baseline models was identified for the construction of the final stacked model. Remarkably, the Stack-DHUpred outperformed the existing predictors on our new independent dataset, indicating that the stacking approach significantly improved the prediction performance. We have made Stack-DHUpred available to the public through a web server (http://kurata35.bio.kyutech.ac.jp/Stack-DHUpred) and a standalone program (https://github.com/kuratahiroyuki/Stack-DHUpred). We believe that Stack-DHUpred will be a valuable tool for accelerating the discovery of D modifications and understanding their role in post-transcriptional regulation.

Subjects

Epigenesis, Genetic , Genome , RNA, Messenger , Computational Biology

10.

Advancing the accuracy of SARS-CoV-2 phosphorylation site detection via meta-learning approach.

Pham, Nhat Truong; Phan, Le Thi; Seo, Jimin; Kim, Yeonwoo; Song, Minkyung; Lee, Sukchan; Jeon, Young-Jun; Manavalan, Balachandran.

Brief Bioinform ; 25(1)2023 11 22.

Article in English | MEDLINE | ID: mdl-38058187

ABSTRACT

The worldwide appearance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has generated significant concern and posed a considerable challenge to global health. Phosphorylation is a common post-translational modification that affects many vital cellular functions and is closely associated with SARS-CoV-2 infection. Precise identification of phosphorylation sites could provide more in-depth insight into the processes underlying SARS-CoV-2 infection and help alleviate the continuing COVID-19 crisis. Currently, available computational tools for predicting these sites lack accuracy and effectiveness. In this study, we designed an innovative meta-learning model, Meta-Learning for Serine/Threonine Phosphorylation (MeL-STPhos), to precisely identify protein phosphorylation sites. We initially performed a comprehensive assessment of 29 unique sequence-derived features, establishing prediction models for each using 14 renowned machine learning methods, ranging from traditional classifiers to advanced deep learning algorithms. We then selected the most effective model for each feature by integrating the predicted values. Rigorous feature selection strategies were employed to identify the optimal base models and classifier(s) for each cell-specific dataset. To the best of our knowledge, this is the first study to report two cell-specific models and a generic model for phosphorylation site prediction by utilizing an extensive range of sequence-derived features and machine learning algorithms. Extensive cross-validation and independent testing revealed that MeL-STPhos surpasses existing state-of-the-art tools for phosphorylation site prediction. We also developed a publicly accessible platform at https://balalab-skku.org/MeL-STPhos. We believe that MeL-STPhos will serve as a valuable tool for accelerating the discovery of serine/threonine phosphorylation sites and elucidating their role in post-translational regulation.

Subjects

COVID-19 , SARS-CoV-2 , Humans , Phosphorylation , SARS-CoV-2/metabolism , Serine/metabolism , Threonine/metabolism

11.

ADP-Fuse: A novel two-layer machine learning predictor to identify antidiabetic peptides and diabetes types using multiview information.

Basith, Shaherin; Pham, Nhat Truong; Song, Minkyung; Lee, Gwang; Manavalan, Balachandran.

Comput Biol Med ; 165: 107386, 2023 10.

Article in English | MEDLINE | ID: mdl-37619323

ABSTRACT

Diabetes mellitus has become a major public health concern associated with high mortality and reduced life expectancy and can cause blindness, heart attacks, kidney failure, lower limb amputations, and strokes. A new generation of antidiabetic peptides (ADPs) that act on β-cells or T-cells to regulate insulin production is being developed to alleviate the effects of diabetes. However, the lack of effective peptide-mining tools has hampered the discovery of these promising drugs. Hence, novel computational tools need to be developed urgently. In this study, we present ADP-Fuse, a novel two-layer prediction framework capable of accurately identifying ADPs or non-ADPs and categorizing them into type 1 and type 2 ADPs. First, we comprehensively evaluated 22 peptide sequence-derived features coupled with eight notable machine learning algorithms. Subsequently, the most suitable feature descriptors and classifiers for both layers were identified. The output of these single-feature models, embedded with multiview information, was trained with an appropriate classifier to provide the final prediction. Comprehensive cross-validation and independent tests substantiate that ADP-Fuse surpasses single-feature models and the feature fusion approach for the prediction of ADPs and their types. In addition, the SHapley Additive exPlanation method was used to elucidate the contributions of individual features to the prediction of ADPs and their types. Finally, a user-friendly web server for ADP-Fuse was developed and made publicly accessible (https://balalab-skku.org/ADP-Fuse), enabling the swift screening and identification of novel ADPs and their types. This framework is expected to contribute significantly to antidiabetic peptide identification.

Subjects

Diabetes Mellitus , Hypoglycemic Agents , Peptides , Amino Acid Sequence , Algorithms , Machine Learning , Computational Biology

12.

Protection of c-Fos from autophagic degradation by PRMT1-mediated methylation fosters gastric tumorigenesis.

Kim, Eunji; Rahmawati, Laily; Aziz, Nur; Kim, Han Gyung; Kim, Ji Hye; Kim, Kyung-Hee; Yoo, Byong Chul; Parameswaran, Narayana; Kang, Jong-Sun; Hur, Hoon; Manavalan, Balachandran; Lee, Jongsung; Cho, Jae Youl.

Int J Biol Sci ; 19(12): 3640-3660, 2023.

Article in English | MEDLINE | ID: mdl-37564212

ABSTRACT

Both AP-1 and PRMT1 are vital molecules in a variety of cellular processes, but the interaction between these proteins in the context of cellular functions is less clear. Gastric cancer (GC) is one of the pernicious diseases worldwide. An in-depth understanding of the molecular mode of action underlying gastric tumorigenesis is still elusive. In this study, we found that PRMT1 directly interacts with c-Fos and enhances AP-1 activation. PRMT1-mediated arginine methylation (mono- and dimethylation) of c-Fos synergistically enhances c-Fos-mediated AP-1 activity and consequently increases c-Fos protein stabilization. Consistent with this finding, PRMT1 knockdown decreases the protein level of c-Fos. We discovered that the c-Fos protein undergoes autophagic degradation and found that PRMT1-mediated methylation at R287 protects c-Fos from autophagosomal degradation and is linked to clinicopathologic variables as well as prognosis in stomach tumors. Together, our data demonstrate that PRMT1-mediated c-Fos protein stabilization promotes gastric tumorigenesis. We contend that targeting this modification could constitute a new therapeutic strategy in gastric cancer.

Subjects

Proto-Oncogene Proteins c-fos , Stomach Neoplasms , Humans , Methylation , Proto-Oncogene Proteins c-fos/genetics , Proto-Oncogene Proteins c-fos/metabolism , Stomach Neoplasms/genetics , Transcription Factor AP-1/metabolism , Protein-Arginine N-Methyltransferases/genetics , Protein-Arginine N-Methyltransferases/metabolism , Carcinogenesis/genetics , Cell Transformation, Neoplastic , Arginine , Repressor Proteins/genetics , Repressor Proteins/metabolism

13.

Identification of SH2 domain-containing proteins and motifs prediction by a deep learning method.

Wu, Duanzhi; Fang, Xin; Luan, Kai; Xu, Qijin; Lin, Shiqi; Sun, Shiying; Yang, Jiaying; Dong, Bingying; Manavalan, Balachandran; Liao, Zhijun.

Comput Biol Med ; 162: 107065, 2023 08.

Article in English | MEDLINE | ID: mdl-37267826

ABSTRACT

The Src Homology 2 (SH2) domain plays an important role in the signal transmission mechanism in organisms. It mediates protein-protein interactions based on the combination between phosphotyrosine and motifs in the SH2 domain. In this study, we designed a method to identify SH2 domain-containing proteins and non-SH2 domain-containing proteins through deep learning technology. Firstly, we collected SH2 and non-SH2 domain-containing protein sequences from multiple species. We built six deep learning models through DeepBIO after data preprocessing and compared their performance. Secondly, we selected the model with the strongest comprehensive ability, trained and tested it separately again, and analyzed the results visually. It was found that a 288-dimensional (288D) feature could effectively identify the two types of proteins. Finally, motif analysis discovered the specific motif YKIR and revealed its function in signal transduction. In summary, we successfully identified SH2 domain and non-SH2 domain proteins through a deep learning method, and obtained the 288D features that perform best. In addition, we found a new motif, YKIR, in the SH2 domain and analyzed its function, which helps to further understand the signaling mechanisms within the organism.

Subjects

Deep Learning , src Homology Domains/physiology , Proteins/genetics , Proteins/metabolism , Signal Transduction/physiology , Phosphotyrosine/metabolism , Protein Binding , Binding Sites

14.

DrugormerDTI: Drug Graphormer for drug-target interaction prediction.

Hu, Jiayue; Yu, Wang; Pang, Chao; Jin, Junru; Pham, Nhat Truong; Manavalan, Balachandran; Wei, Leyi.

Comput Biol Med ; 161: 106946, 2023 07.

Article in English | MEDLINE | ID: mdl-37244151

ABSTRACT

Drug-target interaction (DTI) prediction is a crucial task in drug discovery. Existing computational methods accelerate drug discovery in this respect. However, most of them suffer from low feature representation ability, significantly affecting the predictive performance. To address the problem, we propose a novel neural network architecture named DrugormerDTI, which uses Graph Transformer to learn both sequential and topological information through the input molecule graph and Resudual2vec to learn the underlying relation between residues from proteins. By conducting ablation experiments, we verify the importance of each part of DrugormerDTI. We also demonstrate the good feature extraction and expression capabilities of our model by comparing the mapping results of the attention layer with molecular docking results. Experimental results show that our proposed model performs better than baseline methods on four benchmarks. We demonstrate that the introduction of Graph Transformer and the design of residue are appropriate for drug-target prediction.

Subjects

Drug Development , Neural Networks, Computer , Molecular Docking Simulation , Drug Development/methods , Drug Discovery/methods , Proteins/chemistry , Drug Interactions

15.

VirPipe: an easy-to-use and customizable pipeline for detecting viral genomes from Nanopore sequencing.

Kim, Kijin; Park, Kyungmin; Lee, Seonghyeon; Baek, Seung-Hwan; Lim, Tae-Hun; Kim, Jongwoo; Manavalan, Balachandran; Song, Jin-Won; Kim, Won-Keun.

Bioinformatics ; 39(5)2023 05 04.

Article in English | MEDLINE | ID: mdl-37129547

ABSTRACT

Detection and analysis of viral genomes with Nanopore sequencing has shown great promise in the surveillance of pathogen outbreaks. However, the number of virus detection pipelines supporting Nanopore sequencing is very limited. Here, we present VirPipe, a new pipeline for the detection of viral genomes from Nanopore or Illumina sequencing input featuring streamlined installation and customization. AVAILABILITY AND IMPLEMENTATION: VirPipe source code and documentation are freely available for download at https://github.com/KijinKims/VirPipe, implemented in Python and Nextflow.

Subjects

Nanopore Sequencing , Nanopores , Software , Genome, Viral , High-Throughput Nucleotide Sequencing

16.

A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome.

Phan, Le Thi; Oh, Changmin; He, Tao; Manavalan, Balachandran.

Proteomics ; 23(13-14): e2200409, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37021401

ABSTRACT

Enhancers are non-coding DNA elements that play a crucial role in enhancing the transcription rate of a specific gene in the genome. Experiments for identifying enhancers can be restricted by their conditions and involve complicated, time-consuming, laborious, and costly steps. To overcome these challenges, computational platforms have been developed to complement experimental methods that enable high-throughput identification of enhancers. Over the last few years, the development of various enhancer computational tools has resulted in significant progress in predicting putative enhancers. Thus, researchers are now able to use a variety of strategies to enhance and advance enhancer study. In this review, an overview of machine learning (ML)-based prediction methods for enhancer identification and related databases has been provided. The existing enhancer-prediction methods have also been reviewed regarding their algorithms, feature selection processes, validation techniques, and software utility. In addition, the advantages and drawbacks of these ML approaches and guidelines for developing bioinformatic tools have been highlighted for a more efficient enhancer prediction. This review will serve as a useful resource for experimentalists in selecting the appropriate ML tool for their study, and for bioinformaticians in developing more accurate and advanced ML-based predictors.

Subjects

Enhancer Elements, Genetic , Genome, Human , Humans , Computational Biology/methods , Algorithms , Machine Learning

17.

An Effective Integrated Machine Learning Framework for Identifying Severity of Tomato Yellow Leaf Curl Virus and Their Experimental Validation.

Bupi, Nattanong; Sangaraju, Vinoth Kumar; Phan, Le Thi; Lal, Aamir; Vo, Thuy Thi Bich; Ho, Phuong Thi; Qureshi, Muhammad Amir; Tabassum, Marjia; Lee, Sukchan; Manavalan, Balachandran.

Research (Wash D C) ; 6: 0016, 2023.

Article in English | MEDLINE | ID: mdl-36930763

ABSTRACT

Tomato yellow leaf curl virus (TYLCV) has dispersed across different countries, specifically to subtropical regions, where it is associated with more severe symptoms. Since TYLCV was first isolated in 1931, it has been a menace to tomato industrial production worldwide over the past century. Three groups were newly isolated from TYLCV-resistant tomatoes in 2022; however, their functions are unknown. The development of machine learning (ML)-based models using characterized sequences and evaluating blind predictions is one of the major challenges in interdisciplinary research. The purpose of this study was to develop an integrated computational framework for the accurate identification of symptoms (mild or severe) based on TYLCV sequences (isolated in Korea). For the development of the framework, we first extracted 11 different feature encodings and hybrid features from the training data and then explored 8 different classifiers and developed their respective prediction models by using randomized 10-fold cross-validation. Subsequently, we carried out a systematic evaluation of these 96 developed models and selected the top 90 models, whose predicted class labels were combined and considered as reduced features. On the basis of these features, a multilayer perceptron was applied to develop the final prediction model (IML-TYLCVs). We conducted blind prediction on 3 groups using IML-TYLCVs, and the results indicated that 2 groups were severe and 1 group was mild. Furthermore, we confirmed the prediction with virus-challenging experiments of tomato plant phenotypes using infectious clones from 3 groups. Plant virologists and plant breeding professionals can access the user-friendly online IML-TYLCVs web server at https://balalab-skku.org/IML-TYLCVs, which can guide them in developing new protection strategies for newly emerging viruses.

18.

How well does a data-driven prediction method distinguish dihydrouridine from tRNA and mRNA?

Basith, Shaherin; Manavalan, Balachandran.

Mol Ther Nucleic Acids ; 31: 744-745, 2023 Mar 14.

Article in English | MEDLINE | ID: mdl-36937622

19.

PSRQSP: An effective approach for the interpretable prediction of quorum sensing peptide using propensity score representation learning.

Charoenkwan, Phasit; Chumnanpuen, Pramote; Schaduangrat, Nalini; Oh, Changmin; Manavalan, Balachandran; Shoombuatong, Watshara.

Comput Biol Med ; 158: 106784, 2023 May.

Article in English | MEDLINE | ID: mdl-36989748

ABSTRACT

Quorum sensing peptides (QSPs) are microbial signaling molecules involved in several cellular processes, such as cellular communication, virulence expression, bioluminescence, and swarming, in various bacterial species. Understanding QSPs is essential for identifying novel drug targets for controlling bacterial populations and pathogenicity. In this study, we present a novel computational approach (PSRQSP) for improving the prediction and analysis of QSPs. In PSRQSP, we develop a novel propensity score representation learning (PSR) scheme. Specifically, we utilized the PSR approach to extract and learn a comprehensive set of estimated propensities of 20 amino acids, 400 dipeptides, and 400 g-gap dipeptides from a pool of scoring card method-based models. Finally, to maximize the utility of the propensity scores, we explored a set of optimal propensity scores and combined them to construct a final meta-predictor. Our experimental results showed that combining multiview propensity scores was more beneficial for identifying QSPs than the conventional feature descriptors. Moreover, extensive benchmarking experiments based on the independent test were sufficient to demonstrate the predictive capability and effectiveness of PSRQSP by outperforming the conventional ML-based and existing methods, with an accuracy of 94.44% and AUC of 0.967. PSR-derived propensity scores were employed to determine the crucial physicochemical properties for a better understanding of the functional mechanisms of QSPs. Finally, we constructed an easy-to-use web server for the PSRQSP (http://pmlabstack.pythonanywhere.com/PSRQSP). PSRQSP is anticipated to be an efficient computational tool for accelerating the data-driven discovery of potential QSPs for drug discovery and development.
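The descriptor sizes in this abstract (400 dipeptides, 400 g-gap dipeptides) follow from 20 x 20 residue pairs. The sketch below shows one way to compute a g-gap dipeptide composition and a crude propensity estimate. The abstract does not describe how the scoring card method derives or optimizes its propensities, so the difference-of-mean-frequencies rule here is a stand-in assumption, and all function names and toy sequences are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 20 * 20 = 400


def gap_dipeptide_composition(seq, g=0):
    """Frequency of residue pairs separated by g positions (g=0 means adjacent)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    return {dp: (c / total if total else 0.0) for dp, c in counts.items()}


def mean_composition(seqs, g=0):
    """Average the per-sequence compositions over a set of sequences."""
    comps = [gap_dipeptide_composition(s, g) for s in seqs]
    return {dp: sum(c[dp] for c in comps) / len(comps) for dp in DIPEPTIDES}


def initial_propensity(positives, negatives, g=0):
    """Stand-in propensity: mean frequency in positives minus mean frequency in negatives."""
    pos, neg = mean_composition(positives, g), mean_composition(negatives, g)
    return {dp: pos[dp] - neg[dp] for dp in DIPEPTIDES}


def peptide_score(seq, propensity, g=0):
    """Score a peptide as the composition-weighted sum of pair propensities."""
    comp = gap_dipeptide_composition(seq, g)
    return sum(comp[dp] * propensity[dp] for dp in DIPEPTIDES)


# Toy sequences, made up for illustration only.
prop = initial_propensity(["ACAC"], ["GGGG"])
print(peptide_score("ACAC", prop) > peptide_score("GGGG", prop))
```

A peptide resembling the positive set scores higher than one resembling the negative set; a real pipeline would tune the propensities and pick a decision threshold on held-out data.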

Subjects

Peptides, Quorum Sensing, Propensity Score, Peptides/chemistry, Dipeptides/chemistry, Bacteria

20.

Pretoria: An effective computational approach for accurate and high-throughput identification of CD8⁺ T-cell epitopes of eukaryotic pathogens.

Charoenkwan, Phasit; Schaduangrat, Nalini; Pham, Nhat Truong; Manavalan, Balachandran; Shoombuatong, Watshara.

Int J Biol Macromol ; 238: 124228, 2023 May 31.

Article in English | MEDLINE | ID: mdl-36996953

ABSTRACT

T-cells recognize antigenic epitopes present on major histocompatibility complex (MHC) molecules, triggering an adaptive immune response in the host. T-cell epitope (TCE) identification is challenging because of the extensive number of undetermined proteins found in eukaryotic pathogens, as well as MHC polymorphisms. In addition, conventional experimental approaches for TCE identification are time-consuming and expensive. Thus, computational approaches that can accurately and rapidly identify CD8+ T-cell epitopes (TCEs) of eukaryotic pathogens based solely on sequence information may facilitate the discovery of novel CD8+ TCEs in a cost-effective manner. Here, Pretoria (Predictor of CD8+ TCEs of eukaryotic pathogens) is proposed as the first stack-based approach for accurate and large-scale identification of CD8+ TCEs of eukaryotic pathogens. In particular, Pretoria enabled the extraction and exploration of crucial information embedded in CD8+ TCEs by employing a comprehensive set of 12 well-known feature descriptors extracted from multiple groups, including physicochemical properties, composition-transition-distribution, pseudo-amino acid composition, and amino acid composition. These feature descriptors were then utilized to construct a pool of 144 different machine learning (ML)-based classifiers based on 12 popular ML algorithms. Finally, the feature selection method was used to effectively determine the important ML classifiers for the construction of our stacked model. The experimental results indicated that Pretoria is an accurate and effective computational approach for CD8+ TCE prediction; it was superior to several conventional ML classifiers and the existing method in terms of the independent test, with an accuracy of 0.866, MCC of 0.732, and AUC of 0.921. Additionally, to maximize user convenience for high-throughput identification of CD8+ TCEs of eukaryotic pathogens, a user-friendly web server of Pretoria (http://pmlabstack.pythonanywhere.com/Pretoria) was developed and made freely available.
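The accuracy and MCC figures quoted in this abstract are both derived from a binary confusion matrix. As a minimal sketch of the standard definitions, the functions below compute the two metrics from counts of true and false positives and negatives. The counts in the usage lines are toy values chosen for illustration; they are not the paper's confusion matrix, which the abstract does not report.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy counts (hypothetical, for illustration only).
print(accuracy(tp=90, tn=80, fp=20, fn=10))
print(round(mcc(tp=90, tn=80, fp=20, fn=10), 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is commonly reported alongside accuracy for classifiers evaluated on imbalanced data.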

Subjects

T-Lymphocyte Epitopes, Eukaryota, South Africa, CD8-Positive T-Lymphocytes, Algorithms, Proteins, Amino Acids/chemistry, Computational Biology
