MLDSPP: Bacterial Promoter Prediction Tool Using DNA Structural Properties with Machine Learning and Explainable AI.
Paul, Subhojit; Olymon, Kaushika; Martinez, Gustavo Sganzerla; Sarkar, Sharmilee; Yella, Venkata Rajesh; Kumar, Aditya.
Affiliation
  • Paul S; Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur 784028, Assam, India.
  • Olymon K; Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur 784028, Assam, India.
  • Martinez GS; Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada.
  • Sarkar S; Pediatrics, Izaak Walton Killam (IWK) Health Center, Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada.
  • Yella VR; Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur 784028, Assam, India.
  • Kumar A; Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur 522302, Andhra Pradesh, India.
J Chem Inf Model; 64(7): 2705-2719, 2024-04-08.
Article in En | MEDLINE | ID: mdl-38258978
ABSTRACT
Bacterial promoters play a crucial role in gene expression by serving as docking sites for the transcription initiation machinery. However, accurately identifying promoter regions in bacterial genomes remains a challenge due to their diverse architecture and variations. In this study, we propose MLDSPP (Machine Learning and Duplex Stability based Promoter prediction in Prokaryotes), a machine learning-based promoter prediction tool, to comprehensively screen bacterial promoter regions in 12 diverse genomes. We leveraged biologically relevant and informative DNA structural properties, such as DNA duplex stability and base stacking, and state-of-the-art machine learning (ML) strategies to gain insights into promoter characteristics. We evaluated several machine learning models, including Support Vector Machines, Random Forests, and XGBoost, and assessed their performance using accuracy, precision, recall, specificity, F1 score, and MCC metrics. Our findings reveal that XGBoost outperformed other models and current state-of-the-art promoter prediction tools, namely Sigma70pred and iPromoter2L, achieving F1-scores >95% in most systems. Significantly, the use of one-hot encoding for representing nucleotide sequences complements these structural features, enhancing our XGBoost model's predictive capabilities. To address the challenge of model interpretability, we incorporated explainable AI techniques using Shapley values. This enhancement allows for a better understanding and interpretation of the predictions of our model. In conclusion, our study presents MLDSPP as a novel, generic tool for predicting promoter regions in bacteria, utilizing original downstream sequences as nonpromoter controls. This tool has the potential to significantly advance the field of bacterial genomics and contribute to our understanding of gene regulation in diverse bacterial systems.
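The abstract names one-hot encoding as a sequence representation and F1 score and MCC among the evaluation metrics. The following is a minimal illustrative sketch of those two ideas in plain Python, not the authors' implementation: the function names, the A/C/G/T column order, and the all-zeros handling of ambiguous bases are assumptions made here for illustration only.

```python
import math

# Assumed column order for the encoding; the paper's actual order is not stated here.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Flatten a DNA sequence into four 0/1 values per base (A, C, G, T)."""
    vector = []
    for base in sequence.upper():
        # Symbols outside ACGT (e.g. N) become all zeros: an illustrative choice.
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, 1 = promoter, 0 = nonpromoter."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, tn, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a pipeline like the one described, vectors of this kind would be concatenated with structural-property features before being passed to a classifier; how MLDSPP combines them is not specified in this record.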
Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Tool Use Behavior Type of study: Prognostic_studies / Risk_factors_studies Language: En Journal: J Chem Inf Model Journal subject: MEDICAL INFORMATICS / CHEMISTRY Year of publication: 2024 Document type: Article Country of affiliation: India Country of publication: United States
