Results 1 - 8 of 8
1.
J Chem Inf Model ; 64(7): 2705-2719, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38258978

ABSTRACT

Bacterial promoters play a crucial role in gene expression by serving as docking sites for the transcription initiation machinery. However, accurately identifying promoter regions in bacterial genomes remains a challenge due to their diverse architecture and variations. In this study, we propose MLDSPP (Machine Learning and Duplex Stability based Promoter prediction in Prokaryotes), a machine learning-based promoter prediction tool, to comprehensively screen bacterial promoter regions in 12 diverse genomes. We leveraged biologically relevant and informative DNA structural properties, such as DNA duplex stability and base stacking, and state-of-the-art machine learning (ML) strategies to gain insights into promoter characteristics. We evaluated several machine learning models, including Support Vector Machines, Random Forests, and XGBoost, and assessed their performance using accuracy, precision, recall, specificity, F1 score, and MCC metrics. Our findings reveal that XGBoost outperformed other models and current state-of-the-art promoter prediction tools, namely Sigma70pred and iPromoter2L, achieving F1-scores >95% in most systems. Significantly, the use of one-hot encoding for representing nucleotide sequences complements these structural features, enhancing our XGBoost model's predictive capabilities. To address the challenge of model interpretability, we incorporated explainable AI techniques using Shapley values. This enhancement allows for a better understanding and interpretation of the predictions of our model. In conclusion, our study presents MLDSPP as a novel, generic tool for predicting promoter regions in bacteria, utilizing original downstream sequences as nonpromoter controls. This tool has the potential to significantly advance the field of bacterial genomics and contribute to our understanding of gene regulation in diverse bacterial systems.


Subject(s)
Tool Use Behavior , Bacteria/genetics , DNA/genetics , Machine Learning , Promoter Regions, Genetic
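The abstract above mentions one-hot encoding of nucleotide sequences and evaluation by accuracy, precision, recall, specificity, F1 score, and MCC. The following is a minimal sketch of both ideas, not the MLDSPP implementation: the function names, the A/C/G/T column order, and the all-zero encoding for ambiguous bases are assumptions made for illustration.

```python
import math

# Assumed column order for the one-hot vectors (hypothetical choice).
BASES = "ACGT"


def one_hot_encode(seq):
    """Flatten a DNA sequence into a list of 0/1 values, 4 per base.

    Bases outside A/C/G/T (e.g. N) become four zeros; this is an
    illustrative convention, not one stated in the abstract.
    """
    encoded = []
    for base in seq.upper():
        encoded.extend(1 if base == b else 0 for b in BASES)
    return encoded


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }
```

In a setup like the one described, the encoded vector would be concatenated with structural features before training, and the metrics would be computed on held-out promoter and nonpromoter sequences.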
2.
ACS Omega ; 8(38): 34499-34515, 2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37779998

ABSTRACT

The transcriptional regulator PehR regulates the synthesis of the extracellular plant cell wall-degrading enzyme polygalacturonase, which is essential in the bacterial wilt of plants caused by one of the most devastating plant phytopathogens, Ralstonia solanacearum. The bacterium has a wide global distribution infecting many different plant species, resulting in massive agricultural and economic losses. Because the PehR molecular structure has not yet been determined and the structural consequences of PehR on ligand binding have not been thoroughly investigated, we have used an in silico approach combined with in vitro experiments for the first time to characterize the PehR regulator from a local isolate (Tezpur, Assam, India) of the phytopathogenic bacterium R. solanacearum F1C1. In this study, an in silico approach was employed to model the 3D structure of the PehR regulator, followed by the binding analysis of different ligands against this regulatory protein. Molecular docking studies suggest that ATP has the highest binding affinity for the PehR regulator. By using molecular dynamics (MD) simulation analysis, involving root-mean-square deviation, root-mean-square fluctuations, hydrogen bonding, radius of gyration, solvent-accessible surface area, and principal component analysis, it was possible to confirm the sudden conformational changes of the PehR regulator caused by the presence of ATP. We used an in vitro approach to further validate the formation of the PehR-ATP complex. In this approach, recombinant DNA technology was used to clone, express, and purify the gene encoding the PehR regulator from R. solanacearum F1C1. Purified PehR was used in ATP-binding experiments using fluorescence spectroscopy and Fourier transform infrared spectroscopy, the outcomes of which showed a potent binding to ATP. 
The putative PehR-ATP-binding analysis revealed the importance of the amino acids Lys190, Glu191, Arg192, Arg375, and Asp378 for the ATP-binding process, but further study is required to confirm this. It will be simpler to comprehend the catalytic mechanisms of a crucial PehR regulator process in R. solanacearum with the aid of the ATP-binding process hints provided by these structural biology applications.

3.
J Microbiol Methods ; 207: 106707, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36931327

ABSTRACT

For enumerating viable bacteria, traditional dilution plating to count colony forming units (CFUs) has always been the preferred method in microbiology owing to its simplicity, albeit being laborious and time-consuming. Similar CFU counts can be obtained by quantifying growing micro-colonies in conjunction with the benefits of a microscope. Here, we employed a simple method of five to ten microliter spotting of a diluted bacterial culture multiple times on a single Petri dish followed by determining CFU by counting micro-colonies using a phase-contrast microscope. In this method, the CFU of an Escherichia coli culture can be estimated within a four-hour period after spotting. Further, within a ten-hour period after spotting, CFU in a culture of Ralstonia solanacearum, a bacterium with a generation time of around 2 h, can be estimated. The CFU number determined by micro-colonies observed for 10^6-fold dilutions or lower is similar to that obtained by the dilution plating method for 10^7-fold dilutions or lower. Micro-colony numbers observed in the early hours of growth (2 h in case of E. coli and 8 h in case of R. solanacearum) were found to remain consistent at later hours (4 h in case of E. coli and 10 h in case of R. solanacearum), where the visibility of the colonies was better due to a noticeable increase in the size of the colonies. This suggested that micro-colonies observed in the early hours indeed represent the bacterial number in the culture. Practical applications to this counting method were employed in studying the rifampicin-resistant mutation rate as well as performing a fluctuation test in E. coli. The spotting method described here to enumerate bacterial CFU results in reduction of labour, time and resources.


Subject(s)
Bacteria , Escherichia coli , Colony Count, Microbial , Stem Cells
4.
Sci Rep ; 13(1): 1763, 2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36720898

ABSTRACT

Archaea are a vast and unexplored cellular domain that thrive in a high diversity of environments, having central roles in processes mediating global carbon and nutrient fluxes. For these organisms to balance their metabolism, the appropriate regulation of their gene expression is essential. A key momentum in regulating genes responsible for the life maintenance of archaea is when transcription factor proteins bind to the promoter element. This DNA segment is conserved, which enables its exploration by machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were coded into DNA Duplex Stability. After, we performed a model interpretation task to map the decision pattern of the classification procedure. We also used a dataset of known-promoter sequences for validation. Our results showed that an AT rich region around position -27 upstream (relative to the start TSS) is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (-33), the PPE (at -10) and a position at +3, that provides a more understandable picture of how promoters are organized in all the archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences of 135 unannotated organisms, delivering regulatory regions annotation of archaea in a scale never accomplished before (https://pcyt.unam.mx/gene-regulation/). We consider that this approach will be useful to understand how gene regulation is achieved in other organisms apart from the already established transcription factor binding sites.


Subject(s)
Artificial Intelligence , Machine Learning , Archaea/genetics , Promoter Regions, Genetic , Transcription Factors/genetics
5.
BMC Bioinformatics ; 23(1): 171, 2022 May 10.
Article in English | MEDLINE | ID: mdl-35538405

ABSTRACT

BACKGROUND: Archaea are a vast and unexplored domain. Bioinformatic techniques might enlighten the path to a higher quality genome annotation in varied organisms. Promoter sequences of archaea have the action of a plethora of proteins upon it. The conservation found in a structural level of the binding site of proteins such as TBP, TFB, and TFE aids RNAP-DNA stabilization and makes the archaeal promoter prone to be explored by statistical and machine learning techniques. RESULTS AND DISCUSSIONS: In this study, experimentally verified promoter sequences of the organisms Haloferax volcanii, Sulfolobus solfataricus, and Thermococcus kodakarensis were converted into DNA duplex stability attributes (i.e. numerical variables) and were classified through Artificial Neural Networks and an in-house statistical method of classification, being tested with three forms of controls. The recognition of these promoters enabled its use to validate unannotated promoter sequences in other organisms. As a result, the binding site of basal transcription factors was located through a DNA duplex stability codification. Additionally, the classification presented satisfactory results (above 90%) among varied levels of control. CONCLUDING REMARKS: The classification models were employed to perform genomic annotation into the archaea Aciduliprofundum boonei and Thermofilum pendens, from which potential promoters have been identified and uploaded into public repositories.


Subject(s)
Archaea , Archaeal Proteins , Archaea/genetics , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Machine Learning , Promoter Regions, Genetic , Transcription, Genetic
6.
Microbiologyopen ; 10(5): e1230, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34713600

ABSTRACT

The transcription machinery of archaea can be roughly classified as a simplified version of eukaryotic organisms. The basal transcription factor machinery binds to the TATA box found around 28 nucleotides upstream of the transcription start site; however, some transcription units lack a clear TATA box and still have TBP/TFB binding over them. This apparent absence of conserved sequences could be a consequence of sequence divergence associated with the upstream region, operon, and gene organization. Furthermore, earlier studies have found that a structural analysis gains more information compared with a simple sequence inspection. In this work, we evaluated and coded 3630 archaeal promoter sequences of three organisms, Haloferax volcanii, Thermococcus kodakarensis, and Sulfolobus solfataricus into DNA duplex stability, enthalpy, curvature, and bendability parameters. We also split our dataset into conserved TATA and degenerated TATA promoters to identify differences among these two classes of promoters. The structural analysis reveals variations in archaeal promoter architecture, that is, a distinctive signal is observed in the TFB, TBP, and TFE binding sites independently of these being TATA-conserved or TATA-degenerated. In addition, the promoter encountering method was validated with upstream regions of 13 other archaea, suggesting that there might be promoter sequences among them. Therefore, we suggest a novel method for locating promoters within the genome of archaea based on DNA energetic/structural features.


Subject(s)
Archaea/genetics , DNA, Archaeal , Genome, Archaeal , Nucleic Acid Conformation , Promoter Regions, Genetic , TATA Box , Base Sequence , Computational Biology/methods , Protein Binding , Transcription Initiation Site , Transcription, Genetic
7.
FEBS Lett ; 595(19): 2504-2521, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34387867

ABSTRACT

Nucleoid-associated proteins (NAPs) maintain bacterial nucleoid configuration through their architectural properties of DNA bending, wrapping, and bridging. However, the contribution of DNA structural alterations to DNA-NAP recognition at the genomic scale remains unresolved. Present work dissects the DNA sequence, shape and altered structural preferences at a genomic scale for six NAPs in Mycobacterium tuberculosis. Results suggest narrower minor groove width (MGW) and higher DNA rigidity are marked for the binding sites of EspR and Lsr2, while mIHF, MtHU and NapM have heterogeneous DNA structural predilections. In contrast, WhiB4-DNA-binding sites were characterized by wider MGW, highly deformable and less curved DNA. This work provides systematic insight into NAP-mediated genome organization as a function of DNA structural features.


Subject(s)
Bacterial Proteins/metabolism , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Genomics , Mycobacterium tuberculosis/genetics , Mycobacterium tuberculosis/metabolism , Binding Sites , DNA, Bacterial/metabolism , Gene Expression Regulation, Bacterial
8.
Biochimie ; 184: 40-51, 2021 May.
Article in English | MEDLINE | ID: mdl-33548392

ABSTRACT

The role of G-quadruplexes in the cellular physiology of human pathogenesis is an intriguing area of research. Nonetheless, their functional roles and evolutionary conservation have not been compared comprehensively in pathogenic forms of various bacterial genera and species. In the current in silico study, we addressed the role of G-quadruplex-forming sequences (G4 motifs) in the context of cis-regulation, expression variation, regulatory networks, gene orthology and ontology. Genome-wide screening across seven pathogenic genomes using the G4Hunter tool revealed the significant prevalence of G4 motifs in cis-regulatory regions compared to the intragenic regions. Significant conservation of G4 motifs was observed in the regulatory region of 300 orthologous genes. Further analysis of published ChIP-Seq data (Minch et al., 2015) of 91 DNA-binding proteins of the M. tuberculosis genome revealed significant links between G4 motifs and target sites of transcriptional regulators. Interestingly, the transcription factors entangled with virulence, in specific, CsoR, Rv0081, DevR/DosR, and TetR family are found to have G4 motifs in their target regulatory regions. Overall the current study applies positional-functional relationship computation to delve into the cis-regulation of G-quadruplex structures in the context of gene orthology in pathogenic bacteria.


Subject(s)
Bacteria/genetics , Computer Simulation , G-Quadruplexes , Genome, Bacterial , Regulatory Sequences, Nucleic Acid , Bacteria/pathogenicity
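The genome-wide screen in the abstract above relies on G4Hunter, which scores G-richness and G-skewness together. The following is a simplified sketch of that scoring scheme as it is generally described (each G in a run scores the run length capped at 4, each C scores the negative equivalent, A and T score 0, and scores are averaged over a sliding window). It is not the G4Hunter tool itself; the function names and the default window of 25 are assumptions for illustration, and the abstract does not state the window or threshold used in the study.

```python
def g4hunter_base_scores(seq):
    """Assign a per-base score: +min(run, 4) for G runs, -min(run, 4) for C runs."""
    seq = seq.upper()
    scores = []
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        # Extend j to the end of the current run of identical bases.
        while j < len(seq) and seq[j] == base:
            j += 1
        run = j - i
        if base == "G":
            value = min(run, 4)
        elif base == "C":
            value = -min(run, 4)
        else:
            value = 0
        scores.extend([value] * run)
        i = j
    return scores


def g4hunter_window_scores(seq, window=25):
    """Mean per-base score for every window of length `window`."""
    scores = g4hunter_base_scores(seq)
    if len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Windows whose absolute mean score exceeds a chosen threshold would be reported as candidate G4 motifs, with the sign indicating which strand carries the G-rich sequence.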