Results 1 - 8 of 8
1.
J Chem Inf Model ; 64(7): 2705-2719, 2024 04 08.
Article in English | MEDLINE | ID: mdl-38258978

ABSTRACT

Bacterial promoters play a crucial role in gene expression by serving as docking sites for the transcription initiation machinery. However, accurately identifying promoter regions in bacterial genomes remains a challenge due to their diverse architecture and variations. In this study, we propose MLDSPP (Machine Learning and Duplex Stability based Promoter prediction in Prokaryotes), a machine learning-based promoter prediction tool, to comprehensively screen bacterial promoter regions in 12 diverse genomes. We leveraged biologically relevant and informative DNA structural properties, such as DNA duplex stability and base stacking, and state-of-the-art machine learning (ML) strategies to gain insights into promoter characteristics. We evaluated several machine learning models, including Support Vector Machines, Random Forests, and XGBoost, and assessed their performance using accuracy, precision, recall, specificity, F1 score, and MCC metrics. Our findings reveal that XGBoost outperformed other models and current state-of-the-art promoter prediction tools, namely Sigma70pred and iPromoter2L, achieving F1-scores >95% in most systems. Significantly, the use of one-hot encoding for representing nucleotide sequences complements these structural features, enhancing our XGBoost model's predictive capabilities. To address the challenge of model interpretability, we incorporated explainable AI techniques using Shapley values. This enhancement allows for a better understanding and interpretation of the predictions of our model. In conclusion, our study presents MLDSPP as a novel, generic tool for predicting promoter regions in bacteria, utilizing original downstream sequences as nonpromoter controls. This tool has the potential to significantly advance the field of bacterial genomics and contribute to our understanding of gene regulation in diverse bacterial systems.
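The abstract combines one-hot encoding with DNA duplex stability features. A minimal sketch of such a combined feature vector follows, assuming the SantaLucia (1998) unified nearest-neighbor ΔG°37 values for duplex stability; whether MLDSPP uses this parameter set, window length, or feature layout is an assumption, and `encode_sequence` is a hypothetical helper, not the tool's API.

```python
# Sketch: one-hot + nearest-neighbor duplex-stability features for a DNA window.
# ASSUMPTION: SantaLucia (1998) unified dG37 values in kcal/mol; MLDSPP's actual
# parameter set and feature layout may differ.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The 10 unique Watson-Crick stacks, written 5'->3' on the top strand.
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


# The remaining 6 dinucleotides are the same physical stack as their reverse
# complement read on the other strand (e.g. TT is the AA/TT stack).
for _dinuc in list(NN_DG37):
    NN_DG37.setdefault(reverse_complement(_dinuc), NN_DG37[_dinuc])


def one_hot(seq: str) -> list[float]:
    """Flattened 4-channel (A, C, G, T) one-hot encoding of each position."""
    return [1.0 if base == b else 0.0 for base in seq for b in "ACGT"]


def duplex_stability_profile(seq: str) -> list[float]:
    """dG37 of each of the len(seq) - 1 dinucleotide steps (more negative = more stable)."""
    return [NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1)]


def encode_sequence(seq: str) -> list[float]:
    """Hypothetical combined feature vector: one-hot block followed by the dG profile."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only unambiguous A/C/G/T sequences are supported")
    return one_hot(seq) + duplex_stability_profile(seq)


if __name__ == "__main__":
    print(len(encode_sequence("AATTGC")))         # 29 (6 * 4 + 5)
    print(duplex_stability_profile("AATTGC"))     # [-1.0, -0.88, -1.0, -1.45, -2.24]
```

Concatenating the two blocks keeps sequence identity and structural stability as separate columns, so a tree model such as XGBoost, and a Shapley-value explanation of it, can attribute importance to each independently.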


Subject(s)
Tool Use Behavior , Bacteria/genetics , DNA/genetics , Machine Learning , Promoter Regions, Genetic
2.
BMC Bioinformatics ; 23(1): 171, 2022 May 10.
Article in English | MEDLINE | ID: mdl-35538405

ABSTRACT

BACKGROUND: Archaea are a vast and unexplored domain. Bioinformatic techniques might light the path toward higher-quality genome annotation in diverse organisms. Archaeal promoter sequences are acted upon by a plethora of proteins. The structural-level conservation of the binding sites of proteins such as TBP, TFB, and TFE aids RNAP-DNA stabilization and makes the archaeal promoter amenable to exploration by statistical and machine learning techniques. RESULTS AND DISCUSSIONS: In this study, experimentally verified promoter sequences of the organisms Haloferax volcanii, Sulfolobus solfataricus, and Thermococcus kodakarensis were converted into DNA duplex stability attributes (i.e. numerical variables) and classified with Artificial Neural Networks and an in-house statistical classification method, tested against three forms of controls. Recognizing these promoters enabled their use to validate unannotated promoter sequences in other organisms. As a result, the binding sites of basal transcription factors were located through a DNA duplex stability encoding. Additionally, the classification achieved satisfactory results (above 90%) across varied levels of control. CONCLUDING REMARKS: The classification models were employed to perform genomic annotation of the archaea Aciduliprofundum boonei and Thermofilum pendens, in which potential promoters have been identified and uploaded to public repositories.
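The idea that basal transcription factor binding sites can be located from a duplex stability encoding can be illustrated with a sliding-window stability profile: AT-rich, TATA-like stretches are less stable (higher ΔG). A minimal sketch, assuming SantaLucia (1998) unified nearest-neighbor ΔG°37 values and a hypothetical 6-step window; it does not reproduce the paper's neural networks, in-house statistics, or controls.

```python
# Sketch: find the least stable (highest mean dG) stretch of a promoter sequence.
# ASSUMPTIONS: SantaLucia (1998) unified dG37 nearest-neighbor values and a
# 6-step sliding mean; the paper's exact encoding and classifiers are not shown.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
# Each remaining dinucleotide is the same stack as its reverse complement.
for _d in list(NN_DG37):
    NN_DG37.setdefault("".join(COMPLEMENT[b] for b in reversed(_d)), NN_DG37[_d])


def windowed_dg(seq: str, window: int = 6) -> list[float]:
    """Mean dG37 over each run of `window` consecutive dinucleotide steps."""
    steps = [NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1)]
    if not 0 < window <= len(steps):
        raise ValueError("window must be between 1 and len(seq) - 1")
    return [sum(steps[i:i + window]) / window for i in range(len(steps) - window + 1)]


def least_stable_window(seq: str, window: int = 6) -> int:
    """0-based start of the window whose mean dG is highest (least negative)."""
    profile = windowed_dg(seq.upper(), window)
    return max(range(len(profile)), key=profile.__getitem__)


if __name__ == "__main__":
    # GC-rich flanks around an AT-rich, TATA-like insert starting at index 10.
    promoter = "GCGCGCGCGC" + "TATAAATA" + "GCGCGCGCGC"
    print(least_stable_window(promoter))  # lands on the TATAAATA insert
```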


Subject(s)
Archaea , Archaeal Proteins , Archaea/genetics , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Machine Learning , Promoter Regions, Genetic , Transcription, Genetic
3.
ACS Omega ; 8(38): 34499-34515, 2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37779998

ABSTRACT

The transcriptional regulator PehR regulates the synthesis of the extracellular plant cell wall-degrading enzyme polygalacturonase, which is essential in the bacterial wilt of plants caused by Ralstonia solanacearum, one of the most devastating phytopathogens. The bacterium has a wide global distribution and infects many different plant species, resulting in massive agricultural and economic losses. Because the PehR molecular structure has not yet been determined and the structural consequences of ligand binding on PehR have not been thoroughly investigated, we combined, for the first time, an in silico approach with in vitro experiments to characterize the PehR regulator from a local isolate (Tezpur, Assam, India) of the phytopathogenic bacterium R. solanacearum F1C1. In this study, an in silico approach was employed to model the 3D structure of the PehR regulator, followed by analysis of the binding of different ligands to this regulatory protein. Molecular docking studies suggest that ATP has the highest binding affinity for the PehR regulator. Molecular dynamics (MD) simulation analysis, involving root-mean-square deviation, root-mean-square fluctuations, hydrogen bonding, radius of gyration, solvent-accessible surface area, and principal component analysis, confirmed the sudden conformational changes of the PehR regulator caused by the presence of ATP. We used an in vitro approach to further validate the formation of the PehR-ATP complex: recombinant DNA technology was used to clone the gene encoding the PehR regulator from R. solanacearum F1C1 and to express and purify the protein. Purified PehR was used in ATP-binding experiments with fluorescence spectroscopy and Fourier transform infrared spectroscopy, which showed potent binding to ATP. The putative PehR-ATP-binding analysis revealed the importance of the amino acids Lys190, Glu191, Arg192, Arg375, and Asp378 for ATP binding, although further study is required to confirm this. The ATP-binding hints provided by these structural biology applications should make it easier to understand the catalytic mechanisms of this crucial PehR regulatory process in R. solanacearum.
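Two of the MD observables named above, RMSD and radius of gyration, reduce to short formulas over atomic coordinates. A minimal NumPy sketch, assuming matched (N, 3) coordinate arrays have already been extracted from the trajectory; it superposes frames with the Kabsch algorithm and is not the authors' analysis pipeline (dedicated MD analysis tools normally handle this).

```python
# Sketch: RMSD after optimal rigid superposition (Kabsch) and radius of gyration.
# ASSUMPTION: inputs are matched (N, 3) NumPy coordinate arrays; real analyses read
# trajectories with an MD package and often select backbone atoms first.
from typing import Optional

import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched coordinate sets p and q after optimal superposition."""
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid an improper rotation (reflection)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rotation mapping p_c onto q_c
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def radius_of_gyration(coords: np.ndarray, masses: Optional[np.ndarray] = None) -> float:
    """Rg = sqrt(sum m_i |r_i - r_com|^2 / sum m_i); equal masses if none given."""
    m = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    com = (coords * m[:, None]).sum(axis=0) / m.sum()
    return float(np.sqrt((m * ((coords - com) ** 2).sum(axis=1)).sum() / m.sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(20, 3))
    c, s = np.cos(np.pi / 5), np.sin(np.pi / 5)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    moved = ref @ rot.T + np.array([3.0, -1.0, 2.0])  # rotated + translated copy
    print(kabsch_rmsd(moved, ref))  # ~0, up to floating-point error
    print(radius_of_gyration(np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])))  # 1.0
```

Superposing before computing RMSD matters: without it, the rigid-body drift of the protein in the box would be counted as conformational change.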

4.
Sci Rep ; 13(1): 1763, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36720898

ABSTRACT

Archaea are a vast and unexplored cellular domain that thrive in a high diversity of environments and play central roles in processes mediating global carbon and nutrient fluxes. For these organisms to balance their metabolism, appropriate regulation of their gene expression is essential. A key moment in regulating the genes responsible for archaeal life maintenance is when transcription factor proteins bind to the promoter element. This DNA segment is conserved, which enables its exploration by machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were encoded as DNA duplex stability values. We then performed a model interpretation task to map the decision pattern of the classification procedure, and we used a dataset of known promoter sequences for validation. Our results showed that an AT-rich region around position -27 upstream (relative to the TSS) is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (-33), the PPE (at -10) and a position at +3, which provides a more understandable picture of how promoters are organized in all the archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences in 135 unannotated organisms, delivering regulatory-region annotation of archaea at a scale never accomplished before (https://pcyt.unam.mx/gene-regulation/). We expect this approach to be useful for understanding how gene regulation is achieved in other organisms beyond the already established transcription factor binding sites.
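The positions reported above (-33, -27, -10, +3) are TSS-relative coordinates. A minimal sketch of that numbering, assuming the usual convention that the TSS is +1 and the base immediately upstream is -1 (there is no position 0), together with a hypothetical helper that reports the most AT-rich window in those coordinates; the 6-nt window is an arbitrary choice, not the paper's interpretation method.

```python
# Sketch: TSS-relative numbering and a simple AT-richness scan.
# ASSUMPTIONS: TSS = +1, the base just upstream = -1, no position 0;
# `most_at_rich_window` and its 6-nt window are hypothetical illustrations.


def tss_relative(index: int, tss_index: int) -> int:
    """Convert a 0-based string index into a TSS-relative coordinate (no zero)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def most_at_rich_window(seq: str, tss_index: int, window: int = 6) -> tuple[int, int]:
    """TSS-relative (start, end) of the window with the most A/T bases."""
    seq = seq.upper()
    best_start = max(
        range(len(seq) - window + 1),
        key=lambda i: sum(base in "AT" for base in seq[i:i + window]),
    )
    return tss_relative(best_start, tss_index), tss_relative(best_start + window - 1, tss_index)


if __name__ == "__main__":
    # 40 nt upstream of the TSS (string index 40) with an AT-rich box at index 10.
    seq = "GCGCGCGCGC" + "TTTAAA" + "GC" * 12 + "GCGCGC"
    print(most_at_rich_window(seq, tss_index=40))  # (-30, -25)
```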


Subject(s)
Artificial Intelligence , Machine Learning , Archaea/genetics , Promoter Regions, Genetic , Transcription Factors/genetics
5.
J Microbiol Methods ; 207: 106707, 2023 04.
Article in English | MEDLINE | ID: mdl-36931327

ABSTRACT

For enumerating viable bacteria, traditional dilution plating to count colony-forming units (CFUs) has always been the preferred method in microbiology owing to its simplicity, although it is laborious and time-consuming. Similar CFU counts can be obtained by quantifying growing micro-colonies with the aid of a microscope. Here, we employed a simple method of spotting five to ten microliters of a diluted bacterial culture multiple times on a single Petri dish, followed by determining CFUs by counting micro-colonies under a phase-contrast microscope. With this method, the CFUs of an Escherichia coli culture can be estimated within four hours after spotting. Further, CFUs in a culture of Ralstonia solanacearum, a bacterium with a generation time of around 2 h, can be estimated within ten hours after spotting. The CFU number determined from micro-colonies observed for 10⁶-fold dilutions or lower is similar to that obtained by the dilution plating method for 10⁷-fold dilutions or lower. Micro-colony numbers observed in the early hours of growth (2 h for E. coli and 8 h for R. solanacearum) remained consistent at later hours (4 h for E. coli and 10 h for R. solanacearum), when the colonies were more visible owing to a noticeable increase in their size. This suggests that the micro-colonies observed in the early hours indeed represent the bacterial number in the culture. As practical applications, this counting method was used to study the rifampicin-resistance mutation rate and to perform a fluctuation test in E. coli. The spotting method described here for enumerating bacterial CFUs reduces labour, time and resources.
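The arithmetic behind spot-based counts and a fluctuation test is short. A minimal sketch, assuming replicate spots of one dilution are averaged and using the Luria-Delbrück p0 (null-class) method for the mutation rate; the paper may use a different estimator, and all counts below are hypothetical.

```python
# Sketch: CFU/mL from replicate micro-colony spot counts, and a p0-method
# mutation rate. ASSUMPTIONS: spot counts are averaged; the p0 estimator may
# differ from the one used in the paper; example numbers are hypothetical.
import math


def cfu_per_ml(colony_counts: list[int], spot_volume_ul: float, dilution_factor: float) -> float:
    """Mean micro-colonies per spot, scaled to 1 mL and multiplied by the dilution factor."""
    if not colony_counts:
        raise ValueError("need at least one spot count")
    mean_count = sum(colony_counts) / len(colony_counts)
    spot_volume_ml = spot_volume_ul / 1000.0
    return mean_count / spot_volume_ml * dilution_factor


def mutation_rate_p0(cultures_without_mutants: int, total_cultures: int,
                     cells_per_culture: float) -> float:
    """p0 method: m = -ln(p0) expected mutations per culture; rate = m / N_t."""
    p0 = cultures_without_mutants / total_cultures
    if not 0 < p0 < 1:
        raise ValueError("the p0 method needs some, but not all, cultures without mutants")
    return -math.log(p0) / cells_per_culture


if __name__ == "__main__":
    # Hypothetical: three 10 uL spots of a 10^6-fold dilution.
    print(cfu_per_ml([24, 26, 25], spot_volume_ul=10, dilution_factor=1e6))  # about 2.5e9
    # Hypothetical fluctuation test: 10 of 20 cultures without mutants, 1e8 cells each.
    print(mutation_rate_p0(10, 20, 1e8))  # about 6.9e-9
```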


Subject(s)
Bacteria , Escherichia coli , Colony Count, Microbial , Stem Cells
6.
Biochimie ; 184: 40-51, 2021 May.
Article in English | MEDLINE | ID: mdl-33548392

ABSTRACT

The role of G-quadruplexes in the cellular physiology of human pathogenesis is an intriguing area of research. Nonetheless, their functional roles and evolutionary conservation have not been compared comprehensively across pathogenic forms of various bacterial genera and species. In the current in silico study, we addressed the role of G-quadruplex-forming sequences (G4 motifs) in the context of cis-regulation, expression variation, regulatory networks, gene orthology and ontology. Genome-wide screening across seven pathogenic genomes using the G4Hunter tool revealed a significantly higher prevalence of G4 motifs in cis-regulatory regions than in intragenic regions. Significant conservation of G4 motifs was observed in the regulatory regions of 300 orthologous genes. Further analysis of published ChIP-Seq data (Minch et al., 2015) for 91 DNA-binding proteins of the M. tuberculosis genome revealed significant links between G4 motifs and the target sites of transcriptional regulators. Interestingly, the transcription factors associated with virulence, specifically CsoR, Rv0081, DevR/DosR, and the TetR family, have G4 motifs in their target regulatory regions. Overall, the current study applies positional-functional relationship computation to delve into the cis-regulation of G-quadruplex structures in the context of gene orthology in pathogenic bacteria.
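G4Hunter scores G-richness and G-skewness rather than matching a fixed motif. A minimal sketch of its scoring scheme (Bedrat et al., 2016): each G in a run of n Gs scores min(n, 4), each C in a C-run scores -min(n, 4), other bases score 0, and the window score is the mean. The 25-nt window and 1.2 threshold are commonly used values, assumed here rather than taken from this study.

```python
# Sketch of G4Hunter-style scoring. ASSUMPTIONS: window 25 and threshold 1.2
# are common defaults, not necessarily this study's settings; non-G/C bases
# (including N) score 0; sequences shorter than the window get one whole-sequence score.


def g4hunter_base_scores(seq: str) -> list[int]:
    """Per-base scores: +min(run, 4) for G runs, -min(run, 4) for C runs, else 0."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1  # extend the run of identical bases
        run = min(j - i, 4)
        if seq[i] == "G":
            scores[i:j] = [run] * (j - i)
        elif seq[i] == "C":
            scores[i:j] = [-run] * (j - i)
        i = j
    return scores


def g4hunter_windows(seq: str, window: int = 25) -> list[float]:
    """Mean score of every full window (or of the whole sequence if it is shorter)."""
    scores = g4hunter_base_scores(seq)
    if len(scores) < window:
        return [sum(scores) / len(scores)] if scores else []
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def has_g4_motif(seq: str, window: int = 25, threshold: float = 1.2) -> bool:
    """Negative scores indicate a G4 candidate on the complementary (C-rich) strand."""
    return any(abs(s) >= threshold for s in g4hunter_windows(seq, window))


if __name__ == "__main__":
    telomeric = "GGGTTAGGGTTAGGGTTAGGG"  # 12 Gs in runs of 3 -> 36 / 21
    print(g4hunter_windows(telomeric, window=21))  # [1.7142857142857142]
    print(has_g4_motif(telomeric, window=21))      # True
```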


Subject(s)
Bacteria/genetics , Computer Simulation , G-Quadruplexes , Genome, Bacterial , Regulatory Sequences, Nucleic Acid , Bacteria/pathogenicity
7.
Microbiologyopen ; 10(5): e1230, 2021 10.
Article in English | MEDLINE | ID: mdl-34713600

ABSTRACT

The transcription machinery of archaea can be roughly classified as a simplified version of that of eukaryotic organisms. The basal transcription factor machinery binds to the TATA box found around 28 nucleotides upstream of the transcription start site; however, some transcription units lack a clear TATA box and still have TBP/TFB binding over them. This apparent absence of conserved sequences could be a consequence of sequence divergence associated with the upstream region, operon, and gene organization. Furthermore, earlier studies have found that structural analysis yields more information than simple sequence inspection. In this work, we evaluated 3630 archaeal promoter sequences from three organisms, Haloferax volcanii, Thermococcus kodakarensis, and Sulfolobus solfataricus, and encoded them into DNA duplex stability, enthalpy, curvature, and bendability parameters. We also split our dataset into conserved-TATA and degenerated-TATA promoters to identify differences between these two classes. The structural analysis reveals variations in archaeal promoter architecture; that is, a distinctive signal is observed at the TFB, TBP, and TFE binding sites regardless of whether the promoters are TATA-conserved or TATA-degenerated. In addition, the promoter-finding method was validated with upstream regions of 13 other archaea, suggesting that there might be promoter sequences among them. Therefore, we propose a novel method for locating promoters within archaeal genomes based on DNA energetic/structural features.
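Splitting promoters into conserved-TATA and degenerated-TATA classes needs some matching rule. A minimal sketch of one possible rule, scanning TSS-relative positions around -28 for the best match to an IUPAC consensus; the consensus string, search range, and mismatch cutoff below are hypothetical placeholders, not the criteria used in the study.

```python
# Sketch: label a promoter "conserved" or "degenerated" TATA by its best match to
# a TATA consensus near -28 (TSS-relative). ASSUMPTIONS (hypothetical): the
# consensus, the -35..-20 search range and the mismatch cutoff are placeholders.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "W": "AT", "S": "CG", "K": "GT", "M": "AC", "N": "ACGT",
}
HYPOTHETICAL_TATA_CONSENSUS = "TTTAWA"  # placeholder, not taken from the paper


def mismatches(site: str, consensus: str) -> int:
    """Number of positions where `site` is not allowed by the IUPAC consensus."""
    return sum(base not in IUPAC[code] for base, code in zip(site, consensus))


def classify_tata(upstream: str, consensus: str = HYPOTHETICAL_TATA_CONSENSUS,
                  search_from: int = -35, search_to: int = -20,
                  max_mismatches: int = 1) -> str:
    """`upstream` must end at position -1, the base just before the TSS (+1)."""
    upstream = upstream.upper()
    n, k = len(upstream), len(consensus)
    best = k  # worst possible score: every position mismatches
    # Candidate sites start at `search_from` and must end no later than `search_to`.
    for start in range(search_from, search_to - k + 2):
        i = n + start  # TSS-relative -1 corresponds to string index n - 1
        if 0 <= i and i + k <= n:
            best = min(best, mismatches(upstream[i:i + k], consensus))
    return "conserved" if best <= max_mismatches else "degenerated"


if __name__ == "__main__":
    upstream = "GC" * 5 + "TTTAAA" + "GC" * 12  # 40 nt; TTTAAA spans -30..-25
    print(classify_tata(upstream))    # conserved
    print(classify_tata("GC" * 20))   # degenerated
```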


Subject(s)
Archaea/genetics , DNA, Archaeal , Genome, Archaeal , Nucleic Acid Conformation , Promoter Regions, Genetic , TATA Box , Base Sequence , Computational Biology/methods , Protein Binding , Transcription Initiation Site , Transcription, Genetic
8.
FEBS Lett ; 595(19): 2504-2521, 2021 10.
Article in English | MEDLINE | ID: mdl-34387867

ABSTRACT

Nucleoid-associated proteins (NAPs) maintain bacterial nucleoid configuration through their architectural properties of DNA bending, wrapping, and bridging. However, the contribution of DNA structural alterations to DNA-NAP recognition at the genomic scale remains unresolved. The present work dissects the DNA sequence, shape and altered structural preferences at a genomic scale for six NAPs in Mycobacterium tuberculosis. Results suggest that narrower minor groove width (MGW) and higher DNA rigidity mark the binding sites of EspR and Lsr2, while mIHF, MtHU and NapM have heterogeneous DNA structural predilections. In contrast, WhiB4-DNA-binding sites were characterized by wider MGW and highly deformable, less curved DNA. This work provides systematic insight into NAP-mediated genome organization as a function of DNA structural features.


Subject(s)
Bacterial Proteins/metabolism , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , Genomics , Mycobacterium tuberculosis/genetics , Mycobacterium tuberculosis/metabolism , Binding Sites , DNA, Bacterial/metabolism , Gene Expression Regulation, Bacterial