ABSTRACT
The three-dimensional (3D) structure of bacterial chromosomes is crucial for understanding chromosome function. With the growing availability of high-throughput chromosome conformation capture (3C/Hi-C) data, 3D structure reconstruction algorithms have become powerful tools for studying bacterial chromosome structure and function. Recommendations on chromosome structure reconstruction tools are highly desirable to facilitate prokaryotic 3D genomics. In this work, we review existing chromosome 3D structure reconstruction algorithms and classify them, based on their underlying computational models, into two categories: constraint-based modeling and thermodynamics-based modeling. We briefly compare these algorithms using 3C/Hi-C datasets and fluorescence microscopy data obtained from Escherichia coli and Caulobacter crescentus, as well as simulated datasets. We discuss current challenges in 3D reconstruction algorithms for bacterial chromosomes, focusing primarily on software usability. Finally, we briefly outline future research directions for bacterial chromosome structure reconstruction algorithms.
Subject(s)
Bacteria , Chromosome Structures , Prokaryotic Cells , Chromosomes, Bacterial/genetics , Algorithms , Escherichia coli/genetics
ABSTRACT
MOTIVATION: Reconstruction of 3D structure models is of great importance for the study of chromosome function, and software tools for this task are in high demand. RESULTS: We present a novel reconstruction algorithm, called EVRC, which utilizes co-clustering coefficients and the error-vector resultant for chromosome 3D structure reconstruction. As an update of our previous EVR algorithm, EVRC can now handle both single and multiple chromosomes in structure modeling. To evaluate the effectiveness and accuracy of the EVRC algorithm, we applied it to simulated datasets and real Hi-C datasets. The results show that the reconstructed structures are highly similar to the original/real structures, indicating the effectiveness and robustness of the EVRC algorithm. Furthermore, we applied the algorithm to the 3D conformation reconstruction of wild-type and mutant Arabidopsis thaliana chromosomes and demonstrated the differences in structural characteristics between different chromosomes. We also accurately showed the conformational change in the centromere region of Arabidopsis chromosome 1 in the mutant compared with the wild type. Our EVRC algorithm is a valuable software tool for the field of chromatin structure reconstruction and holds great promise for advancing our understanding of chromosome function. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/mbglab/EVRC.
Subject(s)
Chromosome Structures , Chromosomes , Chromosomes/genetics , Algorithms , Software , Centromere , Cluster Analysis
ABSTRACT
The dynamic adaptation of bacteria to environmental changes is achieved through the coordinated expression of many genes, which constitutes a transcriptional regulatory network (TRN). Bradyrhizobium diazoefficiens USDA110 is an important model strain for the study of symbiotic nitrogen fixation (SNF), and its SNF ability largely depends on the TRN. In this study, independent component analysis was applied to 226 high-quality gene expression profiles of B. diazoefficiens USDA110 microarray datasets, from which 64 iModulons were identified. Using these iModulons and their condition-specific activity levels, we (1) provided new insights into the connection between the FixLJ-FixK2-FixK1 regulatory cascade and quorum sensing, (2) discovered the independence of the FixLJ-FixK2-FixK1 and NifA/RpoN regulatory cascades in response to oxygen, (3) identified the FixLJ-FixK2 cascade as a mediator connecting the FixK2-2 iModulon and the Phenylalanine iModulon, (4) described the differential activation of iModulons in B. diazoefficiens USDA110 under different environmental conditions, and (5) proposed a notion of active-TRN based on the changes in iModulon activity to better illustrate the relationship between gene regulation and environmental condition. In sum, this research offered an iModulon-based TRN for B. diazoefficiens USDA110, which formed a foundation for comprehensively understanding the intricate transcriptional regulation during SNF.
Subject(s)
Bradyrhizobium , Gene Expression Regulation , Gene Regulatory Networks , Bradyrhizobium/genetics , Acclimatization
ABSTRACT
Symbiotic nitrogen fixation is an important part of the nitrogen biogeochemical cycle and the main nitrogen source of the biosphere. As a classical model system for symbiotic nitrogen fixation, rhizobium-legume systems have been studied in detail for decades. Details of the molecular mechanisms of communication and coordination between rhizobia and host plants are becoming clearer. For more systematic insights, there is an increasing demand for new studies integrating multiomics information. Here, we present a comprehensive computational framework integrating the reconstructed protein interactome of B. diazoefficiens USDA110 with its transcriptome and proteome data to study the complex protein-protein interaction (PPI) network involved in the symbiosis system. We reconstructed the interactome of B. diazoefficiens USDA110 by computational approaches. Based on the comparison of interactomes between B. diazoefficiens USDA110 and other rhizobia, we inferred that the slow growth of B. diazoefficiens USDA110 may be due to the requirement for more protein modifications, and we further identified 36 conserved functional PPI modules. By integrating transcriptome and proteome data, interactomes representing the free-living cell and the symbiotic nitrogen-fixing (SNF) bacteroid were obtained. Based on the SNF interactome, a core-sub-PPI-network for symbiotic nitrogen fixation was determined, and nine novel functional modules and eleven protein hubs playing key roles in symbiosis were identified. The reconstructed interactome of B. diazoefficiens USDA110 may serve as a valuable reference for studying the mechanism underlying the SNF system of rhizobia and legumes.
Subject(s)
Bacterial Proteins/metabolism , Bradyrhizobium/metabolism , Nitrogen Fixation , Nitrogen/metabolism , Protein Interaction Maps , Rhizobium/physiology , Root Nodules, Plant/metabolism , Bacterial Proteins/genetics , Bradyrhizobium/genetics , Bradyrhizobium/growth & development , Proteome , Root Nodules, Plant/genetics , Glycine max/microbiology , Symbiosis , Transcriptome
ABSTRACT
BACKGROUND: A growing number of 3C/Hi-C experiments on prokaryotes have been published. However, most of the published modeling tools for chromosome 3D structures target eukaryotes. Transforming prokaryotic experimental chromosome interaction data into spatial structure models is an important task for which tools are greatly needed. RESULTS: We have developed a new reconstruction program for bacterial chromosome 3D structure models called EVR that exploits a simple Error-Vector Resultant (EVR) algorithm. This software tool is particularly optimized for the closed-loop structural features of prokaryotic chromosomes. The parallel implementation of the program can utilize the computing power of both multi-core CPUs and GPUs. CONCLUSIONS: EVR can be used to reconstruct the bacterial 3D chromosome structure quickly and precisely from the contact frequency matrix derived from 3C/Hi-C experimental data.
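As an illustration of the input side of such a reconstruction, the sketch below shows one common way to turn a contact frequency matrix into expected spatial distances, together with a helper that measures bin separation on a closed-loop (circular) chromosome. It is not the EVR implementation: the inverse power-law relation d = 1 / f^alpha, the parameter `alpha`, and the function names `contacts_to_distances` and `ring_separation` are assumptions made here for illustration only.

```python
def contacts_to_distances(freq, alpha=1.0):
    """Convert a symmetric contact-frequency matrix to expected distances.

    Assumption (not taken from the abstract): d_ij = 1 / f_ij ** alpha.
    Pairs with zero contacts get None, meaning "no distance restraint".
    """
    n = len(freq)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                dist[i][j] = 0.0
            elif freq[i][j] > 0:
                dist[i][j] = 1.0 / (freq[i][j] ** alpha)
            else:
                dist[i][j] = None
    return dist


def ring_separation(i, j, n_bins):
    """Genomic separation of bins i and j on a circular chromosome."""
    d = abs(i - j)
    return min(d, n_bins - d)


if __name__ == "__main__":
    toy = [[0, 4], [4, 0]]  # hypothetical 2-bin contact matrix
    print(contacts_to_distances(toy, alpha=0.5))
    print(ring_separation(0, 9, 10))
```

The `ring_separation` helper reflects the closed-loop feature mentioned in the abstract: on a circular chromosome the first and last bins are neighbors, which a linear-chromosome model would miss.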
Subject(s)
Bacteria/genetics , Chromosomes, Bacterial/chemistry , Algorithms , Bacteria/chemistry , Computational Biology , Models, Molecular , Molecular Conformation , Software
ABSTRACT
Protein hubs in protein-protein interaction networks are especially important because of their central roles in the entire network. Despite their importance, the folding kinetics of hub proteins in comparison with non-hubs is still unknown. In this work, the folding rates of protein hubs and non-hubs were predicted and compared for the interactome of Escherichia coli K12, and the results showed that hub proteins fold faster than non-hub proteins. A possible explanation is that protein hubs have more fast-folding structural conformations than non-hubs, which leads to the notion of a "hub of hubs" in the protein conformation space. It was found that the sequence and structure features relevant to protein folding rates also differ between hub and non-hub proteins. Moreover, interacting proteins tend to have similar folding rates. These results offer insights into the interplay between the mechanisms of protein folding and interaction.
Subject(s)
Escherichia coli/genetics , Protein Folding , Protein Interaction Maps/genetics , Proteome/chemistry , Computational Biology , Escherichia coli/chemistry , Protein Binding , Protein Conformation , Protein Interaction Mapping , Proteome/genetics
ABSTRACT
DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. Here we explore the sequence/structure determinants of the mechanisms of adaptation of these molecules, the links between them, and the results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation at the DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and lifestyle. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic lifestyle, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to the coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea.
Subject(s)
Adaptation, Physiological/genetics , DNA/chemistry , Evolution, Molecular , Proteins/chemistry , RNA/chemistry , Aerobiosis , Base Composition , Base Sequence , Codon , Nucleotides/analysis , RNA, Messenger/chemistry , Sequence Analysis, Protein , Temperature
ABSTRACT
BACKGROUND: In bacterial genomes, the compactly encoded genes and operons are well organized, with genes in the same biological pathway or operons in the same regulon close to each other on the genome sequence. In addition, linearly close genes have a higher probability of co-expression, and their protein products tend to form protein-protein interactions. However, the organization features of bacterial genomes in three-dimensional space remain elusive. DNA interaction data for Escherichia coli, measured by the genome conformation capture (GCC) technique, have recently become available, allowing us to investigate the spatial features of bacterial genome organization. RESULTS: After renormalizing the GCC data, we compared the interaction frequency of operon pairs in the same regulon with that of random operon pairs. The results showed that the arrangement of operons in the E. coli genome tends to minimize the spatial distance between operons in the same regulon. A similar global organization feature exists for genes in the biological pathways of E. coli. In addition, genes that are spatially close to each other (even if they are far apart on the genome sequence) tend to be co-expressed and to form protein-protein interactions. These results provide new insights into the organization principles of bacterial genomes and support the notion of transcription factories. CONCLUSIONS: This study revealed the organization features of Escherichia coli genomic functional units in 3D space and furthered our understanding of the link between the three-dimensional structure of chromosomes and biological function.
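The comparison of same-regulon operon pairs against random operon pairs can be sketched as a simple resampling test. The code below is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the number of random draws, and the toy interaction matrix are hypothetical, and the renormalization of the GCC data described in the abstract is not reproduced.

```python
import random


def mean_pair_frequency(freq, pairs):
    """Average interaction frequency over a list of (i, j) operon pairs."""
    return sum(freq[i][j] for i, j in pairs) / len(pairs)


def regulon_enrichment(freq, regulon_pairs, n_random=1000, seed=0):
    """Compare same-regulon pairs against randomly drawn operon pairs.

    Returns the observed mean frequency and an empirical p-value: the
    fraction of random pair sets whose mean is at least the observed mean.
    """
    rng = random.Random(seed)
    n = len(freq)
    observed = mean_pair_frequency(freq, regulon_pairs)
    at_least = 0
    for _ in range(n_random):
        # Draw as many random pairs of distinct operons as there are regulon pairs.
        sample = [tuple(rng.sample(range(n), 2)) for _ in regulon_pairs]
        if mean_pair_frequency(freq, sample) >= observed:
            at_least += 1
    p_value = (at_least + 1) / (n_random + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical 6-operon matrix: operons (0, 1) and (2, 3) share a regulon
    # and interact more often than the background.
    toy = [[1] * 6 for _ in range(6)]
    for i, j in [(0, 1), (2, 3)]:
        toy[i][j] = toy[j][i] = 10
    print(regulon_enrichment(toy, [(0, 1), (2, 3)]))
```

A small p-value in this sketch would indicate that same-regulon operons contact each other more often than expected by chance, which is the direction of the effect the abstract reports.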
Subject(s)
Escherichia coli/genetics , Genome, Bacterial/genetics , Protein Interaction Maps/genetics , Gene Expression Regulation, Bacterial , Operon/genetics
ABSTRACT
The spatial organization of bacterial chromosomes is crucial for cellular functions. It remains unclear how bacterial chromosomes adapt to high-temperature stress. This study delves into the 3D genome architecture and transcriptomic responses of Escherichia coli under heat-stress conditions to unravel the intricate interplay between the chromosome structure and environmental cues. By examining the role of macrodomains, chromosome interaction domains (CIDs), and nucleoid-associated proteins (NAPs), this work unveils the dynamic changes in chromosome conformation and gene expression patterns induced by high-temperature stress. It was observed that, under heat stress, the short-range interaction frequency of the chromosomes decreased, while the long-range interaction frequency of the Ter macrodomain increased. Furthermore, two metrics, namely, Global Compactness (GC) and Local Compactness (LC), were devised to measure and compare the compactness of the chromosomes based on their 3D structure models. The findings in this work shed light on the molecular mechanisms underlying thermal adaptation and chromosomal organization in bacterial cells, offering valuable insights into the complex inter-relationships between environmental stimuli and genomic responses.
ABSTRACT
Sinorhizobium fredii CCBAU45436 is an excellent rhizobium that plays an important role in agricultural production. However, a more comprehensive understanding of the metabolic system of S. fredii CCBAU45436 is still needed, and its absence hinders the application of this strain in agriculture. Therefore, based on the first-generation metabolic model iCC541, we developed a new genome-scale metabolic model, iAQY970, which contains 970 genes, 1,052 reactions and 942 metabolites and scores 89% in the MEMOTE test. The cell growth phenotypes predicted by iAQY970 are 81.7% consistent with the experimental data. Mapping the proteome data obtained under free-living and symbiotic conditions to the model showed that the biomass production rate in the logarithmic phase was faster than that in the stationary phase, and that the nitrogen fixation efficiency of rhizobia in cultivated soybean was higher than that in wild-type soybean, which was consistent with the actual situation. In the symbiotic condition, 184 genes affect growth, of which 94 are essential; in the free-living condition, 143 genes influence growth, of which 78 are essential. Of the 94 essential genes in the symbiotic condition, 86 were consistent with the prediction of iCC541 and 44 were confirmed by literature information; meanwhile, 30 genes were identified by DEG and 33 genes were identified by Geptop. In addition, we extracted four key nitrogen fixation modules from the model and, using MOMA, predicted sulfite reductase (EC 1.8.7.1) and nitrogenase (EC 1.18.6.1) as target enzymes for enhancing nitrogen fixation, which provides a potential focus for strain optimization. Through this comprehensive metabolic model, we can better understand the metabolic capabilities of S. fredii CCBAU45436 and make full use of the strain in the future.
ABSTRACT
The ability to modulate gene expression is crucial for studying gene function and programming cell behaviors. Combining the reliability of CRISPRi and the precision of optogenetics, the optoCRISPRi technique is emerging as an advanced tool for live-cell gene regulation. Because previous versions of optoCRISPRi often exhibit no more than a 10-fold dynamic range owing to leakage activity, they are not suitable for targets that are sensitive to such leakage or critical for cell growth. Here, we describe a green-light-activated CRISPRi system with a high dynamic range (40-fold) and the flexibility to change targets in Escherichia coli. Our optoCRISPRi-HD system can efficiently repress essential and nonessential genes or inhibit the initiation of DNA replication. By providing a regulatory system with high spatiotemporal resolution and a broad range of targets, our study should facilitate further research involving complex gene networks, metabolic flux redirection, or bioprinting.
Subject(s)
CRISPR-Cas Systems , Escherichia coli Proteins , Metabolic Engineering/methods , Reproducibility of Results , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics
ABSTRACT
Although the metabolic networks of the three domains of life consist of different constituents and metabolic pathways, they exhibit the same scale-free organization. This phenomenon has been hypothetically explained by the preferential attachment principle, according to which newly recruited metabolites attach preferentially to those that are already well connected. However, since metabolites are usually small molecules and metabolic processes are basically chemical reactions, we speculate that the metabolic network organization may have a chemical basis. In this paper, chemoinformatic analyses of the metabolic networks of the Kyoto Encyclopedia of Genes and Genomes (KEGG), Escherichia coli and Saccharomyces cerevisiae were performed. It was found that there are qualitative and quantitative correlations between network topology and the chemical properties of metabolites. The metabolites with higher degrees of connectivity (hubs) are relatively more polar. This suggests that metabolic networks are chemically organized to a certain extent, which was further elucidated in terms of the high concentrations required by metabolic hubs to drive a variety of reactions. This finding not only provides a chemical explanation for the preferential attachment principle of metabolic network expansion, but also has important implications for metabolic network design and metabolite concentration prediction.
Subject(s)
Metabolic Networks and Pathways , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
ABSTRACT
The transcriptional regulatory network (TRN) is the central pivot of a prokaryotic organism to receive, process and respond to internal and external environmental information. However, little is known about its spatial organization so far. In recent years, chromatin interaction data of bacteria such as Escherichia coli and Bacillus subtilis have been published, making it possible to study the spatial organization of bacterial transcriptional regulatory networks. By combining TRNs and chromatin interaction data of E. coli and B. subtilis, we explored the spatial organization characteristics of bacterial TRNs in many aspects such as regulation directions (positive and negative), central nodes (hubs, bottlenecks), hierarchical levels (top, middle, bottom) and network motifs (feed-forward loops and single input modules) of the TRNs and found that the bacterial TRNs have a variety of stable spatial organization features under different physiological conditions that may be closely related with biological functions. Our findings provided new insights into the connection between transcriptional regulation and the spatial organization of chromosome in bacteria and might serve as a factual foundation for trying spatial-distance-based gene circuit design in synthetic biology.
ABSTRACT
Recently, numerous genome analyses revealed the existence of a universal G:C→A:T mutation bias in bacteria, fungi, plants and animals. To explore the molecular basis for this mutation bias, we examined three well-known DNA mutation models, i.e., the oxidative damage model, the UV-radiation damage model and the CpG hypermutation model. It was revealed that these models cannot provide a sufficient explanation for the universal mutation bias. Therefore, we resorted to a DNA mutation model proposed by Löwdin 40 years ago, which was based on inter-base double proton transfers (DPT). Since DPT is a fundamental and spontaneous chemical process and occurs much more frequently within GC pairs than AT pairs, the Löwdin model offers a common explanation for the observed universal mutation bias and thus has broad biological implications.
Subject(s)
DNA/genetics , Models, Genetic , Mutation , Animals , DNA/chemistry , Protons
ABSTRACT
Nuclear transfer embryonic stem cells (ntESCs) hold enormous promise for individual-specific regenerative medicine. However, the chromatin states of ntESCs remain poorly characterized. In this study, we employed ATAC-seq and Hi-C techniques to explore the chromatin accessibility and three-dimensional (3D) genome organization of ntESCs. The results show that the chromatin accessibility and genome structures of somatic cells are re-arranged to ESC-like states overall in ntESCs, including compartments, topologically associating domains (TADs) and chromatin loops. However, compared to fertilized ESCs (fESCs), ntESCs show some abnormal openness and structures that have not been reprogrammed completely, which impair the differentiation potential of ntESCs. The histone modification H3K9me3 may be involved in abnormal structures in ntESCs, including incorrect compartment switches and incomplete TAD rebuilding. Moreover, ntESCs and iPSCs show high similarity in 3D genome structures, while a few differences are detected due to different somatic cell origins and reprogramming mechanisms. Through systematic analyses, our study provides a global view of chromatin accessibility and 3D genome organization in ntESCs, which can further facilitate the understanding of the similarities and differences between ntESCs and fESCs.
Subject(s)
Chromatin/metabolism , Embryonic Stem Cells/metabolism , Nuclear Transfer Techniques/standards , Animals , Cell Differentiation , Female , Humans , Mice
ABSTRACT
A transcriptional regulatory network (TRN) is a complex network composed of all of the regulatory interactions between transcription factors and the corresponding target genes. Recently, three-dimensional (3D) genomic studies have shown that the 3D structure of the genome may influence the regulation of gene transcription, which provides us with a novel perspective. In the present study, we constructed the TRN of the budding yeast Saccharomyces cerevisiae and placed it in the context of a 3D genome model. We analyzed the spatial organization of the yeast TRN on four levels: global features, central nodes, hierarchical structure and network motifs. The results obtained suggest that the TRN of S. cerevisiae presents an optimized structure in space to adapt to functional requirements.
Subject(s)
Gene Regulatory Networks , Saccharomyces cerevisiae/genetics , Gene Expression Profiling , Genomics
ABSTRACT
Phosphates are essential for modern metabolism. A recent study reported a phosphate-free metabolic network and suggested that thioesters, rather than phosphates, could alleviate thermodynamic bottlenecks of network expansion. As a result, it was considered that a phosphorus-independent metabolism could have existed before the phosphate-based genetic coding system. To explore the origin of phosphorus-dependent metabolism, the present study uses computational systems biology approaches to construct a protometabolic network that contains prebiotically available phosphates. It is found that some primitive phosphorylated intermediates could greatly alleviate thermodynamic bottlenecks of network expansion. Moreover, the phosphorus-dependent metabolic network exhibits several ancient features. Taken together, it is concluded that phosphates played a role as important as that of thioesters during the origin and evolution of metabolism. Both phosphorus and sulfur are speculated to be critical to the origin of life.
ABSTRACT
By analyzing the predicted gene expression levels of 33 prokaryotes whose living temperatures span from <10 °C to >100 °C, a universal positive correlation was found between the percentage of predicted highly expressed genes and the organisms' optimal growth temperature. A physical interpretation of the correlation revealed that highly expressed genes are statistically more thermostable than lowly expressed genes. These findings suggest that gene expression level may contribute significantly to prokaryotic thermal adaptation and provide evidence for translational selection pressure on the thermostability of natural proteins during evolution.
Subject(s)
Adaptation, Physiological , Gene Expression , Genes, Archaeal , Genes, Bacterial , Temperature , Archaea/growth & development , Archaeal Proteins/genetics , Archaeal Proteins/metabolism , Bacteria/growth & development , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Evolution, Molecular
ABSTRACT
Over-annotation of hypothetical ORFs is a common phenomenon in bacterial genomes, which necessitates confirming the coding reliability of hypothetical ORFs and then predicting their functions. The important plant pathogen Erwinia carotovora subsp. atroseptica SCRI1043 (Eca1043) is a typical case because more than a quarter of its annotated ORFs are hypothetical. Our analysis focuses on the annotation of Eca1043 hypothetical ORFs and comprises two efforts: (a) based on the Z-curve method, 49 originally annotated hypothetical ORFs are recognized as noncoding, which is further supported by principal components analysis and other evidence; and (b) using sequence-alignment tools and some functional resources, more than half of the hypothetical genes were assigned functions. The potential functions of 427 hypothetical genes are summarized according to the cluster of orthologous groups functional category. Moreover, 114 and 86 hypothetical genes are recognized as putative 'membrane proteins' and 'exported proteins', respectively. Reannotation of Eca1043 hypothetical ORFs will benefit research into the lifestyle, metabolism and pathogenicity of this important plant pathogen. Our study also offers a model for the reannotation of hypothetical ORFs in microbial genomes.
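For readers unfamiliar with the Z-curve representation, the sketch below computes the standard cumulative Z-curve coordinates of a DNA sequence: purine versus pyrimidine (x), amino versus keto (y), and weak versus strong hydrogen bonding (z). It illustrates only the underlying transform. The feature set and classifier actually used to judge the coding status of the Eca1043 ORFs are not described in the abstract and are not reproduced here; the function name `z_curve` and the example sequence are hypothetical.

```python
def z_curve(seq):
    """Return the cumulative Z-curve points (x_n, y_n, z_n) of a DNA sequence.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
    z_n = (A_n + T_n) - (C_n + G_n)   weak vs. strong hydrogen bonding
    where A_n, C_n, G_n, T_n count each base in the first n positions.
    Ambiguous symbols (e.g. N) are skipped.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            continue
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)))
    return points


if __name__ == "__main__":
    for point in z_curve("ATGGCGTAA"):  # hypothetical short ORF
        print(point)
```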
Subject(s)
Erwinia/classification , Erwinia/genetics , Open Reading Frames , Genes, Bacterial , Genome, Bacterial , Plant Diseases/microbiology
ABSTRACT
Tracing the characteristics of very ancient proteins is one of the biggest challenges in the study of the origin of life. Although no primitive protein fossils remain, the characteristics of very ancient proteins can be traced through molecular fossils embedded in modern proteins. In this paper, prior findings in this area are first outlined, and a new strategy is then proposed to address this intriguing issue. Interestingly, various molecular fossils and different protein datasets lead to similar conclusions on the features of very ancient proteins, which can be summarized as follows: (i) the architectures of very ancient proteins belong to the following folds: P-loop containing nucleoside triphosphate hydrolases (c.37), TIM beta/alpha-barrel (c.1), NAD(P)-binding Rossmann-fold domains (c.2), Ferredoxin-like (d.58), Flavodoxin-like (c.23) and Ribonuclease H-like motif (c.55); (ii) the functions of very ancient proteins are related to the metabolism of purine, pyrimidine, porphyrin, chlorophyll and carbohydrates; (iii) some very ancient proteins need cofactors (such as ATP, NADH or NADPH) to work normally.