ABSTRACT
DNA-binding transcription factors (TFs) play a central role in the gene expression of all organisms, from viruses to humans, including bacteria and archaea. These proteins determine the fate of gene expression in response to environmental challenges. Because thousands of genomes have been sequenced to date, the encoded proteins are predicted with bioinformatics tools, and these predictions provide the knowledge needed for subsequent experimental validation. In this chapter, we describe three approaches to identify TFs in protein sequences. The first approach integrates the results of sequence comparisons and PFAM assignments, using a manually curated collection of TFs as reference. The second approach considers the prediction of DNA-binding structures, such as the classical helix-turn-helix (HTH), and the third approach uses a deep learning model. We suggest that all three approaches should be considered together to increase the chance of identifying new TFs in bacterial and archaeal genomes.
Subject(s)
Genome, Archaeal , Transcription Factors , Archaea/metabolism , Bacteria/metabolism , DNA/metabolism , Genome, Archaeal/genetics , Genome, Bacterial , Humans , Transcription Factors/metabolism
ABSTRACT
A phylogenomic and functional analysis of the first two Crenarchaeota MAGs from the El Tatio geyser field in Chile is reported. A soil sample collected next to a lagoon exposed to geothermal activity at El Tatio was used for shotgun sequencing. Afterwards, contigs were binned into individual population-specific genomes. A phylogenetic placement was carried out for both MAG 9-5TAT and MAG 47-5TAT, followed by functional comparisons and metabolic reconstruction. Results showed that MAG 9-5TAT and MAG 47-5TAT likely represent new species in the genera Thermoproteus and Sulfolobus, respectively. These findings provide new insights into the phylogenetic and genomic diversity of the archaeal species that inhabit the El Tatio geyser field and expand the understanding of the diversity of the phylum Crenarchaeota.
Subject(s)
Archaea/genetics , Crenarchaeota/genetics , Genome, Archaeal/genetics , Metagenome/genetics , Metagenomics/methods , Phylogeny
ABSTRACT
Repairing DNA damage is one of the most important functions of the 'housekeeping' proteins, as DNA molecules are constantly subject to different kinds of damage. An important mechanism of DNA repair is the mismatch repair system (MMR). In eukaryotes, it is more complex than it is in bacteria or Archaea due to an inflated number of paralogues produced as a result of an extensive process of gene duplication and further specialization upon the evolution of the first eukaryotes, including an important part of the meiotic machinery. Recently, the discovery and sequencing of Asgard Archaea allowed us to revisit the MMR system evolution with the addition of new data from a group that is closely related to the eukaryotic ancestor. This new analysis provided evidence for a complex evolutionary history of eukaryotic MMR: an archaeal origin for the nuclear MMR system in eukaryotes, with subsequent acquisitions of other MMR systems from organelles.
Subject(s)
DNA Mismatch Repair , Eukaryota , Archaea/genetics , DNA Mismatch Repair/genetics , Eukaryota/genetics , Eukaryotic Cells , Genome, Archaeal/genetics
ABSTRACT
The ability of bacteria and archaea to modulate metabolic processes, defensive responses, and pathogenic capabilities depends on their repertoire of genes and their capacity to regulate the expression of those genes. Transcription factors (TFs) have fundamental roles in controlling these processes. TFs are proteins that promote and/or impede the activity of the RNA polymerase. In prokaryotes these proteins have been grouped into families that can be found in most of the different taxonomic divisions. In this work, the association between the expansion patterns of 111 protein regulatory families was systematically evaluated in 1351 non-redundant prokaryotic genomes. This analysis provides insights into the functional and evolutionary constraints imposed on different classes of regulatory factors in bacterial and archaeal organisms. Based on their distribution, we found a relationship between the contents of some TF families and genome size. For example, nine TF families that represent 43.7% of the complete collection of TFs are closely associated with genome size; i.e., members of these families are abundant in large genomes and scarce in small ones. In contrast, the remaining 102 families (56.3% of the collection) show no or only a low correlation with genome size, suggesting that a large proportion of duplication or gene loss events occur independently of genome size and that various yet-unexplored questions about the evolution of these TF families remain. In addition, we identified a group of families that have a similar distribution pattern across Bacteria and Archaea, suggesting common functions and probable coevolution, and a group of families universally distributed among all the genomes. Finally, a specific association between the TF families and their additional domains was identified, suggesting that the families sense specific signals or make specific protein-protein contacts to achieve their regulatory roles.
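The relationship between TF family size and genome size described above can be illustrated with a small sketch. It assumes a Pearson correlation, which the abstract does not specify; the function names (`pearson`, `family_genome_correlations`), the family labels, and all numbers are hypothetical and serve only to show the shape of the calculation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Convention chosen for this sketch: a constant series shows no trend.
        return 0.0
    return cov / sqrt(var_x * var_y)


def family_genome_correlations(genome_sizes, family_counts):
    """Correlate each family's per-genome member count with genome size."""
    return {
        family: pearson(genome_sizes, counts)
        for family, counts in family_counts.items()
    }


# Toy values for illustration only; these are not data from the study.
genome_sizes_mb = [1.0, 2.0, 3.0, 4.0, 5.0]
family_counts = {
    "family_scaling": [2, 4, 6, 8, 10],  # grows with genome size
    "family_flat": [3, 3, 3, 3, 3],      # independent of genome size
}
correlations = family_genome_correlations(genome_sizes_mb, family_counts)
```

A rank-based coefficient such as Spearman's could be substituted if family sizes are expected to scale non-linearly with genome size.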
Subject(s)
Prokaryotic Cells/metabolism , Transcription Factors/analysis , Transcription Factors/genetics , Archaea/genetics , Bacteria/genetics , DNA/genetics , DNA-Binding Proteins , Genome Size/genetics , Genome, Archaeal/genetics , Genome, Bacterial/genetics , Genomics/methods , Protein Binding , Transcriptome/genetics
ABSTRACT
BACKGROUND: Crl, identified for curli production, is a small transcription factor that stimulates the association of the σS factor (RpoS) with the RNA polymerase core through direct and specific interactions, increasing the transcription rate of genes during the transition from exponential to stationary phase at low temperatures, using indole as an effector molecule. The lack of a comprehensive collection of information on the Crl regulon makes it difficult to identify a dominant function of Crl and to generate any hypotheses concerning its taxonomic distribution in archaeal and bacterial organisms. RESULTS: In this work, based on a systematic literature review, we compiled the first comprehensive dataset of 86 genes under the control of Crl in the bacterium Escherichia coli K-12; those genes correspond to 40% of the σS regulon in this bacterium. Based on an analysis of orthologs in 18 archaeal and 69 bacterial taxonomic divisions and using E. coli K-12 as a framework, we suggest three main events that resulted in this regulon's current form: (i) in a first step, rpoS, a gene widely distributed in the bacterial and archaeal domains, was recruited to regulate genes involved in ancient metabolic processes, such as those associated with glycolysis and the tricarboxylic acid cycle; (ii) in a second step, the regulon recruited genes involved in metabolic processes that are mainly taxonomically constrained to Proteobacteria, with some secondary losses, such as those genes involved in responses to stress or starvation and cell adhesion, among others; and (iii) in a later step, Crl might have been recruited in Enterobacteriaceae, because its taxonomic distribution is constrained to this bacterial order; however, further analyses are necessary. CONCLUSIONS: Therefore, we suggest that the Crl regulon is highly flexible for phenotypic adaptation, probably as a consequence of the diverse growth environments associated with all organisms in which members of this regulatory network are present.
Subject(s)
Genome, Archaeal/genetics , Genome, Bacterial/genetics , Phylogeny , Regulon/genetics , Evolution, Molecular
ABSTRACT
Antisense RNAs (asRNAs) are present in diverse organisms and play important roles in gene regulation. In this work, we mapped the primary antisense transcriptome in the halophilic archaeon Halobacterium salinarum NRC-1. By reanalyzing publicly available data, we mapped antisense transcription start sites (aTSSs) and inferred the probable 3' ends of these transcripts. We analyzed the resulting asRNAs according to the size, location, function of genes on the opposite strand, expression levels and conservation. We show that at least 21% of the genes contain asRNAs in H. salinarum. Most of these asRNAs are expressed at low levels. They are located antisense to genes related to distinctive characteristics of H. salinarum, such as bacteriorhodopsin, gas vesicles, transposases and other important biological processes such as translation. We provide evidence to support asRNAs in type II toxin-antitoxin systems in archaea. We also analyzed public ribosome profiling (Ribo-seq) data and found that ~10% of the asRNAs are ribosome-associated non-coding RNAs (rancRNAs), with asRNAs from transposases overrepresented. Using a comparative transcriptomics approach, we found that ~19% of the asRNAs annotated in H. salinarum belong to genes with an ortholog in Haloferax volcanii, in which an aTSS could be identified with positional equivalence. This shows that most asRNAs are not conserved between these halophilic archaea.
Subject(s)
Gene Expression Profiling , Halobacterium salinarum/genetics , RNA, Antisense/genetics , Transcriptome/genetics , Gene Expression Regulation, Archaeal/genetics , Genome, Archaeal/genetics , RNA, Untranslated/genetics , Ribosomes/genetics , Transcription Initiation Site
ABSTRACT
BACKGROUND: Shared traits between prokaryotes and eukaryotes help in understanding the evolution of the tree of life. In bacteria and eukaryotes, tRNA genes have been shown to be organised in clusters, but this trait has not been explored in the archaeal domain. OBJECTIVE: To explore the occurrence of tRNA gene clusters in archaea. METHODS: In silico analyses of complete and draft archaeal genomes based on tRNA gene isotype and synteny, tRNA gene cluster content and mobilome elements. FINDINGS: We demonstrated the prevalence of tRNA gene clusters in archaea. tRNA gene clusters, composed of archaeal-type tRNAs, were identified in two archaeal classes, Halobacteria and Methanobacteria, from the Euryarchaeota supergroup. Genomic analyses also revealed evidence of an association between tRNA gene clusters and mobile genetic elements, and of intra-domain horizontal gene transfer. MAIN CONCLUSIONS: tRNA gene clusters occur in the three domains of life, suggesting a role for this type of tRNA gene organisation in the biology of living organisms.
Subject(s)
Archaea/genetics , Genome, Archaeal/genetics , Multigene Family/genetics , RNA, Archaeal/genetics , RNA, Transfer/genetics , Evolution, Molecular , Phylogeny , Sequence Alignment
ABSTRACT
The phylogenetic affiliations of organisms responsible for aerobic CO oxidation in hypersaline soils and sediments were assessed using media containing 3.8 M NaCl. CO-oxidizing strains of the euryarchaeotes, Haloarcula, Halorubrum, Haloterrigena and Natronorubrum, were isolated from the Bonneville Salt Flats (UT) and Atacama Desert salterns (Chile). A halophilic euryarchaeote, Haloferax strain Mke2.3(T), was isolated from Hawai'i Island saline cinders. Haloferax strain Mke2.3(T) was most closely related to Haloferax larsenii JCM 13917(T) (97.0% 16S rRNA sequence identity). It grew with a limited range of substrates, and oxidized CO at a headspace concentration of 0.1%. However, it did not grow with CO as a sole carbon and energy source. Its ability to oxidize CO, its polar lipid composition, substrate utilization and numerous other traits distinguished it from H. larsenii JCM 13917(T), and supported designation of the novel isolate as Haloferax namakaokahaiae Mke2.3(T), sp. nov (= DSM 29988, = LMG 29162). CO oxidation was also documented for 'Natronorubrum thiooxidans' HG1 (Sorokin, Tourova and Muyzer 2005), N. bangense (Xu, Zhou and Tian 1999) and N. sulfidifaciens AD2(T) (Cui et al. 2007). Collectively, these results established a previously unsuspected capacity for extremely halophilic aerobic CO oxidation, and indicated that the trait might be widespread among the Halobacteriaceae, and occur in a wide range of hypersaline habitats.
Subject(s)
Carbon Monoxide/metabolism , Haloferax , Salinity , Sodium Chloride/metabolism , Soil Microbiology , Aerobiosis , Chile , DNA, Ribosomal/genetics , Genome, Archaeal/genetics , Geologic Sediments/microbiology , Haloferax/genetics , Haloferax/isolation & purification , Haloferax/metabolism , Oxidation-Reduction , Phylogeny , RNA, Ribosomal, 16S/genetics , Soil/chemistry
ABSTRACT
Hadal ecosystems are found at depths of 6,000 m below sea level and deeper, occupying less than 1% of the total area of the ocean. The microbial communities and metabolic potential in these ecosystems are largely uncharacterized. Here, we present four single amplified genomes (SAGs) obtained from 8,219 m below the sea surface within the hadal ecosystem of the Puerto Rico Trench (PRT). These SAGs are derived from members of deep-sea clades, including the Thaumarchaeota and SAR11 clade, and two are related to previously isolated piezophilic (high-pressure-adapted) microorganisms. In order to identify genes that might play a role in adaptation to deep-sea environments, comparative analyses were performed with genomes from closely related shallow-water microbes. The archaeal SAG possesses genes associated with mixotrophy, including lipoylation and the glycine cleavage pathway. The SAR11 SAG encodes glycolytic enzymes previously reported to be missing from this abundant and cosmopolitan group. The other SAGs, which are related to piezophilic isolates, possess genes that may supplement energy demands through the oxidation of hydrogen or the reduction of nitrous oxide. We found evidence for potential trench-specific gene distributions, as several SAG genes were observed only in a PRT metagenome and not in shallower deep-sea metagenomes. These results illustrate new ecotype features that might perform important roles in the adaptation of microorganisms to life in hadal environments.
Subject(s)
Archaea/classification , Archaea/genetics , Genome, Archaeal/genetics , Metagenome/genetics , Seawater/microbiology , Acclimatization , Archaea/isolation & purification , Base Sequence , DNA, Archaeal/genetics , Ecosystem , Energy Metabolism/physiology , Fatty Acids/metabolism , Lipids/biosynthesis , Molecular Sequence Data , Oceans and Seas , Puerto Rico , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA , Sulfur/metabolism , Water Microbiology
ABSTRACT
All organisms studied to date show a differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea by analyzing the complete chromosome sequences of 19 Archaea with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Unlike what has been found for other groups of organisms, there is an abundance of SSRs in coding regions of the genome of some Archaea. Dinucleotide repeats were rare and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism specific. In general, repeats are short and CG-rich repeats are present in Archaea having a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density had an inverse correlation with the CG content of the genome.
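As an illustration of what a search for di- to penta-nucleotide repeats involves, the sketch below finds perfect tandem repeats with a regular expression. It is not a reimplementation of SPUTNIK and does not reproduce its scoring; the function names and the `min_copies` threshold are assumptions made for this example.

```python
import re


def is_composite(motif):
    """True if the motif is itself a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(
        n % d == 0 and motif[:d] * (n // d) == motif
        for d in range(1, n)
    )


def find_ssrs(seq, min_unit=2, max_unit=5, min_copies=3):
    """Return (start, motif, copies) for perfect tandem repeats in a DNA string.

    min_copies is a hypothetical threshold chosen for this sketch.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # One unit of length k followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_composite(motif):
                continue  # skip homopolymers and repeats of a shorter unit
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# A short made-up sequence containing one dinucleotide repeat.
example_hits = find_ssrs("TTGCACACACAGG")
```

Counting hits separately for coding and non-coding coordinates would then give the kind of per-region comparison the abstract reports.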
Subject(s)
Archaea/genetics , Chromosome Mapping , Genome, Archaeal/genetics , Microsatellite Repeats/genetics , Base Sequence , Molecular Sequence Data
ABSTRACT
BACKGROUND: The essential trace element selenium is used in a wide variety of biological processes. Selenocysteine (Sec), the 21st amino acid, is co-translationally incorporated into a restricted set of proteins. It is encoded by a UGA codon with the help of tRNASec (SelC), a Sec-specific elongation factor (SelB) and a cis-acting mRNA structure (SECIS element). In addition, Sec synthase (SelA) and selenophosphate synthetase (SelD) are involved in the biosynthesis of Sec on the tRNASec. Selenium is also found in the form of 2-selenouridine, a modified base present in the wobble position of certain tRNAs, whose synthesis is catalyzed by YbbB using selenophosphate as a precursor. RESULTS: We analyzed completely sequenced genomes for the occurrence of the selA, B, C, D and ybbB genes. We found that selB and selC are gene signatures for the Sec-decoding trait. However, selD is also present in organisms that do not utilize Sec, and shows association with either selA, B, C and/or ybbB. Thus, selD defines the overall selenium utilization. A global species map of Sec-decoding and 2-selenouridine synthesis traits is provided based on the presence/absence pattern of selenium-utilization genes. The phylogenies of these genes were inferred and compared to organismal phylogenies, which identified horizontal gene transfer (HGT) events involving both traits. CONCLUSION: These results provide evidence for the ancient origin of these traits, their independent maintenance, and a highly dynamic evolutionary process that can be explained as the result of speciation, differential gene loss and HGT. The latter demonstrated that the loss of these traits is not irreversible as previously thought.
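The presence/absence logic summarized above can be sketched as a small classifier. The function name and the returned keys are hypothetical, and requiring both ybbB and selD for the 2-selenouridine trait is an assumption based on the stated role of selenophosphate as a precursor; the abstract itself does not spell out the decision rules.

```python
def classify_selenium_traits(genes):
    """Infer selenium-utilization traits from the set of genes found in a genome."""
    present = set(genes)
    return {
        # selB and selC are described as signatures of the Sec-decoding trait.
        "sec_decoding": {"selB", "selC"} <= present,
        # Assumption: YbbB needs selenophosphate, so selD is required as well.
        "selenouridine_synthesis": {"ybbB", "selD"} <= present,
        # selD is described as defining overall selenium utilization.
        "selenium_utilization": "selD" in present,
    }


# Made-up gene inventory for illustration only.
example_traits = classify_selenium_traits(["selA", "selB", "selC", "selD"])
```

Applying such a function across many genomes would yield the kind of presence/absence table from which a species map of the two traits could be drawn.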