ABSTRACT
Archaea are a vast and largely unexplored cellular domain that thrives in a wide diversity of environments and plays central roles in the processes that mediate global carbon and nutrient fluxes. For these organisms to balance their metabolism, appropriate regulation of gene expression is essential. A key moment in regulating the genes responsible for sustaining archaeal life is the binding of transcription factor proteins to the promoter element. This DNA segment is conserved, which makes it amenable to exploration with machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were encoded as DNA duplex stability values. Afterwards, we performed a model interpretation task to map the decision pattern of the classification procedure. We also used a dataset of known promoter sequences for validation. Our results showed that an AT-rich region around position -27 (upstream of the transcription start site, TSS) is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (-33), the PPE (at -10), and a position at +3, which together provide a clearer picture of how promoters are organized across the archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences in 135 unannotated organisms, delivering annotation of archaeal regulatory regions at a scale never accomplished before ( https://pcyt.unam.mx/gene-regulation/ ). We consider that this approach will be useful for understanding how gene regulation is achieved in other organisms, beyond the already established transcription factor binding sites.
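The encoding step mentioned above can be illustrated with a minimal sketch. It assumes that "DNA duplex stability" means replacing each overlapping dinucleotide with a nearest-neighbour free-energy value; the abstract does not give the table the authors used, so the numbers below (approximate values in the style of the unified nearest-neighbour parameters, in kcal/mol) are placeholders, and the names `NN_FREE_ENERGY` and `encode_duplex_stability` are hypothetical.

```python
# Placeholder nearest-neighbour free energies (kcal/mol) per dinucleotide,
# read 5'->3' on one strand. ASSUMPTION: these are illustrative values only;
# the table actually used in the study is not stated in the abstract.
NN_FREE_ENERGY = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def encode_duplex_stability(sequence):
    """Map a DNA sequence of length n to n - 1 stability values,
    one per overlapping dinucleotide."""
    seq = sequence.upper()
    values = []
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in NN_FREE_ENERGY:
            raise ValueError(f"unsupported dinucleotide {step!r} at index {i}")
        values.append(NN_FREE_ENERGY[step])
    return values


def mean_stability(sequence):
    """Average free energy over all dinucleotide steps of a sequence."""
    values = encode_duplex_stability(sequence)
    return sum(values) / len(values)
```

Under this encoding, an AT-rich stretch yields less negative (less stable) values than a GC-rich one, which is the kind of signal a classifier such as a support vector machine could pick up from fixed-length numeric vectors.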
Subject(s)
Artificial Intelligence; Machine Learning; Archaea/genetics; Promoter Regions, Genetic; Transcription Factors/genetics
ABSTRACT
A new variant of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), named Omicron (Pango lineage designation B.1.1.529), was first reported to the World Health Organization by South African health authorities on 24 November 2021. The Omicron variant possesses numerous mutations associated with increased transmissibility and immune escape properties. In November 2021, Mexican authorities reported Omicron's presence in the country. In this study, we inferred the first introduction events of Omicron and the impact that human mobility has had on the spread of the virus. We also evaluated the adaptive evolutionary processes in Mexican SARS-CoV-2 genomes during the first month of Omicron circulation. We inferred 160 introduction events of Omicron into Mexico since its first detection in South Africa; after these first introductions, there was an evident increase in the prevalence of SARS-CoV-2 during January. This higher prevalence of the novel variant resulted in a peak of reported cases; on average, a higher mobility trend was reported 6 weeks later. During the peak of cases in the country, from January to February 2022, the Omicron BA.1.1 sub-lineage dominated, followed by the BA.1 and BA.15 sub-lineages. Additionally, we identified diversifying natural selection in the Omicron genomes and found six non-synonymous mutations in the receptor binding domain of the spike protein, all of them related to evasion of the immune response. In contrast, the other proteins in the genome are highly conserved; however, we identified homoplasic mutations in non-structural proteins, indicating parallel evolution.