ABSTRACT
The discordant prevalence of Helicobacter pylori and its related diseases has long given rise to enigmatic situations in the countries of the southern world. Variation in H. pylori infection rates and disease outcomes among different populations in multi-ethnic Malaysia provides a unique opportunity to understand the dynamics of host-pathogen interaction and genome evolution. In this study, we extensively analyzed and compared the genomes of 27 Malaysian H. pylori isolates and identified three major phylogeographic lineages: hspEastAsia, hpEurope and hpSouthIndia. The analysis of the virulence genes within the core genome, however, revealed a comparable pathogenic potential of the strains. In addition, we identified four genes limited to strains of East-Asian lineage. Our analyses identified a few strain-specific genes encoding restriction modification systems and outlined 311 core genes possibly under differential evolutionary constraints among the strains representing different ethnic groups. The cagA and vacA genes also showed variations in accordance with the host genetic background of the strains. Moreover, restriction modification genes were found to be significantly enriched in East-Asian strains. An understanding of these variations in genome content would provide significant insights into the adaptive and host modulation strategies that H. pylori uses to persist effectively in a host-specific manner.
Subject(s)
Genome, Bacterial , Helicobacter pylori/genetics , DNA Restriction-Modification Enzymes/genetics , Evolution, Molecular , Genes, Bacterial , Genomics , Helicobacter pylori/classification , Helicobacter pylori/isolation & purification , Helicobacter pylori/pathogenicity , Humans , Malaysia , Phylogeny , Phylogeography , Virulence
ABSTRACT
The origin of Yersinia pestis and the early stages of its evolution are fundamental subjects of investigation, given its high virulence and the mortality that resulted from past pandemics. Although the earliest evidence of Y. pestis infections in humans has been identified in Late Neolithic/Bronze Age Eurasia (LNBA, 5000-3500 y BP), these strains lack key genetic components required for flea adaptation, thus making their mode of transmission and disease presentation in humans unclear. Here, we reconstruct ancient Y. pestis genomes from individuals associated with the Late Bronze Age period (~3800 BP) in the Samara region of modern-day Russia. We show clear distinctions between our new strains and the LNBA lineage, and suggest that the full ability for flea-mediated transmission causing bubonic plague evolved more than 1000 years earlier than previously suggested. Finally, we propose that several Y. pestis lineages were established during the Bronze Age, some of which persist to the present day.
Subject(s)
DNA, Ancient/analysis , Genome, Bacterial/genetics , Plague/transmission , Yersinia pestis/genetics , Animals , Dental Pulp/microbiology , Flea Infestations/epidemiology , Flea Infestations/microbiology , Flea Infestations/transmission , High-Throughput Nucleotide Sequencing , Humans , Pandemics , Phylogeny , Plague/epidemiology , Plague/microbiology , Polymorphism, Single Nucleotide , Russia/epidemiology , Siphonaptera/microbiology , Virulence/genetics , Yersinia pestis/classification , Yersinia pestis/pathogenicity
ABSTRACT
A wide variety of genome sequencing platforms have emerged in recent years. High-throughput platforms like Illumina and 454 are essentially adaptations of the shotgun approach, generating millions of fragmented single or paired sequencing reads. To reconstruct whole genomes, the reads have to be assembled into contigs, which often require further downstream processing. The contigs can be directly ordered according to a reference, scaffolded based on paired read information, or assembled using a combination of the two approaches. While the reference-based approach tends to mask strain-specific information, scaffolding based on paired-end information suffers when repetitive elements longer than the sequencing reads are present in the genome. Sequencing technologies that produce long reads can solve the problems associated with repetitive elements but are not always readily available to researchers. The most common high-throughput technology currently in use is the Illumina short-read platform. To address the shortcomings associated with the construction of draft genomes from Illumina paired-end sequencing, we developed Contig-Layout-Authenticator (CLA). The CLA pipeline can scaffold reference-sorted contigs based on paired reads, resulting in better assembled genomes. Moreover, CLA also hints at probable misassemblies and contaminations for users to cross-check before constructing the consensus draft. The CLA pipeline was designed and trained extensively on various bacterial genome datasets for the ordering and scaffolding of large repetitive contigs. The tool has been validated and compares favorably with other widely used scaffolding and ordering tools on both simulated and real sequence datasets. CLA is a user-friendly tool that requires a single command-line input to generate ordered scaffolds.
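The combined approach described in the abstract (sort contigs against a reference, then join neighbours only where paired reads support the link) can be illustrated with a toy sketch. This is not CLA's implementation: the function names, the input shapes (a dict of reference start coordinates, a list of mate-pair contig assignments) and the `min_links` threshold are all hypothetical, chosen only to show the general idea.

```python
from collections import Counter


def order_by_reference(contig_hits):
    """Sort contig names by the reference coordinate of their best alignment.

    contig_hits: dict mapping contig name -> reference start position
    (hypothetical input; a real pipeline would derive it from an aligner).
    """
    return sorted(contig_hits, key=contig_hits.get)


def count_links(read_pairs):
    """Count read pairs whose two mates map to different contigs.

    read_pairs: iterable of (contig_of_mate1, contig_of_mate2) tuples.
    """
    links = Counter()
    for a, b in read_pairs:
        if a != b:  # mates on the same contig carry no scaffolding information
            links[frozenset((a, b))] += 1
    return links


def scaffold(ordered, links, min_links=2):
    """Join reference-adjacent contigs only when enough read pairs span them.

    min_links is an assumed illustrative threshold, not a CLA parameter.
    """
    if not ordered:
        return []
    scaffolds = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if links[frozenset((prev, cur))] >= min_links:
            scaffolds[-1].append(cur)  # supported junction: extend the scaffold
        else:
            scaffolds.append([cur])    # unsupported junction: start a new one
    return scaffolds


# Toy data: three contigs and three spanning read pairs.
hits = {"c1": 5000, "c2": 100, "c3": 9000}
pairs = [("c2", "c1"), ("c1", "c2"), ("c1", "c3")]
ordered = order_by_reference(hits)
result = scaffold(ordered, count_links(pairs), min_links=2)
```

In this sketch, a junction that the reference order suggests but the read pairs do not support is left unjoined. Such junctions are the kind of place a user might inspect for a possible misassembly or a strain-specific rearrangement.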