ABSTRACT
Studies of bacterial adaptation and evolution are hampered by the difficulty of measuring traits such as virulence, drug resistance, and transmissibility in large populations. In contrast, it is now feasible to obtain high-quality complete assemblies of many bacterial genomes thanks to scalable high-accuracy long-read sequencing technologies. To exploit this opportunity, we introduce a phenotype- and alignment-free method for discovering coselected and epistatically interacting genomic variation from genome assemblies covering both core and accessory parts of genomes. Our approach uses a compact colored de Bruijn graph to approximate the intragenome distances between pairs of loci for a collection of bacterial genomes to account for the impacts of linkage disequilibrium (LD). We demonstrate the versatility of our approach to efficiently identify associations between loci linked with drug resistance and adaptation to the hospital niche in the major human bacterial pathogens Streptococcus pneumoniae and Enterococcus faecalis.
Subjects
Enterococcus faecalis, Genetic Epistasis, Bacterial Genome, Streptococcus pneumoniae, Streptococcus pneumoniae/genetics, Enterococcus faecalis/genetics, Linkage Disequilibrium, Humans, Genomics/methods
ABSTRACT
Bacterial genomes differ in both gene content and sequence mutations, which underlie extensive phenotypic diversity, including variation in susceptibility to antimicrobials or vaccine-induced immunity. To identify and quantify important variants, all genes within a population must be predicted, functionally annotated, and clustered, representing the "pangenome." Despite the volume of genome data available, gene prediction and annotation are currently conducted in isolation on individual genomes, which is computationally inefficient and frequently inconsistent across genomes. Here, we introduce the open-source software graph-gene-caller (ggCaller). ggCaller combines gene prediction, functional annotation, and clustering into a single workflow using population-wide de Bruijn graphs, removing redundancy in gene annotation and resulting in more accurate gene predictions and orthologue clustering. We applied ggCaller to simulated and real-world bacterial data sets containing hundreds or thousands of genomes, comparing it to current state-of-the-art tools. ggCaller has considerable speed-ups with equivalent or greater accuracy, particularly with data sets containing complex sources of error, such as assembly contamination or fragmentation. ggCaller is also an important extension to bacterial genome-wide association studies, enabling querying of annotated graphs for functional analyses. We highlight this application by functionally annotating DNA sequences with significant associations to tetracycline and macrolide resistance in Streptococcus pneumoniae, identifying key resistance determinants that were missed when using only a single reference genome. ggCaller is a novel bacterial genome analysis tool with applications in bacterial evolution and epidemiology.
Subjects
Anti-Bacterial Agents, Genome-Wide Association Study, Bacterial Drug Resistance, Macrolides, Software, Molecular Sequence Annotation, Bacterial Genome, Cluster Analysis, Algorithms
ABSTRACT
Horizontal gene transfer (HGT) plays a critical role in the evolution and diversification of many microbial species. The resulting dynamics of gene gain and loss can have important implications for the development of antibiotic resistance and the design of vaccine and drug interventions. Methods for the analysis of gene presence/absence patterns typically do not account for errors introduced in the automated annotation and clustering of gene sequences. In particular, methods adapted from ecological studies, including the pangenome gene accumulation curve, can be misleading as they may reflect the underlying diversity in the temporal sampling of genomes rather than a difference in the dynamics of HGT. Here, we introduce Panstripe, a method based on generalized linear regression that is robust to population structure, sampling bias, and errors in the predicted presence/absence of genes. We show using simulations that Panstripe can effectively identify differences in the rate and number of genes involved in HGT events, and illustrate its capability by analyzing several diverse bacterial genome data sets representing major human pathogens.
Subjects
Molecular Evolution, Prokaryotic Cells, Humans, Phylogeny, Bacterial Genome, Horizontal Gene Transfer
ABSTRACT
Malaria remains a major public health problem in many countries. Unlike influenza and HIV, where diversity in immunodominant surface antigens is understood geographically to inform disease surveillance, relatively little is known about the global population structure of PfEMP1, the major variant surface antigen of the malaria parasite Plasmodium falciparum. The complexity of the var multigene family, which encodes PfEMP1 and diversifies by recombination, has so far precluded its use in malaria surveillance. Recent studies have demonstrated that cost-effective deep sequencing of the region of var genes encoding the PfEMP1 DBLα domain, and subsequent classification of within-host sequences at 96% identity to define unique DBLα types, can reveal structure and strain dynamics within countries. However, to date there has not been a comprehensive comparison of these DBLα types between countries. By leveraging a bioinformatic approach (jumping hidden Markov model) designed specifically for the analysis of recombination within var genes and applying it to a dataset of DBLα types from 10 countries, we are able to describe population structure of DBLα types at the global scale. The sensitivity of the approach allows for the comparison of the global dataset to ape samples of Plasmodium Laverania species. Our analyses show that the evolution of the parasite population emerging out of Africa underlies current patterns of DBLα type diversity. Most importantly, we can distinguish geographic population structure within Africa between Gabon and Ghana in West Africa and Uganda in East Africa. Our evolutionary findings have translational implications in the context of globalization. Firstly, DBLα type diversity can provide a simple diagnostic framework for geographic surveillance of the rapidly evolving transmission dynamics of P. falciparum. It can also inform efforts to understand the presence or absence of global, regional and local population immunity to major surface antigen variants. Additionally, we identify a number of highly conserved DBLα types that are present globally, may be of biological significance, and warrant further characterization.
Subjects
Protozoan Antigens/genetics, Falciparum Malaria/parasitology, Plasmodium falciparum/genetics, Protozoan Proteins/genetics, Antigenic Variation, Molecular Evolution, Gabon, Ghana, Humans, Falciparum Malaria/epidemiology, Markov Chains, Statistical Models, Protein Domains, Protozoan Proteins/metabolism, Uganda
ABSTRACT
MOTIVATION: Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is thus of major interest. However, current methods for detecting recombinants are primarily designed for aligned sequences. Thus, they struggle with analyses of highly diverse genes, such as the var genes of the malaria parasite Plasmodium falciparum, which are known to diversify primarily through recombination. RESULTS: We introduce an algorithm to detect recent recombinant sequences from a dataset without a full multiple alignment. Our algorithm can handle thousands of gene-length sequences without the need for a reference panel. We demonstrate the accuracy of our algorithm through extensive numerical simulations; in particular, it maintains its effectiveness in the presence of insertions and deletions. We apply our algorithm to a dataset of 17 335 DBLα types in var genes from Ghana, observing that sequences belonging to the same ups group or domain subclass recombine amongst themselves more frequently, and that non-recombinant DBLα types are more conserved than recombinant ones. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://github.com/qianfeng2/detREC_program. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Genetic Variation, Protozoan Proteins, Protozoan Proteins/genetics, Plasmodium falciparum/genetics, Software, Molecular Evolution
ABSTRACT
SUMMARY: Homologous recombination is an important evolutionary process in bacteria and other prokaryotes, which increases genomic sequence diversity and can facilitate adaptation. Several methods and tools have been developed to detect genomic regions recently affected by recombination. Exploration and visualization of such recombination events can reveal valuable biological insights, but it remains challenging. Here, we present RCandy, a platform-independent R package for rapid, simple and flexible visualization of recombination events in bacterial genomes. AVAILABILITY AND IMPLEMENTATION: RCandy is an R package freely available for use under the MIT license. It is platform-independent and has been tested on Windows, Linux and MacOSX. The source code comes together with a detailed vignette available on GitHub at https://github.com/ChrispinChaguza/RCandy. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Genomics, Software, Genome, Bacteria, Biological Evolution
ABSTRACT
Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), a diverse family of multidomain proteins expressed on the surface of malaria-infected erythrocytes, is an important target of protective immunity against malaria. Our group recently studied transcription of the var genes encoding PfEMP1 in individuals from Papua, Indonesia, with severe or uncomplicated malaria. We cloned and expressed domains from 32 PfEMP1s, including 22 that were upregulated in severe malaria and 10 that were upregulated in uncomplicated malaria, using a wheat germ cell-free expression system. We used Luminex technology to measure IgG antibodies to these 32 domains and control proteins in 63 individuals (11 children). At presentation to hospital, levels of antibodies to PfEMP1 domains were either higher in uncomplicated malaria or were not significantly different between groups. Using principal component analysis, antibodies to 3 of 32 domains were highly discriminatory between groups. These included two domains upregulated in severe malaria, a DBLβ13 domain and a CIDRα1.6 domain (which has been previously implicated in severe malaria pathogenesis), and a DBLδ domain that was upregulated in uncomplicated malaria. Antibody to control non-PfEMP1 antigens did not differ with disease severity. Antibodies to PfEMP1 domains differ with malaria severity. Lack of antibodies to locally expressed PfEMP1 types, including both domains previously associated with severe malaria and newly identified targets, may in part explain malaria severity in Papuan adults.
Subjects
Falciparum Malaria, Malaria, Adult, Antiprotozoal Antibodies, Child, Erythrocytes, Humans, Indonesia, Membrane Proteins/genetics, Plasmodium falciparum/genetics, Protozoan Proteins/genetics
ABSTRACT
The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (Population Partitioning Using Nucleotide K-mers), software implementing scalable and expandable annotation- and alignment-free methods for population analysis and clustering. Variable-length k-mer comparisons are used to distinguish isolates' divergence in shared sequence and gene content, which we demonstrate to be accurate over multiple orders of magnitude using data from both simulations and genomic collections representing 10 taxonomically widespread species. Connections between closely related isolates of the same strain are robustly identified, despite interspecies variation in the pairwise distance distributions that reflects species' diverse evolutionary patterns. PopPUNK can process 10^3 to 10^4 genomes in a single batch, with minimal memory use and runtimes up to 200-fold faster than existing model-based methods. Clusters of strains remain consistent as new batches of genomes are added, which is achieved without needing to reanalyze all genomes de novo. This facilitates real-time surveillance with consistent cluster naming between studies and allows for outbreak detection using hundreds of genomes in minutes. Interactive visualization and online publication are streamlined through the automatic output of results to multiple platforms. PopPUNK has been designed as a flexible platform that addresses important issues with currently used whole-genome clustering and typing methods, and has potential uses across bacterial genetics and public health research.
Subjects
Bacterial Typing Techniques/methods, Bacterial Genome, Software, Bacteria/classification, Bacterial Infections/epidemiology, Genetic Variation, Genomics/methods
ABSTRACT
Within the human host, the malaria parasite Plasmodium falciparum is exposed to multiple selection pressures. The host environment changes dramatically in severe malaria, but the extent to which the parasite responds to, or is selected by, this environment remains unclear. From previous studies, the parasites that cause severe malaria appear to increase expression of a restricted but poorly defined subset of the PfEMP1 variant surface antigens. PfEMP1s are major targets of protective immunity. Here, we used RNA sequencing (RNAseq) to analyse gene expression in 44 parasite isolates that caused severe and uncomplicated malaria in Papuan patients. The transcriptomes of 19 parasite isolates associated with severe malaria indicated that these parasites had decreased glycolysis without activation of compensatory pathways; altered chromatin structure and probably transcriptional regulation through decreased histone methylation; reduced surface expression of PfEMP1; and down-regulated expression of multiple chaperone proteins. Our RNAseq also identified novel associations between disease severity and PfEMP1 transcripts, domains, and smaller sequence segments and also confirmed all previously reported associations between expressed PfEMP1 sequences and severe disease. These findings will inform efforts to identify vaccine targets for severe malaria and also indicate how parasites adapt to, or are selected by, the host environment in severe malaria.
Subjects
Protozoan Antigens/genetics, Surface Antigens/genetics, Malaria/parasitology, Plasmodium falciparum/genetics, Protozoan Proteins/genetics, Transcriptome, Gene Expression Regulation, Humans, Malaria/pathology, Plasmodium falciparum/isolation & purification, Plasmodium falciparum/metabolism, RNA Sequence Analysis
ABSTRACT
We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet process mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analyzing an alignment of over 110 000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximize the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and subclades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while being significantly faster. The method is made freely available under an open source MIT licence as an easy to use R package at https://github.com/gtonkinhill/fastbaps.
Subjects
Algorithms, Bacterial Proteins/classification, Bayes Theorem, Cluster Analysis, Protein Databases, Human Immunodeficiency Virus Proteins/classification, Theoretical Models, Bacterial Proteins/genetics, Computational Biology/methods, Human Immunodeficiency Virus Proteins/genetics, Phylogeny, Reproducibility of Results
ABSTRACT
Covariance-based discovery of polymorphisms under co-selective pressure or epistasis has received considerable recent attention in population genomics. Both statistical modeling of the population-level covariation of alleles across the chromosome and model-free testing of dependencies between pairs of polymorphisms have been shown to successfully uncover patterns of selection in bacterial populations. Here we introduce a model-free method, SpydrPick, whose computational efficiency enables analysis at the scale of pan-genomes of many bacteria. SpydrPick incorporates an efficient correction for population structure, which adjusts for the phylogenetic signal in the data without requiring an explicit phylogenetic tree. We also introduce a new type of visualization of the results similar to the Manhattan plots used in genome-wide association studies, which enables rapid exploration of the identified signals of co-evolution. Simulations demonstrate the usefulness of our method and give some insight into when this type of analysis is most likely to be successful. Application of the method to large population genomic datasets of two major human pathogens, Streptococcus pneumoniae and Neisseria meningitidis, revealed both previously identified and novel putative targets of co-selection related to virulence and antibiotic resistance, highlighting the potential of this approach to drive molecular discoveries, even in the absence of phenotypic data.
Subjects
Computational Biology/methods, Genetic Epistasis, Bacterial Genome/genetics, Genomics, Microbial Drug Resistance/genetics, Humans, Metagenomics/methods, Neisseria meningitidis/genetics, Neisseria meningitidis/pathogenicity, Streptococcus pneumoniae/genetics, Virulence/genetics
ABSTRACT
Population genomics has revolutionized our ability to study bacterial evolution by enabling data-driven discovery of the genetic architecture of trait variation. Genome-wide association studies (GWAS) have more recently become accompanied by genome-wide epistasis and co-selection (GWES) analysis, which offers a phenotype-free approach to generating hypotheses about selective processes that simultaneously impact multiple loci across the genome. However, existing GWES methods only consider associations between distant pairs of loci within the genome due to the strong impact of linkage disequilibrium (LD) over short distances. Based on the general functional organisation of genomes, it is nevertheless expected that the majority of co-selection and epistasis will act within relatively short genomic proximity, on co-variation occurring within genes and their promoter regions, and within operons. Here, we introduce LDWeaver, which enables an exhaustive GWES across both short- and long-range LD, to disentangle likely neutral co-variation from selection. We demonstrate the ability of LDWeaver to efficiently generate hypotheses about co-selection using large genomic surveys of multiple major human bacterial pathogen species and validate several findings using functional annotation and phenotypic measurements. Our approach will facilitate the study of bacterial evolution in the light of rapidly expanding population genomic data.
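To make the idea of pairwise co-variation judged against genomic distance concrete, here is a minimal Python sketch. It is not LDWeaver's implementation: the abstract does not say which statistic is used, so mutual information is assumed here purely as a generic measure of co-variation, and the function names, the circular-chromosome assumption, and the toy alleles are all hypothetical.

```python
from collections import Counter
from math import log2
from typing import Sequence


def mutual_information(locus_a: Sequence[str], locus_b: Sequence[str]) -> float:
    """Empirical mutual information (in bits) between alleles at two loci.

    Each sequence holds one allele per genome, in the same genome order.
    """
    if len(locus_a) != len(locus_b) or not locus_a:
        raise ValueError("loci must be non-empty and cover the same genomes")
    n = len(locus_a)
    count_a = Counter(locus_a)
    count_b = Counter(locus_b)
    count_ab = Counter(zip(locus_a, locus_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (joint / n) * log2(joint * n / (count_a[a] * count_b[b]))
    return mi


def circular_distance(pos_a: int, pos_b: int, genome_length: int) -> int:
    """Shortest distance between two positions, assuming a circular chromosome."""
    d = abs(pos_a - pos_b) % genome_length
    return min(d, genome_length - d)


# Toy data (hypothetical): alleles at three loci across eight genomes.
perfectly_linked = mutual_information("AAAATTTT", "CCCCGGGG")
independent = mutual_information("AATTAATT", "CGCGCGCG")
print(perfectly_linked, independent, circular_distance(10, 990, 1000))
```

A pair of loci would then be of interest when its co-variation is unusually high for pairs separated by a similar distance, since strong co-variation between nearby loci is expected from LD alone.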
ABSTRACT
Streptococcus dysgalactiae subspecies equisimilis (SDSE) and Streptococcus pyogenes share skin and throat niches with extensive genomic homology and horizontal gene transfer (HGT) possibly underlying shared disease phenotypes. It is unknown if cross-species transmission interaction occurs. Here, we conduct a genomic analysis of a longitudinal household survey in remote Australian First Nations communities for patterns of cross-species transmission interaction and HGT. We analyse 294 SDSE and 315 S. pyogenes genomes collected from 4547 person-consultations. We find SDSE and S. pyogenes transmission intersects extensively among households and show that patterns of co-occurrence and transmission links are consistent with independent transmission without inter-species interference. We identify at least one of three near-identical cross-species mobile genetic elements (MGEs) carrying antimicrobial resistance or streptodornase virulence genes in 55 (19%) SDSE and 23 (7%) S. pyogenes isolates. These findings demonstrate co-circulation of both pathogens and HGT in communities with a high burden of streptococcal disease, supporting a need to integrate SDSE and S. pyogenes surveillance and control efforts.
Subjects
Horizontal Gene Transfer, Interspersed Repetitive Sequences, Streptococcal Infections, Streptococcus pyogenes, Streptococcus, Streptococcus pyogenes/genetics, Streptococcus pyogenes/isolation & purification, Streptococcus pyogenes/classification, Streptococcal Infections/transmission, Streptococcal Infections/microbiology, Humans, Streptococcus/genetics, Streptococcus/isolation & purification, Interspersed Repetitive Sequences/genetics, Australia, Bacterial Genome/genetics, Female, Male, Child, Family Characteristics, Adult, Preschool Child, Adolescent, Longitudinal Studies, Bacterial Drug Resistance/genetics, Young Adult
ABSTRACT
BACKGROUND: Nosocomial infections pose a considerable risk to patients who are susceptible, and this is particularly acute in intensive care units when hospital-associated bacteria are endemic. During the first wave of the COVID-19 pandemic, the surge of patients presented a significant obstacle to the effectiveness of infection control measures. We aimed to assess the risks and extent of nosocomial pathogen transmission under a high patient burden by designing a novel bacterial pan-pathogen deep-sequencing approach that could be integrated with standard clinical surveillance and diagnostics workflows. METHODS: We did a prospective cohort study in a region of northern Italy that was severely affected by the first wave of the COVID-19 pandemic. Inpatients on both ordinary and intensive care unit (ICU) wards at the San Matteo hospital, Pavia were sampled on multiple occasions to identify bacterial pathogens from respiratory, nasal, and rectal samples. Diagnostic samples collected between April 7 and May 10, 2020 were cultured on six different selective media designed to enrich for Acinetobacter baumannii, Escherichia coli, Enterococcus faecium, Enterococcus faecalis, Klebsiella pneumoniae, Pseudomonas aeruginosa, Staphylococcus aureus, and Streptococcus pneumoniae, and DNA from each plate with positive growth was deep sequenced en masse. We used mSWEEP and mGEMS to bin sequencing reads by sequence cluster for each species, followed by mapping with snippy to generate high quality alignments. Antimicrobial resistance genes were detected by use of ARIBA and CARD. Estimates of hospital transmission were obtained from pairwise bacterial single nucleotide polymorphism distances, partitioned by within-patient and between-patient samples. Finally, we compared the accuracy of our binned Acinetobacter baumannii genomes with those obtained by single colony whole-genome sequencing of isolates from the same hospital. FINDINGS: We recruited patients from March 1 to May 7, 2020. The pathogen population among the patients was large and diverse, with 2148 species detections overall among the 2418 sequenced samples from the 256 patients. In total, 55 sequence clusters from key pathogen species were detected at least five times. The antimicrobial resistance gene prevalence was correspondingly high, with key carbapenemase and extended spectrum β-lactamase genes detected in at least 50 (40%) of 125 patients in ICUs. Using high-resolution mapping to infer transmission, we established that hospital transmission was likely to be a significant mode of acquisition for each of the pathogen species. Finally, comparison with single colony Acinetobacter baumannii genomes showed that the resolution offered by deep sequencing was equivalent to single-colony sequencing, with the additional benefit of detection of co-colonisation of highly similar strains. INTERPRETATION: Our study shows that a culture-based deep-sequencing approach is a possible route towards improving future pathogen surveillance and infection control at hospitals. Future studies should be designed to directly compare the accuracy, cost, and feasibility of culture-based deep sequencing with single colony whole-genome sequencing on a range of bacterial species. FUNDING: Wellcome Trust, European Research Council, Academy of Finland Flagship program, Trond Mohn Foundation, and Research Council of Norway.
Subjects
Bacteria, COVID-19, Cross Infection, High-Throughput Nucleotide Sequencing, Humans, Italy/epidemiology, Cross Infection/epidemiology, Cross Infection/microbiology, Cross Infection/transmission, Prospective Studies, COVID-19/epidemiology, COVID-19/transmission, Bacteria/genetics, Bacteria/isolation & purification, SARS-CoV-2/genetics, Intensive Care Units, Bacterial Infections/epidemiology, Bacterial Infections/microbiology, Bacterial Infections/transmission, Bacterial Infections/diagnosis, Klebsiella pneumoniae/genetics, Klebsiella pneumoniae/isolation & purification, Male, Acinetobacter baumannii/genetics, Acinetobacter baumannii/isolation & purification, Female, Aged, Middle Aged
ABSTRACT
Streptococcus dysgalactiae subsp. equisimilis (SDSE) is an emerging cause of human infection with invasive disease incidence and clinical manifestations comparable to the closely related species, Streptococcus pyogenes. Through systematic genomic analyses of 501 disseminated SDSE strains, we demonstrate extensive overlap between the genomes of SDSE and S. pyogenes. More than 75% of core genes are shared between the two species with one third demonstrating evidence of cross-species recombination. Twenty-five percent of mobile genetic element (MGE) clusters and 16 of 55 SDSE MGE insertion regions were shared across species. Assessing potential cross-protection from leading S. pyogenes vaccine candidates on SDSE, 12/34 preclinical vaccine antigen genes were shown to be present in >99% of isolates of both species. Relevant to possible vaccine evasion, six vaccine candidate genes demonstrated evidence of inter-species recombination. These findings demonstrate previously unappreciated levels of genomic overlap between these closely related pathogens with implications for streptococcal pathobiology, disease surveillance and prevention.
Subjects
Streptococcal Infections, Streptococcus, Vaccines, Humans, Streptococcus pyogenes/genetics, Gene Flow
ABSTRACT
Horizontal gene transfer (HGT) and the resulting patterns of gene gain and loss are a fundamental part of bacterial evolution. Investigating these patterns can help us to understand the role of selection in the evolution of bacterial pangenomes and how bacteria adapt to a new niche. Predicting the presence or absence of genes can be a highly error-prone process that can confound efforts to understand the dynamics of horizontal gene transfer. This review discusses both the challenges in accurately constructing a pangenome and the potential consequences errors can have on downstream analyses. We hope that by summarizing these issues researchers will be able to avoid potential pitfalls, leading to improved bacterial pangenome analyses.
Subjects
Molecular Evolution, Prokaryotic Cells, Phylogeny, Bacteria/genetics, Horizontal Gene Transfer
ABSTRACT
Extrachromosomal elements of bacterial cells such as plasmids are notorious for their importance in evolution and adaptation to changing ecology. However, high-resolution population-wide analysis of plasmids has only become accessible recently with the advent of scalable long-read sequencing technology. Current typing methods for the classification of plasmids remain limited in their scope, which motivated us to develop a computationally efficient approach to simultaneously recognize novel types and classify plasmids into previously identified groups. Here, we introduce mge-cluster, which can easily handle thousands of input sequences that are compressed using a unitig representation in a de Bruijn graph. Our approach offers a faster runtime than existing algorithms, with moderate memory usage, and enables an intuitive visualization, classification and clustering scheme that users can explore interactively within a single framework. The mge-cluster platform for plasmid analysis can be easily distributed and replicated, enabling a consistent labelling of plasmids across past, present, and future sequence collections. We underscore the advantages of our approach by analysing a population-wide plasmid data set obtained from the opportunistic pathogen Escherichia coli, studying the prevalence of the colistin resistance gene mcr-1.1 within the plasmid population, and describing an instance of resistance plasmid transmission within a hospital environment.
ABSTRACT
As observed in cancers, individual mutagens and defects in DNA repair create distinctive mutational signatures that combine to form context-specific spectra within cells. We reasoned that similar processes must occur in bacterial lineages, potentially allowing decomposition analysis to detect both disruption of DNA repair processes and exposure to niche-specific mutagens. Here we reconstruct mutational spectra for 84 clades from 31 diverse bacterial species and find distinct mutational patterns. We extract signatures driven by specific DNA repair defects using hypermutator lineages, and further deconvolute the spectra into multiple signatures operating within different clades. We show that these signatures are explained by both bacterial phylogeny and replication niche. By comparing mutational spectra of clades from different environmental and biological locations, we identify niche-associated mutational signatures, and then employ these signatures to infer the predominant replication niches for several clades where this was previously obscure. Our results show that mutational spectra may be associated with sites of bacterial replication when mutagen exposures differ, and can be used in these cases to infer transmission routes for established and emergent human bacterial pathogens.
Subjects
Neoplasms, Humans, Mutation, Neoplasms/genetics, DNA Repair/genetics, Mutagens, DNA Mutational Analysis/methods
ABSTRACT
Here we introduce a new endpoint, "census population size", to evaluate the epidemiology and control of Plasmodium falciparum infections, where the parasite, rather than the infected human host, is the unit of measurement. To calculate census population size, we rely on a definition of parasite variation known as multiplicity of infection (MOIvar), based on the hyper-diversity of the var multigene family. We present a Bayesian approach to estimate MOIvar from sequencing and counting the number of unique DBLα tags (or DBLα types) of var genes, and derive from it census population size by summation of MOIvar in the human population. We track changes in this parasite population size and structure through sequential malaria interventions by indoor residual spraying (IRS) and seasonal malaria chemoprevention (SMC) from 2012 to 2017 in an area of high-seasonal malaria transmission in northern Ghana. Following IRS, which reduced transmission intensity by >90% and decreased parasite prevalence by ~40-50%, significant reductions in var diversity, MOIvar, and population size were observed in ~2,000 humans across all ages. These changes, consistent with the loss of diverse parasite genomes, were short-lived: 32 months after IRS was discontinued and SMC was introduced, var diversity and population size rebounded in all age groups except for the younger children (1-5 years) targeted by SMC. Despite major perturbations from IRS and SMC interventions, the parasite population remained very large and retained the var population genetic characteristics of a high-transmission system (high var diversity; low var repertoire similarity), demonstrating the resilience of P. falciparum to short-term interventions in high-burden countries of sub-Saharan Africa.
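The summation step described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name and the example MOIvar values are hypothetical, and the Bayesian estimation of MOIvar from DBLα type counts is not reproduced, so per-host estimates are simply taken as given.

```python
from typing import Mapping


def census_population_size(moi_var_by_host: Mapping[str, int]) -> int:
    """Sum per-host MOIvar estimates, so parasites (not hosts) are the unit counted."""
    if any(moi < 0 for moi in moi_var_by_host.values()):
        raise ValueError("MOIvar estimates must be non-negative")
    return sum(moi_var_by_host.values())


# Hypothetical MOIvar estimates; a host with MOIvar 0 is treated as uninfected.
example = {"host_a": 3, "host_b": 1, "host_c": 0, "host_d": 5}
infected_hosts = sum(1 for moi in example.values() if moi > 0)

# Nine parasite infections are carried by three infected hosts in this toy survey.
print(census_population_size(example), infected_hosts)
```

The contrast between the two printed numbers is the point of the endpoint: prevalence counts infected hosts, whereas census population size also reflects how many infections each host carries.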
ABSTRACT
In less than a decade, population genomics of microbes has progressed from the effort of sequencing dozens of strains to thousands, or even tens of thousands of strains in a single study. There are now hundreds of thousands of genomes available even for a single bacterial species, and the number of genomes is expected to continue to increase at an accelerated pace given the advances in sequencing technology and widespread genomic surveillance initiatives. This explosion of data calls for innovative methods to enable rapid exploration of the structure of a population based on different data modalities, such as multiple sequence alignments, assemblies and estimates of gene content across different genomes. Here, we present Mandrake, an efficient implementation of a dimensional reduction method tailored for the needs of large-scale population genomics. Mandrake is capable of visualizing population structure from millions of whole genomes, and we illustrate its usefulness with several datasets representing major pathogens. Our method is freely available both as an analysis pipeline (https://github.com/johnlees/mandrake) and as a browser-based interactive application (https://gtonkinhill.github.io/mandrake-web/). This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.