ABSTRACT
Genomic characterization of an Escherichia coli O157:H7 strain linked to leafy greens-associated outbreaks dates its emergence to late 2015. One clade has notable accessory genomic content and a previously described mutation putatively associated with increased arsenic tolerance. This strain is a reoccurring, emerging, or persistent strain causing illness over an extended period.
Subject(s)
Escherichia coli O157, Escherichia coli O157/genetics, Disease Outbreaks, Genomics, Mutation
ABSTRACT
Cronobacter sakazakii, a species of gram-negative bacteria belonging to the Enterobacteriaceae family, is known to cause severe and often fatal meningitis and sepsis in young infants. C. sakazakii is ubiquitous in the environment, and most reported infant cases have been attributed to contaminated powdered infant formula (powdered formula) or breast milk that was expressed using contaminated breast pump equipment (1-3). Previous investigations of cases and outbreaks have identified C. sakazakii in opened powdered formula, breast pump parts, environmental surfaces in the home, and, rarely, in unopened powdered formula and formula manufacturing facilities (2,4-6). This report describes two infants with C. sakazakii meningitis reported to CDC in September 2021 and February 2022. CDC used whole genome sequencing (WGS) analysis to link one case to contaminated opened powdered formula from the patient's home and the other to contaminated breast pump equipment. These cases highlight the importance of expanding awareness about C. sakazakii infections in infants, safe preparation and storage of powdered formula, proper cleaning and sanitizing of breast pump equipment, and using WGS as a tool for C. sakazakii investigations.
Subject(s)
Cronobacter sakazakii, Enterobacteriaceae Infections, Female, Infant, Humans, Infant Formula, Cronobacter sakazakii/genetics, Enterobacteriaceae Infections/diagnosis, Enterobacteriaceae, Human Milk, Powders
ABSTRACT
Vibrio parahaemolyticus is the leading cause of seafood-related foodborne illness globally. In 2018, the U.S. federal, state, and local public health and regulatory partners investigated a multistate outbreak of V. parahaemolyticus infections linked to crabmeat that resulted in 26 ill people and nine hospitalizations. State and U.S. Food and Drug Administration (FDA) laboratories recovered V. parahaemolyticus, Salmonella spp., and Listeria monocytogenes isolates from crabmeat samples collected from various points of distribution and conducted phylogenetic analyses of whole-genome sequencing data. Federal, state, and local partners conducted traceback investigations to determine the source of crabmeat. Multiple Venezuelan processors that supplied various brands of crabmeat were identified, but a sole firm was not confirmed as the source of the outbreak. Travel restrictions between the United States and Venezuela prevented FDA officials from conducting on-site inspections of cooked crabmeat processors. Based on investigation findings, partners developed public communications advising consumers not to eat crabmeat imported from Venezuela and placed potentially implicated firms on import alerts. While some challenges limited the scope of the investigation, epidemiologic, traceback, and laboratory evidence identified the contaminated food and country of origin, and contributed to public health and regulatory actions, preventing additional illnesses. This multistate outbreak illustrates the importance of adhering to appropriate food safety practices and regulations for imported seafood.
Subject(s)
Foodborne Diseases, Vibrio Infections, Vibrio parahaemolyticus, Humans, United States/epidemiology, Phylogeny, Venezuela/epidemiology, Foodborne Diseases/epidemiology, Vibrio Infections/epidemiology, Disease Outbreaks
ABSTRACT
Enzymatic library preparation kits are increasingly used for bacterial whole genome sequencing. While they offer a rapid workflow, the transposases used in the kits are recognized to be somewhat biased. The aim of this study was to optimize and validate a protocol for the Illumina DNA Prep kit (formerly Nextera DNA Flex) for sequencing enteric pathogens and compare its performance against the Nextera XT kit. One hundred forty-three strains of Campylobacter, Escherichia, Listeria, Salmonella, Shigella, and Vibrio were prepared with both methods and sequenced on the Illumina MiSeq using 300 and/or 500 cycle chemistries. Sequences were compared using core genome multilocus sequence typing (cgMLST), 7-gene multilocus sequence typing (MLST), and detection of markers encoding serotype, virulence, and antimicrobial resistance. Sequences for one Escherichia strain were downsampled to determine the minimum coverage required for the analyses. While organism-specific differences were observed, the Prep libraries generated longer average read lengths and less fragmented assemblies compared to the XT libraries. In downstream analysis, the most notable difference between the kits was observed for Escherichia, particularly for the 300 cycle sequences. The O group was not predicted in 32% and 4% of XT sequences when using blast and kmer algorithms, respectively, while the O group was predicted from all Prep sequences regardless of the algorithm. In addition, the ehxA gene was not detected in 6% of XT sequences and 34% were missing one or more of the type III secretion systems and/or plasmid-associated genes, which were detected in the Prep sequences. The coverage downsampling revealed that acceptable assembly quality and allele detection was achieved at 30 × coverage with the Prep libraries, whereas 40-50 × coverage was required for the XT libraries. The better performance of the Prep libraries was attributed to more even coverage, particularly in genome regions low in GC content.
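The coverage downsampling described above can be illustrated with a small calculation. This is a minimal sketch, assuming mean coverage is estimated as total sequenced bases divided by genome length; the abstract does not state how the reads were actually subsampled, and the function name and the example genome size are hypothetical.

```python
def downsample_fraction(total_bases: int, genome_size: int, target_coverage: float) -> float:
    """Return the fraction of reads to keep to reach roughly target_coverage.

    Assumes mean coverage = total sequenced bases / genome length.
    """
    if total_bases <= 0 or genome_size <= 0 or target_coverage <= 0:
        raise ValueError("all inputs must be positive")
    current_coverage = total_bases / genome_size
    # Never report a fraction above 1: reads cannot be added by subsampling.
    return min(1.0, target_coverage / current_coverage)


# Hypothetical example: 500 Mb of reads on a 5 Mb genome is 100x coverage,
# so keeping 30% of the reads gives about 30x.
fraction = downsample_fraction(500_000_000, 5_000_000, 30)
```

The resulting fraction would then be passed to whatever read-subsampling tool a laboratory already uses.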
Subject(s)
Gastrointestinal Microbiome, Bacterial Genome, DNA, High-Throughput Nucleotide Sequencing/methods, Multilocus Sequence Typing
ABSTRACT
Single-nucleotide polymorphisms (SNPs) are widely used for whole-genome sequencing (WGS)-based subtyping of foodborne pathogens in outbreak and source tracking investigations. Mobile genetic elements (MGEs) are commonly present in bacterial genomes and may affect SNP subtyping results if their evolutionary history and dynamics differ from those of the bacterial chromosomes. Using Salmonella enterica as a model organism, we surveyed major categories of MGEs, including plasmids, phages, insertion sequences, integrons, and integrative and conjugative elements (ICEs), in 990 genomes representing 21 major serotypes of S. enterica. We evaluated whether plasmids and chromosomal MGEs affect SNP subtyping with 9 outbreak clusters of different serotypes found in the United States in 2018. The median total length of chromosomal MGEs accounted for 2.5% of a typical S. enterica chromosome. Of the 990 analyzed S. enterica isolates, 68.9% contained at least one assembled plasmid sequence. The median total length of assembled plasmids in these isolates was 93,671 bp. Plasmids that carry high densities of SNPs were found to substantially affect both SNP phylogenies and SNP distances among closely related isolates if they were present in the reference genome for SNP subtyping. In comparison, chromosomal MGEs were found to have limited impact on SNP subtyping. We recommend the identification of plasmid sequences in the reference genome and the exclusion of plasmid-borne SNPs from SNP subtyping analysis. IMPORTANCE Despite increasingly routine use of WGS and SNP subtyping in outbreak and source tracking investigations, whether and how MGEs affect SNP subtyping has not been thoroughly investigated. Besides chromosomal MGEs, plasmids are frequently entangled in draft genome assemblies and have yet to be assessed for their impact on SNP subtyping. 
This study provides evidence-based guidance on the treatment of MGEs in SNP analysis for Salmonella to infer phylogenetic relationship and SNP distance between isolates.
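The recommendation to exclude plasmid-borne SNPs can be sketched as a simple filter. This is an illustration only, assuming the plasmid contigs in the reference genome have already been identified by some other means; the `Snp` record, the function name, and the contig names are hypothetical and are not taken from the study.

```python
from typing import Iterable, List, NamedTuple


class Snp(NamedTuple):
    contig: str    # reference contig the SNP was called on
    position: int  # 1-based position on that contig
    ref: str
    alt: str


def exclude_plasmid_snps(snps: Iterable[Snp], plasmid_contigs: Iterable[str]) -> List[Snp]:
    """Keep only SNPs whose reference contig is not flagged as plasmid."""
    plasmid = set(plasmid_contigs)
    return [snp for snp in snps if snp.contig not in plasmid]


# Hypothetical calls against a reference with one chromosome and one plasmid contig.
calls = [
    Snp("chromosome", 10_452, "A", "G"),
    Snp("plasmid_1", 88, "C", "T"),
    Snp("plasmid_1", 913, "G", "A"),
]
chromosomal = exclude_plasmid_snps(calls, ["plasmid_1"])
```

Filtering before distances are computed means that a SNP-dense plasmid in the reference no longer contributes to pairwise SNP counts.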
Subject(s)
Interspersed Repetitive Sequences/genetics, Single Nucleotide Polymorphism, Salmonella enterica/classification, Salmonella enterica/genetics, Bacterial Chromosomes, Disease Outbreaks, Bacterial Genome, Humans, Phylogeny, Plasmids/isolation & purification, Serogroup, Whole Genome Sequencing
ABSTRACT
Foodborne salmonellosis causes an estimated 1 million illnesses and 400 deaths annually in the United States (1). Salmonella Anatum is one of the top 20 Salmonella serotypes in the United States. During 2013-2015 there were approximately 300-350 annual illnesses reported to PulseNet, the national molecular subtyping network for foodborne disease surveillance. In June 2016, PulseNet identified a cluster of 16 Salmonella Anatum infections with an indistinguishable pulsed-field gel electrophoresis (PFGE) pattern from four states.* In April 2016, the same PFGE pattern had been uploaded to PulseNet from an isolate obtained from an Anaheim pepper, a mild to medium hot pepper. Hot peppers include many pepper varieties, such as Anaheim, jalapeño, poblano, and serrano, which can vary in heat level from mild to very hot depending on the variety and preparation. This rare PFGE pattern had been seen only 24 times previously in the PulseNet database, compared with common PFGE patterns for this serotype which have been seen in the database hundreds of times. Local and state health departments, CDC, and the Food and Drug Administration (FDA) investigated to determine the cause of the outbreak. Thirty-two patients in nine states were identified with illness onsets from May 6-July 9, 2016. Whole-genome sequencing (WGS) was performed to characterize clinical isolates and the Anaheim pepper isolate further. The combined evidence indicated that fresh hot peppers were the likely source of infection; however, a single pepper type or source farm was not identified. This outbreak highlights challenges in reconciling epidemiologic and WGS data, and the difficulties of identifying ingredient-level exposures through epidemiologic investigations alone.
Subject(s)
Capsicum/microbiology, Commerce, Disease Outbreaks, Salmonella Food Poisoning/epidemiology, Salmonella/isolation & purification, Adolescent, Adult, Aged, Child, Preschool Child, Female, Food Microbiology, Humans, Male, Middle Aged, Salmonella/genetics, United States/epidemiology, Young Adult
ABSTRACT
In April 2016, PulseNet, the national molecular subtyping network for foodborne disease surveillance, detected a multistate cluster of Salmonella enterica serotype Oslo infections with an indistinguishable pulsed-field gel electrophoresis (PFGE) pattern (XbaI PFGE pattern OSLX01.0090).* This PFGE pattern was new in the database; no previous infections or outbreaks have been identified. CDC, state and local health and agriculture departments and laboratories, and the Food and Drug Administration (FDA) conducted epidemiologic, traceback, and laboratory investigations to identify the source of this outbreak. A total of 14 patients in eight states were identified, with illness onsets occurring during March 21-April 9, 2016. Whole genome sequencing, a highly discriminating subtyping method, was used to further characterize PFGE pattern OSLX01.0090 isolates. Epidemiologic evidence indicates Persian cucumbers as the source of Salmonella Oslo infections in this outbreak. This is the fourth identified multistate outbreak of salmonellosis associated with cucumbers since 2013. Further research is needed to understand the mechanism and factors that contribute to contamination of cucumbers during growth, harvesting, and processing to prevent future outbreaks.
Subject(s)
Cucumis sativus/microbiology, Disease Outbreaks, Salmonella Food Poisoning/epidemiology, Adolescent, Adult, Aged, Child, Preschool Child, Female, Food Microbiology, Humans, Male, Middle Aged, Salmonella/isolation & purification, United States/epidemiology, Young Adult
ABSTRACT
Campylobacter is a leading cause of bacterial foodborne and zoonotic illnesses in the USA. Pulsed-field gel electrophoresis (PFGE) and 7-gene multilocus sequence typing (MLST) have been historically used to differentiate sporadic from outbreak Campylobacter isolates. Whole genome sequencing (WGS) has been shown to provide superior resolution and concordance with epidemiological data when compared with PFGE and 7-gene MLST during outbreak investigations. In this study, we evaluated epidemiological concordance for high-quality SNP (hqSNP), core genome (cg)MLST and whole genome (wg)MLST to cluster or differentiate outbreak-associated and sporadic Campylobacter jejuni and Campylobacter coli isolates. Phylogenetic hqSNP, cgMLST and wgMLST analyses were also compared using Baker's gamma index (BGI) and cophenetic correlation coefficients. Pairwise distances from all three analysis methods were compared using linear regression models. Our results showed that 68/73 sporadic C. jejuni and C. coli isolates were differentiated from outbreak-associated isolates using all three methods. There was a high correlation between cgMLST and wgMLST analyses of the isolates; the BGI, cophenetic correlation coefficient, linear regression model R² and Pearson correlation coefficients were >0.90. The correlation was sometimes lower comparing hqSNP analysis to the MLST-based methods; the linear regression model R² and Pearson correlation coefficients were between 0.60 and 0.86, and the BGI and cophenetic correlation coefficient were between 0.63 and 0.86 for some outbreak isolates. We demonstrated that C. jejuni and C. coli isolates clustered in concordance with epidemiological data using WGS-based analysis methods. Discrepancies between allele and SNP-based approaches may reflect differences in how genomic variation (SNPs and indels) is captured by the two methods. 
Since cgMLST examines allele differences in genes that are common in most isolates being compared, it is well suited to surveillance: searching large genomic databases for similar isolates is easily and efficiently done using allelic profiles. On the other hand, use of an hqSNP approach is much more computer intensive and not scalable to large sets of genomes. If further resolution between potential outbreak isolates is needed, wgMLST or hqSNP analysis can be used.
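The allele-difference comparison that underlies cgMLST can be sketched as follows. This is a minimal illustration, not the scheme or software used in the study: profiles are modeled as dictionaries from locus name to allele number, and loci that are missing or uncalled in either isolate are ignored, which is an assumption. The locus names and allele numbers are hypothetical.

```python
from typing import Mapping, Optional

Profile = Mapping[str, Optional[int]]  # locus name -> allele number (None if uncalled)


def allele_differences(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    shared_loci = a.keys() & b.keys()
    return sum(
        1
        for locus in shared_loci
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


# Hypothetical three-locus profiles.
isolate_1 = {"locus_A": 1, "locus_B": 7, "locus_C": 3}
isolate_2 = {"locus_A": 1, "locus_B": 9, "locus_C": None}
distance = allele_differences(isolate_1, isolate_2)
```

Because each comparison is a lookup over integer allele numbers rather than an alignment, this kind of distance is cheap to compute against a large database of stored profiles.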
Subject(s)
Campylobacter coli, Campylobacter jejuni, United States/epidemiology, Multilocus Sequence Typing, Campylobacter coli/genetics, Phylogeny, Disease Outbreaks
ABSTRACT
Toxigenic Vibrio cholerae serogroup O1 is the etiologic agent of the disease cholera, and strains of this serogroup are responsible for pandemics. A few other serogroups have been found to carry cholera toxin genes-most notably, O139, O75, and O141-and public health surveillance in the United States is focused on these four serogroups. A toxigenic isolate was recovered from a case of vibriosis from Texas in 2008. This isolate did not agglutinate with any of the four different serogroups' antisera (O1, O139, O75, or O141) routinely used in phenotypic testing and did not display a rough phenotype. We investigated several hypotheses that might explain the recovery of this potential nonagglutinating (NAG) strain using whole-genome sequencing analysis and phylogenetic methods. The NAG strain formed a monophyletic cluster with O141 strains in a whole-genome phylogeny. Furthermore, a phylogeny of ctxAB and tcpA sequences revealed that the sequences from the NAG strain also formed a monophyletic cluster with toxigenic U.S. Gulf Coast (USGC) strains (O1, O75, and O141) that were recovered from vibriosis cases associated with exposures to Gulf Coast waters. A comparison of the NAG whole-genome sequence showed that the O-antigen-determining region of the NAG strain was closely related to those of O141 strains, and specific mutations were likely responsible for the inability to agglutinate. This work shows the utility of whole-genome sequence analysis tools for characterization of an atypical clinical isolate of V. cholerae originating from a USGC state. IMPORTANCE Clinical cases of vibriosis are on the rise due to climate events and ocean warming (1, 2), and increased surveillance of toxigenic Vibrio cholerae strains is now more crucial than ever. While traditional phenotyping using antisera against O1 and O139 is useful for monitoring currently circulating strains with pandemic or epidemic potential, reagents are limited for non-O1/non-O139 strains. 
With the increased use of next-generation sequencing technologies, analysis of less well-characterized strains and O-antigen regions is possible. The framework for advanced molecular analysis of O-antigen-determining regions presented herein will be useful in the absence of reagents for serotyping. Furthermore, molecular analyses based on whole-genome sequence data and using phylogenetic methods will help characterize both historical and novel strains of clinical importance. Closely monitoring emerging mutations and trends will improve our understanding of the epidemic potential of Vibrio cholerae to anticipate and rapidly respond to future public health emergencies.
Subject(s)
Cholera, Vibrio Infections, Vibrio cholerae, United States, Humans, Vibrio cholerae/genetics, Phylogeny, O Antigens/genetics
ABSTRACT
Salmonella enterica is a leading cause of bacterial foodborne and zoonotic illnesses in the United States. For this study, we applied four different whole genome sequencing (WGS)-based subtyping methods: high quality single-nucleotide polymorphism (hqSNP) analysis, whole genome multilocus sequence typing using either all loci [wgMLST (all loci)] or only chromosome-associated loci [wgMLST (chrom)], and core genome multilocus sequence typing (cgMLST) to a dataset of isolate sequences from 9 well-characterized Salmonella outbreaks. For each outbreak, we evaluated the genomic and epidemiologic concordance between hqSNP and allele-based methods. We first compared pairwise genomic differences using all four methods. We observed discrepancies in allele difference ranges when using wgMLST (all loci), likely caused by inflated genetic variation due to loci found on plasmids and/or other mobile genetic elements in the accessory genome. Therefore, we excluded wgMLST (all loci) results from any further comparisons in the study. Then, we created linear regression models and phylogenetic tanglegrams using the remaining three methods. K-means analysis using the silhouette method was applied to compare the ability of the three methods to partition outbreak and sporadic isolate sequences. Our results showed that pairwise hqSNP differences had high concordance with cgMLST and wgMLST (chrom) allele differences. The slopes of the regressions for hqSNP vs. allele pairwise differences were 0.58 (cgMLST) and 0.74 [wgMLST (chrom)], and the slope of the regression was 0.77 for cgMLST vs. wgMLST (chrom) pairwise differences. Tanglegrams showed high clustering concordance between methods using two statistical measures, the Baker's gamma index (BGI) and cophenetic correlation coefficient (CCC), where 9/9 (100%) of outbreaks yielded BGI values ≥ 0.60 and CCCs were ≥ 0.97 across all nine outbreaks and all three methods. 
K-means analysis showed separation of outbreak and sporadic isolate groups with average silhouette widths ≥ 0.87 for outbreak groups and ≥ 0.16 for sporadic groups. This study demonstrates that Salmonella isolates clustered in concordance with epidemiologic data using three WGS-based subtyping methods and supports using cgMLST as the primary method for national surveillance of Salmonella outbreak clusters.
ABSTRACT
Identification of enteric bacteria species by whole genome sequence (WGS) analysis requires a rapid and easily standardized approach. We leveraged the principles of average nucleotide identity using MUMmer (ANIm) software, which calculates the percent bases aligned between two bacterial genomes and their corresponding ANI values, to set threshold values for determining species consistent with the conventional identification methods of known species. The performance of species identification was evaluated using two datasets: the Reference Genome Dataset v2 (RGDv2), consisting of 43 enteric genome assemblies representing 32 species, and the Test Genome Dataset (TGDv1), comprising 454 genome assemblies, which is designed to represent all species needed to query for identification, as well as rare and closely related species. The RGDv2 contains six Campylobacter spp., three Escherichia/Shigella spp., one Grimontia hollisae, six Listeria spp., one Photobacterium damselae, two Salmonella spp., and thirteen Vibrio spp., while the TGDv1 contains 454 enteric bacterial genomes representing 42 different species. The analysis showed that, when a standard minimum of 70% genome bases alignment existed, the ANI threshold values determined for these species were ≥95% for Escherichia/Shigella and Vibrio species, ≥93% for Salmonella species, and ≥92% for Campylobacter and Listeria species. Using these metrics, the RGDv2 accurately classified all validation strains in TGDv1 at the species level, which is consistent with the classification based on previous gold standard methods.
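The decision rule implied by these thresholds can be written down directly. This is a minimal sketch using only the cutoffs stated in the abstract; the function name, the dictionary keys, and the assumption that ANI and aligned-bases percentages have already been computed by ANIm are illustrative, not part of the published workflow.

```python
# ANI cutoffs (percent) reported in the abstract, keyed by genus group.
ANI_THRESHOLDS = {
    "Escherichia/Shigella": 95.0,
    "Vibrio": 95.0,
    "Salmonella": 93.0,
    "Campylobacter": 92.0,
    "Listeria": 92.0,
}
MIN_ALIGNED_PERCENT = 70.0  # minimum percent of genome bases aligned


def matches_species(genus_group: str, ani_percent: float, aligned_percent: float) -> bool:
    """Return True if a query-vs-reference comparison meets both cutoffs."""
    threshold = ANI_THRESHOLDS[genus_group]  # KeyError for groups without a stated cutoff
    return aligned_percent >= MIN_ALIGNED_PERCENT and ani_percent >= threshold


# Hypothetical comparisons against a Salmonella reference genome.
accepted = matches_species("Salmonella", ani_percent=98.2, aligned_percent=91.0)
rejected = matches_species("Salmonella", ani_percent=98.2, aligned_percent=55.0)
```

Requiring the alignment fraction as well as the ANI value guards against a high identity score computed over only a small shared portion of two genomes.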
ABSTRACT
A high burden of Salmonella enterica subspecies enterica serovar Typhi (S. Typhi) bacteremia has been reported from urban informal settlements in sub-Saharan Africa, yet little is known about the introduction of these strains to the region. Understanding regional differences in the predominant strains of S. Typhi can provide insight into the genomic epidemiology. We genetically characterized 310 S. Typhi isolates from typhoid fever surveillance conducted over a 12-year period (2007-2019) in Kibera, an urban informal settlement in Nairobi, Kenya, to assess the circulating strains, their antimicrobial resistance attributes, and how they relate to global S. Typhi isolates. Whole genome multi-locus sequence typing (wgMLST) identified 4 clades, with up to 303 pairwise allelic differences. The identified genotypes correlated with wgMLST clades. The predominant clade contained 290 (93.5%) isolates with a median of 14 allele differences (range 0-52) and consisted entirely of genotypes 4.3.1.1 and 4.3.1.2. Resistance determinants were identified exclusively in the predominant clade. Determinants associated with resistance to aminoglycosides were observed in 245 isolates (79.0%), sulphonamide in 243 isolates (78.4%), trimethoprim in 247 isolates (79.7%), tetracycline in 224 isolates (72.3%), chloramphenicol in 247 isolates (79.6%), β-lactams in 239 isolates (77.1%) and quinolones in 62 isolates (20.0%). Multidrug resistance (MDR) determinants (defined as determinants conferring resistance to ampicillin, chloramphenicol and cotrimoxazole) were found in 235 (75.8%) isolates. The prevalence of MDR-associated genes was similar throughout the study period (2007-2012: 203, 76.3% vs 2013-2019: 32, 72.7%; Fisher's Exact Test: P = 0.5478), while the proportion of isolates harboring quinolone resistance determinants increased (2007-2012: 42, 15.8% and 2013-2019: 20, 45.5%; Fisher's Exact Test: P<0.0001) following a decline in S. Typhi in Kibera. 
Some isolates (49, 15.8%) harbored both MDR and quinolone resistance determinants. There were no determinants associated with resistance to cephalosporins or azithromycin detected among the isolates sequenced in this study. Plasmid markers were identified only in the main clade, including IncHI1A and IncHI1B(R27) in 226 (72.9%) isolates, and IncQ1 in 238 (76.8%) isolates. Molecular clock analysis of global typhoid isolates and isolates from Kibera suggests that genotype 4.3.1 has been introduced multiple times in Kibera. Several genomes from Kibera formed a clade with genomes from Kenya, Malawi, South Africa, and Tanzania. The most recent common ancestor (MRCA) for these isolates was from around 1997. Another isolate from Kibera grouped with several isolates from Uganda, sharing a common ancestor from around 2009. In summary, S. Typhi in Kibera belong to four wgMLST clades, one of which is frequently associated with MDR genes, and this poses a challenge in treatment and control.
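The two period comparisons above are Fisher's exact tests on 2×2 tables, which can be reproduced with the standard library. This is a sketch under stated assumptions: the period totals of 266 (2007-2012) and 44 (2013-2019) isolates are not given in the abstract and are back-calculated here from the reported counts and percentages (they sum to the 310 isolates studied), and the two-sided P value is computed by summing all tables no more probable than the observed one, which may differ from the exact convention the authors' software used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x: int) -> float:
        # Hypergeometric probability of x "positives" in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    cutoff = observed * (1 + 1e-9)  # small tolerance for floating-point ties
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x)
        if p <= cutoff:
            total += p
    return min(1.0, total)


# Period totals (266 and 44) are back-calculated assumptions, see the note above.
p_mdr = fisher_exact_two_sided(203, 266 - 203, 32, 44 - 32)
p_quinolone = fisher_exact_two_sided(42, 266 - 42, 20, 44 - 20)
```

Under these assumed totals, `p_mdr` corresponds to the MDR comparison reported as non-significant and `p_quinolone` to the quinolone comparison reported as P<0.0001.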
Subject(s)
Quinolones, Typhoid Fever, Anti-Bacterial Agents/pharmacology, Chloramphenicol, Humans, Kenya/epidemiology, Microbial Sensitivity Tests, Multilocus Sequence Typing, Salmonella typhi, Typhoid Fever/epidemiology
ABSTRACT
Laboratories that run Whole Genome Sequencing (WGS) produce a tremendous amount of data, up to 10 gigabytes for some common instruments. There is a need to standardize the quality assurance and quality control process (QA/QC). Therefore we have created SneakerNet to automate the QA/QC for WGS.
ABSTRACT
Four enzymatic DNA library preparation kits were compared for sequencing Shiga toxin-producing E. coli. All kits produced high quality sequence data which performed equally well in the downstream analyses for surveillance and outbreak detection. Important differences were noted in the workflow user-friendliness and per sample cost.
Subject(s)
Escherichia coli Infections/diagnosis, Diagnostic Reagent Kits/economics, Diagnostic Reagent Kits/microbiology, Shiga-Toxigenic Escherichia coli/genetics, Shiga-Toxigenic Escherichia coli/isolation & purification, Bacterial DNA/isolation & purification, Disease Outbreaks, Epidemiological Monitoring, Escherichia coli Infections/microbiology, Gene Library, Bacterial Genome, Humans, DNA Sequence Analysis, Whole Genome Sequencing, Workflow
ABSTRACT
Modern epidemiology of foodborne bacterial pathogens in industrialized countries relies increasingly on whole genome sequencing (WGS) techniques. As opposed to profiling techniques such as pulsed-field gel electrophoresis, WGS requires a variety of computational methods. Since 2013, United States agencies responsible for food safety including the CDC, FDA, and USDA, have been performing whole-genome sequencing (WGS) on all Listeria monocytogenes found in clinical, food, and environmental samples. Each year, more genomes of other foodborne pathogens such as Escherichia coli, Campylobacter jejuni, and Salmonella enterica are being sequenced. Comparing thousands of genomes across an entire species requires a fast method with coarse resolution; however, capturing the fine details of highly related isolates requires a computationally heavy and sophisticated algorithm. Most L. monocytogenes investigations employing WGS depend on being able to identify an outbreak clade whose inter-genomic distances are less than an empirically determined threshold. When the difference between a few single nucleotide polymorphisms (SNPs) can help distinguish between genomes that are likely outbreak-associated and those that are less likely to be associated, we require a fine-resolution method. To achieve this level of resolution, we have developed Lyve-SET, a high-quality SNP pipeline. We evaluated Lyve-SET by retrospectively investigating 12 outbreak data sets along with four other SNP pipelines that have been used in outbreak investigation or similar scenarios. To compare these pipelines, several distance and phylogeny-based comparison methods were applied, which collectively showed that multiple pipelines were able to identify most outbreak clusters and strains. 
Currently in the US PulseNet system, whole genome multi-locus sequence typing (wgMLST) is the preferred primary method for foodborne WGS cluster detection and outbreak investigation due to its ability to name standardized genomic profiles, its central database, and its ability to be run in a graphical user interface. However, creating a functional wgMLST scheme requires extended up-front development and subject-matter expertise. When a scheme does not exist or when the highest resolution is needed, SNP analysis is used. Using three Listeria outbreak data sets, we demonstrated the concordance between Lyve-SET SNP typing and wgMLST. Availability: Lyve-SET can be found at https://github.com/lskatz/Lyve-SET.