ABSTRACT
Classification by serotyping is the essential first step in the characterization of Salmonella isolates and is important for surveillance, source tracking, and outbreak detection. To improve detection and reduce the burden of salmonellosis, several rapid, high-throughput molecular Salmonella serotyping methods have been developed. The aim of this study was to compare three commercial kits, Salm SeroGen (Salm Sero-Genotyping AS-1 kit), Check&Trace (Check-Points), and xMAP (xMAP Salmonella serotyping assay), with the Salmonella genoserotyping array (SGSA) developed by our laboratory. The methods were assessed using a panel of 321 isolates representing commonly reported serovars from human and nonhuman sources worldwide. The four methods correctly identified 73.8% to 94.7% of the isolates tested. The methods correctly identified 85% and 98% of the clinically important Salmonella serovars Enteritidis and Typhimurium, respectively, and 75% to 100% of the nontyphoidal, broad-host-range Salmonella serovars, including Heidelberg, Hadar, Infantis, Kentucky, Montevideo, Newport, and Virchow. For serovars Typhimurium and Enteritidis, sensitivity ranged from 85% to 100% and specificity from 99% to 100%. It is anticipated that whole-genome sequencing will replace serotyping in public health laboratories in the future. At present, however, it is approximately three times more expensive than molecular methods, and until consistent standards and methodologies are deployed for whole-genome sequencing, data analysis and interlaboratory comparability remain a challenge. Molecular serotyping will therefore provide a valuable high-throughput alternative to traditional serotyping. This comprehensive analysis provides a detailed comparison of the commercial kits available for the molecular serotyping of Salmonella.
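Sensitivity and specificity for a given serovar follow the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), with traditional serotyping as the reference. A minimal sketch of that calculation, assuming paired lists of reference and predicted serovar labels; the function name and the example labels are illustrative and are not data from the study:

```python
from typing import Sequence


def sensitivity_specificity(reference: Sequence[str], predicted: Sequence[str],
                            serovar: str) -> tuple[float, float]:
    """Sensitivity and specificity for one serovar, treating `reference`
    (e.g. traditional serotyping) as the ground truth."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fn = fp = tn = 0
    for ref, pred in zip(reference, predicted):
        is_target = ref == serovar
        called_target = pred == serovar
        if is_target and called_target:
            tp += 1  # target serovar correctly called
        elif is_target:
            fn += 1  # target serovar missed
        elif called_target:
            fp += 1  # another serovar wrongly called as the target
        else:
            tn += 1  # non-target correctly not called as the target
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Made-up labels for illustration only, not results from the study.
reference = ["Typhimurium", "Typhimurium", "Enteritidis", "Heidelberg", "Enteritidis"]
predicted = ["Typhimurium", "Enteritidis", "Enteritidis", "Heidelberg", "Enteritidis"]
print(sensitivity_specificity(reference, predicted, "Typhimurium"))  # (0.5, 1.0)
```

The same counts computed per serovar give one sensitivity/specificity pair per method, which is how a range across methods (such as 85% to 100%) arises.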
Subject(s)
Genotyping Techniques/methods , Molecular Diagnostic Techniques/methods , Salmonella Infections, Animal/microbiology , Salmonella Infections/microbiology , Salmonella/classification , Serogroup , Serotyping/methods , Animals , Humans , Salmonella/genetics , Salmonella Infections/diagnosis , Salmonella Infections, Animal/diagnosis
ABSTRACT
The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform the public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). In addition, the portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. From inception to execution, the portal was developed with a conscientious focus on strong data governance principles and practices. Extensive efforts ensured a commitment to Canadian privacy laws, data security standards, and organizational processes. This portal has been coupled with other resources, such as Viral AI, and was further leveraged by the Coronavirus Variants Rapid Response Network (CoVaRR-Net) to produce a suite of continually updated analytical tools and notebooks. Here we highlight this portal (https://virusseq-dataportal.ca/), including its contextual data not available elsewhere, and the Duotang (https://covarr-net.github.io/duotang/duotang.html), a web platform that presents key genomic epidemiology and modelling analyses on circulating and emerging SARS-CoV-2 variants in Canada. 
Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the portal (COVID-MVP, CoVizu), are all open source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.
Subject(s)
COVID-19 , Genome, Viral , SARS-CoV-2 , Canada/epidemiology , SARS-CoV-2/genetics , Humans , COVID-19/epidemiology , COVID-19/virology , Genomics/methods , Pandemics , Databases, Genetic
ABSTRACT
We have developed a Salmonella genoserotyping array (SGSA) that rapidly generates an antigenic formula consistent with the White-Kauffmann-Le Minor scheme, currently the gold standard for Salmonella serotyping. A set of 287 strains representing 133 Salmonella serovars was assembled to validate the array and to test the array probes for accuracy, specificity, and reproducibility. Initially, 76 known serovars were used to validate the specificity and repeatability of the array probes and their expected probe patterns. The SGSA generated the correct serovar designation for 100% of the known subspecies I serovars tested in the validation panel and an antigenic formula consistent with the White-Kauffmann-Le Minor scheme for 97% of all known serovars tested. Once validated, the SGSA was assessed against a blind panel of 100 Salmonella enterica subsp. I samples serotyped using traditional methods. The SGSA correctly identified all of the blind samples as Salmonella and successfully identified 92% of the antigens found within the unknown samples. Antigen- and serovar-specific probes, in combination with a pepT PCR to confirm S. enterica subsp. enterica serovar Enteritidis determinations, generated an antigenic formula and/or a serovar designation consistent with the White-Kauffmann-Le Minor scheme for 87% of the unknown samples tested. Future experiments are planned to test the specificity of the array probes with other Salmonella serovars, to demonstrate the versatility and utility of this array as a public health tool for the identification of Salmonella.
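In the White-Kauffmann-Le Minor scheme, a serovar name corresponds to an antigenic formula written as O antigens : phase 1 H antigen(s) : phase 2 H antigen(s), with "-" for an absent phase. A minimal sketch of how detected antigens could be assembled into such a formula and looked up, assuming a hypothetical three-entry table in which variable antigens (for example O:1 and O:[5]) are omitted; this is not the SGSA's actual probe-interpretation logic:

```python
# Hypothetical, simplified lookup table: variable antigens such as O:1 and O:[5]
# are omitted, so these keys are illustrative rather than complete formulas.
SEROVAR_BY_FORMULA = {
    "4,12:i:1,2": "Typhimurium",
    "4,12:r:1,2": "Heidelberg",
    "9,12:g,m:-": "Enteritidis",
}


def _format_factors(antigens: set[str]) -> str:
    """Join antigens, numbers first (numerically) then letters; '-' if none."""
    ordered = sorted(antigens,
                     key=lambda a: (not a.isdigit(), int(a) if a.isdigit() else 0, a))
    return ",".join(ordered) or "-"


def antigenic_formula(o_antigens: set[str], phase1: set[str], phase2: set[str]) -> str:
    """Build an 'O:phase 1 H:phase 2 H' string in White-Kauffmann-Le Minor order."""
    return ":".join(_format_factors(part) for part in (o_antigens, phase1, phase2))


def assign_serovar(o_antigens: set[str], phase1: set[str], phase2: set[str]) -> str:
    formula = antigenic_formula(o_antigens, phase1, phase2)
    # Fall back to reporting the bare antigenic formula when no name is known.
    return SEROVAR_BY_FORMULA.get(formula, formula)


print(assign_serovar({"9", "12"}, {"g", "m"}, set()))  # Enteritidis
print(assign_serovar({"4", "12"}, {"i"}, {"1", "2"}))  # Typhimurium
print(assign_serovar({"6", "7"}, {"r"}, {"1", "5"}))   # 6,7:r:1,5
```

Falling back to the formula mirrors the distinction drawn in the abstract between producing a serovar designation and producing only an antigenic formula.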
Subject(s)
Antigens, Bacterial/genetics , Molecular Typing/methods , Salmonella enterica/classification , Salmonella enterica/genetics , Animals , Genotype , Humans , Reproducibility of Results , Sensitivity and Specificity , Serotyping/methods
ABSTRACT
Hierarchical genotyping approaches can provide insights into the source, geography and temporal distribution of bacterial pathogens. Multiple hierarchical SNP genotyping schemes have previously been developed so that new isolates can rapidly be placed within pre-computed population structures, without the need to rebuild phylogenetic trees for the entire dataset. This classification approach has, however, seen limited uptake in routine public health settings due to analytical complexity and the lack of standardized tools that provide clear and easy ways to interpret results. The BioHansel tool was developed to provide an organism-agnostic tool for hierarchical SNP-based genotyping. The tool identifies split k-mers that distinguish predefined lineages in whole genome sequencing (WGS) data using SNP-based genotyping schemes. BioHansel uses the Aho-Corasick algorithm to type isolates from assembled genomes or raw read sequence data in a matter of seconds, with limited computational resources. This makes BioHansel ideal for use by public health agencies that rely on WGS methods for surveillance of bacterial pathogens. Genotyping results are evaluated using a quality assurance module that identifies problematic samples, such as low-quality or contaminated datasets. Using existing hierarchical SNP schemes for Mycobacterium tuberculosis and Salmonella Typhi, we compare the genotyping results obtained with the k-mer-based tools BioHansel and SKA with those of the organism-specific tools TBProfiler and genotyphi, which use gold-standard reference-mapping approaches. We show that the genotyping results are fully concordant across these different methods, and that the k-mer-based tools are significantly faster. We also test the ability of the BioHansel quality assurance module to detect intra-lineage contamination and demonstrate that it is effective, even in populations with low genetic diversity. We demonstrate the scalability of the tool using a dataset of ~8100 S. Typhi public genomes and provide the aggregated results of geographical distributions as part of the tool's output. BioHansel is an open source Python 3 application available on PyPI and Conda repositories and as a Galaxy tool from the public Galaxy Toolshed. In a public health context, BioHansel enables rapid and high-resolution classification of bacterial pathogens with low genetic diversity.
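The speed claim rests on Aho-Corasick, which finds every occurrence of many short patterns (here, scheme k-mers) in a single pass over a sequence. A minimal, self-contained sketch of the algorithm followed by a hierarchical call, assuming a hypothetical scheme that maps each k-mer to the lineage it supports; this is not BioHansel's code, and real schemes also search reverse complements and pair positive and negative k-mers at each SNP:

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton (trie plus failure links)."""
    goto = [{}]    # goto[state][char] -> child state
    fail = [0]     # failure link of each state
    output = [[]]  # patterns recognised on reaching each state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)
    # Breadth-first order guarantees shallower failure targets are finished first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            output[child] = output[child] + output[fail[child]]
    return goto, fail, output


def search(text, automaton):
    """Return (start, pattern) for every occurrence, in one pass over text."""
    goto, fail, output = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


# Hypothetical scheme: k-mer -> hierarchical lineage it supports (not a real scheme).
SCHEME = {"ACGTAC": "2", "GGATCC": "2.1", "TTGACA": "3"}


def genotype(sequence):
    """Deepest lineage whose ancestors are all supported by matched k-mers."""
    found = {SCHEME[p] for _, p in search(sequence, build_automaton(SCHEME))}

    def supported(lineage):
        parts = lineage.split(".")
        return all(".".join(parts[:i]) in found for i in range(1, len(parts) + 1))

    calls = [lineage for lineage in found if supported(lineage)]
    return max(calls, key=lambda lineage: lineage.count("."), default=None)


print(search("TTACGTACAAGGATCCTT", build_automaton(SCHEME)))
# [(2, 'ACGTAC'), (10, 'GGATCC')]
print(genotype("TTACGTACAAGGATCCTT"))  # 2.1
```

Requiring every ancestor lineage to be supported is one simple consistency rule; a lineage hit whose parents are missing (as with `genotype("GGATCC")`, which returns `None`) is the kind of conflict a quality assurance step would flag.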
Subject(s)
Bacteria/genetics , Bacterial Typing Techniques/methods , Genotyping Techniques/methods , Polymorphism, Single Nucleotide , Bacteria/classification , Bacteria/isolation & purification , Genetic Variation , Genome, Bacterial , Genotype , Molecular Epidemiology/methods , Mycobacterium tuberculosis/genetics , Phylogeny , Salmonella/genetics , Software , Whole Genome Sequencing
ABSTRACT
We report here the completed closed genome sequences of strains representing 36 serotypes of Salmonella. These genome sequences will provide useful references for understanding the genetic variation between serotypes, particularly for mapping raw reads or generating higher-quality assemblies, and will aid studies of Salmonella comparative genomics.
ABSTRACT
Public health and food safety institutions around the world are adopting whole genome sequencing (WGS) to replace conventional methods for characterizing Salmonella for use in surveillance and outbreak response. Falling costs and increased throughput of WGS have resulted in an explosion of data, but questions remain as to the reliability and robustness of the data. Because serovar information is critically important to public health, reliable serovar assignments must be available for all Salmonella records. The current study systematically assessed and curated all Salmonella genomes in the Sequence Read Archive (SRA) to evaluate the state of the data and their utility. A total of 67,758 genomes were assembled de novo and assessed for assembly metrics as well as species and serovar assignments. A total of 42,400 genomes passed all of the quality criteria, but 30.16% of genomes had been deposited without serotype information. These data were used to compare the concordance of reported and predicted serovars for two in silico prediction tools, multi-locus sequence typing (MLST) and the Salmonella in silico Typing Resource (SISTR), whose predictions were fully concordant for 87.51% and 91.91% of the tested isolates, respectively. Concordance of in silico predictions increased when serovar variants were grouped together, to 89.25% for MLST and 94.98% for SISTR. This study represents the first large-scale validation of serovar information in public genomes and provides a large validated set of genomes that can be used to benchmark new bioinformatics tools.
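Concordance here is the percentage of isolates whose predicted serovar matches the reported one; the grouped figure additionally counts a prediction as concordant when both names belong to the same group of serovar variants. A minimal sketch, assuming a hypothetical grouping map (here, treating a monophasic Typhimurium-like formula as equivalent to Typhimurium); the study's actual grouping rules are not given in the abstract, and the example labels are made up:

```python
from typing import Mapping, Optional, Sequence


def concordance(reported: Sequence[str], predicted: Sequence[str],
                groups: Optional[Mapping[str, str]] = None) -> float:
    """Percentage of isolates whose predicted serovar matches the reported one.

    `groups` optionally maps a lower-case serovar name to a group name so that
    serovar variants count as concordant (hypothetical mapping).
    """
    if len(reported) != len(predicted):
        raise ValueError("reported and predicted must have the same length")
    if not reported:
        raise ValueError("no isolates to compare")
    groups = groups or {}

    def canonical(name: str) -> str:
        key = name.strip().lower()  # ignore case and stray whitespace
        return groups.get(key, key)

    matches = sum(canonical(r) == canonical(p) for r, p in zip(reported, predicted))
    return 100.0 * matches / len(reported)


# Made-up example, not data from the study.
reported = ["Typhimurium", "I 4,[5],12:i:-", "Enteritidis", "Newport"]
predicted = ["Typhimurium", "Typhimurium", "Enteritidis", "Hadar"]
print(concordance(reported, predicted))                                     # 50.0
print(concordance(reported, predicted, {"i 4,[5],12:i:-": "typhimurium"}))  # 75.0
```

Grouping can only turn mismatches into matches, which is why grouped concordance is never lower than the ungrouped figure.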
Subject(s)
Bacterial Typing Techniques/methods , Databases, Nucleic Acid , Salmonella/genetics , Whole Genome Sequencing/methods , Computer Simulation , DNA, Bacterial/genetics , Genome, Bacterial , Multilocus Sequence Typing , Public Health , Reproducibility of Results , Salmonella/classification , Salmonella Infections/microbiology , Salmonella enterica , Sequence Analysis , Serogroup , Serotyping
ABSTRACT
We report here 32 completed closed genome sequences of strains representing 30 serotypes of Salmonella. These genome sequences will provide useful references for understanding the genetic variation within Salmonella enterica serotypes, particularly to aid comparative genomics studies, and will provide information for improving in silico serotyping accuracy.
ABSTRACT
Previously we developed and tested the Salmonella GenoSerotyping Array (SGSA), which used oligonucleotide probes for O- and H-antigen biomarkers to perform accurate molecular serotyping of 57 Salmonella serotypes. Here we describe the development and validation of the ISO 17025-accredited second version of the SGSA (SGSA v. 2), which gives reliable and unambiguous molecular serotyping results for 112 serotypes of Salmonella, verified both in silico and in vitro. Improvements included an expansion of the probe sets along with a new classifier tool for predicting individual antigens and the overall serotype from the array probe intensity results. The array classifier and probe sequences were validated in silico to high concordance using 36,153 draft genomes of diverse Salmonella serotypes assembled from public repositories. We obtained correct and unambiguous serotype assignments for 31,924 (88.30%) of the tested samples, and a further 3,916 (10.83%) had fully concordant antigen predictions but could not be assigned to a single serotype. The SGSA v. 2 can directly use bacterial colonies, with a limit of detection of 860 CFU/mL, or purified DNA template at a concentration of 1.0 × 10⁻¹ ng/µL. The SGSA v. 2 was also validated in the wet laboratory and certified using a panel of 406 samples representing 185 different serotypes, with correct antigen and serotype determinations for 60.89% of the panel and a further 18.31% correctly identified but with an ambiguous overall serotype determination.
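The distinction between an unambiguous serotype and "fully concordant antigen predictions but no single serotype" follows from a two-step classification: probe intensities are first turned into antigen calls, and the resulting formula is then matched to serotypes, some of which share a formula. A minimal sketch of that logic, assuming hypothetical probe names, a hypothetical fixed intensity threshold, and a simplified formula table; the SGSA v. 2 classifier's actual rules and cut-offs are not described in the abstract:

```python
# Hypothetical probe names, antigen targets and threshold, for illustration only.
PROBE_TARGETS = {
    "probe_O4": ("O", "4"), "probe_O9": ("O", "9"), "probe_O12": ("O", "12"),
    "probe_Hb": ("H1", "b"), "probe_Hg": ("H1", "g"), "probe_Hi": ("H1", "i"),
    "probe_Hm": ("H1", "m"), "probe_H2_1": ("H2", "1"), "probe_H2_2": ("H2", "2"),
}
THRESHOLD = 0.5  # hypothetical normalised-intensity cut-off

# Simplified formulas (variable antigens omitted); one formula can fit several serotypes.
SEROTYPES_BY_FORMULA = {
    (frozenset({"4", "12"}), frozenset({"i"}), frozenset({"1", "2"})): ["Typhimurium"],
    (frozenset({"9", "12"}), frozenset({"g", "m"}), frozenset()): ["Enteritidis"],
    (frozenset({"4", "12"}), frozenset({"b"}), frozenset({"1", "2"})):
        ["Paratyphi B", "Paratyphi B var. Java"],
}


def call_antigens(intensities: dict[str, float]) -> tuple[frozenset, frozenset, frozenset]:
    """Step 1: call each antigen whose probe signal reaches the threshold."""
    calls: dict[str, set[str]] = {"O": set(), "H1": set(), "H2": set()}
    for probe, value in intensities.items():
        if probe in PROBE_TARGETS and value >= THRESHOLD:
            factor, antigen = PROBE_TARGETS[probe]
            calls[factor].add(antigen)
    return frozenset(calls["O"]), frozenset(calls["H1"]), frozenset(calls["H2"])


def classify(intensities: dict[str, float]) -> tuple[str, list[str]]:
    """Step 2: map the antigen calls to serotype(s) and report how definitive they are."""
    serotypes = list(SEROTYPES_BY_FORMULA.get(call_antigens(intensities), []))
    if len(serotypes) == 1:
        return "serotype", serotypes
    if serotypes:
        return "ambiguous", serotypes  # antigens resolved, serotype not unique
    return "unassigned", serotypes


print(classify({"probe_O9": 0.9, "probe_O12": 0.8, "probe_Hg": 0.7, "probe_Hm": 0.6}))
# ('serotype', ['Enteritidis'])
print(classify({"probe_O4": 0.9, "probe_O12": 0.9, "probe_Hb": 0.8,
                "probe_H2_1": 0.7, "probe_H2_2": 0.6, "probe_O9": 0.1}))
# ('ambiguous', ['Paratyphi B', 'Paratyphi B var. Java'])
```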
Subject(s)
Genotyping Techniques , Oligonucleotide Array Sequence Analysis/methods , Salmonella/classification , Salmonella/genetics , Serotyping/methods , Food Safety , Internet , Limit of Detection
ABSTRACT
A DNA-based microarray designed to detect somatic (O) and flagellar (H) antigens present in the five most commonly isolated Salmonella serovars within Canada was developed as an alternative to the traditional Kauffmann-White serotyping scheme currently used to serotype salmonellae. Short oligonucleotide probes were designed based on publicly available sequence data of selected genes responsible for O and H antigen biosynthesis. These targets included antigen-specific sequences within the flagellar (H) antigen phase 1 (fliC) and phase 2 (fljB) genes and somatic (O) antigen biosynthesis genes within the rfb cluster (group B, rfbJ; group C1, wbaA; group C2, rfbJ; group D1, rfbS). A prototype microarray with 117 O and H antigen-specific probes and controls was used to assess probe performance against two pools of gene target PCR amplicons. A set of 31 of these antigen-specific probes (8 O and 23 H) with high specific signal and low non-specific signal was selected based on a t-test (p < 0.01) and log2 ratio distribution analysis to create a prototype microarray. The microarray was tested against 16 Salmonella strains of known serotype. Based on the strains tested in this study, these probes successfully identified and differentiated 11 of the 12 antigens targeted. The prototype DNA-based typing microarray described here has the potential to be an automated alternative to the traditional antigen-antibody serotyping scheme currently used for Salmonella.
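Probe selection combined a significance test with the log2 ratio of specific to non-specific signal. A minimal sketch of that filtering step, assuming the per-probe t-test p-values have already been computed elsewhere and assuming a hypothetical log2 ratio cut-off of 1 (a two-fold difference); the abstract states only the p < 0.01 criterion, and the probe names below are hypothetical:

```python
import math
from dataclasses import dataclass


@dataclass
class ProbeStats:
    name: str                  # hypothetical probe identifier
    specific_signal: float     # assumed mean signal when the target is present
    nonspecific_signal: float  # assumed mean signal when the target is absent
    p_value: float             # t-test p-value, assumed computed elsewhere


def select_probes(stats, alpha=0.01, min_log2_ratio=1.0):
    """Keep probes that are significant (p < alpha) and whose on-target signal is
    at least 2**min_log2_ratio times the off-target signal."""
    selected = []
    for s in stats:
        if s.specific_signal <= 0 or s.nonspecific_signal <= 0:
            raise ValueError(f"{s.name}: signals must be positive for a log ratio")
        log2_ratio = math.log2(s.specific_signal / s.nonspecific_signal)
        if s.p_value < alpha and log2_ratio >= min_log2_ratio:
            selected.append((s.name, log2_ratio))
    return selected


candidates = [
    ProbeStats("fliC_i_01", 8000.0, 500.0, 0.001),   # 16-fold and significant: kept
    ProbeStats("fljB_2_01", 3000.0, 2500.0, 0.003),  # significant but weak ratio
    ProbeStats("rfbJ_04", 6000.0, 1000.0, 0.05),     # strong ratio, not significant
]
print(select_probes(candidates))  # [('fliC_i_01', 4.0)]
```

Requiring both criteria discards probes that are statistically significant but barely brighter on target, as well as bright probes whose signal is too variable to pass the test.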
Subject(s)
Bacteriological Techniques , Oligonucleotide Array Sequence Analysis , Salmonella/classification , Salmonella/genetics , Antigens, Bacterial/genetics , Bacterial Proteins/genetics , DNA, Bacterial/genetics , Flagellin/genetics , O Antigens/genetics , Sensitivity and Specificity
ABSTRACT
Salmonella serotyping remains the gold-standard tool for the classification of Salmonella isolates and forms the basis of Canada's national surveillance program for this priority foodborne pathogen. Public health officials have been increasingly looking toward whole genome sequencing (WGS) to provide a large set of data from which all the relevant information about an isolate can be mined. However, replacing traditional surveillance methodologies with WGS data analysis tools requires rigorous validation and careful consideration of the potential implications. Two in silico tools for Salmonella serotyping have been developed, the Salmonella in silico Typing Resource (SISTR) and SeqSero, while seven-gene MLST for serovar prediction can be adapted for in silico analysis. All three analysis methods were assessed and compared to traditional serotyping techniques using a set of 813 verified clinical and laboratory isolates, including 492 Canadian clinical isolates and 321 isolates from human and non-human sources. Successful results were obtained for 94.8%, 88.2%, and 88.3% of the isolates tested using SISTR, SeqSero, and MLST, respectively, indicating that all three would be suitable for maintaining the historical records, surveillance systems, and communication structures currently in place; the choice of platform will ultimately depend on the user's needs. Results also pointed to the need to reframe serotyping in the genomic era as a test of the genes carried by an isolate, which are not necessarily congruent with what is antigenically expressed. Adopting WGS for serotyping will allow the simultaneous collection of information that can be used by multiple programs within the current surveillance paradigm; however, this does not negate the importance of the various programs or the role of serotyping going forward.
ABSTRACT
For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone of food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprising 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database of over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata-driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations.
Moreover, this type of integrated analysis using multiple sequence-based subtyping methods allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.
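The cgMLST step places a query genome among reference genomes by comparing allele profiles, and the serovars of close relatives can then support or correct the genoserotyping call. A minimal sketch of that idea, assuming allele profiles given as lists of allele numbers (None for a missing locus), a simple nearest-neighbour rule, and a hypothetical distance cut-off; SISTR's actual decision rules are more involved and are not reproduced here, and the toy profiles are not real cgMLST data:

```python
from typing import Optional, Sequence

Profile = Sequence[Optional[int]]  # allele number per locus, None if missing


def profile_distance(a: Profile, b: Profile) -> float:
    """Fraction of loci with different alleles, skipping loci missing in either."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in compared) / len(compared)


def nearest_serovar(query: Profile, references: dict[str, tuple[Profile, str]],
                    max_distance: float = 0.1) -> Optional[str]:
    """Serovar of the closest reference genome, or None if none lies within
    max_distance (a hypothetical cut-off)."""
    if not references:
        return None
    profile, serovar = min(references.values(),
                           key=lambda ref: profile_distance(query, ref[0]))
    return serovar if profile_distance(query, profile) <= max_distance else None


# Toy 10-locus profiles, not real cgMLST data.
references = {
    "ref_A": ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "Enteritidis"),
    "ref_B": ([2, 3, 4, 5, 6, 7, 8, 9, 10, 11], "Typhimurium"),
}
print(nearest_serovar([1, 2, 3, 4, 5, 6, 7, 8, 9, None], references))  # Enteritidis
print(nearest_serovar([20] * 10, references))                          # None
```

Skipping missing loci rather than counting them as differences keeps incomplete draft assemblies from looking artificially distant from their true relatives.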
Subject(s)
Genome, Bacterial , Internet , Salmonella/genetics , Computer Simulation , Phylogeny , Salmonella/classification
ABSTRACT
We report the draft genome sequences of 25 Salmonella enterica strains representing 24 different serotypes, many of which were not available in public repositories during our selection process. These draft genomes will provide useful references for studying the genetic variation between serotypes and will aid in the development of molecular typing tools.
ABSTRACT
Salmonella serotyping is an essential first step in identifying isolates associated with disease outbreaks. The Salmonella genoserotyping array (SGSA) is a microarray-based alternative to standard serotyping designed to rapidly identify 57 of the most commonly reported serovars by detecting the genes encoding surface O and H antigens and reporting the corresponding serovar in accordance with the existing White-Kauffmann-Le Minor serotyping scheme. In this study, we evaluated the SGSA at 4 laboratories in 3 countries by testing 1,874 isolates from human and non-human sources. The SGSA correctly identified 96.7% of isolates from the 57 target serovars. For the prevalent and clinically important Salmonella serovars Enteritidis and Typhimurium, test specificity and sensitivity were greater than 98% and 99%, respectively. Because of its high throughput, the SGSA is a rapid and cost-effective alternative to standard serotyping for identifying the most prevalent serovars of Salmonella.