1.
BMC Genomics ; 25(1): 679, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978005

ABSTRACT

BACKGROUND: Oxford Nanopore provides high throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain if they obtain the accuracy needed for the high-resolution source tracking of foodborne illness outbreaks. RESULTS: We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy for reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near perfect accuracy (99.9999% accuracy or ~5 nucleotide errors across the 4.8 Mbp genome, excluding low confidence regions) was only obtained by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered: using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct. CONCLUSIONS: Short reads are still needed to correct errors in nanopore sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.
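
The relationship between the quoted accuracy percentages and error counts can be checked with a few lines of arithmetic. The sketch below assumes that "accuracy" means per-base identity over the full 4.8 Mbp genome; the abstract excludes low-confidence regions, so the true denominator is somewhat smaller. The function name `expected_errors` is illustrative and not taken from the paper.

```python
def expected_errors(accuracy_percent: float, genome_length_bp: int) -> float:
    """Expected number of erroneous bases, assuming per-base accuracy."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return error_rate * genome_length_bp


GENOME_LENGTH_BP = 4_800_000  # 4.8 Mbp, as stated in the abstract

# Accuracy figures quoted in the abstract.
unpolished = expected_errors(99.95, GENOME_LENGTH_BP)
polished = expected_errors(99.9999, GENOME_LENGTH_BP)

print(f"99.95% accuracy   -> about {unpolished:.0f} errors")
print(f"99.9999% accuracy -> about {polished:.1f} errors")
```

Under these assumptions the unpolished figure corresponds to roughly 2,400 errors and the polished figure to about 4.8, consistent with the "~5 nucleotide errors" quoted above.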


Subject(s)
Benchmarking , Disease Outbreaks , Genome, Bacterial , Nanopores , Nanopore Sequencing/methods , High-Throughput Nucleotide Sequencing/methods , Salmonella enterica/genetics , Salmonella enterica/isolation & purification , Humans , Phylogeny
2.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36579850

ABSTRACT

MOTIVATION: Scientists seeking to understand the genomic basis of bacterial phenotypes, such as antibiotic resistance, today have access to an unprecedented number of complete and nearly complete genomes. Making sense of these data requires computational tools able to perform multiple-genome comparisons efficiently, yet currently available tools cannot scale beyond several tens of genomes. RESULTS: We describe PRAWNS, an efficient and scalable tool for multiple-genome analysis. PRAWNS defines a concise set of genomic features (metablocks), as well as pairwise relationships between them, which can be used as a basis for large-scale genotype-phenotype association studies. We demonstrate the effectiveness of PRAWNS by identifying genomic regions associated with antibiotic resistance in Acinetobacter baumannii. AVAILABILITY AND IMPLEMENTATION: PRAWNS is implemented in C++ and Python3, licensed under the GPLv3 license, and freely downloadable from GitHub (https://github.com/KiranJavkar/PRAWNS.git). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Metagenomics , Software , Genomics , Genome , Bacteria
3.
BMC Genomics ; 24(1): 165, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37016310

ABSTRACT

BACKGROUND: The Salmonella enterica serovar Newport red onion outbreak of 2020 was the largest foodborne outbreak of Salmonella in over a decade. The epidemiological investigation suggested two farms as the likely source of contamination. However, single nucleotide polymorphism (SNP) analysis of the whole genome sequencing data showed that none of the Salmonella isolates collected from the farm regions were linked to the clinical isolates, preventing the use of phylogenetics in source identification. Here, we explored an alternative method for analyzing the whole genome sequencing data driven by the hypothesis that if the outbreak strain had come from the farm regions, then the clinical isolates would disproportionately contain plasmids found in isolates from the farm regions due to horizontal transfer. RESULTS: SNP analysis confirmed that the clinical isolates formed a single, nearly-clonal clade with evidence for ancestry in California going back a decade. The clinical clade had a large core genome (4,399 genes) and a large and sparsely distributed accessory genome (2,577 genes, at least 64% on plasmids). At least 20 plasmid types occurred in the clinical clade, more than were found in the literature for Salmonella Newport. A small number of plasmids, 14 from 13 clinical isolates and 17 from 8 farm isolates, were found to be highly similar (>95% identical), indicating they might be related by horizontal transfer. Phylogenetic analysis was unable to determine the geographic origin, isolation source, or time of transfer of the plasmids, likely due to their promiscuous and transient nature. However, our resampling analysis suggested that observing a similar number and combination of highly similar plasmids in random samples of environmental Salmonella enterica within the NCBI Pathogen Detection database was unlikely, supporting a connection between the outbreak strain and the farms implicated by the epidemiological investigation. CONCLUSION: Horizontally transferred plasmids provided evidence for a connection between clinical isolates and the farms implicated as the source of the outbreak. Our case study suggests that such analyses might add a new dimension to source tracking investigations, but highlights the need for detailed and accurate metadata, more extensive environmental sampling, and a better understanding of plasmid molecular evolution.
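
The abstract does not describe how the resampling analysis was implemented. As a rough illustration of the general idea only, the sketch below compares an observed count of shared plasmid clusters against counts from random draws of background isolates and reports an empirical p-value. The test statistic, the data layout (one set of plasmid cluster labels per isolate), and all names such as `shared_plasmids` and `resampling_p_value` are assumptions made for this example, not the authors' method.

```python
import random


def shared_plasmids(clinical, sample):
    """Count distinct plasmid clusters present in both groups of isolates.

    Each isolate is represented as a set of plasmid cluster labels.
    """
    clinical_plasmids = set().union(*clinical)
    sample_plasmids = set().union(*sample)
    return len(clinical_plasmids & sample_plasmids)


def resampling_p_value(clinical, farm, background, n_resamples=10_000, seed=0):
    """Empirical p-value for seeing at least the observed plasmid sharing.

    Draws random groups of background isolates the same size as the farm
    group and counts how often they share as many plasmid clusters with
    the clinical isolates as the farm isolates do.
    """
    rng = random.Random(seed)
    observed = shared_plasmids(clinical, farm)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        draw = rng.sample(background, k=len(farm))
        if shared_plasmids(clinical, draw) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_resamples + 1)


# Toy data (entirely made up) to show the call pattern.
clinical = [{"pA", "pB"}, {"pC"}]
farm = [{"pA"}, {"pB", "pC"}]
background = [{"pX"}, {"pY"}, {"pZ"}, {"pA"}, set(), {"pW"}]

observed, p_value = resampling_p_value(clinical, farm, background, n_resamples=999)
print(observed, p_value)
```

In the toy data no random draw can match the observed sharing, so the p-value is the smallest value the add-one estimator allows for the chosen number of resamples.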


Subject(s)
Salmonella enterica , Serogroup , Onions/genetics , Farms , Phylogeny , Plasmids/genetics , Disease Outbreaks
4.
J Appl Toxicol ; 43(12): 1899-1915, 2023 12.
Article in English | MEDLINE | ID: mdl-37551865

ABSTRACT

We have adapted a semiautomated method for tracking Caenorhabditis elegans spontaneous locomotor activity into a quantifiable assay by developing a sophisticated method for analyzing the time course of measured activity. The 16-h worm Adult Activity Test (wAAT) can be used to measure C. elegans activity levels for efficient screening for pharmacological and toxicity-induced effects. As with any apical endpoint assay, the wAAT is mode of action agnostic, allowing for detection of effects from a broad spectrum of response pathways. With caffeine as a model mild stimulant, the wAAT showed transient hyperactivity followed by reversion to baseline. Mercury chloride (HgCl2) produced an early dose-response hyperactivity phase followed by pronounced hypoactivity, a behavior pattern we have termed a toxicant "escape response." Methylmercury chloride (meHgCl) produced a similar pattern to HgCl2, but at much lower concentrations, a weaker hyperactivity response, and more pronounced hypoactivity. Sodium arsenite (NaAsO2) and dimethylarsinic acid (DMA) induced hypoactivity at high concentrations. Acute toxicity, as measured by hypoactivity in C. elegans adults, was ranked: meHgCl > HgCl2 > NaAsO2 = DMA. Caffeine was not toxic with the wAAT at tested concentrations. Methods for conducting the wAAT are described, along with instructions for preparing C. elegans Habitation Medium, a liquid nutrient medium that allows for developmental timing equivalent to that found with C. elegans grown on agar with OP50 Escherichia coli feeder cultures. A de novo mathematical parametric model for adult C. elegans activity and the application of this model in ranking exposure toxicity are presented.


Subject(s)
Caenorhabditis elegans , Models, Theoretical , Animals , Mercuric Chloride/toxicity , Escherichia coli
5.
Clin Infect Dis ; 73(8): 1537-1539, 2021 10 20.
Article in English | MEDLINE | ID: mdl-34240118

ABSTRACT

Open-source DNA sequence databases have long been touted as beneficial to public health, including the facilitation of earlier detection and response to infectious disease outbreaks. Of critical importance to harnessing these benefits is the metadata that describe general and other domain-specific attributes (eg, collection location, isolate type) of a sample. Unlike the sequence data, metadata are often incomplete and lack adherence to an international standard. Here, we describe the problem posed by such variable and incomplete metadata in terms of interpretative labor costs (the time and energy necessary to make sense of the signal in the genetic data) and the impact such metadata have on foodborne outbreak detection and response. Improving the quality of sequence-associated metadata would allow for earlier detection of emerging food safety hazards and allow faster response to foodborne outbreaks.


Subject(s)
Foodborne Diseases , Metadata , Disease Outbreaks , Food Safety , Foodborne Diseases/epidemiology , Humans , Public Health , Public Health Surveillance
6.
BMC Genomics ; 22(1): 389, 2021 May 26.
Article in English | MEDLINE | ID: mdl-34039264

ABSTRACT

BACKGROUND: Whole genome sequencing of cultured pathogens is the state of the art public health response for the bioinformatic source tracking of illness outbreaks. Quasimetagenomics can substantially reduce the amount of culturing needed before a high quality genome can be recovered. Highly accurate short read data is analyzed for single nucleotide polymorphisms and multi-locus sequence types to differentiate strains but cannot span many genomic repeats, resulting in highly fragmented assemblies. Long reads can span repeats, resulting in much more contiguous assemblies, but have lower accuracy than short reads. RESULTS: We evaluated the accuracy of Listeria monocytogenes assemblies from enrichments (quasimetagenomes) of naturally-contaminated ice cream using long read (Oxford Nanopore) and short read (Illumina) sequencing data. Accuracy of ten assembly approaches, over a range of sequencing depths, was evaluated by comparing sequence similarity of genes in assemblies to a complete reference genome. Long read assemblies reconstructed a circularized genome as well as a 71 kbp plasmid after 24 h of enrichment; however, high error rates prevented high fidelity gene assembly, even at 150X depth of coverage. Short read assemblies accurately reconstructed the core genes after 28 h of enrichment but produced highly fragmented genomes. Hybrid approaches demonstrated promising results but had biases based upon the initial assembly strategy. Short read assemblies scaffolded with long reads accurately assembled the core genes after just 24 h of enrichment, but were highly fragmented. Long read assemblies polished with short reads reconstructed a circularized genome and plasmid and assembled all the genes after 24 h enrichment but with less fidelity for the core genes than the short read assemblies. CONCLUSION: The integration of long and short read sequencing of quasimetagenomes expedited the reconstruction of a high quality pathogen genome compared to either platform alone. A new and more complete level of information about genome structure, gene order and mobile elements can be added to the public health response by incorporating long read analyses with the standard short read WGS outbreak response.


Subject(s)
Listeria monocytogenes , Nanopores , Genomics , High-Throughput Nucleotide Sequencing , Listeria monocytogenes/genetics , Sequence Analysis, DNA , Whole Genome Sequencing
7.
BMC Genomics ; 22(1): 114, 2021 Feb 10.
Article in English | MEDLINE | ID: mdl-33568057

ABSTRACT

BACKGROUND: Processing and analyzing whole genome sequencing (WGS) is computationally intense: a single Illumina MiSeq WGS run produces ~1 million 250-base-pair reads for each of 24 samples. This poses significant obstacles for smaller laboratories, or laboratories not affiliated with larger projects, which may not have dedicated bioinformatics staff or computing power to effectively use genomic data to protect public health. Building on the success of the cloud-based Galaxy bioinformatics platform (http://galaxyproject.org), already known for its user-friendliness and powerful WGS analytical tools, the Center for Food Safety and Applied Nutrition (CFSAN) at the U.S. Food and Drug Administration (FDA) created a customized 'instance' of the Galaxy environment, called GalaxyTrakr (https://www.galaxytrakr.org), for use by laboratory scientists performing food-safety regulatory research. The goal was to enable laboratories outside of the FDA internal network to (1) perform quality assessments of sequence data, (2) identify links between clinical isolates and positive food/environmental samples, including those at the National Center for Biotechnology Information sequence read archive (https://www.ncbi.nlm.nih.gov/sra/), and (3) explore new methodologies such as metagenomics. GalaxyTrakr hosts a variety of free and adaptable tools and provides the data storage and computing power to run the tools. These tools support coordinated analytic methods and consistent interpretation of results across laboratories. Users can create and share tools for their specific needs and use sequence data generated locally and elsewhere. RESULTS: In its first full year (2018), GalaxyTrakr processed over 85,000 jobs and went from 25 to 250 users, representing 53 different public and state health laboratories, academic institutions, international health laboratories, and federal organizations. By mid-2020, it had grown to 600 registered users and processed over 450,000 analytical jobs. To illustrate how laboratories are making use of this resource, we describe how six institutions use GalaxyTrakr to quickly analyze and review their data. Instructions for participating in GalaxyTrakr are provided. CONCLUSIONS: GalaxyTrakr advances food safety by providing reliable and harmonized WGS analyses for public health laboratories and promoting collaboration across laboratories with differing resources. Anticipated enhancements to this resource will include workflows for additional foodborne pathogens, viruses, and parasites, as well as new tools and services.


Subject(s)
Metagenomics , Public Health , Computational Biology , High-Throughput Nucleotide Sequencing , Humans , Whole Genome Sequencing
8.
Microbiology (Reading) ; 166(5): 453-459, 2020 05.
Article in English | MEDLINE | ID: mdl-32100709

ABSTRACT

In 2017, the US Food and Drug Administration investigated the sources of multiple outbreaks of salmonellosis. Epidemiologic and traceback investigations identified Maradol papayas as the suspect vehicles. During the investigations, the genomes of 55 Salmonella enterica that were isolated from papaya samples were sequenced. Serovar assignments and phylogenetic analysis placed the 55 isolates into ten distinct groups, each representing a different serovar. Within-serovar SNP differences are generally between 0 and 20 SNPs, while the median between-serovar distance is 51 812 SNPs. We observed two groups with SNP distances between 21 and 100 SNPs. These relatively large within-serovar SNP distances may indicate that the isolates represent either diverse populations or multiple, genetically distinct subpopulations. Further inspection of these cases with traceback evidence allowed us to identify an 11th population. We observed that high levels of genomic diversity from individual firms are possible, with one firm yielding five of the ten serovars. Also, high levels of diversity are possible within small geographic regions, as five of the serovars were isolated from papayas that originated from farms located in Armería and Tecomán, Colima. In addition, we identified AMR genes that are present in three of the serovars studied here (aph(3')-lb, aph(6)-ld, tet(C), fosA7, and qnrB19) and we detected the presence of the plasmid IncHI2A among S. Urbana isolates.


Subject(s)
Carica/microbiology , Genetic Variation , Salmonella Infections/microbiology , Salmonella enterica/classification , Salmonella enterica/genetics , Disease Outbreaks , Food Contamination , Genome, Bacterial , Genotype , Humans , Phylogeny , Polymorphism, Single Nucleotide , Salmonella Infections/epidemiology , Salmonella enterica/isolation & purification , Serogroup , United States/epidemiology , Whole Genome Sequencing
9.
J Clin Microbiol ; 57(5)2019 05.
Article in English | MEDLINE | ID: mdl-30728194

ABSTRACT

Foodborne pathogen surveillance in the United States is transitioning from strain identification using restriction digest technology (pulsed-field gel electrophoresis [PFGE]) to shotgun sequencing of the entire genome (whole-genome sequencing [WGS]). WGS requires a new suite of analysis tools, some of which have long histories in academia but are new to the field of public health and regulatory decision making. Although the general workflow is fairly standard for collecting and analyzing WGS data for disease surveillance, there are a number of differences in how the data are collected and analyzed across public health agencies, both nationally and internationally. This impedes collaborative public health efforts, so national and international efforts are underway to enable direct comparison of these different analysis methods. Ultimately, the harmonization efforts will allow the (mutually trusted and understood) production and analysis of WGS data by labs and agencies worldwide, thus improving outbreak response capabilities globally. This review provides a historical perspective on the use of WGS for pathogen tracking and summarizes the efforts underway to ensure the major steps in phylogenomic pipelines used for pathogen disease surveillance can be readily validated. The tools for doing this will ensure that the results produced are sound, reproducible, and comparable across different analytic approaches.


Subject(s)
Bacteria/genetics , Data Analysis , Foodborne Diseases/diagnosis , Phylogeny , Bacteria/pathogenicity , Computational Biology/methods , Computational Biology/standards , Disease Outbreaks/prevention & control , Electrophoresis, Gel, Pulsed-Field , Epidemiological Monitoring , Genome, Bacterial , Humans , Public Health , United States , Whole Genome Sequencing
10.
BMC Bioinformatics ; 18(1): 178, 2017 Mar 20.
Article in English | MEDLINE | ID: mdl-28320310

ABSTRACT

BACKGROUND: Using phylogenomic analysis tools for tracking pathogens has become standard practice in academia, public health agencies, and large industries. Using the same raw read genomic data as input, there are several different approaches being used to infer phylogenetic trees. These include many different SNP pipelines, wgMLST approaches, k-mer algorithms, whole genome alignment and others; each of these has advantages and disadvantages: some have been extensively validated, some are faster, some have higher resolution. A few of these analysis approaches are well-integrated into the regulatory process of US Federal agencies (e.g. the FDA's SNP pipeline for tracking foodborne pathogens). However, despite extensive validation on benchmark datasets and comparison with other pipelines, we lack methods for fully exploring the effects of multiple parameter values in each pipeline that can potentially have an effect on whether the correct phylogenetic tree is recovered. RESULTS: To resolve this problem, we offer a program, TreeToReads, which can generate raw read data from mutated genomes simulated under a known phylogeny. This simulation pipeline allows direct comparisons of simulated and observed data in a controlled environment. At each step of these simulations, researchers can vary parameters of interest (e.g., input tree topology, amount of sequence divergence, rate of indels, read coverage, distance of reference genome, etc.) to assess the effects of various parameter values on correctly calling SNPs and reconstructing an accurate tree. CONCLUSIONS: Such critical assessments of the accuracy and robustness of analytical pipelines are essential to progress in both research and applied settings.


Subject(s)
Genomics/methods , Phylogeny
11.
J Immunol ; 192(8): 3828-36, 2014 Apr 15.
Article in English | MEDLINE | ID: mdl-24646743

ABSTRACT

The IL-17 pathway is an established driver of psoriasis pathogenesis. We examined the detailed molecular and cellular effects of blockade of IL-17 signaling in human psoriatic skin before and following treatment with brodalumab, a competitive inhibitor of the IL-17 Receptor A subunit. Thousands of aberrantly expressed genes in lesional skin normalized within 2 weeks following brodalumab treatment, with conversion of the lesional psoriasis transcriptome to resemble that seen in nonlesional skin. Keratinocyte-expressed genes appeared to normalize rapidly, whereas T cell-specific normalization occurred over six weeks. The three IL-17 ligand genes that are upregulated in lesional skin, IL17A, IL17C, and IL17F, were all downregulated in a dose-dependent manner following brodalumab treatment. Cellular measures also showed a similar pattern with dramatic decreases in keratinocyte hyperplasia within one week, and decreases in infiltrating leukocytes occurred over a longer timescale. Individuals with the highest brodalumab exposure showed normalization of both IL-17-responsive genes and the psoriasis transcriptome, whereas subjects with lower exposures showed transient or incomplete molecular responses. Clinical and molecular response appeared dependent on the extent of brodalumab exposure relative to the expression of IL-17 ligand genes, and reduction of IL-17 signaling into the nonlesional range was strongly correlated with normalization of the psoriasis transcriptome. These data indicate that blockade of IL-17 signaling in psoriatic skin leads to rapid transcriptomal changes initially in keratinocyte-expressed genes, followed by normalization in the leukocyte abnormalities, and demonstrates the essential role of the IL-17R on keratinocytes in driving disease pathogenesis.


Subject(s)
Antibodies, Monoclonal/pharmacology , Psoriasis/genetics , Receptors, Interleukin-17/antagonists & inhibitors , Skin/drug effects , Skin/metabolism , Transcriptome , Antibodies, Monoclonal/therapeutic use , Antibodies, Monoclonal, Humanized , Cluster Analysis , Dose-Response Relationship, Drug , Gene Expression Profiling , Gene Expression Regulation/drug effects , Humans , Interferon-gamma/genetics , Interferon-gamma/metabolism , Interleukin-17/genetics , Interleukin-17/metabolism , Keratinocytes/drug effects , Keratinocytes/metabolism , Psoriasis/drug therapy , Skin/pathology , T-Lymphocytes/drug effects , T-Lymphocytes/metabolism
12.
mSystems ; 9(6): e0141523, 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38819130

ABSTRACT

Wastewater surveillance has emerged as a crucial public health tool for population-level pathogen surveillance. Supported by funding from the American Rescue Plan Act of 2021, the FDA's genomic epidemiology program, GenomeTrakr, was leveraged to sequence SARS-CoV-2 from wastewater sites across the United States. This initiative required the evaluation, optimization, development, and publication of new methods and analytical tools spanning sample collection through variant analyses. Version-controlled protocols for each step of the process were developed and published on protocols.io. A custom data analysis tool and a publicly accessible dashboard were built to facilitate real-time visualization of the collected data, focusing on the relative abundance of SARS-CoV-2 variants and sub-lineages across different samples and sites throughout the project. From September 2021 through June 2023, a total of 3,389 wastewater samples were collected, with 2,517 undergoing sequencing and submission to NCBI under the umbrella BioProject, PRJNA757291. Sequence data were released with explicit quality control (QC) tags on all sequence records, communicating our confidence in the quality of data. Variant analysis revealed wide circulation of Delta in the fall of 2021 and captured the sweep of Omicron and subsequent diversification of this lineage through the end of the sampling period. This project successfully achieved two important goals for the FDA's GenomeTrakr program: first, contributing timely genomic data for the SARS-CoV-2 pandemic response, and second, establishing both capacity and best practices for culture-independent, population-level environmental surveillance for other pathogens of interest to the FDA. IMPORTANCE: This paper serves two primary objectives. First, it summarizes the genomic and contextual data collected during a Covid-19 pandemic response project, which utilized the FDA's laboratory network, traditionally employed for sequencing foodborne pathogens, for sequencing SARS-CoV-2 from wastewater samples. Second, it outlines best practices for gathering and organizing population-level next generation sequencing (NGS) data collected for culture-free surveillance of pathogens sourced from environmental samples.


Subject(s)
COVID-19 , SARS-CoV-2 , United States Food and Drug Administration , Wastewater , SARS-CoV-2/genetics , United States/epidemiology , Wastewater/virology , COVID-19/epidemiology , COVID-19/transmission , COVID-19/prevention & control , COVID-19/virology , Humans , Pandemics/prevention & control , Genome, Viral/genetics , Wastewater-Based Epidemiological Monitoring
13.
PeerJ ; 11: e14596, 2023.
Article in English | MEDLINE | ID: mdl-36721781

ABSTRACT

Background: The accurate identification of SARS-CoV-2 (SC2) variants and estimation of their abundance in mixed population samples (e.g., air or wastewater) is imperative for successful surveillance of community level trends. Assessing the performance of SC2 variant composition estimators (VCEs) should improve our confidence in public health decision making. Here, we introduce a linear regression based VCE and compare its performance to four other VCEs: two re-purposed DNA sequence read classifiers (Kallisto and Kraken2), a maximum-likelihood based method (Lineage deComposition for Sars-Cov-2 pooled samples (LCS)), and a regression based method (Freyja). Methods: We simulated DNA sequence datasets of known variant composition from both Illumina and Oxford Nanopore Technologies (ONT) platforms and assessed the performance of each VCE. We also evaluated VCE performance using publicly available empirical wastewater samples collected for SC2 surveillance efforts. Bioinformatic analyses were performed with a custom NextFlow workflow (C-WAP, CFSAN Wastewater Analysis Pipeline). Relative root mean squared error (RRMSE) was used as a measure of performance with respect to the known abundance and concordance correlation coefficient (CCC) was used to measure agreement between pairs of estimators. Results: Based on our results from simulated data, Kallisto was the most accurate estimator as it had the lowest RRMSE, followed by Freyja. Kallisto and Freyja had the most similar predictions, reflected by the highest CCC metrics. We also found that accuracy was platform and amplicon panel dependent. For example, the accuracy of Freyja was significantly higher with Illumina data compared to ONT data; performance of Kallisto was best with ARTICv4. However, when analyzing empirical data there was poor agreement among methods and variations in the number of variants detected (e.g., Freyja ARTICv4 had a mean of 2.2 variants while Kallisto ARTICv4 had a mean of 10.1 variants). Conclusion: This work provides an understanding of the differences in performance of a number of VCEs and how accurate they are in capturing the relative abundance of SC2 variants within a mixed sample (e.g., wastewater). Such information should help officials gauge the confidence they can have in such data for informing public health decisions.
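
The two metrics named in the Methods can be sketched in plain Python. The abstract does not give the exact formulas used, so the normalization chosen here for RRMSE (root mean squared error divided by the mean of the known abundances) is an assumption, since several definitions are in common use. The CCC follows Lin's standard definition with population (1/n) variances. The abundance vectors are made-up toy values.

```python
import math


def rrmse(estimated, truth):
    """Relative RMSE: RMSE divided by the mean true abundance (assumed form)."""
    n = len(truth)
    rmse = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, truth)) / n)
    return rmse / (sum(truth) / n)


def ccc(x, y):
    """Lin's concordance correlation coefficient between two estimators."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


# Toy variant abundances for one simulated sample (illustrative only).
truth = [0.5, 0.3, 0.2]
estimator_a = [0.5, 0.3, 0.2]  # perfect recovery
estimator_b = [0.6, 0.2, 0.2]  # shifts abundance between two variants

print(rrmse(estimator_a, truth), rrmse(estimator_b, truth))
print(ccc(estimator_a, estimator_b))
```

A lower RRMSE indicates estimates closer to the known composition, while a CCC near 1 indicates that two estimators agree with each other, whether or not either is accurate.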


Subject(s)
COVID-19 , Humans , COVID-19/diagnosis , Likelihood Functions , SARS-CoV-2/genetics , Wastewater
14.
Microbiol Spectr ; 11(6): e0148223, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-37812012

ABSTRACT

IMPORTANCE: In developed countries, the human diet is dominated by food commodities that have been manufactured, processed, and stored in a food production facility. Little is known about the application of metagenomic sequencing approaches for detecting foodborne pathogens, such as L. monocytogenes, and characterizing microbial diversity in food production ecosystems. In this work, we investigated the utility of 16S rRNA amplicon and quasimetagenomic sequencing for the taxonomic and phylogenetic classification of Listeria culture enrichments of environmental swabs collected from dairy and seafood production facilities. We demonstrated that single-nucleotide polymorphism (SNP) analyses of L. monocytogenes metagenome-assembled genomes (MAGs) from quasimetagenomic data sets can achieve similar resolution as culture isolate whole-genome sequencing. To further understand the impact of genome coverage on MAG SNP cluster resolution, an in silico downsampling approach was employed to reduce the percentage of target pathogen sequence reads, providing an initial estimate of required MAG coverage for subtyping resolution of L. monocytogenes.
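
The abstract does not say how the in silico downsampling was carried out. One simple way to reduce the share of target-pathogen reads is to keep every non-target read and a random fraction of the target reads, then estimate the resulting depth of coverage. The sketch below does this under several assumptions: reads are already labelled as target or non-target, read identifiers are unique, and the read length and genome size are round illustrative figures. The function names are hypothetical.

```python
import random


def downsample_target_reads(reads, keep_fraction, seed=0):
    """Keep all non-target reads and a random fraction of target reads.

    `reads` is a list of (read_id, is_target) tuples with unique read_ids.
    """
    rng = random.Random(seed)
    target = [r for r in reads if r[1]]
    n_keep = round(len(target) * keep_fraction)
    kept = set(rng.sample(target, n_keep))
    return [r for r in reads if not r[1] or r in kept]


def estimated_coverage(n_reads, read_length_bp, genome_length_bp):
    """Mean depth of coverage: total sequenced bases over genome length."""
    return n_reads * read_length_bp / genome_length_bp


# Toy read set: 100 target reads and 50 background reads (made up).
reads = [(f"t{i}", True) for i in range(100)] + [(f"b{i}", False) for i in range(50)]
subset = downsample_target_reads(reads, keep_fraction=0.25)

n_target = sum(1 for _, is_target in subset if is_target)
print(n_target, len(subset) - n_target)

# Illustrative depth for 29,000 reads of 250 bp on a 2.9 Mbp genome.
print(estimated_coverage(29_000, 250, 2_900_000))
```

Repeating the downsampling over a range of `keep_fraction` values, and re-running the SNP analysis at each step, is one way to find the coverage at which cluster resolution starts to degrade.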


Subject(s)
Listeria monocytogenes , Humans , Listeria monocytogenes/genetics , Food Microbiology , Phylogeny , RNA, Ribosomal, 16S/genetics , Ecosystem , Seafood
15.
Front Microbiol ; 13: 797997, 2022.
Article in English | MEDLINE | ID: mdl-35875579

ABSTRACT

Whole-genome sequence databases continue to grow. Collection times between samples are also growing, providing both a challenge for comparing recently collected sequence data to historical samples and an opportunity for evolutionary analyses that can be used to refine match criteria. We measured evolutionary rates for 22 Salmonella enterica serotypes. Based upon these measurements, we propose using an evolutionary rate of 1.97 single-nucleotide polymorphisms (SNPs) per year when determining whether genome sequences match.
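
As a worked illustration of the proposed rate, the sketch below converts the time between two collection dates into an expected number of accumulated SNPs. Only the 1.97 SNPs per year figure comes from the abstract. How that number should be combined with an existing match threshold is not stated there, so the example stops at the expected count. The function name and dates are illustrative.

```python
from datetime import date

SNPS_PER_YEAR = 1.97  # evolutionary rate proposed in the abstract


def expected_snps(collected_a: date, collected_b: date,
                  rate: float = SNPS_PER_YEAR) -> float:
    """Expected SNPs accumulated over the time separating two samples."""
    years = abs((collected_b - collected_a).days) / 365.25
    return rate * years


# Two hypothetical isolates collected ten years apart.
historical = date(2010, 1, 1)
recent = date(2020, 1, 1)
print(f"{expected_snps(historical, recent):.1f} SNPs expected")
```

For samples a decade apart this gives an expectation of roughly 20 SNPs, which shows why a fixed match threshold tuned for contemporaneous isolates can be too strict for historical comparisons.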

16.
PLoS One ; 17(9): e0268470, 2022.
Article in English | MEDLINE | ID: mdl-36048885

ABSTRACT

Food production facilities are often routinely tested over time for the presence of foodborne pathogens (e.g., Listeria monocytogenes or Salmonella enterica subsp. enterica). Strains detected in a single sampling event can be classified as transient; positive findings of the same strain across multiple sampling events can be classified as resident pathogens. We analyzed whole-genome sequence (WGS) data from 4,758 isolates (L. monocytogenes = 3,685; Salmonella = 1,073) from environmental samples taken by FDA from 536 U.S. facilities. Our primary objective was to determine the frequency of transient or resident pathogens within food production facilities. Strains were defined as isolates from the same facility that differ from one another by fewer than 50 single-nucleotide polymorphisms (SNPs). Resident pathogens were defined as strains that had more than one isolate collected >59 days apart and from the same facility. We found 1,076 strains (median = 1 and maximum = 21 strains per facility); 180 were resident pathogens, 659 were transient, and 237 came from facilities that had only been sampled once. As a result, 21% of strains (180/839) from facilities with positive findings and that were sampled multiple times were found to be resident pathogens; nearly 1 in 4 (23%) of L. monocytogenes strains were found to be resident pathogens compared to 1 in 6 (16%) of Salmonella strains. Our results emphasize the critical importance of preventing the colonization of food production environments by foodborne pathogens, since when colonization does occur, there is an appreciable chance it will become a resident pathogen that presents an ongoing potential to contaminate product.
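
The resident and transient definitions above translate into a short classification rule, sketched here. It assumes isolates have already been grouped into strains using the 50-SNP criterion, and it treats any strain from a repeatedly sampled facility that does not meet the >59-day criterion as transient, which is an interpretation of the abstract rather than a stated rule. The function name and example dates are hypothetical; the counts in the final check are taken from the abstract.

```python
from datetime import date


def classify_strain(collection_dates, facility_sampling_events, min_gap_days=59):
    """Label one strain using the abstract's resident/transient definitions.

    `collection_dates` holds the dates of all isolates assigned to the strain;
    `facility_sampling_events` is how many times the facility was sampled.
    """
    if facility_sampling_events < 2:
        return "sampled once"
    span_days = (max(collection_dates) - min(collection_dates)).days
    return "resident" if span_days > min_gap_days else "transient"


print(classify_strain([date(2020, 1, 1), date(2020, 3, 15)], 3))
print(classify_strain([date(2020, 1, 1), date(2020, 2, 1)], 3))
print(classify_strain([date(2020, 1, 1)], 1))

# Consistency check on the counts reported in the abstract.
resident, transient, sampled_once = 180, 659, 237
total_strains = resident + transient + sampled_once
resident_percent = round(100 * resident / (resident + transient))
print(total_strains, resident_percent)
```

The three reported categories sum to the 1,076 strains given in the abstract, and 180 of the 839 strains from repeatedly sampled facilities corresponds to the quoted 21%.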


Subject(s)
Listeria monocytogenes , Salmonella enterica , Food Handling , Food Microbiology , Genetic Variation , Genome, Bacterial , Listeria monocytogenes/genetics , Salmonella/genetics , Salmonella enterica/genetics
17.
Front Microbiol ; 12: 714284, 2021.
Article in English | MEDLINE | ID: mdl-34659144

ABSTRACT

Carbapenems, one of the important last-line antibiotics for the treatment of gram-negative infections, are becoming ineffective for treating Acinetobacter baumannii infections. Studies have identified multiple genes (and mechanisms) responsible for carbapenem resistance. In some A. baumannii strains, the presence/absence of putative resistance genes is not consistent with their resistance phenotype, indicating that the genomic factors underlying carbapenem resistance in A. baumannii are not fully understood. Here, we describe a large-scale whole-genome genotype-phenotype association study with 349 A. baumannii isolates that extends beyond the presence/absence of individual antimicrobial resistance genes and includes the genomic positions and pairwise interactions of genes. Ten known resistance genes exhibited statistically significant associations with resistance to imipenem, a type of carbapenem: blaOXA-23, qacEdelta1, sul1, mphE, msrE, ant(3")-II, aacC1, yafP, aphA6, and xerD. A review of the strains without any of these 10 genes uncovered a clade of isolates with diverse imipenem resistance phenotypes. Finer resolution evaluation of this clade revealed the presence of a 38.6 kbp conserved chromosomal region found exclusively in imipenem-susceptible isolates. This region appears to host several HTH-type DNA binding transcriptional regulators and transporter genes. Imipenem-susceptible isolates from this clade also carried two mutually exclusive plasmids that contain genes previously known to be specific to imipenem-susceptible isolates. Our analysis demonstrates the utility of using whole genomes for genotype-phenotype correlations in the context of antibiotic resistance and provides several new hypotheses for future research.

18.
Sci Data ; 7(1): 402, 2020 Nov 19.
Article in English | MEDLINE | ID: mdl-33214563

ABSTRACT

The US PulseNet and GenomeTrakr laboratory networks work together within the Genomics for Food Safety (Gen-FS) consortium to collect and analyze genomic data for foodborne pathogen surveillance (species include Salmonella enterica, Listeria monocytogenes, Escherichia coli (STECs), and Campylobacter). In 2017 these two laboratory networks started harmonizing their respective proficiency test exercises, agreeing to distribute a single strain set, follow the same standard operating procedure (SOP) for genomic data collection, and run a jointly coordinated annual proficiency test exercise. In this data release we are publishing the reference genomes and raw data submissions for the 2017 and 2018 proficiency test exercises.


Subject(s)
Food Microbiology/methods , Food Safety , Genomics/standards , Laboratories/standards , Campylobacter/isolation & purification , Escherichia coli/isolation & purification , Genome, Bacterial , Listeria monocytogenes/isolation & purification , Salmonella enterica/isolation & purification , United States
19.
Genome Biol ; 20(1): 286, 2019 Dec 18.
Article in English | MEDLINE | ID: mdl-31849328

ABSTRACT

Although it is assumed that contamination in bacterial whole-genome sequencing causes errors, the influences of contamination on clustering analyses, such as single-nucleotide polymorphism discovery, phylogenetics, and multi-locus sequencing typing, have not been quantified. By developing and analyzing 720 Listeria monocytogenes, Salmonella enterica, and Escherichia coli short-read datasets, we demonstrate that within-species contamination causes errors that confound clustering analyses, while between-species contamination generally does not. Contaminant reads mapping to references or becoming incorporated into chimeric sequences during assembly are the sources of those errors. Contamination sufficient to influence clustering analyses is present in public sequence databases.


Subject(s)
DNA Contamination , Genome, Bacterial , Whole Genome Sequencing , Cluster Analysis , Escherichia coli/genetics , Listeria monocytogenes/genetics , Salmonella enterica/genetics
20.
Infect Genet Evol ; 73: 214-220, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31039448

ABSTRACT

We review how FDA surveillance identifies several ways that whole genome sequencing (WGS) improves actionable outcomes for public health and compliance in a case involving Listeria monocytogenes contamination in an ice cream facility. In late August 2017 FDA conducted environmental sampling inside an ice cream facility. These isolates were sequenced and deposited into the GenomeTrakr databases. In September 2018 the Centers for Disease Control and Prevention contacted the Florida Department of Health after finding that the pathogen analyses of three clinical cases of listeriosis (two in 2013, one in 2018) were highly related to the aforementioned L. monocytogenes isolates collected from the ice cream facility in 2017. FDA returned to the ice cream facility in late September 2018 and conducted further environmental sampling and again recovered L. monocytogenes from environmental subsamples that were genetically related to the clinical cases. A voluntary recall was issued to include all ice cream manufactured from August 2017 to October 2018. Subsequently, FDA suspended this food facility's registration. WGS results for L. monocytogenes found in the facility and from clinical samples clustered together by 0-31 single nucleotide polymorphisms (SNPs). The FDA worked together with the Centers for Disease Control and Prevention, as well as the Florida Department of Health and the Florida Department of Agriculture and Consumer Services, to recall all ice cream products produced by this facility. Our data suggest that when available isolates from food facility inspections are subject to whole genome sequencing and the subsequent sequence data point to linkages between these strains and recent clinical isolates (i.e., <20 nucleotide differences), compliance officials should take regulatory actions early to prevent further potential illness.
The utility of WGS for applications related to enforcement of FDA compliance programs in the context of foodborne pathogens is reviewed.


Subject(s)
Food Microbiology , Ice Cream/microbiology , Listeria/genetics , Listeria/isolation & purification , Whole Genome Sequencing , Food Industry , Humans , Manufacturing and Industrial Facilities