Search | VHL Regional Portal

1.

Rapid identification of enteric bacteria from whole genome sequences using average nucleotide identity metrics.

Lindsey, Rebecca L; Gladney, Lori M; Huang, Andrew D; Griswold, Taylor; Katz, Lee S; Dinsmore, Blake A; Im, Monica S; Kucerova, Zuzana; Smith, Peyton A; Lane, Charlotte; Carleton, Heather A.

Front Microbiol ; 14: 1225207, 2023.

Article in English | MEDLINE | ID: mdl-38156000

ABSTRACT

Identification of enteric bacteria species by whole genome sequence (WGS) analysis requires a rapid and easily standardized approach. We leveraged the principles of average nucleotide identity using MUMmer (ANIm) software, which calculates the percent bases aligned between two bacterial genomes and their corresponding ANI values, to set threshold values for determining species consistent with the conventional identification methods of known species. The performance of species identification was evaluated using two datasets: the Reference Genome Dataset v2 (RGDv2), consisting of 43 enteric genome assemblies representing 32 species, and the Test Genome Dataset (TGDv1), comprising 454 genome assemblies, which is designed to represent all species that need to be queried for identification, as well as rare and closely related species. The RGDv2 contains six Campylobacter spp., three Escherichia/Shigella spp., one Grimontia hollisae, six Listeria spp., one Photobacterium damselae, two Salmonella spp., and thirteen Vibrio spp., while the TGDv1 contains 454 enteric bacterial genomes representing 42 different species. The analysis showed that, when a standard minimum of 70% of genome bases aligned, the ANI threshold values determined for these species were ≥95% for Escherichia/Shigella and Vibrio species, ≥93% for Salmonella species, and ≥92% for Campylobacter and Listeria species. Using these metrics, the RGDv2 accurately classified all validation strains in TGDv1 at the species level, which is consistent with the classification based on previous gold standard methods.
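The thresholds reported in this abstract can be read as a two-step filter: a minimum percent of bases aligned, then a genus-specific ANI cutoff. The following is a minimal sketch of that decision rule only, not the authors' software. The function name `call_species` and the dictionary keys are hypothetical, and the sketch assumes ANIm has already produced an ANI value and a percent-aligned value for each reference comparison.

```python
# Threshold values are taken from the abstract; everything else is illustrative.
MIN_PERCENT_ALIGNED = 70.0

ANI_THRESHOLDS = {
    "Escherichia/Shigella": 95.0,
    "Vibrio": 95.0,
    "Salmonella": 93.0,
    "Campylobacter": 92.0,
    "Listeria": 92.0,
}


def call_species(hits):
    """Return the best-supported species name, or None if nothing passes.

    `hits` is a list of dicts with the hypothetical keys
    'genus', 'species', 'ani', and 'percent_aligned', one per reference genome.
    Genera without a threshold in the abstract (for example Grimontia) are
    never called by this sketch.
    """
    passing = [
        hit for hit in hits
        if hit["percent_aligned"] >= MIN_PERCENT_ALIGNED
        and hit["genus"] in ANI_THRESHOLDS
        and hit["ani"] >= ANI_THRESHOLDS[hit["genus"]]
    ]
    if not passing:
        return None
    # Among references that pass both filters, prefer the highest ANI.
    return max(passing, key=lambda hit: hit["ani"])["species"]
```

Choosing the highest-ANI reference among passing hits is an assumption made here for illustration; the abstract does not say how ties or multiple passing references are resolved.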

2.

Predicting Food Sources of Listeria monocytogenes Based on Genomic Profiling Using Random Forest Model.

Gu, Weidong; Cui, Zhaohui; Stroika, Steven; Carleton, Heather A; Conrad, Amanda; Katz, Lee S; Richardson, LaTonia C; Hunter, Jennifer; Click, Eleanor S; Bruce, Beau B.

Foodborne Pathog Dis ; 20(12): 579-586, 2023 12.

Article in English | MEDLINE | ID: mdl-37699246

ABSTRACT

Listeria monocytogenes can cause severe foodborne illness, including miscarriage during pregnancy or death in newborn infants. When outbreaks of L. monocytogenes illness occur, it may be possible to determine the food source of the outbreak. However, most reported L. monocytogenes illnesses do not occur as part of a recognized outbreak and most of the time the food source of sporadic L. monocytogenes illness in people cannot be determined. In the United States, L. monocytogenes isolates from patients, foods, and environments are routinely sequenced and analyzed by whole genome multilocus sequence typing (wgMLST) for outbreak detection by PulseNet, the national molecular surveillance system for foodborne illnesses. We investigated whether machine learning approaches applied to wgMLST allele call data could assist in attribution analysis of food source of L. monocytogenes isolates. We compiled isolates with a known source from five food categories (dairy, fruit, meat, seafood, and vegetable) using the metadata of L. monocytogenes isolates in PulseNet, deduplicated closely genetically related isolates, and developed random forest models to predict the food sources of isolates. Prediction accuracy of the final model varied across the food categories; it was highest for meat (65%), followed by fruit (45%), vegetable (45%), dairy (44%), and seafood (37%); overall accuracy was 49%, compared with the naive prediction accuracy of 28%. Our results show that random forest can be used to capture genetically complex features of high-resolution wgMLST for attribution of isolates to their sources.
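The per-category and overall accuracies quoted above are compared against a naive prediction accuracy. As a rough illustration of how such figures are computed, the sketch below derives per-class accuracy, overall accuracy, and a majority-class baseline from paired true and predicted labels. Treating the naive predictor as "always predict the most common category" is an assumption; the abstract does not define it. The function names are hypothetical and no data from the study are used.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of isolates in each true category that were predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all isolates predicted correctly."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def majority_class_baseline(true_labels):
    """Accuracy of a predictor that always outputs the most common category (assumed baseline)."""
    most_common_count = Counter(true_labels).most_common(1)[0][1]
    return most_common_count / len(true_labels)
```

A model is only informative for attribution to the extent that its overall accuracy exceeds such a baseline, which is the comparison the abstract draws.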

Subject(s)

Foodborne Diseases , Listeria monocytogenes , Listeriosis , Infant , Infant, Newborn , Humans , United States/epidemiology , Listeriosis/epidemiology , Random Forest , Food Microbiology , Foodborne Diseases/epidemiology , Multilocus Sequence Typing , Disease Outbreaks , Vegetables , Genomics

3.

Reoccurring Escherichia coli O157:H7 Strain Linked to Leafy Greens-Associated Outbreaks, 2016-2019.

Chen, Jessica C; Patel, Kane; Smith, Peyton A; Vidyaprakash, Eshaw; Snyder, Caroline; Tagg, Kaitlin A; Webb, Hattie E; Schroeder, Morgan N; Katz, Lee S; Rowe, Lori A; Howard, Dakota; Griswold, Taylor; Lindsey, Rebecca L; Carleton, Heather A.

Emerg Infect Dis ; 29(9): 1895-1899, 2023 09.

Article in English | MEDLINE | ID: mdl-37610207

ABSTRACT

Genomic characterization of an Escherichia coli O157:H7 strain linked to leafy greens-associated outbreaks dates its emergence to late 2015. One clade has notable accessory genomic content and a previously described mutation putatively associated with increased arsenic tolerance. This strain is a reoccurring, emerging, or persistent strain causing illness over an extended period.

Subject(s)

Escherichia coli O157 , Escherichia coli O157/genetics , Disease Outbreaks , Genomics , Mutation

4.

Characterization of a Nonagglutinating Toxigenic Vibrio cholerae Isolate.

Gladney, Lori M; Griswold, Taylor; Turnsek, Maryann; Im, Monica S; Parsons, Michele M B; Katz, Lee S; Tarr, Cheryl L; Lee, Christine C.

Microbiol Spectr ; 11(3): e0018223, 2023 06 15.

Article in English | MEDLINE | ID: mdl-37195209

ABSTRACT

Toxigenic Vibrio cholerae serogroup O1 is the etiologic agent of the disease cholera, and strains of this serogroup are responsible for pandemics. A few other serogroups have been found to carry cholera toxin genes (most notably O139, O75, and O141), and public health surveillance in the United States is focused on these four serogroups. A toxigenic isolate was recovered from a case of vibriosis from Texas in 2008. This isolate did not agglutinate with any of the four different serogroups' antisera (O1, O139, O75, or O141) routinely used in phenotypic testing and did not display a rough phenotype. We investigated several hypotheses that might explain the recovery of this potential nonagglutinating (NAG) strain using whole-genome sequencing analysis and phylogenetic methods. The NAG strain formed a monophyletic cluster with O141 strains in a whole-genome phylogeny. Furthermore, a phylogeny of ctxAB and tcpA sequences revealed that the sequences from the NAG strain also formed a monophyletic cluster with toxigenic U.S. Gulf Coast (USGC) strains (O1, O75, and O141) that were recovered from vibriosis cases associated with exposures to Gulf Coast waters. A comparison of the NAG whole-genome sequence showed that the O-antigen-determining region of the NAG strain was closely related to those of O141 strains, and specific mutations were likely responsible for the inability to agglutinate. This work shows the utility of whole-genome sequence analysis tools for characterization of an atypical clinical isolate of V. cholerae originating from a USGC state. IMPORTANCE Clinical cases of vibriosis are on the rise due to climate events and ocean warming (1, 2), and increased surveillance of toxigenic Vibrio cholerae strains is now more crucial than ever. While traditional phenotyping using antisera against O1 and O139 is useful for monitoring currently circulating strains with pandemic or epidemic potential, reagents are limited for non-O1/non-O139 strains.
With the increased use of next-generation sequencing technologies, analysis of less well-characterized strains and O-antigen regions is possible. The framework for advanced molecular analysis of O-antigen-determining regions presented herein will be useful in the absence of reagents for serotyping. Furthermore, molecular analyses based on whole-genome sequence data and using phylogenetic methods will help characterize both historical and novel strains of clinical importance. Closely monitoring emerging mutations and trends will improve our understanding of the epidemic potential of Vibrio cholerae to anticipate and rapidly respond to future public health emergencies.

Subject(s)

Cholera , Vibrio Infections , Vibrio cholerae , United States , Humans , Vibrio cholerae/genetics , Phylogeny , O Antigens/genetics

5.

Genome Sequences from a Reemergence of Vibrio cholerae in Haiti, 2022 Reveal Relatedness to Previously Circulating Strains.

Walters, Cynney; Chen, Jessica; Stroika, Steven; Katz, Lee S; Turnsek, Maryann; Compère, Valusnor; Im, Monica S; Gomez, Suzanna; McCullough, Andre; Landaverde, Clarissa; Putney, Jordan; Caidi, Hayat; Folster, Jason; Carleton, Heather A; Boncy, Jacques; Lee, Christine C.

J Clin Microbiol ; 61(3): e0014223, 2023 03 23.

Article in English | MEDLINE | ID: mdl-36877025

Subject(s)

Cholera , Vibrio cholerae O1 , Vibrio cholerae , Humans , Vibrio cholerae/genetics , Haiti/epidemiology , Cholera/epidemiology , Genome, Bacterial , Vibrio cholerae O1/genetics , Disease Outbreaks

6.

Cronobacter sakazakii Infections in Two Infants Linked to Powdered Infant Formula and Breast Pump Equipment - United States, 2021 and 2022.

Haston, Julia C; Miko, Shanna; Cope, Jennifer R; McKeel, Haley; Walters, Cynney; Joseph, Lavin A; Griswold, Taylor; Katz, Lee S; Andújar, Ashley A; Tourdot, Laura; Rounds, Joshua; Vagnone, Paula; Medus, Carlota; Harris, JoAnn; Geist, Robert; Neises, Daniel; Wiggington, Ashley; Smith, Trey; Im, Monica S; Wheeler, Courtney; Smith, Peyton; Carleton, Heather A; Lee, Christine C.

MMWR Morb Mortal Wkly Rep ; 72(9): 223-226, 2023 Mar 03.

Article in English | MEDLINE | ID: mdl-36862586

ABSTRACT

Cronobacter sakazakii, a species of gram-negative bacteria belonging to the Enterobacteriaceae family, is known to cause severe and often fatal meningitis and sepsis in young infants. C. sakazakii is ubiquitous in the environment, and most reported infant cases have been attributed to contaminated powdered infant formula (powdered formula) or breast milk that was expressed using contaminated breast pump equipment (1-3). Previous investigations of cases and outbreaks have identified C. sakazakii in opened powdered formula, breast pump parts, environmental surfaces in the home, and, rarely, in unopened powdered formula and formula manufacturing facilities (2,4-6). This report describes two infants with C. sakazakii meningitis reported to CDC in September 2021 and February 2022. CDC used whole genome sequencing (WGS) analysis to link one case to contaminated opened powdered formula from the patient's home and the other to contaminated breast pump equipment. These cases highlight the importance of expanding awareness about C. sakazakii infections in infants, safe preparation and storage of powdered formula, proper cleaning and sanitizing of breast pump equipment, and using WGS as a tool for C. sakazakii investigations.

Subject(s)

Cronobacter sakazakii , Enterobacteriaceae Infections , Female , Infant , Humans , Infant Formula , Cronobacter sakazakii/genetics , Enterobacteriaceae Infections/diagnosis , Enterobacteriaceae , Milk, Human , Powders

7.

Genome Sequences of Hemolytic and Nonhemolytic Listeria innocua Strains from Human, Food, and Environmental Sources.

McIntosh, Tori; Kucerova, Zuzana; Katz, Lee S; Lilley, Cullen M; Rowe, Lori A; Unoarumhi, Yvette; Batra, Dhwani; Burnett, Elton; Smikle, Monica; Lee, Christine.

Microbiol Resour Announc ; 11(12): e0072322, 2022 Dec 15.

Article in English | MEDLINE | ID: mdl-36445150

ABSTRACT

This report describes genome sequences for nine Listeria innocua strains that varied in hemolytic phenotypes on sheep blood agar. All strains were sequenced using Pacific Biosciences (PacBio) single-molecule real-time (SMRT) chemistry; overall, the average read length of these sequences was 2,869,880 bp, with an average GC content of 37%.

8.

Benchmark datasets for SARS-CoV-2 surveillance bioinformatics.

Xiaoli, Lingzi; Hagey, Jill V; Park, Daniel J; Gulvik, Christopher A; Young, Erin L; Alikhan, Nabil-Fareed; Lawsin, Adrian; Hassell, Norman; Knipe, Kristen; Oakeson, Kelly F; Retchless, Adam C; Shakya, Migun; Lo, Chien-Chi; Chain, Patrick; Page, Andrew J; Metcalf, Benjamin J; Su, Michelle; Rowell, Jessica; Vidyaprakash, Eshaw; Paden, Clinton R; Huang, Andrew D; Roellig, Dawn; Patel, Ketan; Winglee, Kathryn; Weigand, Michael R; Katz, Lee S.

PeerJ ; 10: e13821, 2022.

Article in English | MEDLINE | ID: mdl-36093336

ABSTRACT

Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), has spread globally and is being surveilled with an international genome sequencing effort. Surveillance consists of sample acquisition, library preparation, and whole genome sequencing. This has necessitated a classification scheme detailing Variants of Concern (VOC) and Variants of Interest (VOI), and the rapid expansion of bioinformatics tools for sequence analysis. These bioinformatic tools are means for major actionable results: maintaining quality assurance and checks, defining population structure, performing genomic epidemiology, and inferring lineage to allow reliable and actionable identification and classification. Additionally, the pandemic has required public health laboratories to reach high throughput proficiency in sequencing library preparation and downstream data analysis rapidly. However, both processes can be limited by a lack of a standardized sequence dataset. Methods: We identified six SARS-CoV-2 sequence datasets from recent publications, public databases and internal resources. In addition, we created a method to mine public databases to identify representative genomes for these datasets. Using this novel method, we identified several genomes as either VOI/VOC representatives or non-VOI/VOC representatives. To describe each dataset, we utilized a previously published datasets format, which describes accession information and whole dataset information. Additionally, a script from the same publication has been enhanced to download and verify all data from this study. Results: The benchmark datasets focus on the two most widely used sequencing platforms: long read sequencing data from the Oxford Nanopore Technologies platform and short read sequencing data from the Illumina platform. 
There are six datasets: three were derived from recent publications; two were derived from data mining public databases to answer common questions not covered by published datasets; one unique dataset representing common sequence failures was obtained by rigorously scrutinizing data that did not pass quality checks. The dataset summary table, data mining script and quality control (QC) values for all sequence data are publicly available on GitHub: https://github.com/CDCgov/datasets-sars-cov-2. Discussion: The datasets presented here were generated to help public health laboratories build sequencing and bioinformatics capacity, benchmark different workflows and pipelines, and calibrate QC thresholds to ensure sequencing quality. Together, improvements in these areas support accurate and timely outbreak investigation and surveillance, providing actionable data for pandemic management. Furthermore, these publicly available and standardized benchmark data will facilitate the development and adjudication of new pipelines.

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Benchmarking , Computational Biology , Sequence Analysis

9.

Molecular characterization of circulating Salmonella Typhi strains in an urban informal settlement in Kenya.

Ochieng, Caroline; Chen, Jessica C; Osita, Mike Powel; Katz, Lee S; Griswold, Taylor; Omballa, Victor; Ng'eno, Eric; Ouma, Alice; Wamola, Newton; Opiyo, Christine; Achieng, Loicer; Munywoki, Patrick K; Hendriksen, Rene S; Freeman, Molly; Mikoleit, Matthew; Juma, Bonventure; Bigogo, Godfrey; Mintz, Eric; Verani, Jennifer R; Hunsperger, Elizabeth; Carleton, Heather A.

PLoS Negl Trop Dis ; 16(8): e0010704, 2022 08.

Article in English | MEDLINE | ID: mdl-36007074

ABSTRACT

A high burden of Salmonella enterica subspecies enterica serovar Typhi (S. Typhi) bacteremia has been reported from urban informal settlements in sub-Saharan Africa, yet little is known about the introduction of these strains to the region. Understanding regional differences in the predominant strains of S. Typhi can provide insight into the genomic epidemiology. We genetically characterized 310 S. Typhi isolates from typhoid fever surveillance conducted over a 12-year period (2007-2019) in Kibera, an urban informal settlement in Nairobi, Kenya, to assess the circulating strains, their antimicrobial resistance attributes, and how they relate to global S. Typhi isolates. Whole genome multi-locus sequence typing (wgMLST) identified 4 clades, with up to 303 pairwise allelic differences. The identified genotypes correlated with wgMLST clades. The predominant clade contained 290 (93.5%) isolates with a median of 14 allele differences (range 0-52) and consisted entirely of genotypes 4.3.1.1 and 4.3.1.2. Resistance determinants were identified exclusively in the predominant clade. Determinants associated with resistance to aminoglycosides were observed in 245 isolates (79.0%), sulphonamide in 243 isolates (78.4%), trimethoprim in 247 isolates (79.7%), tetracycline in 224 isolates (72.3%), chloramphenicol in 247 isolates (79.6%), β-lactams in 239 isolates (77.1%) and quinolones in 62 isolates (20.0%). Multidrug resistance (MDR) determinants (defined as determinants conferring resistance to ampicillin, chloramphenicol and cotrimoxazole) were found in 235 (75.8%) isolates. The prevalence of MDR associated genes was similar throughout the study period (2007-2012: 203, 76.3% vs 2013-2019: 32, 72.7%; Fisher's Exact Test: P = 0.5478), while the proportion of isolates harboring quinolone resistance determinants increased (2007-2012: 42, 15.8% and 2013-2019: 20, 45.5%; Fisher's Exact Test: P<0.0001) following a decline in S. Typhi in Kibera.
Some isolates (49, 15.8%) harbored both MDR and quinolone resistance determinants. There were no determinants associated with resistance to cephalosporins or azithromycin detected among the isolates sequenced in this study. Plasmid markers were only identified in the main clade, including IncHI1A and IncHI1B(R27) in 226 (72.9%) isolates, and IncQ1 in 238 (76.8%) isolates. Molecular clock analysis of global typhoid isolates and isolates from Kibera suggests that genotype 4.3.1 has been introduced multiple times in Kibera. Several genomes from Kibera formed a clade with genomes from Kenya, Malawi, South Africa, and Tanzania. The most recent common ancestor (MRCA) for these isolates was from around 1997. Another isolate from Kibera grouped with several isolates from Uganda, sharing a common ancestor from around 2009. In summary, S. Typhi in Kibera belong to four wgMLST clades, one of which is frequently associated with MDR genes, and this poses a challenge in treatment and control.
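The period comparisons in this abstract rest on Fisher's exact test applied to 2x2 tables. The sketch below is a plain standard-library implementation of the two-sided test, included only to show how such a P value arises from the counts. The function name is hypothetical, and the period denominators used in the usage note (266 and 44 isolates) are inferred here from the reported counts and percentages rather than stated in the abstract.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(
        p for p in (probability(x) for x in range(low, high + 1))
        if p <= observed * tolerance
    )
    return min(1.0, p_value)
```

Under the inferred denominators, the MDR comparison would be `fisher_exact_two_sided(203, 63, 32, 12)` and the quinolone comparison `fisher_exact_two_sided(42, 224, 20, 24)`; the first should give a large P value and the second a very small one, in line with the direction of the results reported above.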

Subject(s)

Quinolones , Typhoid Fever , Anti-Bacterial Agents/pharmacology , Chloramphenicol , Humans , Kenya/epidemiology , Microbial Sensitivity Tests , Multilocus Sequence Typing , Salmonella typhi , Typhoid Fever/epidemiology

10.

Software testing in microbial bioinformatics: a call to action.

van der Putten, Boas C L; Mendes, C I; Talbot, Brooke M; de Korne-Elenbaas, Jolinda; Mamede, Rafael; Vila-Cerqueira, Pedro; Coelho, Luis Pedro; Gulvik, Christopher A; Katz, Lee S.

Microb Genom ; 8(3), 2022 03.

Article in English | MEDLINE | ID: mdl-35259087

ABSTRACT

Computational algorithms have become an essential component of research, with great efforts by the scientific community to raise standards on development and distribution of code. Despite these efforts, sustainability and reproducibility are major issues since continued validation through software testing is still not a widely adopted practice. Here, we report seven recommendations that help researchers implement software testing in microbial bioinformatics. We have developed these recommendations based on our experience from a collaborative hackathon organised prior to the American Society for Microbiology Next Generation Sequencing (ASM NGS) 2020 conference. We also present a repository hosting examples and guidelines for testing, available from https://github.com/microbinfie-hackathon2020/CSIS.

Subject(s)

Computational Biology , Software , Algorithms , High-Throughput Nucleotide Sequencing , Reproducibility of Results , United States

11.

Use of Whole Genome Sequencing by the Federal Interagency Collaboration for Genomics for Food and Feed Safety in the United States.

Stevens, Eric L; Carleton, Heather A; Beal, Jennifer; Tillman, Glenn E; Lindsey, Rebecca L; Lauer, A C; Pightling, Arthur; Jarvis, Karen G; Ottesen, Andrea; Ramachandran, Padmini; Hintz, Leslie; Katz, Lee S; Folster, Jason P; Whichard, Jean M; Trees, Eija; Timme, Ruth E; McDermott, Patrick; Wolpert, Beverly; Bazaco, Michael; Zhao, Shaohua; Lindley, Sabina; Bruce, Beau B; Griffin, Patricia M; Brown, Eric; Allard, Marc; Tallent, Sandra; Irvin, Kari; Hoffmann, Maria; Wise, Matt; Tauxe, Robert; Gerner-Smidt, Peter; Simmons, Mustafa; Kissler, Bonnie; Defibaugh-Chavez, Stephanie; Klimke, William; Agarwala, Richa; Lindsay, James; Cook, Kimberly; Austerman, Suelee Robbe; Goldman, David; McGarry, Sherri; Hale, Kis Robertson; Dessai, Uday; Musser, Steven M; Braden, Chris.

J Food Prot ; 85(5): 755-772, 2022 05 01.

Article in English | MEDLINE | ID: mdl-35259246

ABSTRACT

This multiagency report developed by the Interagency Collaboration for Genomics for Food and Feed Safety provides an overview of the use of and transition to whole genome sequencing (WGS) technology for detection and characterization of pathogens transmitted commonly by food and for identification of their sources. We describe foodborne pathogen analysis, investigation, and harmonization efforts among the following federal agencies: National Institutes of Health; Department of Health and Human Services, Centers for Disease Control and Prevention (CDC) and U.S. Food and Drug Administration (FDA); and the U.S. Department of Agriculture, Food Safety and Inspection Service, Agricultural Research Service, and Animal and Plant Health Inspection Service. We describe single nucleotide polymorphism, core-genome, and whole genome multilocus sequence typing data analysis methods as used in the PulseNet (CDC) and GenomeTrakr (FDA) networks, underscoring the complementary nature of the results for linking genetically related foodborne pathogens during outbreak investigations while allowing flexibility to meet the specific needs of Interagency Collaboration partners. We highlight how we apply WGS to pathogen characterization (virulence and antimicrobial resistance profiles) and source attribution efforts and increase transparency by making the sequences and other data publicly available through the National Center for Biotechnology Information. We also highlight the impact of current trends in the use of culture-independent diagnostic tests for human diagnostic testing on analytical approaches related to food safety and what is next for the use of WGS in the area of food safety.

Subject(s)

Foodborne Diseases , Animals , Disease Outbreaks/prevention & control , Food Safety , Foodborne Diseases/epidemiology , Foodborne Diseases/prevention & control , Genomics , United States , Whole Genome Sequencing

12.

Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package.

Griffiths, Emma J; Timme, Ruth E; Mendes, Catarina Inês; Page, Andrew J; Alikhan, Nabil-Fareed; Fornika, Dan; Maguire, Finlay; Campos, Josefina; Park, Daniel; Olawoye, Idowu B; Oluniyi, Paul E; Anderson, Dominique; Christoffels, Alan; da Silva, Anders Gonçalves; Cameron, Rhiannon; Dooley, Damion; Katz, Lee S; Black, Allison; Karsch-Mizrachi, Ilene; Barrett, Tanya; Johnston, Anjanette; Connor, Thomas R; Nicholls, Samuel M; Witney, Adam A; Tyson, Gregory H; Tausch, Simon H; Raphenya, Amogelang R; Alcock, Brian; Aanensen, David M; Hodcroft, Emma; Hsiao, William W L; Vasconcelos, Ana Tereza R; MacCannell, Duncan R.

Gigascience ; 11, 2022 02 16.

Article in English | MEDLINE | ID: mdl-35169842

ABSTRACT

BACKGROUND: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. RESULTS: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. CONCLUSIONS: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI's BioSample database.

Subject(s)

COVID-19 , SARS-CoV-2 , Genomics , Humans , Metadata , Public Health , Reproducibility of Results

13.

Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks.

Wagner, Darlene D; Carleton, Heather A; Trees, Eija; Katz, Lee S.

PeerJ ; 9: e12446, 2021.

Article in English | MEDLINE | ID: mdl-34900416

ABSTRACT

BACKGROUND: Whole genome sequencing (WGS) has gained increasing importance in responses to enteric bacterial outbreaks. Common analysis procedures for WGS, single nucleotide polymorphisms (SNPs) and genome assembly, are highly dependent upon WGS data quality. METHODS: Raw, unprocessed WGS reads from Escherichia coli, Salmonella enterica, and Shigella sonnei outbreak clusters were characterized for four quality metrics: PHRED score, read length, library insert size, and ambiguous nucleotide composition. PHRED scores were strongly correlated with improved SNPs analysis results in E. coli and S. enterica clusters. RESULTS: Assembly quality showed only moderate correlations with PHRED scores and library insert size, and then only for Salmonella. To improve SNP analyses and assemblies, we compared seven read-healing pipelines to improve these four quality metrics and to see how well they improved SNP analysis and genome assembly. The most effective read healing pipelines for SNPs analysis incorporated quality-based trimming, fixed-width trimming, or both. The Lyve-SET SNPs pipeline showed a more marked improvement than the CFSAN SNP Pipeline, but the latter performed better on raw, unhealed reads. For genome assembly, SPAdes enabled significant improvements in healed E. coli reads only, while Skesa yielded no significant improvements on healed reads. CONCLUSIONS: PHRED scores will continue to be a crucial quality metric albeit not of equal impact across all types of analyses for all enteric bacteria. While trimming-based read healing performed well for SNPs analyses, different read healing approaches are likely needed for genome assembly or other, emerging WGS analysis methodologies.
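The two read-healing strategies singled out in this abstract, quality-based trimming and fixed-width trimming, can be illustrated on a single FASTQ record. The sketch below is not any of the seven pipelines compared in the study. It assumes Phred+33 quality encoding, and the function names, the default cutoff of 20, and the simple 3'-end trimming rule are all choices made for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer PHRED scores (Phred+33 assumed)."""
    return [ord(character) - offset for character in quality_string]


def quality_trim_3prime(sequence, quality_string, min_quality=20):
    """Drop bases from the 3' end until the last base meets min_quality."""
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


def fixed_width_trim(sequence, quality_string, left=0, right=0):
    """Remove a fixed number of bases from each end, regardless of quality."""
    end = len(sequence) - right
    return sequence[left:end], quality_string[left:end]
```

For example, `quality_trim_3prime("ACGTACGT", "IIIIII##")` keeps the first six bases, because `#` decodes to a PHRED score of 2 while `I` decodes to 40. Real trimmers typically use sliding windows or running-sum algorithms rather than this base-by-base rule.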

14.

Clinical and Laboratory Findings in Patients With Potential Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Reinfection, May-July 2020.

Lee, James T; Hesse, Elisabeth M; Paulin, Heather N; Datta, Deblina; Katz, Lee S; Talwar, Amish; Chang, Gregory; Galang, Romeo R; Harcourt, Jennifer L; Tamin, Azaibi; Thornburg, Natalie J; Wong, Karen K; Stevens, Valerie; Kim, Kaylee; Tong, Suxiang; Zhou, Bin; Queen, Krista; Drobeniuc, Jan; Folster, Jennifer M; Sexton, D Joseph; Ramachandran, Sumathi; Browne, Hannah; Iskander, John; Mitruka, Kiren.

Clin Infect Dis ; 73(12): 2217-2225, 2021 12 16.

Article in English | MEDLINE | ID: mdl-33598716

ABSTRACT

BACKGROUND: We investigated patients with potential severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) reinfection in the United States during May-July 2020. METHODS: We conducted case finding for patients with potential SARS-CoV-2 reinfection through the Emerging Infections Network. Cases reported were screened for laboratory and clinical findings of potential reinfection followed by requests for medical records and laboratory specimens. Available medical records were abstracted to characterize patient demographics, comorbidities, clinical course, and laboratory test results. Submitted specimens underwent further testing, including reverse transcription polymerase chain reaction (RT-PCR), viral culture, whole genome sequencing, subgenomic RNA PCR, and testing for anti-SARS-CoV-2 total antibody. RESULTS: Among 73 potential reinfection patients with available records, 30 patients had recurrent coronavirus disease 2019 (COVID-19) symptoms explained by alternative diagnoses with concurrent SARS-CoV-2 positive RT-PCR, 24 patients remained asymptomatic after recovery but had recurrent or persistent RT-PCR, and 19 patients had recurrent COVID-19 symptoms with concurrent SARS-CoV-2 positive RT-PCR but no alternative diagnoses. These 19 patients had symptom recurrence a median of 57 days after initial symptom onset (interquartile range: 47-76). Six of these patients had paired specimens available for further testing, but none had laboratory findings confirming reinfections. Testing of an additional 3 patients with recurrent symptoms and alternative diagnoses also did not confirm reinfection. CONCLUSIONS: We did not confirm SARS-CoV-2 reinfection within 90 days of the initial infection based on the clinical and laboratory characteristics of cases in this investigation. Our findings support current Centers for Disease Control and Prevention (CDC) guidance around quarantine and testing for patients who have recovered from COVID-19.

Subject(s)

COVID-19 , SARS-CoV-2 , Antibodies, Viral , Humans , Laboratories , Reinfection

15.

SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data.

Griswold, Taylor; Kapsak, Curtis; Chen, Jessica C; den Bakker, Henk C; Williams, Grant; Kelley, Alyssa; Vidyaprakash, Eshaw; Katz, Lee S.

J Open Source Softw ; 6(60), 2021 Apr 16.

Article in English | MEDLINE | ID: mdl-35360664

ABSTRACT

Laboratories that run Whole Genome Sequencing (WGS) produce a tremendous amount of data, up to 10 gigabytes for some common instruments. There is a need to standardize the quality assurance and quality control process (QA/QC). Therefore we have created SneakerNet to automate the QA/QC for WGS.

16.

Sequencing and Characterization of Five Extensively Drug-Resistant Salmonella enterica Serotype Typhi Isolates Implicated in Human Infections from Punjab, Pakistan.

Tagg, Kaitlin A; Amir, Afreenish; Ikram, Aamer; Chen, Jessica C; Kim, Justin Y; Meservey, Elizabeth; Joung, Yoo J; Halpin, Jessica L; Batra, Dhwani; Leeper, Molly M; Katz, Lee S; Saeed, Asim; Freeman, Molly; Watkins, Louise Francois; Salman, Muhammad; Folster, Jason P.

Microbiol Resour Announc ; 9(13), 2020 Mar 26.

Article in English | MEDLINE | ID: mdl-32217683

ABSTRACT

A large outbreak of extensively drug-resistant (XDR) Salmonella enterica serotype Typhi infections is ongoing in Pakistan, predominantly in Sindh Province. Here, we report the sequencing and characterization of five XDR Salmonella Typhi isolates from the Punjab province of Pakistan that are closely related to the outbreak strain and carry the same IncY plasmid.

17.

Implications of Mobile Genetic Elements for Salmonella enterica Single-Nucleotide Polymorphism Subtyping and Source Tracking Investigations.

Li, Shaoting; Zhang, Shaokang; Baert, Leen; Jagadeesan, Balamurugan; Ngom-Bru, Catherine; Griswold, Taylor; Katz, Lee S; Carleton, Heather A; Deng, Xiangyu.

Appl Environ Microbiol ; 85(24)2019 12 15.

Article in English | MEDLINE | ID: mdl-31585993

ABSTRACT

Single-nucleotide polymorphisms (SNPs) are widely used for whole-genome sequencing (WGS)-based subtyping of foodborne pathogens in outbreak and source tracking investigations. Mobile genetic elements (MGEs) are commonly present in bacterial genomes and may affect SNP subtyping results if their evolutionary history and dynamics differ from that of the bacterial chromosomes. Using Salmonella enterica as a model organism, we surveyed major categories of MGEs, including plasmids, phages, insertion sequences, integrons, and integrative and conjugative elements (ICEs), in 990 genomes representing 21 major serotypes of S. enterica. We evaluated whether plasmids and chromosomal MGEs affect SNP subtyping with 9 outbreak clusters of different serotypes found in the United States in 2018. The median total length of chromosomal MGEs accounted for 2.5% of a typical S. enterica chromosome. Of the 990 analyzed S. enterica isolates, 68.9% contained at least one assembled plasmid sequence. The median total length of assembled plasmids in these isolates was 93,671 bp. Plasmids that carry high densities of SNPs were found to substantially affect both SNP phylogenies and SNP distances among closely related isolates if they were present in the reference genome for SNP subtyping. In comparison, chromosomal MGEs were found to have limited impact on SNP subtyping. We recommend the identification of plasmid sequences in the reference genome and the exclusion of plasmid-borne SNPs from SNP subtyping analysis. IMPORTANCE: Despite increasingly routine use of WGS and SNP subtyping in outbreak and source tracking investigations, whether and how MGEs affect SNP subtyping has not been thoroughly investigated. Besides chromosomal MGEs, plasmids are frequently entangled in draft genome assemblies and yet to be assessed for their impact on SNP subtyping. This study provides evidence-based guidance on the treatment of MGEs in SNP analysis for Salmonella to infer phylogenetic relationship and SNP distance between isolates.
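The recommendation to exclude plasmid-borne SNPs can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes SNP calls are already available as (reference contig, position) pairs and that plasmid contigs in the reference genome have already been identified by some other means. The function name and the contig names are hypothetical.

```python
from typing import List, Set, Tuple

# A SNP call as (reference contig name, 1-based position); this layout is an assumption.
Snp = Tuple[str, int]


def drop_plasmid_snps(snps: List[Snp], plasmid_contigs: Set[str]) -> List[Snp]:
    """Keep only SNPs whose reference contig is not flagged as a plasmid."""
    return [(contig, pos) for contig, pos in snps if contig not in plasmid_contigs]


# Hypothetical calls against a reference with one chromosome and one plasmid contig.
snps = [
    ("chromosome", 1200),
    ("plasmid_1", 88),
    ("chromosome", 45010),
    ("plasmid_1", 91),
]
chromosomal_snps = drop_plasmid_snps(snps, {"plasmid_1"})
print(chromosomal_snps)
```

Filtering by contig keeps the step independent of the SNP caller, at the cost of requiring plasmid contigs to be labeled beforehand.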

Subject(s)

Interspersed Repetitive Sequences/genetics , Polymorphism, Single Nucleotide , Salmonella enterica/classification , Salmonella enterica/genetics , Chromosomes, Bacterial , Disease Outbreaks , Genome, Bacterial , Humans , Phylogeny , Plasmids/isolation & purification , Serogroup , Whole Genome Sequencing

18.

Mashtree: a rapid comparison of whole genome sequence files.

Katz, Lee S; Griswold, Taylor; Morrison, Shatavia S; Caravas, Jason A; Zhang, Shaokang; den Bakker, Henk C; Deng, Xiangyu; Carleton, Heather A.

J Open Source Softw ; 4(44)2019 Dec 10.

Article in English | MEDLINE | ID: mdl-35978566

19.

Genome wide characterization of enterotoxigenic Escherichia coli serogroup O6 isolates from multiple outbreaks and sporadic infections from 1975-2016.

Pattabiraman, Vaishnavi; Katz, Lee S; Chen, Jessica C; McCullough, Andre E; Trees, Eija.

PLoS One ; 13(12): e0208735, 2018.

Article in English | MEDLINE | ID: mdl-30596673

ABSTRACT

Enterotoxigenic Escherichia coli (ETEC) are an important cause of diarrhea globally, particularly among children under the age of five in developing countries. ETEC O6 is the most common ETEC serogroup, yet the genome wide population structure of isolates of this serogroup is yet to be determined. In this study, we have characterized 40 ETEC O6 isolates collected between 1975 and 2016 by whole genome sequencing (WGS) and by phenotypic antimicrobial susceptibility testing. To determine the relatedness of isolates, we evaluated two methods: whole genome high-quality single nucleotide polymorphism (whole genome-hqSNP) and core genome SNP analyses, using Lyve-SET and Parsnp, respectively. All isolates were tested for antimicrobial susceptibility using a panel of 14 antibiotics. ResFinder 2.1 and a custom quinolone resistance determinants workflow were used for resistance determinant detection. VirulenceFinder 1.5 was used for prediction of the virulence genes. Thirty-seven isolates clustered into three major clades (I, II, III) by whole genome-hqSNP and core genome SNP analyses, while three isolates included in the whole genome-hqSNP analysis only did not cluster with clades I-III by both analyses and formed a distantly related outgroup, designated clade IV. Median number of pairwise whole genome-hqSNPs in clonal ETEC O6 outbreaks ranged from 0 to 5. Of the 40 isolates tested for antimicrobial susceptibility, 18 isolates were pansusceptible. Twenty-two isolates were resistant to at least one antibiotic, nine of which were multidrug resistant. Phenotypic antimicrobial resistance (AR) correlated with AR determinants in 22 isolates. Thirty-two isolates harbored both enterotoxin virulence genes while the remaining 8 isolates had only one of the two virulence genes. In summary, whole genome-hqSNP and core genome SNP analyses from this study revealed similar evolutionary relationships and an overall diversity of ETEC O6 isolates independent of time of isolation. Fewer than 5 pairwise hqSNPs between ETEC O6 isolates is circumstantially indicative of an outbreak cluster. Findings from this study will be a basis for quicker outbreak detection and control by efficient subtyping by WGS.
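The pairwise hqSNP criterion can be sketched as a simple distance calculation over an alignment. This is an illustration only, not how Lyve-SET or Parsnp work internally: it assumes equal-length, already aligned sequences, ignores any site where either isolate lacks an unambiguous A, C, G, or T, and uses hypothetical isolate names and toy sequences.

```python
from itertools import combinations
from typing import Dict, List, Tuple

VALID_BASES = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count sites where two aligned sequences carry different unambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in VALID_BASES and y in VALID_BASES and x != y
    )


def close_pairs(alignment: Dict[str, str], max_snps: int = 5) -> List[Tuple[str, str, int]]:
    """Return (name1, name2, distance) for pairs with fewer than max_snps differences."""
    pairs = []
    for (n1, s1), (n2, s2) in combinations(sorted(alignment.items()), 2):
        d = snp_distance(s1, s2)
        if d < max_snps:
            pairs.append((n1, n2, d))
    return pairs


# Toy alignment with hypothetical isolate names.
alignment = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTAT",
    "iso_C": "TCGAACCTGG",
}
print(close_pairs(alignment))
```

A pair flagged this way is only a candidate for follow-up; the abstract itself describes the threshold as circumstantial evidence of an outbreak cluster.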

Subject(s)

Enterotoxigenic Escherichia coli/genetics , Escherichia coli Infections/epidemiology , Escherichia coli Infections/microbiology , Anti-Bacterial Agents/pharmacology , Computational Biology , DNA, Bacterial , Disease Outbreaks , Drug Resistance, Bacterial/genetics , Enterotoxigenic Escherichia coli/drug effects , Enterotoxigenic Escherichia coli/isolation & purification , Enterotoxigenic Escherichia coli/pathogenicity , Genome, Bacterial , Humans , Microbial Sensitivity Tests , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Serogroup , Virulence Factors/genetics

20.

SNVPhyl: a single nucleotide variant phylogenomics pipeline for microbial genomic epidemiology.

Petkau, Aaron; Mabon, Philip; Sieffert, Cameron; Knox, Natalie C; Cabral, Jennifer; Iskander, Mariam; Iskander, Mark; Weedmark, Kelly; Zaheer, Rahat; Katz, Lee S; Nadon, Celine; Reimer, Aleisha; Taboada, Eduardo; Beiko, Robert G; Hsiao, William; Brinkman, Fiona; Graham, Morag; Van Domselaar, Gary.

Microb Genom ; 3(6): e000116, 2017 06 30.

Article in English | MEDLINE | ID: mdl-29026651

ABSTRACT

The recent widespread application of whole-genome sequencing (WGS) for microbial disease investigations has spurred the development of new bioinformatics tools, including a notable proliferation of phylogenomics pipelines designed for infectious disease surveillance and outbreak investigation. Transitioning the use of WGS data out of the research laboratory and into the front lines of surveillance and outbreak response requires user-friendly, reproducible and scalable pipelines that have been well validated. Single Nucleotide Variant Phylogenomics (SNVPhyl) is a bioinformatics pipeline for identifying high-quality single-nucleotide variants (SNVs) and constructing a whole-genome phylogeny from a collection of WGS reads and a reference genome. Individual pipeline components are integrated into the Galaxy bioinformatics framework, enabling data analysis in a user-friendly, reproducible and scalable environment. We show that SNVPhyl can detect SNVs with high sensitivity and specificity, and identify and remove regions of high SNV density (indicative of recombination). SNVPhyl is able to correctly distinguish outbreak from non-outbreak isolates across a range of variant-calling settings, sequencing-coverage thresholds or in the presence of contamination. SNVPhyl is available as a Galaxy workflow, Docker and virtual machine images, and a Unix-based command-line application. SNVPhyl is released under the Apache 2.0 license and available at http://snvphyl.readthedocs.io/ or at https://github.com/phac-nml/snvphyl-galaxy.

Subject(s)

Computational Biology , Disease Outbreaks , Genome, Microbial , Infections , Phylogeny , Whole Genome Sequencing , Workflow , Humans , Infections/epidemiology , Infections/genetics , Infections/microbiology
