Search | BVS Regional Portal

Tracking SARS-CoV-2 variants of concern in wastewater: an assessment of nine computational tools using simulated genomic data.

Sutcliffe, Steven G; Kraemer, Susanne A; Ellmen, Isaac; Knapp, Jennifer J; Overton, Alyssa K; Nash, Delaney; Nissimov, Jozef I; Charles, Trevor C; Dreifuss, David; Topolsky, Ivan; Baykal, Pelin I; Fuhrmann, Lara; Jablonski, Kim P; Beerenwinkel, Niko; Levy, Joshua I; Olabode, Abayomi S; Becker, Devan G; Gugan, Gopi; Brintnell, Erin; Poon, Art F Y; Valieris, Renan; Drummond, Rodrigo D; Defelicibus, Alexandre; Dias-Neto, Emmanuel; Rosales, Rafael A; Tojal da Silva, Israel; Orfanou, Aspasia; Psomopoulos, Fotis; Pechlivanis, Nikolaos; Pipes, Lenore; Chen, Zihao; Baaijens, Jasmijn A; Baym, Michael; Shapiro, B Jesse.

Microb Genom ; 10(5)2024 May.

Article in English | MEDLINE | ID: mdl-38785221

ABSTRACT

Wastewater-based surveillance (WBS) is an important epidemiological and public health tool for tracking pathogens across the scale of a building, neighbourhood, city, or region. WBS gained widespread adoption globally during the SARS-CoV-2 pandemic for estimating community infection levels by qPCR. Sequencing pathogen genes or genomes from wastewater adds information about pathogen genetic diversity, which can be used to identify viral lineages (including variants of concern) that are circulating in a local population. Capturing the genetic diversity by WBS sequencing is not trivial, as wastewater samples often contain a diverse mixture of viral lineages with real mutations and sequencing errors, which must be deconvoluted computationally from short sequencing reads. In this study we assess nine different computational tools that have recently been developed to address this challenge. We simulated 100 wastewater sequence samples consisting of SARS-CoV-2 BA.1, BA.2, and Delta lineages, in various mixtures, as well as a Delta-Omicron recombinant and a synthetic 'novel' lineage. Most tools performed well in identifying the true lineages present and estimating their relative abundances and were generally robust to variation in sequencing depth and read length. While many tools identified lineages present down to 1% frequency, results were more reliable above a 5% threshold. The presence of an unknown synthetic lineage, which represents an unclassified SARS-CoV-2 lineage, increases the error in relative abundance estimates of other lineages, but the magnitude of this effect was small for most tools. The tools also varied in how they labelled novel synthetic lineages and recombinants. While our simulated dataset represents just one of many possible use cases for these methods, we hope it helps users understand potential sources of error or bias in wastewater sequencing analysis and to appreciate the commonalities and differences across methods.
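
The deconvolution step mentioned in the abstract can be illustrated with a minimal sketch. It is not the method of any of the nine tools assessed. It assumes that the observed frequency of each marker mutation is roughly the sum of the abundances of the lineages carrying it, and it fits non-negative abundances by projected gradient descent. The function name, the six marker sites, and the signature vectors are hypothetical toy values, not real lineage definitions.

```python
def estimate_abundances(signatures, observed, steps=2000, lr=0.05):
    """Fit non-negative lineage abundances to observed mutation frequencies.

    signatures: dict mapping lineage name -> list of 0/1 flags, one per marker site
    observed:   list of observed alternate-allele frequencies, one per marker site
    Minimises the squared error between predicted and observed frequencies.
    The step size lr is tuned for this toy example (assumption).
    """
    names = sorted(signatures)
    n_sites = len(observed)
    x = [1.0 / len(names)] * len(names)  # start from a uniform mixture
    for _ in range(steps):
        # Predicted frequency at each site under the current abundances
        pred = [sum(x[k] * signatures[n][j] for k, n in enumerate(names))
                for j in range(n_sites)]
        resid = [pred[j] - observed[j] for j in range(n_sites)]
        # Gradient of 0.5 * sum(resid^2) with respect to each abundance
        grad = [sum(resid[j] * signatures[n][j] for j in range(n_sites))
                for n in names]
        # Gradient step followed by projection onto the non-negative orthant
        x = [max(0.0, x[k] - lr * grad[k]) for k in range(len(names))]
    total = sum(x)
    return {n: (x[k] / total if total > 0 else 0.0) for k, n in enumerate(names)}


# Toy marker table (hypothetical): rows are lineages, columns are marker sites.
toy_signatures = {
    "BA.1":  [1, 1, 0, 0, 1, 0],
    "BA.2":  [1, 0, 1, 0, 0, 1],
    "Delta": [0, 0, 0, 1, 1, 1],
}
# Noise-free frequencies produced by a 60/30/10 mixture of the rows above.
toy_observed = [0.9, 0.6, 0.3, 0.1, 0.7, 0.4]

estimates = estimate_abundances(toy_signatures, toy_observed)
```

On real data the frequencies are noisy and some lineages are absent from the reference table, which is the situation the abstract describes as increasing the error in abundance estimates.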

Subject(s)

COVID-19, Viral Genome, SARS-CoV-2, Wastewater, Wastewater/virology, SARS-CoV-2/genetics, SARS-CoV-2/classification, COVID-19/virology, COVID-19/epidemiology, Humans, Computational Biology/methods, Genomics/methods, Wastewater-Based Epidemiological Monitoring, Phylogeny

SUP: a probabilistic framework to propagate genome sequence uncertainty, with applications.

Becker, Devan; Champredon, David; Chato, Connor; Gugan, Gopi; Poon, Art.

NAR Genom Bioinform ; 5(2): lqad038, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37101658

ABSTRACT

Genetic sequencing is subject to many different types of errors, but most analyses treat the resultant sequences as if they are known without error. Next-generation sequencing methods rely on significantly larger numbers of reads than previous sequencing methods in exchange for a loss of accuracy in each individual read. Still, the coverage of such machines is imperfect and leaves uncertainty in many of the base calls. In this work, we demonstrate that the uncertainty in sequencing techniques will affect downstream analysis and propose a straightforward method to propagate the uncertainty. Our method (which we have dubbed Sequence Uncertainty Propagation, or SUP) uses a probabilistic matrix representation of individual sequences that incorporates base quality scores as a measure of uncertainty and naturally leads to resampling and replication as a framework for uncertainty propagation. With the matrix representation, resampling possible base calls according to quality scores provides a bootstrap- or prior distribution-like first step towards genetic analysis. Analyses based on these resampled sequences will include a more complete evaluation of the error involved in such analyses. We demonstrate our resampling method on SARS-CoV-2 data. The resampling procedures add a linear computational cost to the analyses, but the large impact on the variance in downstream estimates makes it clear that ignoring this uncertainty may lead to overly confident conclusions. We show that SARS-CoV-2 lineage designations via Pangolin are much less certain than the bootstrap support reported by Pangolin would imply and the clock rate estimates for SARS-CoV-2 are much more variable than reported.
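
The resampling idea in the abstract can be sketched as follows. This is an illustration, not the SUP implementation. It assumes that base calls are limited to A, C, G and T, that a Phred score Q corresponds to an error probability of 10^(-Q/10), and that the error mass is split evenly over the three uncalled bases. The even split is a simplifying assumption made here, and all function names are hypothetical.

```python
import random

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality score into the probability that the call is wrong."""
    return 10 ** (-q / 10)


def base_probabilities(call, q):
    """One row of the probability matrix: P(true base) at a single position.

    Assumption: the called base gets 1 - e and each other base gets e / 3.
    """
    e = phred_to_error(q)
    return {b: (1 - e if b == call else e / 3) for b in BASES}


def sequence_matrix(seq, quals):
    """Build the per-position probability matrix for one sequence."""
    return [base_probabilities(b, q) for b, q in zip(seq, quals)]


def resample(matrix, rng):
    """Draw one plausible sequence by sampling each position independently."""
    return "".join(
        rng.choices(BASES, weights=[row[b] for b in BASES])[0] for row in matrix
    )


# Toy read with per-base Phred scores (hypothetical values).
toy_seq = "ACGTAC"
toy_quals = [40, 40, 10, 30, 5, 40]
toy_matrix = sequence_matrix(toy_seq, toy_quals)

rng = random.Random(1)
replicates = [resample(toy_matrix, rng) for _ in range(100)]
```

Running a downstream analysis once per replicate and examining the spread of the results is the sense in which the uncertainty is propagated. Low-quality positions (such as the Q5 base above) vary across replicates, while high-quality positions rarely change.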

CoVizu: Rapid analysis and visualization of the global diversity of SARS-CoV-2 genomes.

Ferreira, Roux-Cil; Wong, Emmanuel; Gugan, Gopi; Wade, Kaitlyn; Liu, Molly; Baena, Laura Muñoz; Chato, Connor; Lu, Bonnie; Olabode, Abayomi S; Poon, Art F Y.

Virus Evol ; 7(2): veab092, 2021 Dec.

Article in English | MEDLINE | ID: mdl-37124703

ABSTRACT

Phylogenetics has played a pivotal role in the genomic epidemiology of severe acute respiratory syndrome coronavirus 2, such as tracking the emergence and global spread of variants and scientific communication. However, the rapid accumulation of genomic data from around the world (with over two million genomes currently available in the Global Initiative on Sharing All Influenza Data database) is testing the limits of standard phylogenetic methods. Here, we describe a new approach to rapidly analyze and visualize large numbers of SARS-CoV-2 genomes. Using Python, genomes are filtered for problematic sites, incomplete coverage, and excessive divergence from a strict molecular clock. All differences from the reference genome, including indels, are extracted using minimap2 and compactly stored as a set of features for each genome. For each Pango lineage (https://cov-lineages.org), we collapse genomes with identical features into 'variants', generate 100 bootstrap samples of the feature set union to generate weights, and compute the symmetric differences between the weighted feature sets for every pair of variants. The resulting distance matrices are used to generate neighbor-joining trees in RapidNJ that are converted into a majority-rule consensus tree for each lineage. Branches with support values below 50 per cent or mean lengths below 0.5 differences are collapsed, and tip labels on affected branches are mapped to internal nodes as directly sampled ancestral variants. Currently, we process about 2 million genomes in approximately 9 h on 52 cores. The resulting trees are visualized using the JavaScript framework D3.js as 'beadplots', in which variants are represented by horizontal line segments, annotated with beads representing samples by collection date. Variants are linked by vertical edges to represent branches in the consensus tree. These visualizations are published at https://filogeneti.ca/CoVizu.
All source code was released under an MIT license at https://github.com/PoonLab/covizu.
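
The feature-set steps described in the abstract (collapsing identical genomes into variants, bootstrapping the feature union to obtain weights, and taking weighted symmetric differences) can be sketched roughly as below. This is an independent illustration and not code from the CoVizu repository. The string encoding of features, the function names, and the way weights are applied are assumptions.

```python
import random
from collections import defaultdict


def collapse_variants(genomes):
    """Group genome names that share an identical set of features."""
    variants = defaultdict(list)
    for name, feats in genomes.items():
        variants[frozenset(feats)].append(name)
    return dict(variants)


def bootstrap_weights(union, rng):
    """Resample the feature union with replacement.

    Each feature's weight is the number of times it was drawn (assumption).
    """
    feats = sorted(union)
    weights = dict.fromkeys(feats, 0)
    for f in rng.choices(feats, k=len(feats)):
        weights[f] += 1
    return weights


def weighted_symdiff(a, b, weights):
    """Distance between two variants: total weight of features in exactly one set."""
    return sum(weights[f] for f in a ^ b)


# Toy genomes with features written as hypothetical mutation strings.
toy_genomes = {
    "g1": {"C241T", "A23403G"},
    "g2": {"C241T", "A23403G"},            # identical to g1, so same variant
    "g3": {"C241T", "A23403G", "G28881A"},
}

variants = collapse_variants(toy_genomes)
union = frozenset().union(*variants)

# One bootstrap replicate gives one weighted distance matrix.
weights = bootstrap_weights(union, random.Random(0))
keys = sorted(variants, key=len)
distance = weighted_symdiff(keys[0], keys[1], weights)
```

Repeating the bootstrap step 100 times would give 100 distance matrices, one per replicate, each of which could then be passed to a neighbor-joining program as the abstract describes.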
