ABSTRACT
As COVID-19 transitions to endemicity, the frequency of clinical testing, and with it the utility of such testing for determining lineage prevalence, has declined. This situation is not unique to Slovakia but reflects a global trend, as attention shifts from COVID-19 to other post-pandemic issues and emerging global health challenges. Nevertheless, the pandemic itself has spurred advances in monitoring the epidemiological situation. At the beginning of the pandemic, genomic surveillance was carried out by sequencing individual COVID-19 cases. Subsequently, many countries implemented wastewater surveillance to monitor the prevalence of SARS-CoV-2 variants in the community. In the present study, we collected and analysed 1715 virus-positive samples from 64 wastewater treatment plants across Slovakia, serving 69% of the population connected to the wastewater treatment pipelines. We show that wastewater sequencing is effective in detecting the emergence of new virus lineages. Additionally, wastewater surveillance appears to provide results that are approximately consistent with clinical testing at both national and city levels, while also providing information on variant lineages that were not detected in clinical cases because of reduced clinical testing. Our study demonstrates the value of wastewater-based surveillance in Slovakia, establishing it as an important supporting tool for monitoring public health and as an early warning system at times when clinical testing is either declining or unavailable.
ABSTRACT
With the rapid growth of massively parallel sequencing technologies, more and more laboratories are using sequenced DNA fragments for genomic analyses. Interpretation of sequencing data is, however, strongly dependent on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem is the reproducibility of computational analyses across separate computational centres with inconsistent versions of installed libraries and bioinformatics tools. We propose an easily extensible set of computational pipelines, called SnakeLines, for processing sequencing reads, including mapping, assembly, variant calling, viral identification, transcriptomics, and metagenomics analysis. Individual steps of an analysis, along with the methods and their parameters, can be readily modified in a single configuration file. The provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment, and reproducibility of analyses across different Unix-based platforms. SnakeLines is a powerful framework for the automation of bioinformatics analyses, with emphasis on simple set-up, modification, extensibility, and reproducibility. The framework is already routinely used in various research projects and their applications, especially in the Slovak national surveillance of SARS-CoV-2.
Subjects
Genomics, Software, Reproducibility of Results, Genomics/methods, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
MOTIVATION: Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses but is limited by technology, mainly because STRs often exceed the read length used. Nanopore sequencing, one of the long-read sequencing technologies, produces very long reads and thus offers more possibilities to study and analyze STRs. Basecalling of nanopore reads is, however, particularly unreliable in repetitive regions, and therefore direct analysis of raw nanopore data is required. RESULTS: Here, we present WarpSTR, a novel method for characterizing both simple and complex tandem repeats directly from raw nanopore signals using a finite-state automaton and a search algorithm analogous to dynamic time warping. By applying this approach to determine the lengths of 241 STRs, we demonstrate that it decreases the mean absolute error of the STR length estimate compared to basecalling and STRique. AVAILABILITY AND IMPLEMENTATION: WarpSTR is freely available at https://github.com/fmfi-compbio/warpstr.
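For readers unfamiliar with dynamic time warping (DTW), the following minimal sketch shows the classic algorithm applied to a toy repeat-counting problem. It is not the WarpSTR implementation, which the abstract describes only as a finite-state automaton combined with a DTW-like search. The function names `dtw_distance` and `estimate_copies`, the idealised motif signal levels, and the noise-free toy signal are all assumptions made for illustration.

```python
import math


def dtw_distance(signal, template):
    """Classic DTW cost between two 1-D numeric sequences."""
    n, m = len(signal), len(template)
    # cost[i][j]: minimal cumulative cost of aligning signal[:i] with template[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(signal[i - 1] - template[j - 1])
            # A step may stretch the signal, stretch the template, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def estimate_copies(signal, motif_levels, max_copies):
    """Hypothetical helper: pick the copy number whose template warps best onto the signal."""
    costs = {k: dtw_distance(signal, motif_levels * k) for k in range(1, max_copies + 1)}
    return min(costs, key=costs.get)


# Toy data (assumed): one signal level per motif position, with uneven dwell times.
motif_levels = [1.0, 3.0, 2.0]
dwell_times = [2, 1, 3]
toy_signal = []
for _ in range(4):  # four copies of the motif
    for level, dwell in zip(motif_levels, dwell_times):
        toy_signal.extend([level] * dwell)

estimated = estimate_copies(toy_signal, motif_levels, max_copies=8)
```

Because DTW absorbs the uneven dwell times at zero cost, only the template with the true number of copies aligns perfectly in this noise-free toy case. Real nanopore signals are noisy, and this sketch makes no attempt to handle complex repeats.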
Subjects
Nanopores, High-Throughput Nucleotide Sequencing/methods, Genome, Algorithms, Microsatellite Repeats, DNA Sequence Analysis
ABSTRACT
To explore the genomic pool of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the pandemic, the Ministry of Health of the Slovak Republic formed a genomics surveillance workgroup, and the Public Health Authority of the Slovak Republic launched systematic national epidemiological surveillance using whole-genome sequencing (WGS). Six out of seven genomic centers implementing Illumina sequencing technology were involved in the national SARS-CoV-2 virus sequencing program. Here we analyze a total of 33,024 SARS-CoV-2 isolates collected from the Slovak population from 1 March 2021 to 31 March 2022 that were sequenced and analyzed in a consistent manner. Overall, 28,005 out of 30,793 successfully sequenced samples met the criteria to be deposited in the global GISAID database. During this period, we identified four variants of concern (VOCs): Alpha (B.1.1.7), Beta (B.1.351), Delta (B.1.617.2) and Omicron (B.1.1.529). In detail, we observed 165 lineages in our dataset, with Alpha, Delta and Omicron dominating three major consecutive incidence waves. This study aims to describe the results of a routine but high-level SARS-CoV-2 genomic surveillance program. Our study of SARS-CoV-2 genomes, in collaboration with the Public Health Authority of the Slovak Republic, also helped to inform the public about the epidemiological situation during the pandemic.
Subjects
COVID-19, SARS-CoV-2, Humans, SARS-CoV-2/genetics, Slovakia/epidemiology, COVID-19/epidemiology, Viral Genome, High-Throughput Nucleotide Sequencing, Genomics
ABSTRACT
Computing similarity between two nucleotide sequences is one of the fundamental problems in bioinformatics. Current methods are based mainly on two major approaches: (1) sequence alignment, which is computationally expensive, and (2) faster, but less accurate, alignment-free methods based on various statistical summaries, for example, short word counts. We propose a new distance measure based on mathematical transforms from the domain of signal processing. To tolerate large-scale rearrangements in the sequences, the transform is computed across sliding windows. We compare our method with current state-of-the-art alignment-free methods on several data sets. Our method compares favorably in terms of accuracy and outperforms the other methods in running time and memory requirements. In addition, it scales to dozens of processing units without loss of performance due to communication overhead. Source files and sample data are available at https://bitbucket.org/fiitstubioinfo/swspm/src.
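The idea of a windowed, transform-based distance can be illustrated with a small sketch. The abstract does not name the transform, so everything below is an assumption rather than the authors' method: bases are encoded as four indicator signals, each window is summarised by the power spectrum of a naive discrete Fourier transform, window spectra are averaged per sequence, and the averaged profiles are compared by Euclidean distance. The function names and the `width` and `step` parameters are hypothetical.

```python
import cmath
import math


def power_spectrum(window):
    """Power spectrum of a window, summed over the four base-indicator signals."""
    n = len(window)
    spectrum = [0.0] * (n // 2 + 1)
    for base in "ACGT":
        indicator = [1.0 if c == base else 0.0 for c in window]
        for k in range(len(spectrum)):
            # Naive DFT coefficient; an FFT would be used for realistic window sizes.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(indicator))
            spectrum[k] += abs(coeff) ** 2
    return spectrum


def profile(seq, width, step):
    """Average the window spectra, so the order of the windows does not matter."""
    windows = [seq[i:i + width] for i in range(0, len(seq) - width + 1, step)]
    if not windows:
        raise ValueError("sequence is shorter than the window width")
    spectra = [power_spectrum(w) for w in windows]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def spectral_distance(a, b, width, step):
    """Euclidean distance between the averaged window spectra of two sequences."""
    pa, pb = profile(a, width, step), profile(b, width, step)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


block_x, block_y = "AAAAAAAA", "ACGTACGT"
d_swapped = spectral_distance(block_x + block_y, block_y + block_x, width=8, step=8)
d_different = spectral_distance(block_x, block_y, width=8, step=8)
```

With non-overlapping windows, swapping two whole blocks leaves the averaged profile unchanged, which is the kind of rearrangement tolerance the abstract refers to. Two sequences with different composition and periodicity still yield a positive distance.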