ABSTRACT
Hantavirus Pulmonary Syndrome (HPS), characterized by its high fatality rate, poses a significant public health concern in Argentina due to the increasing evidence of person-to-person transmission of Andes virus. Several orthohantaviruses have been described in the country, but their phylogenetic relationships were inferred from partial genomic sequences. The objectives of this work were to assess the viral diversity of the most prevalent orthohantaviruses associated with HPS cases in the Central-East (CE) region of Argentina, elucidate the geographic distribution patterns of each variant, and reconstruct comprehensive phylogenetic relationships using complete genomic sequencing. To accomplish this, a detailed analysis was conducted of the geographic distribution of reported cases within the most impacted province of the region. A representative sample of cases was then selected to generate a geographic map illustrating the distribution of viral variants. Complete viral genomes were obtained from HPS cases reported in the region, including some from epidemiologically linked cases. The phylogenetic analysis based on complete genomes defined two separate clades in Argentina: Andes virus in the Southwestern region and Andes-like viruses in other parts of the country. In the CE region, Buenos Aires virus and Lechiguanas virus clearly segregate into two subclades. Complete genomes were useful to distinguish person-to-person transmission from environmental co-exposure to rodent populations. This study enhances the understanding of the genetic diversity, geographical spread, and transmission dynamics of orthohantaviruses in Central Argentina and prompts consideration of including Buenos Aires virus and Lechiguanas virus in the species Orthohantavirus andesense as named viruses.
Subject(s)
Genetic Variation , Genome, Viral , Orthohantavirus , Phylogeny , Argentina/epidemiology , Orthohantavirus/genetics , Orthohantavirus/classification , Humans , Whole Genome Sequencing , Hantavirus Pulmonary Syndrome/transmission , Hantavirus Pulmonary Syndrome/virology , Hantavirus Pulmonary Syndrome/epidemiology , Male , Female , Adult , Animals , Middle Aged , Hantavirus Infections/transmission , Hantavirus Infections/virology , Hantavirus Infections/epidemiology , Hantavirus Infections/veterinary , Young Adult
ABSTRACT
The SARS-CoV-2 genome occupies a unique place in infection biology: it is the most highly sequenced genome on earth (making up over 20% of public sequencing datasets), with fine-scale information on sampling date and geography, and has been subject to unprecedentedly intense analysis. As a result, these phylogenetic data are an incredibly valuable resource for science and public health. However, the vast majority of the data was sequenced by tiling amplicons across the full genome, with amplicon schemes that changed over the pandemic as mutations in the viral genome interacted with primer binding sites. In combination with the disparate set of genome assembly workflows and the lack of consistent quality control (QC) processes, the current genomes have many systematic errors that have evolved with the virus and amplicon schemes. These errors have significant impacts on the phylogeny, and therefore over the last few years many thousands of hours of researchers' time have been spent "eyeballing" trees, looking for artefacts, and then patching the tree. Given the huge value of this dataset, we set out to reprocess the complete set of public raw sequence data in a rigorous amplicon-aware manner and build a cleaner phylogeny. Here we provide a global tree of 3,960,704 samples, built from a consistently assembled set of high quality consensus sequences from all available public data as of March 2023, viewable at https://viridian.taxonium.org. Each genome was constructed using a novel assembly tool called Viridian (https://github.com/iqbal-lab-org/viridian), developed specifically to process amplicon sequence data, eliminate artefactual errors, and mask the genome at low quality positions. We provide simulation and empirical validation of the methodology, and quantify the improvement in the phylogeny. Phase 2 of our project will address the fact that the data in the public archives is heavily geographically biased towards the Global North. We have therefore contributed new raw data to ENA/SRA from many countries including Ghana, Thailand, Laos, Sri Lanka, India, Argentina and Singapore. We will incorporate these, along with all public raw data submitted between March 2023 and the current day, into an updated set of assemblies and an updated phylogeny. We hope the tree, consensus sequences and Viridian will be a valuable resource for researchers.