Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny.
Hunt, Martin; Hinrichs, Angie S; Anderson, Daniel; Karim, Lily; Dearlove, Bethany L; Knaggs, Jeff; Constantinides, Bede; Fowler, Philip W; Rodger, Gillian; Street, Teresa; Lumley, Sheila; Webster, Hermione; Sanderson, Theo; Ruis, Christopher; de Maio, Nicola; Amenga-Etego, Lucas N; Amuzu, Dominic S Y; Avaro, Martin; Awandare, Gordon A; Ayivor-Djanie, Reuben; Bashton, Matthew; Batty, Elizabeth M; Bediako, Yaw; De Belder, Denise; Benedetti, Estefania; Bergthaler, Andreas; Boers, Stefan A; Campos, Josefina; Carr, Rosina Afua Ampomah; Cuba, Facundo; Dattero, Maria Elena; Dejnirattisai, Wanwisa; Dilthey, Alexander; Duedu, Kwabena Obeng; Endler, Lukas; Engelmann, Ilka; Francisco, Ngiambudulu M; Fuchs, Jonas; Gnimpieba, Etienne Z; Groc, Soraya; Gyamfi, Jones; Heemskerk, Dennis; Houwaart, Torsten; Hsiao, Nei-Yuan; Huska, Matthew; Hölzer, Martin; Iranzadeh, Arash; Jarva, Hanna; Jeewandara, Chandima; Jolly, Bani.
Affiliation
  • Hunt M; European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK.
  • Hinrichs AS; Nuffield Department of Medicine, University of Oxford, Oxford, UK.
  • Anderson D; National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford, UK.
  • Karim L; Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK.
  • Dearlove BL; Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA.
  • Knaggs J; European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK.
  • Constantinides B; Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA.
  • Fowler PW; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA.
  • Rodger G; Institute for Hygiene and Applied Immunology, Center for Pathophysiology, Infectiology and Immunology, Medical University of Vienna, Vienna 1090, Austria.
  • Street T; European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK.
  • Lumley S; Nuffield Department of Medicine, University of Oxford, Oxford, UK.
  • Webster H; National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford, UK.
  • Sanderson T; Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK.
  • Ruis C; Nuffield Department of Medicine, University of Oxford, Oxford, UK.
  • de Maio N; Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK.
  • Amenga-Etego LN; Nuffield Department of Medicine, University of Oxford, Oxford, UK.
  • Amuzu DSY; National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford, UK.
  • Avaro M; Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK.
  • Awandare GA; Nuffield Department of Medicine, University of Oxford, Oxford, UK.
  • Ayivor-Djanie R; Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, Oxford, UK.
  • Bashton M; Nuffield Department of Medicine, University of Oxford, Oxford, UK.
  • Batty EM; National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford, UK.
  • Bediako Y; Nuffield Department of Medicine, University of Oxford, Oxford, UK.
  • De Belder D; Department of Infectious Diseases and Microbiology, John Radcliffe Hospital, Oxford, UK.
  • Benedetti E; Nuffield Department of Medicine, University of Oxford, Oxford, UK.
  • Bergthaler A; Francis Crick Institute, London, UK.
  • Boers SA; Victor Phillip Dahdaleh Heart & Lung Research Institute, University of Cambridge, Cambridge, UK.
  • Campos J; Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.
  • Carr RAA; European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK.
  • Cuba F; West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana.
  • Dattero ME; West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana.
  • Dejnirattisai W; Servicio de Virus Respiratorios, Instituto Nacional Enfermedades Infecciosas, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina.
  • Dilthey A; West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana.
  • Duedu KO; Laboratory for Medical Biotechnology and Biomanufacturing, International Centre for Genetic Engineering and Biotechnology, Trieste, Italy.
  • Endler L; Department of Biomedical Sciences, University of Health and Allied Sciences, Ho, Ghana.
  • Engelmann I; The Hub for Biotechnology in the Built Environment, Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle upon Tyne, NE1 8ST, UK.
  • Francisco NM; Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK.
  • Fuchs J; Mahidol-Oxford Tropical Medicine Research Unit, Bangkok, Thailand.
  • Gnimpieba EZ; West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), University of Ghana, Accra, Ghana.
  • Groc S; Unidad Operativa Centro Nacional de Genómica y Bioinformática, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina.
  • Gyamfi J; Servicio de Virus Respiratorios, Instituto Nacional Enfermedades Infecciosas, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina.
  • Heemskerk D; Institute for Hygiene and Applied Immunology, Center for Pathophysiology, Infectiology and Immunology, Medical University of Vienna, Vienna 1090, Austria.
  • Houwaart T; Dept. Medical Microbiology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands.
  • Hsiao NY; Unidad Operativa Centro Nacional de Genómica y Bioinformática, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina.
  • Huska M; Department of Biomedical Sciences, University of Health and Allied Sciences, Ho, Ghana.
  • Hölzer M; Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, Ann Arbor, MI, USA.
  • Iranzadeh A; Unidad Operativa Centro Nacional de Genómica y Bioinformática, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina.
  • Jarva H; Servicio de Virus Respiratorios, Instituto Nacional Enfermedades Infecciosas, ANLIS "Dr. Carlos G. Malbrán", Buenos Aires, Argentina.
  • Jeewandara C; Division of Emerging Infectious Disease, Research Department, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkoknoi, Bangkok 10700, Thailand.
  • Jolly B; Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
bioRxiv; 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38746185
ABSTRACT
The SARS-CoV-2 genome occupies a unique place in infection biology - it is the most highly sequenced genome on earth (making up over 20% of public sequencing datasets), with fine-scale information on sampling date and geography, and it has been subject to unprecedentedly intense analysis. As a result, these phylogenetic data are an incredibly valuable resource for science and public health. However, the vast majority of the data was sequenced by tiling amplicons across the full genome, with amplicon schemes that changed over the pandemic as mutations in the viral genome interacted with primer binding sites. In combination with the disparate set of genome assembly workflows and the lack of consistent quality control (QC) processes, this means the current genomes have many systematic errors that have evolved with the virus and the amplicon schemes. These errors have significant impacts on the phylogeny, and therefore, over the last few years, many thousands of hours of researchers' time have been spent "eyeballing" trees, looking for artefacts, and then patching the tree. Given the huge value of this dataset, we set out to reprocess the complete set of public raw sequence data in a rigorous amplicon-aware manner and build a cleaner phylogeny. Here we provide a global tree of 3,960,704 samples, built from a consistently assembled set of high-quality consensus sequences from all available public data as of March 2023, viewable at https://viridian.taxonium.org. Each genome was constructed using a novel assembly tool called Viridian (https://github.com/iqbal-lab-org/viridian), developed specifically to process amplicon sequence data, eliminating artefactual errors and masking the genome at low-quality positions. We provide simulation and empirical validation of the methodology, and quantify the improvement in the phylogeny. Phase 2 of our project will address the fact that the data in the public archives is heavily geographically biased towards the Global North. We have therefore contributed new raw data to ENA/SRA from many countries including Ghana, Thailand, Laos, Sri Lanka, India, Argentina and Singapore. We will incorporate these, along with all public raw data submitted between March 2023 and the current day, into an updated set of assemblies and an updated phylogeny. We hope the tree, consensus sequences and Viridian will be a valuable resource for researchers.

Full text: 1 Collections: 01-internacional Database: MEDLINE Language: English Journal: BioRxiv Year of publication: 2024 Document type: Article Country of affiliation: United Kingdom
