Unsupervised identification of significant lineages of SARS-CoV-2 through scalable machine learning methods.
Cahuantzi, Roberto; Lythgoe, Katrina A; Hall, Ian; Pellis, Lorenzo; House, Thomas.
  • Cahuantzi R; Department of Mathematics, The University of Manchester, Manchester M13 9PL, United Kingdom.
  • Lythgoe KA; United Kingdom Health Security Agency, University of Oxford, Oxford OX3 7LF, United Kingdom.
  • Hall I; Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom.
  • Pellis L; Big Data Institute, University of Oxford, Oxford OX3 7LF, United Kingdom.
  • House T; Pandemic Sciences Institute, University of Oxford, Oxford OX3 7LF, United Kingdom.
Proc Natl Acad Sci U S A ; 121(12): e2317284121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38478692
ABSTRACT
Since its emergence in late 2019, SARS-CoV-2 has diversified into a large number of lineages and caused multiple waves of infection globally. Novel lineages have the potential to spread rapidly and internationally if they have higher intrinsic transmissibility and/or can evade host immune responses, as has been seen with the Alpha, Delta, and Omicron variants of concern. They can also cause increased mortality and morbidity if they have increased virulence, as was seen for Alpha and Delta. Phylogenetic methods provide the "gold standard" for representing the global diversity of SARS-CoV-2 and for identifying newly emerging lineages. However, these methods are computationally expensive, struggle when datasets become very large, and require manual curation to designate new lineages. These challenges motivate the development of complementary methods that can incorporate all of the available genetic data, without down-sampling, and extract meaningful information rapidly and with minimal curation. In this paper, we demonstrate the utility of algorithmic approaches based on word statistics for representing whole sequences, bringing speed, scalability, and interpretability to the construction of genetic topologies. While not a substitute for current phylogenetic analyses, the proposed methods can be used as a complementary, fully automatable approach to identify and confirm newly emerging variants.
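As a rough illustration of what a "word-statistics" representation of a sequence can look like, the sketch below counts overlapping k-mers (short fixed-length "words") in each sequence and compares the resulting count profiles with cosine similarity. This is not the authors' pipeline: the word length `k = 3`, the toy sequences, and the helper names `kmer_counts` and `cosine_similarity` are all assumptions made for this example, and the abstract does not specify which word statistics or distance measures the paper uses.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-length 'words' in a nucleotide sequence.

    Hypothetical helper; k = 3 is an arbitrary choice for this toy example.
    """
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count profiles."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy sequences (made up for illustration; not real SARS-CoV-2 data).
toy = {
    "seq_a": "ATGCGTACGTTAGC",
    "seq_b": "ATGCGTACGTTAGA",  # differs from seq_a at the last base only
    "seq_c": "TTTTTTTTTTTTTT",  # shares no 3-mers with seq_a
}

profiles = {name: kmer_counts(s) for name, s in toy.items()}
sim_ab = cosine_similarity(profiles["seq_a"], profiles["seq_b"])
sim_ac = cosine_similarity(profiles["seq_a"], profiles["seq_c"])

print(f"similarity(seq_a, seq_b) = {sim_ab:.3f}")
print(f"similarity(seq_a, seq_c) = {sim_ac:.3f}")
```

Because each sequence is reduced to a fixed vocabulary of word counts, no alignment or tree construction is needed, which is one reason representations of this kind can scale to very large collections of genomes.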

Full text: 1 Database: MEDLINE Main subject: SARS-CoV-2 / COVID-19 Limit: Humans Language: En Year: 2024 Document type: Article