Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world.
Li, Yawei; Liu, Qingyun; Zeng, Zexian; Luo, Yuan.
Affiliation
  • Li Y; Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL 60611, USA.
  • Liu Q; Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA.
  • Zeng Z; Department of Data Science, Dana Farber Cancer Institute, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA.
  • Luo Y; Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL 60611, USA.
bioRxiv; 2021 Nov 24.
Article in English | MEDLINE | ID: mdl-34845455
ABSTRACT
Identifying the population structure of the newly emerged coronavirus SARS-CoV-2 has significant potential to inform public health management and diagnosis. As SARS-CoV-2 sequencing data accrue, grouping the sequences into clusters is important for organizing the landscape of the virus's population structure. Because prior information on the newly emerged coronavirus was limited, we used four different clustering algorithms to group 16,873 SARS-CoV-2 strains, an approach that automatically enables identification of the spatial structure of SARS-CoV-2. Six distinct genomic clusters were identified using mutation profiles as input features. Comparison of the clustering results shows that the four algorithms produced highly consistent results, but the state-of-the-art unsupervised deep learning clustering algorithm performed best and produced the smallest intra-cluster pairwise genetic distances. The varied proportions of the six clusters within different continents revealed specific geographical distributions. In particular, our analysis found that Oceania was the only continent on which the strains were dispersed across the six clusters. In summary, this study provides a concrete framework for using clustering methods to study the global population structure of SARS-CoV-2. In addition, clustering methods can be used in future studies of the population structures of variants of these fast-growing viruses in specific regions.
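The comparison criterion named in the abstract, intra-cluster pairwise genetic distance, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say how genetic distance was computed, so the sketch assumes binary mutation profiles (1 = mutation present at a site) and uses Hamming distance as a stand-in. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two equal-length mutation profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_intra_cluster_distance(profiles, labels):
    """Average Hamming distance over all pairs of strains that share a cluster label."""
    clusters = {}
    for profile, label in zip(profiles, labels):
        clusters.setdefault(label, []).append(profile)
    distances = [
        hamming(a, b)
        for members in clusters.values()
        for a, b in combinations(members, 2)
    ]
    # With no within-cluster pairs (all singletons), report 0.0 by convention.
    return sum(distances) / len(distances) if distances else 0.0


# Toy data (hypothetical): 4 strains x 5 mutation sites.
profiles = [
    (1, 1, 0, 0, 0),
    (1, 1, 0, 0, 1),
    (0, 0, 1, 1, 0),
    (0, 0, 1, 1, 1),
]
labels_a = [0, 0, 1, 1]  # groups similar profiles together
labels_b = [0, 1, 0, 1]  # groups dissimilar profiles together

score_a = mean_intra_cluster_distance(profiles, labels_a)
score_b = mean_intra_cluster_distance(profiles, labels_b)
print(score_a, score_b)
```

Under this assumed metric, a lower score means tighter clusters, so `labels_a` would be preferred over `labels_b`; the same comparison could be run on the label assignments produced by each of several clustering algorithms.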
Full text: 1 Collections: 01-international Database: MEDLINE Language: English Journal: BioRxiv Year of publication: 2021 Document type: Article Country of affiliation: United States
