Data-driven identification of SARS-CoV-2 subpopulations using PhenoGraph and binary-coded genomic data.
Yang, Zhi-Kai; Pan, Lingyu; Zhang, Yanming; Luo, Hao; Gao, Feng.
Affiliation
  • Yang ZK; Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510700, China.
  • Pan L; Guangzhou Nanxin Pharmaceutical Co., Ltd., Guangzhou 510700, China.
  • Zhang Y; SinoGenoMax Co., Ltd./Chinese National Human Genome Center, Guangzhou 510700, China.
  • Luo H; Department of Physics, School of Science, Tianjin University, Tianjin 300072, China.
  • Gao F; Department of Physics, School of Science, and the Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China.
Brief Bioinform; 22(6), 2021 Nov 05.
Article in En | MEDLINE | ID: mdl-34382087
ABSTRACT
For epidemic prevention and control, the identification of SARS-CoV-2 subpopulations sharing similar micro-epidemiological patterns and evolutionary histories is necessary for a more targeted investigation into the links among COVID-19 outbreaks caused by SARS-CoV-2 with similar genetic backgrounds. Genomic sequencing analysis has demonstrated the ability to uncover viral genetic diversity. However, an objective analysis is necessary for the identification of SARS-CoV-2 subpopulations. Herein, we detected all the mutations in 186 682 SARS-CoV-2 isolates. We found that the GC content of the SARS-CoV-2 genome had evolved to be lower, which may be conducive to viral spread, and the frameshift mutation was rare in the global population. Next, we encoded the genomic mutations in binary form and used an unsupervised learning classifier, namely PhenoGraph, to classify this information. Consequently, PhenoGraph successfully identified 303 SARS-CoV-2 subpopulations, and we found that the PhenoGraph classification was consistent with, but more detailed and precise than the known GISAID clades (S, L, V, G, GH, GR, GV and O). By the change trend analysis, we found that the growth rate of SARS-CoV-2 diversity has slowed down significantly. We also analyzed the temporal, spatial and phylogenetic relationships among the subpopulations and revealed the evolutionary trajectory of SARS-CoV-2 to a certain extent. Hence, our results provide a better understanding of the patterns and trends in the genomic evolution and epidemiology of SARS-CoV-2.
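The binary coding step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each isolate is already represented as a set of mutation labels. The isolate names, the mutation labels, and the helper functions `binary_encode` and `jaccard_distance` are hypothetical and chosen only for illustration; the record does not describe the authors' actual encoding or distance measure, and the PhenoGraph clustering step itself is not reproduced here.

```python
from typing import Dict, List, Set, Tuple


def binary_encode(
    isolate_mutations: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build an isolate-by-mutation 0/1 matrix (1 = mutation present)."""
    isolates = sorted(isolate_mutations)
    # One column per distinct mutation seen in any isolate.
    mutations = sorted({m for muts in isolate_mutations.values() for m in muts})
    matrix = [
        [1 if m in isolate_mutations[iso] else 0 for m in mutations]
        for iso in isolates
    ]
    return isolates, mutations, matrix


def jaccard_distance(a: List[int], b: List[int]) -> float:
    """Jaccard distance between two binary rows (an assumed choice of metric)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical toy input: three isolates with illustrative mutation labels.
toy = {
    "iso_A": {"C241T", "A23403G"},
    "iso_B": {"C241T", "A23403G", "G25563T"},
    "iso_C": {"T28144C"},
}

isolates, mutations, matrix = binary_encode(toy)
dist_ab = jaccard_distance(matrix[0], matrix[1])
dist_ac = jaccard_distance(matrix[0], matrix[2])
```

A matrix of this shape is the kind of input a graph-based clustering method could take, with each row treated as one isolate's feature vector.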

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Genomics / Epidemics / SARS-CoV-2 / COVID-19 Study type: Diagnostic_studies / Prognostic_studies Limits: Humans Language: En Publication year: 2021 Document type: Article
