Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone.
Kuzmin, Kiril; Adeniyi, Ayotomiwa Ezekiel; DaSouza, Arthur Kevin; Lim, Deuk; Nguyen, Huyen; Molina, Nuria Ramirez; Xiong, Lanqiao; Weber, Irene T; Harrison, Robert W.
Affiliation
  • Kuzmin K; Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA. Electronic address: kkuzmin1@gsu.edu.
  • Adeniyi AE; Department of Chemistry, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA.
  • DaSouza AK; Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA.
  • Lim D; Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA.
  • Nguyen H; Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA.
  • Molina NR; Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA.
  • Xiong L; Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA.
  • Weber IT; Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA.
  • Harrison RW; Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA; Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA. Electronic address: rwh@gsu.edu.
Biochem Biophys Res Commun ; 533(3): 553-558, 2020 Dec 10.
Article in English | MEDLINE | ID: mdl-32981683
Coronaviruses infect many animals, including humans, due to interspecies transmission. Three of the known human coronaviruses, MERS, SARS-CoV-1, and SARS-CoV-2 (the pathogen responsible for the COVID-19 pandemic), cause severe disease. Improved methods to predict the host specificity of coronaviruses will be valuable for identifying and controlling future outbreaks. The coronavirus spike (S) protein plays a key role in host specificity by attaching the virus to receptors on the cell membrane. We analyzed 1238 spike sequences for their host specificity. Spike sequences readily segregate in t-SNE embeddings into clusters of similar hosts and/or virus species. Machine learning with SVM, Logistic Regression, Decision Tree, and Random Forest classifiers gave high average accuracies, F1 scores, sensitivities, and specificities of 0.95-0.99. Importantly, sites identified by the Decision Tree correspond to protein regions of known biological importance. These results demonstrate that spike sequences alone can be used to predict host specificity.
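The abstract reports accuracy, F1 score, sensitivity, and specificity but does not spell out how they are computed. As a minimal sketch, the standard one-vs-rest definitions for a single host class could be written as below. The function name `binary_metrics` and the toy host labels are hypothetical illustrations, not data or code from the study, and the paper's actual averaging scheme across classes may differ.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive):
    """One-vs-rest metrics for a single host class (standard definitions)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            # Sequence really belongs to the host class of interest.
            counts["tp" if pred == positive else "fn"] += 1
        else:
            # Sequence belongs to some other host.
            counts["fp" if pred == positive else "tn"] += 1

    tp, fn, fp, tn = counts["tp"], counts["fn"], counts["fp"], counts["tn"]

    def ratio(num, den):
        # Guard against empty classes rather than dividing by zero.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "sensitivity": ratio(tp, tp + fn),      # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),      # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Hypothetical toy labels, invented purely for illustration.
y_true = ["human", "human", "bat", "camel", "bat", "human"]
y_pred = ["human", "bat", "bat", "camel", "bat", "human"]
metrics = binary_metrics(y_true, y_pred, positive="human")
print(metrics)
```

Repeating the call with each host as `positive` and averaging the results gives one plausible reading of the "average" scores mentioned in the abstract; that interpretation is an assumption.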

Full text: 1 | Databases: MEDLINE | Main subject: Coronavirus / Computational Biology / Host Specificity / Spike Glycoprotein, Coronavirus / Machine Learning | Type of study: Prognostic_studies / Risk_factors_studies | Limits: Animals / Humans | Language: En | Journal: Biochem Biophys Res Commun | Year of publication: 2020 | Document type: Article