Pathogenic potential assessment of the Shiga toxin-producing Escherichia coli by a source attribution-considered machine learning model.

Im, Hanhyeok; Hwang, Seung-Ho; Kim, Byoung Sik; Choi, Sang Ho

Affiliations

Im H; National Research Laboratory of Molecular Microbiology and Toxicology, Seoul National University, 08826 Seoul, Republic of Korea.
Hwang SH; Department of Agricultural Biotechnology and Center for Food Safety and Toxicology, Seoul National University, 08826 Seoul, Republic of Korea.
Kim BS; National Research Laboratory of Molecular Microbiology and Toxicology, Seoul National University, 08826 Seoul, Republic of Korea.
Choi SH; Department of Agricultural Biotechnology and Center for Food Safety and Toxicology, Seoul National University, 08826 Seoul, Republic of Korea.

Proc Natl Acad Sci U S A; 118(20), 2021 May 18.

Article in English | MEDLINE | ID: mdl-33986113

ABSTRACT

As alternatives to conventional serotyping and virulence gene combination methods, new methods have been developed to evaluate the pathogenic potential of newly emerging pathogens. Among them, machine learning (ML)-based methods using whole-genome sequencing (WGS) data are gaining attention because of recent advances in ML algorithms and sequencing technologies. Here, we developed various ML models to predict the pathogenicity of Shiga toxin-producing Escherichia coli (STEC) isolates using their WGS data. The input dataset for the ML models was generated using distinct gene repertoires from positive (pathogenic) and negative (nonpathogenic) control groups, to which each STEC isolate was assigned based on source attribution, that is, the relative risk potential of its isolation source. Among the ML models examined, the one using the support vector machine (SVM) algorithm (the SVM model) discriminated between the two control groups most accurately. The SVM model successfully predicted the pathogenicity of isolates from the major sources of STEC outbreaks, isolates with a history of outbreaks, and isolates that cannot be assessed by conventional methods. Furthermore, the SVM model effectively differentiated the pathogenic potentials of the isolates at a finer resolution. Permutation importance analyses of the input dataset further revealed the genes important for the estimation, suggesting genes potentially essential for the pathogenicity of STEC. Altogether, these results suggest that the SVM model is a more reliable and broadly applicable method for evaluating the pathogenic potential of STEC isolates than conventional methods.
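
The workflow summarized in the abstract (gene presence/absence profiles as input, an SVM classifier, then permutation importance to rank genes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' pipeline: the data are synthetic, the gene count and probabilities are made up, the SVM is a plain linear one trained with a Pegasos-style subgradient method, and all function names are hypothetical. The abstract does not state the kernel, features, or hyperparameters actually used.

```python
import random

def make_isolates(n, n_genes, rng):
    """Synthetic gene presence/absence profiles (hypothetical data, not real STEC genomes)."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1            # +1 pathogenic, -1 nonpathogenic
        p0 = 0.95 if label == 1 else 0.05          # assumption: gene 0 tracks the label
        row = [1.0 if rng.random() < p0 else 0.0]
        row += [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n_genes - 1)]
        X.append(row + [1.0])                      # trailing 1.0 serves as the bias input
        y.append(label)
    return X, y

def train_linear_svm(X, y, lam=0.05, epochs=50, seed=0):
    """Linear SVM fitted by stochastic subgradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]       # L2 shrinkage (bias included)
            if margin < 1.0:                       # hinge loss active: step toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, row):
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) >= 0 else -1

def accuracy(w, X, y):
    return sum(predict(w, r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(w, X, y, n_genes, rng, repeats=10):
    """Mean drop in accuracy when one gene column is shuffled across isolates."""
    base = accuracy(w, X, y)
    scores = []
    for j in range(n_genes):
        drop = 0.0
        for _ in range(repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
            drop += base - accuracy(w, X_perm, y)
        scores.append(drop / repeats)
    return scores

rng = random.Random(0)
N_GENES = 8
X, y = make_isolates(300, N_GENES, rng)
X_train, y_train = X[:200], y[:200]                # simple hold-out split
X_test, y_test = X[200:], y[200:]

w = train_linear_svm(X_train, y_train)
baseline = accuracy(w, X_test, y_test)
importances = permutation_importance(w, X_test, y_test, N_GENES, rng)

print("held-out accuracy:", round(baseline, 3))
for j, score in enumerate(importances):
    print("gene", j, "importance:", round(score, 3))
```

In this toy setup only gene 0 carries signal, so shuffling it should cost far more accuracy than shuffling any other gene. In practice one would more likely use an established library implementation of the SVM and of permutation importance rather than hand-written routines.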

Subjects

Escherichia coli Proteins/genetics; Machine Learning; Shiga Toxin 2/genetics; Shiga-Toxigenic Escherichia coli/genetics; Support Vector Machine; Escherichia coli Infections/diagnosis; Escherichia coli Infections/microbiology; Escherichia coli Proteins/metabolism; Humans; ROC Curve; Reproducibility of Results; Shiga Toxin 2/metabolism; Shiga-Toxigenic Escherichia coli/classification; Shiga-Toxigenic Escherichia coli/pathogenicity; Virulence/genetics; Whole Genome Sequencing/methods

Keywords

STEC; machine learning; pathogenic potential; risk assessment

Full text: 1
Collections: 01-international
Database: MEDLINE
Main subject: Shiga Toxin 2 / Escherichia coli Proteins / Shiga-Toxigenic Escherichia coli / Support Vector Machine / Machine Learning
Type of study: Diagnostic_studies / Etiology_studies / Prognostic_studies / Risk_factors_studies
Limits: Humans
Language: En
Journal: Proc Natl Acad Sci U S A
Publication year: 2021
Document type: Article
