Evaluation of parameters affecting performance and reliability of machine learning-based antibiotic susceptibility testing from whole genome sequencing data.

Hicks, Allison L; Wheeler, Nicole; Sánchez-Busó, Leonor; Rakeman, Jennifer L; Harris, Simon R; Grad, Yonatan H

Affiliation

Hicks AL; Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America.
Wheeler N; Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom.
Sánchez-Busó L; Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom.
Rakeman JL; Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom.
Harris SR; Public Health Laboratory, Division of Disease Control, New York City Department of Health and Mental Hygiene, New York, New York, United States of America.
Grad YH; Microbiotica Ltd, Biodata Innovation Centre, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

PLoS Comput Biol; 15(9): e1007349, 2019 Sep.

Article in English | MEDLINE | ID: mdl-31479500

ABSTRACT

Prediction of antibiotic resistance phenotypes from whole genome sequencing data by machine learning methods has been proposed as a promising platform for the development of sequence-based diagnostics. However, there has been no systematic evaluation of factors that may influence performance of such models, how they might apply to and vary across clinical populations, and what the implications might be in the clinical setting. Here, we performed a meta-analysis of seven large Neisseria gonorrhoeae datasets, as well as Klebsiella pneumoniae and Acinetobacter baumannii datasets, with whole genome sequence data and antibiotic susceptibility phenotypes using set covering machine classification, random forest classification, and random forest regression models to predict resistance phenotypes from genotype. We demonstrate how model performance varies by drug, dataset, resistance metric, and species, reflecting the complexities of generating clinically relevant conclusions from machine learning-derived models. Our findings underscore the importance of incorporating relevant biological and epidemiological knowledge into model design and assessment and suggest that doing so can inform tailored modeling for individual drugs, pathogens, and clinical populations. We further suggest that continued comprehensive sampling and incorporation of up-to-date whole genome sequence data, resistance phenotypes, and treatment outcome data into model training will be crucial to the clinical utility and sustainability of machine learning-based molecular diagnostics.
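The abstract lists set covering machine (SCM) classification among the models compared. The sketch below is illustrative only and is not the authors' pipeline. It assumes binary presence/absence genotype features and a binary resistant/susceptible label, and the function names `train_scm_disjunction` and `predict`, the parameters `p` and `max_rules`, and the toy data are all hypothetical. It follows the greedy SCM of Marchand and Shawe-Taylor in its disjunction form. At each step it adds the feature that explains the most still-unexplained resistant isolates, with a penalty of `p` for each susceptible isolate that the feature newly misclassifies.

```python
from typing import List, Optional, Sequence


def train_scm_disjunction(
    X: Sequence[Sequence[int]],
    y: Sequence[int],
    p: float = 1.0,
    max_rules: int = 5,
) -> List[int]:
    """Greedy set covering machine (disjunction form) over binary features.

    X[i][j] is 1 if hypothetical genomic feature j (e.g. a k-mer or marker)
    is present in isolate i; y[i] is 1 for resistant, 0 for susceptible.
    Returns the indices of the selected features; the model predicts
    "resistant" if any selected feature is present.
    """
    n_features = len(X[0])
    # Resistant isolates not yet explained by any selected rule.
    uncovered = {i for i, label in enumerate(y) if label == 1}
    # Susceptible isolates not yet misclassified by any selected rule.
    clean_negatives = {i for i, label in enumerate(y) if label == 0}
    chosen: List[int] = []
    while uncovered and len(chosen) < max_rules:
        best_j: Optional[int] = None
        best_utility = 0.0
        for j in range(n_features):
            if j in chosen:
                continue
            covered = sum(1 for i in uncovered if X[i][j] == 1)
            false_pos = sum(1 for i in clean_negatives if X[i][j] == 1)
            # SCM utility: newly covered positives minus p times new errors.
            utility = covered - p * false_pos
            if utility > best_utility:
                best_j, best_utility = j, utility
        if best_j is None:  # no remaining feature has positive utility
            break
        chosen.append(best_j)
        uncovered = {i for i in uncovered if X[i][best_j] == 0}
        clean_negatives = {i for i in clean_negatives if X[i][best_j] == 0}
    return chosen


def predict(chosen: Sequence[int], x: Sequence[int]) -> int:
    """Return 1 (resistant) if any selected feature is present in x."""
    return int(any(x[j] == 1 for j in chosen))


# Toy presence/absence matrix for three hypothetical markers.
X = [[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
rules = train_scm_disjunction(X, y)
print(rules)                           # [0, 1]
print([predict(rules, x) for x in X])  # [1, 1, 1, 0, 0, 0]
```

The output is a short, human-readable rule list ("resistant if marker 0 or marker 1 is present"). In practice, `p` and `max_rules` would typically be chosen by cross-validation rather than fixed as they are here.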

Subjects

Anti-Bacterial Agents/pharmacology; Genome, Bacterial/genetics; Machine Learning; Microbial Sensitivity Tests/methods; Whole Genome Sequencing; Algorithms; Bacteria/drug effects; Bacteria/genetics; Bacterial Infections/microbiology; Computational Biology; Databases, Genetic; Humans; Reproducibility of Results

Collections: 01-international | Database: MEDLINE | Main subject: Microbial Sensitivity Tests / Genome, Bacterial / Machine Learning / Whole Genome Sequencing / Anti-Bacterial Agents | Study type: Prognostic_studies / Systematic_reviews | Limit: Humans | Language: English | Journal: PLoS Comput Biol | Journal subject: Biology / Medical Informatics | Year of publication: 2019 | Document type: Article | Country of affiliation: United States