Assessing putative bias in prediction of anti-microbial resistance from real-world genotyping data under explicit causal assumptions.
Prosperi, Mattia; Boucher, Christina; Bian, Jiang; Marini, Simone.
Affiliation
  • Prosperi M; Department of Epidemiology, University of Florida, Gainesville 32610, FL, USA.
  • Boucher C; Department of Computer Science and Information and Engineering, University of Florida, Gainesville 32611, FL, USA.
  • Bian J; Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville 32610, FL, USA.
  • Marini S; Department of Epidemiology, University of Florida, Gainesville 32610, FL, USA. Electronic address: simone.marini@ufl.edu.
Artif Intell Med; 130: 102326, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35809965
ABSTRACT
Whole genome sequencing (WGS) is quickly becoming the customary means for identification of antimicrobial resistance (AMR) due to its ability to obtain high-resolution information about the genes and mechanisms that are causing resistance and driving pathogen mobility. By contrast, traditional phenotypic (antibiogram) testing cannot easily elucidate such information. Yet development of AMR prediction tools from genotype-phenotype data can be biased, since sampling is non-randomized. Sample provenience, period of collection, and species representation can confound the association of genetic traits with AMR. Thus, prediction models can perform poorly on new data with sampling distribution shifts. In this work, under an explicit set of causal assumptions, we evaluate the effectiveness of propensity-based rebalancing and confounding adjustment on antibiotic resistance prediction using genotype-phenotype AMR data from the Pathosystems Resource Integration Center (PATRIC). We select bacterial genotypes (encoded as k-mer signatures, i.e., DNA fragments of length k), country, year, species, and AMR phenotypes for the tetracycline drug class, preparing test data with recent genomes coming from a single country. We test boosted logistic regression (BLR) and random forests (RF) with and without bias-handling. On 10,936 instances, we find evidence of species, location and year imbalance with respect to the AMR phenotype. The crude versus bias-adjusted change in effect of genetic signatures on AMR varies, but only moderately (selecting the top 20,000 out of 40+ million k-mers). The area under the receiver operating characteristic (AUROC) of the RF (0.95) is comparable to that of BLR (0.94) on both out-of-bag samples from bootstrap and the external test (n = 1085), where AUROCs do not decrease. We observe a 1%-5% gain in AUROC with bias-handling compared to the sole use of genetic signatures.
In conclusion, we recommend using causally-informed prediction methods for modeling real-world AMR data; however, traditional adjustment or propensity-based methods may not provide an advantage in all use cases, and further methodological development should be sought.
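As an illustration of two ideas named in the abstract, the sketch below counts k-mers in a DNA string and computes inverse probability weights from a propensity estimated within strata of a single confounder (for example, species). This is only a minimal sketch under stated assumptions: the abstract does not say which propensity-based variant the authors used, and the function names `kmer_counts` and `ipw_weights`, the binary exposure (presence of a genetic signature), and the single categorical stratum are all hypothetical choices made for this example, not the paper's pipeline.

```python
from collections import Counter, defaultdict
from typing import List


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def ipw_weights(exposed: List[bool], stratum: List[str]) -> List[float]:
    """Inverse probability weights, propensity estimated per stratum.

    Assumption for this sketch: the propensity P(exposed | stratum) is the
    observed exposure frequency inside each stratum. Each sample is weighted
    by the inverse of the probability of the exposure it actually has.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for is_exposed, s in zip(exposed, stratum):
        totals[s] += 1
        positives[s] += int(is_exposed)

    weights = []
    for is_exposed, s in zip(exposed, stratum):
        propensity = positives[s] / totals[s]
        # Never zero: the sample itself contributes to its own exposure group.
        prob_observed = propensity if is_exposed else 1.0 - propensity
        weights.append(1.0 / prob_observed)
    return weights
```

Within each stratum, the weighted exposed and unexposed groups each sum to the stratum size, so exposure is no longer associated with the stratum in the reweighted sample. Weights of this kind could then be passed to a classifier that accepts per-sample weights; whether that matches the rebalancing evaluated in the paper is not stated in the abstract.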

Full text: 1 | Database: MEDLINE | Main subject: Bacterial Genome / Antibacterial Agents | Type of study: Clinical_trials / Prognostic_studies / Risk_factors_studies | Language: En | Journal: Artif Intell Med | Journal subject: MEDICAL INFORMATICS | Year: 2022 | Document type: Article | Country of affiliation: United States