Use of random forest to estimate population attributable fractions from a case-control study of Salmonella enterica serotype Enteritidis infections.
Gu, W; Vieira, A R; Hoekstra, R M; Griffin, P M; Cole, D.
Affiliation
  • Gu W; Centers for Disease Control and Prevention, Enteric Diseases Epidemiology Branch, Atlanta, GA, USA.
  • Vieira AR; Centers for Disease Control and Prevention, Enteric Diseases Epidemiology Branch, Atlanta, GA, USA.
  • Hoekstra RM; Centers for Disease Control, Division of Foodborne, Waterborne and Environmental Diseases, Atlanta, GA, USA.
  • Griffin PM; Centers for Disease Control and Prevention, Enteric Diseases Epidemiology Branch, Atlanta, GA, USA.
  • Cole D; Centers for Disease Control and Prevention, Enteric Diseases Epidemiology Branch, Atlanta, GA, USA.
Epidemiol Infect; 143(13): 2786-94, 2015 Oct.
Article in En | MEDLINE | ID: mdl-25672399
ABSTRACT
To design effective food safety programmes we need to estimate how many sporadic foodborne illnesses are caused by specific food sources based on case-control studies. Logistic regression has substantive limitations for analysing structured questionnaire data with numerous exposures and missing values. We adapted random forest to analyse data from a case-control study of Salmonella enterica serotype Enteritidis illness for source attribution. For estimation of summary population attributable fractions (PAFs) of exposures grouped into transmission routes, we devised a counterfactual estimator to predict reductions in illness associated with removing grouped exposures. For the purpose of comparison, we fitted the data using logistic regression models with stepwise forward and backward variable selection. Our results show that forward and backward variable selection in the logistic regression models gave inconsistent parameter estimates and identified different significant exposures. By contrast, the random forest model produced estimated PAFs of grouped exposures consistent in rank order with results obtained from outbreak data, with egg-related exposures having the highest estimated PAF (22·1%, 95% confidence interval 8·5-31·8). Random forest might be structurally more coherent and efficient than logistic regression models for attributing Salmonella illnesses to sources involving many causal pathways.
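The abstract does not give the form of the counterfactual estimator, so the following is only a minimal sketch of the general idea, under stated assumptions. It compares the total illness risk a fitted model predicts for the observed exposures with the total it predicts after every exposure in a group is set to "unexposed". The function `counterfactual_paf`, the stand-in model `toy_risk`, the exposure names, and the data are all hypothetical. A trained random forest's predicted probability would take the place of `toy_risk`, and the sketch ignores adjustments for case-control sampling.

```python
from typing import Callable, Dict, Sequence

Record = Dict[str, int]  # exposure name -> 0/1 indicator (hypothetical layout)


def counterfactual_paf(
    records: Sequence[Record],
    predict_risk: Callable[[Record], float],
    group: Sequence[str],
) -> float:
    """Fraction of predicted illness removed when all exposures in `group` are set to 0."""
    observed = sum(predict_risk(r) for r in records)
    if observed == 0:
        raise ValueError("model predicts no illness; PAF is undefined")
    # Counterfactual copy of each record with the grouped exposures removed.
    removed = sum(predict_risk({**r, **{e: 0 for e in group}}) for r in records)
    return (observed - removed) / observed


def toy_risk(r: Record) -> float:
    """Assumed stand-in for a fitted model; NOT a random forest, purely illustrative."""
    return (
        0.1
        + 0.3 * r.get("raw_eggs", 0)
        + 0.1 * r.get("eggs_out", 0)
        + 0.2 * r.get("chicken", 0)
    )


# Made-up questionnaire records, one dict per respondent.
records = [
    {"raw_eggs": 1, "eggs_out": 0, "chicken": 0},
    {"raw_eggs": 0, "eggs_out": 1, "chicken": 1},
    {"raw_eggs": 0, "eggs_out": 0, "chicken": 1},
    {"raw_eggs": 0, "eggs_out": 0, "chicken": 0},
]

# Summary PAF for a hypothetical "egg-related" transmission route.
egg_paf = counterfactual_paf(records, toy_risk, ["raw_eggs", "eggs_out"])
```

Grouping exposures before removal, rather than summing single-exposure PAFs, means a respondent with several exposures on the same route is counted once in the summary fraction.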

Full text: 1 | Database: MEDLINE | Main subject: Salmonella enteritidis / Salmonella Food Poisoning / Decision Trees | Country/Region as subject: North America | Language: En | Publication year: 2015 | Document type: Article | Affiliation country: United States
