Predicting obesity and smoking using medication data: A machine-learning approach.

Ali, Sitwat; Na, Renhua; Waterhouse, Mary; Jordan, Susan J; Olsen, Catherine M; Whiteman, David C; Neale, Rachel E

Affiliation

Ali S; Population Health Department, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.
Na R; School of Population Health, University of Queensland, Brisbane, Queensland, Australia.
Waterhouse M; Population Health Department, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.
Jordan SJ; Population Health Department, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.
Olsen CM; Population Health Department, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.
Whiteman DC; School of Population Health, University of Queensland, Brisbane, Queensland, Australia.
Neale RE; Population Health Department, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.

Pharmacoepidemiol Drug Saf; 31(1): 91-99, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34611961

ABSTRACT

PURPOSE:

Administrative health datasets are widely used in public health research but often lack information about common confounders. We aimed to develop and validate machine learning (ML)-based models using medication data from Australia's Pharmaceutical Benefits Scheme (PBS) database to predict obesity and smoking.

METHODS:

We used data from the D-Health Trial (N = 18 000) and the QSkin Study (N = 43 794). Smoking history, height, and weight were self-reported at study entry. Linkage to the PBS dataset captured 5 years of medication data after cohort entry. We used age, sex, and medication use, classified using Anatomical Therapeutic Chemical (ATC) classification codes, as potential predictors of smoking (current or quit <10 years ago; never or quit ≥10 years ago) and obesity (obese; non-obese). We trained gradient-boosted machine learning models using data for the first 80% of participants enrolled; models were validated using the remaining 20%. We assessed model performance overall and by sex and age, and compared models generated using 3 and 5 years of PBS data.
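The enrolment-ordered 80/20 split and the medication-based predictors described above can be illustrated with a minimal sketch. It is not the authors' code: the record fields (`enrolled_on`, `age`, `sex`, `atc_codes`), the function names, and the choice of binary indicators per ATC code are all assumptions made for illustration, and the abstract does not say which software or feature encoding was used. The model-fitting step itself is omitted.

```python
from typing import Dict, List, Sequence, Tuple


def split_by_enrollment(
    records: Sequence[dict], train_fraction: float = 0.8
) -> Tuple[List[dict], List[dict]]:
    """Order participants by enrolment date; the earliest fraction is used for training."""
    ordered = sorted(records, key=lambda r: r["enrolled_on"])  # ISO dates sort as text
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def build_vocabulary(train: Sequence[dict]) -> Dict[str, int]:
    """Map each ATC code seen in the training set to a column index."""
    codes = sorted({code for r in train for code in r["atc_codes"]})
    return {code: i for i, code in enumerate(codes)}


def encode(record: dict, vocab: Dict[str, int]) -> List[float]:
    """Feature row: age, sex (1.0 = female), then one 0/1 indicator per known ATC code."""
    row = [float(record["age"]), 1.0 if record["sex"] == "F" else 0.0]
    row += [0.0] * len(vocab)
    for code in record["atc_codes"]:
        idx = vocab.get(code)
        if idx is not None:  # codes never seen in training are ignored
            row[2 + idx] = 1.0
    return row


# Hypothetical participants; the ATC level-2 codes are only examples.
records = [
    {"enrolled_on": "2014-03-01", "age": 66, "sex": "F", "atc_codes": ["A10", "C10"]},
    {"enrolled_on": "2014-01-15", "age": 71, "sex": "M", "atc_codes": ["R03"]},
    {"enrolled_on": "2014-02-10", "age": 62, "sex": "F", "atc_codes": []},
    {"enrolled_on": "2014-05-20", "age": 69, "sex": "M", "atc_codes": ["N07", "R03"]},
    {"enrolled_on": "2014-04-05", "age": 74, "sex": "F", "atc_codes": ["C10"]},
]

train, valid = split_by_enrollment(records)
vocab = build_vocabulary(train)
X_train = [encode(r, vocab) for r in train]
X_valid = [encode(r, vocab) for r in valid]
```

Building the code vocabulary from the training portion only keeps the validation set from influencing the feature space, which is one reasonable reading of a temporal hold-out design.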

RESULTS:

Based on the validation dataset using 3 years of PBS data, the area under the receiver operating characteristic curve was 0.70 (95% confidence interval [CI] 0.68-0.71) for predicting obesity and 0.71 (95% CI 0.70-0.72) for predicting smoking. Models performed better in women than in men. Using 5 years of PBS data resulted in marginal improvement.
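For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it that way and attaches a percentile bootstrap interval. The abstract does not state how the reported confidence intervals were derived, so the bootstrap is an assumption, as are the function names and the toy data.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (assumed method, not the paper's)."""
    auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[max(int((1 - alpha / 2) * n_boot) - 1, 0)]
    return lower, upper


# Toy validation set: 1 = obese, 0 = non-obese, with predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
point_estimate = auc(y_true, y_score)  # 3 of 4 pairs are ranked correctly: 0.75
ci_low, ci_high = bootstrap_ci(y_true, y_score)
```

The pairwise loop is quadratic in sample size; for validation sets of the size used here a rank-based formulation would be preferable, but the pairwise form makes the definition explicit.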

CONCLUSIONS:

Medication data in combination with age and sex can be used to predict obesity and smoking. These models may be of value to researchers using data collected for administrative purposes.

Subjects

Machine Learning; Obesity; Child; Cohort Studies; Female; Humans; Male; Obesity/epidemiology; ROC Curve; Smoking/epidemiology

Keywords

gradient boosting machine; obesity; prediction model; smoking

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Machine Learning / Obesity Type of study: Etiology_studies / Incidence_studies / Observational_studies / Prognostic_studies / Risk_factors_studies Limits: Child / Female / Humans / Male Language: En Year of publication: 2022 Document type: Article
