Identifying inpatient mortality in MarketScan claims data using machine learning.
Xie, Fenglong; Beukelman, Timothy; Sun, Dongmei; Yun, Huifeng; Curtis, Jeffrey R.
Affiliation
  • Xie F; Department of Medicine, Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, Alabama, USA.
  • Beukelman T; Foundation for Science, Technology, Education, and Research (FASTER), Birmingham, Alabama, USA.
  • Sun D; Foundation for Science, Technology, Education, and Research (FASTER), Birmingham, Alabama, USA.
  • Yun H; Department of Medicine, Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, Alabama, USA.
  • Curtis JR; Department of Medicine, Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, Birmingham, Alabama, USA.
Pharmacoepidemiol Drug Saf; 32(11): 1299-1305, 2023 Nov.
Article in En | MEDLINE | ID: mdl-37344984
ABSTRACT

PURPOSE:

Inpatient mortality is an important variable in epidemiologic studies that use claims data. In 2016, MarketScan data began obscuring specific hospital discharge status types, including inpatient deaths, for patient privacy by setting the values to missing. We used a machine learning approach, developed on data from before 2016, to identify hospitalizations that resulted in inpatient death.

METHODS:

All hospitalizations from 2011 to 2015 with discharge status of missing, died, or one of the other subsequently obscured values were identified and divided into a training set and two test sets. Predictor variables included age, sex, elapsed time from hospital discharge until last observed claim and until healthcare plan disenrollment, and absence of any discharge diagnoses. Four machine learning methods were used to train statistical models and assess sensitivity and positive predictive value (PPV) for inpatient mortality.
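The two performance measures named above can be illustrated with a minimal sketch. It assumes binary labels in which 1 marks an inpatient death and 0 any other discharge status; the function name `sensitivity_and_ppv` and the toy labels are hypothetical and are not taken from the study.

```python
def sensitivity_and_ppv(y_true, y_pred):
    """Return (sensitivity, PPV) for binary labels where 1 = inpatient death."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # deaths correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # deaths missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-deaths flagged as deaths

    # Sensitivity: share of true deaths that the model identifies.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # PPV: share of predicted deaths that are true deaths.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy labels for illustration only; these are not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
sens, ppv = sensitivity_and_ppv(y_true, y_pred)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

Specificity is not needed for either measure, which is why true negatives are not counted here.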

RESULTS:

Overall, 1,307,917 hospitalizations were included. All four machine learning approaches performed well in all datasets. Random forest performed best, with 88% PPV and 93% sensitivity for the training set and both test sets. The two factors with the highest relative importance for identifying inpatient mortality were having no observed claims for the patient on days 2-91 following hospital discharge and patient disenrollment from the healthcare plan within 60 days following hospital discharge.
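The two highest-importance factors could be derived from claims and enrollment dates roughly as follows. This is a sketch under stated assumptions, not the authors' code: the function and key names are hypothetical, the discharge date is counted as day 0, and both windows are treated as inclusive.

```python
from datetime import date


def post_discharge_flags(discharge, claim_dates, disenroll_date):
    """Derive two indicator variables for one hospitalization.

    Assumptions (not from the source): the discharge date is day 0, the
    day 2-91 window is inclusive, and disenroll_date is None when the
    patient remained enrolled through the end of observation.
    """
    # True when no claim falls on days 2 through 91 after discharge.
    no_claims_days_2_91 = not any(
        2 <= (d - discharge).days <= 91 for d in claim_dates
    )
    # True when plan disenrollment occurs within 60 days after discharge.
    disenrolled_within_60 = (
        disenroll_date is not None
        and 0 <= (disenroll_date - discharge).days <= 60
    )
    return {
        "no_claims_days_2_91": no_claims_days_2_91,
        "disenrolled_within_60": disenrolled_within_60,
    }


# Hypothetical records for illustration only.
discharge = date(2015, 3, 1)
likely_death = post_discharge_flags(
    discharge, [date(2015, 3, 1), date(2015, 3, 2)], date(2015, 3, 31)
)
likely_alive = post_discharge_flags(discharge, [date(2015, 4, 15)], None)
print(likely_death, likely_alive)
```

In the first record the only claims fall on days 0 and 1 and disenrollment follows 30 days after discharge, so both indicators are true; in the second, a claim on day 45 and continued enrollment make both false.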

CONCLUSION:

We successfully developed machine learning algorithms to identify inpatient mortality. This approach can be applied to obscured data to accurately identify inpatient mortality among hospitalizations with missing discharge status.

Full text: 1 Database: MEDLINE Main subject: Machine Learning / Inpatients Type of study: Observational_studies / Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Publication year: 2023 Document type: Article
