Targeted learning with daily EHR data.

Sofrygin, Oleg; Zhu, Zheng; Schmittdiel, Julie A; Adams, Alyce S; Grant, Richard W; van der Laan, Mark J; Neugebauer, Romain

Affiliations

Sofrygin O; Division of Research, Kaiser Permanente, Northern California, Oakland, California.
Zhu Z; Division of Biostatistics, University of California, Berkeley, California.
Schmittdiel JA; Division of Research, Kaiser Permanente, Northern California, Oakland, California.
Adams AS; Division of Research, Kaiser Permanente, Northern California, Oakland, California.
Grant RW; Division of Research, Kaiser Permanente, Northern California, Oakland, California.
van der Laan MJ; Division of Research, Kaiser Permanente, Northern California, Oakland, California.
Neugebauer R; Division of Biostatistics, University of California, Berkeley, California.

Stat Med; 38(16): 3073-3090, 2019 Jul 20.

Article in English | MEDLINE | ID: mdl-31025411

ABSTRACT

Electronic health records (EHR) data provide a cost- and time-effective opportunity to conduct cohort studies of the effects of multiple time-point interventions in the diverse patient population found in real-world clinical settings. Because the computational cost of analyzing EHR data at a daily (or more granular) scale can be quite high, a pragmatic approach has been to partition the follow-up into coarser intervals of pre-specified length (e.g., quarterly or monthly intervals). The feasibility and practical impact of analyzing EHR data at a granular scale have not been previously evaluated. We start filling these gaps by leveraging large-scale EHR data from a diabetes study to develop a scalable targeted learning approach that allows analyses with small intervals. We then study the practical effects of selecting different coarsening intervals on inferences by reanalyzing data from the same large-scale pool of patients. Specifically, we map daily EHR data into four analytic datasets using 90-, 30-, 15-, and 5-day intervals. We apply a semiparametric and doubly robust estimation approach, longitudinal Targeted Minimum Loss-Based Estimation (TMLE), to estimate the causal effects of four dynamic treatment rules with each dataset, and compare the resulting inferences. To overcome the computational challenges presented by the size of these data, we propose a novel TMLE implementation, the "long-format TMLE," and rely on the latest advances in scalable data-adaptive machine-learning software, xgboost and h2o, to estimate the TMLE nuisance parameters.
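The coarsening step described above (mapping daily EHR data into 90-, 30-, 15-, and 5-day intervals) can be illustrated with a minimal Python sketch. It assumes a hypothetical daily layout of `(patient_id, day, a1c, treated)` rows and illustrative aggregation rules (the last A1c value carried forward; "treated" if treatment was recorded on any day of the interval). These are assumptions for illustration only, not the authors' data model or code, and the sketch does not implement TMLE itself.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical daily EHR row (assumption, not the study's schema):
# (patient_id, day since study entry, A1c measured that day or None, treated that day)
DailyRow = Tuple[str, int, Optional[float], bool]


def coarsen(daily: List[DailyRow], interval_days: int) -> List[dict]:
    """Map daily rows to one row per patient per interval of length `interval_days`.

    Illustrative aggregation rules (assumptions):
      * a1c: last A1c observed up to the end of the interval (carried forward)
      * treated: True if treatment was recorded on any day of the interval
    """
    # Group rows by patient, ordered by day.
    by_patient: Dict[str, List[DailyRow]] = {}
    for row in sorted(daily, key=lambda r: (r[0], r[1])):
        by_patient.setdefault(row[0], []).append(row)

    out: List[dict] = []
    for pid, rows in by_patient.items():
        # Follow-up is assumed to end on the patient's last observed day.
        n_intervals = rows[-1][1] // interval_days + 1
        last_a1c: Optional[float] = None
        i = 0
        for t in range(n_intervals):
            treated = False
            end_day = (t + 1) * interval_days  # exclusive upper bound of interval t
            while i < len(rows) and rows[i][1] < end_day:
                _, _, a1c, trt = rows[i]
                if a1c is not None:
                    last_a1c = a1c
                treated = treated or trt
                i += 1
            out.append({"id": pid, "t": t, "a1c": last_a1c, "treated": treated})
    return out


# Toy data (hypothetical values) for two patients.
daily = [
    ("p1", 0, 7.9, False),
    ("p1", 3, None, True),
    ("p1", 20, 7.2, False),
    ("p1", 40, None, False),
    ("p2", 1, 8.4, False),
    ("p2", 95, 8.0, True),
]

for k in (90, 30, 15, 5):
    print(k, len(coarsen(daily, k)))  # person-interval rows: 3, 6, 10, 29
```

Even on this toy input, moving from 90-day to 5-day intervals multiplies the number of person-interval rows roughly tenfold, which is the source of the computational burden that motivates the scalable implementation.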

Subjects

Algorithms; Electronic Health Records; Longitudinal Studies; Causality; Computer Simulation; Diabetes Mellitus; Humans; Machine Learning; Reproducibility of Results

Keywords

EHR; Targeted Minimum Loss-Based Estimation; big data; causal inference; dynamic treatment regimes; machine learning

Collections: 01-internacional | Database: MEDLINE | Main subject: Algorithms / Longitudinal Studies / Electronic Health Records | Study type: Observational studies / Prognostic studies | Limits: Humans | Language: English | Journal: Stat Med | Year of publication: 2019 | Document type: Article