Challenges in risk estimation using routinely collected clinical data: The example of estimating cervical cancer risks from electronic health-records.

Landy, Rebecca; Cheung, Li C; Schiffman, Mark; Gage, Julia C; Hyun, Noorie; Wentzensen, Nicolas; Kinney, Walter K; Castle, Philip E; Fetterman, Barbara; Poitras, Nancy E; Lorey, Thomas; Sasieni, Peter D; Katki, Hormuzd A

Affiliation

Landy R; Centre for Cancer Prevention, Wolfson Institute of Preventive Medicine, Barts and the London School of Medicine and Dentistry, Queen Mary, University of London, Charterhouse Square, London EC1M 6BQ, UK. Electronic address: R.Landy@qmul.ac.uk.
Cheung LC; Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, DHHS, Bethesda, MD, USA.
Schiffman M; Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, DHHS, Bethesda, MD, USA.
Gage JC; Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, DHHS, Bethesda, MD, USA.
Hyun N; Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, DHHS, Bethesda, MD, USA.
Wentzensen N; Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, DHHS, Bethesda, MD, USA.
Kinney WK; Division of Gynecologic Oncology, Kaiser Permanente Medical Care Program, Oakland, CA, USA.
Castle PE; Albert Einstein College of Medicine, Bronx, NY, USA.
Fetterman B; Regional Laboratory, Kaiser Permanente Northern California, Berkeley, CA, USA.
Poitras NE; Regional Laboratory, Kaiser Permanente Northern California, Berkeley, CA, USA.
Lorey T; Regional Laboratory, Kaiser Permanente Northern California, Berkeley, CA, USA.
Sasieni PD; Centre for Cancer Prevention, Wolfson Institute of Preventive Medicine, Barts and the London School of Medicine and Dentistry, Queen Mary, University of London, Charterhouse Square, London EC1M 6BQ, UK.
Katki HA; Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, DHHS, Bethesda, MD, USA.

Prev Med; 111: 429-435, 2018 Jun.

Article in En | MEDLINE | ID: mdl-29222045

ABSTRACT

Electronic health-records (EHR) are increasingly used by epidemiologists studying disease following surveillance testing to provide evidence for screening intervals and referral guidelines. Although EHR data are cost-effective, undiagnosed prevalent disease and interval censoring (in which asymptomatic disease is only observed at the time of testing) raise substantial analytic issues when estimating risk, and these issues cannot be addressed using Kaplan-Meier methods. Based on our experience analysing EHR from cervical cancer screening, we previously proposed the logistic-Weibull model to address these issues. Here we demonstrate how the choice of statistical method can impact risk estimates. We use observed data on 41,067 women in the cervical cancer screening program at Kaiser Permanente Northern California, 2003-2013, as well as simulations, to evaluate the ability of different methods (Kaplan-Meier, Turnbull, Weibull and logistic-Weibull) to accurately estimate risk within a screening program. Cumulative risk estimates from the statistical methods varied considerably, with the largest differences occurring for prevalent disease risk when baseline disease ascertainment was random but incomplete. Kaplan-Meier underestimated risk at earlier times and overestimated risk at later times in the presence of interval censoring or undiagnosed prevalent disease. The Turnbull estimator performed well, although it was inefficient and its estimates were not smooth. The logistic-Weibull model performed well, except when event times did not follow a Weibull distribution. We have demonstrated that methods for right-censored data, such as Kaplan-Meier, result in biased estimates of disease risks when applied to interval-censored data, such as EHR data from screening programs. The logistic-Weibull model is attractive, but the model fit must be checked against Turnbull non-parametric risk estimates.
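The two analytic issues named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the logistic-Weibull model takes the usual prevalence-incidence mixture form, in which cumulative risk at time t is the prevalent-disease probability plus the Weibull probability of incident disease among those disease-free at baseline. The function names, parameter names and numeric values are hypothetical and chosen only for illustration.

```python
import math


def logistic_weibull_risk(t, prevalence, shape, scale):
    """Cumulative risk at time t under an assumed prevalence-incidence mixture.

    prevalence: probability of (possibly undiagnosed) disease at baseline;
                in a full model this would come from a logistic regression.
    shape, scale: Weibull parameters for time to incident disease.
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    incident = 1.0 - math.exp(-((t / scale) ** shape))
    return prevalence + (1.0 - prevalence) * incident


def detection_interval(onset_time, visit_times):
    """Interval (left, right] known to contain disease onset.

    Asymptomatic disease is only observed at a screening visit, so the
    data record the visits bracketing onset rather than onset itself.
    An onset after the last visit is right-censored (right = infinity).
    """
    left = 0.0
    for visit in sorted(visit_times):
        if visit >= onset_time:
            return (left, visit)
        left = visit
    return (left, math.inf)


# Hypothetical values: 2% prevalent disease, Weibull shape 1.5, scale 50 years.
for year in (0, 1, 3, 5):
    print(year, round(logistic_weibull_risk(year, 0.02, 1.5, 50.0), 4))

# Onset at 4.2 years with visits at 3, 6 and 9 years is recorded only as (3, 6].
print(detection_interval(4.2, [3, 6, 9]))
```

Under this assumed form the risk at time zero equals the prevalence, which is the jump that a method for right-censored data cannot represent when baseline ascertainment is incomplete. Treating the right end of each detection interval as the exact event time is the kind of simplification the abstract reports as biasing Kaplan-Meier estimates.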

Subject(s)

Early Detection of Cancer; Electronic Health Records/statistics & numerical data; Mass Screening; Models, Statistical; Risk Assessment; Uterine Cervical Neoplasms/diagnosis; Adult; California; Female; Humans; Middle Aged; Prevalence

Key words

Cervix; Electronic health-records; Epidemiology; Risk estimation; Screening; Statistical methods

Full text: 1
Database: MEDLINE
Main subject: Uterine Cervical Neoplasms / Mass Screening / Models, Statistical / Risk Assessment / Early Detection of Cancer / Electronic Health Records
Type of study: Diagnostic_studies / Etiology_studies / Prevalence_studies / Prognostic_studies / Risk_factors_studies / Screening_studies
Limits: Adult / Female / Humans / Middle aged
Country/Region as subject: North America
Language: En
Year: 2018
Type: Article
