ABSTRACT
In cohort studies, it can be infeasible to collect specimens from the entire cohort. For example, to estimate the sensitivity of multiple Multi-Cancer Detection (MCD) assays, we would like to collect an extra 80 mL of blood for cell-free DNA (cfDNA) analysis, but collecting this much extra blood from everyone is too expensive. We propose a novel epidemiologic study design that efficiently oversamples the participants at highest baseline disease risk for specimen collection, increasing the number of future cases with cfDNA blood collected. The variance reduction ratio of our risk-based subsample relative to a simple random subsample (SRS) depends primarily on the ratio of the risk model's sensitivity to the fraction of the cohort selected for specimen collection, subject to a constraint on the risk model's specificity. In a simulation in which we selected the 34% of the Prostate, Lung, Colorectal, and Ovarian Screening Trial cohort at highest risk of lung cancer for cfDNA blood collection, the number of lung cancers was enriched 2.42-fold and the standard deviation of the estimated lung-cancer MCD sensitivity was reduced by 31-33% versus SRS. Risk-based collection of specimens from a subsample of the cohort could be a feasible and efficient approach to collecting extra specimens for molecular epidemiology.
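
The relationship between risk model sensitivity, the fraction selected, and case enrichment can be illustrated with a minimal sketch. It assumes that "enrichment" means the expected number of cases captured by the risk-based subsample divided by the expected number in an SRS of the same size, which equals sensitivity divided by the fraction selected. The function names and the back-calculated sensitivity are hypothetical illustrations, not quantities reported above. The standard-deviation calculation is a naive binomial approximation that ignores sampling weights and the specificity constraint, so it is not expected to reproduce the simulated 31-33% reduction.

```python
import math


def enrichment_ratio(sensitivity: float, fraction_selected: float) -> float:
    """Expected cases in the risk-based subsample relative to an SRS of equal size.

    Assumption: the risk model flags `fraction_selected` of the cohort and
    captures `sensitivity` of all future cases, while an SRS of the same size
    captures `fraction_selected` of them.
    """
    if not 0 < fraction_selected <= 1:
        raise ValueError("fraction_selected must be in (0, 1]")
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be in [0, 1]")
    return sensitivity / fraction_selected


def naive_sd_reduction(enrichment: float) -> float:
    """Naive fractional SD reduction, assuming SD scales as 1/sqrt(number of cases)."""
    if enrichment <= 0:
        raise ValueError("enrichment must be positive")
    return 1.0 - 1.0 / math.sqrt(enrichment)


# Values taken from the abstract.
fraction_selected = 0.34
reported_enrichment = 2.42

# Back-calculated (not reported in the abstract): sensitivity implied by the two values.
implied_sensitivity = reported_enrichment * fraction_selected

print(f"implied risk model sensitivity: {implied_sensitivity:.4f}")
print(f"enrichment check: {enrichment_ratio(implied_sensitivity, fraction_selected):.2f}")
print(f"naive SD reduction: {naive_sd_reduction(reported_enrichment):.1%}")
```

Under these assumptions, selecting 34% of the cohort with 2.42-fold enrichment implies a risk model sensitivity of roughly 0.82. The naive approximation gives a somewhat larger SD reduction than the simulation, which is consistent with its ignoring the factors noted above.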