Your browser doesn't support javascript.
loading
Two-phase stratified sampling and analysis for predicting binary outcomes.
Cao, Yaqi; Haneuse, Sebastien; Zheng, Yingye; Chen, Jinbo.
Afiliação
  • Cao Y; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA and Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China.
  • Haneuse S; Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA.
  • Zheng Y; Department of Biostatistics, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Seattle, WA 98109, USA.
  • Chen J; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA.
Biostatistics ; 24(3): 585-602, 2023 Jul 14.
Article em En | MEDLINE | ID: mdl-34923588
The two-phase study design is a cost-efficient sampling strategy when certain data elements are expensive and, thus, can only be collected on a sub-sample of subjects. To date guidance on how best to allocate resources within the design has assumed that primary interest lies in estimating association parameters. When primary interest lies in the development and evaluation of a risk prediction tool, however, such guidance may, in fact, be detrimental. To resolve this, we propose a novel strategy for resource allocation based on oversampling cases and subjects who have more extreme risk estimates according to a preliminary model developed using fully observed predictors. Key to the proposed strategy is that it focuses on enhancing efficiency regarding estimation of measures of predictive accuracy, rather than on efficiency regarding association parameters which is the standard paradigm. Towards valid estimation and inference for accuracy measures using the resultant data, we extend an existing semiparametric maximum likelihood ethod for estimating odds ratio association parameters to accommodate the biased sampling scheme and data incompleteness. Motivated by our sampling design, we additionally propose a general post-stratification scheme for analyzing general two-phase data for estimating predictive accuracy measures. Through theoretical calculations and simulation studies, we show that the proposed sampling strategy and post-stratification scheme achieve the promised efficiency improvement. Finally, we apply the proposed methods to develop and evaluate a preliminary model for predicting the risk of hospital readmission after cardiac surgery using data from the Pennsylvania Health Care Cost Containment Council.
Assuntos
Palavras-chave

Texto completo: 1 Base de dados: MEDLINE Assunto principal: Projetos de Pesquisa Tipo de estudo: Prognostic_studies / Risk_factors_studies Limite: Humans Idioma: En Ano de publicação: 2023 Tipo de documento: Article

Texto completo: 1 Base de dados: MEDLINE Assunto principal: Projetos de Pesquisa Tipo de estudo: Prognostic_studies / Risk_factors_studies Limite: Humans Idioma: En Ano de publicação: 2023 Tipo de documento: Article