Multivariate survival analysis in big data: A divide-and-combine approach.
Wang, Wei; Lu, Shou-En; Cheng, Jerry Q; Xie, Minge; Kostis, John B.
Affiliations
  • Wang W; Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey, USA.
  • Lu SE; Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey, USA.
  • Cheng JQ; Department of Computer Science, New York Institute of Technology, New York, New York, USA.
  • Xie M; Department of Statistics, Rutgers University, Piscataway, New Jersey, USA.
  • Kostis JB; Cardiovascular Institute, Rutgers Robert Wood Johnson Medical School, New Brunswick, New Jersey, USA.
Biometrics; 78(3): 852-866, September 2022.
Article in English | MEDLINE | ID: mdl-33847371
ABSTRACT
Multivariate failure time data are frequently analyzed using marginal proportional hazards models or frailty models. When the sample size is extraordinarily large, either approach can face computational challenges. In this paper, we focus on the marginal model approach and propose a divide-and-combine method to analyze large-scale multivariate failure time data. Our method is motivated by the Myocardial Infarction Data Acquisition System (MIDAS), a New Jersey statewide database that includes 73,725,160 admissions to nonfederal hospitals and emergency rooms (ERs) from 1995 to 2017. We propose to randomly divide the full data into multiple subsets and to combine the estimators obtained from the individual subsets with a weighted method, considering three choices of weights. Under mild conditions, we show that the combined estimator is asymptotically equivalent to the estimator obtained by analyzing the full data all at once. In addition, to screen out risk factors with weak signals, we propose to perform regularized estimation on the combined estimator using its combined confidence distribution. Theoretical properties, such as consistency, oracle properties, and asymptotic equivalence between the divide-and-combine approach and the full data approach, are studied. The performance of the proposed method is investigated using simulation studies. Our method is applied to the MIDAS data to identify risk factors related to multivariate cardiovascular-related health outcomes.
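The abstract does not state the three weights or the marginal-model variance estimator, so the following is only a minimal sketch of the general divide-and-combine idea under simplifying assumptions: a single covariate, independent (univariate) failure times, a Cox partial likelihood with Breslow handling of ties fitted by Newton-Raphson, and inverse-variance weights as the combination rule. The functions `fit_cox_1d`, `divide`, and `combine`, and the simulated data, are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def fit_cox_1d(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson for a one-covariate Cox model (Breslow ties).

    Hypothetical helper. Returns (beta_hat, 1 / observed information).
    Assumes at least one event in the data.
    """
    n = len(times)
    # Sweep from the latest time to the earliest so each risk set
    # {j : t_j >= t_i} is built incrementally with running sums.
    order = sorted(range(n), key=lambda i: times[i], reverse=True)
    beta = 0.0
    for _ in range(max_iter):
        s0 = s1 = s2 = 0.0
        score = info = 0.0
        k = 0
        while k < n:
            # Everyone tied at this time belongs to the same risk set.
            t = times[order[k]]
            block = []
            while k < n and times[order[k]] == t:
                block.append(order[k])
                k += 1
            for j in block:
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
                s2 += w * x[j] * x[j]
            mean, second = s1 / s0, s2 / s0
            for j in block:
                if events[j]:
                    score += x[j] - mean           # score contribution
                    info += second - mean * mean   # information contribution
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / info


def divide(n, n_subsets, rng):
    """Randomly partition indices 0..n-1 into n_subsets disjoint subsets."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[s::n_subsets] for s in range(n_subsets)]


def combine(fits):
    """Inverse-variance weighted average of subset estimates (assumed weight)."""
    weights = [1.0 / var for _, var in fits]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, fits)) / total
    return beta, 1.0 / total


# Hypothetical simulated data: hazard exp(0.5 * x), exponential censoring.
rng = random.Random(2022)
n, true_beta = 4000, 0.5
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
event_t = [rng.expovariate(math.exp(true_beta * xi)) for xi in x]
cens_t = [rng.expovariate(0.5) for _ in range(n)]
times = [min(a, c) for a, c in zip(event_t, cens_t)]
events = [a <= c for a, c in zip(event_t, cens_t)]

# Full-data fit versus fits on 10 random subsets combined afterwards.
beta_full, var_full = fit_cox_1d(times, events, x)
fits = []
for subset in divide(n, 10, rng):
    fits.append(fit_cox_1d([times[i] for i in subset],
                           [events[i] for i in subset],
                           [x[i] for i in subset]))
beta_comb, var_comb = combine(fits)

print(f"full data: {beta_full:.4f} (SE {math.sqrt(var_full):.4f})")
print(f"combined : {beta_comb:.4f} (SE {math.sqrt(var_comb):.4f})")
```

Under these assumptions the combined estimate and its standard error should land close to the full-data fit, which mirrors the asymptotic equivalence described above. A marginal model for multivariate outcomes would typically also need a robust (sandwich) variance to account for correlation among a subject's multiple failure times, and each subset fit could run on a separate machine since only the estimates and variances are passed to `combine`.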

Full text: 1 Collections: 01-internacional Database: MEDLINE Main subject: Survival Analysis Study type: Prognostic_studies / Risk_factors_studies Language: English Publication year: 2022 Document type: Article