Search | BVS Search Portal

Multivariate survival analysis in big data: A divide-and-combine approach.

Wang, Wei; Lu, Shou-En; Cheng, Jerry Q; Xie, Minge; Kostis, John B.

Biometrics; 78(3): 852-866, 2022 Sep.

Article in English | MEDLINE | ID: mdl-33847371

ABSTRACT

Multivariate failure time data are frequently analyzed using the marginal proportional hazards models and the frailty models. When the sample size is extraordinarily large, using either approach could face computational challenges. In this paper, we focus on the marginal model approach and propose a divide-and-combine method to analyze large-scale multivariate failure time data. Our method is motivated by the Myocardial Infarction Data Acquisition System (MIDAS), a New Jersey statewide database that includes 73,725,160 admissions to nonfederal hospitals and emergency rooms (ERs) from 1995 to 2017. We propose to randomly divide the full data into multiple subsets and propose a weighted method to combine these estimators obtained from individual subsets using three weights. Under mild conditions, we show that the combined estimator is asymptotically equivalent to the estimator obtained from the full data as if the data were analyzed all at once. In addition, to screen out risk factors with weak signals, we propose to perform the regularized estimation on the combined estimator using its combined confidence distribution. Theoretical properties, such as consistency, oracle properties, and asymptotic equivalence between the divide-and-combine approach and the full data approach are studied. Performance of the proposed method is investigated using simulation studies. Our method is applied to the MIDAS data to identify risk factors related to multivariate cardiovascular-related health outcomes.
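The divide-and-combine idea in this abstract can be illustrated with a much simpler estimator than the marginal proportional hazards model. The sketch below is an assumption-laden stand-in, not the authors' method: it fits a one-parameter least-squares slope on each random subset and combines the subset estimates with information weights (a single weight, not the three weights mentioned in the abstract). The names `fit_slope` and `divide_and_combine` are hypothetical. In this toy model the combined estimate matches the full-data estimate up to floating-point error, which mirrors the asymptotic equivalence the abstract claims for the real method.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope through the origin, plus its information sum(x^2)."""
    info = sum(x * x for x in xs)
    beta = sum(x * y for x, y in zip(xs, ys)) / info
    return beta, info


def divide_and_combine(xs, ys, n_subsets, seed=0):
    """Randomly split the data, fit each subset, and combine by information weights.

    Illustrative only: the paper works with marginal Cox models and its own
    weighting scheme; this toy uses a one-parameter linear model instead.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)  # random division of the full data

    fits = []
    for k in range(n_subsets):
        part = idx[k::n_subsets]  # every n_subsets-th shuffled index
        fits.append(fit_slope([xs[i] for i in part], [ys[i] for i in part]))

    total_info = sum(info for _, info in fits)
    return sum(beta * info for beta, info in fits) / total_info


# Simulated data with a true slope of 2.0 (an arbitrary choice for the demo).
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(10000)]
ys = [2.0 * x + data_rng.gauss(0, 1) for x in xs]

full_beta, _ = fit_slope(xs, ys)
combined_beta = divide_and_combine(xs, ys, n_subsets=10)
print(full_beta, combined_beta)
```

Only the per-subset estimate and its information need to be kept in memory at combination time, which is the practical appeal when the full data are too large to analyze at once.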

Subject(s)

Survival Analysis, Computer Simulation, Multivariate Analysis, Proportional Hazards Models, Sample Size

Double-Parallel Monte Carlo for Bayesian Analysis of Big Data.

Xue, Jingnan; Liang, Faming.

Stat Comput; 29(1): 23-32, 2019 Jan.

Article in English | MEDLINE | ID: mdl-31011242

ABSTRACT

This paper proposes a simple, practical, and efficient MCMC algorithm for Bayesian analysis of big data. The algorithm divides the big dataset into smaller subsets and provides a simple method to aggregate the subset posteriors to approximate the full-data posterior. To further speed up computation, it employs the population stochastic approximation Monte Carlo (Pop-SAMC) algorithm, a parallel MCMC algorithm, to simulate from each subset posterior. Because the algorithm involves two levels of parallelism, data-parallel and simulation-parallel, it is named "Double Parallel Monte Carlo". The validity of the proposed algorithm is justified mathematically and numerically.
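The data-parallel step (simulate from each subset posterior, then aggregate) can be sketched on a toy problem. The code below is not the aggregation rule from this paper, and it does not implement Pop-SAMC: it assumes a normal mean with known standard deviation and a flat prior, draws directly from each Gaussian subset posterior, and combines draws by precision-weighted averaging in the style of consensus Monte Carlo. The names `subset_posterior_draws` and `aggregate` are hypothetical.

```python
import random
import statistics

SIGMA = 1.0  # assumed known observation standard deviation


def subset_posterior_draws(ys, n_draws, rng):
    """Draw from the subset posterior N(mean(ys), SIGMA^2 / len(ys)) under a flat prior.

    Direct Gaussian sampling stands in for the MCMC sampler used in practice.
    """
    mean = statistics.fmean(ys)
    sd = SIGMA / len(ys) ** 0.5
    precision = 1.0 / sd ** 2
    return [rng.gauss(mean, sd) for _ in range(n_draws)], precision


def aggregate(draw_sets, precisions):
    """Precision-weighted average of the j-th draw from every subset."""
    total = sum(precisions)
    return [
        sum(w * d for w, d in zip(precisions, draws)) / total
        for draws in zip(*draw_sets)
    ]


rng = random.Random(42)
data = [rng.gauss(3.0, SIGMA) for _ in range(2000)]
subsets = [data[k::4] for k in range(4)]  # four equal-sized subsets

results = [subset_posterior_draws(s, 5000, rng) for s in subsets]
draw_sets = [draws for draws, _ in results]
precisions = [prec for _, prec in results]
combined = aggregate(draw_sets, precisions)

# Full-data posterior for comparison: N(mean(data), SIGMA^2 / n).
full_mean = statistics.fmean(data)
full_var = SIGMA ** 2 / len(data)
print(statistics.fmean(combined), full_mean)
print(statistics.pvariance(combined), full_var)
```

In this Gaussian setting the combined draws have the full-data posterior mean and variance exactly in distribution; for non-Gaussian posteriors this kind of averaging is only an approximation.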
