ABSTRACT
Random effect models for time-to-event data, also known as frailty models, provide a conceptually appealing way of quantifying association between survival times and of representing heterogeneities resulting from factors that may be difficult or impossible to measure. In the literature, the random effect is usually assumed to have a continuous distribution. However, in some areas of application, discrete frailty distributions may be more appropriate. The present paper is about the implementation and interpretation of the Addams family of discrete frailty distributions. We propose estimation methods for this family of distributions in the context of shared frailty models for the hazard rates of case I interval-censored data. Our optimization framework allows for stratification of random effect distributions by covariates. We highlight interpretational advantages of the Addams family of discrete frailty distributions and the K-point distribution as compared to other frailty distributions. A unique feature of the Addams family and the K-point distribution is that the support of the frailty distribution depends on its parameters. This feature is best exploited by imposing a model on the distributional parameters, resulting in a model with non-homogeneous covariate effects that can be analyzed using standard measures such as the hazard ratio. Our methods are illustrated with applications to multivariate case I interval-censored infection data.
ABSTRACT
Time-to-event endpoints are widely used as measures of patients' well-being and indicators of prognosis. In imaging-based biomarker research, a growing number of studies examine imaging biomarkers' prognostic or predictive utility for such endpoints, whether in a trial or an observational study setting. In this educational review article, we briefly introduce basic concepts of time-to-event endpoints and point out potential pitfalls in the context of imaging biomarker research, in the hope of improving radiologists' understanding of related subjects. In addition, we review and discuss the benefits of using time-to-event endpoints and considerations in selecting overall survival or progression-free survival for the primary analysis. LEVEL OF EVIDENCE: 5. TECHNICAL EFFICACY: Stage 3.
ABSTRACT
Time-to-event data are often recorded on a discrete scale with multiple, competing risks as potential causes of the event. In this context, applying continuous-time survival analysis methods with a single risk yields biased estimation. We therefore propose the multivariate Bernoulli detector for competing risks with discrete times, built around a multivariate change point model on the cause-specific baseline hazards. Through the prior on the number of change points and their locations, we impose dependence between change points across risks and allow data-driven learning of their number. Conditionally on these change points, a multivariate Bernoulli prior is used to infer which risks are involved. Posterior inference focuses on the cause-specific hazard rates and the dependence across risks; such dependence is often present due to subject-specific changes over time that affect all risks. Full posterior inference is performed through a tailored local-global Markov chain Monte Carlo (MCMC) algorithm, which exploits a data augmentation trick and MCMC updates from nonconjugate Bayesian nonparametric methods. We illustrate our model in simulations and on intensive care unit (ICU) data, comparing its performance with existing approaches.
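To make the setting concrete, the following minimal Python sketch simulates the kind of data this model targets: discrete event times with two competing risks whose cause-specific hazards are piecewise constant between change points. All hazard values, the change point location, and the risk labels are hypothetical; the sketch illustrates the data structure only, not the Bayesian change point model or its MCMC.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical piecewise-constant cause-specific hazards on a discrete
# time grid t = 1..T, with a change point for risk 1 at t = 10.
T = 20
h1 = np.where(np.arange(1, T + 1) < 10, 0.02, 0.08)  # risk 1 (e.g. discharge)
h2 = np.full(T, 0.03)                                # risk 2 (e.g. death)

def simulate_subject():
    """Walk the discrete time grid; at each step the subject either
    fails from one of the causes or survives to the next step."""
    for t in range(T):
        u = rng.random()
        if u < h1[t]:
            return t + 1, 1          # event time, cause 1
        if u < h1[t] + h2[t]:
            return t + 1, 2          # event time, cause 2
    return T, 0                      # administratively censored at T

data = [simulate_subject() for _ in range(500)]
causes = np.array([cause for _, cause in data])
print(np.bincount(causes))           # counts: censored / cause 1 / cause 2
```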
Subjects
Algorithms, Bayes Theorem, Computer Simulation, Markov Chains, Monte Carlo Method, Humans, Survival Analysis, Statistical Models, Multivariate Analysis, Biometry/methods
ABSTRACT
We propose a new simultaneous variable selection and estimation procedure with the Gaussian seamless-$L_0$ (GSELO) penalty for the Cox proportional hazards model and the additive hazards model. The GSELO procedure shows good potential to improve on existing variable selection methods by drawing strength from both best subset selection (BSS) and regularization. In addition, we develop an iterative algorithm to implement the proposed procedure in a computationally efficient way. Theoretically, we establish the convergence properties of the algorithm and the asymptotic properties of the proposed procedure. Since parameter tuning is crucial to the performance of the GSELO procedure, we also propose an extended Bayesian information criterion (EBIC) parameter selector for it. Simulated and real data studies demonstrate the prediction performance and effectiveness of the proposed method over several state-of-the-art methods.
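The abstract does not state the penalty's functional form. For orientation only, the seamless-$L_0$ (SELO) penalty that the GSELO name builds on is commonly written as below; the Gaussian variant proposed in the paper may modify this form, so treat it as an assumption.

```latex
p_{\tau}(\beta_j) \;=\; \frac{\lambda}{\log 2}\,
  \log\!\left( \frac{|\beta_j|}{|\beta_j| + \tau} + 1 \right),
\qquad \tau > 0 .
```

As $\tau \to 0^{+}$ this approaches the $L_0$ penalty $\lambda\,\mathbf{1}\{\beta_j \neq 0\}$ while staying continuous in $\beta_j$, which is what "seamless" refers to.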
Subjects
Algorithms, Humans, Bayes Theorem, Proportional Hazards Models
ABSTRACT
In cancer and other medical studies, time-to-event (e.g., death) data are common. A major task in analyzing time-to-event (or survival) data is to compare two medical interventions (e.g., a treatment and a control) with respect to their effect on patients' hazard of experiencing the event of interest, which requires comparing the hazard curves of the two patient groups. In practice, a medical treatment often has a time-lag effect; that is, the treatment effect becomes observable only after a time period has elapsed since the treatment was applied. In such cases, the two hazard curves are similar in an initial time period, and traditional testing procedures, such as the log-rank test, are ineffective in detecting the treatment effect, because the initial similarity attenuates the difference between the two hazard curves that is reflected in the test statistics. In this paper, we suggest a new method for comparing two hazard curves in the presence of a potential treatment time-lag effect, based on a weighted log-rank test with a flexible weighting scheme. The new method is shown to be more effective than some representative existing methods in various cases where a treatment time-lag effect is present.
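For reference, tests of this type are built on the standard weighted log-rank statistic shown below, where $d_j$ subjects fail and $n_j$ are at risk at the ordered event time $t_j$, $n_{1j}$ and $n_{2j}$ are the group-specific numbers at risk, and $O_{1j}$ and $E_{1j} = d_j n_{1j}/n_j$ are the observed and expected numbers of group 1 events. The paper's specific flexible weighting scheme is not reproduced here, so $w(\cdot)$ is generic.

```latex
Z \;=\; \frac{\sum_j w(t_j)\,\bigl(O_{1j} - E_{1j}\bigr)}
             {\sqrt{\sum_j w(t_j)^2\, V_j}},
\qquad
V_j \;=\; \frac{d_j\,(n_j - d_j)\, n_{1j}\, n_{2j}}{n_j^2\,(n_j - 1)} .
```

A weight function that is near zero early in follow-up and increases later down-weights the initial period in which the two hazard curves coincide, which is what targets the time-lag setting.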
Subjects
Proportional Hazards Models, Humans, Time Factors, Survival Analysis, Computer Simulation, Female
ABSTRACT
It is becoming increasingly common for researchers to consider leveraging information from external sources to enhance the analysis of small-scale studies. While much attention has focused on univariate survival data, correlated survival data are prevalent in epidemiological investigations. In this article, we propose a unified framework to improve the estimation of the marginal accelerated failure time model with correlated survival data by integrating additional information given in the form of covariate effects evaluated in a reduced accelerated failure time model. Such auxiliary information can be summarized by valid estimating equations and hence combined with the internal linear rank-estimating equations via the generalized method of moments. We investigate the asymptotic properties of the proposed estimator and show that it is more efficient than the conventional estimator using internal data only. When population heterogeneity exists, we revise the proposed estimation procedure and present a shrinkage estimator to protect against bias and loss of efficiency. Moreover, the proposed estimation procedure can be further refined to accommodate non-negligible uncertainty in the auxiliary information, leading to more trustworthy inferences. Simulation results demonstrate the finite-sample performance of the proposed methods, and an empirical application to the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial substantiates their practical relevance.
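In generic generalized-method-of-moments terms (the notation here is illustrative, not the paper's), stacking the internal rank-based estimating functions $U_n(\beta)$ with the auxiliary-information estimating functions $g_n(\beta)$ and minimizing a quadratic form gives the combined estimator:

```latex
\widehat{\beta} \;=\; \arg\min_{\beta}\;
\begin{pmatrix} U_n(\beta) \\ g_n(\beta) \end{pmatrix}^{\!\top}
W_n\,
\begin{pmatrix} U_n(\beta) \\ g_n(\beta) \end{pmatrix},
```

where $W_n$ is a positive-definite weight matrix, optimally the inverse of the covariance of the stacked estimating functions.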
ABSTRACT
BACKGROUND: In large multiregional cohort studies, survival data are often collected at small geographical levels (such as counties) and aggregated at larger levels, leading to correlated patterns that are associated with location. Traditional studies typically analyze such data globally or locally by region, often neglecting the spatial information inherent in the data, which can bias effect estimates and reduce statistical power. METHOD: We propose a Geographically Weighted Accelerated Failure Time Model for spatial survival data to investigate spatial heterogeneity. We establish a weighting scheme and bandwidth selection based on quasi-likelihood information criteria. Theoretical properties of the proposed estimators are thoroughly examined. To demonstrate the efficacy of the model in various scenarios, we conduct a simulation study with different sample sizes, both with and without adherence to the proportional hazards assumption. Additionally, we apply the proposed method to ovarian cancer survival data from the Surveillance, Epidemiology, and End Results cancer registry in the state of New Jersey. RESULTS: Our simulation results indicate that, when the proportional hazards assumption is violated, the proposed model outperforms existing methods, including the geographically weighted Cox model, on four performance measures. Furthermore, in scenarios where the sample size per location is 20-25, the local model failed to fit the simulated data, while our proposed model still performed satisfactorily. In the empirical study, we identify clear spatial variation in the effects of all three covariates. CONCLUSION: Our proposed model offers a novel approach to exploring spatial heterogeneity in survival data compared with global and local models, providing an alternative to geographically weighted Cox regression when the proportional hazards assumption is not met. It addresses the problem of survival data from certain counties being unable to support a local model fit due to limited samples, particularly in the context of rare diseases.
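As a rough illustration of the "geographically weighted" idea, the sketch below computes Gaussian kernel weights for a fit centred at one focal location; each location then gets its own weighted AFT fit. The kernel choice, function name, and coordinates are assumptions, and the paper's quasi-likelihood-based bandwidth selection is not reproduced.

```python
import numpy as np

def gwr_weights(coords, focal, bandwidth):
    """Gaussian kernel weights for a geographically weighted fit at one
    focal location.  `coords` is an (n, 2) array of locations; nearby
    observations get weight near 1, distant ones near 0."""
    d = np.linalg.norm(coords - focal, axis=1)   # distances to focal point
    return np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel

# Hypothetical usage: weights for a weighted AFT fit centred on county 0.
coords = np.random.default_rng(1).uniform(0, 100, size=(50, 2))
w = gwr_weights(coords, focal=coords[0], bandwidth=25.0)
```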
Subjects
Ovarian Neoplasms, Proportional Hazards Models, Humans, Female, Ovarian Neoplasms/mortality, New Jersey/epidemiology, Survival Analysis, Computer Simulation, SEER Program/statistics & numerical data, Statistical Models, Algorithms, Likelihood Functions
ABSTRACT
Subgroup analysis may be used to investigate treatment effect heterogeneity among subsets of the study population defined by baseline characteristics. Several methodologies have been proposed in recent years, and with these, statistical issues such as multiplicity, complexity, and selection bias have been widely discussed. Some methods adjust for one or more of these issues; however, few of them discuss or consider the stability of the subgroup assignments. We propose exploring the stability of subgroups as a sensitivity-analysis step for stratified medicine, to assess the robustness of the identified subgroups and to identify possible factors that may drive instability. After applying Bayesian credible subgroups, a nonparametric bootstrap can be used to assess stability at the subgroup and patient levels. Our findings illustrate that when the treatment effect is small or not so evident, patients are more likely to switch to different subgroups ("jumpers") across bootstrap resamples. In contrast, when the treatment effect is large or extremely convincing, patients generally remain in the same subgroup. While the proposed subgroup stability method is illustrated through the Bayesian credible subgroups method on time-to-event data, the general approach can be used with other subgroup identification methods and endpoints.
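A minimal sketch of the bootstrap stability check, assuming a generic `assign(data)` routine that stands in for any subgroup identification method (such as Bayesian credible subgroups) and returns one subgroup label per patient; the interface and names are hypothetical.

```python
import numpy as np

def patient_stability(data, assign, n_boot=200, seed=0):
    """Patient-level subgroup stability via the nonparametric bootstrap:
    the fraction of resamples in which a patient, when drawn, receives
    the same subgroup label as in the original fit."""
    rng = np.random.default_rng(seed)
    n = len(data)
    base = assign(data)                      # labels on the full data
    same = np.zeros(n)                       # times patient kept its subgroup
    seen = np.zeros(n)                       # times patient entered a resample
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample patients with replacement
        lab = assign(data[idx])              # reassign on the resample
        for j, i in enumerate(idx):
            seen[i] += 1
            same[i] += (lab[j] == base[i])
    return same / np.maximum(seen, 1)        # stability score in [0, 1]
```

Patients with low stability scores are the "jumpers" described above.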
ABSTRACT
In many biomedical applications, the outcome is measured as a "time-to-event" (e.g., disease progression or death). To assess the connection between features of a patient and this outcome, it is common to assume a proportional hazards model and fit a proportional hazards regression (or Cox regression). To fit this model, a log-concave objective function known as the "partial likelihood" is maximized. For moderate-sized data sets, an efficient Newton-Raphson algorithm that leverages the structure of the objective function can be employed. However, in large data sets this approach has two issues: (i) the computational tricks that leverage structure can also lead to computational instability; (ii) the objective function does not naturally decouple, so if the data set does not fit in memory, the model can be computationally expensive to fit; this also means the objective is not directly amenable to stochastic gradient-based optimization methods. To overcome these issues, we propose a simple, new framing of proportional hazards regression that results in an objective function amenable to stochastic gradient descent. We show that this simple modification allows us to efficiently fit survival models with very large data sets. It also facilitates training complex models, such as neural-network-based models, with survival data.
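One generic way to obtain such a decoupled objective is to average the Cox partial likelihood over small random subsamples of patients, each of which yields a valid partial likelihood of its own. The sketch below illustrates that idea with a plain NumPy mini-batch gradient; it is an illustration of the general approach, not necessarily the paper's exact framing.

```python
import numpy as np

def cox_batch_grad(beta, x, time, event):
    """Gradient of the negative Cox (Breslow) log partial likelihood on
    one batch.  Evaluating it on small random subsamples decouples the
    loss across batches, making plain SGD applicable."""
    order = np.argsort(-time)              # sort descending: risk set of
    x, event = x[order], event[order]      # subject i is rows 0..i
    w = np.exp(x @ beta)
    cumw = np.cumsum(w)                    # running risk-set sums
    cumwx = np.cumsum(w[:, None] * x, axis=0)
    resid = x - cumwx / cumw[:, None]      # x_i minus risk-set average of x
    return -resid[event].sum(axis=0)       # sum over observed events

rng = np.random.default_rng(0)
n, p = 10_000, 3
x = rng.normal(size=(n, p))
true_beta = np.array([0.5, -0.5, 0.0])
latent = rng.exponential(scale=np.exp(-x @ true_beta))   # event times
event = rng.random(n) < 0.8                              # ~20% censored
time = np.where(event, latent, latent * rng.random(n))   # censor earlier

beta = np.zeros(p)
for step in range(5_000):
    idx = rng.integers(0, n, size=64)      # one random mini-batch
    beta -= 0.01 * cox_batch_grad(beta, x[idx], time[idx], event[idx]) / 64
print(beta)                                # drifts toward true_beta
```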
ABSTRACT
OBJECTIVE: To assess long-term outcomes of patients with advanced-stage ovarian cancer by treatment type. METHODS: Patients with newly diagnosed stage III-IV ovarian cancer who underwent primary treatment at our tertiary cancer center between 01/01/2015 and 12/31/2015 were included. We reviewed electronic medical records for clinicopathological, treatment, and survival characteristics. RESULTS: Of 153 patients, 88 (58%) had stage III and 65 (42%) stage IV disease. Median follow-up was 65.8 months (range, 3.6-75.3). Eighty-nine patients (58%) underwent primary debulking surgery (PDS), 50 (33%) received neoadjuvant chemotherapy followed by interval debulking surgery (IDS), and 14 (9%) received chemotherapy alone, without surgery (NSx). Median PFS to first recurrence was 26.2 months (range, 20.1-36.2), 13.5 months (range, 12-15.1), and 4.2 months (range, 1.1-5.8) in the PDS, IDS, and NSx groups, respectively (P < .001). At first recurrence/progression, 80 patients (72.7%) were treated with chemotherapy, 28 (25.5%) underwent secondary cytoreductive surgery (CRS) followed by chemotherapy, and 2 (1.8%) received no treatment. Seven patients (4.6%) underwent palliative surgery for malignant bowel obstruction. Overall, 62.7% received 1-3 lines of chemotherapy. The 5-year OS rates were 53.2% (95% CI: 44.7%-61%) for the entire cohort, 71.5% (95% CI: 60.2%-80%) for the PDS group, 35.2% (95% CI: 22.2%-48.5%) for the IDS group, and 7.9% (95% CI: 0.5%-29.9%) for the NSx group. CONCLUSION: The longitudinal treatment modalities and outcomes of patients with advanced ovarian cancer described here can be useful for patient counseling, long-term planning, and future comparison studies.
Subjects
Ovarian Neoplasms, Humans, Female, Follow-Up Studies, Retrospective Studies, Neoplasm Staging, Adjuvant Chemotherapy, Ovarian Epithelial Carcinoma/drug therapy, Ovarian Neoplasms/surgery, Ovarian Neoplasms/drug therapy, Neoadjuvant Therapy, Cytoreduction Surgical Procedures
ABSTRACT
Genetic interactions play an important role in the progression of complex diseases, explaining variation in disease phenotype missed by main genetic effects. Comparatively few such studies have examined survival time, given its challenging characteristics such as censoring. In recent biomedical research, two-level analysis of both genes and their involved pathways has received much attention and has been demonstrated to be more effective than single-level analysis. However, such analysis is usually limited to main effects. Pathways are not isolated, and their interactions have also been suggested to contribute importantly to the prognosis of complex diseases. In this paper, we develop a novel two-level Bayesian interaction analysis approach for survival data. This approach is the first to analyze lower-level gene-gene interactions and higher-level pathway-pathway interactions simultaneously. Advancing significantly beyond existing Bayesian studies based on the Markov chain Monte Carlo (MCMC) technique, we propose a variational inference framework based on the accelerated failure time model with effective priors to accommodate two-level selection as well as censoring. Its computational efficiency is highly desirable for high-dimensional interaction analysis. We examine the performance of the proposed approach using extensive simulation. The application to TCGA melanoma and lung adenocarcinoma data leads to biologically sensible findings with satisfactory prediction accuracy and selection stability.
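For readers less familiar with variational inference: it replaces MCMC sampling by maximizing the evidence lower bound (ELBO) below over a tractable family $q$. The paper's specific variational family, AFT likelihood, and two-level selection priors are not reproduced here.

```latex
\log p(\mathbf{y})
  \;\ge\;
  \mathbb{E}_{q(\boldsymbol{\theta})}\!\bigl[\log p(\mathbf{y}, \boldsymbol{\theta})\bigr]
  \;-\;
  \mathbb{E}_{q(\boldsymbol{\theta})}\!\bigl[\log q(\boldsymbol{\theta})\bigr]
  \;=\;
  \mathrm{ELBO}(q),
```

where $\boldsymbol{\theta}$ collects all model parameters; maximizing the ELBO is equivalent to minimizing the Kullback-Leibler divergence from $q$ to the posterior, which is what makes the approach fast relative to MCMC.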
Subjects
Bayes Theorem, Computer Simulation, Phenotype, Markov Chains, Monte Carlo Method
ABSTRACT
Disease registries, surveillance data, and other datasets with extremely large sample sizes are becoming increasingly available, providing population-based information on disease incidence, survival probability, and other important public health characteristics. Such information can be leveraged in studies that collect detailed measurements but have smaller sample sizes. In contrast to recent proposals that formulate additional information as constraints in optimization problems, we develop a general framework to construct simple estimators that update the usual regression estimators with functionals of the data that incorporate the additional information. We consider general settings that involve nuisance parameters in the auxiliary information, non-i.i.d. data such as those from case-control studies, and semiparametric models with infinite-dimensional parameters common in survival analysis. Details of several important data and sampling settings are provided with numerical examples.
Subjects
Statistical Models, Likelihood Functions, Computer Simulation, Statistical Data Interpretation, Survival Analysis, Case-Control Studies
ABSTRACT
Additive frailty models are used to model correlated survival data. However, the complexity of these models increases with cluster size, to the extent that practical usage becomes increasingly challenging. We present a modification of the additive genetic gamma frailty (AGGF) model, the lean AGGF (L-AGGF) model, which alleviates some of these challenges by using a leaner additive decomposition of the frailty. The performance of the two models was compared and evaluated in a simulation study. The L-AGGF model was used to analyze population-wide data on the clustering of melanoma in 2 391 125 two-generational Norwegian families, 1960-2015. Using this model, we could analyze the complete data set, while the original model limited the analysis to a restricted data set (with cluster sizes ≤ 7). We found substantial clustering of melanoma in Norwegian families and large heterogeneity in melanoma risk across the population, with 52% of the frailty attributed to the 10% of the population at highest unobserved risk. Due to its improved scalability, the L-AGGF model enables a wider range of analyses of population-wide data compared to the AGGF model. Moreover, the methods outlined here make it possible to perform these analyses in a computationally efficient manner.
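For intuition, the sketch below simulates clustered survival times under the basic shared gamma frailty model, in which all members of a family share one frailty Z with mean 1 and variance theta multiplying a common baseline hazard. The AGGF and L-AGGF models refine this with an additive genetic decomposition of the frailty, which is not reproduced here; all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

theta, lam = 0.8, 0.05            # frailty variance, baseline hazard rate
n_fam, fam_size = 2_000, 4

# One gamma frailty per family with E[Z] = 1, Var[Z] = theta.
z = rng.gamma(shape=1 / theta, scale=theta, size=n_fam)

# Conditional on Z, members fail with hazard lam * Z, inducing
# within-family correlation of event times.
t = rng.exponential(scale=1 / (lam * z[:, None]), size=(n_fam, fam_size))
c = rng.exponential(scale=1 / lam, size=(n_fam, fam_size))  # censoring times
obs, event = np.minimum(t, c), (t <= c)                     # observed data

# Larger theta -> stronger clustering of early events within families.
```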
Subjects
Frailty, Melanoma, Humans, Statistical Models, Frailty/epidemiology, Computer Simulation, Cluster Analysis, Melanoma/epidemiology, Melanoma/genetics, Survival Analysis
ABSTRACT
The accelerated failure time (AFT) model offers an important and useful alternative to the conventional Cox proportional hazards model, particularly when the proportional hazards assumption is violated. Since an AFT model is basically a log-linear model, meaningful interpretations of covariate effects on failure times can be made directly. However, estimating a semiparametric AFT model poses computational challenges even when it includes only time-fixed covariates, and the situation becomes much more complicated when time-varying covariates are included. In this paper, we propose a penalised likelihood approach to estimate the semiparametric AFT model with right-censored failure times, permitting both time-fixed and time-varying covariates. We adopt Gaussian basis functions to construct a smooth approximation to the nonparametric baseline hazard. This model-fitting method requires a constrained optimisation approach. A comprehensive simulation study demonstrates the performance of the proposed method. An application of our method to a motor neuron disease data set is provided.
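A minimal sketch of the basis construction described above: the baseline hazard is approximated by a non-negative linear combination of Gaussian basis functions. Knot placement, widths, and coefficients are hypothetical here; the paper estimates the coefficients by constrained penalised likelihood (keeping them non-negative) rather than fixing them.

```python
import numpy as np

def baseline_hazard(t, theta, centers, width):
    """Smooth baseline-hazard approximation h0(t) = sum_k theta_k phi_k(t),
    where phi_k is a Gaussian bump centred at centers[k]; theta >= 0
    keeps the hazard non-negative."""
    basis = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)
    return basis @ theta

t = np.linspace(0, 10, 200)
centers = np.linspace(0, 10, 8)        # evenly spaced Gaussian centres
theta = np.array([0.2, 0.1, 0.05, 0.05, 0.1, 0.2, 0.3, 0.4])  # hypothetical
h0 = baseline_hazard(t, theta, centers, width=1.5)
```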
Subjects
Statistical Models, Humans, Likelihood Functions, Proportional Hazards Models, Computer Simulation, Linear Models
ABSTRACT
Recurrent events are commonly encountered in biomedical studies. In many situations, there exist terminal events, such as death, that are potentially related to the recurrent events. Joint models of recurrent and terminal events have been proposed to address the correlation between them. However, there is a dearth of suitable methods to rigorously investigate the causal mechanisms between specific exposures, recurrent events, and terminal events. For example, it is of interest to know how much of the total effect of the primary exposure on the terminal event acts through the recurrent events, and whether preventing recurrent event occurrences could lead to better overall survival. In this work, we propose a formal causal mediation analysis method to compute the natural direct and indirect effects. A novel joint modeling approach is used that takes the recurrent event process as the mediator and the survival endpoint as the outcome, and that allows us to relax the commonly used "sequential ignorability" assumption. Simulation studies show that our new model has good finite-sample performance in estimating both model parameters and mediation effects. We apply our method to an AIDS study to evaluate how much of the comparative effectiveness of the two treatments, and of the effect of CD4 counts on overall survival, is mediated by recurrent opportunistic infections.
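The natural effects referred to above follow the standard mediation decomposition, written here with $Y(a, M(a'))$ denoting the outcome under exposure $a$ with the mediator set to its value under exposure $a'$; in the paper's setting $Y$ is the survival endpoint and $M$ the recurrent event process.

```latex
\mathrm{TE}
 \;=\; \underbrace{E\{Y(1, M(1))\} - E\{Y(1, M(0))\}}_{\text{natural indirect effect}}
 \;+\; \underbrace{E\{Y(1, M(0))\} - E\{Y(0, M(0))\}}_{\text{natural direct effect}} .
```

The indirect effect captures how much of the exposure's total effect operates by changing the recurrent event process, which is exactly the "how much is mediated" question posed above.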
Subjects
Statistical Models, Humans, Computer Simulation, Causality
ABSTRACT
Phase II/III clinical trials are efficient two-stage designs that test multiple experimental treatments. In stage 1, patients are allocated to the control and all experimental treatments, and the data collected from them are used to select experimental treatments that continue to stage 2. Patients recruited in stage 2 are allocated to the selected treatments and the control, and the combined stage 1 and stage 2 data are used for a confirmatory phase III analysis. An appropriate analysis needs to adjust for the selection bias introduced by using the stage 1 data for selection. Point estimators exist for normally distributed outcome data, but extending them to time-to-event data is not straightforward, because treatment selection is based on correlated treatment effects and because stage 1 patients who do not have events in stage 1 are followed up in stage 2. We have derived an approximately uniformly minimum variance conditional unbiased estimator (UMVCUE) and compared its bias and mean squared error to those of existing bias-adjusted estimators. In simulations, one existing bias-adjusted estimator has properties similar to the practically unbiased UMVCUE, while the others can have noticeable biases but are less variable than the UMVCUE. For confirmatory phase II/III clinical trials where unbiased estimators are desired, we recommend the UMVCUE or the existing estimator with similar properties.
Subjects
Patient Selection, Humans, Bias, Selection Bias
ABSTRACT
Window mean survival time (WMST), a simple extension of restricted mean survival time (RMST), allows clinicians to evaluate the mean survival difference between treatment groups within specific windows of time during the follow-up period of a trial. The advantages of WMST are numerous. Not only does it produce estimates of treatment effect that can be meaningfully interpreted, but it also has power advantages over competing methods when hazards are non-proportional (NPH). WMST, like RMST, is currently underutilized, owing to clinicians' lack of familiarity with tests comparing mean survival times and to the lack of tools facilitating trial design with this endpoint. The aim of this article is to provide investigators with insights and software for designing trials with WMST as the primary endpoint. Functions for performing power and sample size calculations are provided in the survWMST package for R, available on GitHub.
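Since WMST over a window $[t_0, t_1]$ is the area under the survival curve on that window, it equals $\mathrm{RMST}(t_1) - \mathrm{RMST}(t_0)$. The sketch below computes it from a Kaplan-Meier fit using the lifelines Python library; the data are simulated and hypothetical, and the survWMST package's power and sample-size machinery is not shown.

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

rng = np.random.default_rng(0)
latent = rng.exponential(scale=12.0, size=300)   # hypothetical event times
cens = rng.exponential(scale=30.0, size=300)     # hypothetical censoring times
t, e = np.minimum(latent, cens), latent <= cens  # observed right-censored data

kmf = KaplanMeierFitter().fit(t, event_observed=e)
t0, t1 = 6.0, 24.0                               # window of clinical interest
wmst = (restricted_mean_survival_time(kmf, t=t1)
        - restricted_mean_survival_time(kmf, t=t0))
print(f"WMST on [{t0}, {t1}]: {wmst:.2f}")
```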
Subjects
Research Design, Humans, Proportional Hazards Models, Survival Rate, Sample Size, Time Factors, Survival Analysis
ABSTRACT
The restricted mean survival time (RMST) is an appealing measure in clinical and epidemiological studies with censored survival outcomes and has received considerable attention in the past decades. It provides a useful alternative to the Cox model for evaluating covariate effects on survival time. The covariate effect on RMST usually varies with the restriction time, but existing methods cannot address this properly. In this article, we propose a semiparametric framework that directly models RMST as a function of the restriction time. Our proposed model adopts a widely used proportional form, enabling RMST predictions across an interval of restriction times using a unified model. Furthermore, the covariate effects at multiple restriction time points can be derived simultaneously. We develop estimators based on estimating-equation theory and establish their asymptotic properties. The finite-sample properties of the estimators are evaluated through extensive simulation studies. We further illustrate the application of our proposed method through the analysis of two real data examples. Supplementary materials are available online.
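For reference, RMST at restriction time $\tau$ is defined in the first display below. The second display shows one proportional specification consistent with the abstract's description; this is an assumption on our part, as the paper's exact form is not given here.

```latex
% RMST is the area under the survival curve up to the restriction time:
\mu(\tau \mid X) \;=\; E\{\min(T, \tau) \mid X\}
                 \;=\; \int_0^{\tau} S(t \mid X)\, dt .
% A proportional-form model in which the covariate effect may be read
% off at any restriction time (an assumption; the paper may differ):
\mu(\tau \mid X) \;=\; \mu_0(\tau)\, \exp\!\bigl(\beta^{\top} X\bigr).
```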
Subjects
Survival Rate, Humans, Proportional Hazards Models, Computer Simulation, Survival Analysis
ABSTRACT
Joint modeling and landmark modeling are the two mainstream approaches to dynamic prediction in longitudinal studies, that is, the prediction of a clinical event using longitudinally measured predictor variables available up to the time of prediction. Understanding which approach produces more accurate predictions is an important question both for methodological research and for practical users. There have been few previous studies on this topic, and the majority of results seemed to favor joint modeling. However, those studies were conducted in scenarios where the data were simulated from joint models, partly due to the widely recognized methodological difficulty of whether there exists a general joint distribution of longitudinal and survival data such that the landmark models, which consist of infinitely many working regression models for survival, hold simultaneously. As a result, the landmark models always operated under misspecification, which made the comparisons difficult to interpret. In this paper, we solve this problem with a novel algorithm that generates longitudinal and survival data satisfying the working assumptions of the landmark models. This innovation makes a "fair" comparison of joint modeling and landmark modeling possible in terms of model specification. Our simulation results demonstrate that the relative performance of the two modeling approaches depends on the data settings, and neither always dominates the other in prediction accuracy. These findings stress the importance of methodological development for both approaches. The related methodology is illustrated with a kidney transplantation dataset.
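For readers unfamiliar with landmarking, the pandas sketch below shows the standard construction of a landmark dataset at landmark time s with prediction window w: keep subjects still event-free at s, carry forward the last covariate value measured by s, and administratively censor at s + w. All column names are hypothetical, and the paper's novel data-generating algorithm is not reproduced.

```python
import pandas as pd

def landmark_dataset(df, s, w):
    """Build the landmark dataset at landmark time s with window w from a
    long-format frame with columns id, time, event, meas_time, marker
    (one row per longitudinal measurement; names are hypothetical)."""
    at_risk = df[df["time"] > s]                  # still under follow-up at s
    hist = at_risk[at_risk["meas_time"] <= s]     # measurements available by s
    lm = (hist.sort_values("meas_time")
              .groupby("id", as_index=False)
              .last())                            # last observation carried forward
    over = lm["time"] > s + w
    lm.loc[over, "event"] = 0                     # events beyond window -> censored
    lm["time"] = lm["time"].clip(upper=s + w)     # administrative censoring at s+w
    return lm                                     # fit a survival model per landmark s
```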
Subjects
Statistical Models, Humans, Computer Simulation, Longitudinal Studies
ABSTRACT
Joint models and statistical inference for longitudinal and survival data have been an active area of statistical research, mostly coupling a normally distributed, biomarker-based mixed-effects model for the longitudinal data with an event-time-based survival model. In practice, however, the following issues may stand out: (i) normality of the model error in the longitudinal model is a routine assumption, but it may be unrealistic, ignoring features of subject-level variation; (ii) the data collected often feature multiple longitudinal outcomes of mixed types that are significantly correlated, and ignoring their correlation may lead to biased estimation; additionally, a parametric model specification may be too inflexible to capture the complicated patterns of longitudinal data; (iii) missing observations in the longitudinal data are often encountered, the missing measurements are likely to be informative (nonignorable), and ignoring this phenomenon may result in inaccurate inference. Multilevel item response theory (MLIRT) models have been increasingly used to analyze multiple longitudinal outcomes of mixed types (i.e., continuous and categorical) in clinical studies. In this article, we develop an MLIRT-based semiparametric joint model with a skew-t distribution, consisting of an extended MLIRT model for the mixed types of multiple longitudinal outcomes and a Cox proportional hazards model, linked through random effects. A Bayesian approach is employed for joint modeling. Simulation studies are conducted to assess the performance of the proposed models and method. A real example from a primary biliary cirrhosis clinical study is analyzed to estimate parameters in the joint model and to evaluate the sensitivity of parameter estimates under various plausible nonignorable missing-data mechanisms.