ABSTRACT
To optimize colorectal cancer (CRC) surveillance, accurate information on the risk of developing CRC from premalignant lesions is essential. However, directly observing this risk is challenging since precursor lesions, i.e., advanced adenomas (AAs), are removed upon detection. Statistical methods for multistate models can estimate risks, but estimation is challenging due to low CRC incidence. We propose an outcome-dependent sampling (ODS) design for this problem in which we oversample CRCs. More specifically, we propose a three-state model for jointly estimating the time distributions from baseline colonoscopy to AA and from AA onset to CRC, accounting for the ODS design using a weighted likelihood approach. We applied the methodology to a sample from a Norwegian adenoma cohort (1993-2007), comprising 1,495 individuals (median follow-up 6.8 years [IQR: 1.1-12.8 years]) of whom 648 did and 847 did not develop CRC. We observed a 5-year AA risk of 13% and 34% for individuals having non-advanced adenoma (NAA) and AA removed at baseline colonoscopy, respectively. Upon AA development, the subsequent 5-year risk of developing CRC was 17% and was age-dependent. These estimates provide a basis for optimizing surveillance intensity and determining the optimal trade-off between CRC prevention, costs, and use of colonoscopy resources.
ABSTRACT
In longitudinal follow-up studies, panel count data arise from discrete observations on recurrent events. We investigate a more general situation where a partly interval-censored failure event is informative about the recurrent events. The existing methods for an informative failure event are based on the latent variable model, which provides only an indirect interpretation of the effect of the failure event. To solve this problem, we propose a failure-time-dependent proportional mean model for panel count data through an unspecified link function. For estimation of model parameters, we consider a conditional expectation of the least squares function to overcome the challenges from partly interval-censoring, and develop a two-stage estimation procedure by treating the distribution function of the failure time as a functional nuisance parameter and using B-spline functions to approximate the unknown baseline mean and link functions. Furthermore, we derive the overall convergence rate of the proposed estimators and establish the asymptotic normality of the finite-dimensional estimator and of functionals of the infinite-dimensional estimator. The proposed estimation procedure is evaluated by extensive simulation studies, in which the finite-sample performance agrees with the theoretical results. We further illustrate our method with a longitudinal healthy longevity study and draw some insightful conclusions.
Subject(s)
Health Status, Computer Simulation
ABSTRACT
This paper discusses regression analysis of interval-censored failure time data arising from semiparametric transformation models in the presence of missing covariates. Although some methods have been developed for the problem, they either apply only to limited situations or may have some computational issues. To address these limitations, we propose a new and unified two-step inference procedure that can be easily implemented using existing or standard software. The proposed method makes use of a set of working models to extract partial information from incomplete observations and yields a consistent estimator of regression parameters under the missing-at-random assumption. An extensive simulation study is conducted and indicates that the method performs well in practical situations. Finally, we apply the proposed approach to the Alzheimer's disease study that motivated this work.
Subject(s)
Alzheimer Disease, Computer Simulation, Statistical Models, Humans, Regression Analysis, Statistical Data Interpretation
ABSTRACT
BACKGROUND: Hormone therapy (HT) use among menopausal women declined after negative information from the 2002 Women's Health Initiative (WHI) HT study. The 2017 post-intervention follow-up WHI study revealed that HT did not increase long-term mortality. However, studies on the effects of the updated WHI findings are lacking. Thus, we assessed the impact of the 2017 WHI findings on HT use in Taiwan. METHODS: We identified 1,869,050 women aged 50-60 years, between June and December 2017, from health insurance claims data to compare HT use in the 3 months preceding and following September 2017. To address the limitations associated with interval-censored data, we employed an emulated repeated cross-sectional design. Using logistic regression analysis, we evaluated the impact of the 2017 WHI study on menopausal symptom-related outpatient visits and HT use. In a scenario analysis, we examined the impact of the 2002 trial on HT use to validate our study design. RESULTS: Study participants' baseline characteristics before and after the 2017 WHI study were not significantly different. Logistic regressions demonstrated that the 2017 study had no significant effect on outpatient visits for menopause-related symptoms or HT use among women with outpatient visits. The scenario analysis confirmed the negative impact of the 2002 WHI trial on HT use. CONCLUSIONS: The 2017 WHI study did not demonstrate any impact on either menopause-related outpatient visits or HT use among middle-aged women in Taiwan. Our emulated cross-sectional study design may be employed in similar population-based policy intervention studies using interval-censored data.
Subject(s)
Women's Health, Humans, Female, Cross-Sectional Studies, Middle Aged, Taiwan, Estrogen Replacement Therapy/statistics & numerical data, Menopause, Hormone Replacement Therapy/statistics & numerical data
ABSTRACT
The proportional hazards mixture cure model is a popular analysis method for survival data where a subgroup of patients are cured. When the data are interval-censored, the estimation of this model is challenging due to its complex data structure. In this article, we propose a computationally efficient semiparametric Bayesian approach, facilitated by spline approximation and Poisson data augmentation, for model estimation and inference with interval-censored data and a cure rate. The spline approximation and Poisson data augmentation greatly simplify the MCMC algorithm and enhance the convergence of the MCMC chains. The empirical properties of the proposed method are examined through extensive simulation studies and also compared with the R package "GORCure". The use of the proposed method is illustrated through analyzing a data set from the Aerobics Center Longitudinal Study.
Subject(s)
Algorithms, Statistical Models, Humans, Bayes Theorem, Longitudinal Studies, Proportional Hazards Models, Computer Simulation
ABSTRACT
Group variable selection is often required in many areas, and for this many methods have been developed under various situations. Unlike the individual variable selection, the group variable selection can select the variables in groups, and it is more efficient to identify both important and unimportant variables or factors by taking into account the existing group structure. In this paper, we consider the situation where one only observes interval-censored failure time data arising from the Cox model, for which there does not seem to exist an established method. More specifically, a penalized sieve maximum likelihood variable selection and estimation procedure is proposed and the oracle property of the proposed method is established. Also, an extensive simulation study is performed and suggests that the proposed approach works well in practical situations. An application of the method to a set of real data is provided.
Subject(s)
Proportional Hazards Models, Likelihood Functions, Regression Analysis, Computer Simulation
ABSTRACT
Current status data arise when each subject under study is examined only once at an observation time, and one only knows the failure status of the event of interest at the observation time rather than the exact failure time. Moreover, the obtained failure status is frequently subject to misclassification due to imperfect tests, yielding misclassified current status data. This article conducts regression analysis of such data with the semiparametric probit model, which serves as an important alternative to existing semiparametric models and has recently received considerable attention in failure time data analysis. We consider the nonparametric maximum likelihood estimation and develop an expectation-maximization (EM) algorithm by incorporating the generalized pool-adjacent-violators (PAV) algorithm to maximize the intractable likelihood function. The resulting estimators of regression parameters are shown to be consistent, asymptotically normal, and semiparametrically efficient. Furthermore, the numerical results in simulation studies indicate that the proposed method performs satisfactorily in finite samples and outperforms the naive method that ignores misclassification. We then apply the proposed method to a real dataset on chlamydia infection.
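As a rough illustration of the pool-adjacent-violators idea mentioned above, the sketch below computes the nonparametric maximum likelihood estimate of the failure time distribution for error-free current status data, which is the isotonic regression of the status indicators on the sorted observation times. This is not the authors' generalized PAV or EM algorithm and it ignores misclassification and covariates; the function names `pava` and `current_status_npmle` are hypothetical, and distinct observation times are assumed.

```python
def pava(y, w=None):
    """Weighted pool-adjacent-violators: nondecreasing least-squares fit to y."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    # Expand block means back to one fitted value per input point.
    fit = []
    for mean, _, n in blocks:
        fit.extend([mean] * n)
    return fit


def current_status_npmle(obs_times, status):
    """Estimate F at the sorted observation times (assumes distinct times,
    status = 1 if the event occurred by the observation time, else 0)."""
    order = sorted(range(len(obs_times)), key=lambda i: obs_times[i])
    times = [obs_times[i] for i in order]
    deltas = [status[i] for i in order]
    return times, pava(deltas)
```

For example, `current_status_npmle([3, 1, 2, 4], [1, 0, 1, 0])` pools the last three indicators into a single block because the final 0 violates monotonicity.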
ABSTRACT
Quarantine length for individuals who have been at risk for infection with SARS-CoV-2 has been based on estimates of the incubation time distribution. The time of infection is often not known exactly, yielding data with an interval-censored time origin. We give a detailed account of the data structure, likelihood formulation and assumptions usually made in the literature: (i) the risk of infection is assumed constant over the exposure window and (ii) the incubation time follows a specific parametric distribution. The impact of these assumptions remains unclear, especially for the right tail of the distribution, which informs quarantine policy. We quantified bias in percentiles by means of simulation studies that mimic reality as closely as possible. If assumption (i) is not correct, then the median and upper percentiles are affected similarly, whereas misspecification of the parametric distribution (ii) mainly affects upper percentiles. The latter may yield considerable bias. We suggest a semiparametric method that provides more robust estimates without the need for a parametric choice. Additionally, we used a simulation study to evaluate a method that has been suggested for the case in which all infection times are left censored. It assumes that the width of the interval from infection to latest possible exposure follows a uniform distribution. This assumption gave biased results in the exponential phase of an outbreak. Our application to open source data suggests that the focus should be on the level of information in the observations, as expressed by the width of exposure windows, rather than on the number of observations.
Subject(s)
COVID-19, Humans, COVID-19/epidemiology, SARS-CoV-2, Probability, Computer Simulation, Bias
ABSTRACT
BACKGROUND: Failure time data frequently occur in many medical studies and are often accompanied by various types of censoring. In some applications, left truncation may occur and can induce biased sampling, which complicates practical data analysis. The existing analysis methods for left-truncated data have some limitations in that they either focus only on a special type of censored data or fail to flexibly utilize the distribution information of the truncation times for inference. Therefore, it is essential to develop a reliable and efficient method for the analysis of left-truncated failure time data with various types of censoring. METHOD: This paper concerns regression analysis of left-truncated failure time data with the proportional hazards model under various types of censoring mechanisms, including right censoring, interval censoring and a mixture of them. The proposed pairwise pseudo-likelihood estimation method is essentially built on a combination of the conditional likelihood and the pairwise likelihood that eliminates the nuisance truncation distribution function or avoids its estimation. To implement the presented method, a flexible EM algorithm is developed by utilizing the idea of a self-consistent estimating equation. A main feature of the algorithm is that it involves closed-form estimators of the large-dimensional nuisance parameters and is thus computationally stable and reliable. In addition, an R package LTsurv is developed. RESULTS: The numerical results obtained from extensive simulation studies suggest that the proposed pairwise pseudo-likelihood method performs reasonably well in practical situations and is, as expected, clearly more efficient than the conditional likelihood approach. The analysis results of the MHCPS data with the proposed pairwise pseudo-likelihood method indicate that males have a significantly higher risk of losing active life than females. In contrast, the conditional likelihood method finds this effect to be non-significant, because the conditional likelihood method often loses some estimation efficiency compared with the proposed method. CONCLUSIONS: The proposed method provides a general and helpful tool for conducting Cox regression analysis of left-truncated failure time data under various types of censoring.
Subject(s)
Likelihood Functions, Humans, Statistical Data Interpretation, Proportional Hazards Models, Regression Analysis, Computer Simulation
ABSTRACT
The case-cohort design was developed to reduce costs when disease incidence is low and covariates are difficult to obtain. However, most of the existing methods are for right-censored data and there exists only limited research on interval-censored data, especially on regression analysis of bivariate interval-censored data. Interval-censored failure time data frequently occur in many areas and a large literature on their analyses has been established. In this paper, we discuss the situation of bivariate interval-censored data arising from case-cohort studies. For the problem, a class of semiparametric transformation frailty models is presented and for inference, a sieve weighted likelihood approach is developed. The large sample properties, including the consistency of the proposed estimators and the asymptotic normality of the regression parameter estimators, are established. Moreover, a simulation is conducted to assess the finite sample performance of the proposed method and suggests that it performs well in practice.
Subject(s)
Statistical Models, Humans, Likelihood Functions, Computer Simulation, Regression Analysis, Cohort Studies
ABSTRACT
The proportional hazards (PH) model is, arguably, the most popular model for the analysis of lifetime data arising from epidemiological studies, among many others. In such applications, analysts may be faced with censored outcomes and/or studies that institute enrollment criteria leading to left truncation. Censored outcomes arise when the event of interest is not observed exactly but is known only relative to one or more observation times. Left-truncated data occur in studies that exclude participants who have experienced the event prior to being enrolled in the study. If not accounted for, both of these features can lead to inaccurate inferences about the population under study. Thus, to overcome this challenge, herein we propose a novel unified PH model that can be used to accommodate both of these features. In particular, our approach can seamlessly analyze exactly observed failure times along with interval-censored observations, while aptly accounting for left truncation. To facilitate model fitting, an expectation-maximization algorithm is developed through the introduction of carefully structured latent random variables. To provide modeling flexibility, a monotone spline representation is used to approximate the cumulative baseline hazard function. The performance of our methodology is evaluated through a simulation study and is further illustrated through the analysis of two motivating data sets: one that involves child mortality in Nigeria and another on prostate cancer.
Subject(s)
Algorithms, Male, Child, Humans, Proportional Hazards Models, Computer Simulation
ABSTRACT
Interval-censored data analysis is important in biomedical statistics for any type of time-to-event response where the time of response is not known exactly, but rather only known to occur between two assessment times. Many clinical trials and longitudinal studies generate interval-censored data; one common example occurs in medical studies that entail periodic follow-up. In this article, we propose a survival forest method for interval-censored data based on the conditional inference framework. We describe how this framework can be adapted to the situation of interval-censored data. We show that the tuning parameters have a non-negligible effect on the survival forest performance and guidance is provided on how to tune the parameters in a data-dependent way to improve the overall performance of the method. Using Monte Carlo simulations, we find that the proposed survival forest is at least as effective as a survival tree method when the underlying model has a tree structure, performs similarly to an interval-censored Cox proportional hazards model fit when the true relationship is linear, and outperforms the survival tree method and Cox model when the true relationship is nonlinear. We illustrate the application of the method on a tooth emergence data set.
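To make the data structure described above concrete, here is a minimal sketch of the likelihood contribution of interval-censored observations: an event known only to lie in (L, R] contributes S(L) - S(R), where S is the survival function. The Weibull form and the function names are assumptions chosen for illustration; this is a simple parametric likelihood, not the survival forest method proposed in the abstract.

```python
import math


def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    if t <= 0:
        return 1.0
    if math.isinf(t):
        return 0.0
    return math.exp(-((t / scale) ** shape))


def interval_censored_loglik(intervals, shape, scale):
    """Log-likelihood for observations known only to lie in (left, right].

    Use right = math.inf for right-censored observations and left = 0
    for left-censored ones. Assumes left < right for every interval.
    """
    total = 0.0
    for left, right in intervals:
        # Probability that the event time falls inside the interval.
        prob = weibull_survival(left, shape, scale) - weibull_survival(right, shape, scale)
        total += math.log(prob)
    return total
```

Maximizing this function over `shape` and `scale` with a general-purpose optimizer would give parametric estimates; the intervals would come from consecutive assessment times in a periodic follow-up study.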
ABSTRACT
This paper discusses variable selection in the context of joint analysis of longitudinal data and failure time data. A large literature has been developed for either variable selection or the joint analysis but there exists only limited literature for variable selection in the context of the joint analysis when failure time data are right censored. Corresponding to this, we will consider the situation where instead of right-censored data, one observes interval-censored failure time data, a more general and commonly occurring form of failure time data. For the problem, a class of penalized likelihood-based procedures will be developed for simultaneous variable selection and estimation of relevant covariate effects for both longitudinal and failure time variables of interest. In particular, a Monte Carlo EM (MCEM) algorithm is presented for the implementation of the proposed approach. The proposed method allows for the number of covariates to be diverging with the sample size and is shown to have the oracle property. An extensive simulation study is conducted to assess the finite sample performance of the proposed approach and indicates that it works well in practical situations. An application is also provided.
Subject(s)
Algorithms, Research Design, Computer Simulation, Likelihood Functions, Statistical Models, Sample Size
ABSTRACT
BACKGROUND: In spite of the global reduction of 21% in malaria incidence between 2010 and 2015, the disease still threatens the lives of many children and pregnant mothers in African countries. A correct assessment and evaluation of the impact of malaria control strategies remains essential in order to eliminate the disease and its burden. Malaria follow-up studies typically involve routine visits at pre-scheduled time points and/or clinical visits whenever individuals experience malaria-like symptoms. In the latter case, infection triggers outcome assessment, thereby leading to outcome-dependent sampling (ODS). Commonly used methods to analyze such longitudinal data ignore ODS and potentially lead to biased estimates of malaria-specific transmission parameters, hence inducing an incorrect assessment and evaluation of malaria control strategies. METHODS: In this paper, a new method is proposed to handle ODS by use of a joint model for the longitudinal binary outcome measured at routine visits and the clinical event times. The methodology is applied to malaria parasitaemia data from a cohort of [Formula: see text] Ugandan children aged 0.5-10 years from 3 regions (Walukuba: 300 children, Kihihi: 355 children and Nagongera: 333 children) with varying transmission intensities (entomological inoculation rate equal to 2.8, 32 and 310 infectious bites per unit year, respectively) collected between 2011 and 2014. RESULTS: The results indicate that malaria parasite prevalence and force of infection (FOI) increase with age in the region of high malaria intensity, with the highest FOI in the age group 5-10 years. For the region of medium intensity, the prevalence slightly increases with age and the FOI for the routine process is highest in the age group 5-10 years, yet for the clinical infections, the FOI gradually decreases with increasing age. For the region with low intensity, both the prevalence and FOI peak at the age of 1 year, after which the former remains constant with age while the latter suddenly decreases with age for the clinically observed infections. CONCLUSION: Malaria parasite prevalence and FOI increase with age in the region of high malaria intensity. In all study sites, both the prevalence and FOI are highest among previously asymptomatic children and lowest among their symptomatic counterparts. In a simulation study inspired by the malaria data at hand, the proposed methodology is shown to have the smallest bias, especially when consecutive positive malaria parasitaemia results within a time period of 35 days were considered to be due to the same infection.
Subject(s)
Malaria, Child, Humans, Cohort Studies, Malaria/prevention & control, Parasitemia/epidemiology, Incidence, Prevalence
ABSTRACT
We consider efficient estimation of flexible transformation models with interval-censored data. To reduce the dimension of semiparametric models, the unknown monotone function is approximated via a monotone B-spline. A penalization technique is used to provide computationally efficient estimation of all parameters. To accomplish model fitting and inference, an easy-to-implement nested iterative expectation-maximization (EM) algorithm is developed for estimation, and a simple variance-covariance estimation approach is proposed which makes large-sample inference for the regression parameters possible. Theoretically, we show that the estimator of the unknown monotone increasing function achieves the optimal rate of convergence, and the estimators of the regression parameters are asymptotically normal and efficient under the appropriate selection of the order of the smoothing parameter and the knots of the spline space. The proposed penalized procedure is assessed through extensive numerical experiments and implemented in the R package PenIC. The proposed methodology is further illustrated via the Signal Tandmobiel study.
Subject(s)
Algorithms, Computer Simulation, Humans
ABSTRACT
In many scientific fields, partly interval-censored data, which consist of exactly observed and interval-censored observations on the failure time of interest, appear frequently. However, methodological developments in the analysis of partly interval-censored data are relatively limited and have mainly focused on additive or proportional hazards models. The general linear transformation model provides a highly flexible modeling framework that includes several familiar survival models as special cases. Despite these attractive features, an inference procedure for this class of models has not been developed for partly interval-censored data. We propose a fully Bayesian approach coupled with efficient Markov chain Monte Carlo methods to fill this gap. A four-stage data augmentation procedure is introduced to tackle the challenges presented by the complex model and data structure. The proposed method is easy to implement and computationally attractive. The empirical performance of the proposed method is evaluated through two simulation studies, and the model is then applied to a dental health study.
Subject(s)
Bayes Theorem, Computer Simulation, Humans, Markov Chains, Monte Carlo Method, Proportional Hazards Models
ABSTRACT
To compare two or more survival distributions with interval-censored data, various nonparametric tests have been proposed. Some are based on the $G^{\rho}$-family introduced by Harrington and Fleming (1991) that allows flexibility for situations in which the hazard ratio decreases monotonically to unity. However, it is unclear how to choose the appropriate value of the parameter $\rho$. In this work, we propose a novel linear rank-type test for analyzing interval-censored data that is derived from a proportional reversed hazard model. We show its relationship with a decreasing hazard ratio. This test statistic provides an alternative to the $G^{\rho}$-based test statistics by bypassing the choice of the $\rho$ parameter. Simulation results show its good behavior. Two studies on breast cancer and drug users illustrate its practical uses and highlight findings that would have been overlooked if other tests had been used. The test is easy to implement with standard software and can be used for a wide range of situations with interval-censored data to test the equality of survival distributions between two or more independent groups.
Subject(s)
Software, Cohort Studies, Computer Simulation, Humans, Proportional Hazards Models, Survival Analysis
ABSTRACT
BACKGROUND: To optimize colorectal cancer (CRC) screening and surveillance, information regarding the time-dependent risk of advanced adenomas (AA) to develop into CRC is crucial. However, since AA are removed after diagnosis, the time from AA to CRC cannot be observed in an ethically acceptable manner. We propose a statistical method to indirectly infer this time in a progressive three-state disease model using surveillance data. METHODS: Sixteen models were specified, with and without covariates. Parameters of the parametric time-to-event distributions from the adenoma-free state (AF) to AA and from AA to CRC were estimated simultaneously, by maximizing the likelihood function. Model performance was assessed via simulation. The methodology was applied to a random sample of 878 individuals from a Norwegian adenoma cohort. RESULTS: Estimates of the parameters of the time distributions are consistent and the 95% confidence intervals (CIs) have good coverage. For the Norwegian sample (AF: 78%, AA: 20%, CRC: 2%), a Weibull model for both transition times was selected as the final model based on information criteria. The mean time among those who have made the transition to CRC since AA onset within 50 years was estimated to be 4.80 years (95% CI: 0; 7.61). The 5-year and 10-year cumulative incidence of CRC from AA was 13.8% (95% CI: 7.8%;23.8%) and 15.4% (95% CI: 8.2%;34.0%), respectively. CONCLUSIONS: The time-dependent risk from AA to CRC is crucial to explain differences in the outcomes of microsimulation models used for the optimization of CRC prevention. Our method allows for improving models by the inclusion of data-driven time distributions.
Subject(s)
Adenoma, Colorectal Neoplasms, Adenoma/diagnosis, Adenoma/epidemiology, Colonoscopy, Colorectal Neoplasms/diagnosis, Colorectal Neoplasms/epidemiology, Early Detection of Cancer/methods, Humans, Incidence, Likelihood Functions
ABSTRACT
Interval-censored data occur in a study where the exact event time of each participant is not observed but it is known to be within a certain time interval. Multiple tests were proposed for such data, including the logrank test by Sun, the proportional hazard test by Finkelstein, and the Wilcoxon-type test by Peto and Peto. We propose sample size calculations based on these tests for a parallel one-stage or two-stage design. When the proportional hazard assumption is met, the proportional hazard test and the logrank test need smaller sample sizes than the Wilcoxon-type test, and the sample size savings are substantial. But this trend is reversed when the proportional hazard assumption does not hold, and the sample size savings using the Wilcoxon-type test are sizable. An example from a lung cancer clinical trial is used to illustrate the application of the proposed sample size calculations.
Subject(s)
Proportional Hazards Models, Computer Simulation, Humans, Sample Size, Survival Analysis, Time Factors
ABSTRACT
This paper discusses the fitting of the proportional hazards model to interval-censored failure time data with missing covariates. Many authors have discussed the problem when complete covariate information is available or when covariates are missing completely at random. In contrast, we focus on the situation where covariates are missing at random. For this problem, a sieve maximum likelihood estimation approach is proposed that uses I-spline functions to approximate the unknown cumulative baseline hazard function in the model. To implement the proposed method, we develop an EM algorithm based on a two-stage data augmentation. Furthermore, we show that the proposed estimators of regression parameters are consistent and asymptotically normal. The proposed approach is then applied to a set of data concerning Alzheimer's disease that motivated this study.