ABSTRACT
Quantitative bias analysis (QBA) permits assessment of the expected impact of various imperfections of the available data on the results and conclusions of a particular real-world study. This article extends QBA methodology to multivariable time-to-event analyses with right-censored endpoints, possibly including time-varying exposures or covariates. The proposed approach employs data-driven simulations, which preserve important features of the data at hand while offering flexibility in controlling the parameters and assumptions that may affect the results. First, the steps required to perform data-driven simulations are described, and then two examples of real-world time-to-event analyses illustrate their implementation and the insights they may offer. The first example focuses on the omission of an important time-invariant predictor of the outcome in a prognostic study of cancer mortality, and permits separating the expected impact of confounding bias from non-collapsibility. The second example assesses how imprecise timing of an interval-censored event - ascertained only at sparse times of clinic visits - affects its estimated association with a time-varying drug exposure. The simulation results also provide a basis for comparing the performance of two alternative strategies for imputing the unknown event times in this setting. The R scripts that permit the reproduction of our examples are provided.
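The abstract does not say which two imputation strategies were compared. As a minimal sketch, assuming two common choices (the midpoint of the interval and a uniform random draw within it), the hypothetical helper impute_event_times below fills in an unknown event time between the last visit without the event and the first visit at which it was ascertained, before a standard time-to-event fit.

```python
import numpy as np

def impute_event_times(left, right, strategy="midpoint", rng=None):
    """Impute an event time inside each interval (left, right].

    Hypothetical helper: 'midpoint' returns (left + right) / 2,
    'uniform' draws a time uniformly within the interval.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if np.any(right < left):
        raise ValueError("each right endpoint must be >= its left endpoint")
    if strategy == "midpoint":
        return (left + right) / 2.0
    if strategy == "uniform":
        rng = np.random.default_rng() if rng is None else rng
        return rng.uniform(left, right)
    raise ValueError(f"unknown strategy: {strategy}")

# Event ascertained between visits at months 3 and 9, and between 10 and 12.
print(impute_event_times([3.0, 10.0], [9.0, 12.0]))
```

Midpoint imputation is deterministic and easy to audit; uniform imputation spreads the imputed times over the interval and is usually repeated several times in a simulation.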
ABSTRACT
Multivariate panel count data arise when there are multiple types of recurrent events, and the observation for each study subject consists of the number of recurrent events of each type between two successive examinations. We formulate the effects of potentially time-dependent covariates on multiple types of recurrent events through proportional rates models, while leaving the dependence structures of the related recurrent events completely unspecified. We employ nonparametric maximum pseudo-likelihood estimation under the working assumptions that all types of events are independent and each type of event is a nonhomogeneous Poisson process, and we develop a simple and stable EM-type algorithm. We show that the resulting estimators of the regression parameters are consistent and asymptotically normal, with a covariance matrix that can be estimated consistently by a sandwich estimator. In addition, we develop a class of graphical and numerical methods for checking the adequacy of the fitted model. Finally, we evaluate the performance of the proposed methods through simulation studies and analysis of a skin cancer clinical trial.
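To make the data structure concrete, here is a minimal sketch, not the authors' estimator, that simulates panel counts for one event type under the working assumption named above: a nonhomogeneous Poisson process with proportional rates. The baseline mean function Lambda0(t) = t**1.5 and the helper name simulate_panel_counts are hypothetical choices for illustration; under the working independence assumption, several event types would simply be simulated independently.

```python
import numpy as np

def simulate_panel_counts(visit_times, x, beta, rng, shape=1.5):
    """Counts of one event type between successive examinations.

    Assumes a nonhomogeneous Poisson process with hypothetical baseline
    mean function Lambda0(t) = t**shape and rate multiplier exp(x @ beta).
    """
    t = np.concatenate(([0.0], np.asarray(visit_times, dtype=float)))
    increments = np.diff(t ** shape)      # Lambda0(t_j) - Lambda0(t_{j-1})
    rate = np.exp(np.dot(x, beta))        # proportional rates effect
    return rng.poisson(rate * increments)

rng = np.random.default_rng(3)
print(simulate_panel_counts([1.0, 2.0, 4.0], [1.0], [0.5], rng))
```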
Subjects
Skin Neoplasms , Humans , Computer Simulation , Models, Statistical , Skin Neoplasms/epidemiology , Clinical Trials as Topic
ABSTRACT
Interval-censored failure time data frequently arise in scientific studies where each subject undergoes periodic examinations for the occurrence of the failure event of interest, so that the failure time is only known to lie in a specific time interval. In addition, the collected data may include multiple correlated observed variables, which can cause severe multicollinearity. This work proposes a factor-augmented transformation model to analyze interval-censored failure time data while reducing model dimensionality and avoiding the multicollinearity induced by multiple correlated covariates. We provide a joint modeling framework comprising a factor analysis model, which groups multiple observed variables into a few latent factors, and a class of semiparametric transformation models with the augmented factors, which examine the effects of these factors and other covariates on the failure event. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study is provided. An R package, ICTransCFA, is also available for practitioners. Data used in preparation of this article were obtained from the ADNI database.
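The nonparametric maximum likelihood and EM ideas above build on the classical self-consistency algorithm for interval-censored data. The following sketch implements only that building block (a Turnbull-type NPMLE without covariates), not the factor-augmented transformation model or the ICTransCFA package; the function name and the use of the unique right endpoints as candidate support points are assumptions of this illustration.

```python
import numpy as np

def turnbull_npmle(left, right, tol=1e-8, max_iter=10_000):
    """Self-consistency (EM) NPMLE of the event-time distribution from
    interval-censored observations T_i in (left_i, right_i].

    Candidate support points are the unique right endpoints; set
    right_i = np.inf for a right-censored subject. Requires left_i < right_i.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if np.any(left >= right):
        raise ValueError("each interval needs left < right")
    support = np.unique(right)
    # alpha[i, j] is True when candidate point j lies in subject i's interval
    alpha = (support[None, :] > left[:, None]) & (support[None, :] <= right[:, None])
    p = np.full(support.size, 1.0 / support.size)
    for _ in range(max_iter):
        denom = alpha @ p                          # P(T_i in interval i) under current p
        # E-step and M-step combined: average posterior mass on each point
        p_new = (alpha * p / denom[:, None]).mean(axis=0)
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return support, p

support, mass = turnbull_npmle([0, 0, 1], [2, 1, 3])
print(support, mass.round(4))
```

For the intervals (0, 2], (0, 1] and (1, 3], the estimate places mass of about 0.5 at t = 1 and t = 2 and essentially none at t = 3, because only the innermost intervals can carry mass.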
Subjects
Alzheimer Disease , Computer Simulation , Models, Statistical , Humans , Likelihood Functions , Algorithms , Neuroimaging , Factor Analysis, Statistical , Data Interpretation, Statistical , Time Factors
ABSTRACT
Many longitudinal studies are designed to monitor participants for major events related to the progression of diseases. Data arising from such longitudinal studies are usually subject to interval censoring since the events are only known to occur between two monitoring visits. In this work, we propose a new method to handle interval-censored multistate data within a proportional hazards model framework where the hazard rate of events is modeled by a nonparametric function of time and the covariates affect the hazard rate proportionally. The main idea of this method is to simplify the likelihood functions of a discrete-time multistate model through an approximation and the application of data augmentation techniques, where the assumed presence of censored information facilitates a simpler parameterization. Then the expectation-maximization algorithm is used to estimate the parameters in the model. The performance of the proposed method is evaluated by numerical studies. Finally, the method is employed to analyze a dataset on tracking the advancement of coronary allograft vasculopathy following heart transplantation.
Subjects
Algorithms , Heart Transplantation , Proportional Hazards Models , Humans , Likelihood Functions , Heart Transplantation/statistics & numerical data , Longitudinal Studies , Computer Simulation , Models, Statistical , Data Interpretation, Statistical
ABSTRACT
In studies that assess disease status periodically, time of disease onset is interval censored between visits. Participants who die between two visits may have unknown disease status after their last visit. In this work, we consider an additional scenario where diagnosis requires two consecutive positive tests, such that disease status can also be unknown at the last visit preceding death. We show that this impacts the choice of censoring time for those who die without an observed disease diagnosis. We investigate two classes of models that quantify the effect of risk factors on disease outcome: a Cox proportional hazards model with death as a competing risk and an illness death model that treats disease as a possible intermediate state. We also consider four censoring strategies: participants without observed disease are censored at death (Cox model only), the last visit, the last visit with a negative test, or the second last visit. We evaluate the performance of model and censoring strategy combinations on simulated data with a binary risk factor and illustrate with a real data application. We find that the illness death model with censoring at the second last visit shows the best performance in all simulation settings. Other combinations show bias that varies in magnitude and direction depending on the differential mortality between diseased and disease-free subjects, the gap between visits, and the choice of the censoring time.
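The four censoring strategies can be written as a small helper for a participant who died without an observed diagnosis. This is a hypothetical sketch: the function name, argument layout, and strategy labels are illustrative assumptions, and the "death" option applies to the Cox model only, as stated above.

```python
def censoring_time(visit_times, positive, death_time, strategy):
    """Censoring time for a participant who died without an observed diagnosis.

    visit_times: sorted visit times; positive: test result at each visit.
    Hypothetical helper covering the four strategies in the abstract.
    """
    if len(visit_times) != len(positive) or len(visit_times) == 0:
        raise ValueError("need one test result per visit and at least one visit")
    if strategy == "death":                 # Cox model with death as competing risk only
        return death_time
    if strategy == "last_visit":
        return visit_times[-1]
    if strategy == "last_negative":
        negatives = [t for t, pos in zip(visit_times, positive) if not pos]
        if not negatives:
            raise ValueError("no negative test recorded")
        return negatives[-1]
    if strategy == "second_last":
        if len(visit_times) < 2:
            raise ValueError("need at least two visits")
        return visit_times[-2]
    raise ValueError(f"unknown strategy: {strategy}")

# An unconfirmed positive at the last visit before death at t = 3.5.
visits, tests = [0.0, 1.0, 2.0, 3.0], [False, False, False, True]
for s in ("death", "last_visit", "last_negative", "second_last"):
    print(s, censoring_time(visits, tests, 3.5, s))
```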
Subjects
Proportional Hazards Models , Humans , Computer Simulation , Risk Factors
ABSTRACT
BACKGROUND: Estimation of the SARS-CoV-2 incubation time distribution is hampered by incomplete data about infection. We discuss two biases that may result from incorrect handling of such data. Notified cases may recall recent exposures more precisely (differential recall). This creates bias if the analysis is restricted to observations with well-defined exposures, as longer incubation times are more likely to be excluded. Another bias occurred in the initial estimates based on data concerning travellers from Wuhan. Only individuals who developed symptoms after their departure were included, leading to under-representation of cases with shorter incubation times (left truncation). This issue was not addressed in the analyses performed in the literature. METHODS: We performed simulations and provide a literature review to investigate the amount of bias in estimated percentiles of the SARS-CoV-2 incubation time distribution. RESULTS: Depending on the rate of differential recall, restricting the analysis to a subset of narrow exposure windows resulted in underestimation of the median and, to a greater extent, of the 95th percentile. Failing to account for left truncation led to overestimation of both the median and the 95th percentile by several days. CONCLUSION: We examined two overlooked sources of bias concerning exposure information that researchers estimating incubation times need to be aware of.
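The left-truncation mechanism can be reproduced with a short simulation. The sketch below assumes, purely for illustration, that infection occurs uniformly over the 20 days before departure and that incubation times follow a gamma distribution with shape 5 and scale 1.2 days; these values are not taken from the study. Keeping only travellers whose symptoms began after departure shifts the naive median and 95th percentile upward (by roughly one day for the median under these made-up settings), because short incubation times are under-represented.

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 200_000
departure = 20.0                                      # hypothetical departure day
infection = rng.uniform(0.0, departure, size=n)       # exposure before departure
incubation = rng.gamma(shape=5.0, scale=1.2, size=n)  # hypothetical true incubation times

# Left truncation: only travellers with symptom onset after departure are sampled.
observed = infection + incubation > departure

true_median, true_p95 = np.percentile(incubation, [50, 95])
naive_median, naive_p95 = np.percentile(incubation[observed], [50, 95])
print(f"median: true {true_median:.2f}, naive {naive_median:.2f} days")
print(f"95th percentile: true {true_p95:.2f}, naive {naive_p95:.2f} days")
```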
Subjects
Bias , COVID-19 , Infectious Disease Incubation Period , SARS-CoV-2 , Humans , COVID-19/epidemiology , Computer Simulation
ABSTRACT
Breast cancer is the most common cancer in women. Previous studies have investigated estimating and predicting proportional hazard rates and survival in breast cancer. This study deals with predicting the accelerated hazards (AH) rate based on age categories in breast cancer patients using deep learning methods. The AH has a time-dependent structure whose rate changes according to time and variable effects. We collected data on 1225 female patients with breast cancer at the Mandarin University of Medical Sciences. The patients' demographic and clinical characteristics, including family history, age, history of tobacco use, hysterectomy, age at first menstruation, gravida, number of breastfeeding episodes, disease grade, marital status, and survival status, were recorded. Initially, we dealt with predicting three age groups of patients: ≤ 40, 41-60, and ≥ 61 years. Then, the prediction of the accelerated hazard value based on age categories for each breast cancer patient through deep learning, and the importance of variables using LightGBM, are discussed. Improving clinical management and treatment of breast cancer requires advanced methods such as time-dependent AH calculation. When the behavioral effect is assumed to be a time-scale change between hazard functions, the AH model is more appropriate for randomized clinical trials. The results indicate that the proposed model performs well in predicting AH by age category from breast cancer patients' demographic and clinical characteristics.
Subjects
Breast Neoplasms , Deep Learning , Humans , Female , Middle Aged , Adult , Age Factors , Aged , Proportional Hazards Models , Time Factors
ABSTRACT
In practical survival analysis, some patients may experience no event even after a long waiting time, which suggests that a portion of the population may never experience the event of interest. In this situation, one remedy is to analyze the survival data with a mixture cure Cox model. However, if the survival times clearly exhibit an acceleration (or deceleration) factor, an accelerated failure time (AFT) model is preferable, leading to a mixture cure AFT model. In this paper, we consider a penalized likelihood method to estimate mixture cure semiparametric AFT models, where the unknown baseline hazard is approximated using Gaussian basis functions. We allow partly interval-censored survival data, which can include event times and left-, right-, and interval-censoring times. The penalty function helps to achieve a smooth estimate of the baseline hazard function. We also derive asymptotic properties of the estimates so that inferences can be made on regression parameters and hazard-related quantities. Simulation studies are conducted to evaluate the model performance, including a comparison with an existing method from the smcure R package. The results show that our proposed penalized likelihood method performs acceptably in general and produces less bias than smcure when faced with the identifiability issue. To illustrate the application of our method, a real case study involving melanoma recurrence is conducted and reported. Our model is implemented in our R package aftQnp, which is available from https://github.com/Isabellee4555/aftQnP.
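The structure of a mixture cure AFT model can be sketched by computing the population survival function S_pop(t) = 1 - pi(z) + pi(z) S0(t exp(-x'beta)), where pi(z) is the probability of being uncured. This sketch assumes a logistic incidence model and a Weibull baseline survival for the uncured; the method above instead approximates the baseline hazard with penalized Gaussian basis functions, so the Weibull choice and the function name are illustrative only.

```python
import numpy as np

def population_survival(t, x, z, beta, gamma, shape=1.5, scale=2.0):
    """Population survival under a mixture cure AFT model (sketch).

    Uncured probability: logistic(z @ gamma). Latency: AFT model
    T = T0 * exp(x @ beta) with a hypothetical Weibull baseline
    S0(u) = exp(-(u / scale) ** shape).
    """
    t = np.asarray(t, dtype=float)
    uncured = 1.0 / (1.0 + np.exp(-np.dot(z, gamma)))
    u = t * np.exp(-np.dot(x, beta))        # accelerated (rescaled) time
    s_uncured = np.exp(-(u / scale) ** shape)
    return (1.0 - uncured) + uncured * s_uncured

print(population_survival([0.0, 2.0, 50.0], [1.0], [1.0], [0.5], [0.0]))
```

As t grows, the curve levels off at the cure fraction 1 - pi(z), which is the plateau that motivates a cure model.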
Subjects
Biometry , Models, Statistical , Humans , Biometry/methods , Survival Analysis , Time Factors , Proportional Hazards Models , Likelihood Functions , Melanoma/drug therapy
ABSTRACT
Competing risk data are frequently interval-censored, that is, the exact event time is not observed but only known to lie between two examination time points such as clinic visits. In addition to interval censoring, another common complication is that the event type is missing for some study participants. In this article, we propose an augmented inverse probability weighted sieve maximum likelihood estimator for the analysis of interval-censored competing risk data in the presence of missing event types. The estimator imposes a weaker-than-usual missing-at-random assumption by allowing for the inclusion of auxiliary variables that are potentially associated with the probability of missingness. The proposed estimator is shown to be doubly robust, in the sense that it is consistent even if either the model for the probability of missingness or the model for the probability of the event type is misspecified. Extensive Monte Carlo simulation studies show good performance of the proposed method even with a large amount of missing event types. The method is illustrated using data from an HIV cohort study in sub-Saharan Africa, where a significant portion of event types is missing. The proposed method can be readily implemented using the new function ciregic_aipw in the R package intccr.
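The double robustness property can be illustrated with the AIPW estimator of a simple mean, m(X) + R (Y - m(X)) / pi(X), where R indicates that the event type is observed. This is a minimal sketch under simplified assumptions (a single binary covariate, no interval censoring, and working models supplied directly), not the sieve estimator or the ciregic_aipw function; all names and parameter values are hypothetical.

```python
import numpy as np

def aipw_mean(y, observed, pi_hat, m_hat):
    """AIPW estimate of E[Y] when Y is missing for some subjects.

    pi_hat: working model for P(observed | X); m_hat: working model
    for E[Y | X]. Consistent if either working model is correct.
    """
    r = np.asarray(observed, dtype=float)
    y = np.where(r == 1, np.asarray(y, dtype=float), 0.0)  # unobserved Y never used
    pi_hat = np.asarray(pi_hat, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    return float(np.mean(m_hat + r * (y - m_hat) / pi_hat))

rng = np.random.default_rng(1)
n = 200_000
x = rng.binomial(1, 0.5, n)
y = rng.binomial(1, 0.3 + 0.4 * x)          # event type indicator, true mean 0.5
p_obs = 0.9 - 0.5 * x                       # type observed less often when x = 1
observed = rng.binomial(1, p_obs).astype(bool)

cc = y[observed].mean()                                         # complete case
est_right_pi = aipw_mean(y, observed, p_obs, np.full(n, 0.1))   # wrong outcome model
est_right_m = aipw_mean(y, observed, np.full(n, 0.65), 0.3 + 0.4 * x)  # wrong missingness model
print(round(cc, 3), round(est_right_pi, 3), round(est_right_m, 3))
```

In this simulated setting the complete-case proportion is biased, while the AIPW estimate stays close to the true value 0.5 when either the missingness model or the outcome model is correct.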
Subjects
Incidence , Cohort Studies , Computer Simulation , Humans , Monte Carlo Method , Probability
ABSTRACT
We develop methods for assessing the predictive accuracy of a given event time model when the validation sample is composed of case $K$ interval-censored data. An imputation-based, an inverse probability weighted (IPW), and an augmented inverse probability weighted (AIPW) estimator are developed and evaluated for the mean prediction error and the area under the receiver operating characteristic curve when the goal is to predict event status at a landmark time. The weights used for the IPW and AIPW estimators are obtained by fitting a multistate model which jointly considers the event process, the recurrent assessment process, and loss to follow-up. We empirically investigate the performance of the proposed methods and illustrate their application in the context of a motivating rheumatology study in which human leukocyte antigen markers are used to predict disease progression status in patients with psoriatic arthritis.
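Once weights are available, an IPW version of the AUC at a landmark time reduces to a weighted concordance between cases and controls. The sketch below takes the weights as given (above, they come from a multistate model, which is not reproduced here); the function name and the half-credit handling of tied scores are assumptions.

```python
import numpy as np

def weighted_auc(scores, case, weights):
    """IPW-style AUC: weighted probability that a case outranks a control.

    case: event status at the landmark time (True = event);
    weights: per-subject weights, assumed already estimated.
    Tied scores count one half.
    """
    s = np.asarray(scores, dtype=float)
    c = np.asarray(case, dtype=bool)
    w = np.asarray(weights, dtype=float)
    diff = s[c][:, None] - s[~c][None, :]            # case score minus control score
    concord = (diff > 0) + 0.5 * (diff == 0)
    pair_w = w[c][:, None] * w[~c][None, :]          # weight of each case-control pair
    return float(np.sum(pair_w * concord) / np.sum(pair_w))

print(weighted_auc([0.9, 0.3, 0.5, 0.1], [True, True, False, False], [1, 3, 1, 1]))
```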
Subjects
ROC Curve , Biomarkers , Humans , Probability
ABSTRACT
Copula is a popular method for modeling the dependence among marginal distributions in multivariate censored data. As many copula models are available, it is essential to check whether the chosen copula model fits the data well. Existing approaches to testing the goodness of fit of copula models are mainly for complete or right-censored data. No formal goodness-of-fit (GOF) test exists for interval-censored or recurrent events data. We develop a general GOF test for copula-based survival models using the information ratio (IR) to address this research gap. It can be applied to any copula family with a parametric form, such as the frequently used Archimedean, Gaussian, and D-vine families. The test statistic is easy to calculate, and the test procedure is straightforward to implement. We establish the asymptotic properties of the test statistic. The simulation results show that the proposed test controls the type I error well and achieves adequate power when the dependence strength is moderate to high. Finally, we apply our method to test various copula models in analyzing multiple real datasets. Our method consistently separates different copula models for all these datasets in terms of model fit.
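For intuition about dependence strength in the Archimedean family, the Clayton copula can be sampled with the Marshall-Olkin (gamma frailty) algorithm, and its Kendall's tau equals theta / (theta + 2). This sketch only generates data of the kind a goodness-of-fit test would be applied to; it does not implement the information ratio test, and the helper names are hypothetical.

```python
import numpy as np

def sample_clayton(n, theta, rng):
    """Draw n pairs (U1, U2) from a Clayton copula, theta > 0 (Marshall-Olkin)."""
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)   # shared gamma frailty
    e = rng.exponential(size=(n, 2))
    return (1.0 + e / v[:, None]) ** (-1.0 / theta)      # generator (1 + s)^(-1/theta)

def kendall_tau(u):
    """Sample Kendall's tau from all pairwise comparisons (O(n^2) memory)."""
    a = np.sign(u[:, 0][:, None] - u[:, 0][None, :])
    b = np.sign(u[:, 1][:, None] - u[:, 1][None, :])
    n = len(u)
    return float((a * b).sum() / (n * (n - 1)))          # each pair counted twice

rng = np.random.default_rng(11)
u = sample_clayton(1500, 2.0, rng)
print(kendall_tau(u), 2.0 / (2.0 + 2.0))                 # empirical vs theoretical tau
```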
Subjects
Models, Statistical , Humans , Computer Simulation
ABSTRACT
Alzheimer's disease (AD) is a progressive and polygenic disorder that affects millions of individuals each year. Given that few effective treatments for AD exist, it is highly desirable to develop an accurate model to predict the full disease progression profile based on an individual's genetic characteristics for early prevention and clinical management. This work uses data from all four phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, including 1740 individuals with 8 million genetic variants. We tackle several challenges in these data, characterized by large-scale genetic data, interval-censored outcomes due to intermittent assessments, and left truncation in one study phase (ADNIGO). Specifically, we first develop a semiparametric transformation model for interval-censored and left-truncated data and estimate parameters through a sieve approach. Then we propose a computationally efficient generalized score test to identify variants associated with AD progression. Next, we implement a novel neural network on interval-censored data (NN-IC) to construct a prediction model using the top variants identified from the genome-wide test. Comprehensive simulation studies show that the NN-IC outperforms several existing methods in terms of prediction accuracy. Finally, we apply the NN-IC to the full ADNI data and successfully identify subgroups with differential progression risk profiles. Data used in the preparation of this article were obtained from the ADNI database.
Subjects
Alzheimer Disease , Humans , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/genetics , Disease Progression , Neuroimaging/methods , Treatment Outcome , Neural Networks, Computer
ABSTRACT
The restricted mean survival time (RMST) evaluates the expectation of survival time truncated by a prespecified time point, because the mean survival time in the presence of censoring is typically not estimable. The frequentist inference procedure for RMST has been widely advocated for comparison of two survival curves, while research from the Bayesian perspective is rather limited. For the RMST of both right- and interval-censored data, we propose Bayesian nonparametric estimation and inference procedures. By assigning a mixture of Dirichlet processes (MDP) prior to the distribution function, we can estimate the posterior distribution of RMST. We also explore another Bayesian nonparametric approach using the Dirichlet process mixture model and make comparisons with the frequentist nonparametric method. Simulation studies demonstrate that the Bayesian nonparametric RMST under diffuse MDP priors leads to robust estimation and under informative priors it can incorporate prior knowledge into the nonparametric estimator. Analysis of real trial examples demonstrates the flexibility and interpretability of the Bayesian nonparametric RMST for both right- and interval-censored data.
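As a point of reference for the frequentist comparator mentioned above, the RMST up to tau is the area under the Kaplan-Meier curve on [0, tau]. The sketch below covers right-censored data only and is not the Bayesian MDP procedure; the function name is hypothetical, and tau should not exceed the largest follow-up time.

```python
import numpy as np

def km_rmst(time, event, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau]."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event & (time <= tau)])
    t_prev, surv, area = 0.0, 1.0, 0.0
    for t in event_times:
        area += surv * (t - t_prev)                 # rectangle under the current step
        at_risk = np.sum(time >= t)                 # censored at t still at risk
        deaths = np.sum((time == t) & event)
        surv *= 1.0 - deaths / at_risk
        t_prev = t
    area += surv * (tau - t_prev)                   # last step up to tau
    return float(area)

# Without censoring, RMST equals the mean of min(T, tau): (1 + 2 + 3 + 3) / 4.
print(km_rmst([1, 2, 3, 4], [1, 1, 1, 1], 3.0))
```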
Subjects
Bayes Theorem , Survival Rate , Computer Simulation , Survival Analysis
ABSTRACT
The rapid acceleration of genetic data collection in biomedical settings has recently resulted in the rise of genetic compendiums filled with rich longitudinal disease data. One common feature of these data sets is their plethora of interval-censored outcomes. However, very few tools are available for the analysis of genetic data sets with interval-censored outcomes, and in particular, there is a lack of methodology available for set-based inference. Set-based inference is used to associate a gene, biological pathway, or other genetic construct with outcomes and is one of the most popular strategies in genetics research. This work develops three such tests for interval-censored settings beginning with a variance components test for interval-censored outcomes, the interval-censored sequence kernel association test (ICSKAT). We also provide the interval-censored version of the Burden test, and then we integrate ICSKAT and Burden to construct the interval censored sequence kernel association test-optimal (ICSKATO) combination. These tests unlock set-based analysis of interval-censored data sets with analogs of three highly popular set-based tools commonly applied to continuous and binary outcomes. Simulation studies illustrate the advantages of the developed methods over ad hoc alternatives, including protection of the type I error rate at very low levels and increased power. The proposed approaches are applied to the investigation that motivated this study, an examination of the genes associated with bone mineral density deficiency and fracture risk.
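The relation between the Burden, SKAT, and SKAT-O statistics can be shown directly from per-variant score statistics U_j and weights w_j: Q_burden = (sum_j w_j U_j)^2, Q_SKAT = sum_j w_j^2 U_j^2, and the optimal combination searches over Q_rho = (1 - rho) Q_SKAT + rho Q_burden. This sketch assumes the scores have already been computed (in ICSKAT they would come from an interval-censored null model) and omits the p-value calculations; the function name is hypothetical.

```python
import numpy as np

def set_based_statistics(scores, weights, rho):
    """Burden, SKAT-type, and SKAT-O-type combined statistics.

    scores: per-variant score statistics U_j; weights: per-variant
    weights w_j; rho in [0, 1] mixes the two statistics.
    """
    wu = np.asarray(weights, dtype=float) * np.asarray(scores, dtype=float)
    q_burden = float(wu.sum() ** 2)                 # effects assumed in the same direction
    q_skat = float(np.sum(wu ** 2))                 # variance-component statistic
    q_rho = (1.0 - rho) * q_skat + rho * q_burden
    return q_burden, q_skat, q_rho

# Opposite-signed effects cancel in the burden statistic but not in SKAT.
print(set_based_statistics([1.0, -2.0, 0.5], [1.0, 1.0, 2.0], 0.5))
```

Setting rho = 0 recovers the SKAT statistic and rho = 1 the burden statistic, which is why the combination adapts to whether variant effects share a direction.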
Subjects
Computer Simulation , Genetic Association Studies , Data Interpretation, Statistical
ABSTRACT
We propose a unified framework for likelihood-based regression modeling when the response variable has finite support. Our work is motivated by the fact that, in practice, observed data are discrete and bounded. The proposed methods assume a model which includes models previously considered for interval-censored variables with log-concave distributions as special cases. The resulting log-likelihood is concave, which we use to establish asymptotic normality of its maximizer as the number of observations $n$ tends to infinity with the number of parameters $d$ fixed, and rates of convergence of $L_1$-regularized estimators when the true parameter vector is sparse and $d$ and $n$ both tend to infinity with $\log(d)/n \rightarrow 0$. We consider an inexact proximal Newton algorithm for computing estimates and give theoretical guarantees for its convergence. The range of possible applications is wide, including but not limited to survival analysis in discrete time, the modeling of outcomes on scored surveys and questionnaires, and, more generally, interval-censored regression. The applicability and usefulness of the proposed methods are illustrated in simulations and data examples.
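A simple example of this kind of model, assumed here for illustration, is the cumulative logit model on the support {0, ..., K}, with P(Y = k | x) = F(theta_k - x'beta) - F(theta_{k-1} - x'beta) and F logistic; because the logistic distribution is log-concave, its log-likelihood is concave. The sketch below evaluates that log-likelihood and the soft-thresholding operator, which is the proximal map of the L1 penalty used inside proximal-type updates. It does not implement the inexact proximal Newton algorithm itself, and the function names are hypothetical.

```python
import numpy as np

def finite_support_loglik(y, X, beta, cutpoints):
    """Log-likelihood of a cumulative-logit model for y in {0, ..., K}.

    cutpoints: increasing thresholds theta_0 < ... < theta_{K-1}.
    P(Y = k | x) = F(theta_k - x'beta) - F(theta_{k-1} - x'beta), F logistic.
    """
    y = np.asarray(y, dtype=int)
    eta = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    theta = np.concatenate(([-np.inf], np.asarray(cutpoints, dtype=float), [np.inf]))
    upper = 1.0 / (1.0 + np.exp(-(theta[y + 1] - eta)))   # F at the upper threshold
    lower = 1.0 / (1.0 + np.exp(-(theta[y] - eta)))       # F at the lower threshold
    return float(np.sum(np.log(upper - lower)))

def soft_threshold(z, lam):
    """Proximal operator of lam * ||.||_1: shrink each entry toward zero by lam."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

print(finite_support_loglik([0, 2, 3], [[0.7], [0.1], [1.5]], [1.0], [-1.0, 0.5, 2.0]))
print(soft_threshold([3.0, -0.5, -2.0], 1.0))
```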
Subjects
Algorithms , Likelihood Functions , Regression Analysis , Computer Simulation , Survival Analysis
ABSTRACT
Assessing causal treatment effect on a time-to-event outcome is of key interest in many scientific investigations. Instrumental variable (IV) is a useful tool to mitigate the impact of endogenous treatment selection to attain unbiased estimation of causal treatment effect. Existing development of IV methodology, however, has not attended to outcomes subject to interval censoring, which are ubiquitously present in studies with intermittent follow-up but are challenging to handle in terms of both theory and computation. In this work, we fill in this important gap by studying a general class of causal semiparametric transformation models with interval-censored data. We propose a nonparametric maximum likelihood estimator of the complier causal treatment effect. Moreover, we design a reliable and computationally stable expectation-maximization (EM) algorithm, which has a tractable objective function in the maximization step via the use of Poisson latent variables. The asymptotic properties of the proposed estimators, including the consistency, asymptotic normality, and semiparametric efficiency, are established with empirical process techniques. We conduct extensive simulation studies and an application to a colorectal cancer screening data set, showing satisfactory finite-sample performance of the proposed method as well as its prominent advantages over naive methods.
Subjects
Algorithms , Research Design , Likelihood Functions , Computer Simulation , Causality
ABSTRACT
Important scientific insights into chronic diseases affecting several organ systems can be gained from modeling spatial dependence of sites experiencing damage progression. We describe models and methods for studying spatial dependence of joint damage in psoriatic arthritis (PsA). Since a large number of joints may remain unaffected even among individuals with a long disease history, spatial dependence is first modeled in latent joint-specific indicators of susceptibility. Among susceptible joints, a Gaussian copula is adopted for dependence modeling of times to damage. Likelihood and composite likelihoods are developed for settings where individuals are under intermittent observation and progression times are subject to type K interval censoring. Two-stage estimation procedures help mitigate the computational burden arising when a large number of processes (i.e., joints) are under consideration. Simulation studies confirm that the proposed methods provide valid inference, and an application to the motivating data from the University of Toronto Psoriatic Arthritis Clinic yields important insights which can help physicians distinguish PsA from arthritic conditions with different dependence patterns.
Subjects
Arthritis, Psoriatic , Humans , Chronic Disease , Probability , Computer Simulation
ABSTRACT
The Botswana Combination Prevention Project was a cluster-randomized HIV prevention trial whose follow-up period coincided with Botswana's national adoption of a universal test and treat strategy for HIV management. Of interest is whether, and to what extent, this change in policy modified the preventative effects of the study intervention. To address such questions, we adopt a stratified proportional hazards model for clustered interval-censored data with time-dependent covariates and develop a composite expectation maximization algorithm that facilitates estimation of model parameters without placing parametric assumptions on either the baseline hazard functions or the within-cluster dependence structure. We show that the resulting estimators for the regression parameters are consistent and asymptotically normal. We also propose and provide theoretical justification for the use of the profile composite likelihood function to construct a robust sandwich estimator for the variance. We characterize the finite-sample performance and robustness of these estimators through extensive simulation studies. Finally, we conclude by applying this stratified proportional hazards model to a re-analysis of the Botswana Combination Prevention Project, with the national adoption of a universal test and treat strategy now modeled as a time-dependent covariate.
Subjects
Acquired Immunodeficiency Syndrome , Algorithms , Humans , Proportional Hazards Models , Computer Simulation , Likelihood Functions , Models, Statistical
ABSTRACT
In this article, a competing risks survival model is considered in which the initial number of risks, assumed to follow a negative binomial distribution, is subject to a destructive mechanism. Assuming that the population of interest has a cure component and that the data are interval-censored, and treating both the number of initial risks and the number of risks remaining active after destruction as missing data, we develop two distinct estimation algorithms for this model. Making use of the conditional distributions of the missing data, we develop an expectation maximization (EM) algorithm in which the conditional expected complete log-likelihood function is decomposed into simpler functions that are then maximized independently. A variation of the EM algorithm, called the stochastic EM (SEM) algorithm, is also developed with the goal of avoiding the calculation of complicated expectations and improving performance at parameter recovery. A Monte Carlo simulation study is carried out to evaluate the performance of both estimation methods through the bias, root mean square error, and coverage probability of the asymptotic confidence interval. The simulation results identify the proposed SEM algorithm as the preferred estimation method, and we further illustrate the advantage of the SEM algorithm, as well as the use of a destructive model, with data from a children's mortality study.
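The destructive mechanism has a closed-form consequence for the cure fraction. Assuming the negative binomial is parameterized by mean eta and dispersion phi, and that each initial risk independently survives destruction with probability p (binomial thinning), the number of active risks D satisfies P(D = 0) = (1 + phi * eta * p)^(-1/phi). The sketch below checks this by simulation; the parameterization, parameter values, and function name are assumptions and may differ from the paper's.

```python
import numpy as np

def cure_probability(eta, phi, p_keep):
    """P(no active risk) when M ~ NegBin(mean eta, dispersion phi) and each
    initial risk independently survives destruction with probability p_keep."""
    return (1.0 + phi * eta * p_keep) ** (-1.0 / phi)

rng = np.random.default_rng(7)
eta, phi, p_keep = 2.0, 0.5, 0.6                 # hypothetical parameter values
n = 200_000
# numpy's negative_binomial(k, p) counts failures before k successes;
# k = 1/phi and p = 1/(1 + phi*eta) give mean eta and variance eta + phi*eta**2.
m = rng.negative_binomial(1.0 / phi, 1.0 / (1.0 + phi * eta), size=n)
d = rng.binomial(m, p_keep)                      # risks remaining active after destruction

print(np.mean(d == 0), cure_probability(eta, phi, p_keep))
```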
Subjects
Algorithms , Models, Statistical , Child , Humans , Likelihood Functions , Computer Simulation , Monte Carlo Method
ABSTRACT
Panel count data and interval-censored data are two types of incomplete data that often occur in event history studies. Almost all existing statistical methods are developed for their separate analysis. In this paper, we investigate a more general situation where a recurrent event process and an interval-censored failure event occur together. To describe the relationship between the recurrent event process and the failure event intuitively and clearly, we propose a failure time-dependent mean model through a completely unspecified link function. To overcome the challenges arising from the blending of nonparametric components and parametric regression coefficients, we develop a two-stage conditional expected likelihood-based estimation procedure. We establish the consistency, the convergence rate, and the asymptotic normality of the proposed two-stage estimator. Furthermore, we construct a class of two-sample tests for comparison of mean functions from different groups. The proposed methods are evaluated by extensive simulation studies and are illustrated with the skin cancer data that motivated this study.