ABSTRACT
Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we utilized renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, and bladder (BladderBatch) and glioblastoma (GBM) microarray gene expression datasets. Our results demonstrate that ProJect consistently performs better than the other referenced MVI methods. It achieves the lowest normalized root mean square error (on average, 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method) and the lowest Procrustes sum of squared error (Procrustes SS; 79.71% less error in RC_C, 38.36% in RC_full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared to the next best method). ProJect also achieves the highest correlation coefficient across all types of MV combinations (0.64% higher in RC_C, 0.24% in RC_full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM versus the second-best performing method). ProJect's key strength is its ability to handle the different types of MVs commonly found in real-world data. Unlike most MVI methods, which are designed to handle only one type of MV, ProJect employs a decision-making algorithm that first determines whether an MV is missing at random or missing not at random. It then applies targeted imputation strategies for each MV type, resulting in more accurate and reliable imputation outcomes. An R implementation of ProJect is available at https://github.com/miaomiao6606/ProJect.
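The decide-then-impute step described above can be illustrated with a toy heuristic. This is a hedged sketch, not ProJect's actual algorithm: features whose observed intensities sit near the detection floor and that have many missing entries are treated as MNAR-like and filled from the low tail, while the remainder are treated as MAR-like. All thresholds and the function name are illustrative assumptions.

```r
# Toy decision-then-impute sketch (not ProJect itself); thresholds are
# illustrative assumptions, X is a numeric samples-by-features matrix.
impute_by_type <- function(X, low_quantile = 0.25, mnar_miss_rate = 0.2) {
  global_low <- quantile(X, low_quantile, na.rm = TRUE)
  for (j in seq_len(ncol(X))) {
    mis <- is.na(X[, j])
    if (!any(mis) || all(mis)) next
    obs <- X[!mis, j]
    if (mean(mis) > mnar_miss_rate && median(obs) < global_low) {
      # MNAR-like: low-abundance feature with many MVs -> draw values
      # just below the smallest observed value (left-censoring idea)
      X[mis, j] <- runif(sum(mis), min(obs) / 2, min(obs))
    } else {
      # MAR-like: impute from the observed distribution of the feature
      # (a stand-in for the model-based step a real method would use)
      X[mis, j] <- sample(obs, sum(mis), replace = TRUE)
    }
  }
  X
}
```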
Subjects
Algorithms, Genomics, Bayes Theorem, Oligonucleotide Array Sequence Analysis/methods, Mass Spectrometry/methods
ABSTRACT
Lipidomic data often exhibit missing data points, which can be categorized as missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). In order to utilize statistical methods that require complete datasets, or to improve the identification of potential effects in statistical comparisons, imputation techniques can be employed. In this study, we investigate commonly used methods such as zero, half-minimum, mean, and median imputation, as well as more advanced techniques such as k-nearest neighbor and random forest imputation. We employ a combination of simulation-based approaches and application to real datasets to assess the performance and effectiveness of these methods. Shotgun lipidomics datasets exhibit high correlations and missing values, often due to low analyte abundance, characterized as MNAR. In this context, k-nearest neighbor approaches based on correlation and on truncated normal distributions demonstrate the best performance. Importantly, both methods can effectively impute missing values independently of the type of missingness, the determination of which is nearly impossible in practice. The imputation methods still control the type I error rate.
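For reference, the simple single-value methods named above can be written in a few lines of base R. This is a minimal sketch rather than the exact implementations benchmarked in the study; the more advanced methods are left to established packages.

```r
# Column-wise single-value imputation for a numeric matrix X with NAs.
impute_simple <- function(X, method = c("zero", "halfmin", "mean", "median")) {
  method <- match.arg(method)
  apply(X, 2, function(x) {
    fill <- switch(method,
                   zero    = 0,
                   halfmin = min(x, na.rm = TRUE) / 2,
                   mean    = mean(x, na.rm = TRUE),
                   median  = median(x, na.rm = TRUE))
    x[is.na(x)] <- fill
    x
  })
}

# k-nearest-neighbour and random forest imputation are available in
# established packages, e.g. impute::impute.knn() and missForest::missForest().
```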
Subjects
Lipidomics, Lipidomics/methods, Humans, Algorithms, Lipids/analysis, Statistical Data Interpretation
ABSTRACT
Studies of memory trajectories using longitudinal data often result in highly nonrepresentative samples due to selective study enrollment and attrition. An additional bias comes from practice effects that result in improved or maintained performance due to familiarity with test content or context. These challenges may bias study findings and severely distort the ability to generalize to the target population. In this study, we propose an approach for estimating the finite population mean of a longitudinal outcome conditioning on being alive at a specific time point. We develop a flexible Bayesian semiparametric predictive estimator for population inference when longitudinal auxiliary information is known for the target population. We evaluate the sensitivity of the results to untestable assumptions and further compare our approach to other methods used for population inference in a simulation study. The proposed approach is motivated by 15-year longitudinal data from the Betula longitudinal cohort study. We apply our approach to estimate lifespan trajectories in episodic memory, with the aim to generalize findings to a target population.
Subjects
Statistical Models, Humans, Longitudinal Studies, Bayes Theorem, Cohort Studies, Computer Simulation
ABSTRACT
BACKGROUND: Missing data are ubiquitous in randomised controlled trials. Although sensitivity analyses for different missing data mechanisms (missing at random vs. missing not at random) are widely recommended, they are rarely conducted in practice. The aim of the present study was to demonstrate sensitivity analyses for different assumptions regarding the missing data mechanism for randomised controlled trials using latent growth modelling (LGM). METHODS: Data from a randomised controlled brief alcohol intervention trial were used. The sample included 1646 adults (56% female; mean age = 31.0 years) from the general population who had received up to three individualized alcohol feedback letters or assessment only. Follow-up interviews were conducted after 12 and 36 months via telephone. The main outcome for the analysis was change in alcohol use over time. A three-step LGM approach was used. First, evidence about the process that generated the missing data was accumulated by analysing the extent of missing values in both study conditions, the missing data patterns, and the baseline variables that predicted participation in the two follow-up assessments using logistic regression. Second, growth models were calculated to analyse intervention effects over time. These models assumed that data were missing at random and applied full-information maximum likelihood estimation. Third, the findings were safeguarded by incorporating model components to account for the possibility that data were missing not at random. For that purpose, Diggle-Kenward selection, Wu-Carroll shared parameter and pattern mixture models were implemented. RESULTS: Although the true data-generating process remained unknown, the evidence was unequivocal: both the intervention and control groups reduced their alcohol use over time, but no significant group differences emerged. There was no clear evidence of intervention efficacy, neither in the growth models that assumed the missing data to be at random nor in those that assumed the missing data to be not at random. CONCLUSION: The illustrated approach allows the assessment of how sensitive conclusions about the efficacy of an intervention are to different assumptions regarding the missing data mechanism. For researchers familiar with LGM, it is a valuable statistical supplement to safeguard their findings against the possibility of nonignorable missingness. TRIAL REGISTRATION: The PRINT trial was prospectively registered at the German Clinical Trials Register (DRKS00014274, date of registration: 12th March 2018).
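The first step of this approach, checking which baseline variables predict follow-up participation, amounts to a logistic regression on a response indicator. A minimal sketch with placeholder variable names (not the trial's actual covariates):

```r
# dat: one row per participant; alcohol_use_12m is NA for non-responders.
# Variable names are placeholders, not the PRINT trial's actual variables.
dat$responded_12m <- as.integer(!is.na(dat$alcohol_use_12m))

drop_model <- glm(responded_12m ~ group + age + sex + alcohol_use_baseline,
                  data = dat, family = binomial)
summary(drop_model)  # significant predictors speak against MCAR and inform
                     # the subsequent MAR / MNAR growth models
```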
Subjects
Statistical Data Interpretation, Adult, Female, Humans, Male, Randomized Controlled Trials as Topic
ABSTRACT
Randomized clinical trials with outcomes measured longitudinally are frequently analyzed using either random effect models or generalized estimating equations. Both approaches assume that the dropout mechanism is missing at random (MAR) or missing completely at random (MCAR). We propose a Bayesian pattern-mixture model to incorporate missingness mechanisms that might be missing not at random (MNAR), where the distribution of the outcome measure at the follow-up time t_k, conditional on the prior history, differs across the patterns of missing data. We then perform sensitivity analysis on estimates of the parameters of interest. The sensitivity parameters relate the distribution of the outcome of interest for subjects in a missing-data pattern at time t_k to that of the observed subjects at time t_k. The large number of sensitivity parameters is reduced by treating them as random with a prior distribution having some pre-specified mean and variance, which are varied to explore the sensitivity of inferences. The MAR mechanism is a special case of the proposed model, allowing a sensitivity analysis of deviations from MAR. The proposed approach is applied to data from the Trial of Preventing Hypertension.
Subjects
Statistical Models, Outcome Assessment (Health Care), Bayes Theorem, Data Collection, Humans, Longitudinal Studies, Patient Dropouts, Randomized Controlled Trials as Topic
ABSTRACT
BACKGROUND: Missing data are common in end-of-life care studies, but there is still relatively little exploration of which method is best to deal with them and, in particular, of whether the missing at random (MAR) assumption is valid or missing not at random (MNAR) mechanisms should be assumed. In this paper we investigated this issue through a sensitivity analysis within the ACTION study, a multicenter cluster randomized controlled trial testing advance care planning in patients with advanced lung or colorectal cancer. METHODS: Multiple imputation procedures under MAR and MNAR assumptions were implemented. Possible violation of the MAR assumption was addressed with reference to variables measuring quality of life and symptoms. The MNAR model assumed that patients with worse health were more likely to have missing questionnaires, making a distinction between single missing items, which were assumed to satisfy the MAR assumption, and missing values due to a completely missing questionnaire, for which an MNAR mechanism was hypothesized. We explored the sensitivity of gender differences in key indicators and of simple correlations to possible departures from MAR. RESULTS: Up to 39% of follow-up data were missing. Results under MAR reflected that missingness was related to poorer health status. Correlations between variables, although very small, changed according to the imputation method, as did the differences in scores by gender, indicating a certain sensitivity of the results to violation of the MAR assumption. CONCLUSIONS: The findings confirmed the importance of undertaking this kind of analysis in end-of-life care studies.
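One common way to implement such an MNAR sensitivity analysis is delta adjustment: impute under MAR, then shift the imputed values of patients assumed to be in worse health. The sketch below uses the mice package with placeholder variable names and illustrative delta values; it is not the ACTION analysis code, and for brevity it pools point estimates only rather than applying full Rubin's rules.

```r
library(mice)

imp <- mice(dat, m = 20, seed = 2023, printFlag = FALSE)  # MAR imputation

for (delta in c(0, -5, -10)) {            # 0 = MAR; negative = worse health
  est <- sapply(seq_len(imp$m), function(k) {
    comp  <- complete(imp, k)
    shift <- is.na(dat$qol_followup)      # entries that were imputed
    comp$qol_followup[shift] <- comp$qol_followup[shift] + delta
    # quantity of interest: gender difference in the quality-of-life score
    mean(comp$qol_followup[comp$sex == "female"]) -
      mean(comp$qol_followup[comp$sex == "male"])
  })
  cat("delta =", delta, " pooled difference =", round(mean(est), 2), "\n")
}
```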
Subjects
Quality of Life, Terminal Care, Humans, Statistical Models, Research Design
ABSTRACT
BACKGROUND: High-throughput technologies enable the cost-effective collection and analysis of DNA methylation data throughout the human genome. This naturally entails the management of missing values, which can complicate the analysis of the data. Several general and specific imputation methods are suitable for DNA methylation data. However, there are no detailed studies of their performance under different missing data mechanisms (missing completely at random, at random, or not at random) and under different representations of DNA methylation levels (β-value and M-value). RESULTS: We perform an extensive analysis of the imputation performance of seven imputation methods on simulated missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) methylation data. We further consider imputation performance on the popular β- and M-value representations of methylation levels. Overall, β-values enable better imputation performance than M-values. Imputation accuracy is lower for mid-range β-values, while it is generally higher for values at the extremes of the β-value range. The distribution of MAR values is, on average, denser in the mid-range than the expected β-value distribution. As a consequence, MAR values are on average harder to impute. CONCLUSIONS: The results of the analysis provide guidelines for the most suitable imputation approaches for DNA methylation data under different representations of DNA methylation levels and different missing data mechanisms.
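The two representations compared above are related by a simple logit transform, so values can be converted in either direction before or after imputation. A standard pair of helpers:

```r
# M = log2(beta / (1 - beta)) and beta = 2^M / (2^M + 1)
beta2m <- function(beta) log2(beta / (1 - beta))
m2beta <- function(m) 2^m / (2^m + 1)

beta2m(0.8)          # 2
m2beta(beta2m(0.8))  # back to 0.8
```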
Subjects
DNA Methylation, Data Collection, Epigenomics/methods, Humans
ABSTRACT
BACKGROUND: Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. An important aspect of qPCR data that has been largely ignored is the presence of non-detects: reactions failing to exceed the quantification threshold and therefore lacking a measurement of expression. While most current software replaces these non-detects with a value representing the limit of detection, this introduces substantial bias in the estimation of both absolute and differential expression. Single imputation procedures, while an improvement on previously used methods, underestimate residual variance, which can lead to anti-conservative inference. RESULTS: We propose to treat non-detects as non-random missing data, model the missing data mechanism, and use this model to impute missing values or obtain direct estimates of model parameters. To account for the uncertainty inherent in the imputation, we propose a multiple imputation procedure, which provides a set of plausible values for each non-detect. We assess the proposed methods via simulation studies and demonstrate the applicability of these methods to three experimental data sets. We compare our methods to mean imputation, single imputation, and a penalized EM algorithm incorporating non-random missingness (PEMM). The developed methods are implemented in the R/Bioconductor package nondetects. CONCLUSIONS: The statistical methods introduced here reduce discrepancies in gene expression values derived from qPCR experiments in the presence of non-detects, providing increased confidence in downstream analyses.
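The core idea, giving each non-detect several plausible values beyond the quantification limit rather than a single fixed value, can be sketched as follows. This is a simplified stand-in assuming a normal model for observed Cq values and a 40-cycle limit; it is not the model implemented in the nondetects package, and cq_values is a hypothetical vector of Cq measurements with NAs for non-detects.

```r
impute_nondetects_once <- function(cq, limit = 40) {
  obs <- cq[!is.na(cq)]
  mu <- mean(obs)
  s  <- sd(obs)
  n_mis <- sum(is.na(cq))
  # inverse-CDF draw from a normal truncated to values above `limit`
  u <- runif(n_mis, pnorm(limit, mu, s), 1)
  cq[is.na(cq)] <- qnorm(u, mu, s)
  cq
}

# m = 10 completed vectors, to be analysed separately and then pooled
imputations <- replicate(10, impute_nondetects_once(cq_values), simplify = FALSE)
```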
Subjects
Algorithms, Real-Time Polymerase Chain Reaction/methods, Computer Simulation, Humans, Statistical Models, Sample Size
ABSTRACT
We develop and demonstrate methods to perform sensitivity analyses assessing sensitivity to plausible departures from missing at random in incomplete repeated binary outcome data. We use multiple imputation in the not-at-random fully conditional specification framework, which includes one or more sensitivity parameters (SPs) for each incomplete variable. The use of an online elicitation questionnaire is demonstrated to obtain expert opinion on the SPs, and highest prior density regions are used alongside opinion pooling methods to display credible regions for the SPs. We demonstrate that substantive conclusions can be far more sensitive to departures from the missing at random (MAR) assumption when control and intervention nonresponders depart from MAR differently, and show that the correlation of arm-specific SPs in expert opinion is particularly important. We illustrate these methods on the iQuit in Practice smoking cessation trial, which compared the impact of a tailored text messaging system versus standard care on smoking cessation. We show that conclusions about the effect of the intervention on smoking cessation outcomes at 8 weeks and 6 months are broadly insensitive to departures from MAR, with conclusions significantly affected only when the differences in behavior between the nonresponders in the two trial arms are larger than expert opinion judges to be realistic.
Subjects
Research Design, Smoking Cessation, Statistical Data Interpretation, Humans, Surveys and Questionnaires
ABSTRACT
BACKGROUND: LC-MS technology makes it possible to measure the relative abundance of numerous molecular features of a sample in a single analysis. However, non-targeted metabolite profiling approaches in particular generate vast arrays of data that are prone to aberrations such as missing values. Whatever the reason for the missing values in the data, a coherent and complete data matrix is always a prerequisite for accurate and reliable statistical analysis. Therefore, there is a need for proper imputation strategies that account for the missingness and reduce bias in the statistical analysis. RESULTS: Here we present our results after evaluating nine imputation methods at four different percentages of missing values of different origins. The performance of each imputation method was analyzed by the normalized root mean squared error (NRMSE). We demonstrated that random forest (RF) had the lowest NRMSE in the estimation of missing values for Missing at Random (MAR) and Missing Completely at Random (MCAR). In the case of values absent due to Missing Not at Random (MNAR), the left-truncated data were best imputed with minimum-value imputation. We also tested the different imputation methods on datasets containing missing data of various origins, and RF was the most accurate method in all cases. The results were obtained by repeating the evaluation process 100 times, using metabolomics datasets in which the missing values were introduced to represent absent data of different origins. CONCLUSION: The type and rate of missingness affect the performance and suitability of imputation methods. The RF-based imputation method performs best in most of the tested scenarios, including combinations of different types and rates of missingness. Therefore, we recommend using random forest-based imputation for imputing missing metabolomics data, especially in situations where the types of missingness are not known in advance.
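The evaluation design described above, hiding known values, imputing, and scoring with NRMSE, can be sketched as follows for a single repetition (the study ran 100). Object names are placeholders and X is assumed to be a numeric intensity matrix; this is not the study's actual evaluation code.

```r
library(missForest)

nrmse <- function(truth, est) sqrt(mean((truth - est)^2)) / sd(truth)

evaluate_once <- function(X, frac = 0.1) {
  known <- which(!is.na(X))
  idx   <- sample(known, floor(frac * length(known)))   # values to hide
  X_mis <- X
  X_mis[idx] <- NA

  rf_imp  <- as.matrix(missForest(as.data.frame(X_mis))$ximp)  # random forest
  min_imp <- apply(X_mis, 2, function(x) {                     # minimum value
    x[is.na(x)] <- min(x, na.rm = TRUE)
    x
  })

  c(rf = nrmse(X[idx], rf_imp[idx]), min = nrmse(X[idx], min_imp[idx]))
}
```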
Subjects
Metabolomics/statistics & numerical data, Bias, Liquid Chromatography, Humans, Mass Spectrometry/methods, Mass Spectrometry/statistics & numerical data, Metabolomics/methods
ABSTRACT
BACKGROUND: With the rise of metabolomics, the development of methods to address analytical challenges in the analysis of metabolomics data is of great importance. Missing values (MVs) are pervasive, yet the treatment of MVs can have a substantial impact on downstream statistical analyses. The MVs problem in metabolomics is quite challenging and can arise because the metabolite is not biologically present in the sample, or is present in the sample but at a concentration below the lower limit of detection (LOD), or is present in the sample but undetected due to technical issues related to sample pre-processing steps. Missingness due to biological absence or to concentrations below the LOD is considered missing not at random (MNAR), while missingness due to technical issues is an example of missing at random (MAR). Typically, such MVs are substituted by a minimum value, which may lead to severely biased results in downstream analyses. RESULTS: We develop a Bayesian model, called BayesMetab, that systematically accounts for missing values based on a Markov chain Monte Carlo (MCMC) algorithm that incorporates data augmentation by allowing MVs to be due either to truncation below the LOD or to other technical reasons unrelated to abundance. Based on a variety of performance metrics (power for detecting differential abundance, area under the curve, bias and MSE for parameter estimates), our simulation results indicate that BayesMetab outperformed other imputation algorithms when there is a mixture of missingness due to MAR and MNAR. Further, our approach was competitive with other methods tailored specifically to MNAR in situations where missing data were completely MNAR. Applying our approach to an analysis of metabolomics data from a mouse model of myocardial infarction revealed several statistically significant metabolites, not previously identified, that were of direct biological relevance to the study. CONCLUSIONS: Our findings demonstrate that BayesMetab has improved performance in imputing the missing values and performing statistical inference compared to other current methods when missing values are due to a mixture of MNAR and MAR. Analysis of real metabolomics data strongly suggests this mixture is likely to occur in practice, and thus it is important to consider an imputation model that accounts for a mixture of missing data types.
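The data-augmentation move at the heart of such a sampler can be illustrated schematically: at each iteration, every missing value is attributed either to truncation below the LOD or to a technical cause, and a value is drawn accordingly. This is a hedged fragment under simplifying assumptions (a single normal model, fixed current parameter values, a given mixture probability), not BayesMetab's implementation.

```r
augment_missing <- function(y, mu, sigma, lod, p_censored) {
  mis <- which(is.na(y))
  for (i in mis) {
    if (runif(1) < p_censored) {
      # left-censored: draw from N(mu, sigma^2) truncated above at the LOD
      u <- runif(1, 0, pnorm(lod, mu, sigma))
      y[i] <- qnorm(u, mu, sigma)
    } else {
      # technical missingness: draw from the untruncated distribution
      y[i] <- rnorm(1, mu, sigma)
    }
  }
  # in a full sampler, mu, sigma and p_censored would be updated next
  y
}
```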
Subjects
Bayes Theorem, Metabolomics/methods, Algorithms, Animals, Bias, Mice, Monte Carlo Method
ABSTRACT
The analysis of time-to-event data typically makes the censoring-at-random assumption, i.e., that, conditional on covariates in the model, the distribution of event times is the same whether they are observed or unobserved (i.e., right censored). When patients who remain in follow-up stay on their assigned treatment, analysis under this assumption broadly addresses the de jure, or "while on treatment strategy", estimand. In such cases, we may well wish to explore the robustness of our inference to more pragmatic, de facto or "treatment policy strategy", assumptions about the behaviour of patients post-censoring. This is particularly the case when censoring occurs because patients change, or revert, to the usual (i.e., reference) standard of care. Recent work has shown how such questions can be addressed for trials with continuous outcome data and longitudinal follow-up, using reference-based multiple imputation. For example, patients in the active arm may have their missing data imputed assuming they reverted to the control (i.e., reference) intervention on withdrawal. Reference-based imputation has two advantages: (a) it avoids the user having to specify numerous parameters describing the distribution of patients' post-withdrawal data and (b) it is, to a good approximation, information anchored, so that the proportion of information lost due to missing data under the primary analysis is held constant across the sensitivity analyses. In this article, we build on recent work in the survival context, proposing a class of reference-based assumptions appropriate for time-to-event data. We report a simulation study exploring the extent to which the multiple imputation estimator (using Rubin's variance formula) is information anchored in this setting and then illustrate the approach by reanalysing data from a randomized trial, which compared medical therapy with angioplasty for patients presenting with angina.
Subjects
Clinical Trials as Topic/methods, Statistical Data Interpretation, Statistical Models, Computer Simulation, Follow-Up Studies, Humans, Randomized Controlled Trials as Topic/methods, Research Design, Time Factors
ABSTRACT
Missing data are almost always present in real datasets and introduce several statistical issues. One fundamental issue is that, in the absence of strong uncheckable assumptions, effects of interest are typically not nonparametrically identified. In this article, we review the generic approach of using identifying restrictions from a likelihood-based perspective and provide points of contact for several recently proposed methods. An emphasis of this review is on restrictions for nonmonotone missingness, a subject that has been treated sparingly in the literature. We also present a general, fully Bayesian approach that is widely applicable and capable of handling a variety of identifying restrictions in a uniform manner.
ABSTRACT
The not-at-random fully conditional specification (NARFCS) procedure provides a flexible means for the imputation of multivariable missing data under missing-not-at-random conditions. Recent work has outlined difficulties with eliciting the sensitivity parameters of the procedure from expert opinion due to their conditional nature. Failure to adequately account for this conditioning will generate imputations that are inconsistent with the assumptions of the user. In this paper, we clarify the importance of correctly conditioning the NARFCS sensitivity parameters and develop procedures to calibrate these sensitivity parameters by relating them to more easily elicited quantities, in particular the sensitivity parameters from simpler pattern mixture models. Additionally, we consider how to include the missingness indicators as part of the imputation models of NARFCS, recommending, as default practice, that all of them be included in each model. Algorithms are developed to perform the calibration procedure and are demonstrated on data from the Avon Longitudinal Study of Parents and Children, as well as in simulation studies.
Subjects
Statistical Data Interpretation, Algorithms, Bias, Humans, Longitudinal Studies, Statistical Models, Statistics as Topic
ABSTRACT
BACKGROUND: Multiple imputation by chained equations (MICE) requires specifying a suitable conditional imputation model for each incomplete variable and then iteratively imputes the missing values. In the presence of missing not at random (MNAR) outcomes, valid statistical inference often requires joint models for the missing observations and their indicators of missingness. In this study, we derived an imputation model for missing binary data with an MNAR mechanism from Heckman's model, using a one-step maximum likelihood estimator. We applied this approach to improve a previously developed approach for MNAR continuous outcomes using Heckman's model and a two-step estimator. These models fit into a MICE process and can thus also handle missing at random (MAR) predictors in the same run. METHODS: We simulated 1000 datasets of 500 cases. We generated the following missing data mechanisms on 30% of the outcomes: a MAR mechanism, a weak MNAR mechanism, and a strong MNAR mechanism. We then re-simulated the first three scenarios and added a further 30% of MAR data on a predictor, resulting in 50% complete cases. We evaluated and compared the performance of the developed approach with that of a complete-case approach and of classical Heckman model estimates. RESULTS: With MNAR outcomes, only methods using Heckman's model were unbiased, and with an MAR predictor, the developed imputation approach outperformed all the other approaches. CONCLUSIONS: In the presence of MAR predictors, we proposed a simple approach to address MNAR binary or continuous outcomes under a Heckman assumption in a MICE procedure.
Subjects
Algorithms, Statistical Data Interpretation, Likelihood Functions, Theoretical Models, Data Accuracy, Epidemiology/standards, Epidemiology/statistics & numerical data, Humans, Monte Carlo Method, Outcome Assessment (Health Care)/methods, Outcome Assessment (Health Care)/statistics & numerical data
ABSTRACT
Multiple imputation has become a widely accepted technique to deal with the problem of incomplete data. Typically, imputation of missing values and the statistical analysis are performed separately. Therefore, the imputation model has to be consistent with the analysis model. If the data are analyzed with a mixture model, the parameter estimates are usually obtained iteratively. Thus, if the data are missing not at random, parameter estimation and treatment of missingness should be combined. We solve both problems by simultaneously imputing values using the data augmentation method and estimating parameters using the EM algorithm. This iterative procedure ensures that the missing values are properly imputed given the current parameter estimates. Properties of the parameter estimates were investigated in a simulation study. The results are illustrated using data from the National Health and Nutrition Examination Survey.
Subjects
Statistical Models, Algorithms, Computer Simulation, Data Collection/standards, Statistical Data Interpretation, Humans, Nutrition Surveys/statistics & numerical data, Uncertainty
ABSTRACT
Standard implementations of multiple imputation (MI) approaches provide unbiased inferences under the assumption of underlying missing at random (MAR) mechanisms. However, in the presence of missing data generated by missing not at random (MNAR) mechanisms, MI is not satisfactory. Originating in an econometric context, Heckman's model, also called the sample selection method, deals with selected samples using two linked equations, termed the selection equation and the outcome equation. It has been successfully applied to MNAR outcomes. Nevertheless, this method only addresses missing outcomes, which is a strong limitation in clinical epidemiology settings, where covariates are also often missing. We propose to extend the validity of MI to some MNAR mechanisms through the use of Heckman's model as the imputation model and a two-step estimation process. This approach provides a solution that can be used in an MI by chained equations framework to impute missing data (either outcomes or covariates) resulting from either a MAR or an MNAR mechanism, when the MNAR mechanism is compatible with Heckman's model. The approach is illustrated on a real dataset from a randomised trial in patients with seasonal influenza.
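A minimal two-step Heckman sketch, with hypothetical variable names (y the incomplete outcome, x1 and x2 covariates, z an exclusion-restriction variable), illustrates the mechanics; a full MI implementation within a chained-equations framework, as described above, would also draw the model parameters to propagate uncertainty.

```r
observed <- !is.na(dat$y)

# Step 1: probit selection equation; z is assumed to affect response
# but not the outcome itself (exclusion restriction).
sel <- glm(observed ~ x1 + x2 + z, data = dat, family = binomial("probit"))
lp  <- predict(sel, type = "link")

# Inverse Mills ratio: one form for respondents (used in estimation),
# another for non-respondents (used when predicting their outcomes).
dat$lambda <- ifelse(observed,
                     dnorm(lp) / pnorm(lp),
                     -dnorm(lp) / (1 - pnorm(lp)))

# Step 2: outcome equation on observed cases with the Mills-ratio term.
out <- lm(y ~ x1 + x2 + lambda, data = dat, subset = observed)

# Impute non-respondents: conditional prediction plus residual noise.
pred <- predict(out, newdata = dat[!observed, ])
dat$y[!observed] <- pred + rnorm(sum(!observed), 0, summary(out)$sigma)
```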
Subjects
Data Accuracy, Statistical Data Interpretation, Randomized Controlled Trials as Topic, Humans, Human Influenza/drug therapy, Statistical Models, Research Design
ABSTRACT
By examining the outcome trajectories of patients who dropped out for different reasons in schizophrenia trials, we note that although patients are recruited under the same protocol and have comparable baseline characteristics, they may respond differently even to the same treatment. Some patients show consistent improvement while others have only temporary relief. This creates different patient subpopulations characterized by their response and dropout patterns. At the same time, those who continue to improve seem to be more likely to complete the study, while those who experience only temporary relief have a higher chance of dropping out. This phenomenon appears to be quite general in schizophrenia clinical trials. This simultaneous inhomogeneity in both patient response and dropout patterns creates a scenario of missing not at random and therefore results in bias when statistical methods based on the missing at random assumption are used to test treatment efficacy. In this paper, we propose using the latent class growth mixture model, which is a special case of the latent mixture model, to conduct the statistical analyses in this situation. This model allows us to take the inhomogeneity among subpopulations into consideration and to make more accurate inferences about the treatment effect at any visit time. Compared with conventional statistical methods such as the mixed-effects model for repeated measures, we demonstrate through simulations that the proposed latent mixture model approach gives better control of the type I error rate when testing the treatment effect.
Subjects
Statistical Models, Patient Dropouts, Randomized Controlled Trials as Topic/statistics & numerical data, Schizophrenia/therapy, Humans, Randomized Controlled Trials as Topic/methods, Schizophrenia/diagnosis, Treatment Outcome
ABSTRACT
In clinical trials, there is always the possibility of data-driven adaptation at the end of a study. There is, however, concern that the type I error rate of the trial could be inflated by such a design, thus necessitating multiplicity adjustment. In this project, a simulation experiment was set up to assess the type I error rate inflation associated with switching dose group, as a function of the dropout rate at the end of the study, where the primary analysis is in terms of a longitudinal outcome. The simulation is inspired by a clinical trial in Alzheimer's disease. The type I error rate was assessed under a number of scenarios, in terms of differing correlations between efficacy and tolerance, different missingness mechanisms, and different probabilities of switching. A collection of parameter values was used to assess the sensitivity of the analysis. Results from ignorable likelihood analysis show that the type I error rate with and without switching was approximately the nominal error rate for the various scenarios. Under last observation carried forward (LOCF), the type I error rate was severely inflated both with and without switching. The type I error inflation is clearly connected to the criterion used for switching. While switching in a way related to the primary endpoint may in general affect the type I error rate, this was not the case for most scenarios in the longitudinal Alzheimer trial setting under consideration, where patients are expected to worsen over time.
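For reference, the LOCF comparator that inflated the type I error rate in this experiment is the simple carry-forward rule below, shown as a minimal sketch for a wide-format matrix of repeated measures.

```r
# Last observation carried forward: rows = patients, columns = ordered
# visits; a baseline (first column) value is assumed to be observed.
locf <- function(visits) {
  for (j in 2:ncol(visits)) {
    carry <- is.na(visits[, j])
    visits[carry, j] <- visits[carry, j - 1]
  }
  visits
}
```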
Subjects
Alzheimer Disease/drug therapy, Phase III Clinical Trials as Topic/statistics & numerical data, Statistical Data Interpretation, Statistical Models, Patient Dropouts/statistics & numerical data, Randomized Controlled Trials as Topic/statistics & numerical data, Alzheimer Disease/epidemiology, Alzheimer Disease/psychology, Phase III Clinical Trials as Topic/methods, Computer Simulation, Drug Dose-Response Relationship, Endpoint Determination/statistics & numerical data, Humans, Likelihood Functions, Longitudinal Studies, Randomized Controlled Trials as Topic/methods
ABSTRACT
Statistical analyses of recurrent event data have typically been based on the missing at random assumption. One implication of this is that, if data are collected only when patients are on their randomized treatment, the resulting de jure estimator of treatment effect corresponds to the situation in which the patients adhere to this regime throughout the study. For confirmatory analysis of clinical trials, sensitivity analyses are required to investigate alternative de facto estimands that depart from this assumption. Recent publications have described the use of multiple imputation methods based on pattern mixture models for continuous outcomes, where imputation for the missing data for one treatment arm (e.g. the active arm) is based on the statistical behaviour of outcomes in another arm (e.g. the placebo arm). This has been referred to as controlled imputation or reference-based imputation. In this paper, we use the negative multinomial distribution to apply this approach to analyses of recurrent events and other similar outcomes. The methods are illustrated by a trial in severe asthma where the primary endpoint was rate of exacerbations and the primary analysis was based on the negative binomial model.
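The primary analysis referred to above is a negative binomial model for exacerbation counts with follow-up time as an offset. A minimal sketch with placeholder variable names (the reference-based imputation step for dropouts is not shown):

```r
library(MASS)

# trial: one row per patient with exacerbation count, treatment arm and
# observed follow-up time; names are placeholders, not the asthma trial's
# actual variables.
fit <- glm.nb(exacerbations ~ treatment + offset(log(followup_years)),
              data = trial)
summary(fit)
exp(coef(fit))   # exponentiated coefficients give rate ratios
```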