ABSTRACT
Complete case analyses of complete crossover designs provide an opportunity to make comparisons based on patients who can tolerate all treatments. It is argued that this provides a means of estimating a principal stratum strategy estimand, something which is difficult to do in parallel group trials. While some trial users will consider this a relevant aim, others may be interested in hypothetical strategy estimands, that is, the effect that would be found if all patients completed the trial. Whether these estimands differ importantly is a question of interest to the different users of the trial results. This paper derives the difference between principal stratum strategy and hypothetical strategy estimands, where the former is estimated by a complete-case analysis of the crossover design, and a model for the dropout process is assumed. Complete crossover designs, that is, those where all treatments appear in all sequences, and which compare t treatments over p periods with respect to a continuous outcome are considered. Numerical results are presented for Williams designs with four and six periods. Results from a trial of obstructive sleep apnoea-hypopnoea (TOMADO) are also used for illustration. The results demonstrate that the percentage difference between the estimands is modest, exceeding 5% only when the trial has been severely affected by dropouts or if the within-subject correlation is low.
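As a back-of-the-envelope companion to the derivation described above, the sketch below (not the paper's analytical result; parameter values are arbitrary and NumPy is assumed to be available) simulates an AB/BA crossover in which dropout before period 2 depends on a latent tolerance correlated with treatment benefit, and contrasts the complete-case estimate with the all-patients estimate.

```python
# Simulation sketch (not the paper's analytical derivation): in an AB/BA crossover where
# dropout before period 2 depends on a latent tolerance correlated with treatment benefit,
# the complete-case (principal stratum) contrast differs from the all-patients contrast.
import numpy as np

rng = np.random.default_rng(1)
n = 20000                                   # subjects per sequence (large, to reduce noise)
subj_sd, resid_sd = 1.0, 0.5                # between- and within-subject SDs (arbitrary)

def simulate(sequence):
    u = rng.normal(0, subj_sd, n)                        # subject random effect
    benefit = 1.0 + 0.5 * u                              # treatment effect varies with u
    treat1 = 1.0 if sequence == "AB" else 0.0            # is treatment A given in period 1?
    y1 = u + treat1 * benefit + rng.normal(0, resid_sd, n)
    y2 = u + (1.0 - treat1) * benefit + rng.normal(0, resid_sd, n)
    # Dropout before period 2 is more likely for low-tolerance (low u) subjects.
    complete = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * u)))
    diff = (y1 - y2) if sequence == "AB" else (y2 - y1)  # within-subject A-minus-B difference
    return diff, complete

diffs, completes = zip(*(simulate(seq) for seq in ("AB", "BA")))
diffs, completes = np.concatenate(diffs), np.concatenate(completes)
print("principal-stratum (complete-case) contrast:", round(diffs[completes].mean(), 3))
print("hypothetical (all-patients) contrast:      ", round(diffs.mean(), 3))
```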
Subjects
Obstructive Sleep Apnea, Humans, Cross-Over Studies, Obstructive Sleep Apnea/therapy, Research Design
ABSTRACT
In many applications, the normality assumption is violated by data exhibiting some degree of skewness, which in turn affects estimation of the mean. The class of skew-normal distributions is considered, given its flexibility for modeling data through an asymmetry parameter. In this paper, we consider two methods for estimating the location parameter (µ) in the skew-normal setting when the coefficient of variation and the skewness parameter are known: the least squares estimator (LSE) and the best unbiased estimator (BUE). The properties of the BUE (which dominates the LSE) are explored using classical theorems of information theory, which provide a way to measure the uncertainty of location parameter estimates. In particular, inequalities based on the convexity property yield lower and upper bounds for the differential entropy and the Fisher information. Simulations illustrate the behavior of these bounds.
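For readers who want to see the quantities involved, the following sketch (an illustration only, not the paper's estimators or bounds; SciPy is assumed and the shape value is arbitrary) evaluates the skew-normal density numerically, along with its differential entropy and an approximate Fisher information for the location parameter.

```python
# Numerical sketch (illustration only): skew-normal moments, differential entropy, and
# Fisher information for the location parameter, all obtained by quadrature.
import numpy as np
from scipy import stats
from scipy.integrate import quad

alpha, mu, omega = 3.0, 0.0, 1.0                    # shape, location, scale (arbitrary)
sn = stats.skewnorm(alpha, loc=mu, scale=omega)

print("mean, sd:            ", sn.mean(), sn.std())
print("differential entropy:", sn.entropy())        # numerical integral performed by SciPy

# Fisher information for the location parameter:
# I(mu) = E[(d/dmu log f(X; mu))^2], with the derivative taken by central differences.
def score_sq(x, h=1e-5):
    d = (stats.skewnorm.logpdf(x, alpha, loc=mu + h, scale=omega)
         - stats.skewnorm.logpdf(x, alpha, loc=mu - h, scale=omega)) / (2 * h)
    return d ** 2 * sn.pdf(x)

fisher, _ = quad(score_sq, -10, 10)
print("Fisher information for mu (numeric):", fisher)
```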
ABSTRACT
Cellular heterogeneity is known to have important effects on signal processing and cellular decision making. To understand these processes, multiple classes of mathematical models have been introduced. The hierarchical population model constitutes a novel class that allows for a mechanistic description of heterogeneity and explicitly takes subpopulation structures into account. However, this model requires a parametric distribution assumption for the cell population and, so far, only the normal distribution has been employed. Here, we incorporate alternative distribution assumptions into the model, assess their robustness against outliers, and evaluate their influence on the performance of model calibration in a simulation study and a real-world application example. We found that alternative distributions provide reliable parameter estimates even in the presence of outliers and can in fact improve the convergence of model calibration.
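The practical point about alternative distribution assumptions can be previewed with a toy example unrelated to the hierarchical population model itself: below, a normal and a Student-t location model are fitted to contaminated data (SciPy assumed), and the heavy-tailed fit leaves the location estimate essentially undisturbed.

```python
# Toy sketch (not the hierarchical population model): location estimates under a normal
# versus a Student-t assumption when a few gross outliers contaminate the sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
clean = rng.normal(loc=5.0, scale=1.0, size=200)
data = np.concatenate([clean, [25.0, 30.0, 40.0]])   # three gross outliers

mu_norm, sd_norm = stats.norm.fit(data)               # ML fit under normality
df_t, mu_t, sd_t = stats.t.fit(data)                  # ML fit under a Student-t assumption

print(f"normal fit:    location = {mu_norm:.2f}")
print(f"Student-t fit: location = {mu_t:.2f} (df ~ {df_t:.1f})")
# The t-based location stays near 5, while the normal-based mean is pulled upward.
```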
Subjects
Statistical Models, Theoretical Models, Calibration, Computer Simulation, Normal Distribution
ABSTRACT
Normally distributed measurement error is a widely used assumption in joint models of longitudinal and survival data, but it may lead to unreasonable or even misleading results when the longitudinal data exhibit skewness. This paper proposes a new joint model for multivariate longitudinal and multivariate survival data by incorporating a nonparametric function into the trajectory and hazard functions and assuming that the measurement errors in the longitudinal measurement models follow a skew-normal distribution. A Monte Carlo expectation-maximization (EM) algorithm, together with the penalized-splines technique and a Metropolis-Hastings step within the Gibbs sampler, is developed to estimate the parameters and nonparametric functions in the considered joint models. Case-deletion diagnostic measures are proposed to identify potentially influential observations, and an extended local influence method is presented to assess the local influence of minor perturbations. Simulation studies and a real example from a clinical trial are presented to illustrate the proposed methodologies.
Subjects
Longitudinal Studies, Statistical Models, Multivariate Analysis, Survival Analysis, Causality, Humans, Monte Carlo Method, Proportional Hazards Models, Nonparametric Statistics
ABSTRACT
Quantile regression (QR) models have recently received increasing attention in longitudinal studies, where measurements on the same individuals are taken repeatedly over time. When continuous (longitudinal) responses follow a distribution that is quite different from the normal, the usual mean regression (MR)-based linear models may fail to produce efficient estimators, whereas QR-based linear models may perform satisfactorily. To the best of our knowledge, there have been very few studies of QR-based nonlinear models for longitudinal data in comparison with MR-based nonlinear models. In this article, we study QR-based nonlinear mixed-effects (NLME) joint models for longitudinal data with non-central location, outliers and/or heavy tails in the response, and non-normality and measurement errors in a covariate, under a Bayesian framework. The proposed QR-based modeling method is compared with an MR-based one using an AIDS clinical dataset and simulation studies. The proposed QR joint modeling approach can not only be applied to AIDS clinical studies but may also have general applications in other fields, provided the relevant technical specifications are met.
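A minimal illustration of the efficiency argument, far simpler than the Bayesian NLME joint model of the abstract (pandas and statsmodels are assumed to be installed and the data are simulated): with right-skewed, heavy-tailed errors, median regression typically recovers the slope with a smaller standard error than ordinary least squares.

```python
# Illustrative sketch (not the abstract's joint model): with right-skewed, heavy-tailed
# errors the median-regression slope is typically estimated with a smaller standard error
# than the least-squares slope. The true slope is 2.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.lognormal(mean=0.0, sigma=1.0, size=n)   # log-normal (skewed) errors
df = pd.DataFrame({"x": x, "y": y})

ols_fit = smf.ols("y ~ x", data=df).fit()
qr_fit = smf.quantreg("y ~ x", data=df).fit(q=0.5)               # median regression

print("OLS slope (SE):      ", round(ols_fit.params["x"], 3), round(ols_fit.bse["x"], 3))
print("median QR slope (SE):", round(qr_fit.params["x"], 3), round(qr_fit.bse["x"], 3))
```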
Subjects
Statistical Data Interpretation, Factual Databases/statistics & numerical data, Nonlinear Dynamics, Acquired Immunodeficiency Syndrome/blood, Acquired Immunodeficiency Syndrome/epidemiology, Acquired Immunodeficiency Syndrome/therapy, Bayes Theorem, Double-Blind Method, Humans, Longitudinal Studies, Randomized Controlled Trials as Topic/statistics & numerical data, Regression Analysis
ABSTRACT
In many practical multiple hypothesis testing problems, the alternatives can be expected to be asymmetrically distributed. We show that, if it is known a priori that the distributions of the alternatives are skewed, this information yields higher-power procedures than those based on symmetric alternatives when testing multiple hypotheses. We propose a Bayesian decision-theoretic rule for multiple directional hypothesis testing, when the alternatives follow skewed distributions, under a constraint on the mixed directional false discovery rate. We compare the proposed rule with the frequentist rule of Benjamini and Yekutieli (2005) using simulations. We apply our method to a well-studied HIV dataset.
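As a reference point for the Bayesian rule, the sketch below implements one common frequentist recipe for directional testing: apply Benjamini-Hochberg to two-sided p-values and assign the direction of each rejection by the sign of its statistic. This is only a stand-in for the Benjamini and Yekutieli (2005) procedure mentioned in the abstract, with simulated, mostly right-skewed alternatives (NumPy/SciPy assumed).

```python
# Sketch of a simple frequentist directional-testing recipe (a stand-in for the procedure the
# abstract compares against): Benjamini-Hochberg on two-sided p-values, direction by sign.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
m = 1000
# 80% true nulls; skewed alternatives: mostly positive shifts.
signal = np.abs(rng.normal(2.0, 1.0, m)) * rng.choice([1, 1, 1, -1], size=m)
effects = np.where(rng.uniform(size=m) < 0.8, 0.0, signal)
z = effects + rng.normal(size=m)
pvals = 2 * stats.norm.sf(np.abs(z))

def benjamini_hochberg(p, q=0.05):
    order = np.argsort(p)
    thresh = q * np.arange(1, len(p) + 1) / len(p)
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0   # largest step-up index
    rejected = np.zeros(len(p), dtype=bool)
    rejected[order[:k]] = True
    return rejected

rej = benjamini_hochberg(pvals, q=0.05)
direction = np.sign(z)                      # claimed direction for each rejected hypothesis
print("rejections:", int(rej.sum()), " declared positive:", int((rej & (direction > 0)).sum()))
```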
Subjects
Artifacts, Computational Biology/statistics & numerical data, Statistical Models, Bayes Theorem, Computer Simulation, Gene Expression Profiling, HIV Infections/genetics, Humans, Monte Carlo Method, Oligonucleotide Array Sequence Analysis
ABSTRACT
Single-arm two-stage designs for phase II clinical trials typically focus on a binary endpoint obtained by dichotomizing an underlying continuous measure of treatment efficacy. To avoid the resulting loss of information, we propose a two-stage design based on a Bayesian predictive approach that directly uses the original continuous endpoint. Numerical results are provided with reference to phase II cancer trials aimed at assessing the tumor-shrinking effect of an experimental treatment.
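The flavor of a predictive interim rule can be conveyed with a generic normal-normal sketch; this is not the authors' design, the prior, threshold, and stage sizes are arbitrary, and NumPy/SciPy are assumed. Given stage 1 data on a continuous endpoint, it computes the predictive probability that the final posterior analysis will declare success.

```python
# Generic sketch (not the authors' exact design): Bayesian predictive probability that a
# two-stage phase II trial with a continuous endpoint ends in "success", given stage 1 data.
# Normal likelihood with known sigma and a conjugate normal prior are assumed.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
sigma = 1.0                        # known outcome SD (assumption)
mu0, tau0 = 0.0, 10.0              # vague normal prior on the mean treatment effect
delta = 0.3                        # clinically relevant effect
n1, n2 = 20, 20                    # stage sizes
success_threshold = 0.90           # posterior P(mu > delta) required at the final analysis

y1 = rng.normal(0.5, sigma, n1)    # simulated stage 1 outcomes (true effect 0.5)

def posterior(prior_mean, prior_sd, ybar, n):
    prec = 1 / prior_sd**2 + n / sigma**2
    post_sd = np.sqrt(1 / prec)
    post_mean = post_sd**2 * (prior_mean / prior_sd**2 + n * ybar / sigma**2)
    return post_mean, post_sd

m1, s1 = posterior(mu0, tau0, y1.mean(), n1)

# Monte Carlo predictive probability: draw mu from the stage 1 posterior, simulate stage 2,
# and check whether the final posterior would declare success.
draws = 10000
mu_draw = rng.normal(m1, s1, draws)
ybar2 = rng.normal(mu_draw, sigma / np.sqrt(n2))
ybar_all = (n1 * y1.mean() + n2 * ybar2) / (n1 + n2)
m_fin, s_fin = posterior(mu0, tau0, ybar_all, n1 + n2)
prob_success = np.mean((1 - norm.cdf(delta, m_fin, s_fin)) > success_threshold)
print("predictive probability of final success:", round(float(prob_success), 3))
```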
Subjects
Bayes Theorem, Endpoint Determination, Research Design, Phase II Clinical Trials as Topic, Humans, Neoplasms/therapy, Treatment Outcome
ABSTRACT
Growth curve models are widely used in the social and behavioral sciences. However, typical growth curve models assume that the errors are normally distributed, although non-normal data may be even more common than normal data. To avoid the statistical inference problems that arise from blindly assuming normality, a general Bayesian framework is proposed to flexibly model normal and non-normal data through the explicit specification of the error distribution. A simulation study shows that when the error distribution is correctly specified, the loss of efficiency in standard error estimates can be avoided. A real example on the analysis of mathematical ability growth data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99, is used to illustrate the application of the proposed methods. Instructions and code for conducting growth curve analysis with both normal and non-normal error distributions using the MCMC procedure in SAS are provided.
Subjects
Bayes Theorem, Child Development, Preschool Child, Computer Simulation, Humans, Longitudinal Studies, Statistical Models
ABSTRACT
We propose a semiparametric multivariate skew-normal joint model for multivariate longitudinal and multivariate survival data. One main feature of the posited model is that we relax the commonly used normality assumption for random effects and within-subject errors: a centered Dirichlet process prior specifies the random-effects distribution, a multivariate skew-normal distribution specifies the within-subject error distribution, and the trajectory functions of the longitudinal responses are modeled semiparametrically. A Bayesian approach is proposed to simultaneously obtain estimates of the unknown parameters, random effects, and nonparametric functions by combining the Gibbs sampler and the Metropolis-Hastings algorithm. In particular, a Bayesian local influence approach is developed to assess the effect of minor perturbations to the within-subject measurement errors and random effects. Several simulation studies and an example are presented to illustrate the proposed methodologies.
Subjects
Bayes Theorem, Statistical Models, Algorithms, Biostatistics/methods, Breast Neoplasms/mortality, Breast Neoplasms/psychology, Clinical Trials as Topic/statistics & numerical data, Computer Simulation, Female, Humans, Longitudinal Studies, Multivariate Analysis, Quality of Life, Survival Analysis
ABSTRACT
In health services research, it is common to encounter semicontinuous data characterized by a point mass at zero followed by a right-skewed continuous distribution with positive support. Examples include health expenditures, in which the zeros represent a subpopulation of patients who do not use health services, while the continuous distribution describes the level of expenditures among health services users. Semicontinuous data are typically analyzed using two-part mixture models that separately model the probability of health services use and the distribution of positive expenditures among users. However, because the second part conditions on a non-zero response, conventional two-part models do not provide a marginal interpretation of covariate effects on the overall population of health service users and non-users, even though this is often of greatest interest to investigators. Here, we propose a marginalized two-part model that yields more interpretable covariate effect estimates by parameterizing the model in terms of the marginal mean. This model maintains many of the important features of conventional two-part models, such as capturing zero inflation and skewness, but allows investigators to examine covariate effects on the overall marginal mean, a target of primary interest in many applications. Using a simulation study, we examine the properties of the maximum likelihood estimates from this model. We illustrate the approach by evaluating the effect of a behavioral weight loss intervention on health-care expenditures in the Veterans Affairs health-care system.
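The interpretability issue can be made concrete with a short numeric sketch of the conventional two-part decomposition (standard two-part algebra with made-up coefficients, not the authors' marginalized estimator): the overall mean is the product of the probability of use and the conditional mean among users, so neither part alone gives a covariate's effect on the overall mean.

```python
# Numeric sketch of the standard two-part decomposition E[Y] = Pr(Y > 0) * E[Y | Y > 0]
# (conventional two-part algebra; the abstract's marginalized model instead parameterizes
# log E[Y] directly, so covariate effects on the overall mean are read off immediately).
import numpy as np

def overall_mean(x, gamma0=-0.5, gamma1=0.8, beta0=6.0, beta1=0.3, sigma=1.2):
    """Marginal mean of a logistic + log-normal two-part model at covariate value x."""
    p_use = 1 / (1 + np.exp(-(gamma0 + gamma1 * x)))       # Pr(Y > 0 | x), logistic part
    mean_pos = np.exp(beta0 + beta1 * x + sigma**2 / 2)    # E[Y | Y > 0, x], log-normal part
    return p_use * mean_pos

for x in (0, 1):
    print(f"x = {x}: Pr(use) * E[Y | use] = {overall_mean(x):,.0f}")
# The overall-mean ratio mixes both parts; a marginalized two-part model would make this
# ratio a single parameter.
print("overall-mean ratio (x=1 vs x=0):", round(overall_mean(1) / overall_mean(0), 2))
```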
Subjects
Statistical Data Interpretation, Likelihood Functions, Statistical Models, Computer Simulation, Humans, Obesity/economics, United States, Veterans, Weight Reduction Programs/economics, Weight Reduction Programs/standards
ABSTRACT
We introduce a new class of heteroscedastic partially linear models (PLMs) with skew-normal distribution. Maximum likelihood estimation of the model parameters via the ECM (Expectation/Conditional Maximization) algorithm, as well as influence diagnostics for the new model, is investigated. In addition, a likelihood ratio test for assessing the homogeneity of the scale parameter is presented. Simulation studies are developed to assess the performance of the ECM algorithm and of the likelihood ratio test statistic for homogeneity of variance, and a study of misspecification of the structure function is also considered. Finally, an application of the new heteroscedastic PLM to a real data set on ragweed pollen concentration shows that it provides a better fit than the classic homoscedastic PLM. We hope that the proposed model may attract applications in different areas of knowledge.
ABSTRACT
Fire insurance is a crucial component of property insurance, and its rating depends on forecasting insurance loss claim data. Fire insurance loss claims have complicated characteristics such as skewness and heavy tails, and the traditional linear mixed model often fails to describe the loss distribution accurately. It is therefore important to establish a scientifically reasonable distribution model for fire insurance loss claim data. In this study, the random effects and random errors in the linear mixed model are first assumed to follow skew-normal distributions. A skew-normal linear mixed model is then fitted by the Bayesian MCMC method to a set of U.S. property insurance loss claims data and compared with a linear mixed model for the log-transformed losses. A Bayesian skew-normal linear mixed model is subsequently built for Chinese fire insurance loss claims data: the posterior distributions of the claim parameters are obtained with the R JAGS package and used to produce predicted and simulated loss claim values. Finally, the optimization model in this study is used to determine the insurance rate. The results demonstrate that the model established by the Bayesian MCMC method can accommodate data skewness, and its fit and correlation with the sample data are better than those of the log-normal linear mixed model. Hence, the distribution model proposed in this paper is a reasonable description of insurance claims. This study introduces a new approach for calculating the insurance premium rate and expands the application of the Bayesian method in the fire insurance field.
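A rough sketch of a Bayesian skew-normal random-intercept model is given below; the paper uses JAGS from R, so PyMC is substituted here purely for illustration, the data are simulated, and all priors are arbitrary.

```python
# Rough sketch (illustration only, not the paper's JAGS model): a Bayesian skew-normal
# random-intercept model fitted by MCMC in PyMC, on simulated right-skewed "claims".
import numpy as np
import pymc as pm

rng = np.random.default_rng(11)
n_groups, n_per = 8, 50
group = np.repeat(np.arange(n_groups), n_per)
true_re = rng.normal(0, 0.5, n_groups)
# Skewed residuals: a normal plus a half-normal term around a group-specific level.
y = (2.0 + true_re[group]
     + rng.normal(0, 0.3, n_groups * n_per)
     + np.abs(rng.normal(0, 0.6, n_groups * n_per)))

with pm.Model():
    mu0 = pm.Normal("mu0", 0, 5)                      # overall level
    sigma_u = pm.HalfNormal("sigma_u", 1)             # random-effect SD
    u = pm.Normal("u", 0, sigma_u, shape=n_groups)    # group random intercepts
    sigma = pm.HalfNormal("sigma", 1)                 # residual scale
    alpha = pm.Normal("alpha", 0, 5)                  # skewness of the residuals
    pm.SkewNormal("y_obs", mu=mu0 + u[group], sigma=sigma, alpha=alpha, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

print("posterior mean of skewness parameter:", idata.posterior["alpha"].mean().item())
```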
ABSTRACT
Background: Pathological conditions may result in certain genes having an expression variance that differs markedly from that of controls. Finding such genes from gene expression data can provide invaluable candidates for therapeutic intervention. Under the dominant paradigm of modeling RNA-Seq gene counts with the negative binomial model, tests of differential variability are challenging to develop, owing to the dependence of the variance on the mean. Methods: Here, we describe clrDV, a statistical method for detecting genes that show differential variability between two populations. We use the skew-normal distribution to model the gene-wise null distribution of the centered log-ratio transformation of compositional RNA-Seq data. Results: Simulation results show that clrDV has a false discovery rate and a probability of Type II error on par with, or better than, those of existing methods. In addition, its run time is faster than that of its closest competitors and remains relatively constant as the sample size per group increases. Analysis of a large neurodegenerative disease RNA-Seq dataset using clrDV successfully recovers multiple gene candidates that have been reported to be associated with Alzheimer's disease.
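The two ingredients named in the abstract can be sketched in a few lines (this is not the clrDV implementation; the counts are simulated and SciPy is assumed): a centered log-ratio transformation of a count matrix, followed by a skew-normal fit to a per-gene variability statistic.

```python
# Sketch of the two ingredients named in the abstract (not the clrDV implementation itself):
# a centered log-ratio (CLR) transformation of a count matrix, and a skew-normal fit to the
# resulting per-gene variability statistics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
counts = rng.negative_binomial(n=10, p=0.3, size=(100, 12)) + 1   # genes x samples; +1 avoids log(0)

# CLR: log counts minus the per-sample mean of log counts (columns are samples).
log_c = np.log(counts)
clr = log_c - log_c.mean(axis=0, keepdims=True)

# Per-gene variability statistic, then a skew-normal fit as a parametric null.
gene_sd = clr.std(axis=1, ddof=1)
a_hat, loc_hat, scale_hat = stats.skewnorm.fit(gene_sd)
print("fitted skew-normal (shape, loc, scale):",
      round(a_hat, 2), round(loc_hat, 2), round(scale_hat, 2))
```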
Subjects
Gene Expression Profiling, Neurodegenerative Diseases, Humans, RNA-Seq, Gene Expression Profiling/methods, Normal Distribution, RNA Sequence Analysis/methods
ABSTRACT
Many joint models of multivariate skew-normal longitudinal and survival data have been presented in recent years to accommodate non-normality of the longitudinal outcomes, but existing work has not considered variable selection. This article investigates simultaneous parameter estimation and variable selection in joint modeling of longitudinal and survival data. The penalized-splines technique is used to estimate the unknown log baseline hazard function, and the rectangle integral method is adopted to approximate the conditional survival function. A Monte Carlo expectation-maximization algorithm is developed to estimate the model parameters. Based on local linear approximations to the conditional expectation of the likelihood function and the penalty function, a one-step sparse estimation procedure is proposed to circumvent the computational challenge of optimizing the penalized conditional expectation of the likelihood function; it is used to select significant covariates and trajectory functions and to identify departures from normality of the longitudinal data. A Bayesian information criterion based on the conditional expectation of the likelihood function is developed to select the optimal tuning parameter. Simulation studies and a real example from a clinical trial are used to illustrate the proposed methodologies.
Subjects
Algorithms, Statistical Models, Bayes Theorem, Computer Simulation, Likelihood Functions, Monte Carlo Method, Longitudinal Studies
ABSTRACT
Most item response theory (IRT) models for dichotomous responses are based on probit or logit link functions, which assume a symmetric relationship between the probability of a correct response and the latent traits of individuals taking a test. This assumption restricts the use of those models to the case in which all items behave symmetrically. On the other hand, asymmetric models proposed in the literature impose that all items in a test behave asymmetrically. This assumption is inappropriate for the great majority of tests, which are, in general, composed of both symmetric and asymmetric items. Furthermore, a straightforward extension of the existing models in the literature would require a prior selection of each item's symmetry/asymmetry status. This paper proposes a Bayesian IRT model that accounts for symmetric and asymmetric items in a flexible but parsimonious way. That is achieved by assigning a finite mixture prior to the skewness parameter, with one of the mixture components being a point mass at zero. This allows for analyses under both model selection and model averaging approaches. Asymmetric item curves are designed through the centred skew-normal distribution, which has a particularly appealing parametrization in terms of parameter interpretation and computational efficiency. An efficient Markov chain Monte Carlo algorithm is proposed to perform Bayesian inference, and its performance is investigated in some simulated examples. Finally, the proposed methodology is applied to a data set from a large-scale educational exam in Brazil.
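How a skewness parameter produces asymmetric item curves can be illustrated with a plain skew-normal CDF link (not the centred parametrization used in the paper; discrimination and difficulty values are arbitrary, and SciPy is assumed).

```python
# Illustration only (not the paper's centred-skew-normal parametrization): item response
# curves built from a skew-normal CDF link. A zero shape parameter recovers the usual probit
# curve; a nonzero shape makes the curve asymmetric around its inflection point.
import numpy as np
from scipy import stats

theta = np.linspace(-4, 4, 9)          # latent trait grid
a, b = 1.2, 0.0                        # discrimination and difficulty (arbitrary)

def icc(theta, shape):
    return stats.skewnorm.cdf(a * (theta - b), shape)

print("theta:         ", np.round(theta, 1))
print("symmetric ICC: ", np.round(icc(theta, 0.0), 3))   # shape 0 -> ordinary probit item
print("asymmetric ICC:", np.round(icc(theta, 4.0), 3))   # positive shape -> asymmetric item
```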
Subjects
Algorithms, Humans, Bayes Theorem, Markov Chains, Monte Carlo Method
ABSTRACT
A generalization of the probit model is presented, with the extended skew-normal cumulative distribution function as the link, which can be used for modelling a binary response variable in the presence of selectivity bias. Maximum likelihood estimation of the parameters is addressed, and inference on the parameters expressing the degree of selection is discussed. The assumption underlying the model is that the selection mechanism influences the unmeasured factors but does not affect the explanatory variables. When this assumption is violated, but other conditional independencies hold, the model proposed here is still obtained; in particular, the instrumental variable formula still applies, and the model arises at the second stage of the estimation procedure.
ABSTRACT
The Graded Response Model (GRM; Samejima, Estimation of ability using a response pattern of graded scores, Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, θ, to underlie the ordinal item scores (Takane & de Leeuw in Psychometrika, 52:393-408, 1987). Traditionally, a normal distribution is specified for Z, implying homoscedastic error variances and a normally distributed θ. In this paper, we present the Heteroscedastic GRM with Skewed Latent Trait, which extends the traditional GRM by incorporating heteroscedastic error variances and a skew-normally distributed latent trait. An appealing property of the extended GRM is that it includes the traditional GRM as a special case, which enables specific tests of the normality assumption on Z. We show how violations of normality in Z can lead to asymmetrical category response functions. The ability to test this normality assumption is beneficial from both a statistical and a substantive perspective. In a simulation study, we show the viability of the model and investigate the specificity of the effects. We apply the model to a dataset on affect and a dataset on alexithymia.
ABSTRACT
Joint models of longitudinal and time-to-event data have received considerable attention in epidemiological and clinical research, usually combining a linear mixed-effects model under a normality assumption for a single longitudinal outcome with a Cox proportional hazards model. However, such model-based analyses may not provide robust inference when the longitudinal measurements exhibit skewness and/or heavy tails. In addition, the data collected often feature multiple longitudinal outcomes that are significantly correlated, and ignoring this correlation may lead to biased estimation. Under the umbrella of Bayesian inference, this article introduces multivariate joint (MVJ) models with a skewed distribution for multiple longitudinal exposures in order to cope with correlated multiple longitudinal outcomes, adjust for departures from normality, and tailor the linkage in specifying the time-to-event process. We develop a Bayesian joint modeling approach to MVJ models that couples a multivariate linear mixed-effects (MLME) model with the skew-normal (SN) distribution and a Cox proportional hazards model. Our proposed models and method are evaluated through simulation studies and applied to a real example from a diabetes study.
ABSTRACT
Grouped data are frequently used in several fields of study. In this work, we use the expectation-maximization (EM) algorithm to fit a skew-normal (SN) mixture model to grouped data. Implementing the EM algorithm requires computing one-dimensional integrals for each group or class. Our simulation study and real data analyses reveal that the EM algorithm not only always converges but also can be run in just a few seconds even when the number of components is large, in contrast to the Bayesian paradigm, which is computationally expensive. The accuracy of the EM algorithm and the superiority of the SN mixture model over the traditional normal mixture model for grouped data are demonstrated through the simulation and three real data illustrations. To implement the EM algorithm, we use the ForestFit package developed for the R environment, available at https://cran.r-project.org/web/packages/ForestFit/index.html.
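The grouped-data likelihood underlying the EM algorithm can be written directly from skew-normal CDF differences; the sketch below (SciPy assumed, simulated data and class boundaries) maximizes that likelihood with a generic optimizer rather than EM, purely to show the objective involved.

```python
# Sketch of the grouped-data (multinomial) likelihood behind the abstract's EM algorithm.
# For brevity this maximizes it directly with a generic optimizer rather than EM, using a
# two-component skew-normal mixture and simulated class boundaries.
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(2)
# Simulate raw data from a two-component skew-normal mixture, then group it into classes.
raw = np.concatenate([stats.skewnorm.rvs(4, loc=10, scale=3, size=700, random_state=rng),
                      stats.skewnorm.rvs(-2, loc=25, scale=4, size=300, random_state=rng)])
edges = np.linspace(raw.min() - 1, raw.max() + 1, 16)       # class boundaries
counts, _ = np.histogram(raw, bins=edges)

def neg_loglik(par):
    w = 1 / (1 + np.exp(-par[0]))                           # mixing weight in (0, 1)
    a1, m1, s1 = par[1], par[2], np.exp(par[3])
    a2, m2, s2 = par[4], par[5], np.exp(par[6])
    cdf = (w * stats.skewnorm.cdf(edges, a1, m1, s1)
           + (1 - w) * stats.skewnorm.cdf(edges, a2, m2, s2))
    probs = np.clip(np.diff(cdf), 1e-12, None)               # probability of each class
    return -np.sum(counts * np.log(probs))                   # grouped-data log-likelihood

start = np.array([1.0, 1.0, 12.0, np.log(3.0), -1.0, 24.0, np.log(4.0)])
fit = minimize(neg_loglik, start, method="Nelder-Mead", options={"maxiter": 20000})
print("converged:", fit.success, " component locations:", round(fit.x[2], 2), round(fit.x[5], 2))
```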
ABSTRACT
Premature mortality is often a neglected component of overall deaths and the most difficult to identify; nevertheless, it is important to estimate its prevalence. Following Pearson's theory of mortality components, a definition of premature deaths and a parametric model to study their transformations are introduced. The model is a mixture of three distributions: a Half Normal for the first part of the death curve and two Skew Normals for the remaining pieces. One advantage of the model is the possibility of obtaining an explicit equation for life expectancy at birth and of decomposing it into mortality components. We estimated the mixture model for Sweden, France, East Germany and the Czech Republic. In addition to the well-known reduction in infant deaths and the compression and shifting of adult mortality, we were able to study in detail the trend of the central part of the distribution of deaths. In general, a rightward shift of the modal age at death for young adults is observed; in some cases it is accompanied by an increase in the number of deaths at these ages: in particular, in France, premature mortality has increased over the last twenty years.
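The construction can be sketched numerically with arbitrary parameter values (not the fitted ones; SciPy assumed): a half-normal plus two skew-normal components define the age-at-death density, and life expectancy at birth follows as its mean.

```python
# Sketch of the abstract's construction with arbitrary parameter values (not the fitted ones):
# an age-at-death density built as a mixture of a half-normal (infant deaths) and two
# skew-normal components (premature and adult deaths), with life expectancy as its mean.
import numpy as np
from scipy import stats
from scipy.integrate import quad

w = np.array([0.03, 0.12, 0.85])                        # weights: infant, premature, adult

def death_density(x):
    infant = stats.halfnorm.pdf(x, loc=0, scale=2.0)     # deaths concentrated just after birth
    premature = stats.skewnorm.pdf(x, a=3.0, loc=45.0, scale=12.0)
    adult = stats.skewnorm.pdf(x, a=-4.0, loc=88.0, scale=10.0)
    return w[0] * infant + w[1] * premature + w[2] * adult

# Life expectancy at birth = mean of the age-at-death distribution.
e0, _ = quad(lambda x: x * death_density(x), 0, 120)
mass, _ = quad(death_density, 0, 120)
print("total probability mass on [0, 120]:", round(mass, 4))
print("life expectancy at birth (years):  ", round(e0 / mass, 1))
```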