ABSTRACT
We develop a Bayesian variable selection method for logistic regression models that can simultaneously accommodate qualitative covariates and interaction terms under various heredity constraints. We use expectation-maximization variable selection (EMVS) with a deterministic annealing variant as the platform for our method, due to its proven flexibility and efficiency. We propose a variance adjustment of the priors for the coefficients of qualitative covariates, which controls false-positive rates, and a flexible parameterization for interaction terms, which accommodates user-specified heredity constraints. This method can handle all pairwise interaction terms as well as a subset of specific interactions. Using simulation, we show that this method selects associated covariates better than the grouped LASSO and the LASSO with heredity constraints in various exploratory research scenarios encountered in epidemiological studies. We apply our method to identify genetic and non-genetic risk factors associated with smoking experimentation in a cohort of Mexican-heritage adolescents.
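As a rough illustration of what a heredity constraint means for interaction terms, the sketch below lists the pairwise interactions that are admissible given the main effects currently in a model. It uses the standard definitions of strong heredity (both parent main effects present) and weak heredity (at least one present). The abstract does not say which constraints the method supports or how they enter the prior, so the function name, arguments, and the hard filtering rule are assumptions made only for illustration; this is not the EMVS procedure itself.

```python
from itertools import combinations


def allowed_interactions(main_effects, selected, heredity="strong"):
    """Return the pairwise interactions permitted under a heredity rule.

    main_effects: list of covariate names (hypothetical labels).
    selected: set of main effects currently included in the model.
    heredity: "strong" (both parents included), "weak" (at least one
              parent included), or "none" (no restriction).
    """
    if heredity not in ("strong", "weak", "none"):
        raise ValueError("heredity must be 'strong', 'weak', or 'none'")

    allowed = []
    for a, b in combinations(main_effects, 2):
        # Count how many of the two parent main effects are in the model.
        n_parents = (a in selected) + (b in selected)
        if heredity == "strong" and n_parents == 2:
            allowed.append((a, b))
        elif heredity == "weak" and n_parents >= 1:
            allowed.append((a, b))
        elif heredity == "none":
            allowed.append((a, b))
    return allowed
```

Calling the function with the same covariates under each rule shows how the candidate set of interactions shrinks as the constraint tightens.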
ABSTRACT
We introduce a nonparametric Bayesian model for a phase II clinical trial with patients presenting different subtypes of the disease under study. The objective is to estimate the success probability of an experimental therapy for each subtype. We consider the case when small sample sizes require extensive borrowing of information across subtypes, but the subtypes are not a priori exchangeable. The lack of a priori exchangeability hinders the straightforward use of traditional hierarchical models to implement borrowing of strength across disease subtypes. We introduce instead a random partition model for the set of disease subtypes. This is a variation of the product partition model that allows us to model a nonexchangeable prior structure. Like a hierarchical model, the proposed clustering approach considers all observations, across all disease subtypes, to estimate individual success probabilities. But in contrast to standard hierarchical models, the model considers disease subtypes a priori nonexchangeable. This implies that, when assessing the success probability for a particular subtype, our model borrows more information from the outcomes of patients sharing the same prognosis than from the others. Our data arise from a phase II clinical trial of patients with sarcoma, a rare type of cancer affecting connective or supportive tissues and soft tissue (e.g., cartilage and fat). Each patient presents one subtype of the disease, and subtypes are grouped by good, intermediate, and poor prognosis. The prior model should respect the varying prognosis across disease subtypes. The practical motivation for the proposed approach is that the number of accrued patients within each disease subtype is small. Thus it is not possible to carry out a clinical study of possible new therapies for rare conditions, because it would be impossible to plan for a sufficiently large sample size to achieve the desired power.
We carry out a simulation study to compare the proposed model with a model that assumes similar success probabilities for all subtypes with the same prognosis, i.e., a fixed partition of subtypes by prognosis. When this assumption is satisfied, the two models perform comparably, but the proposed model outperforms the competing model when the assumption is incorrect.
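The fixed-partition comparison model can be pictured with a minimal sketch in which all subtypes sharing a prognosis label pool their outcomes. The sketch simplifies "similar" success probabilities to a single common probability per prognosis group with a conjugate Beta(a, b) prior. The prior choice, the function name, and the dictionary-based inputs are assumptions for illustration and are not taken from the abstract.

```python
def pooled_posterior_means(successes, trials, prognosis, a=1.0, b=1.0):
    """Posterior mean success probability per subtype under a fixed partition.

    successes, trials: dicts mapping subtype -> observed counts.
    prognosis: dict mapping subtype -> prognosis label (the fixed partition).
    a, b: parameters of the assumed Beta(a, b) prior on each group's
          common success probability.
    """
    # Pool successes and trials within each prognosis group.
    totals = {}
    for subtype, group in prognosis.items():
        s, n = totals.get(group, (0, 0))
        totals[group] = (s + successes[subtype], n + trials[subtype])

    # Beta-binomial conjugacy: posterior mean is (a + s) / (a + b + n).
    return {
        subtype: (a + totals[group][0]) / (a + b + totals[group][1])
        for subtype, group in prognosis.items()
    }
```

Under this simplification every subtype in a prognosis group receives the same estimate, which is the rigidity that a random partition over subtypes is meant to relax.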
Subject(s)
Biometry/methods; Bayes Theorem; Clinical Trials, Phase II as Topic/statistics & numerical data; Computer Simulation; Data Interpretation, Statistical; Humans; Models, Statistical; Prognosis; Sample Size; Sarcoma/classification; Statistics, Nonparametric
ABSTRACT
This article addresses modeling and inference for ordinal outcomes nested within categorical responses. We propose a mixture of normal distributions for latent variables associated with the ordinal data. This mixture model allows us to fix, without loss of generality, the cutpoint parameters that link the latent variable with the observed ordinal outcome. Moreover, the mixture model is shown to be more flexible in estimating cell probabilities than the traditional Bayesian ordinal probit regression model with random cutpoint parameters. We extend our model to take into account possible dependence among the outcomes in different categories. We apply the model to a randomized phase III study to compare treatments on the basis of toxicities recorded by type of toxicity and grade within type. The data include the different (categorical) toxicity types exhibited in each patient. Each type of toxicity has an (ordinal) grade associated with it. The dependence among the different types of toxicity exhibited by the same patient is modeled by introducing patient-specific random effects.
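To make the link between the latent variable and the ordinal cell probabilities concrete, the sketch below computes P(Y = k) when the latent variable follows a finite mixture of normals and the cutpoints are held fixed: each cell probability is the mixture-weighted normal mass between consecutive cutpoints. The function names and the plain-list parameterization are assumptions for illustration, and the sketch leaves out covariates and the patient-specific random effects that the full model includes.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, with explicit handling of +/- infinity."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return 0.0
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ordinal_cell_probs(weights, means, sds, cutpoints):
    """Cell probabilities for an ordinal outcome with fixed cutpoints.

    The latent variable Z follows sum_j weights[j] * N(means[j], sds[j]^2),
    and Y = k when Z falls between the k-th pair of consecutive edges,
    where the edges are -inf, the sorted cutpoints, and +inf.
    """
    edges = [-math.inf] + sorted(cutpoints) + [math.inf]
    probs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Mixture-weighted normal mass in the interval (lo, hi].
        p = sum(
            w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
            for w, m, s in zip(weights, means, sds)
        )
        probs.append(p)
    return probs
```

Because the mixture weights, means, and standard deviations are free, the cell probabilities can vary across a wide range even though the cutpoints never move, which is the intuition behind fixing them.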