Search | BVS Regional Portal

Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations.

Hastie, David I; Liverani, Silvia; Richardson, Sylvia.

Stat Comput ; 25(5): 1023-1037, 2015.

Article in English | MEDLINE | ID: mdl-26321800

ABSTRACT

We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model, with concentration parameter [Formula: see text]. This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (Communications in Statistics - Simulation and Computation 36:45-54, 2007) and the retrospective sampling approach of Papaspiliopoulos and Roberts (Biometrika 95(1):169-186, 2008). Our general algorithm is implemented as efficient open source C++ software, available as an R package, and is based on a blocking strategy similar to that suggested by Papaspiliopoulos (A note on posterior sampling from Dirichlet mixture models, 2008) and implemented by Yau et al. (Journal of the Royal Statistical Society, Series B (Statistical Methodology) 73:37-57, 2011). We discuss the difficulties of achieving good mixing in MCMC samplers of this nature in large data sets and investigate sensitivity to initialisation. We additionally consider the challenges when an additional layer of hierarchy is added such that joint inference is to be made on [Formula: see text]. We introduce a new label-switching move and compute the marginal partition posterior to help to surmount these difficulties. Our work is illustrated using a profile regression (Molitor et al. Biostatistics 11(3):484-498, 2010) application, where we demonstrate good mixing behaviour for both synthetic and real examples.
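The stick-breaking construction named in the abstract can be sketched as follows. This is only an illustration of the standard construction for a Dirichlet process with concentration parameter `alpha`, not the paper's sampler: the fixed truncation level `num_components`, the function name and the seed are assumptions made for this example, whereas the slice and retrospective sampling approaches cited in the abstract avoid a fixed truncation.

```python
import random


def stick_breaking_weights(alpha, num_components, rng):
    """Draw truncated stick-breaking weights.

    V_k ~ Beta(1, alpha) and w_k = V_k * prod_{j<k} (1 - V_j).
    Returns the weights and the length of stick left unassigned.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_components):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


# Hypothetical settings chosen only for illustration.
rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=1.0, num_components=50, rng=rng)
```

Smaller values of `alpha` tend to put most of the mass on the first few components, while larger values spread it over many, which is one reason inference on the concentration parameter affects the number of clusters visited.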

PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes.

Liverani, Silvia; Hastie, David I; Azizi, Lamiae; Papathomas, Michail; Richardson, Sylvia.

J Stat Softw ; 64(7): 1-30, 2015 Mar 20.

Article in English | MEDLINE | ID: mdl-27307779

ABSTRACT

PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010). The package allows binary, categorical, count and continuous response, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection.

A semi-parametric approach to estimate risk functions associated with multi-dimensional exposure profiles: application to smoking and lung cancer.

Hastie, David I; Liverani, Silvia; Azizi, Lamiae; Richardson, Sylvia; Stücker, Isabelle.

BMC Med Res Methodol ; 13: 129, 2013 Oct 23.

Article in English | MEDLINE | ID: mdl-24152389

ABSTRACT

BACKGROUND: A common characteristic of environmental epidemiology is the multi-dimensional aspect of exposure patterns, frequently reduced to a cumulative exposure for simplicity of analysis. By adopting a flexible Bayesian clustering approach, we explore the risk function linking exposure history to disease. This approach is applied here to study the relationship between different smoking characteristics and lung cancer in the framework of a population based case control study. METHODS: Our study includes 4658 males (1995 cases, 2663 controls) with full smoking history (intensity, duration, time since cessation, pack-years) from the ICARE multi-centre study conducted from 2001-2007. We extend Bayesian clustering techniques to explore predictive risk surfaces for covariate profiles of interest. RESULTS: We were able to partition the population into 12 clusters with different smoking profiles and lung cancer risk. Our results confirm that when compared to intensity, duration is the predominant driver of risk. On the other hand, using pack-years of cigarette smoking as a single summary leads to a considerable loss of information. CONCLUSIONS: Our method estimates a disease risk associated to a specific exposure profile by robustly accounting for the different dimensions of exposure and will be helpful in general to give further insight into the effect of exposures that are accumulated through different time patterns.

Subjects

Adenocarcinoma/etiology , Lung Neoplasms/etiology , Smoking/adverse effects , Bayes Theorem , Case-Control Studies , Statistical Data Interpretation , Environmental Exposure/adverse effects , Humans , Male , Statistical Models , Multivariate Analysis , Odds Ratio , Risk Factors , Sensitivity and Specificity

GUESS-ing polygenic associations with multiple phenotypes using a GPU-based evolutionary stochastic search algorithm.

Bottolo, Leonardo; Chadeau-Hyam, Marc; Hastie, David I; Zeller, Tanja; Liquet, Benoit; Newcombe, Paul; Yengo, Loic; Wild, Philipp S; Schillert, Arne; Ziegler, Andreas; Nielsen, Sune F; Butterworth, Adam S; Ho, Weang Kee; Castagné, Raphaële; Munzel, Thomas; Tregouet, David; Falchi, Mario; Cambien, François; Nordestgaard, Børge G; Fumeron, Fredéric; Tybjærg-Hansen, Anne; Froguel, Philippe; Danesh, John; Petretto, Enrico; Blankenberg, Stefan; Tiret, Laurence; Richardson, Sylvia.

PLoS Genet ; 9(8): e1003657, 2013.

Article in English | MEDLINE | ID: mdl-23950726

ABSTRACT

Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex Linkage Disequilibrium patterns between SNPs and correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175), when compared with the largest published meta-GWAS (n > 100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed a strong association of SORT1 with TG-APOB and LIPC with TG-HDL phenotypic groups, which were overlooked in the larger meta-GWAS and not revealed by competing approaches, associations that we replicated in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. This provides a powerful tool for the analysis of diverse genomic features, for instance including gene expression and exome sequencing data, where complex dependencies are present in the predictor space.

Subjects

Algorithms , Biological Evolution , Genome-Wide Association Study , Quantitative Trait Loci/genetics , Bayes Theorem , Exome/genetics , Gene Expression , Humans , Linkage Disequilibrium , Phenotype , Single Nucleotide Polymorphism/genetics

ESS++: a C++ objected-oriented algorithm for Bayesian stochastic search model exploration.

Bottolo, Leonardo; Chadeau-Hyam, Marc; Hastie, David I; Langley, Sarah R; Petretto, Enrico; Tiret, Laurence; Tregouet, David; Richardson, Sylvia.

Bioinformatics ; 27(4): 587-8, 2011 Feb 15.

Article in English | MEDLINE | ID: mdl-21233165

ABSTRACT

SUMMARY: ESS++ is a C++ implementation of a fully Bayesian variable selection approach for single and multiple response linear regression. ESS++ works well both when the number of observations is larger than the number of predictors and in the 'large p, small n' case. In the current version, ESS++ can handle several hundred observations, thousands of predictors and a few responses simultaneously. The core engine of ESS++ for the selection of relevant predictors is based on Evolutionary Monte Carlo. Our implementation is open source, allowing community-based alterations and improvements. AVAILABILITY: C++ source code and documentation including compilation instructions are available under GNU licence at http://bgx.org.uk/software/ESS.html.

Subjects

Algorithms , Bayes Theorem , Statistical Models , Programming Languages , Software , Gene Expression Regulation , Linear Models , Stochastic Processes
