ABSTRACT
Collecting information on multiple longitudinal outcomes is increasingly common in many clinical settings. In many cases, it is desirable to model these outcomes jointly. However, in large data sets with many outcomes, the computational burden often prevents the simultaneous modeling of multiple outcomes within a single model. We develop a mean field variational Bayes algorithm to jointly model multiple Gaussian, Poisson, or binary longitudinal markers within a multivariate generalized linear mixed model. Through simulation studies and clinical applications (in the fields of sight-threatening diabetic retinopathy and primary biliary cirrhosis), we demonstrate substantial computational savings of our approximate approach compared with standard Markov chain Monte Carlo, while maintaining good accuracy in the estimated model parameters.
Subjects
Algorithms, Humans, Bayes Theorem, Markov Chains, Monte Carlo Method, Normal Distribution
ABSTRACT
There is substantial interest in assessing how exposure to environmental mixtures, such as chemical mixtures, affects child health. Researchers are also interested in identifying critical time windows of susceptibility to these complex mixtures. A recently developed method, called lagged kernel machine regression (LKMR), addresses both research questions simultaneously by estimating the effects of time-varying mixture exposures and identifying their critical exposure windows. However, LKMR inference using Markov chain Monte Carlo methods (MCMC-LKMR) is computationally burdensome and time-intensive for large datasets, limiting its applicability. Therefore, we develop a mean field variational Bayesian inference procedure for lagged kernel machine regression (MFVB-LKMR). The procedure achieves computational efficiency and reasonable accuracy as compared with the corresponding MCMC estimation method. Updating parameters using MFVB may take only minutes, while the equivalent MCMC method may take many hours or several days. We apply MFVB-LKMR to PROGRESS, a prospective cohort study in Mexico. Results from a subset of PROGRESS using MFVB-LKMR provide evidence of a significant positive association between second trimester cobalt levels and z-scored birthweight. This positive association is heightened by cesium exposure. MFVB-LKMR is a promising approach for computationally efficient analysis of environmental health datasets to identify critical windows of exposure to complex mixtures.
ABSTRACT
We consider approximate Bayesian inference methods for longitudinal and multilevel data within the context of health science studies. The complexity of these grouped data often necessitates the use of sophisticated statistical models. However, the large size of these data can pose significant challenges for model fitting in terms of computational speed and memory storage. Our methodology is motivated by a study that examines trends in cesarean section rates in the largest state of Australia, New South Wales, between 1994 and 2010. We propose a group-specific curve model that encapsulates the complex nonlinear features of the overall and hospital-specific trends in cesarean section rates while taking into account hospital variability over time. We use penalized spline-based smooth functions to represent trends and implement a fully mean field variational Bayes approach to model fitting. Our mean field variational Bayes algorithms allow fast (speed-ups of up to the order of thousands) and streamlined analytical approximate inference for complex mixed effects models, with minor degradation in accuracy compared with standard Markov chain Monte Carlo methods.
Subjects
Bayes Theorem, Biostatistics/methods, Statistical Models, Algorithms, Cesarean Section/statistics & numerical data, Cesarean Section/trends, Computer Simulation, Female, Humans, Markov Chains, Monte Carlo Method, Pregnancy, Regression Analysis
ABSTRACT
Twitching motility-mediated biofilm expansion is a complex, multicellular behavior that enables the active colonization of surfaces by many species of bacteria. In this study we have explored the emergence of intricate network patterns of interconnected trails that form in actively expanding biofilms of Pseudomonas aeruginosa. We have used high-resolution, phase-contrast time-lapse microscopy and developed sophisticated computer vision algorithms to track and analyze individual cell movements during expansion of P. aeruginosa biofilms. We have also used atomic force microscopy to examine the topography of the substrate underneath the expanding biofilm. Our analyses reveal that at the leading edge of the biofilm, highly coherent groups of bacteria migrate across the surface of the semisolid media and in doing so create furrows along which following cells preferentially migrate. This leads to the emergence of a network of trails that guide mass transit toward the leading edges of the biofilm. We have also determined that extracellular DNA (eDNA) facilitates efficient traffic flow throughout the furrow network by maintaining coherent cell alignments, thereby avoiding traffic jams and ensuring an efficient supply of cells to the migrating front. Our analyses reveal that eDNA also coordinates the movements of cells in the leading edge vanguard rafts and is required for the assembly of cells into the "bulldozer" aggregates that forge the interconnecting furrows. Our observations have revealed that large-scale self-organization of cells in actively expanding biofilms of P. aeruginosa occurs through construction of an intricate network of furrows that is facilitated by eDNA.
Subjects
Biofilms, Bacterial DNA/metabolism, Pseudomonas aeruginosa/metabolism
ABSTRACT
Streamlined mean field variational Bayes algorithms for efficient fitting and inference in large models for longitudinal and multilevel data analysis are obtained. The number of operations is linear in the number of groups at each level, which represents a two orders of magnitude improvement over the naïve approach. Storage requirements are also lessened considerably. We treat models for the Gaussian and binary response situations. Our algorithms allow the fastest ever approximate Bayesian analyses of arbitrarily large longitudinal and multilevel datasets, with little degradation in accuracy compared with Markov chain Monte Carlo. The modularity of mean field variational Bayes allows relatively simple extension to more complicated scenarios.
Subjects
Biometry/methods, Statistical Data Interpretation, Algorithms, Bayes Theorem, Markov Chains, Monte Carlo Method, Normal Distribution
ABSTRACT
We derive streamlined mean field variational Bayes algorithms for fitting linear mixed models with crossed random effects. In the most general situation, where the dimensions of the crossed groups are arbitrarily large, streamlining is hindered by lack of sparseness in the underlying least squares system. For this reason, we also consider a hierarchy of relaxations of the mean field product restriction. The least stringent product restriction delivers a high degree of inferential accuracy. However, this accuracy must be weighed against its higher storage and computing demands. Faster sparse storage and computing alternatives are also provided, but come at the price of diminished inferential accuracy. This article provides full algorithmic details of three variational inference strategies, presents detailed empirical results on their pros and cons and, thus, guides users in their choice of variational inference approach depending on the problem size and computing resources.
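The mean field product restriction referred to above can be illustrated on a far simpler model than the crossed random effects setting of the article. The following is a minimal sketch, assuming a univariate Gaussian sample x_i ~ N(mu, sigma^2) with independent priors mu ~ N(mu0, s0_sq) and sigma^2 ~ Inverse-Gamma(A, B), under the restriction q(mu, sigma^2) = q(mu) q(sigma^2). The function name, prior defaults, and stopping rule are illustrative assumptions and are not taken from the article.

```python
def mfvb_normal(x, mu0=0.0, s0_sq=1e8, A=0.01, B=0.01, tol=1e-10, max_iter=1000):
    """Coordinate-ascent mean field variational Bayes for a Gaussian sample.

    Assumed model (illustrative only):
        x_i ~ N(mu, sigma^2), mu ~ N(mu0, s0_sq), sigma^2 ~ Inverse-Gamma(A, B)
    Product restriction: q(mu, sigma^2) = q(mu) * q(sigma^2), which gives
        q(mu) = N(mu_q, s_sq_q) and q(sigma^2) = Inverse-Gamma(a_q, b_q).
    Returns (mu_q, s_sq_q, a_q, b_q).
    """
    n = len(x)
    sum_x = sum(x)
    a_q = A + n / 2.0      # shape of q(sigma^2); does not change across iterations
    e_inv_sigsq = 1.0      # starting value for E_q[1 / sigma^2]
    for _ in range(max_iter):
        # Update q(mu) given the current E_q[1 / sigma^2].
        s_sq_q = 1.0 / (n * e_inv_sigsq + 1.0 / s0_sq)
        mu_q = (e_inv_sigsq * sum_x + mu0 / s0_sq) * s_sq_q
        # Update q(sigma^2) given the current q(mu):
        # E_q[sum_i (x_i - mu)^2] = sum_i (x_i - mu_q)^2 + n * s_sq_q.
        b_q = B + 0.5 * (sum((xi - mu_q) ** 2 for xi in x) + n * s_sq_q)
        new_e = a_q / b_q  # mean of 1/sigma^2 under Inverse-Gamma(a_q, b_q)
        converged = abs(new_e - e_inv_sigsq) < tol * abs(e_inv_sigsq)
        e_inv_sigsq = new_e
        if converged:
            break
    return mu_q, s_sq_q, a_q, b_q
```

Each pass updates one factor of q while holding the other fixed, which is the basic pattern that streamlined algorithms of this kind build on; with a vague prior, mu_q lands essentially on the sample mean and b_q / (a_q - 1) close to the sample variance.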
ABSTRACT
Although human experimental studies have shown that gaseous pollutants enhance the inflammatory response to allergens, human data on whether combustion particulates enhance the inflammatory response to allergen are limited. Therefore, we conducted a human experimental study to investigate whether combustion particulates enhance the inflammatory response to aeroallergens. "Enhancement" refers to a greater-than-additive response when combustion particulates are delivered with allergen, compared with the responses when particulates and allergen are delivered alone. Eight subjects, five atopic and three nonatopic, participated in three randomized exposure-challenge sessions at least 2 weeks apart (i.e., clean air followed by allergen, particles followed by no allergen, or particles followed by allergen). Each session consisted of nasal exposure to combustion particles (target concentration of 1.0 mg/m^3) or clean air for 1 hr, followed 3 hr later by challenge with whole pollen grains or placebo. Nasal lavage was performed immediately before particle or clean air exposure, immediately after exposure, and 4, 18, and 42 hr after pollen challenge. Cell counts, differentials, and measurement of cytokines were performed on each nasal lavage. In atopic but not in nonatopic subjects, when allergen was preceded by particulates, there was a significant enhancement immediately after pollen challenge in nasal lavage leukocytes and neutrophils (29.7 × 10^3 cells/mL and 25.4 × 10^3 cells/mL, respectively). This represents a 143% and 130% enhancement, respectively. The enhanced response for interleukin-4 was 3.23 pg/mL (p = 0.06), a 395% enhancement. In atopic subjects there was evidence of an enhanced response when particulates, as compared to clean air, preceded the allergen challenge.
Subjects
Air Pollutants/adverse effects, Allergens/immunology, Immediate Hypersensitivity/immunology, Pollen/immunology, Adult, Allergens/adverse effects, Cytokines/analysis, Cytokines/biosynthesis, Female, Humans, Incineration, Inflammation, Male, Particle Size, Pollen/adverse effects, Refuse Disposal
ABSTRACT
We introduce variational Bayes methods for fast approximate inference in functional regression analysis. Both the standard cross-sectional and the increasingly common longitudinal settings are treated. The methodology allows Bayesian functional regression analyses to be conducted without the computational overhead of Monte Carlo methods. Confidence intervals for the model parameters are obtained using both the approximate variational approach and nonparametric resampling of clusters. The latter approach is feasible because our variational Bayes functional regression approach is computationally efficient. A simulation study indicates that variational Bayes is highly accurate in estimating the parameters of interest and in approximating the Markov chain Monte Carlo-sampled joint posterior distribution of the model parameters. The methods apply generally, but are motivated by a longitudinal neuroimaging study of multiple sclerosis patients. Code used in simulations is made available as a web supplement.
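The nonparametric resampling of clusters mentioned above can be sketched as a percentile bootstrap in which whole clusters (for example, all visits of one patient) are drawn with replacement. This is a minimal sketch under stated assumptions: the function name `cluster_bootstrap_ci`, the dictionary layout of the data, and the use of a plain callable as the estimator are hypothetical. In the setting of the abstract the estimator would refit the variational Bayes functional regression on each resample, which is what makes a fast fitting method a prerequisite.

```python
import random
from statistics import mean


def cluster_bootstrap_ci(clusters, estimator, n_boot=2000, level=0.95, seed=0):
    """Percentile confidence interval from resampling whole clusters.

    clusters  : dict mapping a cluster id to a list of that cluster's observations
    estimator : callable taking the pooled list of observations, returning a number
    """
    rng = random.Random(seed)
    ids = list(clusters)
    stats = []
    for _ in range(n_boot):
        # Draw cluster ids with replacement, keeping each cluster's data together
        # so that within-cluster dependence is preserved in every resample.
        drawn = rng.choices(ids, k=len(ids))
        pooled = [obs for cid in drawn for obs in clusters[cid]]
        stats.append(estimator(pooled))
    stats.sort()
    alpha = (1.0 - level) / 2.0
    lower = stats[int(alpha * (n_boot - 1))]
    upper = stats[int((1.0 - alpha) * (n_boot - 1))]
    return lower, upper


# Toy usage: 20 clusters of 3 observations each, with the sample mean as a
# stand-in for a model fit.
toy = {i: [i + 0.1 * j for j in range(3)] for i in range(20)}
lower, upper = cluster_bootstrap_ci(toy, mean)
```

Resampling clusters rather than individual observations is the design choice that matters here: repeated measurements on the same subject are correlated, and resampling them independently would understate the interval width.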