ABSTRACT
Mediation analysis with contemporaneously observed multiple mediators is a significant area of causal inference. Recent approaches for multiple mediators are often based on parametric models and thus may suffer from model misspecification. Also, much of the existing literature either allows estimation of only the joint mediation effect or estimates the joint mediation effect simply as the sum of the individual mediator effects, ignoring interactions among the mediators. In this article, we propose a novel Bayesian nonparametric method that overcomes these two drawbacks. We model the joint distribution of the observed data (outcome, mediators, treatment, and confounders) flexibly using an enriched Dirichlet process mixture with three levels. We use standardization (g-computation) to compute all possible mediation effects, including pairwise and all higher-order interactions among the mediators. We thoroughly explore our method via simulations and apply it to mental health data from the Wisconsin Longitudinal Study, where we estimate how the effect of births from unintended pregnancies on later-life depression (CES-D) among the mothers is mediated through lack of self-acceptance and autonomy, employment instability, lack of social participation, and increased family stress. Our method identified significant individual mediators, along with some significant pairwise effects.
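The enriched Dirichlet process mixture above is built from nested Dirichlet processes. As a minimal, hedged illustration of the basic ingredient only, the sketch below draws truncated stick-breaking weights for a single DP and samples mixture-component labels from them. The function names, the truncation level of 20, and alpha = 1 are illustrative assumptions; the three-level enriched structure and the g-computation of mediation effects are not implemented here.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process DP(alpha, G0).

    v_k ~ Beta(1, alpha) and w_k = v_k * prod_{j<k} (1 - v_j); the final
    weight takes whatever stick is left so that the weights sum to one.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # close the truncation
    return weights


def draw_cluster_label(weights, rng):
    """Sample a mixture-component index with probabilities given by `weights`."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


rng = random.Random(2024)
w = stick_breaking_weights(alpha=1.0, truncation=20, rng=rng)
labels = [draw_cluster_label(w, rng) for _ in range(10)]
```

Smaller alpha concentrates mass on the first few components, so fewer distinct labels tend to appear; repeated labels are what induce clustering in a DP mixture.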
Subjects
Bayes Theorem, Humans, Mediation Analysis, Female, Longitudinal Studies, Statistical Models, Mental Health
ABSTRACT
We propose a nonparametric compound Poisson model for underreported count data that introduces a latent clustering structure for the reporting probabilities. The reporting probabilities are estimated jointly with the model's parameters, based on expert opinion and a proxy for the reporting process. The proposed model is used to estimate the prevalence of chronic kidney disease in Apulia, Italy, based on a unique statistical database covering m = 258 municipalities, obtained by integrating multisource register information. Accurate prevalence estimates are needed for monitoring, surveillance, and management purposes; yet counts are deemed to be considerably underreported, especially in some areas of Apulia, one of the most deprived and heterogeneous regions in Italy. Our results agree with previous findings and highlight interesting geographical patterns of the disease. We compare our model with existing approaches using simulated data as well as real data on early neonatal mortality risk in Brazil described in previous research; the proposed approach proves accurate and is particularly suitable when partial information about data quality is available.
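A common building block of models for underreported counts is binomial thinning: if the true count is Y ~ Poisson(λ) and each case is reported independently with probability p, the observed count Z is Poisson(λp). The sketch below checks this identity by simulation under the assumption of a single known reporting probability. The latent clustering of reporting probabilities, the expert-opinion prior, and the proxy for the reporting process described in the abstract are not modeled, and all numbers are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate rates such as lam <= 30."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_binomial(n, p, rng):
    """Thinning step: each of the n true cases is reported independently with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


rng = random.Random(11)
lam, p, n_units = 20.0, 0.6, 20000   # hypothetical true rate and reporting probability
true_counts = [sample_poisson(lam, rng) for _ in range(n_units)]
observed = [sample_binomial(y, p, rng) for y in true_counts]

mean_observed = sum(observed) / n_units   # close to lam * p under thinning
corrected_rate = mean_observed / p        # naive correction when p is known
```

When p is unknown, λ and p are not separately identifiable from the counts alone, which is why extra information such as expert opinion or a proxy for the reporting process is needed.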
ABSTRACT
Regional aggregates of health outcomes over delineated administrative units (e.g., states, counties, and zip codes), or areal units, are widely used by epidemiologists to map mortality or incidence rates and capture geographic variation. To capture health disparities over regions, we seek "difference boundaries" that separate neighboring regions with significantly different spatial effects. The problem is more challenging when each unit has multiple outcomes, because dependence must be captured among diseases as well as across the areal units. Here, we address multivariate difference boundary detection for correlated diseases. We formulate the problem in terms of Bayesian pairwise multiple comparisons and seek the posterior probabilities of neighboring spatial effects being different. To achieve this, we endow the spatial random effects with a discrete probability law using a class of multivariate areally referenced Dirichlet process models that accommodate spatial and interdisease dependence. We evaluate our method through simulation studies and detect difference boundaries for multiple cancers using data from the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute.
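Because a Dirichlet process gives the spatial effects a discrete distribution, two neighboring units either share an atom or do not, so the posterior probability that their effects differ can be estimated from MCMC output as the fraction of draws in which their cluster labels differ. The sketch below assumes a single disease and a hypothetical list of posterior label draws; the multivariate areally referenced DP itself and any rule for choosing the cutoff are not shown, and the 0.5 cutoff is only illustrative.

```python
def boundary_probabilities(label_draws, neighbor_pairs):
    """Estimate P(phi_i != phi_j | data) for each neighboring pair (i, j).

    label_draws: list of posterior draws; each draw lists one cluster label
    per areal unit. Units with different labels sit on different DP atoms.
    """
    n_draws = len(label_draws)
    probs = {}
    for i, j in neighbor_pairs:
        differ = sum(1 for labels in label_draws if labels[i] != labels[j])
        probs[(i, j)] = differ / n_draws
    return probs


def flag_boundaries(probs, cutoff=0.5):
    """Return neighboring pairs whose difference probability exceeds the cutoff."""
    return sorted(pair for pair, pr in probs.items() if pr > cutoff)


# Hypothetical posterior draws for four units arranged in a line.
draws = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [2, 2, 1, 1]]
pairs = [(0, 1), (1, 2), (2, 3)]
probs = boundary_probabilities(draws, pairs)
boundaries = flag_boundaries(probs)
```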
Subjects
Bayes Theorem, Humans, Computer Simulation, Probability, Incidence
ABSTRACT
We propose a new Bayesian nonparametric method for estimating the causal effects of mediation in the presence of a post-treatment confounder. The methodology is motivated by the Rural Lifestyle Intervention Treatment Effectiveness Trial (Rural LITE), in which estimating causal mediation effects is of interest but is complicated by the presence of a post-treatment confounder. We specify an enriched Dirichlet process mixture (EDPM) to model the joint distribution of the observed data (outcome, mediator, post-treatment confounder, treatment, and baseline confounders). For identifiability, we use the extended version of standard sequential ignorability (SI) introduced in Hong et al., along with a Gaussian copula model assumption. The observed data model and the causal identification assumptions enable us to identify and estimate the causal effects of mediation, that is, the natural direct effect (NDE) and the natural indirect effect (NIE). Our method enables easy computation of the NIE and NDE for a subset of confounding variables and handles missing data through data augmentation under the assumption of ignorable missingness. We conduct simulation studies to assess the performance of the proposed method. Furthermore, we apply this approach to evaluate the causal mediation effect in the Rural LITE trial, finding no strong evidence for the potential mediator.
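To make the NDE and NIE definitions concrete, the sketch below computes them by Monte Carlo g-computation in a deliberately simplified setting: one baseline confounder, no post-treatment confounder, and hypothetical linear models for the mediator and outcome with made-up coefficients. It is not the EDPM or Gaussian copula approach of the abstract; it only shows which counterfactual means are contrasted. In this linear, no-interaction case the NDE equals B1 and the NIE equals B2 * A1, which gives an easy check.

```python
import random

# Hypothetical linear models used only for illustration (not fitted to any data).
A0, A1, A2 = 0.5, 1.2, 0.3             # mediator: M = A0 + A1*t + A2*c + e
B0, B1, B2, B3 = 1.0, 0.8, 2.0, -0.4   # outcome mean: E[Y | t, m, c] = B0 + B1*t + B2*m + B3*c


def outcome_mean(t, m, c):
    return B0 + B1 * t + B2 * m + B3 * c


def g_computation_nde_nie(confounders, noise):
    """Monte Carlo g-computation of the natural direct and indirect effects.

    For each draw of the confounder c and mediator noise e, build the mediator
    under control, M(0), and under treatment, M(1), then average the outcome
    means at the treatment/mediator combinations the effects contrast.
    """
    y00 = y10 = y11 = 0.0
    for c, e in zip(confounders, noise):
        m0 = A0 + A2 * c + e        # M(0)
        m1 = A0 + A1 + A2 * c + e   # M(1), same noise draw (common random numbers)
        y00 += outcome_mean(0, m0, c)
        y10 += outcome_mean(1, m0, c)
        y11 += outcome_mean(1, m1, c)
    n = len(confounders)
    nde = (y10 - y00) / n   # E[Y(1, M(0))] - E[Y(0, M(0))]
    nie = (y11 - y10) / n   # E[Y(1, M(1))] - E[Y(1, M(0))]
    return nde, nie


rng = random.Random(7)
cs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
es = [rng.gauss(0.0, 1.0) for _ in range(5000)]
nde, nie = g_computation_nde_nie(cs, es)
```

With a flexible model such as a DP mixture, the outcome and mediator draws would come from the fitted joint distribution instead of fixed linear formulas, but the contrasts being averaged are the same.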
Subjects
Bayes Theorem, Causality, Computer Simulation, Statistical Models, Humans, Epidemiologic Confounding Factors, Nonparametric Statistics, Mediation Analysis, Treatment Outcome, Biometry/methods, Statistical Data Interpretation, Rural Population/statistics & numerical data, Life Style
ABSTRACT
Mouse-tracking data, which record computer mouse trajectories while participants perform an experimental task, provide valuable insights into subjects' underlying cognitive processes. Neuroscientists are interested in clustering the subjects' responses during computer mouse-tracking tasks to reveal patterns of individual decision-making behavior and identify population subgroups with similar neurobehavioral responses. These data can be combined with neuroimaging data to provide additional information for personalized interventions. In this article, we develop a novel hierarchical shrinkage partition (HSP) prior for clustering summary statistics derived from the trajectories of mouse-tracking data. The HSP model defines a subject cluster as a set of subjects that give rise to more similar (rather than identical) nested partitions of the conditions. The proposed model can incorporate prior information about the partitioning of either subjects or conditions to facilitate clustering, and it allows for deviations of the nested partitions within each subject group. These features distinguish the HSP model from other bi-clustering methods, which typically create identical nested partitions of conditions within a subject group. Furthermore, it differs from existing nested clustering methods, which define clusters based on common parameters in the sampling model and identify subject groups by different distributions. We illustrate the unique features of the HSP model on a mouse-tracking dataset from a pilot study and in simulation studies. Our results demonstrate the effectiveness of the proposed exploratory framework in clustering and in revealing possibly different behavioral patterns across subject groups.
Subjects
Computer Simulation, Cluster Analysis, Humans, Statistical Models, Computers, Animals, Decision Making
ABSTRACT
Several epidemiological studies have provided evidence that long-term exposure to fine particulate matter (PM2.5) increases the mortality rate. Furthermore, some population characteristics (e.g., age, race, and socioeconomic status) might play a crucial role in understanding vulnerability to air pollution. To inform policy, it is necessary to identify groups of the population that are more or less vulnerable to air pollution. In the causal inference literature, the group average treatment effect (GATE) is a distinctive facet of the conditional average treatment effect. This widely employed metric characterizes the heterogeneity of a treatment effect based on some population characteristics. In this paper, we introduce a novel Confounder-Dependent Bayesian Mixture Model (CDBMM) to characterize causal effect heterogeneity. More specifically, our method leverages the flexibility of the dependent Dirichlet process to model the distribution of the potential outcomes conditionally on the covariates and the treatment levels, thus enabling us to: (i) identify heterogeneous and mutually exclusive population groups defined by similar GATEs in a data-driven way, and (ii) estimate and characterize the causal effects within each of the identified groups. Through simulations, we demonstrate the effectiveness of our method in uncovering key insights about treatment effect heterogeneity. We apply our method to claims data from Medicare enrollees in Texas. We found six mutually exclusive groups where the causal effects of PM2.5 on the mortality rate are heterogeneous.
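Once units have been assigned to groups and both potential outcomes are available (observed or imputed), a GATE is simply the average of the individual differences Y(1) - Y(0) within each group. The sketch below shows that arithmetic for one hypothetical posterior draw with fixed group labels. It does not implement the CDBMM, which infers the groups; repeating the calculation over posterior draws would give a posterior distribution for each GATE.

```python
from collections import defaultdict


def group_average_treatment_effects(y1, y0, groups):
    """GATE for each group label: mean of Y(1) - Y(0) over the units in that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for a, b, g in zip(y1, y0, groups):
        totals[g] += a - b
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


# One hypothetical posterior draw: imputed potential outcomes and group labels.
y1 = [3.0, 2.5, 1.0, 1.5]
y0 = [1.0, 0.5, 1.0, 1.0]
groups = ["A", "A", "B", "B"]
gates = group_average_treatment_effects(y1, y0, groups)
```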
Subjects
Air Pollutants, Air Pollution, United States/epidemiology, Air Pollutants/adverse effects, Air Pollutants/analysis, Bayes Theorem, Medicare, Air Pollution/adverse effects, Air Pollution/analysis, Particulate Matter/adverse effects, Particulate Matter/analysis, Environmental Exposure/adverse effects
ABSTRACT
Dysphagia, a common consequence of other medical conditions, is caused by malfunctions in swallowing physiology that result in difficulty eating and drinking. The Modified Barium Swallow Study (MBSS), the most commonly used diagnostic tool for evaluating dysphagia, can be assessed using the Modified Barium Swallow Impairment Profile (MBSImP™). The MBSImP assessment tool has a hierarchical grouped data structure with multiple domains, a set of components within each domain that characterize specific swallowing physiologies, and a set of tasks scored on a discrete scale within each component. Sophisticated approaches are lacking for extracting patterns of physiologic swallowing impairment from the MBSImP task scores within a component while still recognizing the nested structure of components within a domain. We propose a Bayesian hierarchical profile regression model, which combines a Bayesian profile regression model with a hierarchical Dirichlet process mixture model to (1) cluster subjects into impairment profile patterns while respecting the hierarchical grouped data structure of the MBSImP, and (2) simultaneously determine associations between latent profile cluster membership for all components and the outcome of dysphagia severity. We apply our approach to a cohort of patients referred for an MBSS and assessed using the MBSImP. Our results can be used to inform appropriate intervention strategies and provide tools for clinicians to make better multidimensional management and treatment decisions for patients with dysphagia.
Subjects
Bayes Theorem, Deglutition Disorders, Humans, Regression Analysis, Female, Statistical Models, Male, Cluster Analysis
ABSTRACT
The prevalence of chronic non-communicable diseases such as obesity has noticeably increased in the last decade. The study of these diseases in early life is of paramount importance in determining their course in adult life and in supporting clinical interventions. Recently, attention has been drawn to approaches that study the alteration of metabolic pathways in obese children. In this work, we propose a novel joint modeling approach for the analysis of growth biomarkers and metabolite associations, to unveil metabolic pathways related to childhood obesity. Within a Bayesian framework, we flexibly model the temporal evolution of growth trajectories and metabolic associations through the specification of a joint nonparametric random effect distribution, with the main goal of clustering subjects and thus identifying risk subgroups. Growth profiles as well as patterns of metabolic associations determine the clustering structure. Risk factors can be included straightforwardly through a regression term. We demonstrate the proposed approach on data from the Growing Up in Singapore Towards healthy Outcomes cohort study. Posterior inference is obtained via a tailored MCMC algorithm involving a nonparametric prior with mixed support. Our analysis identified potential key pathways in obese children that allow exploration of possible molecular mechanisms associated with childhood obesity.
Subjects
Pediatric Obesity, Adult, Humans, Child, Pediatric Obesity/epidemiology, Cohort Studies, Bayes Theorem, Risk Factors, Biomarkers
ABSTRACT
BACKGROUND: The handling of missing data is a challenge for inference and regression modelling. A particular challenge is dealing with missing predictor information, particularly when trying to build and make predictions from models for use in clinical practice. METHODS: We utilise a flexible Bayesian approach for handling missing predictor information in regression models. This provides practitioners with full posterior predictive distributions for both the missing predictor information (conditional on the observed predictors) and the outcome-of-interest. We apply this approach to a previously proposed counterfactual treatment selection model for type 2 diabetes second-line therapies. Our approach combines a regression model and a Dirichlet process mixture model (DPMM), where the former defines the treatment selection model, and the latter provides a flexible way to model the joint distribution of the predictors. RESULTS: We show that DPMMs can model complex relationships between predictor variables and can provide a powerful means of fitting models to incomplete data (under missing-completely-at-random and missing-at-random assumptions). This framework ensures that the posterior distribution for the parameters and the conditional average treatment effect estimates automatically reflect the additional uncertainties associated with missing data due to the hierarchical model structure. We also demonstrate that in the presence of multiple missing predictors, the DPMM can be used to explore which variable(s), if collected, could provide the most additional information about the likely outcome. CONCLUSIONS: When developing clinical prediction models, DPMMs offer a flexible way to model complex covariate structures and handle missing predictor information.
DPMM-based counterfactual prediction models can also provide additional information to support clinical decision-making, including allowing predictions with appropriate uncertainty to be made for individuals with incomplete predictor data.
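Under a Gaussian mixture model for the predictors, a missing predictor can be imputed from the observed ones by reweighting each component by how well it explains the observed values and then using that component's Gaussian conditional mean. The sketch below does this for two predictors with a fixed, finite mixture whose parameters are hypothetical. A DPMM analysis would average over posterior draws of the mixture (including the number of occupied components) and would return a full predictive distribution rather than only this conditional mean.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def conditional_mean_missing(x1, components):
    """E[x2 | x1] under a bivariate Gaussian mixture.

    components: list of (weight, (mu1, mu2), (s11, s12, s22)), where s11 and
    s22 are variances and s12 is the covariance. Component weights are updated
    by the marginal density of the observed x1; within a component,
    x2 | x1 has mean mu2 + (s12 / s11) * (x1 - mu1).
    """
    post_w = [w * normal_pdf(x1, mu[0], cov[0]) for w, mu, cov in components]
    total = sum(post_w)
    cond = 0.0
    for pw, (_, mu, cov) in zip(post_w, components):
        s11, s12, _ = cov
        cond += (pw / total) * (mu[1] + s12 / s11 * (x1 - mu[0]))
    return cond


# Hypothetical two-component mixture with well-separated clusters.
mixture = [(0.5, (-3.0, -3.0), (1.0, 0.0, 1.0)), (0.5, (3.0, 3.0), (1.0, 0.0, 1.0))]
imputed_at_3 = conditional_mean_missing(3.0, mixture)
```

Near x1 = 3 almost all posterior weight falls on the second component, so the imputed value is close to 3; at x1 = 0 the two components are equally plausible and the imputed mean is 0, even though neither component places much mass there. This is why full predictive distributions are more informative than a single imputed value.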
Subjects
Type 2 Diabetes Mellitus, Humans, Bayes Theorem, Type 2 Diabetes Mellitus/drug therapy, Clinical Decision-Making, Uncertainty
ABSTRACT
In laboratory medicine, owing to limited sample availability and resources, measurements of many quantities of interest are commonly collected on a few samples, making statistical inference particularly challenging. In this context, several hypotheses can be tested, and studies are often not powered accordingly. We present a semiparametric Bayesian approach to effectively test multiple hypotheses, applied to an experiment that aims to identify cytokines involved in Crohn's disease (CD) infection that may be ongoing in multiple tissues. We assume that the positive correlation commonly observed between cytokines is caused by latent groups of effects, which in turn result from a common cause. These clusters are modeled through a Dirichlet process (DP), which is one of the most popular nonparametric priors in Bayesian statistics and has proven to be a powerful tool for model-based clustering. We use a spike-and-slab distribution as the base measure of the DP. The nonparametric part is included in an additive model whose parametric component is a Bayesian hierarchical model. We include simulations that empirically demonstrate the effectiveness of the proposed testing procedure in settings that mimic our application's sample size and data structure. Our CD data analysis shows strong evidence of a cytokine gradient in the external intestinal tissue.
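A Dirichlet process with a spike-and-slab base measure lets effects be exactly zero and lets several effects share the same value, which is how the latent groups of cytokine effects arise. The sketch below draws effects from such a prior using the Blackwell-MacQueen Pólya urn, assuming G0 = π0·δ0 + (1 - π0)·N(0, σ²) with hypothetical values π0 = 0.5 and σ = 1. It only simulates from the prior; the additive model, the hierarchical parametric component, and the testing procedure of the abstract are not implemented.

```python
import random


def spike_slab_base(rng, pi0=0.5, slab_sd=1.0):
    """One draw from G0 = pi0 * delta_0 + (1 - pi0) * Normal(0, slab_sd^2)."""
    return 0.0 if rng.random() < pi0 else rng.gauss(0.0, slab_sd)


def polya_urn_effects(n, alpha, rng, pi0=0.5, slab_sd=1.0):
    """Draw n effects marginally from DP(alpha, G0) with the Polya urn.

    Each new effect is a fresh draw from G0 with probability
    alpha / (alpha + number of previous draws); otherwise it copies one of
    the previous effects chosen uniformly, which creates ties (clusters).
    """
    thetas = []
    for i in range(n):  # i = number of effects already drawn
        if rng.random() < alpha / (alpha + i):
            thetas.append(spike_slab_base(rng, pi0, slab_sd))
        else:
            thetas.append(rng.choice(thetas))
    return thetas


rng = random.Random(1)
effects = polya_urn_effects(50, alpha=1.0, rng=rng)
n_clusters = len(set(effects))
n_null = sum(1 for t in effects if t == 0.0)  # effects sitting on the spike at zero
```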
Subjects
Bayes Theorem, Crohn Disease, Cytokines, Statistical Models, Cytokines/metabolism, Crohn Disease/metabolism, Humans, Biometry/methods, Sample Size
ABSTRACT
We consider a constructive definition of the multivariate Pareto distribution that factorizes the random vector into a radial component and an independent angular component. The former follows a univariate Pareto distribution, and the latter is defined on the surface of the positive orthant of the infinity-norm unit hypercube. We propose a method for inferring the distribution of the angular component by identifying its support as the limit of the positive orthants of the unit p-norm spheres, and we introduce a projected gamma family of distributions defined through the normalization of a vector of independent gamma random variables to this space. This serves to construct a flexible family of distributions obtained as a Dirichlet process mixture of projected gammas. For model assessment, we discuss scoring methods appropriate for distributions on the unit hypercube. In particular, working with the energy score criterion, we develop a kernel metric that produces a proper scoring rule and present a simulation study comparing different modeling choices using the proposed metric. Using our approach, we describe the dependence structure of extreme values in integrated vapor transport (IVT) data, which describe the flow of atmospheric moisture along the coast of California. We find clear but heterogeneous geographical dependence.
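The radial-angular construction can be sketched directly: draw independent gamma variables, divide by their p-norm to land on the positive orthant of the unit p-norm sphere (with p = ∞ giving the max norm, i.e. the surface of the unit hypercube), and multiply by an independent Pareto radius. The sketch below assumes a standard Pareto radius with P(R > r) = 1/r for r ≥ 1 and uses hypothetical shape parameters; it draws from a single projected gamma, not from the Dirichlet process mixture of projected gammas proposed in the abstract.

```python
import random


def projected_gamma(shapes, p, rng):
    """Normalize independent Gamma(shape_k, 1) draws onto the positive orthant
    of the unit p-norm sphere; p = float('inf') uses the max norm."""
    g = [rng.gammavariate(a, 1.0) for a in shapes]
    if p == float("inf"):
        norm = max(g)
    else:
        norm = sum(x ** p for x in g) ** (1.0 / p)
    return [x / norm for x in g]


def multivariate_pareto(shapes, rng):
    """Radial-angular draw: R ~ standard Pareto (assumed P(R > r) = 1/r, r >= 1)
    times an independent angle V on the infinity-norm unit sphere."""
    r = 1.0 / (1.0 - rng.random())  # inverse-CDF draw; 1 - U lies in (0, 1]
    v = projected_gamma(shapes, float("inf"), rng)
    return [r * x for x in v]


rng = random.Random(3)
angle = projected_gamma([1.0, 2.0, 3.0], float("inf"), rng)
angle_p2 = projected_gamma([1.0, 2.0, 3.0], 2.0, rng)
extreme = multivariate_pareto([1.0, 2.0, 3.0], rng)
```

Because the angle is scaled by its max norm, the largest coordinate of each multivariate Pareto draw equals the radius R, so R controls how extreme the event is while V describes which coordinates (for example, coastal locations) are jointly extreme.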
ABSTRACT
Heritability analysis plays a central role in quantitative genetics in describing the genetic contribution to complex human traits and prioritizing downstream analyses of large-scale phenotypes. Existing works largely focus on modeling a single phenotype, and currently available multivariate phenotypic methods often suffer from problems of scaling and interpretation. In this article, motivated by understanding how genetic underpinnings affect human brain variation, we develop an integrative Bayesian heritability analysis to jointly estimate heritabilities for high-dimensional neuroimaging traits. To induce sparsity and incorporate brain anatomical configuration, we impose hierarchical selection among both regional and local measurements based on the brain structural network and voxel dependence. We also use a nonparametric Dirichlet process mixture model to group single nucleotide polymorphism-associated phenotypic variations, providing biological plausibility. Through extensive simulations, we show that the proposed method outperforms existing ones in heritability estimation and heritable trait selection under various scenarios. Finally, we apply the method to two large-scale imaging genetics datasets, the Alzheimer's Disease Neuroimaging Initiative and the United Kingdom Biobank, and show biologically meaningful results.
Subjects
Alzheimer Disease, Neuroimaging, Alzheimer Disease/diagnostic imaging, Alzheimer Disease/genetics, Bayes Theorem, Humans, Neuroimaging/methods, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
Across several medical fields, developing an approach for disease classification is an important challenge. The usual procedure is to fit a model for the longitudinal response in the healthy population, a different model for the longitudinal response in the diseased population, and then apply Bayes' theorem to obtain disease probabilities given the responses. Unfortunately, when substantial heterogeneity exists within each population, this type of Bayes classification may perform poorly. In this article, we develop a new approach by fitting a Bayesian nonparametric model for the joint outcome of disease status and longitudinal response, and then we perform classification through the clustering induced by the Dirichlet process. This approach is highly flexible and allows for multiple subpopulations of healthy, diseased, and possibly mixed membership. In addition, we introduce a Markov chain Monte Carlo sampling scheme that facilitates the assessment of the inference and prediction capabilities of our model. Finally, we demonstrate the method by predicting pregnancy outcomes using longitudinal profiles of human chorionic gonadotropin beta-subunit hormone levels in a sample of Chilean women being treated with assisted reproductive therapy.
Subjects
Bayes Theorem, Female, Humans, Markov Chains, Monte Carlo Method, Cluster Analysis, Probability
ABSTRACT
The study of racial/ethnic inequalities in health is important to reduce the uneven burden of disease. In the case of colorectal cancer (CRC), disparities in survival between non-Hispanic Whites and Blacks are well documented, and the mechanisms leading to these disparities need to be studied formally. It has also been established that body mass index (BMI) is a risk factor for developing CRC, and recent literature shows that BMI at diagnosis of CRC is associated with survival. Since BMI varies by racial/ethnic group, a question that arises is whether differences in BMI are partially responsible for the observed racial/ethnic disparities in survival among CRC patients. This article presents new methodology to quantify the impact on racial/ethnic disparities in survival of a hypothetical intervention that matches the BMI distribution in the Black population to the potentially complex distributional form observed in the White population. Our density mediation approach can also be used to estimate natural direct and indirect effects in the general causal mediation setting under stronger assumptions. We perform a simulation study showing that our proposed Bayesian density regression approach performs as well as or better than current methodology that allows only for a shift in the mean of the distribution, and that the standard practice of categorizing BMI leads to large biases when BMI is a mediator. When applied to motivating data from the Cancer Care Outcomes Research and Surveillance (CanCORS) Consortium, our approach suggests that the proposed intervention is potentially beneficial for elderly and low-income Black patients, yet harmful for young or high-income Black populations.
Subjects
Colorectal Neoplasms, Aged, Bayes Theorem, Body Mass Index, Colorectal Neoplasms/diagnosis, Humans, Socioeconomic Factors, United States
ABSTRACT
This work presents a population genetic model of evolution, which includes haploid selection, mutation, recombination, and drift. The mutation-selection equilibrium can be expressed exactly in closed form for arbitrary fitness functions without resorting to diffusion approximations. Tractability is achieved by generating new offspring using n-parent rather than 2-parent recombination. While this enforces linkage equilibrium among offspring, it allows analysis of the whole population under linkage disequilibrium. We derive a general and exact relationship between fitness fluctuations and response to selection. Our assumptions allow analytical calculation of the stationary distribution of the model for a variety of non-trivial fitness functions. These results allow us to speak to genetic architecture, i.e., what stationary distributions result from different fitness functions. This paper presents methods for exactly deriving stationary states for finite and infinite populations. This method can be applied to many fitness functions, and we give exact calculations for four of these. These results allow us to investigate metastability, tradeoffs between fitness functions, and even consider error-correcting codes.
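One way to read "n-parent recombination" as described here is that each locus of an offspring is copied from an independently chosen member of the current population, so every locus is drawn from that locus's allele frequencies. This places offspring in linkage equilibrium even when the parental population is in linkage disequilibrium. The sketch below implements that reading for haploid genotypes with binary alleles; selection, mutation, drift in a finite population, and the exact stationary-distribution results are not included, and the function names are hypothetical.

```python
import random


def n_parent_offspring(population, rng):
    """Build one haploid offspring by copying each locus from an independently
    chosen member of the population (a fresh parent per locus)."""
    n_loci = len(population[0])
    return tuple(rng.choice(population)[locus] for locus in range(n_loci))


def next_generation(population, rng):
    """Replace the whole population with offspring produced by n-parent recombination."""
    return [n_parent_offspring(population, rng) for _ in range(len(population))]


# Parents in complete linkage disequilibrium: only haplotypes 00 and 11 exist.
parents = [(0, 0), (1, 1)]
rng = random.Random(3)
offspring_types = {n_parent_offspring(parents, rng) for _ in range(1000)}
children = next_generation(parents * 50, rng)
```

After a single round, the recombinant haplotypes 01 and 10 appear, illustrating how this scheme removes linkage disequilibrium among offspring.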
Subjects
Genetic Models, Genetic Recombination, Mutation, Linkage Disequilibrium, Genetic Selection
ABSTRACT
Recent advances in miniaturized fluorescence microscopy have made it possible to investigate neuronal responses to external stimuli in awake, behaving animals through the analysis of intracellular calcium signals. An ongoing challenge is deconvolving the temporal signals to extract the spike trains from the noisy calcium time series. In this article, we propose a nested Bayesian finite mixture specification that allows estimation of spiking activity while simultaneously reconstructing the distributions of the calcium transient spikes' amplitudes under different experimental conditions. The proposed model leverages two nested layers of random discrete mixture priors to borrow information between experiments and discover similarities in the distributional patterns of neuronal responses to different stimuli. Furthermore, the spikes' intensity values are also clustered within and between experimental conditions to determine the existence of common (recurring) response amplitudes. Simulation studies and the analysis of a dataset from the Allen Brain Observatory show the effectiveness of the method in clustering and detecting neuronal activity.
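Calcium deconvolution is often formulated with a first-order autoregressive model, c_t = γ c_{t-1} + s_t, where s_t is the spike amplitude at time t and γ governs the decay of each transient. Assuming that model (the abstract does not state the exact generative form used), the sketch below generates a noise-free calcium trace from spikes and inverts the recursion to recover them. With real fluorescence, which adds a baseline and noise, direct inversion amplifies the noise; this is one motivation for model-based approaches that place a prior on spike occurrence and amplitudes.

```python
def calcium_from_spikes(spikes, gamma):
    """Noise-free AR(1) calcium trace: c_t = gamma * c_{t-1} + s_t, with c_{-1} = 0."""
    trace, prev = [], 0.0
    for s in spikes:
        prev = gamma * prev + s
        trace.append(prev)
    return trace


def spikes_from_calcium(trace, gamma):
    """Invert the AR(1) recursion: s_t = c_t - gamma * c_{t-1}."""
    spikes, prev = [], 0.0
    for c in trace:
        spikes.append(c - gamma * prev)
        prev = c
    return spikes


# Hypothetical spike train with two events of different amplitude.
true_spikes = [0.0, 0.0, 1.5, 0.0, 0.0, 0.8, 0.0]
trace = calcium_from_spikes(true_spikes, gamma=0.9)
recovered = spikes_from_calcium(trace, gamma=0.9)
```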
Subjects
Brain, Calcium, Animals, Bayes Theorem, Computer Simulation, Cluster Analysis
ABSTRACT
An important experimental design problem in early-stage drug discovery is how to prioritize available compounds for testing when very little is known about the target protein. Informer-based ranking (IBR) methods address the prioritization problem when bioactivity data for the compounds are available on other potentially relevant targets. An IBR method selects an informer set of compounds and then prioritizes the remaining compounds on the basis of new bioactivity experiments performed with the informer set on the target. We formalize the problem as a two-stage decision problem and introduce the Bayes Optimal Informer SEt (BOISE) method for its solution. BOISE leverages a flexible model of the initial bioactivity data, a relevant loss function, and effective computational schemes to resolve the two-stage design problem. We evaluate BOISE and compare it with other IBR strategies in two retrospective studies, one on protein-kinase inhibition and the other on anticancer drug sensitivity. In both empirical settings, BOISE exhibits better predictive performance than available methods. It also behaves well with missing data, where methods that use matrix completion show worse predictive performance.
Subjects
Drug Discovery, Proteins, Bayes Theorem, Retrospective Studies, Drug Discovery/methods
ABSTRACT
There is growing interest in current medical research in developing personalized treatments using a molecular-based approach. The broad goal is to implement a more precise and targeted decision-making process, relative to traditional treatments based primarily on clinical diagnoses. Specifically, we consider patients affected by Acute Myeloid Leukemia (AML), a hematological cancer characterized by uncontrolled proliferation of hematopoietic stem cells in the bone marrow. Because AML responds poorly to chemotherapeutic treatments, the development of targeted therapies is essential to improve patients' prospects. In particular, the dataset we analyze contains the levels of proteins involved in cell cycle regulation and linked to the progression of the disease. We evaluate treatment effects within a causal framework represented by a Directed Acyclic Graph (DAG) model, whose vertices are the protein levels in the network. A major obstacle in implementing this program is individual heterogeneity. We address this issue through a Dirichlet Process (DP) mixture of Gaussian DAG models in which both the graphical structure and the allied model parameters are regarded as uncertain. Our procedure determines a clustering structure of the units reflecting the underlying heterogeneity and produces subject-specific estimates of causal effects based on Bayesian Model Averaging (BMA). With reference to the AML dataset, we identify different effects of protein regulation among individuals; moreover, our method clusters patients into groups that exhibit only mild similarities with traditional categories based on morphological features.
Subjects
Acute Myeloid Leukemia, Humans, Bayes Theorem, Causality, Acute Myeloid Leukemia/etiology, Acute Myeloid Leukemia/genetics, Normal Distribution
ABSTRACT
Existing methods for estimating the mean outcome under a given sequential treatment rule often rely on intention-to-treat analyses, which estimate the effect of following a certain treatment rule regardless of patients' compliance behavior. There are two major concerns with intention-to-treat analyses: (1) the estimated effects are often biased toward the null; (2) the results are not generalizable or reproducible because of potentially differential compliance behavior. These concerns are particularly problematic in settings with a high level of non-compliance, such as substance use disorder studies. Our work is motivated by the Adaptive Treatment for Alcohol and Cocaine Dependence study (ENGAGE), a multi-stage trial that aimed to construct optimal treatment strategies to engage patients in therapy. Because of the relatively low level of compliance in this trial, intention-to-treat analyses essentially estimate the effect of being randomized to a certain treatment instead of the actual effect of the treatment. We obviate this challenge by defining the target parameter as the mean outcome under a dynamic treatment regime conditional on a potential compliance stratum. We propose a flexible nonparametric Bayesian approach based on principal stratification, which consists of a Gaussian copula model for the joint distribution of the potential compliances and a Dirichlet process mixture model for the treatment-sequence-specific outcomes. We conduct extensive simulation studies that highlight the utility of our approach in the context of multi-stage randomized trials. We also show that our estimator is robust in non-linear and non-Gaussian settings.
Subjects
Decision Making, Patient Compliance, Humans, Bayes Theorem, Computer Simulation, Treatment Outcome
ABSTRACT
Functional brain connectivity analysis is an increasingly important technique in neuroscience, psychiatry, and autism research. Functional connectivity can be measured by considering the co-activation of brain regions in resting-state functional magnetic resonance imaging (rs-fMRI). We propose a novel Bayesian model to detect differential connections in cross-correlated functional connectivity between region-of-interest (ROI) pairs. First, the proposed sparse clustered neighborhood model induces lower-dimensional sparsity and clustering, based on a nonparametric Bayesian approach, to model sparse differentially connected ROI pairs. Second, it induces a structured dependence model for modeling potential dependence among ROI pairs. We demonstrate Bayesian inference and the performance of the proposed model in simulation studies and compare it with a standard model. We use the proposed model to contrast functional connectivity between participants with autism spectrum disorder and neurotypical participants, using cross-correlated rs-fMRI data from four sites of the Autism Brain Imaging Data Exchange.