Search | VHL Regional Portal

1.

Unveiling the temporal variability of gas transfer coefficients of streams based on high-frequency dissolved oxygen measurements.

Wang, Fang; Tian, Siyu; Yan, Weijin.

Environ Res ; 262(Pt 2): 119939, 2024 Sep 05.

Article in English | MEDLINE | ID: mdl-39243842

ABSTRACT

Greenhouse gas (GHG) emissions from streams and rivers are important sources of global GHG emissions. As a crucial parameter for estimating GHG emissions, the gas transfer coefficient (expressed as K600 at water temperature of 20 °C) has uncertainties. This study proposed a new approach for estimating K600 based on high-frequency dissolved oxygen (DO) data and an ecosystem metabolism model. This approach combines the numerical solution method with the Markov Chain Monte Carlo analysis. This study was conducted in the Chaohu Lake watershed in Southeastern China, using high-frequency data collected from six streams from 2021 to 2023. This study found: (1) The numerical solution of K600 demonstrated distinct dynamic variability for all streams, ranging from 0 to 111.39 cm h-1 (2) Streams with higher discharge (>10 m3 s-1) exhibited significant seasonal differences in K600 values. The monthly average discharge and water temperature were the two factors that determined the variation in K600 values. (3) K600 was a major source of uncertainty in CO2 emission fluxes, with a relative contribution of 53.72%. An integrated K600 model of riverine gas exchange was developed at the watershed scale and validated using the observed DO change. Our study stressed that K600 dynamics can better represent areal change to reduce uncertainty in estimating GHG emissions.

2.

Bayesian transformation model for spatial partly interval-censored data.

Qiu, Mingyue; Hu, Tao.

J Appl Stat ; 51(11): 2139-2156, 2024.

Article in English | MEDLINE | ID: mdl-39157272

ABSTRACT

The transformation model with partly interval-censored data offers a highly flexible modeling framework that can simultaneously support multiple common survival models and a wide variety of censored data types. However, the real data may contain unexplained heterogeneity that cannot be entirely explained by covariates and may be brought on by a variety of unmeasured regional characteristics. Due to this, we introduce the conditionally autoregressive prior into the transformation model with partly interval-censored data and take the spatial frailty into account. An efficient Markov chain Monte Carlo method is proposed to handle the posterior sampling and model inference. The approach is simple to use and does not include any challenging Metropolis steps owing to four-stage data augmentation. Through several simulations, the suggested method's empirical performance is assessed and then the method is used in a leukemia study.

3.

Bayesian Modeling for Nonstationary Spatial Point Process via Spatial Deformations.

Gamerman, Dani; Quintana, Marcel de Souza Borges; Alves, Mariane Branco.

Entropy (Basel) ; 26(8)2024 Aug 11.

Article in English | MEDLINE | ID: mdl-39202148

ABSTRACT

Many techniques have been proposed to model space-varying observation processes with a nonstationary spatial covariance structure and/or anisotropy, usually on a geostatistical framework. Nevertheless, there is an increasing interest in point process applications, and methodologies that take nonstationarity into account are welcomed. In this sense, this work proposes an extension of a class of spatial Cox process using spatial deformation. The proposed method enables the deformation behavior to be data-driven, through a multivariate latent Gaussian process. Inference leads to intractable posterior distributions that are approximated via MCMC. The convergence of algorithms based on the Metropolis-Hastings steps proved to be slow, and the computational efficiency of the Bayesian updating scheme was improved by adopting Hamiltonian Monte Carlo (HMC) methods. Our proposal was also compared against an alternative anisotropic formulation. Studies based on synthetic data provided empirical evidence of the benefit brought by the adoption of nonstationarity through our anisotropic structure. A real data application was conducted on the spatial spread of the Spodoptera frugiperda pest in a corn-producing agricultural area in southern Brazil. Once again, the proposed method demonstrated its benefit over alternatives.

4.

A General Bayesian Functional Spatial Partitioning Method for Multiple Region Discovery Applied to Prostate Cancer MRI.

Masotti, Maria; Zhang, Lin; Metzger, Gregory J; Koopmeiners, Joseph S.

Bayesian Anal ; 19(2): 623-647, 2024 Jun.

Article in English | MEDLINE | ID: mdl-39183822

ABSTRACT

Current protocols to estimate the number, size, and location of cancerous lesions in the prostate using multiparametric magnetic resonance imaging (mpMRI) are highly dependent on reader experience and expertise. Automatic voxel-wise cancer classifiers do not directly provide estimates of number, location, and size of cancerous lesions that are clinically important. Existing spatial partitioning methods estimate linear or piecewise-linear boundaries separating regions of local stationarity in spatially registered data and are inadequate for the application of lesion detection. Frequentist segmentation and clustering methods often require pre-specification of the number of clusters and do not quantify uncertainty. Previously, we developed a novel Bayesian functional spatial partitioning method to estimate the boundary surrounding a single cancerous lesion using data derived from mpMRI. We propose a Bayesian functional spatial partitioning method for multiple lesion detection with an unknown number of lesions. Our method utilizes functional estimation to model the smooth boundary curves surrounding each cancerous lesion. In a Reversible Jump Markov Chain Monte Carlo (RJ-MCMC) framework, we develop novel jump steps to jointly estimate and quantify uncertainty in the number of lesions, their boundaries, and the spatial parameters in each lesion. Through simulation we show that our method is robust to the shape of the lesions, number of lesions, and region-specific spatial processes. We illustrate our method through the detection of prostate cancer lesions using MRI.

5.

Identification of illicit discharges in sewer networks by an SWMM-Bayesian coupled approach.

Yang, Liyuan; Huang, Biao; Liu, Jiachun.

Water Sci Technol ; 90(3): 951-967, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39141044

ABSTRACT

Illicit discharges into sewer systems are a widespread concern within China's urban drainage management. They can result in unforeseen environmental contamination and deterioration in the performance of wastewater treatment plants. Consequently, pinpointing the origin of unauthorized discharges in the sewer network is crucial. This study aims to evaluate an integrative method that employs numerical modeling and statistical analysis to determine the locations and characteristics of illicit discharges. The Storm Water Management Model (SWMM) was employed to track water quality variations within the sewer network and examine the concentration profiles of exogenous pollutants under a range of scenarios. The identification technique employed Bayesian inference fused with the Markov chain Monte Carlo sampling method, enabling the estimation of probability distributions for the position of the suspected source, the discharge magnitude, and the commencement of the event. Specifically, the cases involving continuous release and multiple sources were examined. For single-point source identification, where all three parameters are unknown, concentration profiles from two monitoring sites in the path of pollutant transport and dispersion are necessary and sufficient to characterize the pollution source. For the identification of multiple sources, the proposed SWMM-Bayesian strategy with improved sampling is applied, which significantly improves the accuracy.

Subject(s)

Bayes Theorem , Sewage , Models, Theoretical , Environmental Monitoring/methods , China , Drainage, Sanitary , Waste Disposal, Fluid/methods , Water Pollutants, Chemical/analysis

6.

Polymorphism-Aware Models in RevBayes: Species Trees, Disentangling Balancing Selection, and GC-Biased Gene Conversion.

Braichenko, Svitlana; Borges, Rui; Kosiol, Carolin.

Mol Biol Evol ; 41(7)2024 Jul 03.

Article in English | MEDLINE | ID: mdl-38980178

ABSTRACT

The role of balancing selection is a long-standing evolutionary puzzle. Balancing selection is a crucial evolutionary process that maintains genetic variation (polymorphism) over extended periods of time; however, detecting it poses a significant challenge. Building upon the Polymorphism-aware phylogenetic Models (PoMos) framework rooted in the Moran model, we introduce a PoMoBalance model. This novel approach is designed to disentangle the interplay of mutation, genetic drift, and directional selection (GC-biased gene conversion), along with the previously unexplored balancing selection pressures on ultra-long timescales comparable with species divergence times by analyzing multi-individual genomic and phylogenetic divergence data. Implemented in the open-source RevBayes Bayesian framework, PoMoBalance offers a versatile tool for inferring phylogenetic trees as well as quantifying various selective pressures. The novel aspect of our approach in studying balancing selection lies in polymorphism-aware phylogenetic models' ability to account for ancestral polymorphisms and incorporate parameters that measure frequency-dependent selection, allowing us to determine the strength of the effect and exact frequencies under selection. We implemented validation tests and assessed the model on the data simulated with SLiM and a custom Moran model simulator. Real sequence analysis of Drosophila populations reveals insights into the evolutionary dynamics of regions subject to frequency-dependent balancing selection, particularly in the context of sex-limited color dimorphism in Drosophila erecta.

Subject(s)

Gene Conversion , Models, Genetic , Phylogeny , Polymorphism, Genetic , Selection, Genetic , Animals , Bayes Theorem , Evolution, Molecular , Male , Female

7.

Point estimation and related classification problems for several Lindley populations with application using COVID-19 data.

Bal, Debasmita; Tripathy, Manas Ranjan; Kumar, Somesh.

J Appl Stat ; 51(10): 1976-2006, 2024.

Article in English | MEDLINE | ID: mdl-39071252

ABSTRACT

The problems of point estimation and classification under the assumption that the training data follow a Lindley distribution are considered. Bayes estimators are derived for the parameter of the Lindley distribution applying the Markov chain Monte Carlo (MCMC), and Tierney and Kadane's [Tierney and Kadane, Accurate approximations for posterior moments and marginal densities, J. Amer. Statist. Assoc. 81 (1986), pp. 82-86] methods. In the sequel, we prove that the Bayes estimators using Tierney and Kadane's approximation and Lindley's approximation both converge to the maximum likelihood estimator (MLE), as n â ∞ , where n is the sample size. The performances of all the proposed estimators are compared with some of the existing ones using bias and mean squared error (MSE), numerically. It has been noticed from our simulation study that the proposed estimators perform better than some of the existing ones. Applying these estimators, we construct several plug-in type classification rules and a rule that uses the likelihood accordance function. The performances of each of the rules are numerically evaluated using the expected probability of misclassification (EPM). Two real-life examples related to COVID-19 disease are considered for illustrative purposes.

8.

Flexible Bayesian estimation of incubation times.

Gressani, Oswaldo; Torneri, Andrea; Hens, Niel; Faes, Christel.

Am J Epidemiol ; 2024 Jul 10.

Article in English | MEDLINE | ID: mdl-38988237

ABSTRACT

The incubation period is of paramount importance in infectious disease epidemiology as it informs about the transmission potential of a pathogenic organism and helps to plan public health strategies to keep an epidemic outbreak under control. Estimation of the incubation period distribution from reported exposure times and symptom onset times is challenging as the underlying data is coarse. We develop a new Bayesian methodology using Laplacian-P-splines that provides a semi-parametric estimation of the incubation density based on a Langevinized Gibbs sampler. A finite mixture density smoother informs a set of parametric distributions via moment matching and an information criterion arbitrates between competing candidates. Algorithms underlying our method find a natural nest within the EpiLPS package, which has been extended to cover estimation of incubation times. Various simulation scenarios accounting for different levels of data coarseness are considered with encouraging results. Applications to real data on COVID-19, MERS and Mpox reveal results that are in alignment with what has been obtained in recent studies. The proposed flexible approach is an interesting alternative to classic Bayesian parametric methods for estimation of the incubation distribution.

9.

A bayesian spatio-temporal dynamic analysis of food security in Africa.

Bofa, Adusei; Zewotir, Temesgen.

Sci Rep ; 14(1): 15132, 2024 07 02.

Article in English | MEDLINE | ID: mdl-38956274

ABSTRACT

Exploring the factors influencing Food Security and Nutrition (FSN) and understanding its dynamics is crucial for planning and management. This understanding plays a pivotal role in supporting Africa's food security efforts to achieve various Sustainable Development Goals (SDGs). Utilizing Principal Component Analysis (PCA) on data from the FAO website, spanning from 2000 to 2019, informative components are derived for dynamic spatio-temporal modeling of Africa's FSN Given the dynamic and evolving nature of the factors impacting FSN, despite numerous efforts to understand and mitigate food insecurity, existing models often fail to capture this dynamic nature. This study employs a Bayesian dynamic spatio-temporal approach to explore the interconnected dynamics of food security and its components in Africa. The results reveal a consistent pattern of elevated FSN levels, showcasing notable stability in the initial and middle-to-late stages, followed by a significant acceleration in the late stage of the study period. The Democratic Republic of Congo and Ethiopia exhibited particularly noteworthy high levels of FSN dynamicity. In particular, child care factors and undernourishment factors showed significant dynamicity on FSN. This insight suggests establishing regional task forces or forums for coordinated responses to FSN challenges based on dynamicity patterns to prevent or mitigate the impact of potential food security crises.

Subject(s)

Bayes Theorem , Food Security , Spatio-Temporal Analysis , Humans , Africa , Food Supply , Principal Component Analysis , Nutritional Status

10.

Spatial heterogeneity analysis for the transmission of syphilis disease in China via a data-validated reaction-diffusion model.

Wu, Peng; Wang, Xiunan; Wang, Hao.

Math Biosci ; 375: 109243, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38964670

ABSTRACT

Based on the distinctive spatial diffusion characteristics observed in syphilis transmission patterns, this paper introduces a novel reaction-diffusion model for syphilis disease dynamics, incorporating general incidence functions within a heterogeneous environment. We derive the basic reproduction number essential for threshold dynamics and investigate the uniform persistence of the model. We validate the model and estimate its parameters by employing the multi-objective Markov Chain Monte Carlo (MCMC) method, using real syphilis data from the years 2004 to 2018 in China. Furthermore, we explore the impact of spatial heterogeneity and intervention measures on syphilis transmission. Our findings reveal several key insights: (1) In addition to the original high-incidence areas of syphilis, Xinjiang, Guizhou, Hunan and Northeast China have also emerged as high-incidence regions for syphilis in China. (2) The latent syphilis cases represent the highest proportion of newly reported cases, highlighting the critical importance of considering their role in transmission dynamics to avoid underestimation of syphilis outbreaks. (3) Neglecting spatial heterogeneity results in an underestimation of disease prevalence and the number of syphilis-infected individuals, undermining effective disease prevention and control strategies. (4) The initial conditions have minimal impact on the long-term spatial distribution of syphilis-infected individuals in scenarios of varying diffusion rates. This study underscores the significance of spatial dynamics and intervention measures in assessing and managing syphilis transmission, which offers insights for public health policymakers.

Subject(s)

Syphilis , Syphilis/transmission , Syphilis/epidemiology , Humans , China/epidemiology , Basic Reproduction Number/statistics & numerical data , Incidence , Markov Chains , Epidemiological Models , Prevalence , Monte Carlo Method

11.

Extended Regression Analysis for Debye-Einstein Models Describing Low Temperature Heat Capacity Data of Solids.

Gamsjäger, Ernst; Wiessner, Manfred.

Entropy (Basel) ; 26(6)2024 May 26.

Article in English | MEDLINE | ID: mdl-38920461

ABSTRACT

Heat capacity data of many crystalline solids can be described in a physically sound manner by Debye-Einstein integrals in the temperature range from 0K to 300K. The parameters of the Debye-Einstein approach are either obtained by a Markov chain Monte Carlo (MCMC) global optimization method or by a Levenberg-Marquardt (LM) local optimization routine. In the case of the MCMC approach the model parameters and the coefficients of a function describing the residuals of the measurement points are simultaneously optimized. Thereby, the Bayesian credible interval for the heat capacity function is obtained. Although both regression tools (LM and MCMC) are completely different approaches, not only the values of the Debye-Einstein parameters, but also their standard errors appear to be similar. The calculated model parameters and their associated standard errors are then used to derive the enthalpy, entropy and Gibbs energy as functions of temperature. By direct insertion of the MCMC parameters of all 4·105 computer runs the distributions of the integral quantities enthalpy, entropy and Gibbs energy are determined.

12.

A collaborative study on the precision of the Markov chain Monte Carlo algorithms used for DNA profile interpretation.

Riman, Sarah; Bright, Jo-Anne; Huffman, Kaitlin; Moreno, Lilliana I; Liu, Sicen; Sathya, Asmitha; Vallone, Peter M.

Forensic Sci Int Genet ; 72: 103088, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38908322

ABSTRACT

Several fully continuous probabilistic genotyping software (PGS) use Markov chain Monte Carlo algorithms (MCMC) to assign weights to different proposed genotype combinations at a locus. Replicate interpretations of the same profile in these software are expected not to produce identical weights and likelihood ratio (LR) values due to the Monte Carlo aspect. This paper reports a detailed precision study under reproducibility conditions conducted as a collaborative exercise across the National Institute of Standards and Technology (NIST), Federal Bureau of Investigation (FBI), and Institute of Environmental Science and Research (ESR). Replicate interpretations generated across the three laboratories used the same input files, software version, and settings but different random number seed and different computers. This work demonstrates that using different computers to analyze replicate interpretations does not contribute to any variations in LR values. The study quantifies the magnitude of differences in the assigned LRs that is only due to run-to-run MCMC variability and addresses the potential explanations for the observed differences.

Subject(s)

Algorithms , DNA Fingerprinting , Markov Chains , Monte Carlo Method , Humans , Likelihood Functions , Reproducibility of Results , Software , Genotype

13.

Exploration of the MCMC Wald test with linear regression.

Woller, Michael P; Enders, Craig K.

Behav Res Methods ; 56(7): 7391-7409, 2024 10.

Article in English | MEDLINE | ID: mdl-38886305

ABSTRACT

Recently, Asparouhov and Muthén Structural Equation Modeling: A Multidisciplinary Journal, 28, 1-14, (2021a, 2021b) proposed a variant of the Wald test that uses Markov chain Monte Carlo machinery to generate a chi-square test statistic for frequentist inference. Because the test's composition does not rely on analytic expressions for sampling variation and covariation, it potentially provides a way to get honest significance tests in cases where the likelihood-based test statistic's assumptions break down (e.g., in small samples). The goal of this study is to use simulation to compare the new MCM Wald test to its maximum likelihood counterparts, with respect to both their type I error rate and power. Our simulation examined the test statistics across different levels of sample size, effect size, and degrees of freedom (test complexity). An additional goal was to assess the robustness of the MCMC Wald test with nonnormal data. The simulation results uniformly demonstrated that the MCMC Wald test was superior to the maximum likelihood test statistic, especially with small samples (e.g., sample sizes less than 150) and complex models (e.g., models with five or more predictors). This conclusion held for nonnormal data as well. Lastly, we provide a brief application to a real data example.

Subject(s)

Markov Chains , Monte Carlo Method , Humans , Likelihood Functions , Linear Models , Computer Simulation , Models, Statistical , Data Interpretation, Statistical , Sample Size

14.

Family of bivariate distributions on the unit square: theoretical properties and applications.

Vila, Roberto; Balakrishnan, Narayanaswamy; Saulo, Helton; Zörnig, Peter.

J Appl Stat ; 51(9): 1729-1755, 2024.

Article in English | MEDLINE | ID: mdl-38933136

ABSTRACT

We introduce the bivariate unit-log-symmetric model based on the bivariate log-symmetric distribution (BLS) defined in Vila et al. [25] as a flexible family of bivariate distributions over the unit square. We then study its mathematical properties such as stochastic representations, quantiles, conditional distributions, independence of the marginal distributions and marginal moments. Maximum likelihood estimation method is discussed and examined through Monte Carlo simulation. Finally, the proposed model is used to analyze some soccer data sets.

15.

A Bayesian multilevel model for populations of networks using exponential-family random graphs.

Lehmann, Brieuc; White, Simon.

Stat Comput ; 34(4): 136, 2024.

Article in English | MEDLINE | ID: mdl-38911222

ABSTRACT

The collection of data on populations of networks is becoming increasingly common, where each data point can be seen as a realisation of a network-valued random variable. Moreover, each data point may be accompanied by some additional covariate information and one may be interested in assessing the effect of these covariates on network structure within the population. A canonical example is that of brain networks: a typical neuroimaging study collects one or more brain scans across multiple individuals, each of which can be modelled as a network with nodes corresponding to distinct brain regions and edges corresponding to structural or functional connections between these regions. Most statistical network models, however, were originally proposed to describe a single underlying relational structure, although recent years have seen a drive to extend these models to populations of networks. Here, we describe a model for when the outcome of interest is a network-valued random variable whose distribution is given by an exponential random graph model. To perform inference, we implement an exchange-within-Gibbs MCMC algorithm that generates samples from the doubly-intractable posterior. To illustrate this approach, we use it to assess population-level variations in networks derived from fMRI scans, enabling the inference of age- and intelligence-related differences in the topological structure of the brain's functional connectivity. Supplementary Information: The online version contains supplementary material available at 10.1007/s11222-024-10446-0.

16.

A Bayesian approach for exploring person × environment interaction within the environmental sensitivity meta-framework.

Lionetti, Francesca; Calcagnì, Antonio; D'Urso, Giulio; Spinelli, Maria; Fasolo, Mirco; Pluess, Michael; Pastore, Massimiliano.

J Child Psychol Psychiatry ; 2024 May 03.

Article in English | MEDLINE | ID: mdl-38698763

ABSTRACT

BACKGROUND: For investigating the individual-environment interplay and individual differences in response to environmental exposures as captured by models of environmental sensitivity including Diathesis-stress, Differential Susceptibility, and Vantage Sensitivity, over the last few years, a series of statistical guidelines have been proposed. However, available solutions suffer of computational problems especially relevant when sample size is not sufficiently large, a common condition in observational and clinical studies. METHOD: In the current contribution, we propose a Bayesian solution for estimating interaction parameters via Monte Carlo Markov Chains (MCMC), adapting Widaman et al. (Psychological Methods, 17, 2012, 615) Nonlinear Least Squares (NLS) approach. RESULTS: Findings from an applied exemplification and a simulation study showed that with relatively big samples both MCMC and NLS estimates converged on the same results. Conversely, MCMC clearly outperformed NLS, resolving estimation problems and providing more accurate estimates, particularly with small samples and greater residual variance. CONCLUSIONS: As the body of research exploring the interplay between individual and environmental variables grows, enabling predictions regarding the form of interaction and the extent of effects, the Bayesian approach could emerge as a feasible and readily applicable solution to numerous computational challenges inherent in existing frequentist methods. This approach holds promise for enhancing the trustworthiness of research outcomes, thereby impacting clinical and applied understanding.

17.

A hierarchical signal detection model with unequal variance for binary responses.

Lages, Martin.

Psychon Bull Rev ; 2024 May 28.

Article in English | MEDLINE | ID: mdl-38806791

ABSTRACT

Gaussian signal detection models with equal variance are commonly used in simple yes-no detection and discrimination tasks whereas more flexible models with unequal variance require additional information. Here, a hierarchical Bayesian model with equal variance is extended to an unequal-variance model by exploiting variability of hit and false-alarm rates in a random sample of participants. This hierarchical model is investigated analytically, in simulations and in applications to existing data sets. The results suggest that signal variance and other parameters can be accurately estimated if plausible assumptions are met. It is concluded that the model provides a promising alternative to the ubiquitous equal-variance model for binary data.

18.

How Trustworthy Is Your Tree? Bayesian Phylogenetic Effective Sample Size Through the Lens of Monte Carlo Error.

Magee, Andrew; Karcher, Michael; Matsen, Frederick A; Minin, Volodymyr M.

Bayesian Anal ; 19(2): 565-593, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38665694

ABSTRACT

Bayesian inference is a popular and widely-used approach to infer phylogenies (evolutionary trees). However, despite decades of widespread application, it remains difficult to judge how well a given Bayesian Markov chain Monte Carlo (MCMC) run explores the space of phylogenetic trees. In this paper, we investigate the Monte Carlo error of phylogenies, focusing on high-dimensional summaries of the posterior distribution, including variability in estimated edge/branch (known in phylogenetics as "split") probabilities and tree probabilities, and variability in the estimated summary tree. Specifically, we ask if there is any measure of effective sample size (ESS) applicable to phylogenetic trees which is capable of capturing the Monte Carlo error of these three summary measures. We find that there are some ESS measures capable of capturing the error inherent in using MCMC samples to approximate the posterior distributions on phylogenies. We term these tree ESS measures, and identify a set of three which are useful in practice for assessing the Monte Carlo error. Lastly, we present visualization tools that can improve comparisons between multiple independent MCMC runs by accounting for the Monte Carlo error present in each chain. Our results indicate that common post-MCMC workflows are insufficient to capture the inherent Monte Carlo error of the tree, and highlight the need for both within-chain mixing and between-chain convergence assessments.

19.

Variable Selection in Bayesian Multiple Instance Regression using Shotgun Stochastic Search.

Park, Seongoh; Kim, Joungyoun; Wang, Xinlei; Lim, Johan.

Comput Stat Data Anal ; 1962024 Aug.

Article in English | MEDLINE | ID: mdl-38646418

ABSTRACT

In multiple instance learning (MIL), a bag represents a sample that has a set of instances, each of which is described by a vector of explanatory variables, but the entire bag only has one label/response. Though many methods for MIL have been developed to date, few have paid attention to interpretability of models and results. The proposed Bayesian regression model stands on two levels of hierarchy, which transparently show how explanatory variables explain and instances contribute to bag responses. Moreover, two selection problems are simultaneously addressed; the instance selection to find out the instances in each bag responsible for the bag response, and the variable selection to search for the important covariates. To explore a joint discrete space of indicator variables created for selection of both explanatory variables and instances, the shotgun stochastic search algorithm is modified to fit in the MIL context. Also, the proposed model offers a natural and rigorous way to quantify uncertainty in coefficient estimation and outcome prediction, which many modern MIL applications call for. The simulation study shows the proposed regression model can select variables and instances with high performance (AUC greater than 0.86), thus predicting responses well. The proposed method is applied to the musk data for prediction of binding strengths (labels) between molecules (bags) with different conformations (instances) and target receptors. It outperforms all existing methods, and can identify variables relevant in modeling responses.

20.

Bayesian compositional models for ordinal response.

Zhang, Li; Zhang, Xinyan; Leach, Justin M; Rahman, Akm F; Yi, Nengjun.

Stat Methods Med Res ; 33(6): 1043-1054, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38654396

ABSTRACT

Ordinal response is commonly found in medicine, biology, and other fields. In many situations, the predictors for this ordinal response are compositional, which means that the sum of predictors for each sample is fixed. Examples of compositional data include the relative abundance of species in microbiome data and the relative frequency of nutrition concentrations. Moreover, the predictors that are strongly correlated tend to have similar influence on the response outcome. Conventional cumulative logistic regression models for ordinal responses ignore the fixed-sum constraint on predictors and their associated interrelationships, and thus are not appropriate for analyzing compositional predictors.To solve this problem, we proposed Bayesian Compositional Models for Ordinal Response to analyze the relationship between compositional data and an ordinal response with a structured regularized horseshoe prior for the compositional coefficients and a soft sum-to-zero restriction on coefficients through the prior distribution. The method was implemented with R package rstan using efficient Hamiltonian Monte Carlo algorithm. We performed simulations to compare the proposed approach and existing methods for ordinal responses. Results revealed that our proposed method outperformed the existing methods in terms of parameter estimation and prediction. We also applied the proposed method to a microbiome study HMP2Data, to find microorganisms linked to ordinal inflammatory bowel disease levels. To make this work reproducible, the code and data used in this paper are available at https://github.com/Li-Zhang28/BCO.

Subject(s)

Algorithms , Bayes Theorem , Microbiota , Models, Statistical , Monte Carlo Method , Humans , Inflammatory Bowel Diseases , Computer Simulation , Logistic Models

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL