Results 1 - 20 of 82
1.
Biostatistics; 25(1): 220-236, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-36610075

ABSTRACT

Trial-level surrogates are useful tools for improving the speed and cost effectiveness of trials, but surrogates that have not been properly evaluated can cause misleading results. The evaluation procedure is often contextual and depends on the type of trial setting. There have been many proposed methods for trial-level surrogate evaluation, but none, to our knowledge, for the specific setting of platform studies. As platform studies are becoming more popular, methods for surrogate evaluation using them are needed. These studies also offer a rich data resource for surrogate evaluation that would not normally be possible. However, they also offer a set of statistical issues including heterogeneity of the study population, treatments, implementation, and even potentially the quality of the surrogate. We propose the use of a hierarchical Bayesian semiparametric model for the evaluation of potential surrogates using nonparametric priors for the distribution of true effects based on Dirichlet process mixtures. The motivation for this approach is to flexibly model relationships between the treatment effect on the surrogate and the treatment effect on the outcome and also to identify potential clusters with differential surrogate value in a data-driven manner so that treatment effects on the surrogate can be used to reliably predict treatment effects on the clinical outcome. In simulations, we find that our proposed method is superior to a simple, but fairly standard, hierarchical Bayesian method. We demonstrate how our method can be used in a simulated illustrative example (based on the ProBio trial), in which we are able to identify clusters where the surrogate is, and is not, useful. We plan to apply our method to the ProBio trial once it is completed.


Subjects
Clinical Trials as Topic; Humans; Bayes Theorem; Treatment Outcome
2.
Biometrics; 80(2), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38640436

ABSTRACT

Several epidemiological studies have provided evidence that long-term exposure to fine particulate matter (PM2.5) increases mortality rate. Furthermore, some population characteristics (e.g., age, race, and socioeconomic status) might play a crucial role in understanding vulnerability to air pollution. To inform policy, it is necessary to identify groups of the population that are more or less vulnerable to air pollution. In the causal inference literature, the group average treatment effect (GATE) is a distinctive facet of the conditional average treatment effect. This widely employed metric serves to characterize the heterogeneity of a treatment effect based on some population characteristics. In this paper, we introduce a novel Confounder-Dependent Bayesian Mixture Model (CDBMM) to characterize causal effect heterogeneity. More specifically, our method leverages the flexibility of the dependent Dirichlet process to model the distribution of the potential outcomes conditionally on the covariates and the treatment levels, thus enabling us to: (i) identify heterogeneous and mutually exclusive population groups defined by similar GATEs in a data-driven way, and (ii) estimate and characterize the causal effects within each of the identified groups. Through simulations, we demonstrate the effectiveness of our method in uncovering key insights about treatment effect heterogeneity. We apply our method to claims data from Medicare enrollees in Texas. We found six mutually exclusive groups where the causal effects of PM2.5 on mortality rate are heterogeneous.


Subjects
Air Pollutants; Air Pollution; United States/epidemiology; Air Pollutants/adverse effects; Air Pollutants/analysis; Bayes Theorem; Medicare; Air Pollution/adverse effects; Air Pollution/analysis; Particulate Matter/adverse effects; Particulate Matter/analysis; Environmental Exposure/adverse effects
3.
Biostatistics; 23(1): 34-49, 2022 Jan 13.
Article in English | MEDLINE | ID: mdl-32247284

ABSTRACT

We develop a Bayesian nonparametric (BNP) approach to evaluate the causal effect of treatment in a randomized trial where a nonterminal event may be censored by a terminal event, but not vice versa (i.e., semi-competing risks). Based on the idea of principal stratification, we define a novel estimand for the causal effect of treatment on the nonterminal event. We introduce identification assumptions, indexed by a sensitivity parameter, and show how to draw inference using our BNP approach. We conduct simulation studies and illustrate our methodology using data from a brain cancer trial. The R code implementing our model and algorithm is available for download at https://github.com/YanxunXu/BaySemiCompeting.


Subjects
Algorithms; Bayes Theorem; Causality; Computer Simulation; Humans; Randomized Controlled Trials as Topic
4.
J Theor Biol; 558: 111351, 2023 Feb 07.
Article in English | MEDLINE | ID: mdl-36379231

ABSTRACT

Whether an outbreak of infectious disease is likely to grow or dissipate is determined through the time-varying reproduction number, Rt. Real-time or retrospective identification of changes in Rt following the imposition or relaxation of interventions can thus contribute important evidence about disease transmission dynamics which can inform policymaking. Here, we present a method for estimating shifts in Rt within a renewal model framework. Our method, which we call EpiCluster, is a Bayesian nonparametric model based on the Pitman-Yor process. We assume that Rt is piecewise-constant, and the incidence data and priors determine when or whether Rt should change and how many times it should do so throughout the series. We also introduce a prior which induces sparsity over the number of changepoints. Being Bayesian, our approach yields a measure of uncertainty in Rt and its changepoints. EpiCluster is fast, straightforward to use, and we demonstrate that it provides automated detection of rapid changes in transmission, either in real-time or retrospectively, for synthetic data series where the Rt profile is known. We illustrate the practical utility of our method by fitting it to case data of outbreaks of COVID-19 in Australia and Hong Kong, where it finds changepoints coinciding with the imposition of non-pharmaceutical interventions. Bayesian nonparametric methods, such as ours, allow the volume and complexity of the data to dictate the number of parameters required to approximate the process and should find wide application in epidemiology. This manuscript was submitted as part of a theme issue on "Modelling COVID-19 and Preparedness for Future Pandemics".


Subjects
COVID-19; Humans; Bayes Theorem; Retrospective Studies; COVID-19/epidemiology; Pandemics; Disease Outbreaks
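
The renewal model with a piecewise-constant Rt described in the abstract above can be illustrated with a minimal sketch. This is not the EpiCluster implementation: the function names, the serial-interval weights, and the choice to track expected (rather than Poisson-sampled) incidence are all assumptions made for illustration. Under those assumptions, expected incidence on day t is Rt times a weighted sum of past incidence.

```python
def piecewise_constant_r(segments):
    """Expand (n_days, r_value) pairs into a daily Rt series.

    Hypothetical helper: each segment holds Rt fixed, so the boundaries
    between segments play the role of changepoints.
    """
    r_values = []
    for n_days, r_value in segments:
        r_values.extend([r_value] * n_days)
    return r_values


def expected_incidence(seed_cases, r_values, serial_weights):
    """Expected incidence under a renewal equation (illustrative only).

    E[I_t] = R_t * sum_{s>=1} w_s * I_{t-s}, where serial_weights[s-1] = w_s
    is an assumed serial-interval distribution. Day 0 is the seeding day.
    """
    incidence = [float(seed_cases)]
    for t in range(1, len(r_values)):
        # Infection pressure: past incidence weighted by the serial interval.
        pressure = sum(
            w * incidence[t - s]
            for s, w in enumerate(serial_weights, start=1)
            if t - s >= 0
        )
        incidence.append(r_values[t] * pressure)
    return incidence


# Three days of growth (Rt = 2) followed by two days of decline (Rt = 0.5).
r_series = piecewise_constant_r([(3, 2.0), (2, 0.5)])
curve = expected_incidence(1, r_series, serial_weights=[1.0])
```

A changepoint method would work in the opposite direction, inferring the segment boundaries and Rt values from observed incidence; the sketch only shows the forward model that such inference relies on.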
5.
Biometrics; 79(4): 3907-3915, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37349969

ABSTRACT

In longitudinal studies, it is not uncommon to make multiple attempts to collect a measurement after baseline. Recording whether these attempts are successful provides useful information for the purposes of assessing missing data assumptions. This is because measurements from subjects who provide the data after numerous failed attempts may differ from those who provide the measurement after fewer attempts. Previous models for these designs were parametric and/or did not allow sensitivity analysis. For the former, there are always concerns about model misspecification and for the latter, sensitivity analysis is essential when conducting inference in the presence of missing data. Here, we propose a new approach which minimizes issues with model misspecification by using Bayesian nonparametrics for the observed data distribution. We also introduce a novel approach for identification and sensitivity analysis. We re-analyze the repeated attempts data from a clinical trial involving patients with severe mental illness and conduct simulations to better understand the properties of our approach.


Subjects
Mental Disorders; Models, Statistical; Humans; Bayes Theorem; Longitudinal Studies
6.
Biometrics; 79(4): 3252-3265, 2023 Dec.
Article in English | MEDLINE | ID: mdl-36718599

ABSTRACT

Analysis of observational studies increasingly confronts the challenge of determining which of a possibly high-dimensional set of available covariates are required to satisfy the assumption of ignorable treatment assignment for estimation of causal effects. We propose a Bayesian nonparametric approach that simultaneously (1) prioritizes inclusion of adjustment variables in accordance with existing principles of confounder selection; (2) estimates causal effects in a manner that permits complex relationships among confounders, exposures, and outcomes; and (3) provides causal estimates that account for uncertainty in the nature of confounding. The proposal relies on specification of multiple Bayesian additive regression trees models, linked together with a common prior distribution that accrues posterior selection probability to covariates on the basis of association with both the exposure and the outcome of interest. A set of extensive simulation studies demonstrates that the proposed method performs well relative to similarly-motivated methodologies in a variety of scenarios. We deploy the method to investigate the causal effect of emissions from coal-fired power plants on ambient air pollution concentrations, where the prospect of confounding due to local and regional meteorological factors introduces uncertainty around the confounding role of a high-dimensional set of measured variables. Ultimately, we show that the proposed method produces more efficient and more consistent results across adjacent years than alternative methods, lending strength to the evidence of the causal relationship between SO2 emissions and ambient particulate pollution.


Subjects
Air Pollution; Bayes Theorem; Air Pollution/adverse effects; Causality; Computer Simulation; Uncertainty
7.
Biometrics; 79(3): 2171-2183, 2023 Sep.
Article in English | MEDLINE | ID: mdl-36065934

ABSTRACT

Wildlife monitoring for open populations can be performed using a number of different survey methods. Each survey method gives rise to a type of data and, in the last five decades, a large number of associated statistical models have been developed for analyzing these data. Although these models have been parameterized and fitted using different approaches, they have all been designed to either model the pattern with which individuals enter and/or exit the population, or to estimate the population size by accounting for the corresponding observation process, or both. However, existing approaches rely on a predefined model structure and complexity, either by assuming that parameters linked to the entry and exit pattern (EEP) are specific to sampling occasions, or by employing parametric curves to describe the EEP. Instead, we propose a novel Bayesian nonparametric framework for modeling EEPs based on the Polya tree (PT) prior for densities. Our Bayesian nonparametric approach avoids overfitting when inferring EEPs, while simultaneously allowing more flexibility than is possible using parametric curves. Finally, we introduce the replicate PT prior for defining classes of models for these data allowing us to impose constraints on the EEPs, when required. We demonstrate our new approach using capture-recapture, count, and ring-recovery data for two different case studies.


Subjects
Animals, Wild; Models, Statistical; Humans; Animals; Bayes Theorem; Population Density
8.
Biometrics; 79(4): 3140-3152, 2023 Dec.
Article in English | MEDLINE | ID: mdl-36745745

ABSTRACT

We propose a doubly robust approach to characterizing treatment effect heterogeneity in observational studies. We develop a frequentist inferential procedure that utilizes posterior distributions for both the propensity score and outcome regression models to provide valid inference on the conditional average treatment effect even when high-dimensional or nonparametric models are used. We show that our approach leads to conservative inference in finite samples or under model misspecification and provides a consistent variance estimator when both models are correctly specified. In simulations, we illustrate the utility of these results in difficult settings such as high-dimensional covariate spaces or highly flexible models for the propensity score and outcome regression. Lastly, we analyze environmental exposure data from NHANES to identify how the effects of these exposures vary by subject-level characteristics.


Subjects
Models, Statistical; Treatment Effect Heterogeneity; Computer Simulation; Nutrition Surveys; Propensity Score
9.
Stat Med; 42(3): 246-263, 2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36433639

ABSTRACT

This paper introduces a nonparametric regression approach for univariate and multivariate skewed responses using Bayesian additive regression trees (BART). Existing BART methods use ensembles of decision trees to model a mean function, and have become popular recently due to their high prediction accuracy and ease of use. The usual assumption of a univariate Gaussian error distribution, however, is restrictive in many biomedical applications. Motivated by an oral health study, we provide a useful extension of BART, the skewBART model, to address this problem. We then extend skewBART to allow for multivariate responses, with information shared across the decision trees associated with different responses within the same subject. The methodology accommodates within-subject association, and allows varying skewness parameters for the varying multivariate responses. We illustrate the benefits of our multivariate skewBART proposal over existing alternatives via simulation studies and application to the oral health dataset with bivariate highly skewed responses. Our methodology is implementable via the R package skewBART, available on GitHub.


Subjects
Models, Statistical; Humans; Bayes Theorem; Computer Simulation
10.
Stat Med; 42(1): 33-51, 2023 Jan 15.
Article in English | MEDLINE | ID: mdl-36336460

ABSTRACT

In observational studies, causal inference relies on several key identifying assumptions. One identifiability condition is the positivity assumption, which requires the probability of treatment be bounded away from 0 and 1. That is, for every covariate combination, it should be possible to observe both treated and control subjects: the covariate distributions should overlap between treatment arms. If the positivity assumption is violated, population-level causal inference necessarily involves some extrapolation. Ideally, a greater amount of uncertainty about the causal effect estimate should be reflected in such situations. With that goal in mind, we construct a Gaussian process model for estimating treatment effects in the presence of practical violations of positivity. Advantages of our method include minimal distributional assumptions, a cohesive model for estimating treatment effects, and more uncertainty associated with areas in the covariate space where there is less overlap. We assess the performance of our approach with respect to bias and efficiency using simulation studies. The method is then applied to a study of critically ill female patients to examine the effect of undergoing right heart catheterization.


Subjects
Models, Statistical; Humans; Female; Probability; Computer Simulation; Bias
11.
BMC Genomics; 23(1): 599, 2022 Aug 17.
Article in English | MEDLINE | ID: mdl-35978291

ABSTRACT

BACKGROUND: Somatic copy number alterations (SCNAs) are an important class of genomic alteration in cancer. They are frequently observed in cancer samples, with studies showing that, on average, SCNAs affect 34% of a cancer cell's genome. Furthermore, SCNAs have been shown to be major drivers of tumour development and have been associated with response to therapy and prognosis. Large-scale cancer genome studies suggest that tumours are driven by somatic copy number alterations (SCNAs) or single-nucleotide variants (SNVs). Despite the frequency of SCNAs and their clinical relevance, the use of genomics assays in the clinic is biased towards targeted gene panels, which identify SNVs but provide limited scope to detect SCNAs throughout the genome. There is a need for a comparably low-cost and simple method for high-resolution SCNA profiling. RESULTS: We present conliga, a fully probabilistic method that infers SCNA profiles from a low-cost, simple, and clinically-relevant assay (FAST-SeqS). When applied to 11 high-purity oesophageal adenocarcinoma samples, we obtain good agreement (Spearman's rank correlation coefficient, rs=0.94) between conliga's inferred SCNA profiles using FAST-SeqS data (approximately £14 per sample) and those inferred by ASCAT using high-coverage WGS (gold-standard). We find that conliga outperforms CNVkit (rs=0.89), also applied to FAST-SeqS data, and is comparable to QDNAseq (rs=0.96) applied to low-coverage WGS, which is approximately four-fold more expensive, more laborious and less clinically-relevant. By performing an in silico dilution series experiment, we find that conliga is particularly suited to detecting SCNAs in low tumour purity samples. At two million reads per sample, conliga is able to detect SCNAs in all nine samples at 3% tumour purity and as low as 0.5% purity in one sample. Crucially, we show that conliga's hidden state information can be used to decide when a sample is abnormal or normal, whereas CNVkit and QDNAseq cannot provide this critical information. CONCLUSIONS: We show that conliga provides high-resolution SCNA profiles using a convenient, low-cost assay. We believe conliga makes FAST-SeqS a more clinically valuable assay as well as a useful research tool, enabling inexpensive and fast copy number profiling of pre-malignant and cancer samples.


Subjects
DNA Copy Number Variations; Neoplasms; Base Sequence; DNA; High-Throughput Nucleotide Sequencing/methods; Humans; Neoplasms/genetics
12.
Phys Biol; 19(5), 2022 Aug 30.
Article in English | MEDLINE | ID: mdl-35944548

ABSTRACT

Analyses of structural dynamics of biomolecules hold great promise to deepen the understanding of and ability to construct complex molecular systems. To this end, both experimental and computational means are available, such as fluorescence quenching experiments or molecular dynamics simulations, respectively. We argue that while seemingly disparate, both fields of study have to deal with the same type of data about the same underlying phenomenon of conformational switching. Two central challenges typically arise in both contexts: (i) the amount of obtained data is large, and (ii) it is often unknown how many distinct molecular states underlie these data. In this study, we build on the established idea of Markov state modeling and propose a generative, Bayesian nonparametric hidden Markov state model that addresses these challenges. Utilizing hierarchical Dirichlet processes, we treat different meta-stable molecule conformations as distinct Markov states, the number of which we then do not have to set a priori. In contrast to existing approaches to both experimental and simulation data that are based on the same idea, we leverage a mean-field variational inference approach, enabling scalable inference on large amounts of data. Furthermore, we specify the model also for the important case of angular data, which however proves to be computationally intractable. Addressing this issue, we propose a computationally tractable approximation to the angular model. We demonstrate the method on synthetic ground truth data and apply it to known benchmark problems as well as electrophysiological experimental data from a conformation-switching ion channel to highlight its practical utility.


Subjects
Molecular Dynamics Simulation; Bayes Theorem; Molecular Conformation
13.
Stat Sci; 37(2): 162-182, 2022 May.
Article in English | MEDLINE | ID: mdl-36034090

ABSTRACT

Genomic surveillance of SARS-CoV-2 has been instrumental in tracking the spread and evolution of the virus during the pandemic. The availability of SARS-CoV-2 molecular sequences isolated from infected individuals, coupled with phylodynamic methods, has provided insights into the origin of the virus, its evolutionary rate, the timing of introductions, the patterns of transmission, and the rise of novel variants that have spread through populations. Despite enormous global efforts of governments, laboratories, and researchers to collect and sequence molecular data, many challenges remain in analyzing and interpreting the data collected. Here, we describe the models and methods currently used to monitor the spread of SARS-CoV-2, discuss long-standing and new statistical challenges, and propose a method for tracking the rise of novel variants during the epidemic.

14.
Stat Med; 41(20): 3879-3898, 2022 Sep 10.
Article in English | MEDLINE | ID: mdl-35760708

ABSTRACT

Diagnostic tests play an important role in medical research and clinical practice. The ultimate goal of a diagnostic test is to distinguish between diseased and nondiseased individuals, and before a test is routinely used in practice, it is a pivotal requirement that its ability to discriminate between these two states is thoroughly assessed. The overlap coefficient, which is defined as the proportion of overlap area between two probability density functions, has gained popularity as a summary measure of diagnostic accuracy. We propose two Bayesian nonparametric estimators, based on Dirichlet process mixtures, for estimating the overlap coefficient. We further introduce the covariate-specific overlap coefficient and develop a Bayesian nonparametric approach based on Dirichlet process mixtures of additive normal models for estimating it. A simulation study is conducted to assess the empirical performance of our proposed estimators. Two illustrations are provided: one concerned with the search for biomarkers of ovarian cancer and another aimed at assessing the age-specific accuracy of glucose as a biomarker of diabetes.


Subjects
Models, Statistical; Ovarian Neoplasms; Bayes Theorem; Biomarkers; Computer Simulation; Female; Humans; Ovarian Neoplasms/diagnosis; Statistics, Nonparametric
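
The overlap coefficient defined in the abstract above, the area shared by two probability density functions, can be made concrete with a short sketch. This does not reproduce the paper's Bayesian nonparametric estimators: it assumes both densities are known normal distributions and integrates the pointwise minimum numerically with the trapezoid rule. The function names and the grid settings are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def overlap_coefficient(mean0, sd0, mean1, sd1, n_grid=20001):
    """Overlap coefficient OVL = integral of min(f0(x), f1(x)) dx.

    Illustrative only: f0 and f1 are assumed normal (e.g. test results in
    nondiseased and diseased groups), and the integral is approximated by
    the trapezoid rule on a grid covering 8 sd around both means.
    """
    lower = min(mean0 - 8.0 * sd0, mean1 - 8.0 * sd1)
    upper = max(mean0 + 8.0 * sd0, mean1 + 8.0 * sd1)
    step = (upper - lower) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = lower + i * step
        height = min(normal_pdf(x, mean0, sd0), normal_pdf(x, mean1, sd1))
        # Endpoints get half weight under the trapezoid rule.
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * height
    return total * step


# A value near 1 means the test barely separates the groups;
# a value near 0 means almost perfect separation.
ovl = overlap_coefficient(0.0, 1.0, 2.0, 1.0)
```

For two unit-variance normals whose means differ by 2, the closed form is 2Φ(-1), roughly 0.317, which gives a convenient check on the numerical result.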
15.
Stat Med; 41(7): 1242-1262, 2022 Mar 30.
Article in English | MEDLINE | ID: mdl-34816464

ABSTRACT

Jointly analyzing transcriptomic data and the existing biological networks can yield more robust and informative feature selection results, as well as better understanding of the biological mechanisms. Selecting and classifying node features over genome-scale networks has become increasingly important in genomic biology and genomic medicine. Existing methods have some critical drawbacks. The first is that they do not allow flexible modeling of different subtypes of selected nodes. The second is that they ignore nodes with missing values, which is very likely to increase bias in estimation. To address these limitations, we propose a general modeling framework for Bayesian node classification (BNC) with missing values. A new prior model is developed for the class indicators incorporating the network structure. For posterior computation, we resort to the Swendsen-Wang algorithm for efficiently updating class indicators. BNC can naturally handle missing values in the Bayesian modeling framework, which improves the node classification accuracy and reduces the bias in estimating gene effects. We demonstrate the advantages of our methods via extensive simulation studies and the analysis of the cutaneous melanoma dataset from The Cancer Genome Atlas.


Subjects
Melanoma; Skin Neoplasms; Algorithms; Bayes Theorem; Computer Simulation; Humans; Melanoma/genetics
16.
Ecol Appl; 32(3): e2524, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34918421

ABSTRACT

Clustering is a ubiquitous task in ecological and environmental sciences and multiple methods have been developed for this purpose. Because these clustering methods typically require users to a priori specify the number of groups, the standard approach is to run the algorithm for different numbers of groups and then choose the optimal number using a criterion (e.g., AIC or BIC). The problem with this approach is that it can be computationally expensive to run these clustering algorithms multiple times (i.e., for different numbers of groups) and some of these information criteria can lead to an overestimation of the number of groups. To address these concerns, we advocate for the use of sparsity-inducing priors within a Bayesian clustering framework. In particular, we highlight how the truncated stick-breaking (TSB) prior, a prior commonly adopted in Bayesian nonparametrics, can be used to simultaneously determine the number of groups and estimate model parameters for a wide range of Bayesian clustering models without requiring the fitting of multiple models. We illustrate the ability of this prior to successfully recover the true number of groups for three clustering models (two types of mixture models, applied to GPS movement data and species occurrence data, as well as the species archetype model) using simulated data in the context of movement ecology and community ecology. We then apply these models to armadillo movement data in Brazil, plant occurrence data from Alberta (Canada), and bird occurrence data from North America. We believe that many ecological and environmental sciences applications will benefit from Bayesian clustering methods with sparsity-inducing priors given the ubiquity of clustering and the associated challenge of determining the number of groups. Two R packages, EcoCluster and bayesmove, are provided that enable the straightforward fitting of these models with the TSB prior.


Subjects
Algorithms; Alberta; Bayes Theorem; Brazil; Cluster Analysis
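
The truncated stick-breaking (TSB) prior mentioned in the abstract above can be sketched in a few lines. This is not code from the EcoCluster or bayesmove packages: it assumes the common construction in which each stick proportion is drawn from Beta(1, concentration) and the last group receives whatever stick is left. The function name and parameter names are hypothetical.

```python
import random


def truncated_stick_breaking(n_groups, concentration, rng):
    """Draw group weights from a truncated stick-breaking prior.

    Assumed construction (illustrative): V_k ~ Beta(1, concentration) for
    k < n_groups, weight_k = V_k * prod_{j<k} (1 - V_j), and the final
    group takes the remaining stick so the weights sum to one.
    """
    weights = []
    remaining = 1.0  # length of stick not yet assigned
    for _ in range(n_groups - 1):
        proportion = rng.betavariate(1.0, concentration)
        weights.append(remaining * proportion)
        remaining *= 1.0 - proportion
    weights.append(remaining)
    return weights


# A small concentration tends to pile most of the mass on the first few
# groups, which is the sparsity-inducing behaviour described above.
rng = random.Random(2022)
sparse_weights = truncated_stick_breaking(10, 0.5, rng)
```

Setting `n_groups` to a generous upper bound lets the prior, rather than repeated model fits, decide how many groups carry non-negligible weight.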
17.
Article in English | MEDLINE | ID: mdl-35781923

ABSTRACT

The standard approach to analyzing brain electrical activity is to examine the spectral density function (SDF) and identify frequency bands, defined a priori, that have the most substantial relative contributions to the overall variance of the signal. However, a limitation of this approach is that the precise frequency and bandwidth of oscillations are not uniform across different cognitive demands. Thus, these bands should not be arbitrarily set in any analysis. To overcome this limitation, the Bayesian mixture auto-regressive decomposition (BMARD) method is proposed, as a data-driven approach that identifies (i) the number of prominent spectral peaks, (ii) the frequency peak locations, and (iii) their corresponding bandwidths (or spread of power around the peaks). Using the BMARD method, the standardized SDF is represented as a Dirichlet process mixture based on a kernel derived from second-order auto-regressive processes which completely characterize the location (peak) and scale (bandwidth) parameters. A Metropolis-Hastings-within-Gibbs algorithm is developed for sampling the posterior distribution of the mixture parameters. Simulations demonstrate the robust performance of the proposed method. Finally, the BMARD method is applied to analyze local field potential (LFP) activity from the hippocampus of laboratory rats across different conditions in a non-spatial sequence memory experiment, to identify the most prominent frequency bands and examine the link between specific patterns of brain oscillatory activity and trial-specific cognitive demands.
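
To show why a second-order auto-regressive (AR(2)) process makes a natural kernel for a spectral peak, the sketch below evaluates the AR(2) spectral density for a given peak location and bandwidth parameter. It is not the BMARD implementation. The parameterization (complex characteristic roots with modulus exp(-log_modulus) and phase 2π·peak) is an assumption chosen for illustration, and all function and parameter names are hypothetical. Frequencies are in cycles per sample, between 0 and 0.5.

```python
import cmath
import math


def ar2_coefficients(peak, log_modulus):
    """AR(2) coefficients for a pair of complex roots (assumed form).

    With rho = exp(-log_modulus) and theta = 2*pi*peak, the process
    x_t = phi1 * x_{t-1} + phi2 * x_{t-2} + noise has
    phi1 = 2 * rho * cos(theta) and phi2 = -rho**2.
    """
    rho = math.exp(-log_modulus)
    phi1 = 2.0 * rho * math.cos(2.0 * math.pi * peak)
    phi2 = -rho * rho
    return phi1, phi2


def ar2_spectral_density(freq, phi1, phi2, noise_var=1.0):
    """Spectral density of an AR(2) process at frequency `freq`.

    S(freq) = noise_var / |1 - phi1*z - phi2*z**2|**2 with
    z = exp(-2*pi*i*freq).
    """
    z = cmath.exp(-2j * math.pi * freq)
    transfer = 1.0 - phi1 * z - phi2 * z * z
    return noise_var / abs(transfer) ** 2


# Scan a frequency grid and locate the maximum of the density.
phi1, phi2 = ar2_coefficients(peak=0.1, log_modulus=0.05)
grid = [i / 10000 for i in range(5001)]
peak_location = max(grid, key=lambda f: ar2_spectral_density(f, phi1, phi2))
```

When the root modulus is close to one (small `log_modulus`), the density concentrates tightly around the chosen peak frequency; larger values spread the power over a wider band, which is the location and scale behaviour the abstract attributes to the kernel.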

18.
Sensors (Basel); 22(23), 2022 Dec 03.
Article in English | MEDLINE | ID: mdl-36502155

ABSTRACT

Wearable sensor data is relatively easily collected and provides direct measurements of movement that can be used to develop useful behavioral biomarkers. Sensitive and specific behavioral biomarkers for neurodegenerative diseases are critical to supporting early detection, drug development efforts, and targeted treatments. In this paper, we use autoregressive hidden Markov models and a time-frequency approach to create meaningful quantitative descriptions of behavioral characteristics of cerebellar ataxias from wearable inertial sensor data gathered during movement. We create a flexible and descriptive set of features derived from accelerometer and gyroscope data collected from wearable sensors worn while participants perform clinical assessment tasks, and use these data to estimate disease status and severity. A short period of data collection (<5 min) yields enough information to effectively separate patients with ataxia from healthy controls with very high accuracy, to separate ataxia from other neurodegenerative diseases such as Parkinson's disease, and to provide estimates of disease severity.


Subjects
Cerebellar Diseases; Parkinson Disease; Wearable Electronic Devices; Humans; Movement; Parkinson Disease/diagnosis; Ataxia
19.
Entropy (Basel); 24(12), 2022 Nov 22.
Article in English | MEDLINE | ID: mdl-36554108

ABSTRACT

Hierarchical stochastic processes, such as the hierarchical Dirichlet process, hold an important position as a modelling tool in statistical machine learning, and are even used in deep neural networks. They allow, for instance, networks of probability vectors to be used in general statistical modelling, intrinsically supporting information sharing through the network. This paper presents a general theory of hierarchical stochastic processes and illustrates its use on the gamma process and the generalised gamma process. In general, most of the convenient properties of hierarchical Dirichlet processes extend to the broader family. The main construction for this corresponds to estimating the moments of an infinitely divisible distribution based on its cumulants. Various equivalences and relationships can then be applied to networks of hierarchical processes. The examples given demonstrate the duplication in non-parametric research and present plots of the Pitman-Yor distribution.

20.
Biometrics; 77(2): 622-633, 2021 Jun.
Article in English | MEDLINE | ID: mdl-32535900

ABSTRACT

The simultaneous testing of multiple hypotheses is common to the analysis of high-dimensional data sets. The two-group model, first proposed by Efron, identifies significant comparisons by allocating observations to a mixture of an empirical null and an alternative distribution. In the Bayesian nonparametrics literature, many approaches have suggested using mixtures of Dirichlet Processes in the two-group model framework. Here, we investigate employing mixtures of two-parameter Poisson-Dirichlet Processes instead, and show how they provide a more flexible and effective tool for large-scale hypothesis testing. Our model further employs nonlocal prior densities to allow separation between the two mixture components. We obtain a closed-form expression for the exchangeable partition probability function of the two-group model, which leads to a straightforward Markov Chain Monte Carlo implementation. We compare the performance of our method for large-scale inference in a simulation study and illustrate its use on both a prostate cancer data set and a case-control microbiome study of the gastrointestinal tracts in children from underdeveloped countries who have been recently diagnosed with moderate-to-severe diarrhea.


Subjects
Microbiota; Bayes Theorem; Child; Computer Simulation; Humans; Markov Chains; Monte Carlo Method
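
The two-parameter Poisson-Dirichlet (Pitman-Yor) process referred to in the abstract above induces a random partition that can be simulated with a sequential seating rule. The sketch below is a generic illustration of that rule, not the paper's two-group model or its MCMC sampler. It assumes a discount in [0, 1) and a positive concentration, and the function name is hypothetical.

```python
import random


def pitman_yor_partition(n_items, discount, concentration, rng):
    """Sample a partition from the Pitman-Yor seating rule (illustrative).

    With k existing clusters, an item joins cluster j with weight
    (size_j - discount) and opens a new cluster with weight
    (concentration + discount * k). Setting discount = 0 recovers the
    Dirichlet process (Chinese restaurant process).
    """
    cluster_sizes = []
    labels = []
    for _ in range(n_items):
        k = len(cluster_sizes)
        weights = [size - discount for size in cluster_sizes]
        weights.append(concentration + discount * k)  # new-cluster weight
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            cluster_sizes.append(1)
        else:
            cluster_sizes[choice] += 1
        labels.append(choice)
    return labels, cluster_sizes


# A positive discount tends to produce more small clusters than the
# Dirichlet process does for the same concentration.
rng = random.Random(2021)
labels, sizes = pitman_yor_partition(500, discount=0.5, concentration=1.0, rng=rng)
```

The extra discount parameter is what gives the two-parameter family its added flexibility over Dirichlet process mixtures in how cluster sizes are distributed.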