1.
Biometrics ; 77(2): 622-633, 2021 Jun.
Article in English | MEDLINE | ID: mdl-32535900

ABSTRACT

The simultaneous testing of multiple hypotheses is common in the analysis of high-dimensional data sets. The two-group model, first proposed by Efron, identifies significant comparisons by allocating observations to a mixture of an empirical null and an alternative distribution. In the Bayesian nonparametrics literature, many approaches have used mixtures of Dirichlet processes within the two-group model framework. Here, we investigate employing mixtures of two-parameter Poisson-Dirichlet processes instead, and show how they provide a more flexible and effective tool for large-scale hypothesis testing. Our model further employs nonlocal prior densities to allow separation between the two mixture components. We obtain a closed-form expression for the exchangeable partition probability function of the two-group model, which leads to a straightforward Markov chain Monte Carlo implementation. We compare the performance of our method for large-scale inference in a simulation study and illustrate its use on both a prostate cancer data set and a case-control microbiome study of the gastrointestinal tracts of children from underdeveloped countries who had recently been diagnosed with moderate-to-severe diarrhea.
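The two-group allocation can be illustrated with a minimal Python sketch that assumes fixed, known components. In place of an empirical null it uses a theoretical N(0, 1) null, with a hypothetical null proportion `pi0`. The alternative means get a normal-moment nonlocal prior with hypothetical scale `tau`, which vanishes at zero. The sketch computes the local false discovery rate, i.e. the posterior probability that a z-score belongs to the null component. It is not the paper's Pitman-Yor mixture model, where both components are learned from the data.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def moment_alt_density(z, tau):
    """Marginal density of z ~ N(mu, 1) when mu has the normal-moment prior
    pi(mu) = (mu**2 / tau) * N(mu; 0, tau), a nonlocal prior vanishing at mu = 0."""
    post_mean = tau * z / (1.0 + tau)  # E[mu | z] under a N(0, tau) prior
    post_var = tau / (1.0 + tau)       # Var[mu | z] under a N(0, tau) prior
    return normal_pdf(z, 1.0 + tau) * (post_var + post_mean ** 2) / tau


def local_fdr(z, pi0=0.9, tau=4.0):
    """Posterior probability that z came from the N(0, 1) null component.
    pi0 and tau are hypothetical fixed values, not estimated here."""
    null = pi0 * normal_pdf(z, 1.0)
    alt = (1.0 - pi0) * moment_alt_density(z, tau)
    return null / (null + alt)


# Flag comparisons whose local fdr falls below an illustrative 0.2 threshold
z_scores = [0.3, -1.2, 2.5, 3.8, -4.1]
flagged = [z for z in z_scores if local_fdr(z) < 0.2]
print(flagged)
```

The moment prior is used here only because its marginal density has a closed form. Other nonlocal densities would change `moment_alt_density` but not the allocation rule.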


Subjects
Microbiota; Bayes Theorem; Child; Computer Simulation; Humans; Markov Chains; Monte Carlo Method
2.
Entropy (Basel) ; 22(1), 2020 Jan 06.
Article in English | MEDLINE | ID: mdl-33285844

ABSTRACT

We examine prior sensitivity in a semi-parametric hierarchical extension of the INAR(p) model in which the innovation rates are clustered according to a Pitman-Yor process placed at the top of the model hierarchy. Our main finding is a graphical criterion that guides the specification of the hyperparameters of the Pitman-Yor process base measure. We show how the discount and concentration parameters interact with the chosen base measure to make the inferential results more robust. The forecasting performance of the model is illustrated by the analysis of a time series of worldwide earthquake events, for which the new model outperforms the original INAR(p) model.
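The INAR(p) recursion that this model extends can be simulated in a few lines of Python. The sketch below assumes the classical specification: binomial thinning of past counts plus Poisson innovations with a single constant rate `lam`. The Pitman-Yor clustering of innovation rates studied in the paper is not reproduced, and the function names are hypothetical.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def thin(count, alpha, rng):
    """Binomial thinning alpha o count: each of the count units survives w.p. alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def simulate_inar(alphas, lam, n_steps, seed=0):
    """Simulate X_t = sum_i alphas[i] o X_{t-1-i} + eps_t with eps_t ~ Poisson(lam).

    Stationarity requires sum(alphas) < 1. The len(alphas) leading zeros are
    only initial values and are dropped from the returned series."""
    rng = random.Random(seed)
    p = len(alphas)
    x = [0] * p
    for _ in range(n_steps):
        # x[-1] is X_{t-1}, x[-2] is X_{t-2}, and so on
        survivors = sum(thin(x[-1 - i], a, rng) for i, a in enumerate(alphas))
        x.append(survivors + poisson_draw(lam, rng))
    return x[p:]


series = simulate_inar(alphas=[0.3, 0.2], lam=2.0, n_steps=20000)
# The stationary mean of INAR(p) is lam / (1 - sum(alphas)), here 4
print(sum(series) / len(series))
```

Replacing the constant `lam` with time-specific rates drawn from a clustering prior is where the semi-parametric extension would enter.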

3.
Biometrics ; 68(4): 1188-96, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23025286

ABSTRACT

Species sampling problems have a long history in ecological and biological studies, where a number of issues must be addressed, including the evaluation of species richness, the design of sampling experiments, and the estimation of rare species variety. Such inferential problems have recently emerged in genomic applications as well, where they exhibit some peculiar features that make them more challenging: one has to deal with very large populations (genomic libraries) containing a huge number of distinct species (genes), and only a small portion of the library has been sampled (sequenced). These aspects motivate the Bayesian nonparametric approach we undertake, since it provides the degree of flexibility typically needed in this framework. Given an observed sample of size n, we focus on predicting a key aspect of the outcome of an additional sample of size m, namely the so-called discovery probability. In particular, conditionally on an observed basic sample of size n, we derive a novel estimator of the probability of detecting, at the (n+m+1)th observation, species that have been observed with any given frequency in the enlarged sample of size n+m. This estimator admits a closed-form expression that can be evaluated exactly. The result allows us to quantify both the rate at which rare species are detected and the sample coverage achieved for abundant species as m increases. Natural applications include estimating the probability of discovering rare genes within genomic libraries, and the results are illustrated on two expressed sequence tag datasets.
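For the special case m = 0 (no additional sample), the discovery probabilities follow directly from the predictive rule of a Pitman-Yor, i.e. two-parameter Poisson-Dirichlet, prior. A new species appears with probability (θ + σk)/(θ + n). A species already seen l times reappears with probability (l - σ)/(θ + n), so all such species together contribute m_l(l - σ)/(θ + n). The Python sketch below assumes this prior with known σ in [0, 1) and θ > -σ. It does not reproduce the paper's closed-form estimator for general m.

```python
from collections import Counter


def py_next_draw_probabilities(freqs, sigma, theta):
    """One-step predictive probabilities under a Pitman-Yor(sigma, theta) prior.

    freqs[j] is the observed frequency of the j-th distinct species.
    Returns {0: P(new species), l: P(next draw is a species seen exactly l times)}."""
    n, k = sum(freqs), len(freqs)
    m = Counter(freqs)  # m[l] = number of species observed exactly l times
    probs = {0: (theta + sigma * k) / (theta + n)}
    for l, m_l in m.items():
        probs[l] = m_l * (l - sigma) / (theta + n)
    return probs


# Hypothetical sample: 11 reads spread over 5 species, three of them singletons
probs = py_next_draw_probabilities([5, 3, 1, 1, 1], sigma=0.5, theta=1.0)
print(probs[0], probs[1])  # new species vs. re-observing a current singleton
```

Larger σ moves probability mass from abundant species toward new and rare ones, which is the behavior the paper exploits for genomic libraries.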


Subjects
Algorithms; Chromosome Mapping/methods; Data Interpretation, Statistical; Expressed Sequence Tags; Models, Statistical; Sequence Analysis, DNA/methods; Computer Simulation
4.
Bayesian Anal ; 14(4): 1303-1356, 2019 Dec.
Article in English | MEDLINE | ID: mdl-35978607

ABSTRACT

Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among others. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalizing to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
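The "add, then normalize" construction can be sketched in Python with a finite-dimensional approximation. Each completely random measure is replaced by a fixed number of atoms with gamma-distributed jumps at standard-normal locations. One shared measure is then added to a group-specific measure and the sum is normalized per group. This is a simplification under stated assumptions. In the paper the group-specific measures come from a nested structure, whereas here they are independent gamma measures. The truncation level, masses, and function names are all hypothetical.

```python
import random


def gamma_crm_atoms(n_atoms, total_mass, rng):
    """Finite approximation of a gamma CRM: n_atoms atoms with
    Gamma(total_mass / n_atoms, 1) jumps at N(0, 1) locations."""
    shape = total_mass / n_atoms
    return [(rng.gammavariate(shape, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_atoms)]


def latent_nested_sketch(n_groups, common_mass, group_mass, n_atoms=50, seed=0):
    """One random probability measure per group: (mu_S + mu_l) / total mass,
    where mu_S is shared by all groups and mu_l is specific to group l."""
    rng = random.Random(seed)
    shared = gamma_crm_atoms(n_atoms, common_mass, rng)
    measures = []
    for _ in range(n_groups):
        atoms = shared + gamma_crm_atoms(n_atoms, group_mass, rng)  # shared atoms first
        total = sum(w for w, _ in atoms)
        measures.append([(w / total, loc) for w, loc in atoms])
    return measures


groups = latent_nested_sketch(n_groups=3, common_mass=1.0, group_mass=1.0)
for g in groups:
    # Probability mass this group places on the atoms it shares with the others
    print(round(sum(w for w, _ in g[:50]), 3))
```

The share of mass on common atoms varies by group. That variation is what lets such measures sit between full exchangeability (all mass shared) and independence (no mass shared).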

5.
BMC Bioinformatics ; 8: 339, 2007 Sep 14.
Article in English | MEDLINE | ID: mdl-17868445

ABSTRACT

BACKGROUND: Expressed sequence tag (EST) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and to determine the gene discovery rate. These estimates form the basis for deciding whether to continue sequencing the library and, if so, provide a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library.

RESULTS: In this work we propose a Bayesian nonparametric approach to statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt incorporates the available information into the predictions in a statistically rigorous way. Our proposal has appealing properties compared with frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail.

CONCLUSION: The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not suffer from the drawbacks associated with frequentist estimators and are reliable for any size of the additional sample.
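Estimates b) and c) can be illustrated under the assumption of a Pitman-Yor (two-parameter Poisson-Dirichlet) prior with known σ in (0, 1) and θ > -σ, which may differ from the paper's exact specification. The expected number of new species in m further draws after observing k distinct species in n draws is (k + θ/σ)[(θ + n + σ)_m / (θ + n)_m - 1], where (a)_m is the rising factorial. The probability that draw n + m + 1 is new is linear in the number of distinct species. Its expectation is therefore (θ + σ E[K]) / (θ + n + m). The Python sketch below computes both. The example library sizes are hypothetical.

```python
def py_expected_new_species(n, k, m, sigma, theta):
    """E[# new species in m extra draws | n draws with k distinct], PY(sigma, theta), 0 < sigma < 1.

    Evaluates (k + theta/sigma) * ((theta+n+sigma)_m / (theta+n)_m - 1), accumulating the
    rising-factorial ratio term by term so it does not overflow for large m."""
    ratio = 1.0
    for i in range(m):
        ratio *= (theta + n + sigma + i) / (theta + n + i)
    return (k + theta / sigma) * (ratio - 1.0)


def py_discovery_rate(n, k, m, sigma, theta):
    """P(draw n+m+1 is a new species | first n draws), averaged over the m intermediate draws."""
    expected_k = k + py_expected_new_species(n, k, m, sigma, theta)
    return (theta + sigma * expected_k) / (theta + n + m)


# Hypothetical library: 100 reads containing 40 distinct genes
for m in (0, 50, 500):
    print(m, py_expected_new_species(100, 40, m, 0.4, 5.0), py_discovery_rate(100, 40, m, 0.4, 5.0))
```

The discovery rate decreases in m, which is the quantity one would monitor when deciding whether further sequencing is worthwhile.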


Subjects
Bayes Theorem; Models, Statistical; Sequence Analysis, DNA/methods; Algorithms; Amoeba/genetics; Animals; Chromosome Mapping; Decision Theory; Expressed Sequence Tags; Gene Expression Profiling/methods; Gene Library; Models, Genetic; Predictive Value of Tests; Sample Size; Statistics, Nonparametric
6.
IEEE Trans Pattern Anal Mach Intell ; 37(2): 212-29, 2015 Feb.
Article in English | MEDLINE | ID: mdl-26353237

ABSTRACT

Discrete random probability measures and the exchangeable random partitions they induce are key tools for addressing a variety of estimation and prediction problems in Bayesian inference. Here we focus on the family of Gibbs-type priors, a recent and elegant generalization of the Dirichlet and Pitman-Yor process priors. These random probability measures share properties that are appealing from both a theoretical and an applied point of view: (i) they admit an intuitive predictive characterization justifying their use in terms of a precise assumption on the learning mechanism; (ii) they stand out in terms of mathematical tractability; (iii) they include several interesting special cases besides the Dirichlet and the Pitman-Yor processes. The goal of our paper is to provide a systematic and unified treatment of Gibbs-type priors and to highlight their implications for Bayesian nonparametric inference. We deal with their distributional properties, the resulting estimators, frequentist asymptotic validation and the construction of time-dependent versions. Applications, mainly concerning mixture models and species sampling, serve to convey the main ideas. The intuition inherent to this class of priors and the neat results it leads to raise the question of whether it actually represents the most natural generalization of the Dirichlet process.
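The predictive characterization of Gibbs-type priors can be written generically in terms of the weights V_{n,k}. The next observation is new with probability V_{n+1,k+1}/V_{n,k}. It joins existing cluster j with probability (V_{n+1,k}/V_{n,k})(n_j - σ), so the rule depends on the data only through n, k and the cluster sizes. The Python sketch below assumes the Pitman-Yor weights as one concrete choice of V. It works on the log scale for numerical stability, and the helper names are hypothetical.

```python
import math


def pitman_yor_log_V(sigma, theta):
    """log V_{n,k} for the Pitman-Yor(sigma, theta) member of the Gibbs-type family:
    V_{n,k} = prod_{i=1}^{k-1} (theta + i*sigma) / (theta + 1)_{n-1}."""
    def log_V(n, k):
        log_num = sum(math.log(theta + i * sigma) for i in range(1, k))
        log_den = math.lgamma(theta + n) - math.lgamma(theta + 1.0)  # log (theta+1)_{n-1}
        return log_num - log_den
    return log_V


def gibbs_predictive(counts, sigma, log_V):
    """One-step predictive of a Gibbs-type prior with discount sigma and weights V.

    Returns (P(new cluster), [P(join cluster j) for each j])."""
    n, k = sum(counts), len(counts)
    base = log_V(n, k)
    p_new = math.exp(log_V(n + 1, k + 1) - base)
    p_old = [math.exp(log_V(n + 1, k) - base) * (n_j - sigma) for n_j in counts]
    return p_new, p_old


p_new, p_old = gibbs_predictive([4, 2, 1], sigma=0.3, log_V=pitman_yor_log_V(0.3, 2.0))
print(p_new, p_old)
```

Swapping in another `log_V` yields other Gibbs-type priors, provided its weights satisfy the recursion V_{n,k} = (n - kσ)V_{n+1,k} + V_{n+1,k+1}. Setting σ = 0 recovers the Dirichlet process.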

7.
Tex Heart Inst J ; 29(1): 40-4, 2002.
Article in English | MEDLINE | ID: mdl-11995849

ABSTRACT

Aneurysm of the left sinus of Valsalva is extremely rare. Compression of the left coronary artery by such an aneurysm is an unusual complication of this condition and can cause coronary insufficiency. We describe the case of a 75-year-old woman who had an isolated unruptured aneurysm of the left coronary sinus with intraluminal thrombus, which caused coronary artery compression. We performed successful surgical correction by closing the mouth of the aneurysm without aortic valve replacement or coronary artery bypass grafting. A review of the world medical literature revealed 19 cases of sinus of Valsalva aneurysms that hindered the coronary arterial flow. The previously published reports of this rare condition and its treatment are discussed herein.


Subjects
Aortic Diseases/complications; Coronary Stenosis/etiology; Sinus of Valsalva/abnormalities; Thrombosis/complications; Aged; Aortic Aneurysm/complications; Aortic Aneurysm/surgery; Aortic Diseases/surgery; Coronary Stenosis/surgery; Female; Humans; Thrombosis/surgery
8.
J Comput Biol ; 15(10): 1315-27, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19040366

ABSTRACT

We consider inference for expressed sequence tag (EST) data. We focus on evaluating the redundancy of a cDNA library and, more importantly, on comparing different libraries on the basis of their clustering structure. The numerical results allow us to assess the effect of an error correction procedure for EST data and to study the compatibility of single EST libraries with merged ones. The proposed method is based on a Bayesian nonparametric approach that makes it possible to understand the clustering mechanism generating the observed data. As the specific nonparametric model, we use the two-parameter Poisson-Dirichlet (PD) process. The PD process is a tractable nonparametric prior and a natural candidate for modeling data arising from discrete distributions. It allows both prediction and testing for analyzing the clustering structure of the data. We show how a full Bayesian analysis can be performed and describe the corresponding computational algorithm.
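A minimal Python sketch shows how the PD process turns observed cluster sizes into inference on its two parameters. It evaluates the PD exchangeable partition probability function (EPPF) and places a uniform prior on a grid of (σ, θ) values. The grid, the example cluster sizes, and the function names are hypothetical choices for illustration. The paper's computational algorithm is not reproduced.

```python
import math


def py_log_eppf(sizes, sigma, theta):
    """Log probability of a given partition with these cluster sizes under PD(sigma, theta):
    prod_{i=1}^{k-1}(theta + i*sigma) / (theta+1)_{n-1} * prod_j (1 - sigma)_{n_j - 1}."""
    n, k = sum(sizes), len(sizes)
    log_num = sum(math.log(theta + i * sigma) for i in range(1, k))
    log_den = math.lgamma(theta + n) - math.lgamma(theta + 1.0)  # log (theta+1)_{n-1}
    log_clusters = sum(math.lgamma(n_j - sigma) - math.lgamma(1.0 - sigma) for n_j in sizes)
    return log_num - log_den + log_clusters


def grid_posterior(sizes, sigmas, thetas):
    """Posterior over a (sigma, theta) grid under a uniform prior on the grid points."""
    pts = [(s, t) for s in sigmas for t in thetas]
    logs = [py_log_eppf(sizes, s, t) for s, t in pts]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(lp - top) for lp in logs]
    total = sum(weights)
    return {pt: w / total for pt, w in zip(pts, weights)}


# Hypothetical library: reads per distinct EST cluster
sizes = [12, 7, 5, 3, 2, 2, 1, 1, 1, 1, 1, 1]
post = grid_posterior(sizes, sigmas=[i / 10 for i in range(10)], thetas=[0.5, 1.0, 2.0, 5.0, 10.0])
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

Comparing such posteriors across libraries, or between single and merged libraries, is one simple way to contrast clustering structures. A full analysis would use a proper prior on (σ, θ) and an MCMC or similar algorithm in place of a coarse grid.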


Subjects
Bayes Theorem; Expressed Sequence Tags; Gene Library; Algorithms; Base Sequence; Cluster Analysis; Molecular Sequence Data; Sequence Analysis, DNA/methods