Results 1 - 7 of 7
1.
J Am Stat Assoc ; 119(545): 320-331, 2024.
Article in English | MEDLINE | ID: mdl-38716405

ABSTRACT

There is a growing interest in the estimation of the number of unseen features, mostly driven by biological applications. A recent work brought out a peculiar property of the popular completely random measures (CRMs) as prior models in Bayesian nonparametric (BNP) inference for the unseen-features problem: for fixed prior parameters, they all lead to a Poisson posterior distribution for the number of unseen features, which depends on the sampling information only through the sample size. CRMs are thus not a flexible prior model for the unseen-features problem and, while the Poisson posterior distribution may be appealing for analytical tractability and ease of interpretability, its independence from the sampling information makes the BNP approach a questionable oversimplification, with posterior inferences being completely determined by the estimation of unknown prior parameters. In this article, we introduce the stable-Beta scaled process (SB-SP) prior, and we show that it enriches the posterior distribution of the number of unseen features arising under CRM priors, while maintaining its analytical tractability and interpretability. That is, the SB-SP prior leads to a negative binomial posterior distribution, which depends on the sampling information through the sample size and the number of distinct features, with corresponding estimates being simple, linear in the sampling information and computationally efficient. We apply our BNP approach to synthetic data and to real cancer genomic data, showing that: (i) it outperforms the most popular parametric and nonparametric competitors in terms of estimation accuracy; (ii) it provides improved coverage for the estimation with respect to a BNP approach under CRM priors. Supplementary materials for this article are available online.

2.
ArXiv ; 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38495567

ABSTRACT

Collecting genomics data across multiple heterogeneous populations (e.g., across different cancer types) has the potential to improve our understanding of disease. Despite sequencing advances, though, resources often remain a constraint when gathering data. So it would be useful for experimental design if experimenters with access to a pilot study could predict the number of new variants they might expect to find in a follow-up study: both the number of new variants shared between the populations and the total across the populations. While many authors have developed prediction methods for the single-population case, we show that these predictions can fare poorly across multiple populations that are heterogeneous. We prove that, surprisingly, a natural extension of a state-of-the-art single-population predictor to multiple populations fails for fundamental reasons. We provide the first predictor for the number of new shared variants and new total variants that can handle heterogeneity in multiple populations. We show that our proposed method works well empirically using real cancer and population genetics data.

3.
Sci Adv ; 9(7): eabn3999, 2023 Feb 15.
Article in English | MEDLINE | ID: mdl-36791188

ABSTRACT

Probabilistic machine learning increasingly informs critical decisions in medicine, economics, politics, and beyond. To aid the development of trust in these decisions, we develop a taxonomy delineating where trust in an analysis can break down: (i) in the translation of real-world goals to goals on a particular set of training data, (ii) in the translation of abstract goals on the training data to a concrete mathematical problem, (iii) in the use of an algorithm to solve the stated mathematical problem, and (iv) in the use of a particular code implementation of the chosen algorithm. We detail how trust can fail at each step and illustrate our taxonomy with two case studies. Finally, we describe a wide variety of methods that can be used to increase trust at each step of our taxonomy. The use of our taxonomy highlights not only steps where existing research work on trust tends to concentrate but also steps where building trust is particularly challenging.

5.
IEEE Trans Pattern Anal Mach Intell ; 37(2): 290-306, 2015 Feb.
Article in English | MEDLINE | ID: mdl-26353242

ABSTRACT

We develop a Bayesian nonparametric approach to a general family of latent class problems in which individuals can belong simultaneously to multiple classes and where each class can be exhibited multiple times by an individual. We introduce a combinatorial stochastic process known as the negative binomial process (NBP) as an infinite-dimensional prior appropriate for such problems. We show that the NBP is conjugate to the beta process, and we characterize the posterior distribution under the beta-negative binomial process (BNBP) and hierarchical models based on the BNBP (the HBNBP). We study the asymptotic properties of the BNBP and develop a three-parameter extension of the BNBP that exhibits power-law behavior. We derive MCMC algorithms for posterior inference under the HBNBP, and we present experiments using these algorithms in the domains of image segmentation, object recognition, and document analysis.


Subjects
Cluster Analysis, Informatics/methods, Algorithms, Bayes Theorem, Computer Simulation, Image Processing, Computer-Assisted, Models, Theoretical, Statistics, Nonparametric
6.
PLoS One ; 4(10): e7481, 2009 Oct 22.
Article in English | MEDLINE | ID: mdl-19847300

ABSTRACT

Selection methods that require only a single-switch input, such as a button click or blink, are potentially useful for individuals with motor impairments, mobile technology users, and individuals wishing to transmit information securely. We present a single-switch selection method, "Nomon," that is general and efficient. Existing single-switch selection methods require selectable options to be arranged in ways that limit potential applications. By contrast, traditional operating systems, web browsers, and free-form applications (such as drawing) place options at arbitrary points on the screen. Nomon, however, has the flexibility to select any point on a screen. Nomon adapts automatically to an individual's clicking ability; it allows a person who clicks precisely to make a selection quickly and allows a person who clicks imprecisely more time to make a selection without error. Nomon reaps gains in information rate by allowing the specification of beliefs (priors) about option selection probabilities and by avoiding tree-based selection schemes in favor of direct (posterior) inference. We have developed both a Nomon-based writing application and a drawing application. To evaluate Nomon's performance, we compared the writing application with a popular existing method for single-switch writing (row-column scanning). Novice users wrote 35% faster with the Nomon interface than with the scanning interface. An experienced user (author TB, with 10 hours of practice) wrote at speeds of 9.3 words per minute with Nomon, using 1.2 clicks per character and making no errors in the final text.


Subjects
Artificial Intelligence, Computer Peripherals, User-Computer Interface, Adult, Algorithms, Communication Aids for Disabled, Computers, Equipment Design, Female, Humans, Male, Models, Statistical, Neural Networks, Computer, Reproducibility of Results, Software
7.
Appl Math Res Express ; 2009(2): 123-141, 2009 Jun 01.
Article in English | MEDLINE | ID: mdl-23105943

ABSTRACT

In free response choice tasks, decision making is often modeled as a first-passage problem for a stochastic differential equation. In particular, drift-diffusion processes with constant or time-varying drift rates and noise can reproduce behavioral data (accuracy and response-time distributions) and neuronal firing rates. However, no exact solutions are known for the first-passage problem with time-varying data. Recognizing the importance of simple closed-form expressions for modeling and inference, we show that an interrogation or cued-response protocol, appropriately interpreted, can yield approximate first-passage (response time) distributions for a specific class of time-varying processes used to model evidence accumulation. We test these against exact expressions for the constant drift case and compare them with data from a class of sigmoidal functions. We find that both the direct interrogation approximation and an error-minimizing interrogation approximation can capture a variety of distribution shapes and mode numbers but that the direct approximation, in particular, is systematically biased away from the correct free response distribution.
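
As an illustration of the first-passage setting described in this abstract, the following is a minimal sketch, not the authors' method. It simulates a constant-drift diffusion between two symmetric absorbing boundaries with a simple Euler-Maruyama scheme and compares the simulated choice frequency with the standard closed-form hitting probability for that constant-drift case. The function names, the step size `dt`, the cutoff `max_time`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def simulate_first_passage(drift, noise, threshold, dt=0.001, max_time=10.0, rng=None):
    """Simulate one trial of dx = drift*dt + noise*dW, starting at x = 0.

    Returns (response_time, choice), where choice is +1 or -1 for the upper
    or lower boundary, or (None, 0) if no boundary is reached by max_time.
    All names and defaults here are illustrative assumptions.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    sqrt_dt = math.sqrt(dt)
    while t < max_time:
        # Euler-Maruyama step: deterministic drift plus scaled Gaussian noise.
        x += drift * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return t, 1
        if x <= -threshold:
            return t, -1
    return None, 0


def upper_hit_probability(drift, noise, threshold):
    """Closed-form probability of reaching +threshold before -threshold
    for constant drift, symmetric boundaries, and a start at 0."""
    return 1.0 / (1.0 + math.exp(-2.0 * drift * threshold / noise ** 2))


if __name__ == "__main__":
    rng = random.Random(0)
    trials = [simulate_first_passage(1.0, 1.0, 1.0, rng=rng) for _ in range(500)]
    finished = [(t, c) for t, c in trials if c != 0]
    frac_upper = sum(1 for _, c in finished if c == 1) / len(finished)
    print("simulated P(upper):", round(frac_upper, 3))
    print("closed-form P(upper):", round(upper_hit_probability(1.0, 1.0, 1.0), 3))
```

The discrete time step slightly overshoots the boundaries, so the simulated frequency only approximates the closed-form value; a smaller `dt` tightens the agreement at the cost of run time. A time-varying drift could be explored by replacing the constant `drift` with a function of `t`, in which case no closed-form check of this kind is available.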
