ABSTRACT
With the growing role of artificial intelligence (AI) in our lives, attention is increasingly turning to the way that humans and AI work together. A key aspect of human-AI collaboration is how people integrate judgements or recommendations from machine agents when they differ from their own judgements. We investigated trust in human-machine teaming using a perceptual judgement task based on the judge-advisor system. Participants (n = 89) estimated a perceptual quantity, then received a recommendation from a machine agent. The participants then made a second response which combined their first estimate and the machine's recommendation. The degree to which participants shifted their second response in the direction of the recommendation provided a measure of their trust in the machine agent. We analysed the role of advice distance in people's willingness to change their judgements. When a recommendation falls a long way from their initial judgement, do people come to doubt their own judgement, trusting the recommendation more, or do they doubt the machine agent, trusting the recommendation less? We found that although some participants exhibited these behaviours, the most common response was neither of these tendencies, and a simple model based on averaging accounted best for participants' trust behaviour. We discuss implications for theories of trust and for human-machine teaming.
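The shift described above is commonly quantified in judge-advisor studies as the "weight of advice" (WOA). A minimal R sketch with invented numbers (the study's actual data and variable names are not shown here); under an averaging account, WOA clusters around 0.5:

```r
# Hypothetical illustration of the weight-of-advice (WOA) measure.
initial <- c(100, 80, 120)   # participant's first estimates
advice  <- c(140, 60, 120)   # machine agent's recommendations
final   <- c(120, 70, 120)   # second (revised) estimates

# WOA = 1 means full adoption of the advice, 0 means no shift,
# and 0.5 corresponds to simple averaging of estimate and advice.
woa <- (final - initial) / (advice - initial)
woa[advice == initial] <- NA  # undefined when advice equals the first estimate
mean(woa, na.rm = TRUE)
```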
Subjects
Artificial Intelligence, Judgment, Trust, Humans, Adult, Male, Female, Young Adult, Judgment/physiology, Man-Machine Systems
ABSTRACT
In engineering, redundancy is the duplication of vital systems for use in the event of failure. In studies of human cognition, redundancy often refers to the duplication of the signal. Scores of studies have shown the salutary effects of a combined auditory and visual signal over a single modality, the advantage of processing complete faces over facial features, and more recently the advantage of two observers over one. But what if the signal (or the number of observers) is fixed and cannot be altered or augmented? Can people improve the efficiency of information processing by recruiting an additional, redundant system? Here we demonstrate that recruiting a second redundant system can, under reasonable assumptions about human capacity, result in improved performance. Recruiting a second redundant system may come with a higher energy cost, but may be worthwhile in high-stakes situations where processing information accurately is crucial.
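A minimal R sketch of the redundancy gain from a parallel race between two systems, assuming (purely for illustration) log-normal processing times and unlimited capacity; whether the gain survives in practice depends on the capacity assumptions the abstract refers to:

```r
# Toy simulation: the first of two redundant systems to finish determines
# the response, so the redundant pair finishes faster on average.
set.seed(1)
n <- 1e5
single    <- rlnorm(n, meanlog = 0, sdlog = 0.5)          # one system alone
redundant <- pmin(rlnorm(n, 0, 0.5), rlnorm(n, 0, 0.5))   # race between two systems
mean(single); mean(redundant)   # redundant mean is reliably smaller
```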
Subjects
Cognition, Humans, Cognition/physiology
ABSTRACT
Joint modeling of decisions and neural activation has the potential to provide significant advances in linking brain and behavior. However, methods of joint modeling have been limited by difficulties in estimation, often due to high dimensionality and the challenge of simultaneous estimation. In the current article, we propose a method of model estimation that draws on state-of-the-art Bayesian hierarchical modeling techniques and uses factor analysis as a means of dimensionality reduction and inference at the group level. This hierarchical factor approach can adopt any model for the individual and distill the relationships of its parameters across individuals through a factor structure. We demonstrate the significant dimensionality reduction gained by factor analysis and good parameter recovery, and illustrate a variety of factor loading constraints that can be used for different purposes and research questions, as well as three applications of the method to previously analyzed data. We conclude that this method provides a flexible and usable approach with interpretable outcomes that are primarily data-driven, in contrast to the largely hypothesis-driven methods often used in joint modeling. Although we focus on joint modeling methods, this model-based estimation approach could be used for any high-dimensional modeling problem. We provide open-source code and accompanying tutorial documentation to make the method accessible to all researchers.
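As a schematic of the kind of dimensionality reduction described, assuming a standard linear-Gaussian factor model (our notation, not necessarily the article's):

```latex
% Person i's stacked vector of p model parameters, \theta_i (e.g., decision-model
% and neural parameters), expressed through k latent factors, with k \ll p:
\theta_i = \mu + \Lambda \eta_i + \varepsilon_i, \qquad
\eta_i \sim \mathcal{N}(0, I_k), \qquad
\varepsilon_i \sim \mathcal{N}\!\left(0, \operatorname{diag}(\sigma_1^2, \ldots, \sigma_p^2)\right),
```

where Λ is a p × k loading matrix; constraints on Λ correspond to the different loading structures the article illustrates.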
ABSTRACT
A fundamental part of experimental design is to determine the sample size of a study. However, sparse information about population parameters and effect sizes before data collection renders effective sample size planning challenging. Specifically, sparse information may lead research designs to be based on inaccurate a priori assumptions, causing studies to use resources inefficiently or to produce inconclusive results. Despite its deleterious impact on sample size planning, many prominent methods for experimental design fail to adequately address the challenge of sparse a priori information. Here we propose a Bayesian Monte Carlo methodology for interim design analyses that allows researchers to analyze and adapt their sampling plans throughout the course of a study. At any point in time, the methodology uses the best available knowledge about parameters to make projections about expected evidence trajectories. Two simulated application examples demonstrate how interim design analyses can be integrated into common designs to inform sampling plans on the fly. The proposed methodology addresses the problem of sample size planning with sparse a priori information and yields research designs that are efficient, informative, and flexible.
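A rough R sketch of the projection step, using the BayesFactor package and a crude normal approximation in place of the full Bayesian machinery the abstract proposes; the effect size, batch sizes, and simulation settings are invented:

```r
# Given the data so far, project Bayes factor values for additional
# batches of observations via Monte Carlo simulation.
library(BayesFactor)
set.seed(1)
observed <- rnorm(20, mean = 0.3)   # data collected so far (illustrative)

project_bf <- function(extra_n, reps = 200) {
  replicate(reps, {
    # crude projection: draw a plausible effect from an approximate posterior,
    # simulate extra_n new observations, and recompute the Bayes factor
    delta  <- mean(observed) + rnorm(1, 0, sd(observed) / sqrt(length(observed)))
    future <- rnorm(extra_n, mean = delta, sd = sd(observed))
    extractBF(ttestBF(c(observed, future)))$bf
  })
}
sapply(c(20, 50, 100), function(n) median(project_bf(n)))  # projected evidence
```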
ABSTRACT
Response inhibition is a key attribute of human executive control. Standard stop-signal tasks require countermanding a single response; the speed at which that response can be inhibited indexes the efficacy of the inhibitory control networks. However, more complex stopping tasks, where one or more components of a multi-component action are cancelled (i.e., response-selective stopping), cannot be explained by the independent race model appropriate for the simple task (Logan and Cowan 1984). Healthy human participants (n = 28; 10 male; 19-40 years) completed a response-selective stopping task where a 'go' stimulus required simultaneous (bimanual) button presses in response to left- and right-pointing green arrows. On a subset of trials (30%), one or both arrows turned red (constituting the stop signal), requiring that only the button press(es) associated with red arrows be cancelled. Electromyographic recordings from both index fingers (first dorsal interosseous) permitted the assessment of both voluntary motor responses that resulted in overt button presses and activity that was cancelled prior to an overt response (i.e., partial, or covert, responses). We propose a simultaneously inhibit and start (SIS) model that extends the independent race model and provides a highly accurate account of response-selective stopping data. Together with fine-grained EMG analysis, our model-based analysis offers converging evidence that the selective stop signal simultaneously triggers a process that stops the bimanual response and triggers a new unimanual response corresponding to the green arrow. Our results require a reconceptualisation of response-selective stopping and offer a tractable framework for assessing such tasks in healthy and patient populations.
Significance Statement
Response inhibition is a key attribute of human executive control, frequently investigated using the stop-signal task. After initiating a motor response to a go signal, a stop signal occasionally appears at a delay, requiring cancellation of the response. This has been conceptualised as a 'race' between the go and stop processes, with successful (or failed) cancellation determined by which process wins the race. Here we provide a novel computational model for a complex variation of the stop-signal task, where only one component of a multicomponent action needs to be cancelled. We provide compelling muscle activation data that support our model, providing a robust and plausible framework for studying these complex inhibition tasks in both healthy and pathological cohorts.
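A toy R simulation of the SIS idea, using made-up log-normal finishing-time distributions rather than the fitted model; the stop signal both races to cancel the bimanual response and starts a fresh unimanual response:

```r
# Illustrative SIS-style race; all parameters are invented.
set.seed(1)
n   <- 1e4
ssd <- 0.2                                        # stop-signal delay (s)
go_bimanual   <- rlnorm(n, log(0.45), 0.2)        # original bimanual go process
stop_proc     <- ssd + rlnorm(n, log(0.15), 0.2)  # stop process, launched at SSD
new_unimanual <- ssd + rlnorm(n, log(0.35), 0.2)  # freshly started unimanual response

stopped <- stop_proc < go_bimanual   # stop wins: bimanual response cancelled
mean(stopped)                        # P(successful selective stop)
# On successful stops, the overt response comes from the newly started
# unimanual process, producing the characteristic stopping delay:
mean(new_unimanual[stopped]) - mean(go_bimanual)
```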
Subjects
Executive Function, Psychomotor Performance, Humans, Male, Reaction Time/physiology, Psychomotor Performance/physiology, Executive Function/physiology, Psychological Inhibition
ABSTRACT
Researchers conduct meta-analyses in order to synthesize information across different studies. Compared to standard meta-analytic methods, Bayesian model-averaged meta-analysis offers several practical advantages including the ability to quantify evidence in favor of the absence of an effect, the ability to monitor evidence as individual studies accumulate indefinitely, and the ability to draw inferences based on multiple models simultaneously. This tutorial introduces the concepts and logic underlying Bayesian model-averaged meta-analysis and illustrates its application using the open-source software JASP. As a running example, we perform a Bayesian meta-analysis on language development in children. We show how to conduct a Bayesian model-averaged meta-analysis and how to interpret the results.
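The tutorial itself works in JASP's graphical interface; a rough script equivalent (our choice of tool, not the tutorial's) is the metaBMA package in R, here with invented effect sizes rather than the language-development data:

```r
# Bayesian model-averaged meta-analysis: averages over fixed-effect and
# random-effects models, each with and without an effect.
library(metaBMA)
d  <- c(0.42, 0.31, 0.18, 0.55, 0.26)   # study effect sizes (invented)
se <- c(0.15, 0.12, 0.20, 0.18, 0.10)   # their standard errors (invented)
fit <- meta_bma(y = d, SE = se)         # default priors on effect and heterogeneity
fit                                     # inclusion Bayes factors and pooled estimate
```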
Subjects
Research Design, Software, Child, Humans, Bayes Theorem
ABSTRACT
The ability to stop simple ongoing actions has been extensively studied using the stop signal task, but less is known about inhibition in more complex scenarios. Here we used a task requiring bimanual responses to go stimuli, but selective inhibition of only one of those responses following a stop signal. We assessed how proactive cues affect the nature of both the responding and stopping processes, and the well-documented stopping delay (interference effect) in the continuing action following successful stopping. In this task, estimates of the speed of inhibition based on a simple-stopping model are inappropriate and have produced inconsistent findings about the effects of proactive control on motor inhibition. We instead used a multi-modal approach, based on improved methods of detecting and interpreting partial electromyographic responses and the recently proposed SIS (simultaneously inhibit and start) model of selective stopping behaviour. Our results provide clear and converging evidence that proactive cues reduce the stopping delay effect by slowing bimanual responses and speeding unimanual responses, with a negligible effect on the speed of the stopping process.
Subjects
Cues (Psychology), Psychological Inhibition, Reaction Time/physiology, Electromyography, Choice Behavior, Psychomotor Performance/physiology
ABSTRACT
Cognitive models provide a substantively meaningful quantitative description of latent cognitive processes. The quantitative formulation of these models supports cumulative theory building and enables strong empirical tests. However, the nonlinearity of these models and pervasive correlations among model parameters pose special challenges when applying cognitive models to data. Firstly, estimating cognitive models typically requires large hierarchical data sets that need to be accommodated by an appropriate statistical structure within the model. Secondly, statistical inference needs to appropriately account for model uncertainty to avoid overconfidence and biased parameter estimates. In the present work, we show how these challenges can be addressed through a combination of Bayesian hierarchical modeling and Bayesian model averaging. To illustrate these techniques, we apply the popular diffusion decision model to data from a collaborative selective influence study.
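The model-averaging step itself reduces to simple arithmetic on marginal likelihoods; a sketch with invented numbers (the diffusion-model variants and values are placeholders):

```r
# Weight each candidate model's parameter estimate by its posterior
# model probability to obtain a model-averaged estimate.
log_ml  <- c(m1 = -1520.3, m2 = -1518.9, m3 = -1522.7)  # marginal likelihoods (invented)
prior_p <- rep(1 / 3, 3)                                # equal prior model probabilities
post_p  <- exp(log_ml - max(log_ml)) * prior_p
post_p  <- post_p / sum(post_p)                         # posterior model probabilities

estimate <- c(m1 = 1.02, m2 = 1.10, m3 = 0.95)  # per-model estimate of one parameter
sum(post_p * estimate)                          # model-averaged estimate
```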
ABSTRACT
Hypotheses concerning the distribution of multinomial proportions typically entail exact equality constraints that can be evaluated using standard tests. Whenever researchers formulate inequality constrained hypotheses, however, they must rely on sampling-based methods that are relatively inefficient and computationally expensive. To address this problem, we developed a bridge sampling routine that allows an efficient evaluation of multinomial inequality constraints. An empirical application showcases that bridge sampling outperforms current Bayesian methods, especially when relatively little posterior mass falls in the restricted parameter space. The method is extended to mixtures between equality and inequality constrained hypotheses.
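The bridge sampling routine is beyond a short sketch, but the quantity it estimates can be shown with the simpler (and, as the abstract notes, less efficient) encompassing-prior estimator for an order constraint on multinomial proportions; the counts and constraint below are invented:

```r
# Bayes factor for an order-constrained multinomial hypothesis versus the
# unconstrained (encompassing) model, via prior/posterior Monte Carlo.
set.seed(1)
x <- c(10, 25, 40)   # observed multinomial counts (invented)
a <- c(1, 1, 1)      # Dirichlet(1, 1, 1) prior

rdirichlet <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), alpha), n, byrow = TRUE)
  g / rowSums(g)
}
constraint <- function(p) p[, 1] < p[, 2] & p[, 2] < p[, 3]  # theta1 < theta2 < theta3

post  <- rdirichlet(1e5, a + x)   # conjugate posterior draws
prior <- rdirichlet(1e5, a)       # prior draws
mean(constraint(post)) / mean(constraint(prior))  # BF: constrained vs encompassing
```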
Subjects
Bayes Theorem, Humans
ABSTRACT
Bayesian inference requires the specification of prior distributions that quantify the pre-data uncertainty about parameter values. One way to specify prior distributions is through prior elicitation, an interview method guiding field experts through the process of expressing their knowledge in the form of a probability distribution. However, prior distributions elicited from experts can be subject to idiosyncrasies of experts and elicitation procedures, raising the spectre of subjectivity and prejudice. Here, we investigate the effect of interpersonal variation in elicited prior distributions on the Bayes factor hypothesis test. We elicited prior distributions from six academic experts with a background in different fields of psychology and applied the elicited prior distributions as well as commonly used default priors in a re-analysis of 1710 studies in psychology. The degree to which the Bayes factors vary as a function of the different prior distributions is quantified by three measures of concordance of evidence: We assess whether the prior distributions change the Bayes factor direction, whether they cause a switch in the category of evidence strength, and how much influence they have on the value of the Bayes factor. Our results show that although the Bayes factor is sensitive to changes in the prior distribution, these changes do not necessarily affect the qualitative conclusions of a hypothesis test. We hope that these results help researchers gauge the influence of interpersonal variation in elicited prior distributions in future psychological studies. Additionally, our sensitivity analyses can be used as a template for Bayesian robustness analyses that involve prior elicitation from multiple experts.
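A minimal sketch of this kind of prior sensitivity check, using a deliberately simplified model with known unit variance (real elicited-prior t-tests also integrate over the variance); the data and the two prior settings are invented:

```r
# Same data, two priors on the standardized effect delta, two Bayes factors.
set.seed(1)
y <- rnorm(30, mean = 0.25)   # illustrative data, sd assumed known (= 1)

bf10 <- function(prior_mean, prior_sd) {
  lik <- function(delta)      # likelihood of the data given delta
    sapply(delta, function(d) exp(sum(dnorm(y, d, 1, log = TRUE))))
  m1 <- integrate(function(d) lik(d) * dnorm(d, prior_mean, prior_sd),
                  -Inf, Inf)$value   # marginal likelihood under H1
  m0 <- lik(0)                       # likelihood under H0: delta = 0
  m1 / m0
}
bf10(0, 1)         # a default-style wide prior
bf10(0.35, 0.10)   # an "elicited" informed prior
```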
Subjects
Research Design, Bayes Theorem, Humans, Probability, Uncertainty
ABSTRACT
Testing the equality of two proportions is a common procedure in science, especially in medicine and public health. In these domains, it is crucial to be able to quantify evidence for the absence of a treatment effect. Bayesian hypothesis testing by means of the Bayes factor provides one avenue to do so, requiring the specification of prior distributions for parameters. The most popular analysis approach views the comparison of proportions from a contingency table perspective, assigning prior distributions directly to the two proportions. Another, less popular approach views the problem from a logistic regression perspective, assigning prior distributions to logit-transformed parameters. Reanalyzing 39 null results from the New England Journal of Medicine with both approaches, we find that they can lead to markedly different conclusions, especially when the observed proportions are at the extremes (i.e., very low or very high). We explain these stark differences and provide recommendations for researchers interested in testing the equality of two proportions and users of Bayes factors more generally. The test that assigns prior distributions to logit-transformed parameters creates prior dependence between the two proportions and yields weaker evidence when the observations are at the extremes. When comparing two proportions, we argue that this test should become the new default.
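The contingency-table variant has a closed form under conjugate priors; a sketch with invented counts (the logistic-regression variant requires numerical methods and is not shown):

```r
# Bayes factor for equal proportions: independent Beta(a, b) priors on the
# two proportions under H1 versus one common Beta(a, b) proportion under H0.
bf01 <- function(y1, n1, y2, n2, a = 1, b = 1) {
  log_m0 <- lbeta(a + y1 + y2, b + n1 + n2 - y1 - y2) - lbeta(a, b)
  log_m1 <- (lbeta(a + y1, b + n1 - y1) - lbeta(a, b)) +
            (lbeta(a + y2, b + n2 - y2) - lbeta(a, b))
  exp(log_m0 - log_m1)  # binomial coefficients cancel between the two marginals
}
bf01(3, 100, 5, 100)   # e.g., two low ("extreme") observed proportions
```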
Subjects
Research Design, Bayes Theorem, Humans, Logistic Models
ABSTRACT
We outline a Bayesian model-averaged (BMA) meta-analysis for standardized mean differences in order to quantify evidence for both treatment effectiveness δ and across-study heterogeneity τ. We construct four competing models by orthogonally combining two present-absent assumptions, one for the treatment effect and one for across-study heterogeneity. To inform the choice of prior distributions for the model parameters, we used 50% of the Cochrane Database of Systematic Reviews to specify rival prior distributions for δ and τ. The relative predictive performance of the competing models and rival prior distributions was assessed using the remaining 50% of the Cochrane Database. On average, the model that assumes the presence of both a treatment effect and across-study heterogeneity outpredicted the other models, but not by a large margin. Within this best-performing model, predictive adequacy was relatively constant across the rival prior distributions. We propose specific empirical prior distributions, both for the field in general and for each of 46 specific medical subdisciplines. An example from oral health demonstrates how the proposed prior distributions can be used to conduct a BMA meta-analysis in the open-source software R and JASP. The preregistered analysis plan is available at https://osf.io/zs3df/.
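Given marginal likelihoods for the four models, the inclusion Bayes factor for the treatment effect is simple arithmetic; a sketch with invented numbers (the heterogeneity inclusion factor is computed analogously):

```r
# Four models from the 2 x 2 crossing of {effect present/absent} and
# {heterogeneity present/absent}; values below are placeholders.
log_ml <- c(fixed_null = -45.2, fixed_alt  = -44.1,
            random_null = -44.8, random_alt = -43.0)
post <- exp(log_ml - max(log_ml)) / sum(exp(log_ml - max(log_ml)))  # equal model priors

effect_models <- c("fixed_alt", "random_alt")
post_odds  <- sum(post[effect_models]) / sum(post[setdiff(names(post), effect_models)])
prior_odds <- 1                 # two of four equally likely models on each side
post_odds / prior_odds          # inclusion Bayes factor for delta
```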
Subjects
Bayes Theorem, Factual Databases, Humans, Meta-Analysis as Topic, Systematic Reviews as Topic, Treatment Outcome
ABSTRACT
We conducted a preregistered multilaboratory project (k = 36; N = 3,531) to assess the size and robustness of ego-depletion effects using a novel replication method, termed the paradigmatic replication approach. Each laboratory implemented one of two procedures that was intended to manipulate self-control and tested performance on a subsequent measure of self-control. Confirmatory tests found a nonsignificant result (d = 0.06). Confirmatory Bayesian meta-analyses using an informed-prior hypothesis (δ = 0.30, SD = 0.15) found that the data were 4 times more likely under the null than the alternative hypothesis. Hence, preregistered analyses did not find evidence for a depletion effect. Exploratory analyses on the full sample (i.e., ignoring exclusion criteria) found a statistically significant effect (d = 0.08); Bayesian analyses showed that the data were about equally likely under the null and informed-prior hypotheses. Exploratory moderator tests suggested that the depletion effect was larger for participants who reported more fatigue but was not moderated by trait self-control, willpower beliefs, or action orientation.
Subjects
Ego, Self-Control, Bayes Theorem, Humans, Research Design
ABSTRACT
Linear regression analyses commonly involve two consecutive stages of statistical inquiry. In the first stage, a single 'best' model is defined by a specific selection of relevant predictors; in the second stage, the regression coefficients of the winning model are used for prediction and for inference concerning the importance of the predictors. However, such second-stage inference ignores the model uncertainty from the first stage, resulting in overconfident parameter estimates that generalize poorly. These drawbacks can be overcome by model averaging, a technique that retains all models for inference, weighting each model's contribution by its posterior probability. Although conceptually straightforward, model averaging is rarely used in applied research, possibly due to the lack of easily accessible software. To bridge the gap between theory and practice, we provide a tutorial on linear regression using Bayesian model averaging in JASP, based on the BAS package in R. Firstly, we provide theoretical background on linear regression, Bayesian inference, and Bayesian model averaging. Secondly, we demonstrate the method on an example data set from the World Happiness Report. Lastly, we discuss limitations of model averaging and directions for dealing with violations of model assumptions.
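A minimal example with the BAS package, on which the tutorial's JASP implementation is based; the built-in mtcars data stand in here for the World Happiness data used in the tutorial:

```r
# Bayesian model averaging for linear regression with BAS.
library(BAS)
fit <- bas.lm(mpg ~ ., data = mtcars,
              prior = "JZS",             # Jeffreys-Zellner-Siow prior on coefficients
              modelprior = uniform())    # uniform prior over the model space
summary(fit)   # posterior model probabilities and inclusion probabilities
coef(fit)      # model-averaged posterior means of the coefficients
```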
Subjects
Research Design, Software, Bayes Theorem, Linear Models, Regression Analysis
ABSTRACT
Gautret and colleagues reported the results of a non-randomised case series which examined the effects of hydroxychloroquine and azithromycin on viral load in the upper respiratory tract of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) patients. The authors reported that hydroxychloroquine (HCQ) had significant virus-reducing effects, and that dual treatment with both HCQ and azithromycin further enhanced virus reduction. In light of criticisms regarding how patients were excluded from analyses, we reanalysed the original data to interrogate the main claims of the paper. We applied Bayesian statistics to assess the robustness of the original paper's claims by testing four variants of the data: 1) the original data; 2) data including patients who deteriorated; 3) data including patients who deteriorated, with exclusion of untested patients in the comparison group; 4) data including patients who deteriorated, with the assumption that untested patients were negative. To ask if HCQ monotherapy was effective, we performed a Bayesian A/B test comparing a model that assumes a positive effect with a model of no effect. We found that the statistical evidence was highly sensitive to these data variants. Statistical evidence for the positive-effect model ranged from strong for the original data (BF+0 ≈ 11), to moderate when including patients who deteriorated (BF+0 ≈ 4.35), to anecdotal when excluding untested patients (BF+0 ≈ 2), and to anecdotal evidence in the opposite direction if untested patients were assumed positive (BF+0 ≈ 0.6). The fact that the patient inclusions and exclusions are neither well justified nor adequately reported raises substantial uncertainty about the interpretation of the evidence obtained from the original paper.
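One way to run this type of Bayesian A/B test in R (not necessarily the authors' exact analysis) is the abtest package; the counts below are placeholders, not the Gautret et al. data:

```r
# Bayesian A/B test for two proportions via a prior on the log odds ratio.
library(abtest)
dat <- list(y1 = 2, n1 = 16,   # group 1: successes / patients (invented)
            y2 = 9, n2 = 14)   # group 2: successes / patients (invented)
fit <- ab_test(data = dat)     # default normal prior on the log odds ratio
print(fit)                     # posterior model probabilities / Bayes factors
```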
Subjects
Antiviral Agents/administration & dosage, Azithromycin/administration & dosage, COVID-19 Drug Treatment, COVID-19/blood, Hydroxychloroquine/administration & dosage, SARS-CoV-2/metabolism, Viral Load, Adolescent, Adult, Aged, Child, Female, Humans, Male, Middle Aged
ABSTRACT
Despite the increasing popularity of Bayesian inference in empirical research, few practical guidelines provide detailed recommendations for how to apply Bayesian procedures and interpret the results. Here we offer specific guidelines for four different stages of Bayesian statistical reasoning in a research setting: planning the analysis, executing the analysis, interpreting the results, and reporting the results. The guidelines for each stage are illustrated with a running example. Although the guidelines are geared towards analyses performed with the open-source statistical software JASP, most guidelines extend to Bayesian inference in general.
Subjects
Statistical Data Interpretation, Guidelines as Topic, Statistical Models, Research Design, Bayes Theorem, Humans
ABSTRACT
To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from 2 separate large samples (total N > 15,000) were then randomly assigned to complete 1 version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: Materials from different teams rendered statistically significant effects in opposite directions for 4 of 5 hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for 2 hypotheses and a lack of support for 3 hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, whereas considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.
Subjects
Crowdsourcing, Psychology/methods, Research Design, Adult, Humans, Random Allocation
ABSTRACT
Over the last decade, the Bayesian estimation of evidence-accumulation models has gained popularity, largely due to the advantages afforded by the Bayesian hierarchical framework. Despite recent advances in the Bayesian estimation of evidence-accumulation models, model comparison continues to rely on suboptimal procedures, such as posterior parameter inference and model selection criteria known to favor overly complex models. In this paper, we advocate model comparison for evidence-accumulation models based on the Bayes factor obtained via Warp-III bridge sampling. We demonstrate, using the linear ballistic accumulator (LBA), that Warp-III sampling provides a powerful and flexible approach that can be applied to both nested and non-nested model comparisons, even in complex and high-dimensional hierarchical instantiations of the LBA. We provide an easy-to-use software implementation of the Warp-III sampler and outline a series of recommendations aimed at facilitating the use of Warp-III sampling in practical applications.
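A self-contained toy of Warp-III bridge sampling with the bridgesampling package, using a conjugate normal model whose exact marginal likelihood is known; the paper's LBA application follows the same pattern, but with MCMC samples from the hierarchical model:

```r
# Estimate a log marginal likelihood with Warp-III bridge sampling and
# compare it with the analytic value. Model: y ~ N(mu, 1), mu ~ N(0, 1).
library(bridgesampling)
set.seed(1)
y <- rnorm(50, mean = 0.4)
n <- length(y)

log_posterior <- function(pars, data) {   # unnormalized log posterior
  sum(dnorm(data$y, pars["mu"], 1, log = TRUE)) + dnorm(pars["mu"], 0, 1, log = TRUE)
}
post_var  <- 1 / (n + 1)                  # exact conjugate posterior for mu
post_mean <- sum(y) * post_var
samples <- matrix(rnorm(4000, post_mean, sqrt(post_var)), ncol = 1,
                  dimnames = list(NULL, "mu"))

b <- bridge_sampler(samples = samples, log_posterior = log_posterior,
                    data = list(y = y), lb = c(mu = -Inf), ub = c(mu = Inf),
                    method = "warp3", silent = TRUE)
b$logml
# exact log marginal likelihood, for comparison:
-(n / 2) * log(2 * pi) - 0.5 * log(n + 1) - 0.5 * (sum(y^2) - sum(y)^2 / (n + 1))
```

Two such estimates, one per model, give the Bayes factor via their difference (the package's bf() helper does this directly).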
Subjects
Software, Bayes Theorem, Markov Chains, Monte Carlo Method
ABSTRACT
The article "A Simple Method for Comparing Complex Models."
ABSTRACT
Cross-validation (CV) is increasingly popular as a generic method to adjudicate between mathematical models of cognition and behavior. In order to measure model generalizability, CV quantifies out-of-sample predictive performance, and the CV preference goes to the model that predicted the out-of-sample data best. The advantages of CV include theoretical simplicity and practical feasibility. Despite its prominence, however, the limitations of CV are often underappreciated. Here, we demonstrate the limitations of a particular form of CV, Bayesian leave-one-out cross-validation (LOO), with three concrete examples. In each example, a data set of infinite size is perfectly in line with the predictions of a simple model (i.e., a general law or invariance). Nevertheless, LOO shows bounded and relatively modest support for the simple model. We conclude that CV is not a panacea for model selection.
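The bounded-support phenomenon can be shown analytically in a toy setup of our own devising (not one of the paper's three examples): all n observations satisfy a proposed general law, model A predicts this with certainty, and model B has a free success rate with a uniform prior:

```r
# LOO expected log predictive density (elpd) advantage of the "law" model A
# over the flexible model B when every observation obeys the law.
elpd_diff <- function(n) {
  elpd_A <- 0                      # A predicts each held-out point with probability 1
  # B's leave-one-out posterior predictive for a success: (n - 1 + 1) / (n - 1 + 2)
  elpd_B <- n * log(n / (n + 1))
  elpd_A - elpd_B
}
sapply(c(10, 100, 1e4, 1e6), elpd_diff)  # converges to 1 nat; never grows with n
```

Even with infinitely many law-consistent observations, LOO's preference for the simple model stays bounded near one nat, illustrating the abstract's point.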