Results 1 - 20 of 17,776
1.
Addict Biol ; 29(5): e13397, 2024 May.
Article in English | MEDLINE | ID: mdl-38711205

ABSTRACT

Neuronal ensembles in the medial prefrontal cortex mediate cocaine self-administration via projections to the nucleus accumbens. We have recently shown that neuronal ensembles in the prelimbic cortex form rapidly to mediate cocaine self-administration. However, the role of neuronal ensembles within the nucleus accumbens in initial cocaine-seeking behaviour remains unknown. Here, we sought to expand the current literature by testing the necessity of the cocaine self-administration ensemble in the nucleus accumbens core (NAcCore) 1 day after male and female rats acquire cocaine self-administration by using the Daun02 inactivation procedure. We found that disrupting the NAcCore ensembles after a no-cocaine reward-seeking test increased subsequent cocaine seeking, while disrupting NAcCore ensembles following a cocaine self-administration session decreased subsequent cocaine seeking. We then characterized neuronal cell type in the NAcCore using RNAscope in situ hybridization. In the no-cocaine session, we saw reduced dopamine D1 type neuronal activation, while in the cocaine self-administration session, we found preferential dopamine D1 type neuronal activity in the NAcCore.


Subject(s)
Cocaine , Drug-Seeking Behavior , Neurons , Nucleus Accumbens , Self Administration , Animals , Nucleus Accumbens/drug effects , Cocaine/pharmacology , Male , Female , Rats , Drug-Seeking Behavior/drug effects , Neurons/drug effects , Reward , Dopamine Uptake Inhibitors/pharmacology , Reinforcement, Psychology , Receptors, Dopamine D1 , Cocaine-Related Disorders/physiopathology , Rats, Sprague-Dawley , Prefrontal Cortex/drug effects
2.
Elife ; 132024 May 07.
Article in English | MEDLINE | ID: mdl-38711355

ABSTRACT

Collaborative hunting, in which predators play different and complementary roles to capture prey, has been traditionally believed to be an advanced hunting strategy requiring large brains that involve high-level cognition. However, recent findings that collaborative hunting has also been documented in smaller-brained vertebrates have placed this previous belief under strain. Here, using computational multi-agent simulations based on deep reinforcement learning, we demonstrate that decisions underlying collaborative hunts do not necessarily rely on sophisticated cognitive processes. We found that apparently elaborate coordination can be achieved through a relatively simple decision process of mapping between states and actions related to distance-dependent internal representations formed by prior experience. Furthermore, we confirmed that this decision rule of predators is robust against unknown prey controlled by humans. Our computational ecological results emphasize that collaborative hunting can emerge in various intra- and inter-specific interactions in nature, and provide insights into the evolution of sociality.
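
The abstract's central claim, that coordination can emerge from a simple learned mapping between states and actions, can be illustrated with a toy sketch. The following is our own minimal construction (a ring world, two predators with independent tabular Q-tables, and a single shared capture reward), not the paper's deep reinforcement learning model; all sizes and hyperparameters are invented.

```python
import random

N = 8                      # ring size (invented)
ACTIONS = (-1, 1)          # step clockwise / counter-clockwise

def train(episodes=4000, alpha=0.2, gamma=0.95, eps=0.15, seed=1):
    """Independent Q-learning for two predators sharing one capture reward."""
    rng = random.Random(seed)
    Q = [dict(), dict()]   # one Q-table per predator
    def q(i, s):
        return Q[i].setdefault(s, [0.0, 0.0])
    for _ in range(episodes):
        rel = [3, 5]       # positions relative to the prey (prey sits at 0)
        for _t in range(25):
            states = [(rel[0], rel[1]), (rel[1], rel[0])]
            acts = [rng.randrange(2) if rng.random() < eps
                    else max((0, 1), key=lambda a: q(i, states[i])[a])
                    for i in range(2)]
            nxt = [(rel[i] + ACTIONS[acts[i]]) % N for i in range(2)]
            captured = 0 in nxt
            r = 1.0 if captured else -0.05   # shared reward, small time cost
            nstates = [(nxt[0], nxt[1]), (nxt[1], nxt[0])]
            for i in range(2):
                target = r if captured else r + gamma * max(q(i, nstates[i]))
                q(i, states[i])[acts[i]] += alpha * (target - q(i, states[i])[acts[i]])
            rel = nxt
            if captured:
                break
    return Q

def greedy_rollout(Q, max_t=15):
    """Run the learned greedy policies; return steps to capture, or None."""
    rel = [3, 5]
    for t in range(max_t):
        states = [(rel[0], rel[1]), (rel[1], rel[0])]
        acts = [max((0, 1), key=lambda a: Q[i].get(states[i], [0.0, 0.0])[a])
                for i in range(2)]
        rel = [(rel[i] + ACTIONS[acts[i]]) % N for i in range(2)]
        if 0 in rel:
            return t + 1
    return None
```

Nothing here anticipates a teammate's intentions: each agent only maps its (prey-relative) state to an action, which is the kind of simple decision process the abstract argues can suffice.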


From wolves to ants, many animals are known to hunt as a team. This strategy may yield several advantages: going after bigger prey together, for example, can often result in individuals spending less energy and accessing larger food portions than when hunting alone. However, it remains unclear whether this behavior relies on complex cognitive processes, such as an animal's ability to represent and anticipate the actions of its teammates. It is often thought that 'collaborative hunting' may require such skills, as this form of group hunting involves animals taking on distinct, tightly coordinated roles, as opposed to simply engaging in the same actions simultaneously. To better understand whether high-level cognitive skills are required for collaborative hunting, Tsutsui et al. used a type of artificial intelligence known as deep reinforcement learning. This allowed them to develop a computational model in which a small number of 'agents' had the opportunity to 'learn' whether and how to work together to catch a 'prey' under various conditions. To do so, the agents were only equipped with the ability to link distinct stimuli together, such as an event and a reward; this is similar to associative learning, a cognitive process which is widespread amongst animal species. The model showed that the challenge of capturing the prey when hunting alone, and the reward of sharing food after a successful hunt, drove the agents to learn how to work together, with previous experiences shaping decisions made during subsequent hunts. Importantly, the predators started to exhibit the ability to take on distinct, complementary roles reminiscent of those observed during collaborative hunting, such as one agent chasing the prey while another ambushes it. Overall, the work by Tsutsui et al. challenges the traditional view that only organisms equipped with high-level cognitive processes can show refined collaborative approaches to hunting, opening the possibility that these behaviors may be more widespread than originally thought, including between animals of different species.


Subject(s)
Deep Learning , Predatory Behavior , Reinforcement, Psychology , Animals , Cooperative Behavior , Humans , Computer Simulation , Decision Making
3.
Proc Natl Acad Sci U S A ; 121(20): e2316658121, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38717856

ABSTRACT

Individual survival and evolutionary selection require biological organisms to maximize reward. Economic choice theories define the necessary and sufficient conditions, and neuronal signals of decision variables provide mechanistic explanations. Reinforcement learning (RL) formalisms use predictions, actions, and policies to maximize reward. Midbrain dopamine neurons code reward prediction errors (RPEs) of subjective reward value suitable for RL. Electrical and optogenetic self-stimulation experiments demonstrate that monkeys and rodents repeat behaviors that result in dopamine excitation. Dopamine excitations reflect positive RPEs that increase reward predictions via RL; against these increased predictions, obtaining similar dopamine RPE signals again requires better rewards than before. The positive RPEs drive predictions higher again and thus advance a recursive reward-RPE-prediction iteration toward better and better rewards. Agents also avoid dopamine inhibitions that lower reward predictions via RL, which allows smaller rewards than before to elicit positive dopamine RPE signals and resume the iteration toward better rewards. In this way, dopamine RPE signals serve as a causal mechanism that attracts agents via RL to the best rewards. The mechanism improves daily life and benefits evolutionary selection but may also induce restlessness and greed.
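
The ratchet described here, where rising predictions require ever-better rewards to reproduce the same positive RPE, can be sketched with a plain delta-rule update. This is an illustrative gloss, not the authors' model; `alpha` is an arbitrary learning rate.

```python
def rpe_updates(rewards, alpha=0.3, v0=0.0):
    """Track a reward prediction V and its prediction errors (RPEs)."""
    V, vs, deltas = v0, [], []
    for r in rewards:
        delta = r - V       # RPE: the dopamine-like teaching signal
        V += alpha * delta  # prediction climbs toward the reward
        vs.append(V)
        deltas.append(delta)
    return vs, deltas

# Repeated identical rewards: RPEs shrink as the prediction catches up,
# so only a larger reward can regenerate a large positive RPE.
vs, deltas = rpe_updates([1.0] * 10)
_, boosted = rpe_updates([2.0], v0=vs[-1])
```

The shrinking `deltas` sequence, and the restored error when the reward doubles, is the recursive reward-RPE-prediction iteration in miniature.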


Subject(s)
Dopamine , Dopaminergic Neurons , Reward , Animals , Dopamine/metabolism , Dopaminergic Neurons/physiology , Dopaminergic Neurons/metabolism , Humans , Reinforcement, Psychology
4.
Appetite ; 198: 107355, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38621593

ABSTRACT

Associative learning can drive many different types of behaviors, including food consumption. Previous studies have shown that cues paired with food delivery while mice are hungry will lead to increased consumption in the presence of those cues at later times. We previously showed that overconsumption can be driven in male mice by contextual cues, using chow pellets. Here we extended our findings by examining other parameters that may influence the outcome of context-conditioned overconsumption training. We found that the task worked equally well in males and females, and that palatable substances such as high-fat diet and Ensure chocolate milkshake supported learning and induced overconsumption. Surprisingly, mice did not overconsume when sucrose was used as the reinforcer during training, suggesting that nutritional content is a critical factor. Interestingly, we also observed that diet-induced obese mice did not learn the task. Overall, we find that context-conditioned overconsumption can be studied in lean male and female mice, and with multiple reinforcer types.


Subject(s)
Cues , Diet, High-Fat , Mice, Inbred C57BL , Obesity , Animals , Male , Female , Obesity/etiology , Obesity/psychology , Mice , Reinforcement, Psychology , Mice, Obese , Hyperphagia/psychology , Feeding Behavior/psychology , Sucrose/administration & dosage , Thinness/psychology
5.
Drug Alcohol Depend ; 258: 111282, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38593731

ABSTRACT

The adulteration of illicit fentanyl with the alpha-2 agonist xylazine has been designated an emerging public health threat. The clinical rationale for combining fentanyl with xylazine is currently unclear, and the inability to study fentanyl/xylazine interactions in humans warrants the need for preclinical research. We studied fentanyl and xylazine pharmacodynamic and pharmacokinetic interactions in male and female rats using drug self-administration behavioral economic methods. Fentanyl, but not xylazine, functioned as a reinforcer under both fixed-ratio and progressive-ratio drug self-administration procedures. Xylazine combined with fentanyl at three fixed dose-proportion mixtures did not significantly alter fentanyl reinforcement as measured using behavioral economic analyses. Xylazine produced a proportion-dependent decrease in the behavioral economic Q0 endpoint compared to fentanyl alone. However, xylazine did not significantly alter fentanyl self-administration at FR1. Fentanyl and xylazine co-administration did not result in changes to pharmacokinetic endpoints. The present results demonstrate that xylazine does not enhance the addictive effects of fentanyl or alter fentanyl plasma concentrations. The premise for why illicitly manufactured fentanyl has been adulterated with xylazine remains to be determined.
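
For readers unfamiliar with the behavioral economic endpoint named here: Q0 is the demand-intensity parameter of the exponential demand equation of Hursh and Silberberg, the predicted consumption as unit price approaches zero. A hedged sketch with illustrative parameter values (ours, not the paper's fits):

```python
import math

def demand(price, Q0, alpha, k=2.0):
    """Exponential demand: predicted consumption at a given unit price.

    Q0 = consumption at zero price (demand intensity); alpha governs how
    fast consumption declines with price; k sets the log-range of
    consumption. All values here are illustrative, not empirical.
    """
    return Q0 * 10 ** (k * (math.exp(-alpha * Q0 * price) - 1))

# A demand curve across rising unit prices:
curve = [demand(p, Q0=5.0, alpha=0.01) for p in (0, 1, 3, 10, 30, 100)]
```

A proportion-dependent decrease in Q0, as reported above, would shift the whole curve's zero-price intercept downward without necessarily changing its elasticity.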


Subject(s)
Fentanyl , Reinforcement, Psychology , Self Administration , Xylazine , Fentanyl/pharmacology , Animals , Xylazine/pharmacology , Rats , Male , Female , Behavioral Economics , Rats, Sprague-Dawley , Reinforcement Schedule , Adrenergic alpha-2 Receptor Agonists/pharmacology , Analgesics, Opioid , Conditioning, Operant/drug effects
6.
Drug Alcohol Depend ; 258: 111280, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38614019

ABSTRACT

The most prevalent psychoactive chemical in tobacco smoke is nicotine, which has been shown to maintain tobacco consumption as well as cause acute adverse effects at high doses, like nausea and emesis. Recent studies in laboratory animals have suggested that many non-nicotine constituents of tobacco smoke (e.g., minor tobacco alkaloids) may also contribute to tobacco's overall reinforcing and adverse effects. Here, we used intravenous (IV) self-administration (n = 3) and observation (n = 4) procedures in squirrel monkeys to, respectively, compare the reinforcing and adverse observable effects of nicotine and three prominent minor tobacco alkaloids, nornicotine, anatabine, and myosmine. In self-administration studies, male squirrel monkeys were trained to respond under a second-order fixed-interval schedule of reinforcement, and dose-effect functions for nicotine and each of the minor tobacco alkaloids nornicotine, anatabine, and myosmine were determined. Observation studies were conducted in a different group of male squirrel monkeys to quantify the ability of nicotine, nornicotine, anatabine, and myosmine to produce adverse overt effects, including hypersalivation, emesis, and tremors. Results show that nicotine and, to a lesser extent, nornicotine were readily self-administered, whereas anatabine and myosmine were not. In observation studies, all minor tobacco alkaloids produced adverse observable effects that were comparable to or more pronounced than those of nicotine. Collectively, the present results, showing that nicotine and the minor tobacco alkaloids nornicotine, anatabine, and myosmine produce differential reinforcing and acute adverse observable effects in monkeys, provide further evidence that these constituents may differently contribute to the psychopharmacological and adverse effects of tobacco consumption.


Subject(s)
Alkaloids , Nicotiana , Nicotine , Reinforcement, Psychology , Saimiri , Self Administration , Animals , Male , Dose-Response Relationship, Drug , Conditioning, Operant/drug effects
7.
J Exp Anal Behav ; 121(3): 346-357, 2024 May.
Article in English | MEDLINE | ID: mdl-38604980

ABSTRACT

Efficient methods for assessing the relative aversiveness of stimuli are sparse and underresearched. Having access to efficient procedures that can identify aversive stimuli would benefit researchers and practitioners alike. Across three experiments, 13 participants helped to pilot, refine, and test two approaches to identifying negative reinforcers. The first experiment presented two conditions, one in which computerized button pressing started or stopped one of two recorded infant cries (or silence, when the control button was selected). Choices were presented either in a modified observing-response procedure (i.e., simultaneous observing) or in a modified progressive-ratio procedure (i.e., committed concurrent progressive ratio; CCPR). Results were favorable though not conclusive on their own. A second experiment, using more distinct stimuli (i.e., one likely aversive, one likely not aversive), replicated the first, and clearer results emerged. Finally, the third experiment tested the stimuli from the second experiment in a CCPR arrangement in which sound was terminated contingent on responding, and idiosyncratic negative-reinforcement hierarchies emerged. The utility of these two procedures is discussed, and future work that addresses the limitations is outlined.


Subject(s)
Reinforcement, Psychology , Humans , Male , Female , Reinforcement Schedule , Adult , Conditioning, Operant , Choice Behavior , Young Adult
8.
Accid Anal Prev ; 201: 107570, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38614052

ABSTRACT

To improve the traffic safety and efficiency of freeway tunnels, this study proposes a novel variable speed limit (VSL) control strategy based on a model-based reinforcement learning framework (MBRL) with safety perception. The MBRL framework is designed by developing a multi-lane cell transmission model for freeway tunnels as an environment model, which is built so that agents can interact with the environment model while interacting with the real environment, improving the sampling efficiency of reinforcement learning. Based on a real-time crash risk prediction model for freeway tunnels that uses random deep and cross networks, the safety perception function inside the MBRL framework is developed. The reinforcement learning components fully account for most current tunnels' application conditions, and the VSL control agent is trained using a deep dyna-Q method. The control process uses a safety trigger mechanism to reduce the likelihood of crashes caused by frequent changes in speed. The efficacy of the proposed VSL strategies is validated through simulation experiments. The results show that the proposed VSL strategies significantly increase traffic safety performance by between 16.00% and 20.00% and traffic efficiency by between 3.00% and 6.50% compared to a fixed speed limit approach. Notably, the proposed strategies outperform the traditional VSL strategy based on a traffic flow prediction model in terms of traffic safety and efficiency improvement, and they also outperform a VSL strategy based on a model-free reinforcement learning framework when sampling efficiency is taken into account. In addition, the proposed strategies with safety triggers are safer than those without safety triggers. These findings demonstrate the potential for MBRL-based VSL strategies to improve traffic safety and efficiency within freeway tunnels.
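
The dyna-Q family of methods named above mixes learning from real transitions with "planning" over a learned environment model, which is where the sampling-efficiency gain comes from. A minimal tabular Dyna-Q sketch on a toy chain MDP (our illustration; the paper's deep agent, traffic environment, and speed-limit actions are far richer):

```python
import random

def dyna_q(episodes=150, planning_steps=10, alpha=0.5, gamma=0.9,
           n_states=5, eps=0.1, seed=0):
    """Tabular Dyna-Q on a chain: action 1 advances, action 0 stays put."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states + 1)]  # last row: terminal
    model = {}                                     # (s, a) -> (r, s2)
    for _ in range(episodes):
        s = 0
        while s < n_states:
            a = rng.randrange(2) if rng.random() < eps else \
                max((0, 1), key=lambda x: Q[s][x])
            s2 = s + 1 if a == 1 else s            # real environment step
            r = 1.0 if s2 == n_states else 0.0     # reaching the goal pays 1
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            model[(s, a)] = (r, s2)                # learn the model
            for _p in range(planning_steps):       # planning: replay model
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
            s = s2
    return Q
```

Each real transition funds `planning_steps` extra simulated updates, so far fewer real-environment samples are needed than in model-free Q-learning.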


Subject(s)
Accidents, Traffic , Automobile Driving , Reinforcement, Psychology , Safety , Accidents, Traffic/prevention & control , Humans , Automobile Driving/psychology , Environment Design , Computer Simulation , Models, Theoretical
9.
Behav Ther ; 55(3): 513-527, 2024 May.
Article in English | MEDLINE | ID: mdl-38670665

ABSTRACT

Tic disorders are a class of neurodevelopmental disorders characterized by involuntary motor and/or vocal tics. It has been hypothesized that tics function to reduce aversive premonitory urges (i.e., negative reinforcement) and that suppression-based behavioral interventions such as habit reversal training (HRT) and exposure and response prevention (ERP) disrupt this process and facilitate urge reduction through habituation. However, previous findings regarding the negative reinforcement hypothesis and the effect of suppression on the urge-tic relationship have been inconsistent. The present study applied a dynamical systems framework and within-subject time-series autoregressive models to examine the temporal dynamics of urges and tics and assess whether their relationship changes over time. Eleven adults with tic disorders provided continuous urge ratings during separate conditions in which they were instructed to tic freely or to suppress tics. During the free-to-tic conditions, there was considerable heterogeneity across participants in whether and how the urge-tic relationship followed a pattern consistent with the automatic negative reinforcement hypothesis. Further, little evidence for within-session habituation was seen; tic suppression did not result in a reduction in premonitory urges for most participants. Analysis of broader urge change metrics did show significant disruption to the urge pattern during suppression, which has implications for the current biobehavioral model of tics.


Subject(s)
Models, Psychological , Tic Disorders , Humans , Tic Disorders/psychology , Tic Disorders/therapy , Female , Adult , Male , Behavior Therapy/methods , Reinforcement, Psychology , Young Adult , Habits , Middle Aged
10.
Proc Natl Acad Sci U S A ; 121(16): e2303165121, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38607932

ABSTRACT

Antimicrobial resistance was estimated to be associated with 4.95 million deaths worldwide in 2019. It is possible to frame the antimicrobial resistance problem as a feedback-control problem. If we could optimize this feedback-control problem and translate our findings to the clinic, we could slow, prevent, or reverse the development of high-level drug resistance. Prior work on this topic has relied on systems where the exact dynamics and parameters were known a priori. In this study, we extend this work using a reinforcement learning (RL) approach capable of learning effective drug cycling policies in a system defined by empirically measured fitness landscapes. Crucially, we show that it is possible to learn effective drug cycling policies despite the problems of noisy, limited, or delayed measurement. Given access to a panel of 15 β-lactam antibiotics with which to treat the simulated Escherichia coli population, we demonstrate that RL agents outperform two naive treatment paradigms at minimizing the population fitness over time. We also show that RL agents approach the performance of the optimal drug cycling policy. Even when stochastic noise is introduced to the measurements of population fitness, we show that RL agents are capable of maintaining evolving populations at lower growth rates compared to controls. We further tested our approach in arbitrary fitness landscapes of up to 1,024 genotypes. We show that minimization of population fitness using drug cycles is not limited by increasing genome size. Our work represents a proof-of-concept for using AI to control complex evolutionary processes.
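
The drug-cycling control problem can be caricatured as a small Markov decision process. The sketch below is our own illustration, not the paper's empirically measured landscapes: two hypothetical drugs, three genotypes, and a Q-learning agent rewarded for keeping population growth low, which should rediscover the "switch drugs when resistance emerges" policy.

```python
import random

FITNESS = {  # growth of genotype g under drug d (illustrative numbers)
    (0, 'A'): 0.1, (0, 'B'): 0.1,   # wild type suppressed by either drug
    (1, 'A'): 1.0, (1, 'B'): 0.1,   # genotype 1 resists drug A
    (2, 'A'): 0.1, (2, 'B'): 1.0,   # genotype 2 resists drug B
}
RES = {'A': 1, 'B': 2}              # each drug selects for its resistant type

def learn_cycling(steps=20000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Q-learning over genotype states; reward is negative growth."""
    rng = random.Random(seed)
    Q = {(g, d): 0.0 for g in range(3) for d in 'AB'}
    g = 0
    for _ in range(steps):
        d = rng.choice('AB') if rng.random() < eps else \
            max('AB', key=lambda x: Q[(g, x)])
        r = -FITNESS[(g, d)]                      # keep growth low
        g2 = RES[d] if rng.random() < 0.3 else g  # resistance may emerge
        Q[(g, d)] += alpha * (r + gamma * max(Q[(g2, x)] for x in 'AB')
                              - Q[(g, d)])
        g = g2
    return Q
```

The learned policy applies drug B to an A-resistant population and vice versa, i.e. a drug cycle, the qualitative behavior the abstract reports at much larger scale.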


Subject(s)
Anti-Infective Agents , Learning , Reinforcement, Psychology , Drug Resistance, Microbial , Bicycling , Escherichia coli/genetics
11.
PLoS Comput Biol ; 20(4): e1011516, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38626219

ABSTRACT

When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions provide reward, but must also put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action-reward associations and utilise them to inform decision making. We propose a novel model whereby direct and indirect striatal pathways act together to estimate both the mean and variance of reward distributions, and mesolimbic dopaminergic neurons provide transient novelty signals, facilitating effective uncertainty-driven exploration. We utilised electrophysiological recording data to verify our model of the basal ganglia, and we fitted exploration strategies derived from the neural model to data from behavioural experiments. We also compared, in simulation, the performance of directed exploration strategies inspired by our basal ganglia model with other exploration algorithms, including classic variants of the upper confidence bound (UCB) strategy. The exploration strategies inspired by the basal ganglia model achieve overall superior performance in simulation, and we found qualitatively similar results when fitting the model to behavioural data, compared with fits of more idealised normative models with less implementation-level detail. Overall, our results suggest that transient dopamine levels in the basal ganglia that encode novelty could contribute to an uncertainty representation which efficiently drives exploration in reinforcement learning.
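
One of the comparison baselines named above, the upper confidence bound (UCB) strategy, adds an uncertainty-scaled exploration bonus to each action's estimated value. A standard UCB1 sketch on Bernoulli bandit arms (illustrative arm probabilities; this is the textbook baseline, not the authors' basal-ganglia-derived model):

```python
import math
import random

def ucb1(arm_probs, horizon=5000, seed=0):
    """UCB1 on Bernoulli arms; returns how often each arm was pulled."""
    rng = random.Random(seed)
    n = len(arm_probs)
    counts, sums = [0] * n, [0.0] * n
    for t in range(1, horizon + 1):
        if t <= n:
            a = t - 1  # play each arm once to initialise estimates
        else:
            # mean estimate plus an exploration bonus that grows with
            # uncertainty (few pulls) and shrinks as an arm is sampled
            a = max(range(n), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        counts[a] += 1
        sums[a] += 1.0 if rng.random() < arm_probs[a] else 0.0
    return counts

counts = ucb1([0.2, 0.5, 0.8])
```

Uncertainty-driven exploration of this kind concentrates pulls on the best arm while still sampling poorly known alternatives at a logarithmic rate.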


Subject(s)
Basal Ganglia , Dopamine , Models, Neurological , Reward , Dopamine/metabolism , Dopamine/physiology , Uncertainty , Animals , Basal Ganglia/physiology , Exploratory Behavior/physiology , Reinforcement, Psychology , Dopaminergic Neurons/physiology , Computational Biology , Computer Simulation , Male , Algorithms , Decision Making/physiology , Behavior, Animal/physiology , Rats
12.
Proc Natl Acad Sci U S A ; 121(15): e2317618121, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38557193

ABSTRACT

Throughout evolution, bacteria and other microorganisms have learned efficient foraging strategies that exploit characteristic properties of their unknown environment. While much research has been devoted to the exploration of statistical models describing the dynamics of foraging bacteria and other (micro-) organisms, little is known about how good the learned strategies actually are. This knowledge gap is largely caused by the absence of methods for systematically developing alternative foraging strategies to compare against. In the present work, we use deep reinforcement learning to show that a smart run-and-tumble agent, which strives to find nutrients for its survival, learns motion patterns that are remarkably similar to the trajectories of chemotactic bacteria. Strikingly, despite this similarity, we also find interesting differences between the learned tumble rate distribution and the one that is commonly assumed for the run-and-tumble model. We find that these differences equip the agent with significant advantages regarding its foraging and survival capabilities. Our results uncover a generic route to use deep reinforcement learning for discovering search and collection strategies that exploit characteristic but initially unknown features of the environment. These results can be used, e.g., to program future microswimmers, nanorobots, and smart active particles for tasks like searching for cancer cells, micro-waste collection, or environmental remediation.


Subject(s)
Learning , Reinforcement, Psychology , Models, Statistical , Motion , Bacteria
13.
Sci Robot ; 9(89): eadi9579, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630806

ABSTRACT

Humanoid robots that can autonomously operate in diverse environments have the potential to help address labor shortages in factories, assist the elderly at home, and colonize new planets. Although classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesized that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in context, without updating its weights. We trained our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deployed it to the real world zero-shot. Our controller could walk over various outdoor terrains, was robust to external disturbances, and could adapt in context.


Subject(s)
Robotics , Humans , Aged , Robotics/methods , Locomotion , Walking , Learning , Reinforcement, Psychology
14.
Elife ; 122024 Apr 02.
Article in English | MEDLINE | ID: mdl-38562050

ABSTRACT

In the unpredictable Anthropocene, a particularly pressing open question is how certain species invade urban environments. Sex-biased dispersal and learning arguably influence movement ecology, but their joint influence remains unexplored empirically, and might vary by space and time. We assayed reinforcement learning in wild-caught, temporarily captive core-, middle-, or edge-range great-tailed grackles, a bird species undergoing a rapid, urban-tracking range expansion led by dispersing males. We show that, across populations, both sexes initially perform similarly when learning stimulus-reward pairings, but, when reward contingencies reverse, male grackles finish 'relearning' faster than females, making fewer choice-option switches. How do male grackles do this? Bayesian cognitive modelling revealed that male grackles' choice behaviour is governed more strongly by the 'weight' of relative differences in recent foraging payoffs; that is, they show more pronounced risk-sensitive learning. Confirming this mechanism, agent-based forward simulations of reinforcement learning, in which we simulate 'birds' based on empirical estimates of our grackles' reinforcement learning, replicate our sex-difference behavioural data. Finally, evolutionary modelling revealed that natural selection should favour risk-sensitive learning in hypothesised urban-like environments: stable but stochastic settings. Together, these results imply risk-sensitive learning is a winning strategy for urban-invasion leaders, underscoring the potential for life history and cognition to shape invasion success in human-modified environments.


Subject(s)
Learning , Passeriformes , Animals , Humans , Female , Male , Bayes Theorem , Cognition , Reinforcement, Psychology
15.
PLoS One ; 19(4): e0300842, 2024.
Article in English | MEDLINE | ID: mdl-38598429

ABSTRACT

Maze-solving is a classical mathematical task that has recently been achieved, analogously, using various eccentric media and devices, such as living tissues, chemotaxis, and memristors. Plasma generated in a labyrinth of narrow channels can also play the role of a route finder to the exit. In this study, we experimentally observe maze-route finding in a plasma system based on a mixed discharge scheme of direct-current (DC) volume mode and alternating-current (AC) surface dielectric-barrier discharge, and computationally generalize this function in a reinforcement-learning model. In our plasma system, we install two electrodes at the entry and the exit of a square-lattice configuration of narrow channels whose cross section is 1×1 mm², with a total length of around ten centimeters. Visible emissions in low-pressure Ar gas are observed after plasma ignition, and the plasma starting from a given entry location reaches the exit as the discharge voltage increases; the route-convergence level is quantified by Shannon entropy. A similar short-path route is reproduced in a reinforcement-learning model in which electric potentials set by the discharge voltage are replaced by rewards of positive and negative sign, or polarity. The model is not a rigorous numerical representation of the plasma, but it shares common points with the experiments, along with a rough sketch of the underlying processes (charges in the experiments and rewards in the model). This finding indicates that a plasma-channel network performs an analog computing function similar to a slightly modified reinforcement-learning algorithm.


Subject(s)
Body Fluids , Reinforcement, Psychology , Reward , Plasma , Algorithms
16.
PLoS Comput Biol ; 20(4): e1012057, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38669280

ABSTRACT

Policy compression is a computational framework that describes how capacity-limited agents trade reward for simpler action policies to reduce cognitive cost. In this study, we present behavioral evidence that humans prefer simpler policies, as predicted by a capacity-limited reinforcement learning model. Across a set of tasks, we find that people exploit structure in the relationships between states, actions, and rewards to "compress" their policies. In particular, compressed policies are systematically biased towards actions with high marginal probability, thereby discarding some state information. This bias is greater when there is redundancy in the reward-maximizing action policy across states, and increases with memory load. These results could not be explained qualitatively or quantitatively by models that did not make use of policy compression under a capacity limit. We also confirmed the prediction that time pressure should further reduce policy complexity and increase action bias, based on the hypothesis that actions are selected via time-dependent decoding of a compressed code. These findings contribute to a deeper understanding of how humans adapt their decision-making strategies under cognitive resource constraints.
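
The capacity-limited policy described here has a standard form: the optimal compressed policy is proportional to the marginal action probability times an exponentiated value term, pi(a|s) ∝ p(a)·exp(beta·Q(s,a)), so tighter capacity (smaller beta) biases choice toward high-marginal-probability actions and discards state information. A small fixed-point sketch under assumed values (ours, not the paper's tasks):

```python
import math

def compressed_policy(Qsa, beta, iters=300):
    """Iterate pi(a|s) ∝ p(a) * exp(beta * Q(s,a)) to a fixed point.

    p(a) is the marginal action distribution (uniform over states here);
    small beta = tight capacity, large beta = near-greedy policy.
    """
    n_s, n_a = len(Qsa), len(Qsa[0])
    p = [1.0 / n_a] * n_a
    pi = [[1.0 / n_a] * n_a for _ in range(n_s)]
    for _ in range(iters):
        for s in range(n_s):
            w = [p[a] * math.exp(beta * Qsa[s][a]) for a in range(n_a)]
            z = sum(w)
            pi[s] = [x / z for x in w]
        p = [sum(pi[s][a] for s in range(n_s)) / n_s for a in range(n_a)]
    return pi

# Action 0 is reward-maximizing in two of three states, so it carries high
# marginal probability; tight capacity biases even state 2 toward it.
Qsa = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
tight = compressed_policy(Qsa, beta=0.5)
loose = compressed_policy(Qsa, beta=10.0)
```

The bias toward the high-marginal action in the odd state out is exactly the "action bias" signature the study reports growing with memory load and time pressure.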


Subject(s)
Decision Making , Reward , Humans , Decision Making/physiology , Computational Biology , Male , Adult , Female , Reinforcement, Psychology , Models, Psychological , Young Adult , Cognition/physiology
17.
J Exp Anal Behav ; 121(3): 389-398, 2024 May.
Article in English | MEDLINE | ID: mdl-38561597

ABSTRACT

We developed and examined a laboratory preparation with adult humans that pits shorter-term avoidance against longer-term positive reinforcement and may serve as a useful laboratory functional analogue of problematic behavior. Participants were exposed to choices between (1) avoiding an aversive sound and acquiring no money or (2) listening to an aversive sound for a set duration and then receiving money. The first choice, avoiding an aversive sound and acquiring no money, was conceptualized as immediate negative reinforcement and no positive reinforcement, whereas the latter choice, listening to an aversive sound for a set duration and then receiving money, was conceptualized as a potential positive punisher paired with a larger later positive reinforcer. We manipulated the duration of the sound and the magnitude of money to identify the point at which individual participants' choices changed from avoiding the sound to choosing the sound plus money. As the sound duration increased, the choice of listening to the sound and receiving money decreased. Similar functions were observed with two different monetary magnitudes. The model has potential applicability to real-world problems such as smoking, addiction, gambling, anxiety disorders, and other impulse control disorders.


Subject(s)
Reinforcement, Psychology , Humans , Male , Female , Adult , Choice Behavior , Young Adult , Delay Discounting , Acoustic Stimulation , Sound , Avoidance Learning , Reward
18.
Neurobiol Learn Mem ; 211: 107924, 2024 May.
Article in English | MEDLINE | ID: mdl-38579896

ABSTRACT

We and other animals learn because there is some aspect of the world about which we are uncertain. This uncertainty arises from initial ignorance, and from changes in the world that we do not perfectly know; the uncertainty often becomes evident when our predictions about the world are found to be erroneous. The Rescorla-Wagner learning rule, which specifies one way that prediction errors can occasion learning, has been hugely influential as a characterization of Pavlovian conditioning and, through its equivalence to the delta rule in engineering, in a much wider class of learning problems. Here, we review the embedding of the Rescorla-Wagner rule in a Bayesian context that is precise about the link between uncertainty and learning, and thereby discuss extensions to such suggestions as the Kalman filter, structure learning, and beyond, that collectively encompass a wider range of uncertainties and accommodate a wider assortment of phenomena in conditioning.
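The Rescorla-Wagner rule discussed above has a compact form: on each trial, the associative strength of every present cue is nudged by a shared prediction error, \(\Delta V = \alpha(\lambda - \sum V)\). The sketch below is a minimal illustration of that update, not code from the reviewed article; the variable names (`alpha`, `lam`) and the single-cue training example are assumptions.

```python
def rescorla_wagner(V, present, lam, alpha=0.1):
    """One conditioning trial of the Rescorla-Wagner (delta) rule.

    V       -- dict mapping cue name -> associative strength (updated in place)
    present -- list of cues present on this trial
    lam     -- outcome value (e.g., 1.0 for reward, 0.0 for its absence)
    alpha   -- learning rate
    Returns the prediction error for the trial.
    """
    prediction = sum(V[c] for c in present)   # summed prediction over present cues
    delta = lam - prediction                  # prediction error drives learning
    for c in present:
        V[c] += alpha * delta                 # all present cues share the error
    return delta

# Simple acquisition: pair a light with reward for 50 trials.
V = {"light": 0.0, "tone": 0.0}
errors = [rescorla_wagner(V, ["light"], lam=1.0) for _ in range(50)]
```

With a constant `alpha`, the error shrinks geometrically toward zero; the Kalman-filter extensions mentioned in the abstract instead adapt the learning rate to the agent's uncertainty.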


Subject(s)
Bayes Theorem , Conditioning, Classical , Reinforcement, Psychology , Animals , Conditioning, Classical/physiology , Uncertainty , Humans , Learning/physiology , Models, Psychological
19.
J Exp Anal Behav ; 121(3): 327-345, 2024 May.
Article in English | MEDLINE | ID: mdl-38629655

ABSTRACT

Can simple conditional-discrimination choice be accounted for by recent quantitative models of combined stimulus and reinforcer control? In Experiment 1, two sets of five blackout durations, one using shorter intervals and one using longer intervals, conditionally signaled which subsequent choice response might provide food. In seven conditions, the distribution of blackout durations across the sets was varied. An updated version of the generalization-across-dimensions model nicely described the way that choice changed across durations. In Experiment 2, just two blackout durations acted as the conditional stimuli and the durations were varied over 10 conditions. The parameters of the model obtained in Experiment 1 failed to adequately predict choice in Experiment 2, but the model again fitted the data nicely. The failure to predict the Experiment 2 data from the Experiment 1 parameters occurred because in Experiment 1 differential control by reinforcer locations progressively decreased with blackout durations, whereas in Experiment 2 this control remained constant. These experiments extend the ability of the model to describe data from procedures based on concurrent schedules in which reinforcer ratios reverse at fixed times to those from conditional-discrimination procedures. Further research is needed to understand why control by reinforcer location differed between the two experiments.


Subject(s)
Choice Behavior , Discrimination Learning , Generalization, Psychological , Models, Psychological , Reinforcement Schedule , Animals , Reinforcement, Psychology , Conditioning, Operant , Discrimination, Psychological , Columbidae , Time Factors
20.
Neuropharmacology ; 252: 109947, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38631564

ABSTRACT

A growing body of research indicates that β-caryophyllene (BCP), a constituent present in a large number of plants, possesses significant therapeutic properties against CNS disorders, including alcohol and psychostimulant use disorders. However, it is unknown whether BCP has similar therapeutic potential for opioid use disorders. In this study, we found that systemic administration of BCP dose-dependently reduced heroin self-administration in rats under an FR2 schedule of reinforcement and partially blocked heroin-enhanced brain stimulation reward in DAT-cre mice, maintained by optical stimulation of midbrain dopamine neurons at high frequencies. Acute administration of BCP failed to block heroin conditioned place preference (CPP) in male mice but attenuated heroin-induced CPP in females. Furthermore, repeated dosing with BCP for 5 days facilitated the extinction of CPP in female but not male mice. In the hot plate assay, pretreatment with the same doses of BCP failed to enhance or prolong opioid antinociception. Lastly, in a substitution test, BCP replacement for heroin failed to maintain intravenous BCP self-administration, indicating that BCP itself has no reinforcing properties. These findings suggest that BCP may have therapeutic potential against opioid use disorders, with few unwanted side effects of its own.


Subject(s)
Heroin , Polycyclic Sesquiterpenes , Self Administration , Animals , Male , Heroin/administration & dosage , Polycyclic Sesquiterpenes/pharmacology , Polycyclic Sesquiterpenes/administration & dosage , Female , Mice , Rats , Analgesics, Opioid/pharmacology , Analgesics, Opioid/administration & dosage , Sesquiterpenes/pharmacology , Sesquiterpenes/administration & dosage , Rats, Sprague-Dawley , Dose-Response Relationship, Drug , Conditioning, Operant/drug effects , Extinction, Psychological/drug effects , Reinforcement, Psychology , Reward , Mice, Transgenic , Nociception/drug effects , Mice, Inbred C57BL