Results 1-20 of 2,921
1.
Cell ; 187(6): 1476-1489.e21, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38401541

ABSTRACT

Attention filters sensory inputs to enhance task-relevant information. It is guided by an "attentional template" that represents the stimulus features that are currently relevant. To understand how the brain learns and uses templates, we trained monkeys to perform a visual search task that required them to repeatedly learn new attentional templates. Neural recordings found that templates were represented across the prefrontal and parietal cortex in a structured manner, such that perceptually neighboring templates had similar neural representations. When the task changed, a new attentional template was learned by incrementally shifting the template toward rewarded features. Finally, we found that attentional templates transformed stimulus features into a common value representation that allowed the same decision-making mechanisms to deploy attention, regardless of the identity of the template. Altogether, our results provide insight into the neural mechanisms by which the brain learns to control attention and how attention can be flexibly deployed across tasks.
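The incremental template shift described in this abstract can be sketched as a delta-rule update that nudges the template toward rewarded feature values. This is a minimal illustration under assumed parameters (one-dimensional feature space, learning rate, binary reward), not the authors' fitted model:

```python
# Minimal sketch: an attentional template nudged toward rewarded
# feature values via a delta-rule update. The one-dimensional
# feature space, learning rate, and binary reward are assumptions.

def update_template(template, rewarded_feature, reward, lr=0.2):
    """Shift the template toward the feature value that yielded reward."""
    return template + lr * reward * (rewarded_feature - template)

template = 0.0           # current template position in feature space
rewarded_feature = 1.0   # feature value of the rewarded target
for _ in range(20):
    template = update_template(template, rewarded_feature, reward=1.0)
```

After repeated rewarded trials the template converges on the rewarded feature, mirroring the incremental learning reported in the study.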


Subject(s)
Attention, Decision Making, Learning, Parietal Lobe, Reward, Animals, Haplorhini
2.
Cell ; 184(10): 2733-2749.e16, 2021 05 13.
Article in English | MEDLINE | ID: mdl-33861952

ABSTRACT

Significant evidence supports the view that dopamine shapes learning by encoding reward prediction errors. However, it is unknown whether striatal targets receive tailored dopamine dynamics based on regional functional specialization. Here, we report wave-like spatiotemporal activity patterns in dopamine axons and release across the dorsal striatum. These waves switch between activational motifs and organize dopamine transients into localized clusters within functionally related striatal subregions. Notably, wave trajectories were tailored to task demands, propagating from dorsomedial to dorsolateral striatum when rewards were contingent on animal behavior and in the opposite direction when rewards were independent of behavioral responses. We propose a computational architecture in which striatal dopamine waves are sculpted by inference about agency and provide a mechanism to direct credit assignment to specialized striatal subregions. Supporting model predictions, dorsomedial dopamine activity during reward-pursuit signaled the extent of instrumental control and interacted with reward waves to predict future behavioral adjustments.


Subject(s)
Axons/metabolism, Behavior, Animal, Corpus Striatum/metabolism, Dopamine/metabolism, Reward, Animals, Female, Male, Mice, Mice, Mutant
3.
Cell ; 182(1): 112-126.e18, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32504542

ABSTRACT

Every decision we make is accompanied by a sense of confidence about its likely outcome. This sense informs subsequent behavior, such as investing more-whether time, effort, or money-when reward is more certain. A neural representation of confidence should originate from a statistical computation and predict confidence-guided behavior. An additional requirement for confidence representations to support metacognition is abstraction: they should emerge irrespective of the source of information and inform multiple confidence-guided behaviors. It is unknown whether neural confidence signals meet these criteria. Here, we show that single orbitofrontal cortex neurons in rats encode statistical decision confidence irrespective of the sensory modality, olfactory or auditory, used to make a choice. The activity of these neurons also predicts two confidence-guided behaviors: trial-by-trial time investment and cross-trial choice strategy updating. Orbitofrontal cortex thus represents decision confidence consistent with a metacognitive process that is useful for mediating confidence-guided economic decisions.


Subject(s)
Behavior/physiology, Prefrontal Cortex/physiology, Animals, Choice Behavior/physiology, Decision Making, Models, Biological, Neurons/physiology, Rats, Long-Evans, Sensation/physiology, Task Performance and Analysis, Time Factors
4.
Annu Rev Neurosci ; 45: 317-337, 2022 07 08.
Article in English | MEDLINE | ID: mdl-35363533

ABSTRACT

Nervous systems evolved to effectively navigate the dynamics of the environment to achieve their goals. One framework used to study this fundamental problem arose in the study of learning and decision-making. In this framework, the demands of effective behavior require slow dynamics-on the scale of seconds to minutes-of networks of neurons. Here, we review the phenomena and mechanisms involved. Using vignettes from a few species and areas of the nervous system, we view neuromodulators as key substrates for temporal scaling of neuronal dynamics.


Subject(s)
Decision Making, Neurophysiology, Decision Making/physiology, Learning/physiology, Neurons/physiology, Neurotransmitter Agents
5.
Trends Biochem Sci ; 49(4): 286-289, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38341333

ABSTRACT

Eukaryotic cells learn and adapt via unknown network architectures. Recent work demonstrated a circuit of two GTPases used by cells to overcome growth factor scarcity, encouraging our view that artificial and biological intelligence share strikingly similar design principles and that cells function as deep reinforcement learning (RL) agents in uncertain environments.


Subject(s)
GTP Phosphohydrolases, Signal Transduction, GTP Phosphohydrolases/metabolism
6.
Annu Rev Neurosci ; 43: 485-507, 2020 07 08.
Article in English | MEDLINE | ID: mdl-32303147

ABSTRACT

Behavior is readily classified into patterns of movements with inferred common goals-actions. Goals may be discrete; movements are continuous. Through the careful study of isolated movements in laboratory settings, or via introspection, it has become clear that animals can exhibit exquisite graded specification of their movements. Moreover, graded control can be as fundamental to success as the selection of which action to perform under many naturalistic scenarios: a predator adjusting its speed to intercept moving prey, or a tool-user exerting the perfect amount of force to complete a delicate task. The basal ganglia are a collection of nuclei in vertebrates that extend from the forebrain (telencephalon) to the midbrain (mesencephalon), constituting a major descending extrapyramidal pathway for control over midbrain and brainstem premotor structures. Here we discuss how this pathway contributes to the continuous specification of movements that endows our voluntary actions with vigor and grace.


Subject(s)
Basal Ganglia/physiology, Behavior/physiology, Brain/physiology, Movement/physiology, Neural Pathways/physiology, Animals, Humans, Neurons/physiology
7.
Proc Natl Acad Sci U S A ; 121(8): e2310238121, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38359294

ABSTRACT

The adaptive and surprising emergent properties of biological materials self-assembled in far-from-equilibrium environments serve as an inspiration for efforts to design nanomaterials. In particular, controlling the conditions of self-assembly can modulate material properties, but there is no systematic understanding of either how to parameterize external control or how controllable a given material can be. Here, we demonstrate that branched actin networks can be encoded with metamaterial properties by dynamically controlling the applied force under which they grow and that the protocols can be selected using multi-task reinforcement learning. These actin networks have tunable responses over a large dynamic range depending on the chosen external protocol, providing a pathway to encoding "memory" within these structures. Interestingly, we obtain a bound that relates the dissipation rate and the rate of "encoding" that gives insight into the constraints on control-both physical and information theoretical. Taken together, these results emphasize the utility and necessity of nonequilibrium control for designing self-assembled nanostructures.


Subject(s)
Actins, Nanostructures, Actins/metabolism, Nanostructures/chemistry
8.
Proc Natl Acad Sci U S A ; 121(20): e2316658121, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38717856

ABSTRACT

Individual survival and evolutionary selection require biological organisms to maximize reward. Economic choice theories define the necessary and sufficient conditions, and neuronal signals of decision variables provide mechanistic explanations. Reinforcement learning (RL) formalisms use predictions, actions, and policies to maximize reward. Midbrain dopamine neurons code reward prediction errors (RPE) of subjective reward value suitable for RL. Electrical and optogenetic self-stimulation experiments demonstrate that monkeys and rodents repeat behaviors that result in dopamine excitation. Dopamine excitations reflect positive RPEs that increase reward predictions via RL; against increasing predictions, obtaining similar dopamine RPE signals again requires better rewards than before. The positive RPEs drive predictions higher again and thus advance a recursive reward-RPE-prediction iteration toward better and better rewards. Agents also avoid dopamine inhibitions that lower reward prediction via RL, which allows smaller rewards than before to elicit positive dopamine RPE signals and resume the iteration toward better rewards. In this way, dopamine RPE signals serve as a causal mechanism that attracts agents via RL to the best rewards. The mechanism improves daily life and benefits evolutionary selection but may also induce restlessness and greed.
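The recursive reward-RPE-prediction iteration described here can be sketched with a standard value update: each positive RPE raises the prediction, so eliciting the same RPE again requires a larger reward. The learning rate and target RPE size are illustrative assumptions, not values from the paper:

```python
# Sketch of the recursive reward -> RPE -> prediction iteration:
# a positive RPE raises the prediction, so a similar RPE next time
# requires a larger reward. Learning rate and values are illustrative.

def rpe_step(prediction, reward, lr=0.5):
    rpe = reward - prediction            # reward prediction error
    prediction = prediction + lr * rpe   # RL value update
    return prediction, rpe

prediction = 0.0
target_rpe = 1.0
rewards_needed = []
for _ in range(5):
    # reward required to keep eliciting the same positive RPE
    reward = prediction + target_rpe
    rewards_needed.append(reward)
    prediction, rpe = rpe_step(prediction, reward)
```

The sequence of required rewards grows monotonically, which is the ratchet toward "better and better rewards" that the abstract describes.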


Subject(s)
Dopamine, Dopaminergic Neurons, Reward, Animals, Dopamine/metabolism, Dopaminergic Neurons/physiology, Dopaminergic Neurons/metabolism, Humans, Reinforcement, Psychology
9.
Proc Natl Acad Sci U S A ; 121(14): e2318521121, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38551832

ABSTRACT

During foraging behavior, action values are persistently encoded in neural activity and updated depending on the history of choice outcomes. What is the neural mechanism for action value maintenance and updating? Here, we explore two contrasting network models: synaptic learning of action value versus neural integration. We show that both models can reproduce extant experimental data, but they yield distinct predictions about the underlying biological neural circuits. In particular, the neural integrator model but not the synaptic model requires that reward signals are mediated by neural pools selective for action alternatives and their projections are aligned with linear attractor axes in the valuation system. We demonstrate experimentally observable neural dynamical signatures and feasible perturbations to differentiate the two contrasting scenarios, suggesting that the synaptic model is a more robust candidate mechanism. Overall, this work provides a modeling framework to guide future experimental research on probabilistic foraging.
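The "synaptic learning of action value" scenario contrasted here can be sketched as action values stored in (synaptic) weights, updated from choice outcomes and read out through a softmax policy. The two-option bandit, reward probabilities, learning rate, and inverse temperature are all assumptions for illustration, not the paper's circuit model:

```python
# Minimal sketch of the synaptic action-value scenario: values held
# in weights, updated from outcomes, read out via softmax choice.
# Reward probabilities, learning rate, and beta are assumptions.
import math
import random

random.seed(0)
q = [0.0, 0.0]            # action values for two foraging options
counts = [0, 0]           # how often each option is chosen
p_reward = [0.8, 0.2]     # true reward probabilities (assumed)
lr, beta = 0.2, 3.0

def softmax_choice(q):
    z = [math.exp(beta * v) for v in q]
    r = random.random() * sum(z)
    return 0 if r < z[0] else 1

for _ in range(500):
    a = softmax_choice(q)
    counts[a] += 1
    reward = 1.0 if random.random() < p_reward[a] else 0.0
    q[a] += lr * (reward - q[a])   # synaptic value update
```

Choices concentrate on the richer option as its stored value grows, reproducing the persistent value encoding and outcome-driven updating the abstract refers to.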


Subject(s)
Choice Behavior, Reward, Brain, Learning, Neuronal Plasticity, Decision Making
10.
Proc Natl Acad Sci U S A ; 121(12): e2304866121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38483992

ABSTRACT

Accelerating measurements for discriminating samples, such as classifying cell phenotype, is crucial when faced with significant time and cost constraints. Spontaneous Raman microscopy offers label-free, rich chemical information but suffers from long acquisition times due to extremely small scattering cross-sections. One approach to accelerate the measurement is to measure only the necessary parts with a suitable number of illumination points. However, how to design these points during measurement remains a challenge. To address this, we developed an imaging technique based on reinforcement learning, a machine learning (ML) approach. This ML approach adaptively feeds back an "optimal" illumination pattern during the measurement to detect specific characteristics of interest, allowing faster measurements while guaranteeing discrimination accuracy. Using a set of Raman images of human follicular thyroid and follicular thyroid carcinoma cells, we showed that our technique requires 3,333 to 31,683 times fewer illuminations than raster scanning to discriminate the phenotypes. To quantitatively evaluate the number of illuminations needed for a requisite discrimination accuracy, we prepared a set of polymer bead mixture samples to model anomalous and normal tissues. We then applied a home-built programmable-illumination microscope equipped with our algorithm and confirmed that the system can discriminate the sample conditions with 104 to 4,350 times fewer illuminations than standard point-illumination Raman microscopy. The proposed algorithm can be applied to other types of microscopy that can control measurement conditions on the fly, offering an approach for accelerating accurate measurements in various applications, including medical diagnosis.


Subject(s)
Microscopy, Spectrum Analysis, Raman, Humans, Microscopy/methods, Spectrum Analysis, Raman/methods, Thyroid Gland, Nonlinear Optical Microscopy, Machine Learning
11.
Proc Natl Acad Sci U S A ; 121(12): e2317751121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38489382

ABSTRACT

Do people's attitudes toward the (a)symmetry of an outcome distribution affect their choices? Financial investors seek return distributions with frequent small returns but few large ones, consistent with leading models of choice in economics and finance that assume right-skewed preferences. In contrast, many experiments in which decision-makers learn about choice options through experience find the opposite choice tendency, in favor of left-skewed options. To reconcile these seemingly contradicting findings, the present work investigates the effect of skewness on choices in experience-based decisions. Across seven studies, we show that apparent preferences for left-skewed outcome distributions are a consequence of those distributions having a higher value in most direct outcome comparisons, a "frequent-winner effect." By manipulating which option is the frequent winner, we show that choice tendencies for frequent winners can be obtained even with identical outcome distributions. Moreover, systematic choice tendencies in favor of right- or left-skewed options can be obtained by manipulating which option is experienced as the frequent winner. We also find evidence for an intrinsic preference for right-skewed outcome distributions. The frequent-winner phenomenon is robust to variations in outcome distributions and experimental paradigms. These findings are confirmed by computational analyses in which a reinforcement-learning model capturing frequent winning and intrinsic skewness preferences provides the best account of the data. Our work reconciles conflicting findings of aggregated behavior in financial markets and experiments and highlights the need for theories of decision-making sensitive to joint outcome distributions of the available options.
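The frequent-winner effect can be demonstrated with a short simulation: a left-skewed option beats a right-skewed option in most trial-by-trial comparisons even when both have the same expected value. The specific outcome distributions below are assumptions chosen for illustration, not the stimuli used in the studies:

```python
# Sketch of the "frequent-winner effect": a left-skewed option wins
# most direct outcome comparisons against a right-skewed option with
# the same expected value. The distributions are assumptions.
import random

random.seed(1)
# left-skewed: usually a high payoff, rare large loss (mean = 0)
def left_skewed():
    return 1.0 if random.random() < 0.9 else -9.0

# right-skewed: usually a small loss, rare large gain (mean = 0)
def right_skewed():
    return -1.0 if random.random() < 0.9 else 9.0

wins = sum(left_skewed() > right_skewed() for _ in range(10_000))
win_rate = wins / 10_000
```

Although both options have an expected value of zero, the left-skewed option wins roughly 81% of direct comparisons (0.9 × 0.9), which is the asymmetry the authors exploit to explain apparent skewness preferences.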


Subject(s)
Choice Behavior, Decision Making, Humans, Learning, Reinforcement, Psychology
12.
Annu Rev Neurosci ; 41: 371-388, 2018 07 08.
Article in English | MEDLINE | ID: mdl-29709209

ABSTRACT

In everyday life, the outcomes of our actions are rarely certain. Further, we often lack the information needed to precisely estimate the probability and value of potential outcomes as well as how much effort will be required by the courses of action under consideration. Under such conditions of uncertainty, individual differences in the estimation and weighting of these variables, and in reliance on model-free versus model-based decision making, have the potential to strongly influence our behavior. Both anxiety and depression are associated with difficulties in decision making. Further, anxiety is linked to increased engagement in threat-avoidance behaviors and depression is linked to reduced engagement in reward-seeking behaviors. The precise deficits, or biases, in decision making associated with these common forms of psychopathology remain to be fully specified. In this article, we review evidence for which of the computations supporting decision making are altered in anxiety and depression and consider the potential consequences for action selection. In addition, we provide a schematic framework that integrates the findings reviewed and will hopefully be of value to future studies.


Subject(s)
Anxiety, Computer Simulation, Decision Making/physiology, Depression, Animals, Humans, Reward
13.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493345

ABSTRACT

The evolution of drug resistance leads to treatment failure and tumor progression. Intermittent androgen deprivation therapy (IADT) helps responsive cancer cells compete with resistant cancer cells in intratumoral competition. However, conventional IADT is population-based, ignoring the heterogeneity of patients and cancer. Additionally, existing IADT relies on pre-determined thresholds of prostate-specific antigen to pause and resume treatment, which is not optimized for individual patients. To address these challenges, we framed a data-driven method in two steps. First, we developed a time-varied, mixed-effect and generative Lotka-Volterra (tM-GLV) model to account for the heterogeneity of the evolution mechanism and the pharmacokinetics of two ADT drugs, cyproterone acetate and leuprolide acetate, for individual patients. Then, we proposed a reinforcement-learning-enabled individualized IADT framework, namely I²ADT, to learn the patient-specific tumor dynamics and derive the optimal drug administration policy. Experiments with clinical trial data demonstrated that the proposed I²ADT can significantly prolong the time to progression of prostate cancer patients with reduced cumulative drug dosage. We further validated the efficacy of the proposed methods with recent pilot clinical trial data. Moreover, the adaptability of I²ADT makes it a promising tool for other cancers with the availability of clinical data, where treatment regimens might need to be individualized based on patient characteristics and disease dynamics. Our research elucidates the application of deep reinforcement learning to identify personalized adaptive cancer therapy.
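The competitive Lotka-Volterra dynamics underlying the tM-GLV model can be sketched with treatment-sensitive and resistant populations sharing a carrying capacity, plus a threshold-style intermittent treatment rule. All parameters, the Euler integration, and the simple on/off rule are illustrative assumptions; this is not the authors' fitted, patient-specific model:

```python
# Sketch of competitive Lotka-Volterra tumor dynamics with a
# threshold-style intermittent therapy that kills only sensitive
# cells. Growth rates, capacity, and threshold are assumptions.

def lv_step(s, r, dt=0.01, gs=0.3, gr=0.2, K=1.0, treatment=0.0):
    """One Euler step: sensitive (s) and resistant (r) populations."""
    ds = gs * s * (1 - (s + r) / K) - treatment * s
    dr = gr * r * (1 - (s + r) / K)
    return s + dt * ds, r + dt * dr

s, r = 0.5, 0.01
for _ in range(20_000):
    on_treatment = 1.0 if (s + r) > 0.6 else 0.0  # threshold rule
    s, r = lv_step(s, r, treatment=on_treatment)
```

Under this rule the resistant population eventually dominates, which is exactly the competitive-release problem that motivates learning a better, individualized treatment policy.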


Subject(s)
Prostatic Neoplasms, Male, Humans, Prostatic Neoplasms/drug therapy, Prostatic Neoplasms/genetics, Prostatic Neoplasms/pathology, Androgen Antagonists/therapeutic use, Androgens/therapeutic use
14.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38770719

ABSTRACT

Recent advances in cancer immunotherapy have highlighted the potential of neoantigen-based vaccines. However, the design of such vaccines is hindered by the possibility of weak binding affinity between the peptides and the patient's specific human leukocyte antigen (HLA) alleles, which may not elicit a robust adaptive immune response. Triggering cross-immunity by utilizing peptide mutations that have enhanced binding affinity to target HLA molecules, while preserving their homology with the original one, can be a promising avenue for neoantigen vaccine design. In this study, we introduced UltraMutate, a novel algorithm that combines reinforcement learning and Monte Carlo tree search to identify peptide mutations that not only exhibit enhanced binding affinities to target HLA molecules but also retain a high degree of homology with the original neoantigen. UltraMutate outperformed existing state-of-the-art methods in identifying affinity-enhancing mutations in an independent test set consisting of 3660 peptide-HLA pairs. UltraMutate further showed its applicability in the design of peptide vaccines for Human Papillomavirus and Human Cytomegalovirus, demonstrating its potential as a promising tool in the advancement of personalized immunotherapy.


Subject(s)
Algorithms, Cancer Vaccines, Monte Carlo Method, Humans, Cancer Vaccines/immunology, Cancer Vaccines/genetics, HLA Antigens/immunology, HLA Antigens/genetics, Antigens, Neoplasm/immunology, Antigens, Neoplasm/genetics, Mutation
15.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38446739

ABSTRACT

Antimicrobial peptides (AMPs), short peptides with diverse functions, effectively target and combat various organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance. Due to their low drug resistance and toxicity, AMPs are considered promising substitutes for traditional antibiotics. While existing deep learning technology enhances AMP generation, it also presents certain challenges. Firstly, AMP generation overlooks the complex interdependencies among amino acids. Secondly, current models fail to integrate crucial tasks like screening, attribute prediction and iterative optimization. Consequently, we develop an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction and iterative optimization. We innovatively integrate kinetic diffusion and attention mechanisms into the reinforcement learning framework for efficient AMP generation. Additionally, our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We employ a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. This framework automates molecule generation, screening, attribute prediction and optimization, thereby advancing AMP research. We have also deployed Diff-AMP on a web server, with code, data and server details available in the Data Availability section.


Subject(s)
Amino Acids, Antimicrobial Peptides, Anti-Bacterial Agents, Diffusion, Kinetics
16.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39256196

ABSTRACT

Using amino acid residues in peptide generation has solved several key problems, including precise control of amino acid sequence order, customized peptides for property modification, and large-scale peptide synthesis. Proteins contain unknown amino acid residues. Extracting them for the synthesis of drug-like peptides can create novel structures with unique properties, driving drug development. Computer-aided design of novel peptide drug molecules can solve the high-cost and low-efficiency problems in the traditional drug discovery process. Previous studies faced limitations in enhancing the bioactivity and drug-likeness of polypeptide drugs due to less emphasis on the connection relationships in amino acid structures. Thus, we proposed a reinforcement learning-driven generation model based on graph attention mechanisms for peptide generation. By harnessing the advantages of graph attention mechanisms, this model effectively captured the connectivity structures between amino acid residues in peptides. Simultaneously, leveraging reinforcement learning's strength in guiding optimal sequence searches provided a novel approach to peptide design and optimization. This model introduces an actor-critic framework with real-time feedback loops to achieve dynamic balance between attributes, which can customize the generation of multiple peptides for specific targets and enhance the affinity between peptides and targets. Experimental results demonstrate that the generated drug-like peptides meet specified absorption, distribution, metabolism, excretion, and toxicity properties and bioactivity with a success rate of over 90%, thereby significantly accelerating the process of drug-like peptide generation.


Subject(s)
Peptides, Peptides/chemistry, Amino Acid Sequence, Drug Discovery, Drug Design, Algorithms, Computer-Aided Design, Humans
17.
Proc Natl Acad Sci U S A ; 120(39): e2220593120, 2023 09 26.
Article in English | MEDLINE | ID: mdl-37725652

ABSTRACT

I apply a recently emerging perspective on the complexity of action selection, the rate-distortion theory of control, to provide a computational-level model of errors and difficulties in human language production, which is grounded in information theory and control theory. Language production is cast as the sequential selection of actions to achieve a communicative goal subject to a capacity constraint on cognitive control. In a series of calculations, simulations, corpus analyses, and comparisons to experimental data, I show that the model directly predicts some of the major known qualitative and quantitative phenomena in language production, including semantic interference and predictability effects in word choice; accessibility-based ("easy-first") production preferences in word order alternations; and the existence and distribution of disfluencies including filled pauses, corrections, and false starts. I connect the rate-distortion view to existing models of human language production, to probabilistic models of semantics and pragmatics, and to proposals for controlled language generation in the machine learning and reinforcement learning literature.


Subject(s)
Language, Semantics, Humans, Communication, Information Theory, Machine Learning
18.
Proc Natl Acad Sci U S A ; 120(31): e2304881120, 2023 08.
Article in English | MEDLINE | ID: mdl-37490530

ABSTRACT

Motivation influences goals, decisions, and memory formation. Imperative motivation links urgent goals to actions, narrowing the focus of attention and memory. Conversely, interrogative motivation integrates goals over time and space, supporting rich memory encoding for flexible future use. We manipulated motivational states via cover stories for a reinforcement learning task: The imperative group imagined executing a museum heist, whereas the interrogative group imagined planning a future heist. Participants repeatedly chose among four doors, representing different museum rooms, to sample trial-unique paintings with variable rewards (later converted to bonus payments). The next day, participants performed a surprise memory test. Crucially, only the cover stories differed between the imperative and interrogative groups; the reinforcement learning task was identical, and all participants had the same expectations about how and when bonus payments would be awarded. In an initial sample and a preregistered replication, we demonstrated that imperative motivation increased exploitation during reinforcement learning. Conversely, interrogative motivation increased directed (but not random) exploration, despite the cost to participants' earnings. At test, the interrogative group was more accurate at recognizing paintings and recalling associated values. In the interrogative group, higher value paintings were more likely to be remembered; imperative motivation disrupted this effect of reward modulating memory. Overall, we demonstrate that a prelearning motivational manipulation can bias learning and memory, bearing implications for education, behavior change, clinical interventions, and communication.


Subject(s)
Motivation, Reinforcement, Psychology, Humans, Learning, Reward, Mental Recall
19.
Proc Natl Acad Sci U S A ; 120(13): e2216524120, 2023 03 28.
Article in English | MEDLINE | ID: mdl-36961923

ABSTRACT

Patch foraging presents a sequential decision-making problem widely studied across organisms-stay with a current option or leave it in search of a better alternative? Behavioral ecology has identified an optimal strategy for these decisions, but, across species, foragers systematically deviate from it, staying too long with an option or "overharvesting" relative to this optimum. Despite the ubiquity of this behavior, the mechanism underlying it remains unclear and an object of extensive investigation. Here, we address this gap by approaching foraging as both a decision-making and learning problem. Specifically, we propose a model in which foragers 1) rationally infer the structure of their environment and 2) use their uncertainty over the inferred structure representation to adaptively discount future rewards. We find that overharvesting can emerge from this rational statistical inference and uncertainty adaptation process. In a patch-leaving task, we show that human participants adapt their foraging to the richness and dynamics of the environment in ways consistent with our model. These findings suggest that definitions of optimal foraging could be extended by considering how foragers reduce and adapt to uncertainty over representations of their environment.
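The optimal benchmark against which overharvesting is measured here is the marginal value theorem: leave a patch when its instantaneous reward rate falls below the environment's long-run average rate. A minimal sketch under assumed parameters (exponentially decaying in-patch rate, illustrative average rates):

```python
# Sketch of the marginal value theorem leave rule: depart when the
# in-patch rate r0*exp(-decay*t) drops below the environment's
# long-run average rate. All rates and the decay are assumptions.
import math

def leave_time(r0, decay, avg_rate, dt=0.01):
    """Time at which the in-patch reward rate falls to avg_rate."""
    t = 0.0
    while r0 * math.exp(-decay * t) > avg_rate:
        t += dt
    return t

t_rich = leave_time(r0=10.0, decay=0.5, avg_rate=2.0)  # rich environment
t_poor = leave_time(r0=10.0, decay=0.5, avg_rate=1.0)  # poor environment
```

The rule predicts longer patch residence in poorer environments; "overharvesting" is the empirical finding that foragers stay even longer than these optimal times.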


Subject(s)
Choice Behavior, Learning, Models, Theoretical, Decision Making, Environment, Humans
20.
Proc Natl Acad Sci U S A ; 120(28): e2221180120, 2023 07 11.
Article in English | MEDLINE | ID: mdl-37399387

ABSTRACT

Satisfying a variety of conflicting needs in a changing environment is a fundamental challenge for any adaptive agent. Here, we show that designing an agent in a modular fashion as a collection of subagents, each dedicated to a separate need, powerfully enhanced the agent's capacity to satisfy its overall needs. We used the formalism of deep reinforcement learning to investigate a biologically relevant multiobjective task: continually maintaining homeostasis of a set of physiologic variables. We then conducted simulations in a variety of environments and compared how modular agents performed relative to standard monolithic agents (i.e., agents that aimed to satisfy all needs in an integrated manner using a single aggregate measure of success). Simulations revealed that modular agents a) exhibited a form of exploration that was intrinsic and emergent rather than extrinsically imposed; b) were robust to changes in nonstationary environments, and c) scaled gracefully in their ability to maintain homeostasis as the number of conflicting objectives increased. Supporting analysis suggested that the robustness to changing environments and increasing numbers of needs were due to intrinsic exploration and efficiency of representation afforded by the modular architecture. These results suggest that the normative principles by which agents have adapted to complex changing environments may also explain why humans have long been described as consisting of "multiple selves."
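The modular architecture can be sketched as one subagent per physiologic need, each scoring candidate actions by how much they would reduce its own deficit, with the agent taking the action with the greatest summed score. The needs, setpoints, and action effects below are illustrative assumptions; the paper's agents are deep RL networks, not this lookup-table caricature:

```python
# Sketch of the modular architecture: one subagent per need, each
# scoring actions by deficit reduction; the agent acts on the
# greatest summed score. Needs and action effects are assumptions.

setpoints = {"energy": 1.0, "water": 1.0}
state = {"energy": 0.2, "water": 0.9}
# each action restores one homeostatic variable by a fixed amount
actions = {"eat": {"energy": 0.5}, "drink": {"water": 0.5}}

def subagent_score(need, action_effects):
    """How much this action would reduce this subagent's deficit."""
    deficit = setpoints[need] - state[need]
    return min(deficit, action_effects.get(need, 0.0))

def modular_choice():
    scores = {
        action: sum(subagent_score(need, effects) for need in setpoints)
        for action, effects in actions.items()
    }
    return max(scores, key=scores.get)

chosen = modular_choice()
```

With a large energy deficit and a nearly satisfied water need, the energy subagent dominates the vote, illustrating how per-need subagents arbitrate without a single aggregate objective.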


Subject(s)
Learning, Reinforcement, Psychology, Humans, Learning/physiology, Homeostasis