ABSTRACT
Attention filters sensory inputs to enhance task-relevant information. It is guided by an "attentional template" that represents the stimulus features that are currently relevant. To understand how the brain learns and uses templates, we trained monkeys to perform a visual search task that required them to repeatedly learn new attentional templates. Neural recordings revealed that templates were represented across prefrontal and parietal cortex in a structured manner, such that perceptually neighboring templates had similar neural representations. When the task changed, a new attentional template was learned by incrementally shifting the template toward rewarded features. Finally, we found that attentional templates transformed stimulus features into a common value representation that allowed the same decision-making mechanisms to deploy attention, regardless of the identity of the template. Altogether, our results provide insight into the neural mechanisms by which the brain learns to control attention and how attention can be flexibly deployed across tasks.
Subjects
Attention, Decision Making, Learning, Parietal Lobe, Reward, Animals, Haplorhini
ABSTRACT
Significant evidence supports the view that dopamine shapes learning by encoding reward prediction errors. However, it is unknown whether striatal targets receive tailored dopamine dynamics based on regional functional specialization. Here, we report wave-like spatiotemporal activity patterns in dopamine axons and release across the dorsal striatum. These waves switch between activational motifs and organize dopamine transients into localized clusters within functionally related striatal subregions. Notably, wave trajectories were tailored to task demands, propagating from dorsomedial to dorsolateral striatum when rewards were contingent on animal behavior and in the opposite direction when rewards were independent of behavioral responses. We propose a computational architecture in which striatal dopamine waves are sculpted by inference about agency and provide a mechanism to direct credit assignment to specialized striatal subregions. Supporting model predictions, dorsomedial dopamine activity during reward-pursuit signaled the extent of instrumental control and interacted with reward waves to predict future behavioral adjustments.
Subjects
Axons/metabolism, Animal Behavior, Corpus Striatum/metabolism, Dopamine/metabolism, Reward, Animals, Female, Male, Mice, Mutant Mice
ABSTRACT
Every decision we make is accompanied by a sense of confidence about its likely outcome. This sense informs subsequent behavior, such as investing more (whether time, effort, or money) when reward is more certain. A neural representation of confidence should originate from a statistical computation and predict confidence-guided behavior. An additional requirement for confidence representations to support metacognition is abstraction: they should emerge irrespective of the source of information and inform multiple confidence-guided behaviors. It is unknown whether neural confidence signals meet these criteria. Here, we show that single orbitofrontal cortex neurons in rats encode statistical decision confidence irrespective of the sensory modality, olfactory or auditory, used to make a choice. The activity of these neurons also predicts two confidence-guided behaviors: trial-by-trial time investment and cross-trial choice strategy updating. Orbitofrontal cortex thus represents decision confidence consistent with a metacognitive process that is useful for mediating confidence-guided economic decisions.
Subjects
Behavior/physiology, Prefrontal Cortex/physiology, Animals, Choice Behavior/physiology, Decision Making, Biological Models, Neurons/physiology, Long-Evans Rats, Sensation/physiology, Task Performance and Analysis, Time Factors
ABSTRACT
Nervous systems evolved to effectively navigate the dynamics of the environment to achieve their goals. One framework used to study this fundamental problem arose in the study of learning and decision-making. In this framework, the demands of effective behavior require slow dynamics, on the scale of seconds to minutes, in networks of neurons. Here, we review the phenomena and mechanisms involved. Using vignettes from a few species and areas of the nervous system, we view neuromodulators as key substrates for temporal scaling of neuronal dynamics.
Subjects
Decision Making, Neurophysiology, Decision Making/physiology, Learning/physiology, Neurons/physiology, Neurotransmitters
ABSTRACT
Eukaryotic cells learn and adapt via unknown network architectures. Recent work demonstrated a circuit of two GTPases used by cells to overcome growth factor scarcity, encouraging our view that artificial and biological intelligence share strikingly similar design principles and that cells function as deep reinforcement learning (RL) agents in uncertain environments.
Subjects
GTP Phosphohydrolases, Signal Transduction, GTP Phosphohydrolases/metabolism
ABSTRACT
Behavior is readily classified into patterns of movements with inferred common goals: actions. Goals may be discrete; movements are continuous. Through the careful study of isolated movements in laboratory settings, or via introspection, it has become clear that animals can exhibit exquisite graded specification of their movements. Moreover, in many naturalistic scenarios graded control can be as fundamental to success as the selection of which action to perform: a predator adjusting its speed to intercept moving prey, or a tool-user exerting the perfect amount of force to complete a delicate task. The basal ganglia are a collection of nuclei in vertebrates that extend from the forebrain (telencephalon) to the midbrain (mesencephalon), constituting a major descending extrapyramidal pathway for control over midbrain and brainstem premotor structures. Here we discuss how this pathway contributes to the continuous specification of movements that endows our voluntary actions with vigor and grace.
Subjects
Basal Ganglia/physiology, Behavior/physiology, Brain/physiology, Movement/physiology, Neural Pathways/physiology, Animals, Humans, Neurons/physiology
ABSTRACT
During foraging behavior, action values are persistently encoded in neural activity and updated depending on the history of choice outcomes. What is the neural mechanism for action value maintenance and updating? Here, we explore two contrasting network models: synaptic learning of action value versus neural integration. We show that both models can reproduce extant experimental data, but they yield distinct predictions about the underlying biological neural circuits. In particular, the neural integrator model but not the synaptic model requires that reward signals are mediated by neural pools selective for action alternatives and their projections are aligned with linear attractor axes in the valuation system. We demonstrate experimentally observable neural dynamical signatures and feasible perturbations to differentiate the two contrasting scenarios, suggesting that the synaptic model is a more robust candidate mechanism. Overall, this work provides a modeling framework to guide future experimental research on probabilistic foraging.
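The two candidate mechanisms contrasted above can be caricatured in a few lines. This is a minimal sketch under our own assumptions (a Rescorla-Wagner style synaptic update versus a leaky-integrator update, with illustrative parameters), not the paper's network models:

```python
import random

def synaptic_update(q, reward, alpha=0.1):
    # Synaptic model: plasticity stores the value estimate directly,
    # incrementing it toward each observed outcome.
    return q + alpha * (reward - q)

def integrator_step(x, reward_input, leak=0.9, gain=0.1):
    # Neural integrator model: value is maintained as persistent activity
    # along an attractor axis, nudged by action-selective reward inputs.
    return leak * x + gain * reward_input

rng = random.Random(1)
q = x = 0.0
for _ in range(1000):
    r = 1.0 if rng.random() < 0.7 else 0.0  # 70% reward probability
    q = synaptic_update(q, r)
    x = integrator_step(x, r)
# Both mechanisms settle near the reward rate (~0.7); their steady-state
# behavior is indistinguishable, which is why the paper proposes dynamical
# signatures and perturbations to tell them apart.
```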
Subjects
Choice Behavior, Reward, Brain, Learning, Neuronal Plasticity, Decision Making
ABSTRACT
Individual survival and evolutionary selection require biological organisms to maximize reward. Economic choice theories define the necessary and sufficient conditions, and neuronal signals of decision variables provide mechanistic explanations. Reinforcement learning (RL) formalisms use predictions, actions, and policies to maximize reward. Midbrain dopamine neurons code reward prediction errors (RPE) of subjective reward value suitable for RL. Electrical and optogenetic self-stimulation experiments demonstrate that monkeys and rodents repeat behaviors that result in dopamine excitation. Dopamine excitations reflect positive RPEs that increase reward predictions via RL; against increasing predictions, obtaining similar dopamine RPE signals again requires better rewards than before. The positive RPEs drive predictions higher again and thus advance a recursive reward-RPE-prediction iteration toward better and better rewards. Agents also avoid dopamine inhibitions that lower reward prediction via RL, which allows smaller rewards than before to elicit positive dopamine RPE signals and resume the iteration toward better rewards. In this way, dopamine RPE signals serve as a causal mechanism that attracts agents via RL to the best rewards. The mechanism improves daily life and benefits evolutionary selection but may also induce restlessness and greed.
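The recursive reward-RPE-prediction iteration described above can be illustrated with a toy simulation (a minimal sketch; parameter values and the exploration rule are ours, not the paper's):

```python
import random

def rpe_iteration(rewards, alpha=0.2, n_trials=500, seed=0):
    """Toy Rescorla-Wagner iteration over a set of reward options.
    Positive RPEs raise predictions, so only better-than-predicted
    options keep producing dopamine-like excitations, and behavior
    ratchets toward the best available reward."""
    rng = random.Random(seed)
    V = [0.0] * len(rewards)  # reward predictions, one per option
    for _ in range(n_trials):
        # mostly exploit the highest prediction, occasionally explore
        if rng.random() > 0.1:
            a = max(range(len(V)), key=lambda i: V[i])
        else:
            a = rng.randrange(len(V))
        rpe = rewards[a] - V[a]   # dopamine-like reward prediction error
        V[a] += alpha * rpe       # positive RPE ratchets the prediction up
    return V

V = rpe_iteration([1.0, 2.0, 5.0])
# Predictions converge toward actual reward sizes, and the best option
# comes to dominate choice.
```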
Subjects
Dopamine, Dopaminergic Neurons, Reward, Animals, Dopamine/metabolism, Dopaminergic Neurons/physiology, Dopaminergic Neurons/metabolism, Humans, Psychological Reinforcement
ABSTRACT
The adaptive and surprising emergent properties of biological materials self-assembled in far-from-equilibrium environments serve as an inspiration for efforts to design nanomaterials. In particular, controlling the conditions of self-assembly can modulate material properties, but there is no systematic understanding of either how to parameterize external control or how controllable a given material can be. Here, we demonstrate that branched actin networks can be encoded with metamaterial properties by dynamically controlling the applied force under which they grow and that the protocols can be selected using multi-task reinforcement learning. These actin networks have tunable responses over a large dynamic range depending on the chosen external protocol, providing a pathway to encoding "memory" within these structures. Interestingly, we obtain a bound that relates the dissipation rate and the rate of "encoding" that gives insight into the constraints on control-both physical and information theoretical. Taken together, these results emphasize the utility and necessity of nonequilibrium control for designing self-assembled nanostructures.
Subjects
Actins, Nanostructures, Actins/metabolism, Nanostructures/chemistry
ABSTRACT
Do people's attitudes toward the (a)symmetry of an outcome distribution affect their choices? Financial investors seek return distributions with frequent small returns but few large ones, consistent with leading models of choice in economics and finance that assume right-skewed preferences. In contrast, many experiments in which decision-makers learn about choice options through experience find the opposite choice tendency, in favor of left-skewed options. To reconcile these seemingly contradictory findings, the present work investigates the effect of skewness on choices in experience-based decisions. Across seven studies, we show that apparent preferences for left-skewed outcome distributions are a consequence of those distributions having a higher value in most direct outcome comparisons, a "frequent-winner effect." By manipulating which option is the frequent winner, we show that choice tendencies for frequent winners can be obtained even with identical outcome distributions. Moreover, systematic choice tendencies in favor of right- or left-skewed options can be obtained by manipulating which option is experienced as the frequent winner. We also find evidence for an intrinsic preference for right-skewed outcome distributions. The frequent-winner phenomenon is robust to variations in outcome distributions and experimental paradigms. These findings are confirmed by computational analyses in which a reinforcement-learning model capturing frequent winning and intrinsic skewness preferences provides the best account of the data. Our work reconciles conflicting findings of aggregated behavior in financial markets and experiments and highlights the need for theories of decision-making sensitive to joint outcome distributions of the available options.
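The "frequent-winner effect" has a simple quantitative core: an option can have no advantage in expectation yet win most head-to-head outcome comparisons. A small simulation with made-up, equal-mean distributions illustrates it (the distributions and sample sizes are invented for illustration, not taken from the studies):

```python
import random

def frequent_winner_share(dist_a, dist_b, n=10000, seed=3):
    """Fraction of paired draws in which option A's outcome beats B's."""
    rng = random.Random(seed)
    wins = sum(rng.choice(dist_a) > rng.choice(dist_b) for _ in range(n))
    return wins / n

# Two options with identical expected value (both EV = 10):
left_skewed = [12] * 9 + [-8]   # frequent high outcome, rare large loss
right_skewed = [8] * 9 + [28]   # frequent low outcome, rare large gain

share = frequent_winner_share(left_skewed, right_skewed)
# The left-skewed option wins ~81% of direct comparisons (0.9 * 0.9),
# so a learner driven by outcome comparisons drifts toward it despite
# the equal means.
```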
Subjects
Choice Behavior, Decision Making, Humans, Learning, Psychological Reinforcement
ABSTRACT
Enhanced sampling techniques have traditionally encountered two significant challenges: identifying suitable reaction coordinates and addressing the exploration-exploitation dilemma, particularly the difficulty of escaping local energy minima. Here, we introduce Adaptive CVgen, a universal adaptive sampling framework designed to tackle these issues. Our approach utilizes a set of collective variables (CVs) to comprehensively cover the system's potential evolutionary phase space, generating diverse reaction coordinates to address the first challenge. Moreover, we integrate reinforcement learning strategies to dynamically adjust the generated reaction coordinates, thereby effectively balancing the exploration-exploitation dilemma. We apply this framework to sample the conformational space of six proteins transitioning from completely disordered states to folded states, as well as to model the chemical synthesis process of C60, achieving conformations that perfectly match the standard C60 structure. The results demonstrate Adaptive CVgen's effectiveness in exploring new conformations and escaping local minima, achieving both sampling efficiency and exploration accuracy. The framework can potentially be extended to various related challenges, including protein folding dynamics, drug targeting, and complex chemical reactions, opening promising avenues for application in these fields.
Subjects
Protein Folding, Proteins/chemistry, Protein Conformation, Algorithms, Molecular Dynamics Simulation
ABSTRACT
Accelerating the measurement for discrimination of samples, such as classification of cell phenotype, is crucial when faced with significant time and cost constraints. Spontaneous Raman microscopy offers label-free, rich chemical information but suffers from long acquisition time due to extremely small scattering cross-sections. One possible approach to accelerate the measurement is by measuring necessary parts with a suitable number of illumination points. However, how to design these points during measurement remains a challenge. To address this, we developed an imaging technique based on reinforcement learning, a machine learning (ML) approach. The technique adaptively feeds back an "optimal" illumination pattern during the measurement to detect the existence of specific characteristics of interest, allowing faster measurements while guaranteeing discrimination accuracy. Using a set of Raman images of human follicular thyroid and follicular thyroid carcinoma cells, we showed that our technique requires 3,333 to 31,683 times fewer illuminations than raster scanning to discriminate the phenotypes. To quantitatively evaluate the number of illuminations depending on the requisite discrimination accuracy, we prepared a set of polymer bead mixture samples to model anomalous and normal tissues. We then applied a home-built programmable-illumination microscope equipped with our algorithm, and confirmed that the system can discriminate the sample conditions with 104 to 4,350 times fewer illuminations than standard point-illumination Raman microscopy. The proposed algorithm can be applied to other types of microscopy that can control measurement conditions on the fly, offering an approach for the acceleration of accurate measurements in various applications including medical diagnosis.
Subjects
Microscopy, Raman Spectroscopy, Humans, Microscopy/methods, Raman Spectroscopy/methods, Thyroid Gland, Nonlinear Optical Microscopy, Machine Learning
ABSTRACT
In everyday life, the outcomes of our actions are rarely certain. Further, we often lack the information needed to precisely estimate the probability and value of potential outcomes as well as how much effort will be required by the courses of action under consideration. Under such conditions of uncertainty, individual differences in the estimation and weighting of these variables, and in reliance on model-free versus model-based decision making, have the potential to strongly influence our behavior. Both anxiety and depression are associated with difficulties in decision making. Further, anxiety is linked to increased engagement in threat-avoidance behaviors and depression is linked to reduced engagement in reward-seeking behaviors. The precise deficits, or biases, in decision making associated with these common forms of psychopathology remain to be fully specified. In this article, we review evidence for which of the computations supporting decision making are altered in anxiety and depression and consider the potential consequences for action selection. In addition, we provide a schematic framework that integrates the findings reviewed and will hopefully be of value to future studies.
Subjects
Anxiety, Computer Simulation, Decision Making/physiology, Depression, Animals, Humans, Reward
ABSTRACT
The evolution of drug resistance leads to treatment failure and tumor progression. Intermittent androgen deprivation therapy (IADT) helps responsive cancer cells compete with resistant cancer cells in intratumoral competition. However, conventional IADT is population-based, ignoring the heterogeneity of patients and cancer. Additionally, existing IADT relies on pre-determined thresholds of prostate-specific antigen to pause and resume treatment, which is not optimized for individual patients. To address these challenges, we developed a data-driven method in two steps. First, we developed a time-varied, mixed-effect and generative Lotka-Volterra (tM-GLV) model to account for the heterogeneity of the evolution mechanism and the pharmacokinetics of two ADT drugs, cyproterone acetate and leuprolide acetate, for individual patients. Then, we proposed a reinforcement-learning-enabled individualized IADT framework, namely, I²ADT, to learn the patient-specific tumor dynamics and derive the optimal drug administration policy. Experiments with clinical trial data demonstrated that the proposed I²ADT can significantly prolong the time to progression of prostate cancer patients with reduced cumulative drug dosage. We further validated the efficacy of the proposed methods with data from a recent pilot clinical trial. Moreover, the adaptability of I²ADT makes it a promising tool for other cancers with the availability of clinical data, where treatment regimens might need to be individualized based on patient characteristics and disease dynamics. Our research elucidates the application of deep reinforcement learning to identify personalized adaptive cancer therapy.
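The abstract does not give the tM-GLV equations, so as a hedged illustration of the underlying idea only, here is a generic two-population competitive Lotka-Volterra model with an intermittent kill term applied to therapy-sensitive cells (all parameter values are invented; the paper's model is patient-specific and includes drug pharmacokinetics):

```python
def lv_step(s, r, dt=0.01, therapy_on=False):
    """One Euler step of a toy competitive Lotka-Volterra model.
    s = therapy-sensitive cells, r = resistant cells (arbitrary units).
    Both populations compete for a shared carrying capacity, which is
    the mechanism IADT exploits: sensitive cells, when alive, suppress
    the resistant clone."""
    K = 1.0                                    # shared carrying capacity
    growth_s = 0.5 * s * (1.0 - (s + r) / K)
    growth_r = 0.3 * r * (1.0 - (s + r) / K)
    kill_s = 0.8 * s if therapy_on else 0.0    # ADT kills only sensitive cells
    return s + dt * (growth_s - kill_s), r + dt * growth_r

# One intermittent schedule (alternating on/off blocks). With these
# illustrative parameters the resistant clone still expands over time,
# which is exactly the failure mode that optimized, patient-specific
# scheduling aims to delay.
s, r = 0.5, 0.05
for t in range(20000):
    on = (t // 5000) % 2 == 0
    s, r = lv_step(s, r, therapy_on=on)
```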
Subjects
Prostatic Neoplasms, Male, Humans, Prostatic Neoplasms/drug therapy, Prostatic Neoplasms/genetics, Prostatic Neoplasms/pathology, Androgen Antagonists/therapeutic use, Androgens/therapeutic use
ABSTRACT
Using amino acid residues in peptide generation has solved several key problems, including precise control of amino acid sequence order, customized peptides for property modification, and large-scale peptide synthesis. Proteins contain unknown amino acid residues. Extracting them for the synthesis of drug-like peptides can create novel structures with unique properties, driving drug development. Computer-aided design of novel peptide drug molecules can solve the high-cost and low-efficiency problems in the traditional drug discovery process. Previous studies were limited in enhancing the bioactivity and drug-likeness of polypeptide drugs because they placed little emphasis on the connectivity between amino acid residues. Thus, we proposed a reinforcement learning-driven generation model based on graph attention mechanisms for peptide generation. By harnessing the advantages of graph attention mechanisms, this model effectively captured the connectivity structures between amino acid residues in peptides. Simultaneously, leveraging reinforcement learning's strength in guiding optimal sequence searches provided a novel approach to peptide design and optimization. This model introduces an actor-critic framework with real-time feedback loops to achieve dynamic balance between attributes, which can customize the generation of multiple peptides for specific targets and enhance the affinity between peptides and targets. Experimental results demonstrate that the generated drug-like peptides meet specified absorption, distribution, metabolism, excretion, and toxicity properties and bioactivity with a success rate of over 90%, thereby significantly accelerating the process of drug-like peptide generation.
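The paper's graph-attention actor-critic model is far richer than anything reproducible from its abstract. As a minimal, generic illustration of reward-guided sequence generation, here is a plain REINFORCE loop over the amino-acid alphabet with a toy objective; the "desired" residue set, the position-independent policy, and all hyperparameters are invented for this sketch:

```python
import math
import random

AA = "ACDEFGHIKLMNPQRSTVWY"  # standard amino-acid one-letter codes

def softmax(logits):
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]

def toy_reward(pep):
    # Stand-in objective: fraction of residues in an arbitrary "desired"
    # set (a real model would score bioactivity, ADMET, affinity, etc.).
    return sum(c in "KRH" for c in pep) / len(pep)

def reinforce(n_iter=2000, length=8, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(AA)  # one shared policy for every position
    for _ in range(n_iter):
        p = softmax(logits)
        idx = [rng.choices(range(len(AA)), weights=p)[0] for _ in range(length)]
        pep = "".join(AA[i] for i in idx)
        reward = toy_reward(pep)
        # REINFORCE update (no baseline): push probability toward the
        # sampled residues in proportion to the sequence's reward.
        for i in idx:
            for j in range(len(AA)):
                grad = (1.0 if j == i else 0.0) - p[j]
                logits[j] += lr * reward * grad
    return softmax(logits)

p = reinforce()
prob_desired = sum(p[AA.index(c)] for c in "KRH")
# The policy concentrates its probability mass on the rewarded residues.
```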
Subjects
Peptides, Peptides/chemistry, Amino Acid Sequence, Drug Discovery, Drug Design, Algorithms, Computer-Aided Design, Humans
ABSTRACT
Antimicrobial peptides (AMPs), short peptides with diverse functions, effectively target and combat various organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance. Due to their low drug resistance and toxicity, AMPs are considered promising substitutes for traditional antibiotics. While existing deep learning technology enhances AMP generation, it also presents certain challenges. First, existing AMP generation overlooks the complex interdependencies among amino acids. Second, current models fail to integrate crucial tasks like screening, attribute prediction and iterative optimization. Consequently, we develop an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction and iterative optimization. We innovatively integrate kinetic diffusion and attention mechanisms into the reinforcement learning framework for efficient AMP generation. Additionally, our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We employ a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. This framework automates molecule generation, screening, attribute prediction and optimization, thereby advancing AMP research. We have also deployed Diff-AMP on a web server, with code, data and server details available in the Data Availability section.
Subjects
Amino Acids, Antimicrobial Peptides, Anti-Bacterial Agents, Diffusion, Kinetics
ABSTRACT
Recent advances in cancer immunotherapy have highlighted the potential of neoantigen-based vaccines. However, the design of such vaccines is hindered by the possibility of weak binding affinity between the peptides and the patient's specific human leukocyte antigen (HLA) alleles, which may not elicit a robust adaptive immune response. Triggering cross-immunity by utilizing peptide mutations that have enhanced binding affinity to target HLA molecules, while preserving their homology with the original one, can be a promising avenue for neoantigen vaccine design. In this study, we introduced UltraMutate, a novel algorithm that combines Reinforcement Learning and Monte Carlo Tree Search, which identifies peptide mutations that not only exhibit enhanced binding affinities to target HLA molecules but also retain a high degree of homology with the original neoantigen. UltraMutate outperformed existing state-of-the-art methods in identifying affinity-enhancing mutations in an independent test set consisting of 3660 peptide-HLA pairs. UltraMutate further showed its applicability in the design of peptide vaccines for Human Papillomavirus and Human Cytomegalovirus, demonstrating its potential as a promising tool in the advancement of personalized immunotherapy.
Subjects
Algorithms, Cancer Vaccines, Monte Carlo Method, Humans, Cancer Vaccines/immunology, Cancer Vaccines/genetics, HLA Antigens/immunology, HLA Antigens/genetics, Neoplasm Antigens/immunology, Neoplasm Antigens/genetics, Mutation
ABSTRACT
Enhancing patient response to immune checkpoint inhibitors (ICIs) is crucial in cancer immunotherapy. We aim to create a data-driven mathematical model of the tumor immune microenvironment (TIME) and utilize deep reinforcement learning (DRL) to optimize patient-specific ICI therapy combined with chemotherapy (ICC). Using patients' genomic and transcriptomic data, we develop an ordinary differential equations (ODEs)-based TIME dynamic evolutionary model to characterize interactions among chemotherapy, ICIs, immune cells, and tumor cells. A DRL agent is trained to determine the personalized optimal ICC therapy. Numerical experiments with real-world data demonstrate that the proposed TIME model can predict ICI therapy response. The DRL-derived personalized ICC therapy outperforms predefined fixed schedules. For tumors with extremely low CD8+ T cell infiltration ('extremely cold tumors'), the DRL agent recommends high-dosage chemotherapy alone. For tumors with higher CD8+ T cell infiltration ('cold' and 'hot tumors'), an appropriate chemotherapy dosage induces CD8+ T cell proliferation, enhancing ICI therapy outcomes. Specifically, for 'hot tumors', chemotherapy and ICI are administered simultaneously, while for 'cold tumors', a mid-dosage of chemotherapy makes the TIME 'hotter' before ICI administration. However, in several 'cold tumors' with rapid resistant tumor cell growth, ICC eventually fails. This study highlights the potential of utilizing real-world clinical data and a DRL algorithm to develop personalized optimal ICC by understanding the complex biological dynamics of a patient's TIME. Our ODE-based TIME dynamic evolutionary model offers a theoretical framework for determining the best use of ICI, and the proposed DRL agent may guide personalized ICC schedules.
Subjects
Immune Checkpoint Inhibitors, Neoplasms, Tumor Microenvironment, Humans, Tumor Microenvironment/immunology, Immune Checkpoint Inhibitors/therapeutic use, Immune Checkpoint Inhibitors/pharmacology, Neoplasms/drug therapy, Neoplasms/immunology, CD8-Positive T-Lymphocytes/immunology, CD8-Positive T-Lymphocytes/drug effects, Precision Medicine, Immunotherapy
ABSTRACT
Long-range olfactory search is an extremely difficult task in view of the sparsity of odor signals that are available to the searcher and the complex encoding of the information about the source location. Current algorithmic approaches typically require a continuous memory space, sometimes of large dimensionality, which may hamper their optimization and often obscure their interpretation. Here, we show how finite-state controllers with a small set of discrete memory states are expressive enough to display rich, time-extended behavioral modules that resemble the ones observed in living organisms. Finite-state controllers optimized for olfactory search have an immediate interpretation in terms of approximate clocks and coarse-grained spatial maps, suggesting connections with neural models of search behavior.
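A hand-built toy makes the point concrete: a handful of discrete memory states suffice to implement an approximate clock since the last odor detection, producing surge bouts that decay into casting. The optimized controllers in the paper are found by policy search; this fixed controller and its state/action names are only illustrative:

```python
def controller(memory, odor_detected, max_wait=3):
    """Finite-state controller: returns (new_memory, action).
    States 0..max_wait act as a coarse clock since the last detection;
    states max_wait+1 / max_wait+2 alternate crosswind casting."""
    if odor_detected:
        return 0, "surge"                 # reset the clock, head upwind
    if memory < max_wait:
        return memory + 1, "surge"        # clock still running: keep surging
    if memory == max_wait:
        return max_wait + 1, "cast_left"  # clock expired: start casting
    # toggle between the two casting states
    if memory == max_wait + 1:
        return max_wait + 2, "cast_right"
    return max_wait + 1, "cast_left"

# A sparse detection sequence yields surge bouts that decay into casting:
m, trace = 0, []
for hit in [True, False, False, False, False, False, True, False]:
    m, action = controller(m, hit)
    trace.append(action)
```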
Subjects
Odorants, Smell
ABSTRACT
Patch foraging presents a sequential decision-making problem widely studied across organisms: stay with a current option or leave it in search of a better alternative? Behavioral ecology has identified an optimal strategy for these decisions, but, across species, foragers systematically deviate from it, staying too long with an option or "overharvesting" relative to this optimum. Despite the ubiquity of this behavior, the mechanism underlying it remains unclear and an object of extensive investigation. Here, we address this gap by approaching foraging as both a decision-making and learning problem. Specifically, we propose a model in which foragers 1) rationally infer the structure of their environment and 2) use their uncertainty over the inferred structure representation to adaptively discount future rewards. We find that overharvesting can emerge from this rational statistical inference and uncertainty adaptation process. In a patch-leaving task, we show that human participants adapt their foraging to the richness and dynamics of the environment in ways consistent with our model. These findings suggest that definitions of optimal foraging could be extended by considering how foragers reduce and adapt to uncertainty over representations of their environment.
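The overharvesting pattern can be illustrated with a marginal-value-theorem toy model: a forager that discounts the delayed payoff of traveling to a fresh patch leaves later than the rate-maximizing optimum. The paper's mechanism is uncertainty-adaptive discounting over an inferred environment model; this fixed-discount sketch, with invented parameter values, only shows the direction of the effect:

```python
import math

def leave_time(discount=0.0, r0=10.0, decay=0.5, travel=2.0, dt=0.001):
    """Return the time at which the forager leaves a depleting patch.

    The patch yields reward at rate r0*exp(-decay*t). With discount=0
    this recovers the marginal value theorem: leave when the current
    intake rate falls to the long-run average rate (travel included).
    With discount > 0 the delayed payoff of leaving is devalued, the
    leaving threshold drops, and the forager overharvests.
    """
    t = dt
    while t < 100.0:
        gained = r0 / decay * (1.0 - math.exp(-decay * t))
        avg_rate = gained / (t + travel)         # long-run intake rate
        instant = r0 * math.exp(-decay * t)      # current intake rate
        if instant <= avg_rate * math.exp(-discount * travel):
            return t
        t += dt
    return t

t_opt = leave_time(discount=0.0)   # MVT-optimal leave time
t_over = leave_time(discount=0.5)  # discounted forager stays longer
```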