Results 1-20 of 710
1.
Cell ; 187(6): 1476-1489.e21, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38401541

ABSTRACT

Attention filters sensory inputs to enhance task-relevant information. It is guided by an "attentional template" that represents the stimulus features that are currently relevant. To understand how the brain learns and uses templates, we trained monkeys to perform a visual search task that required them to repeatedly learn new attentional templates. Neural recordings found that templates were represented across the prefrontal and parietal cortex in a structured manner, such that perceptually neighboring templates had similar neural representations. When the task changed, a new attentional template was learned by incrementally shifting the template toward rewarded features. Finally, we found that attentional templates transformed stimulus features into a common value representation that allowed the same decision-making mechanisms to deploy attention, regardless of the identity of the template. Altogether, our results provide insight into the neural mechanisms by which the brain learns to control attention and how attention can be flexibly deployed across tasks.
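The incremental template-shifting mechanism reported here is, in essence, a delta rule. The sketch below is a minimal illustration under assumed parameters (a one-dimensional feature template, a fixed learning rate, and a toy reward rule), not the authors' model:

```python
import random

def update_template(template, probe, rewarded, alpha=0.3):
    """Delta-rule shift of a one-dimensional attentional template:
    after a rewarded search, the template moves toward the probed
    feature value. alpha is an illustrative learning rate."""
    return template + alpha * (probe - template) if rewarded else template

random.seed(1)
template, target = 0.2, 0.8          # current template vs. rewarded feature
for trial in range(200):
    probe = template + random.uniform(-0.5, 0.5)   # explore near the template
    rewarded = abs(probe - target) < 0.25          # reward for near-target probes
    template = update_template(template, probe, rewarded)
print(round(template, 2))            # has drifted into the rewarded region
```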


Subjects
Attention, Decision Making, Learning, Parietal Lobe, Reward, Animals, Haplorhini
2.
Trends Biochem Sci ; 49(4): 286-289, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38341333

ABSTRACT

Eukaryotic cells learn and adapt via unknown network architectures. Recent work demonstrated a circuit of two GTPases used by cells to overcome growth factor scarcity, encouraging our view that artificial and biological intelligence share strikingly similar design principles and that cells function as deep reinforcement learning (RL) agents in uncertain environments.


Subjects
GTP Phosphohydrolases, Signal Transduction, GTP Phosphohydrolases/metabolism
3.
Proc Natl Acad Sci U S A ; 121(14): e2318521121, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38551832

ABSTRACT

During foraging behavior, action values are persistently encoded in neural activity and updated depending on the history of choice outcomes. What is the neural mechanism for action value maintenance and updating? Here, we explore two contrasting network models: synaptic learning of action value versus neural integration. We show that both models can reproduce extant experimental data, but they yield distinct predictions about the underlying biological neural circuits. In particular, the neural integrator model but not the synaptic model requires that reward signals are mediated by neural pools selective for action alternatives and their projections are aligned with linear attractor axes in the valuation system. We demonstrate experimentally observable neural dynamical signatures and feasible perturbations to differentiate the two contrasting scenarios, suggesting that the synaptic model is a more robust candidate mechanism. Overall, this work provides a modeling framework to guide future experimental research on probabilistic foraging.
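The contrast between the two candidate mechanisms can be caricatured in a few lines. This is a deliberately simplified sketch with scalar values and invented rates, not the authors' network models: the synaptic account stores each action value in a plastic weight updated by a reward prediction error, while the integrator account accumulates action-selective reward inputs with a slow leak:

```python
def synaptic_update(q, action, reward, alpha=0.1):
    """Synaptic account: the action value lives in a plastic weight
    updated by a reward prediction error (a standard delta rule)."""
    q[action] += alpha * (reward - q[action])
    return q

def integrator_update(x, action, reward, leak=0.95, gain=0.05):
    """Neural-integrator account: the value is persistent activity
    along a linear attractor axis, driven by action-selective
    reward inputs and decaying with a slow leak."""
    x[action] = leak * x[action] + gain * reward
    return x

q, x = [0.0, 0.0], [0.0, 0.0]
for r in [1, 1, 0, 1, 0, 0, 1, 1]:     # toy outcome history for action 0
    q = synaptic_update(q, 0, r)
    x = integrator_update(x, 0, r)
print([round(v, 3) for v in q], [round(v, 3) for v in x])
```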


Subjects
Choice Behavior, Reward, Brain, Learning, Neuronal Plasticity, Decision Making
4.
Proc Natl Acad Sci U S A ; 121(20): e2316658121, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38717856

ABSTRACT

Individual survival and evolutionary selection require biological organisms to maximize reward. Economic choice theories define the necessary and sufficient conditions, and neuronal signals of decision variables provide mechanistic explanations. Reinforcement learning (RL) formalisms use predictions, actions, and policies to maximize reward. Midbrain dopamine neurons code reward prediction errors (RPE) of subjective reward value suitable for RL. Electrical and optogenetic self-stimulation experiments demonstrate that monkeys and rodents repeat behaviors that result in dopamine excitation. Dopamine excitations reflect positive RPEs that increase reward predictions via RL; against increasing predictions, obtaining similar dopamine RPE signals again requires better rewards than before. The positive RPEs drive predictions higher again and thus advance a recursive reward-RPE-prediction iteration toward better and better rewards. Agents also avoid dopamine inhibitions that lower reward prediction via RL, which allows smaller rewards than before to elicit positive dopamine RPE signals and resume the iteration toward better rewards. In this way, dopamine RPE signals serve as a causal mechanism that attracts agents via RL to the best rewards. The mechanism improves daily life and benefits evolutionary selection but may also induce restlessness and greed.
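The recursive reward-RPE-prediction loop is the familiar Rescorla-Wagner/temporal-difference update applied repeatedly. The walk-through below uses invented numbers purely to illustrate the dynamics described in the abstract:

```python
def rpe_iteration(rewards, alpha=0.5):
    """Prediction chases reward: positive RPEs push the prediction up,
    so the same reward soon yields a smaller RPE, and only a *better*
    reward re-creates a strong positive dopamine-like signal."""
    prediction = 0.0
    for r in rewards:
        rpe = r - prediction          # dopamine-like prediction error
        prediction += alpha * rpe     # RL update of the reward prediction
        print(f"reward={r:.1f}  RPE={rpe:+.2f}  new prediction={prediction:.2f}")

# A constant reward drives the RPE toward zero; an improved reward
# briefly restores a positive RPE, restarting the climb.
rpe_iteration([1.0, 1.0, 1.0, 2.0, 2.0])
```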


Subjects
Dopamine, Dopaminergic Neurons, Reward, Animals, Dopamine/metabolism, Dopaminergic Neurons/physiology, Dopaminergic Neurons/metabolism, Humans, Reinforcement (Psychology)
5.
Proc Natl Acad Sci U S A ; 121(8): e2310238121, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38359294

ABSTRACT

The adaptive and surprising emergent properties of biological materials self-assembled in far-from-equilibrium environments serve as an inspiration for efforts to design nanomaterials. In particular, controlling the conditions of self-assembly can modulate material properties, but there is no systematic understanding of either how to parameterize external control or how controllable a given material can be. Here, we demonstrate that branched actin networks can be encoded with metamaterial properties by dynamically controlling the applied force under which they grow and that the protocols can be selected using multi-task reinforcement learning. These actin networks have tunable responses over a large dynamic range depending on the chosen external protocol, providing a pathway to encoding "memory" within these structures. Interestingly, we obtain a bound that relates the dissipation rate and the rate of "encoding" that gives insight into the constraints on control, both physical and information-theoretic. Taken together, these results emphasize the utility and necessity of nonequilibrium control for designing self-assembled nanostructures.


Subjects
Actins, Nanostructures, Actins/metabolism, Nanostructures/chemistry
6.
Proc Natl Acad Sci U S A ; 121(12): e2304866121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38483992

ABSTRACT

Accelerating the measurements used to discriminate samples, such as classification of cell phenotype, is crucial when faced with significant time and cost constraints. Spontaneous Raman microscopy offers label-free, rich chemical information but suffers from long acquisition times due to extremely small scattering cross-sections. One possible way to accelerate the measurement is to measure only the necessary parts with a suitable number of illumination points. However, how to design these points during measurement remains a challenge. To address this, we developed an imaging technique based on reinforcement learning, a branch of machine learning (ML). This ML approach adaptively feeds back an "optimal" illumination pattern during the measurement to detect the existence of specific characteristics of interest, allowing faster measurements while guaranteeing discrimination accuracy. Using a set of Raman images of human follicular thyroid and follicular thyroid carcinoma cells, we showed that our technique requires 3,333 to 31,683 times fewer illuminations than raster scanning to discriminate the phenotypes. To quantitatively evaluate the number of illuminations required for a given discrimination accuracy, we prepared a set of polymer bead mixture samples to model anomalous and normal tissues. We then applied a home-built programmable-illumination microscope equipped with our algorithm and confirmed that the system can discriminate the sample conditions with 104 to 4,350 times fewer illuminations than standard point-illumination Raman microscopy. The proposed algorithm can be applied to other types of microscopy that can control measurement conditions on the fly, offering a route to accelerating accurate measurements in a range of applications, including medical diagnosis.
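As a conceptual sketch only (not the authors' algorithm), the measure-until-confident idea can be expressed as sequential evidence accumulation: measure one illumination point at a time and stop once the accumulated log-likelihood ratio for "anomalous versus normal" clears a threshold. The signal model, scan order, and threshold below are all assumptions:

```python
import random

def adaptive_scan(signals, log_lr, threshold=5.0):
    """Accumulate evidence one illumination point at a time and stop
    as soon as the log-likelihood ratio for anomalous vs. normal is
    decisive. The fixed scan order stands in for the learned
    illumination policy; a real system would also choose *where*."""
    evidence, used = 0.0, 0
    for s in signals:
        evidence += log_lr(s)
        used += 1
        if abs(evidence) > threshold:   # confident enough: stop early
            break
    return evidence > 0, used

random.seed(0)
signals = [random.gauss(1.0, 1.0) for _ in range(1000)]  # toy "anomalous" sample
log_lr = lambda s: s - 0.5   # exact log-LR for N(1,1) versus N(0,1)
print(adaptive_scan(signals, log_lr))  # decides after a handful of points, not 1000
```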


Subjects
Microscopy, Raman Spectral Analysis, Humans, Microscopy/methods, Raman Spectral Analysis/methods, Thyroid Gland, Nonlinear Optical Microscopy, Machine Learning
7.
Proc Natl Acad Sci U S A ; 121(12): e2317751121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38489382

ABSTRACT

Do people's attitudes toward the (a)symmetry of an outcome distribution affect their choices? Financial investors seek return distributions with frequent small returns but few large ones, consistent with leading models of choice in economics and finance that assume right-skewed preferences. In contrast, many experiments in which decision-makers learn about choice options through experience find the opposite choice tendency, in favor of left-skewed options. To reconcile these seemingly contradictory findings, the present work investigates the effect of skewness on choices in experience-based decisions. Across seven studies, we show that apparent preferences for left-skewed outcome distributions are a consequence of those distributions having a higher value in most direct outcome comparisons, a "frequent-winner effect." By manipulating which option is the frequent winner, we show that choice tendencies for frequent winners can be obtained even with identical outcome distributions. Moreover, systematic choice tendencies in favor of right- or left-skewed options can be obtained by manipulating which option is experienced as the frequent winner. We also find evidence for an intrinsic preference for right-skewed outcome distributions. The frequent-winner phenomenon is robust to variations in outcome distributions and experimental paradigms. These findings are confirmed by computational analyses in which a reinforcement-learning model capturing frequent winning and intrinsic skewness preferences provides the best account of the data. Our work reconciles conflicting findings of aggregated behavior in financial markets and experiments and highlights the need for theories of decision-making sensitive to joint outcome distributions of the available options.
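The frequent-winner effect is easy to reproduce in simulation. The toy distributions below are invented for illustration: both options have the same mean, but the left-skewed one wins most pairwise outcome comparisons:

```python
import random

random.seed(42)

def left_skewed():
    """Mostly large outcomes, occasional small one (mean = 2.5)."""
    return 3.0 if random.random() < 0.8 else 0.5

def right_skewed():
    """Mostly small outcomes, occasional large one (mean = 2.5)."""
    return 2.0 if random.random() < 0.8 else 4.5

trials, wins = 100_000, 0
for _ in range(trials):
    if left_skewed() > right_skewed():
        wins += 1
print(f"left-skewed option wins {wins / trials:.0%} of comparisons")  # ~64%
```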


Subjects
Choice Behavior, Decision Making, Humans, Learning, Reinforcement (Psychology)
8.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493345

ABSTRACT

The evolution of drug resistance leads to treatment failure and tumor progression. Intermittent androgen deprivation therapy (IADT) helps responsive cancer cells compete with resistant cancer cells in intratumoral competition. However, conventional IADT is population-based, ignoring the heterogeneity of patients and cancers. Additionally, existing IADT relies on pre-determined prostate-specific antigen thresholds to pause and resume treatment, which is not optimized for individual patients. To address these challenges, we developed a data-driven method in two steps. First, we developed a time-varied, mixed-effect and generative Lotka-Volterra (tM-GLV) model that accounts for the heterogeneity of the evolution mechanism and the pharmacokinetics of two ADT drugs, cyproterone acetate and leuprolide acetate, for individual patients. Then, we proposed a reinforcement-learning-enabled individualized IADT framework, namely I²ADT, to learn patient-specific tumor dynamics and derive the optimal drug administration policy. Experiments with clinical trial data demonstrated that the proposed I²ADT can significantly prolong the time to progression of prostate cancer patients with reduced cumulative drug dosage. We further validated the efficacy of the proposed method with data from a recent pilot clinical trial. Moreover, the adaptability of I²ADT makes it a promising tool for other cancers where clinical data are available and treatment regimens might need to be individualized based on patient characteristics and disease dynamics. Our research elucidates the application of deep reinforcement learning to identify personalized adaptive cancer therapy.
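The competition backbone of the tM-GLV model can be illustrated with a plain two-population Lotka-Volterra sketch. All rates below are invented, and the fixed PSA threshold stands in for the policy that I²ADT would learn; the real model adds time-varying, mixed-effect, and pharmacokinetic terms:

```python
def lv_step(s, r, on_treatment, dt=0.1):
    """One Euler step for treatment-sensitive (s) and resistant (r)
    tumor subpopulations competing for a shared carrying capacity.
    Treatment kills sensitive cells only; all rates are invented."""
    K = 1.0
    growth_s = 0.30 * s * (1 - (s + 0.9 * r) / K)
    growth_r = 0.15 * r * (1 - (r + 1.1 * s) / K)
    kill_s = 0.5 * s if on_treatment else 0.0
    return s + dt * (growth_s - kill_s), r + dt * growth_r

s, r = 0.5, 0.01
for day in range(600):
    burden = s + r              # proxy for PSA
    on = burden > 0.4           # fixed threshold rule; I²ADT would learn this
    s, r = lv_step(s, r, on)
print(round(s, 3), round(r, 3))
```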


Subjects
Prostatic Neoplasms, Male, Humans, Prostatic Neoplasms/drug therapy, Prostatic Neoplasms/genetics, Prostatic Neoplasms/pathology, Androgen Antagonists/therapeutic use, Androgens/therapeutic use
9.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38770719

ABSTRACT

Recent advances in cancer immunotherapy have highlighted the potential of neoantigen-based vaccines. However, the design of such vaccines is hindered by the possibility of weak binding affinity between the peptides and the patient's specific human leukocyte antigen (HLA) alleles, which may not elicit a robust adaptive immune response. Triggering cross-immunity by utilizing peptide mutations that have enhanced binding affinity to target HLA molecules, while preserving their homology with the original neoantigen, can be a promising avenue for neoantigen vaccine design. In this study, we introduced UltraMutate, a novel algorithm combining reinforcement learning and Monte Carlo tree search that identifies peptide mutations that not only exhibit enhanced binding affinities to target HLA molecules but also retain a high degree of homology with the original neoantigen. UltraMutate outperformed existing state-of-the-art methods in identifying affinity-enhancing mutations in an independent test set consisting of 3660 peptide-HLA pairs. UltraMutate further showed its applicability in the design of peptide vaccines for Human Papillomavirus and Human Cytomegalovirus, demonstrating its potential as a promising tool in the advancement of personalized immunotherapy.
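As a rough sketch of the search idea (a one-step, bandit-style caricature of the RL-plus-Monte-Carlo-tree-search procedure, not UltraMutate itself), candidate single-residue mutations can be treated as arms scored by UCB1, with reward balancing predicted affinity against homology. The scoring function below is a fabricated stand-in for a real HLA-binding predictor:

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def score(peptide, original):
    """Fabricated stand-in for an HLA-binding predictor: a fake
    'affinity' term plus a bonus for homology with the original."""
    affinity = sum(ord(a) % 7 for a in peptide) / (7 * len(peptide))
    homology = sum(a == b for a, b in zip(peptide, original)) / len(peptide)
    return 0.6 * affinity + 0.4 * homology

def ucb1(total, visits, parent_visits, c=1.4):
    """Exploit high mean reward, explore rarely tried mutations."""
    if visits == 0:
        return float("inf")
    return total / visits + c * math.sqrt(math.log(parent_visits) / visits)

def search(original, iters=2000):
    stats = {}   # (position, amino acid) -> (total reward, visit count)
    moves = [(i, a) for i in range(len(original)) for a in AMINO]
    for t in range(1, iters + 1):
        i, a = max(moves, key=lambda m: ucb1(*stats.get(m, (0.0, 0)), t))
        mutant = original[:i] + a + original[i + 1:]
        tot, n = stats.get((i, a), (0.0, 0))
        stats[(i, a)] = (tot + score(mutant, original), n + 1)
    i, a = max(stats, key=lambda m: stats[m][0] / stats[m][1])
    return original[:i] + a + original[i + 1:]

print(search("SIINFEKL"))   # best single mutation under the toy score
```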


Subjects
Algorithms, Cancer Vaccines, Monte Carlo Method, Humans, Cancer Vaccines/immunology, Cancer Vaccines/genetics, HLA Antigens/immunology, HLA Antigens/genetics, Neoplasm Antigens/immunology, Neoplasm Antigens/genetics, Mutation
10.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38446739

ABSTRACT

Antimicrobial peptides (AMPs), short peptides with diverse functions, effectively target and combat various organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance. Due to their low drug resistance and toxicity, AMPs are considered promising substitutes for traditional antibiotics. While existing deep learning technology enhances AMP generation, it also presents certain challenges. First, current generation methods overlook the complex interdependencies among amino acids. Second, current models fail to integrate crucial tasks such as screening, attribute prediction and iterative optimization. Consequently, we developed an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction and iterative optimization. We innovatively integrate kinetic diffusion and attention mechanisms into the reinforcement learning framework for efficient AMP generation. Additionally, our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We employ a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. This framework automates molecule generation, screening, attribute prediction and optimization, thereby advancing AMP research. We have also deployed Diff-AMP on a web server, with code, data and server details available in the Data Availability section.


Subjects
Amino Acids, Antimicrobial Peptides, Anti-Bacterial Agents, Diffusion, Kinetics
11.
J Neurosci ; 44(23)2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38684367

ABSTRACT

Humans need social closeness to prosper. There is evidence that empathy can induce social closeness. However, it remains unclear how empathy-related social closeness is formed and how stable it is as time passes. We applied an acquisition-extinction paradigm, combined with computational modeling and fMRI, to investigate the formation and stability of empathy-related social closeness. Female participants observed painful stimulation of another person with high probability (acquisition) and low probability (extinction) and rated their closeness to that person. The results of two independent studies showed increased social closeness in the acquisition block that resisted extinction in the extinction block. Providing insights into the underlying mechanisms, reinforcement learning modeling revealed that the formation of social closeness is based on a learning signal (prediction error) generated from observing another's pain, whereas maintaining social closeness is based on a learning signal generated from observing another's pain relief. The results of a reciprocity control study indicate that this feedback recalibration is specific to learning of empathy-related social closeness. On the neural level, the recalibration of the feedback signal was associated with neural responses in the anterior insula and adjacent inferior frontal gyrus and the bilateral superior temporal sulcus/temporoparietal junction. Together, these findings show that empathy-related social closeness generated in bad times, that is, empathy with the misfortune of another person, transfers to good times and thus may form one important basis for stable social relationships.
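Read through a Rescorla-Wagner lens, the finding is that closeness is updated by different teaching signals in the two blocks: a pain prediction error during acquisition and a relief prediction error during extinction. The toy model below is our illustrative reading, with invented weights, not the authors' fitted model:

```python
import random

def simulate_closeness(pain_probs, alpha=0.3, w_pain=0.5, w_relief=0.1):
    """Toy account: expected pain of the other is learned by a delta
    rule; closeness grows with positive pain prediction errors
    (acquisition) and is sustained by relief prediction errors
    (less pain than expected) during extinction. Weights invented."""
    random.seed(3)
    expected, closeness, trace = 0.0, 0.0, []
    for p in pain_probs:
        pain = 1.0 if random.random() < p else 0.0
        pe = pain - expected              # prediction error from observed pain
        expected += alpha * pe
        closeness += w_pain * pe if pe > 0 else w_relief * (-pe)
        trace.append(closeness)
    return trace

trace = simulate_closeness([0.8] * 20 + [0.2] * 20)  # acquisition, then extinction
print(round(trace[19], 2), round(trace[-1], 2))      # closeness rises, then persists
```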


Subjects
Empathy, Magnetic Resonance Imaging, Humans, Empathy/physiology, Female, Young Adult, Adult, Brain Mapping, Brain/physiology, Brain/diagnostic imaging
12.
J Neurosci ; 44(24)2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38670805

ABSTRACT

Reinforcement learning is a theoretical framework that describes how agents learn to select options that maximize rewards and minimize punishments over time. We often make choices, however, to obtain symbolic reinforcers (e.g., money, points) that are later exchanged for primary reinforcers (e.g., food, drink). Although symbolic reinforcers are ubiquitous in our daily lives and widely used in laboratory tasks because they can be motivating, the mechanisms by which they become motivating are less well understood. In the present study, we examined how monkeys learn to make choices that maximize fluid rewards through reinforcement with tokens. The question addressed here is how the value of a state, which is a function of multiple task features (e.g., the current number of accumulated tokens, choice options, task epoch, and trials since the last delivery of a primary reinforcer), affects motivation. We constructed a Markov decision process model that computes the value of task states given task features, which we then correlated with the motivational state of the animal. Fixation times, choice reaction times, and abort frequency were all significantly related to the values of task states during the tokens task (n = 5 monkeys, three males and two females). Furthermore, the model makes predictions for how neural responses could change on a moment-by-moment basis relative to changes in state value. Together, this task and model allow us to capture learning and behavior related to symbolic reinforcement.
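The state-value idea can be made concrete with a miniature Markov decision process in which the state is simply the number of accumulated tokens and N tokens cash out into fluid reward. The value-iteration sketch below uses assumed parameters and ignores most of the task features the actual model conditions on:

```python
def token_state_values(n_cashout=4, p_correct=0.8, gamma=0.9):
    """Value of holding k tokens when n_cashout tokens yield reward 1.
    Solved by value iteration under the single sensible policy
    (always try to earn the next token)."""
    v = [0.0] * (n_cashout + 1)
    for _ in range(200):
        new = v[:]
        for k in range(n_cashout):
            gain = 1.0 if k + 1 == n_cashout else gamma * v[k + 1]
            new[k] = p_correct * gain + (1 - p_correct) * gamma * v[k]
        v = new
    return v[:n_cashout]

# State value rises as the cash-out nears; in the article's framing,
# fixation times and reaction times track such values.
print([round(x, 2) for x in token_state_values()])
```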


Subjects
Choice Behavior, Macaca mulatta, Motivation, Reinforcement (Psychology), Reward, Animals, Motivation/physiology, Male, Choice Behavior/physiology, Reaction Time/physiology, Markov Chains, Female
13.
J Neurosci ; 44(17)2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38423764

ABSTRACT

Pavlovian conditioning is thought to involve the formation of learned associations between stimuli and values, and between stimuli and specific features of outcomes. Here, we leveraged human single-neuron recordings in the ventromedial prefrontal cortex, dorsomedial frontal cortex, hippocampus, and amygdala while patients of both sexes performed an appetitive Pavlovian conditioning task probing both stimulus-value and stimulus-stimulus associations. Ventromedial prefrontal cortex encoded predictive value along with the amygdala, and also encoded predictions about the identity of stimuli that would subsequently be presented, suggesting a role for neurons in this region in encoding predictive information beyond value. Unsigned error signals were found in dorsomedial frontal areas and hippocampus, potentially supporting learning of non-value-related outcome features. Our findings implicate distinct human prefrontal and medial temporal neuronal populations in mediating predictive associations, which could partially support model-based mechanisms during Pavlovian conditioning.


Subjects
Classical Conditioning, Neurons, Prefrontal Cortex, Humans, Classical Conditioning/physiology, Male, Female, Prefrontal Cortex/physiology, Neurons/physiology, Adult, Temporal Lobe/physiology, Young Adult, Appetitive Behavior/physiology, Association Learning/physiology
14.
Mol Biol Evol ; 41(6)2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38829798

ABSTRACT

The computational search for the maximum-likelihood phylogenetic tree is an NP-hard problem. As such, current tree search algorithms might return a tree that is a local optimum, not the global one. Here, we introduce a paradigm shift for predicting the maximum-likelihood tree: approximating long-term gains in likelihood rather than maximizing the likelihood gain at each step of the search. Our proposed approach harnesses the power of reinforcement learning to learn an optimal search strategy, aiming at the global optimum of the search space. We show that when analyzing empirical data containing dozens of sequences, the log-likelihood improvement from the starting tree obtained by the reinforcement-learning-based agent was 0.969 or more of that achieved by current state-of-the-art techniques. Notably, this performance is attained without the need to perform costly likelihood optimizations apart from the training process, potentially allowing for an exponential speedup. We exemplify this for data sets containing 15 sequences of length 18,000 bp and demonstrate that the reinforcement-learning-based method is roughly three times faster than state-of-the-art software. This study illustrates the potential of reinforcement learning for addressing the challenges of phylogenetic tree reconstruction.


Subjects
Algorithms, Phylogeny, Likelihood Functions, Genetic Models, Computational Biology/methods, Software
15.
Nano Lett ; 24(5): 1650-1659, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38265360

ABSTRACT

Precision nanoengineering of porous two-dimensional structures has emerged as a promising avenue for finely tuning catalytic reactions. However, understanding the pore-structure-dependent catalytic performance remains challenging, given the lack of comprehensive guidelines, appropriate material models, and precise synthesis strategies. Here, we propose the optimization of two-dimensional carbon materials through the utilization of mesopores with 5-10 nm diameter to facilitate fluid acceleration, guided by finite element simulations. As proof of concept, the optimized mesoporous carbon nanosheet sample exhibited exceptional electrocatalytic performance, demonstrating high selectivity (>95%) and a notable diffusion-limiting disk current density of −3.1 mA cm⁻² for H₂O₂ production. Impressively, the electrolysis process in the flow cell achieved a production rate of 14.39 mol g(catalyst)⁻¹ h⁻¹ to yield a medical-grade disinfectant-worthy H₂O₂ solution. Our pore engineering research focuses on modulating oxygen reduction reaction activity and selectivity by affecting local fluid transport behavior, providing insights into the mesoscale catalytic mechanism.

16.
Eur J Neurosci ; 59(8): 2118-2127, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38282277

ABSTRACT

Early diagnosis is crucial to slowing the progression of Alzheimer's disease (AD), so finding an effective diagnostic method for AD is urgent. This study investigated whether a transfer learning approach based on deep Q-networks (DQN) could effectively distinguish AD patients using local metrics of resting-state functional magnetic resonance imaging (rs-fMRI) as features. The study included 1310 subjects from the Consortium for Reliability and Reproducibility (CoRR) and 50 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) GO/2. The amplitude of low-frequency fluctuation (ALFF), fractional ALFF (fALFF) and percent amplitude of fluctuation (PerAF) were extracted as features using the Power 264 atlas. Given gender differences in AD, we searched for transferable similar parts between the CoRR and ADNI feature matrices; the resulting CoRR similar-feature matrix served as the source domain and the ADNI similar-feature matrix as the target domain. A DQN classifier was pre-trained in the source domain and transferred to the target domain, where it was used to classify AD versus healthy controls (HC). A permutation test was performed. The DQN transfer learning achieved a classification accuracy of 86.66% (p < 0.01), recall of 83.33% and precision of 83.33%. The findings suggest that transfer learning with a DQN can be an effective way to distinguish AD from HC, and they reveal the potential value of local brain activity in the clinical diagnosis of AD.


Subjects
Alzheimer Disease, Brain, Humans, Male, Female, Alzheimer Disease/diagnostic imaging, Reproducibility of Results, Magnetic Resonance Imaging/methods, Sexism, Machine Learning
17.
Eur J Neurosci ; 59(3): 457-472, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38178558

ABSTRACT

Millions of people suffer from dopamine-related disorders spanning disturbances in movement, cognition and emotion. These disturbances are often attributed to changes in striatal dopamine function. Thus, understanding how dopamine signalling in the striatum and basal ganglia shapes human behaviour is fundamental to advancing the treatment of affected patients. Dopaminergic neurons innervate large-scale brain networks, and accordingly, many different roles for dopamine signals have been proposed, such as invigoration of movement and tracking of reward contingencies. The canonical circuit architecture of cortico-striatal loops raises the question of whether dopamine signals in the basal ganglia serve an overarching computational principle. Such a holistic understanding of dopamine functioning could provide new insights into symptom generation from psychiatry to neurology. Here, we review the perspective that dopamine could bidirectionally control neural population dynamics, increasing or decreasing their strength and likelihood of recurring in the future, a process previously termed neural reinforcement. We outline how the basal ganglia pathways could drive strengthening and weakening of circuit dynamics and discuss the implications of this hypothesis for the understanding of motor signs of Parkinson's disease (PD), the most frequent dopaminergic disorder. We propose that loss of dopamine in PD may lead to a pathological brain state where repetition of neural activity leads to weakening and instability, possibly explaining why movement in PD deteriorates with repetition. Finally, we speculate on how therapeutic interventions such as deep brain stimulation may be able to reinstate reinforcement signals and thereby improve treatment strategies for PD in the future.


Subjects
Deep Brain Stimulation, Parkinson Disease, Humans, Dopamine/metabolism, Basal Ganglia, Brain/metabolism
18.
Am Nat ; 203(6): 695-712, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38781528

ABSTRACT

A change to a population's social network is a change to the substrate of cultural transmission, affecting behavioral diversity and adaptive cultural evolution. While features of network structure such as population size and density have been well studied, less is understood about the influence of social processes such as population turnover, that is, the repeated replacement of individuals by naive individuals. Experimental data have led to the hypothesis that naive learners can drive cultural evolution by better assessing the relative value of behaviors, although this hypothesis has been expressed only verbally. We conducted a formal exploration of this hypothesis using a generative model that concurrently simulated its two key ingredients: social transmission and reinforcement learning. We simulated competition between high- and low-reward behaviors while varying turnover magnitude and tempo. Variation in turnover influenced changes in the distributions of cultural behaviors, irrespective of initial knowledge-state conditions. We found optimal turnover regimes that amplified the production of higher-reward behaviors through two key mechanisms: repertoire composition and enhanced valuation by agents that knew both behaviors. These effects depended on network and learning parameters. Our model provides formal theoretical support for, and predictions about, the hypothesis that naive learners can shape cultural change through their enhanced sampling ability. By moving from experimental data to theory, we illuminate an underdiscussed generative process that can lead to changes in cultural behavior, arising from an interaction between social dynamics and learning.
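The model's two ingredients are compact enough to sketch. Everything below is a simplified stand-in for the authors' generative model: agents hold learned values for a high- and a low-reward behavior, observe random demonstrators, and are replaced by naive newcomers at a given turnover rate; naive agents sample both behaviors before committing:

```python
import random

random.seed(7)
REWARD = {"high": 1.0, "low": 0.6}

def run(turnover_rate, n_agents=50, steps=300, alpha=0.2, explore=0.05):
    """Each step, every agent picks a behavior (exploring, copying a
    demonstrator, or exploiting its own estimate), then updates its
    value estimate by a delta rule. A fraction of agents is replaced
    by naive newcomers each step. All parameters are invented."""
    agents = [{"high": 0.0, "low": 0.5} for _ in range(n_agents)]
    for _ in range(steps):
        for q in agents:
            other = random.choice(agents)
            if random.random() < explore or max(q.values()) == 0:
                act = random.choice(["high", "low"])   # naive agents sample freely
            elif max(other.values()) > 0 and random.random() < 0.5:
                act = max(other, key=other.get)        # social transmission
            else:
                act = max(q, key=q.get)                # exploit own estimate
            q[act] += alpha * (REWARD[act] - q[act])   # reinforcement learning
        for i in range(n_agents):                      # population turnover
            if random.random() < turnover_rate:
                agents[i] = {"high": 0.0, "low": 0.0}
    return sum(q["high"] > q["low"] for q in agents) / n_agents

for rate in (0.0, 0.02, 0.2):
    print(rate, run(rate))   # share of agents valuing the high-reward behavior
```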


Subjects
Cultural Evolution, Learning, Humans, Reward, Social Behavior, Theoretical Models, Reinforcement (Psychology)
19.
J Comput Chem ; 45(15): 1289-1302, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38357973

ABSTRACT

Reinforcement learning (RL) methods have helped to define the state of the art in the field of modern artificial intelligence, mostly after the breakthrough involving AlphaGo and the discovery of novel algorithms. In this work, we present an RL method, based on Q-learning, for the structural determination of adsorbate@substrate models in silico, where the minimization of the energy landscape resulting from adsorbate interactions with a substrate is performed through actions on states (translations and rotations) chosen from an agent's policy. The proposed RL method is implemented in an early version of the reinforcement learning software for materials design and discovery (RLMaterial), developed in Python 3.x. RLMaterial interfaces with the deMon2k, DFTB+, ORCA, and Quantum Espresso codes to compute the adsorbate@substrate energies. The RL method was applied to the structural determination of (i) the amino acid glycine and (ii) 2-amino-acetaldehyde, both interacting with a boron nitride (BN) monolayer, (iii) host-guest interactions between phenylboronic acid and β-cyclodextrin and (iv) ammonia on naphthalene. Density functional tight-binding calculations were used to build the complex search surfaces at a reasonably low computational cost for systems (i)-(iii), and DFT was used for system (iv). Artificial neural network and gradient boosting regression techniques were employed to approximate the Q-matrix or Q-table for better decision making (policy) on next actions. Finally, we developed a transfer-learning protocol within the RL framework that allows learning from one chemical system and transferring the experience to another, as well as across different DFT or DFTB levels.
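The action-state scheme maps naturally onto tabular Q-learning. In the sketch below, a toy analytic energy surface replaces the quantum-chemistry call (a real run would obtain energies from deMon2k, DFTB+, ORCA, or Quantum Espresso through RLMaterial), states are discretized positions of a rigid adsorbate, and actions are small translations; rotations are omitted:

```python
import random

random.seed(0)
GRID = 21                      # discretized positions of the adsorbate
ACTIONS = (-1, +1)             # small translations (rotations omitted)

def energy(x):
    """Toy analytic adsorbate@substrate energy with a minimum at x = 14.
    A real run would obtain this from an electronic-structure code."""
    return 0.05 * (x - 14) ** 2

def q_learning(episodes=300, alpha=0.5, gamma=0.9, eps=0.2):
    q = [[0.0, 0.0] for _ in range(GRID)]
    for _ in range(episodes):
        x = random.randrange(GRID)
        for _ in range(50):
            a = (random.randrange(2) if random.random() < eps
                 else max((0, 1), key=lambda i: q[x][i]))
            nx = min(GRID - 1, max(0, x + ACTIONS[a]))
            reward = energy(x) - energy(nx)            # reward = energy decrease
            q[x][a] += alpha * (reward + gamma * max(q[nx]) - q[x][a])
            x = nx
    # Greedy walk from one end; the lowest-energy visited state is the answer.
    x, visited = 0, []
    for _ in range(2 * GRID):
        visited.append(x)
        x = min(GRID - 1, max(0, x + ACTIONS[max((0, 1), key=lambda i: q[x][i])]))
    return min(visited, key=energy)

print(q_learning())   # converges on 14, the minimum of the toy surface
```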

20.
J Comput Chem ; 45(22): 1886-1898, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-38698628

ABSTRACT

Reinforcement learning (RL) has been applied to various domains in computational chemistry and has found widespread success. In this review, we first motivate the application of RL to chemistry and list some broad application domains, for example, molecule generation, geometry optimization, and retrosynthetic pathway search. We set up some of the formalism associated with reinforcement learning that should help the reader translate their chemistry problems into a form where RL can be used to solve them. We then discuss the solution formulations and algorithms proposed in recent literature for these problems, the advantages of one over another, together with the necessary details of the RL algorithms they employ. This article should help the reader understand the state of RL applications in chemistry, learn about relevant actively researched open problems, gain insight into how RL can be used to approach them, and hopefully inspire innovative RL applications in chemistry.
