Results 1 - 20 of 17,927
1.
Behav Brain Sci ; 47: e165, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39311518

ABSTRACT

Building on the affectivism approach, we expand on Binz et al.'s meta-learning research program by highlighting that emotion and other affective phenomena should be key to the modeling of human learning. We illustrate the added value of affective processes for models of learning across multiple domains with a focus on reinforcement learning, knowledge acquisition, and social learning.


Subject(s)
Affect , Cognition , Learning , Humans , Cognition/physiology , Learning/physiology , Affect/physiology , Social Learning/physiology , Models, Psychological , Reinforcement, Psychology , Emotions/physiology
2.
Behav Brain Sci ; 47: e168, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39311524

ABSTRACT

We argue that the type of meta-learning proposed by Binz et al. generates models with low interpretability and falsifiability that have limited usefulness for neuroscience research. An alternative approach to meta-learning based on hyperparameter optimization obviates these concerns and can generate empirically testable hypotheses of biological computations.


Subject(s)
Learning , Reinforcement, Psychology , Humans , Models, Psychological
3.
Learn Mem ; 31(8), 2024 Aug.
Article in English | MEDLINE | ID: mdl-39260876

ABSTRACT

Safety signals reinforce instrumental avoidance behavior in nonhuman animals. However, there are no conclusive demonstrations of this phenomenon in humans. Using human participants in an avoidance task, Experiments 1-3 and 5 were conducted online to assess the reinforcing properties of safety signals, and Experiment 4 was conducted in the laboratory. Participants were trained with CSs+ and CSs-, and they could avoid an aversive outcome during presentations of the CSs+ by pressing their space bar at a specific time. If the avoidance was successful, the aversive outcome was not presented and a safety signal was presented instead. Participants were then tested, whilst in extinction, with two new ambiguous test CSs. If participants made an avoidance response, one of the test CSs produced the trained safety signal and the other was a control. In Experiments 1 and 4, the control was followed by no signal. In Experiment 2, the control was followed by a signal that differed in one dimension (color) from the trained safety signal, and in Experiment 3, the control differed in two dimensions (shape and color) from the trained safety signal. Experiment 5 tested the reinforcing properties of the safety signal using a choice procedure and a new response during test. We observed that participants made more avoidance responses to the ambiguous test CSs when followed by the trained signal in Experiments 1, 3, 4, and 5 (but not in Experiment 2). Overall, these results suggest that trained safety signals can reinforce avoidance behavior in humans.


Subject(s)
Avoidance Learning , Conditioning, Operant , Reinforcement, Psychology , Humans , Avoidance Learning/physiology , Male , Female , Young Adult , Adult , Conditioning, Operant/physiology , Extinction, Psychological/physiology , Adolescent
4.
Learn Mem ; 31(8), 2024 Aug.
Article in English | MEDLINE | ID: mdl-39284619

ABSTRACT

"Pavlovian" or "motivational" biases are the phenomenon that the valence of prospective outcomes modulates action invigoration: the prospect of reward invigorates actions, while the prospect of punishment suppresses actions. Effects of the valence of prospective outcomes are well established, but it remains unclear how the magnitude of outcomes ("stake magnitude") modulates these biases. In this preregistered study (N = 55), we manipulated stake magnitude (high vs. low) in an orthogonalized Motivational Go/NoGo Task. We tested whether higher stakes (a) strengthen biases or (b) elicit cognitive control recruitment, enhancing the suppression of biases in motivationally incongruent conditions. Confirmatory tests showed that high stakes slowed down responding, especially in motivationally incongruent conditions. However, high stakes did not affect whether a response was made or not, and did not change the magnitude of Pavlovian biases. Reinforcement-learning drift-diffusion models (RL-DDMs) fit to the data suggested that response slowing was best captured by stakes prolonging the non-decision time. There was no effect of the stakes on the response threshold (as in typical speed-accuracy trade-offs). In sum, these results suggest that high stakes slow down responses without affecting the expression of Pavlovian biases in behavior. We speculate that this slowing under high stakes might reflect heightened cognitive control, which is however ineffectively used, or reflect positive conditioned suppression, i.e., the interference between goal-directed and consummatory behaviors, a phenomenon previously observed in rodents that might also exist in humans. Pavlovian biases and slowing under high stakes may arise in parallel to each other.


Subject(s)
Conditioning, Classical , Motivation , Reward , Humans , Male , Motivation/physiology , Young Adult , Female , Conditioning, Classical/physiology , Adult , Reaction Time/physiology , Adolescent , Punishment , Reinforcement, Psychology , Psychomotor Performance/physiology
5.
PLoS Comput Biol ; 20(9): e1012404, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39231162

ABSTRACT

Humans tend to give more weight to information confirming their beliefs than to information that disconfirms them. Nevertheless, this apparent irrationality has been shown to improve individual decision-making under uncertainty. However, little is known about this bias' impact on decision-making in a social context. Here, we investigate the conditions under which confirmation bias is beneficial or detrimental to decision-making under social influence. To do so, we develop a Collective Asymmetric Reinforcement Learning (CARL) model in which artificial agents observe others' actions and rewards, and update this information asymmetrically. We use agent-based simulations to study how confirmation bias affects collective performance on a two-armed bandit task, and how resource scarcity, group size and bias strength modulate this effect. We find that a confirmation bias benefits group learning across a wide range of resource-scarcity conditions. Moreover, we discover that, past a critical bias strength, resource abundance favors the emergence of two different performance regimes, one of which is suboptimal. In addition, we find that this regime bifurcation comes with polarization in small groups of agents. Overall, our results suggest the existence of an optimal, moderate level of confirmation bias for decision-making in a social context.
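A minimal sketch of a confirmation-biased (asymmetric) update in a social two-armed bandit, assuming a simplified rule in which belief-confirming outcomes are learned from with a larger step size; the actual CARL model and its parameterization may differ, and all values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p_reward = np.array([0.7, 0.3])          # hypothetical two-armed bandit
alpha_conf, alpha_disc = 0.30, 0.10      # asymmetric (confirmatory) learning rates
beta, n_agents, n_trials = 5.0, 4, 200

Q = np.zeros((n_agents, 2))

def update(q, a, r):
    """Confirmation-biased update: larger step for belief-confirming outcomes."""
    pe = r - q[a]
    confirming = (a == np.argmax(q)) == (pe > 0)
    q[a] += (alpha_conf if confirming else alpha_disc) * pe

for _ in range(n_trials):
    actions = [rng.choice(2, p=np.exp(beta * q) / np.exp(beta * q).sum()) for q in Q]
    rewards = [float(rng.random() < p_reward[a]) for a in actions]
    for i in range(n_agents):
        update(Q[i], actions[i], rewards[i])       # learn from own outcome
        for j in range(n_agents):                  # learn from observed others
            if j != i:
                update(Q[i], actions[j], rewards[j])

print("final Q-values per agent:\n", np.round(Q, 2))
```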


Subject(s)
Decision Making , Reinforcement, Psychology , Decision Making/physiology , Humans , Computer Simulation , Computational Biology , Reward , Bias , Learning/physiology , Models, Psychological
6.
Article in English | MEDLINE | ID: mdl-39302783

ABSTRACT

Deep Brain Stimulation (DBS) is effective for movement disorders, particularly Parkinson's disease (PD). However, a closed-loop DBS system that uses reinforcement learning (RL) for automatic parameter tuning, offering enhanced energy efficiency and restoration of thalamic function, has yet to be developed for clinical and commercial applications. In this research, we instantiate a basal ganglia-thalamic (BGT) model and design it as an interactive environment suitable for RL models. Four finely tuned RL agents based on different frameworks, namely Soft Actor-Critic (SAC), Twin Delayed Deep Deterministic Policy Gradient (TD3), Proximal Policy Optimization (PPO), and Advantage Actor-Critic (A2C), are established for further comparison. Within the implemented RL architectures, the optimized TD3 demonstrates a significant 67% reduction in average power dissipation compared to the open-loop system while preserving the normal response of the simulated BGT circuitry. As a result, our method mitigates thalamic error responses under pathological conditions and prevents overstimulation. In summary, this study introduces a novel approach to implementing an adaptive parameter-tuning closed-loop DBS system. Leveraging the advantages of TD3, our proposed approach holds significant promise for advancing the integration of RL applications into DBS systems, ultimately optimizing therapeutic effects in future clinical trials.
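The sketch below illustrates only the closed-loop idea, pairing a toy stand-in for the BGT environment with a reward that trades off simulated thalamic error against power dissipation; a simple random-search tuner replaces the TD3 agent used in the study, and all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

def toy_bgt_response(amplitude, frequency):
    """Stand-in for a basal ganglia-thalamic model: error is lowest near a
    hypothetical therapeutic setting and grows away from it."""
    error = (amplitude - 3.0) ** 2 + 0.01 * (frequency - 130.0) ** 2
    return error + 0.1 * rng.standard_normal() ** 2

def reward(amplitude, frequency, w_power=0.02):
    power = amplitude ** 2 * frequency            # crude proxy for energy use
    return -toy_bgt_response(amplitude, frequency) - w_power * power / 100.0

# Simple random-search tuner standing in for the TD3 agent used in the paper.
best = (rng.uniform(0, 6), rng.uniform(60, 180))
best_r = reward(*best)
for _ in range(2000):
    cand = (np.clip(best[0] + rng.normal(0, 0.3), 0, 6),
            np.clip(best[1] + rng.normal(0, 5.0), 60, 180))
    r = reward(*cand)
    if r > best_r:
        best, best_r = cand, r

print(f"tuned amplitude={best[0]:.2f}, frequency={best[1]:.1f}, reward={best_r:.3f}")
```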


Subject(s)
Algorithms , Basal Ganglia , Computer Simulation , Deep Brain Stimulation , Reinforcement, Psychology , Thalamus , Deep Brain Stimulation/methods , Humans , Thalamus/physiology , Basal Ganglia/physiology , Parkinson Disease/therapy , Models, Neurological , Neural Networks, Computer
7.
Transl Psychiatry ; 14(1): 394, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39349428

ABSTRACT

Psilocybin has shown promise as a novel pharmacological intervention for treatment of depression, where post-acute effects of psilocybin treatment have been associated with increased positive mood and decreased pessimism. Although psilocybin is proving to be effective in clinical trials for treatment of psychiatric disorders, the information processing mechanisms affected by psilocybin are not well understood. Here, we fit active inference and reinforcement learning computational models to a novel two-armed bandit reversal learning task capable of capturing engagement behaviour in rats. The model revealed that after receiving psilocybin, rats achieve more rewards through increased task engagement, mediated by modification of forgetting rates and reduced loss aversion. These findings suggest that psilocybin may afford an optimism bias that arises through altered belief updating, with translational potential for clinical populations characterised by lack of optimism.
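A hedged sketch of the kind of reinforcement-learning model described: a two-armed reversal bandit agent with a forgetting rate and a loss-aversion weight. The study's actual active inference and RL models, task structure, and fitted parameters are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

def run_agent(alpha=0.3, forget=0.05, loss_aversion=1.5, beta=4.0,
              n_trials=400, reversal_every=100):
    """Q-learning with a forgetting rate and loss aversion on a two-armed
    reversal bandit; returns total reward earned."""
    q = np.zeros(2)
    p_reward = np.array([0.8, 0.2])
    total = 0.0
    for t in range(n_trials):
        if t > 0 and t % reversal_every == 0:
            p_reward = p_reward[::-1]                 # contingency reversal
        p = np.exp(beta * q) / np.exp(beta * q).sum()
        a = rng.choice(2, p=p)
        r = 1.0 if rng.random() < p_reward[a] else -1.0
        total += r
        util = r if r > 0 else loss_aversion * r      # losses weighted more
        q[a] += alpha * (util - q[a])
        q *= (1.0 - forget)                           # values decay toward zero
    return total

# Hypothetical parameter contrast loosely inspired by the abstract (altered
# forgetting, reduced loss aversion) vs. baseline parameters.
print("baseline   :", np.mean([run_agent() for _ in range(50)]))
print("altered    :", np.mean([run_agent(forget=0.10, loss_aversion=1.0)
                               for _ in range(50)]))
```

The printed numbers only illustrate the parameterization; they are not meant to reproduce the reported drug effect or its size.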


Subject(s)
Behavior, Animal , Psilocybin , Animals , Psilocybin/pharmacology , Rats , Male , Behavior, Animal/drug effects , Optimism , Hallucinogens/pharmacology , Computer Simulation , Reversal Learning/drug effects , Reward , Reinforcement, Psychology
8.
Dev Neurorehabil ; 27(7): 268-272, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39217464

ABSTRACT

Pica is a life-threatening behavior that is relatively common among individuals with intellectual and developmental disabilities. Pica can be conceptualized as a response chain in which the pica item acts as a discriminative stimulus for the next response (i.e. picking up the pica item), which itself acts as a discriminative stimulus for the final response (i.e. consumption). Interventions that disrupt this response chain and alter the discriminative properties of the pica stimulus may be clinically indicated. Preliminary research supports response-interruption and redirection (RIRD) with differential reinforcement of alternative behavior (DRA) as an effective intervention for pica. We evaluated this procedure in an inpatient unit with a young boy who engaged in pica. Our outcomes provide additional support for DRA with RIRD as an effective pica treatment.


Subject(s)
Behavior Therapy , Pica , Humans , Male , Behavior Therapy/methods , Child , Reinforcement, Psychology , Intellectual Disability , Developmental Disabilities/rehabilitation
9.
Elife ; 13, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39240757

ABSTRACT

Theoretical computational models are widely used to describe latent cognitive processes. However, these models do not equally explain data across participants, with some individuals showing a larger predictive gap than others. In the current study, we examined the use of theory-independent models, specifically recurrent neural networks (RNNs), to classify the source of a predictive gap in the observed data of a single individual. This approach aims to identify whether the low predictability of behavioral data is mainly due to noisy decision-making or misspecification of the theoretical model. First, we used computer simulation in the context of reinforcement learning to demonstrate that RNNs can be used to identify model misspecification in simulated agents with varying degrees of behavioral noise. Specifically, both prediction performance and the number of RNN training epochs (i.e., the point of early stopping) can be used to estimate the amount of stochasticity in the data. Second, we applied our approach to an empirical dataset where the actions of low IQ participants, compared with high IQ participants, showed lower predictability by a well-known theoretical model (i.e., Daw's hybrid model for the two-step task). Both the predictive gap and the point of early stopping of the RNN suggested that model misspecification is similar across individuals. This led us to the provisional conclusion that low IQ participants are mostly noisier than their high IQ peers, rather than being more misspecified by the theoretical model. We discuss the implications and limitations of this approach, considering the growing literature in both theoretical and data-driven computational modeling in decision-making science.
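The RNN analysis itself is not reproduced here, but the following sketch illustrates the noise side of the dissociation: even when the generating model is correctly specified, choices of agents with more decision noise (lower inverse temperature beta) are predicted less well, so a predictive gap alone does not imply misspecification. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_and_score(beta, alpha=0.3, n_trials=300):
    """Simulate a Q-learning agent with inverse temperature beta, then score how
    well the (correctly specified) generating model predicts its own choices."""
    p_reward = np.array([0.7, 0.3])
    q = np.zeros(2)
    loglik = 0.0
    for _ in range(n_trials):
        p = np.exp(beta * q) / np.exp(beta * q).sum()
        a = rng.choice(2, p=p)
        loglik += np.log(p[a])
        r = float(rng.random() < p_reward[a])
        q[a] += alpha * (r - q[a])
    return loglik / n_trials

for beta in (1.0, 3.0, 10.0):
    print(f"beta={beta:4.1f}: mean log-likelihood per trial = "
          f"{simulate_and_score(beta):.3f}")
```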


Subject(s)
Choice Behavior , Neural Networks, Computer , Humans , Choice Behavior/physiology , Computer Simulation , Stochastic Processes , Reinforcement, Psychology , Male , Female , Decision Making/physiology , Adult , Young Adult
10.
Sci Adv ; 10(36): eadi7137, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39241065

ABSTRACT

Contemporary theories guiding the search for neural mechanisms of learning and memory assume that associative learning results from the temporal pairing of cues and reinforcers resulting in coincident activation of associated neurons, strengthening their synaptic connection. While enduring, this framework has limitations: Temporal pairing-based models of learning do not fit with many experimental observations and cannot be used to make quantitative predictions about behavior. Here, we present behavioral data that support an alternative, information-theoretic conception: The amount of information that cues provide about the timing of reward delivery predicts behavior. Furthermore, this approach accounts for the rate and depth of both inhibitory and excitatory learning across paradigms and species. We also show that dopamine release in the ventral striatum reflects cue-predicted changes in reinforcement rates consistent with subjects understanding temporal relationships between task events. Our results reshape the conceptual and biological framework for understanding associative learning.
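One common information-theoretic quantity in this literature (in the spirit of Balsam and Gallistel) is the cue's informativeness about reward timing: log2 of the ratio between the contextual reward interval and the cue-reward interval. Whether the present study uses exactly this definition is not stated in the abstract, so the sketch below is illustrative only, with made-up intervals.

```python
import math

def cue_informativeness(context_interval, cue_reward_interval):
    """Bits of information a cue conveys about the time to the next reward,
    computed as log2(C/T): C = mean interval between rewards in the context,
    T = mean cue-to-reward interval."""
    return math.log2(context_interval / cue_reward_interval)

# Hypothetical protocols: same cue-reward delay, different background reward rates.
print(cue_informativeness(context_interval=240.0, cue_reward_interval=10.0))  # ~4.58 bits
print(cue_informativeness(context_interval=60.0,  cue_reward_interval=10.0))  # ~2.58 bits
```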


Subject(s)
Cues , Dopamine , Learning , Dopamine/metabolism , Animals , Learning/physiology , Male , Reward , Association Learning/physiology , Rats , Humans , Reinforcement, Psychology
11.
Elife ; 13, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39255007

ABSTRACT

Previous studies on reinforcement learning have identified three prominent phenomena: (1) individuals with anxiety or depression exhibit a reduced learning rate compared to healthy subjects; (2) learning rates may increase or decrease in environments with rapidly changing (i.e. volatile) or stable feedback conditions, a phenomenon termed learning rate adaptation; and (3) reduced learning rate adaptation is associated with several psychiatric disorders. In other words, multiple learning rate parameters are needed to account for behavioral differences across participant populations and volatility contexts in this flexible learning rate (FLR) model. Here, we propose an alternative explanation, suggesting that behavioral variation across participant populations and volatile contexts arises from the use of mixed decision strategies. To test this hypothesis, we constructed a mixture-of-strategies (MOS) model and used it to analyze the behaviors of 54 healthy controls and 32 patients with anxiety and depression in volatile reversal learning tasks. Compared to the FLR model, the MOS model can reproduce the three classic phenomena by using a single set of strategy preference parameters without introducing any learning rate differences. In addition, the MOS model can successfully account for several novel behavioral patterns that cannot be explained by the FLR model. Preferences for different strategies also predict individual variations in symptom severity. These findings underscore the importance of considering mixed strategy use in human learning and decision-making and suggest atypical strategy preference as a potential mechanism for learning deficits in psychiatric disorders.
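A minimal sketch of a mixture-of-strategies choice rule, assuming an illustrative strategy set (expected value, magnitude-only, and habitual perseveration) combined with preference weights; the strategy set and parameterization of the authors' MOS model may differ, and all values are hypothetical.

```python
import numpy as np

def mixture_policy(p_est, magnitudes, prev_choice, w, beta=5.0):
    """Choice probabilities from a weighted mixture of simple strategies:
    expected value, magnitude-only, and habitual perseveration.
    w = (w_ev, w_mag, w_habit), assumed to sum to 1 (preference parameters)."""
    softmax = lambda x: np.exp(beta * x) / np.exp(beta * x).sum()
    ev_policy  = softmax(p_est * magnitudes)        # expected-value strategy
    mag_policy = softmax(magnitudes)                # ignore probabilities
    habit      = np.eye(len(p_est))[prev_choice]    # repeat the previous choice
    return w[0] * ev_policy + w[1] * mag_policy + w[2] * habit

# Hypothetical trial: arm 0 is likelier to pay off, arm 1 pays more when it does.
probs = mixture_policy(p_est=np.array([0.7, 0.3]),
                       magnitudes=np.array([0.4, 0.9]),
                       prev_choice=1,
                       w=(0.6, 0.2, 0.2))
print(np.round(probs, 3))
```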


Subject(s)
Anxiety , Decision Making , Depression , Humans , Male , Female , Adult , Decision Making/physiology , Uncertainty , Young Adult , Reinforcement, Psychology , Models, Psychological , Reversal Learning/physiology
12.
Neural Netw ; 179: 106596, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39163823

ABSTRACT

De novo molecular design is the process of learning knowledge from existing data to propose new chemical structures that satisfy the desired properties. By using de novo design to generate compounds in a directed manner, better solutions can be obtained in large chemical libraries with less comparison cost. Drug design, however, needs to take multiple factors into consideration. For example, in polypharmacology, molecules that activate or inhibit multiple target proteins produce multiple pharmacological activities and are less susceptible to drug resistance. However, most existing molecular generation methods either focus only on affinity for a single target or fail to effectively balance the relationship between multiple targets, resulting in insufficient validity and desirability of the generated molecules. To address these problems, an approach called clustered Pareto-based reinforcement learning (CPRL) is proposed. In CPRL, a pre-trained model is constructed to grasp existing molecular knowledge in a supervised learning manner. In addition, the clustered Pareto optimization algorithm is presented to find the best solution between different objectives. The algorithm first extracts an update set from the sampled molecules through the designed aggregation-based molecular clustering. Then, the final reward is computed by constructing the Pareto frontier ranking of the molecules from the updated set. To explore the vast chemical space, a reinforcement learning agent is designed in CPRL that can be updated under the guidance of the final reward to balance multiple properties. Furthermore, to increase the internal diversity of the molecules, a fixed-parameter exploration model is used for sampling in conjunction with the agent. The experimental results demonstrate that CPRL is capable of balancing multiple properties of the molecule and has higher desirability and validity, reaching 0.9551 and 0.9923, respectively.
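The Pareto frontier ranking step can be illustrated with a simple non-dominated sorting routine over multi-objective scores; the full CPRL method additionally clusters sampled molecules and converts ranks into rewards, which is not shown here, and the example scores are invented.

```python
import numpy as np

def pareto_ranks(scores):
    """Assign Pareto front ranks (0 = non-dominated) to rows of `scores`,
    where every column is an objective to be maximized."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    ranks = np.full(n, -1)
    remaining = set(range(n))
    front = 0
    while remaining:
        current = []
        for i in remaining:
            dominated = any(
                np.all(scores[j] >= scores[i]) and np.any(scores[j] > scores[i])
                for j in remaining if j != i
            )
            if not dominated:
                current.append(i)
        for i in current:
            ranks[i] = front
            remaining.discard(i)
        front += 1
    return ranks

# Hypothetical molecules scored on affinity for two targets (higher is better).
scores = [[0.9, 0.2], [0.6, 0.7], [0.3, 0.9], [0.5, 0.5], [0.2, 0.1]]
print(pareto_ranks(scores))   # -> [0 0 0 1 2]
```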


Subject(s)
Algorithms , Drug Design/methods , Reinforcement, Psychology , Cluster Analysis , Supervised Machine Learning , Neural Networks, Computer
13.
Neural Netw ; 179: 106552, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39089154

ABSTRACT

Multi-agent reinforcement learning (MARL) effectively improves the learning speed of agents in sparse-reward tasks with the guidance of subgoals. However, existing works sever the consistency between the learning objectives of the subgoal-generation and subgoal-reaching stages, thereby significantly inhibiting the effectiveness of subgoal learning. To address this problem, we propose a novel Potential field Subgoal-based Multi-Agent reinforcement learning (PSMA) method, which introduces the potential field (PF) to unify the two-stage learning objectives. Specifically, we design a state-to-PF representation model that describes agents' states as potential fields, allowing easy measurement of the interaction effect for both allied and enemy agents. With the PF representation, a subgoal selector is designed to automatically generate multiple subgoals for each agent, drawn from the experience replay buffer that contains both individual and total PF values. Based on the determined subgoals, we define an intrinsic reward function to guide the agent to reach their respective subgoals while maximizing the joint action-value. Experimental results show that our method outperforms the state-of-the-art MARL method on both StarCraft II micro-management (SMAC) and Google Research Football (GRF) tasks with sparse reward settings.
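A toy sketch of the potential-field idea: allied and enemy positions induce a scalar potential, the highest-potential cell is taken as a subgoal, and an intrinsic reward pays for progress toward it. The learned state-to-PF model, subgoal selector, and reward in PSMA are more elaborate; everything below is a hypothetical simplification.

```python
import numpy as np

def potential_field(pos, allies, enemies, eps=1e-6):
    """Toy potential at `pos`: allies contribute positively, enemies negatively,
    each falling off with distance (stand-in for the learned PF representation)."""
    pot = 0.0
    for a in allies:
        pot += 1.0 / (np.linalg.norm(pos - a) + eps)
    for e in enemies:
        pot -= 1.0 / (np.linalg.norm(pos - e) + eps)
    return pot

def intrinsic_reward(old_pos, new_pos, subgoal):
    """Reward progress toward the selected subgoal."""
    return np.linalg.norm(old_pos - subgoal) - np.linalg.norm(new_pos - subgoal)

allies  = [np.array([1.0, 1.0]), np.array([2.0, 1.5])]
enemies = [np.array([4.0, 4.0])]

# Pick the highest-potential cell on a small grid as the agent's subgoal.
grid = [np.array([x, y]) for x in range(6) for y in range(6)]
subgoal = max(grid, key=lambda c: potential_field(c, allies, enemies))
print("subgoal:", subgoal,
      "| intrinsic reward for a step toward it:",
      round(intrinsic_reward(np.array([5.0, 0.0]), np.array([4.0, 0.5]), subgoal), 3))
```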


Subject(s)
Reinforcement, Psychology , Reward , Neural Networks, Computer , Humans , Algorithms , Machine Learning
14.
Neural Netw ; 179: 106543, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39089158

ABSTRACT

Recent successes in robot learning have significantly enhanced autonomous systems across a wide range of tasks. However, they are prone to generating similar or identical solutions, limiting the controllability of the robot to behave according to user intentions. These limited robot behaviors may lead to collisions and potential harm to humans. To resolve these limitations, we introduce a semi-autonomous teleoperation framework that enables users to operate a robot by selecting a high-level command, referred to as option. Our approach aims to provide effective and diverse options by a learned policy, thereby enhancing the efficiency of the proposed framework. In this work, we propose a quality-diversity (QD) based sampling method that simultaneously optimizes both the quality and diversity of options using reinforcement learning (RL). Additionally, we present a mixture of latent variable models to learn multiple policy distributions defined as options. In experiments, we show that the proposed method achieves superior performance in terms of the success rate and diversity of the options in simulation environments. We further demonstrate that our method outperforms manual keyboard control in terms of completion time in cluttered real-world environments.
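As an illustration of the quality-diversity criterion only (not the paper's RL-based sampler or latent-variable policies), the sketch below greedily selects options that score well on quality plus distance to already-selected options in an assumed behavior space; all embeddings and quality values are hypothetical.

```python
import numpy as np

def select_options(candidates, qualities, k=3, diversity_weight=0.5):
    """Greedy quality-diversity selection: repeatedly add the candidate with the
    best quality plus distance to already-selected options (diversity term)."""
    selected = [int(np.argmax(qualities))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(len(candidates)):
            if i in selected:
                continue
            div = min(np.linalg.norm(candidates[i] - candidates[j]) for j in selected)
            score = qualities[i] + diversity_weight * div
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

# Hypothetical options embedded in 2-D with a scalar quality (e.g., success rate).
cands = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 1.9], [-2.0, 1.0]])
quals = np.array([0.9, 0.85, 0.8, 0.75, 0.6])
print(select_options(cands, quals))   # -> [0, 2, 4]
```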


Subject(s)
Reinforcement, Psychology , Robotics , Robotics/methods , Humans , Machine Learning , Computer Simulation , Algorithms , Neural Networks, Computer
15.
Neural Netw ; 179: 106579, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39096749

ABSTRACT

How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a realistic and challenging problem in visual reinforcement learning. Recently, unsupervised representation learning methods based on bisimulation metrics, contrast, prediction, and reconstruction have shown the ability for task-relevant information extraction. However, due to the lack of appropriate mechanisms for the extraction of task information in the prediction, contrast, and reconstruction-related approaches and the limitations of bisimulation-related methods in domains with sparse rewards, it is still difficult for these methods to be effectively extended to environments with distractions. To alleviate these problems, in the paper, the action sequences, which contain task-intensive signals, are incorporated into representation learning. Specifically, we propose a Sequential Action-induced invariant Representation (SAR) method, which decouples the controlled part (i.e., task-relevant information) and the uncontrolled part (i.e., task-irrelevant information) in noisy observations through sequential actions, thereby extracting effective representations related to decision tasks. To achieve it, the characteristic function of the action sequence's probability distribution is modeled to specifically optimize the state encoder. We conduct extensive experiments on the distracting DeepMind Control suite while achieving the best performance over strong baselines. We also demonstrate the effectiveness of our method at disregarding task-irrelevant information by applying SAR to real-world CARLA-based autonomous driving with natural distractions. Finally, we provide the analysis results of generalization drawn from the generalization decay and t-SNE visualization. Code and demo videos are available at https://github.com/DMU-XMU/SAR.git.
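The characteristic-function idea can be sketched as follows: estimate the empirical characteristic function of a batch of action sequences and compare it between batches, a quantity that could in principle drive an encoder loss. How SAR actually models and optimizes this function is not reproduced here, and all data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)

def empirical_cf(actions, freqs):
    """Empirical characteristic function phi(t) = E[exp(i * t . a)] of a batch of
    flattened action sequences, evaluated at the rows of `freqs`."""
    return np.exp(1j * actions @ freqs.T).mean(axis=0)

# Two hypothetical batches of 3-step, 1-D action sequences (e.g., steering values).
batch_a = rng.normal(0.0, 0.5, size=(256, 3))
batch_b = rng.normal(0.3, 0.5, size=(256, 3))
freqs = rng.normal(size=(32, 3))                   # evaluation frequencies

cf_a, cf_b = empirical_cf(batch_a, freqs), empirical_cf(batch_b, freqs)
print("CF distance:", np.abs(cf_a - cf_b).mean())  # could serve as an encoder loss term
```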


Subject(s)
Reinforcement, Psychology , Humans , Neural Networks, Computer , Algorithms
16.
Neural Netw ; 179: 106565, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39111159

ABSTRACT

In cooperative multi-agent reinforcement learning, agents jointly optimize a centralized value function based on the rewards shared by all agents and learn decentralized policies through value function decomposition. Although such a learning framework is considered effective, estimating individual contribution from the rewards, which is essential for learning highly cooperative behaviors, is difficult. In addition, it becomes more challenging when reinforcement and punishment, which respectively increase or decrease specific behaviors of agents, coexist, because the processes of maximizing reinforcement and minimizing punishment can often conflict in practice. This study proposes a novel exploration scheme called multi-agent decomposed reward-based exploration (MuDE), which preferably explores the action spaces associated with positive sub-rewards based on a modified reward decomposition scheme, thus effectively exploring action spaces not reachable by existing exploration schemes. We evaluate MuDE with a challenging set of StarCraft II micromanagement and modified predator-prey tasks extended to include reinforcement and punishment. The results show that MuDE accurately estimates sub-rewards and outperforms state-of-the-art approaches in both convergence speed and win rates.
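A toy sketch of the reward-decomposition idea, assuming the environment exposes positive (reinforcing) and negative (punishing) sub-rewards directly and the agent explores preferentially where the positive sub-reward estimate is high; MuDE instead estimates sub-rewards from a shared team reward within a value-decomposition framework, so this is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(6)

n_actions, alpha, bonus_w = 4, 0.2, 0.5
q_pos = np.zeros(n_actions)     # value of the positive (reinforcing) sub-reward
q_neg = np.zeros(n_actions)     # value of the negative (punishing) sub-reward

def env_step(a):
    """Toy environment mixing reinforcement and punishment per action."""
    pos = float(rng.random() < [0.1, 0.6, 0.3, 0.2][a])       # reward probability
    neg = -float(rng.random() < [0.1, 0.1, 0.5, 0.3][a])      # punishment probability
    return pos, neg

for _ in range(500):
    # Explore preferentially where the positive sub-reward estimate is high.
    prefs = (q_pos + q_neg) + bonus_w * q_pos
    p = np.exp(prefs) / np.exp(prefs).sum()
    a = rng.choice(n_actions, p=p)
    pos, neg = env_step(a)
    q_pos[a] += alpha * (pos - q_pos[a])
    q_neg[a] += alpha * (neg - q_neg[a])

print("q_pos:", np.round(q_pos, 2), "q_neg:", np.round(q_neg, 2))
```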


Subject(s)
Punishment , Reinforcement, Psychology , Reward , Neural Networks, Computer , Cooperative Behavior , Humans , Algorithms
17.
Neural Netw ; 179: 106621, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39153402

ABSTRACT

Vehicular edge computing (VEC), a promising paradigm for the development of emerging intelligent transportation systems, can provide lower service latency for vehicular applications. However, it is still a challenge to fulfill the requirements of such applications with stringent latency requirements in the VEC system with limited resources. In addition, existing methods focus on handling the offloading task in a certain time slot with statically allocated resources, but ignore the heterogeneous tasks' different resource requirements, resulting in resource wastage. To solve the real-time task offloading and heterogeneous resource allocation problem in the VEC system, we propose a decentralized solution based on the attention mechanism and recurrent neural networks (RNN) with a multi-agent distributed deep deterministic policy gradient (AR-MAD4PG). First, to address the partial observability of agents, we construct a shared agent graph and propose a periodic communication mechanism that enables edge nodes to aggregate information from other edge nodes. Second, to help agents better understand the current system state, we design an RNN-based feature extraction network to capture the historical state and resource allocation information of the VEC system. Third, to tackle the challenges of excessive joint observation-action space and ineffective information interference, we adopt the multi-head attention mechanism to compress the dimension of the observation-action space of agents. Finally, we build a simulation model based on actual vehicle trajectories, and the experimental results show that our proposed method outperforms the existing approaches.
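The dimension-compression step can be illustrated with single-head scaled dot-product attention in NumPy, which summarizes a variable-size set of neighboring edge-node observations into one fixed-size vector; the paper uses multi-head attention inside a full AR-MAD4PG pipeline, and all features below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(7)

def scaled_dot_product_attention(query, keys, values):
    """Compress a variable-size set of neighbor observations into one vector."""
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values, weights

d = 8
own_state = rng.normal(size=d)          # this edge node's features (query)
neighbors = rng.normal(size=(5, d))     # observations from 5 neighboring edge nodes

summary, attn = scaled_dot_product_attention(own_state, neighbors, neighbors)
print("summary shape:", summary.shape, "| attention weights:", np.round(attn, 2))
```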


Subject(s)
Neural Networks, Computer , Resource Allocation , Reinforcement, Psychology , Internet , Transportation , Algorithms , Computer Simulation , Deep Learning
18.
Neural Comput ; 36(9): 1854-1885, 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39106455

ABSTRACT

In reinforcement learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before exploiting it. Two rewards encouraging efficient exploration are the entropy of the action policy and curiosity for information gain. Entropy is well established in the literature, promoting randomized action selection. Curiosity is defined in a broad variety of ways in the literature, promoting discovery of novel experiences. One example, prediction error curiosity, rewards agents for discovering observations they cannot accurately predict. However, such agents may be distracted by unpredictable observational noises known as curiosity traps. Based on the free energy principle (FEP), this letter proposes hidden state curiosity, which rewards agents by the KL divergence between the predictive prior and posterior probabilities of latent variables. We trained six types of agents to navigate mazes: baseline agents without rewards for entropy or curiosity, and agents rewarded for entropy and/or either prediction error curiosity or hidden state curiosity. We find that entropy and curiosity result in efficient exploration, especially when both are employed together. Notably, agents with hidden state curiosity demonstrate resilience against curiosity traps, which hinder agents with prediction error curiosity. This suggests that implementing the FEP may enhance the robustness and generalization of RL models, potentially aligning the learning processes of artificial and biological agents.
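Hidden state curiosity as described rewards the KL divergence between the predictive prior and the posterior over latent variables; a minimal sketch for diagonal Gaussian beliefs follows, with all means and variances invented for illustration.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between diagonal Gaussians: the hidden-state curiosity reward is
    the divergence between posterior q and predictive prior p over latents."""
    return 0.5 * np.sum(np.log(var_p / var_q)
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# Hypothetical 4-D latent belief before (prior) and after (posterior) an observation.
mu_prior, var_prior = np.zeros(4), np.ones(4)
mu_post,  var_post  = np.array([0.5, -0.2, 0.0, 1.0]), np.full(4, 0.5)

reward = gaussian_kl(mu_post, var_post, mu_prior, var_prior)
print("hidden state curiosity reward:", round(float(reward), 3))
```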


Subject(s)
Exploratory Behavior , Reinforcement, Psychology , Reward , Exploratory Behavior/physiology , Humans , Entropy , Computer Simulation
19.
Nat Commun ; 15(1): 7590, 2024 Aug 31.
Article in English | MEDLINE | ID: mdl-39217160

ABSTRACT

Neural systems have evolved not only to solve environmental challenges through internal representations but also, under social constraints, to communicate these to conspecifics. In this work, we aim to understand the structure of these internal representations and how they may be optimized to transmit pertinent information from one individual to another. Thus, we build on previous teacher-student communication protocols to analyze the formation of individual and shared abstractions and their impact on task performance. We use reinforcement learning in grid-world mazes where a teacher network passes a message to a student to improve task performance. This framework allows us to relate environmental variables with individual and shared representations. We compress high-dimensional task information within a low-dimensional representational space to mimic natural language features. In coherence with previous results, we find that providing teacher information to the student leads to a higher task completion rate and an ability to generalize tasks it has not seen before. Further, optimizing message content to maximize student reward improves information encoding, suggesting that an accurate representation in the space of messages requires bi-directional input. These results highlight the role of language as a common representation among agents and its implications on generalization capabilities.


Subject(s)
Language , Social Learning , Humans , Reinforcement, Psychology , Learning/physiology , Task Performance and Analysis
20.
Artif Intell Med ; 156: 102945, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39178622

ABSTRACT

In the formulation of strategies for walking rehabilitation, achieving precise identification of the current state and making rational predictions about the future state are crucial but often unrealized. To tackle this challenge, our study introduces a unified framework that integrates a novel 3D walking motion capture method using multi-source image fusion and a walking rehabilitation simulation approach based on multi-agent reinforcement learning. We found that (i) the proposed method achieves accurate 3D walking motion capture and outperforms other advanced methods. Experimental evidence indicates that, compared to similar visual skeleton tracking methods, the proposed approach yields results with higher Pearson correlation (r=0.93), intra-class correlation coefficient (ICC(2,1)=0.91), and narrower confidence intervals ([0.90,0.95] for r, [0.88,0.94] for ICC(2,1)) when compared to standard results. The outcomes of the proposed approach also exhibit commendable correlation and concurrence with those obtained through the IMU-based skeleton tracking method in the assessment of gait parameters ([0.85,0.89] for r, [0.75,0.81] for ICC(2,1)); (ii) multi-agent reinforcement learning has the potential to be used to solve the simulation task of gait rehabilitation. In the mimicry experiment, our proposed simulation method for gait rehabilitation not only enables the intelligent agent to converge from the initial state to the target state, but also reproduces, through motor state resolution, evolutionary patterns similar to those observed in clinical practice. This study offers valuable contributions to walking rehabilitation, enabling precise assessment and simulation-based interventions, with potential implications for clinical practice and patient outcomes.
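The agreement statistics quoted above can be computed as follows; this is a generic Pearson r and Shrout-Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measures) applied to hypothetical gait-parameter values, not the study's data or exact statistical pipeline.

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.
    `data` is an (n subjects x k raters/methods) array."""
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    ms_rows = k * np.sum((data.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((data.mean(axis=0) - grand) ** 2) / (k - 1)
    ss_err = np.sum((data - data.mean(axis=1, keepdims=True)
                          - data.mean(axis=0, keepdims=True) + grand) ** 2)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical gait parameter (e.g., stride time in s) from the proposed method
# and a reference system for 8 participants.
proposed  = np.array([1.10, 1.05, 1.22, 0.98, 1.15, 1.08, 1.20, 1.02])
reference = np.array([1.12, 1.04, 1.20, 1.00, 1.18, 1.07, 1.19, 1.03])

r = np.corrcoef(proposed, reference)[0, 1]
icc = icc_2_1(np.column_stack([proposed, reference]))
print(f"Pearson r = {r:.3f}, ICC(2,1) = {icc:.3f}")
```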


Subject(s)
Gait , Walking , Humans , Walking/physiology , Gait/physiology , Computer Simulation , Reinforcement, Psychology , Imaging, Three-Dimensional/methods , Machine Learning