Results 1 - 3 of 3

1.
Article in English | MEDLINE | ID: mdl-38781066

ABSTRACT

Numerous real-world decision or control problems involve multiple conflicting objectives whose relative importance (preference) must be weighed differently across scenarios. While Pareto optimality is desired, environmental uncertainties (e.g., environmental changes or observation noise) may mislead the agent into executing suboptimal policies. In this article, we present a novel multiobjective optimization paradigm, robust multiobjective reinforcement learning (RMORL), which accounts for environmental uncertainties while training a single model that approximates robust Pareto-optimal policies across the entire preference space. To enhance policy robustness against environmental changes, an environmental disturbance is modeled as an adversarial agent over the entire preference space by incorporating a zero-sum game into a multiobjective Markov decision process (MOMDP). Additionally, we devise an adversarial defense technique against observational perturbations, which ensures that the policy's variation under adversarial attacks on state observations remains bounded for any specified preference. The proposed technique is assessed in five multiobjective environments with continuous action spaces, and comparisons with competitive baselines, including classical and state-of-the-art schemes, showcase its effectiveness.
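The abstract gives no implementation details, but the two ingredients it names (a preference-conditioned policy and a bounded adversarial perturbation of state observations) can be illustrated roughly. The sketch below is a hypothetical PyTorch rendering: the class and function names, the network sizes, the L-infinity bound eps, and the PGD-style inner loop are all assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch: a preference-conditioned policy plus a bounded
# observation adversary (zero-sum view). Names and hyperparameters are
# assumptions, not the published RMORL implementation.
import torch
import torch.nn as nn

class PreferencePolicy(nn.Module):
    """Policy conditioned on both the state and a preference vector w."""
    def __init__(self, state_dim, pref_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + pref_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim), nn.Tanh(),  # continuous actions in [-1, 1]
        )

    def forward(self, state, w):
        return self.net(torch.cat([state, w], dim=-1))

def worst_case_observation(policy, state, w, eps=0.05, steps=5, lr=0.01):
    """Gradient-ascent adversary: find a perturbation delta with
    ||delta||_inf <= eps that maximally shifts the policy's action."""
    delta = torch.zeros_like(state, requires_grad=True)
    clean_action = policy(state, w).detach()
    for _ in range(steps):
        gap = ((policy(state + delta, w) - clean_action) ** 2).sum()
        gap.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()   # ascend on the policy gap
            delta.clamp_(-eps, eps)           # keep the attack bounded
            delta.grad.zero_()
    return (state + delta).detach()
```

Training against such worst-case observations is one standard way to keep policy variation bounded under state perturbations, which is the property the defense described above is meant to guarantee.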

2.
Article in English | MEDLINE | ID: mdl-37788189

ABSTRACT

Stochastic exploration is key to the success of the deep Q-network (DQN) algorithm. However, most existing stochastic exploration approaches either explore actions heuristically regardless of their Q values or couple the sampling with Q values, which inevitably introduces bias into the learning process. In this article, we propose a novel preference-guided ε-greedy exploration algorithm that can efficiently facilitate exploration for DQN without introducing additional bias. Specifically, we design a dual architecture consisting of two branches, one of which is a copy of DQN, namely, the Q branch. The other branch, which we call the preference branch, learns the action preference that the DQN implicitly follows. We theoretically prove that the policy improvement theorem holds for the preference-guided ε-greedy policy and experimentally show that the inferred action-preference distribution aligns with the landscape of the corresponding Q values. Intuitively, preference-guided ε-greedy exploration motivates the DQN agent to take diverse actions: actions with larger Q values are sampled more frequently, while those with smaller Q values still have a chance to be explored, thus encouraging exploration. We comprehensively evaluate the proposed method by benchmarking it against well-known DQN variants in nine different environments. Extensive results confirm the superiority of our proposed method in terms of performance and convergence speed.
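As an illustration only, the action-selection rule described above can be sketched as follows; the softmax over the preference branch's logits, the function names, and the toy values are assumptions, not the paper's architecture or training procedure.

```python
# Minimal sketch of preference-guided epsilon-greedy action selection;
# the sampling rule and names are assumptions based on the abstract.
import numpy as np

def preference_guided_epsilon_greedy(q_values, preference_logits, epsilon, rng):
    """With prob. 1 - epsilon act greedily on Q; otherwise sample from the
    learned action-preference distribution instead of a uniform one."""
    if rng.random() > epsilon:
        return int(np.argmax(q_values))
    probs = np.exp(preference_logits - preference_logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))

rng = np.random.default_rng(0)
q = np.array([1.0, 0.5, -0.2])          # Q-branch estimates
prefs = np.array([2.0, 1.0, 0.1])       # preference-branch logits
action = preference_guided_epsilon_greedy(q, prefs, epsilon=0.1, rng=rng)
```

Compared with uniform ε-greedy, the exploratory draw still reaches every action but weights them by the learned preference, which matches the intuition stated in the abstract.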

3.
Bioengineering (Basel); 10(4). 2023 Mar 30.
Article in English | MEDLINE | ID: mdl-37106623

ABSTRACT

Based on the principles of neuromechanics, human arm movements result from the dynamic interaction among the nervous, muscular, and skeletal systems. To develop an effective neural feedback controller for neurorehabilitation training, it is important to consider the effects of both the muscular and skeletal systems. In this study, we designed a neuromechanics-based neural feedback controller for arm reaching movements. To achieve this, we first constructed a musculoskeletal arm model based on the actual biomechanical structure of the human arm. Subsequently, a hybrid neural feedback controller was developed that mimics the multifunctional areas of the human arm. The performance of this controller was then validated through numerical simulation experiments. The simulation results demonstrated a bell-shaped movement trajectory, consistent with the natural motion of human arm movements. Furthermore, an experiment testing the tracking ability of the controller revealed real-time errors within one millimeter, and the tensile force generated by the controller's muscles remained stable at a low value, thereby avoiding the muscle strain that excessive excitation can cause during neurorehabilitation.
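For context, the bell-shaped reaching profile mentioned above is commonly produced by a minimum-jerk reference trajectory; the short sketch below generates such a profile and is a generic example under that assumption, not the musculoskeletal model or controller used in the study.

```python
# Generic minimum-jerk reach: its velocity curve is the classic
# bell-shaped profile; this is an illustration, not the study's model.
import numpy as np

def minimum_jerk(x0, xf, T, n=100):
    """Return time, position, and velocity for a point-to-point reach."""
    t = np.linspace(0.0, T, n)
    tau = t / T
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5            # normalized position
    ds = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / T     # normalized velocity
    return t, x0 + (xf - x0) * s, (xf - x0) * ds

t, pos, vel = minimum_jerk(x0=0.0, xf=0.3, T=1.0)   # 30 cm reach in 1 s
```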
