ABSTRACT
Unmanned Aerial Vehicles (UAVs) have gained significant popularity in both military and civilian applications due to their cost-effectiveness and flexibility. However, the increased utilization of UAVs raises concerns about the risk of illegal data gathering and potential criminal use. As a result, the accurate detection and identification of intruding UAVs have emerged as a critical research concern. Many algorithms have proven effective at detecting such objects through approaches including radio frequency (RF), computer vision (visual), and sound-based detection. This article proposes a novel approach for detecting and identifying intruding UAVs from their RF signals using a hierarchical reinforcement learning technique. We train a UAV agent hierarchically with multiple policies using the REINFORCE algorithm with an entropy regularization term to improve overall accuracy. The research focuses on utilizing features extracted from RF signals to detect intruding UAVs, contributing to the field of reinforcement learning by investigating a less-explored UAV detection approach. Through extensive evaluation, our findings show that the proposed approach achieves accurate RF-based detection and identification, with a detection accuracy of 99.7%. Additionally, our approach demonstrates improved cumulative return and reduced loss. These results highlight the effectiveness of the proposed solution in enhancing UAV security and surveillance while advancing the field of UAV detection.
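The abstract names the REINFORCE policy gradient with an entropy regularization term. As a hedged illustration only (not the authors' implementation), a minimal NumPy sketch of that update is shown below; the feature dimension, action set, and hyperparameters are assumptions.

```python
# Minimal sketch of a REINFORCE update with an entropy bonus, as named in the
# abstract. Feature dimension, class count, and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 16, 4               # e.g., RF features -> {no UAV, UAV type A/B/C}
theta = np.zeros((n_features, n_actions))
alpha, beta = 0.01, 0.01                    # learning rate, entropy coefficient

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(episode, theta):
    """episode: list of (feature_vector, action, reward) tuples."""
    rewards = [r for _, _, r in episode]
    returns = np.cumsum(rewards[::-1])[::-1]                      # reward-to-go per step
    for (x, a, _), G in zip(episode, returns):
        p = softmax(x @ theta)
        logp = np.log(p + 1e-12)
        grad_logp = np.outer(x, np.eye(n_actions)[a] - p)         # d log pi(a|x) / d theta
        grad_entropy = np.outer(x, p * (np.dot(p, logp) - logp))  # d H(pi) / d theta
        theta = theta + alpha * (G * grad_logp + beta * grad_entropy)
    return theta

# Toy usage with random feature vectors and rewards (purely illustrative).
episode = [(rng.normal(size=n_features), int(rng.integers(n_actions)), float(rng.normal()))
           for _ in range(10)]
theta = reinforce_update(episode, theta)
```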
ABSTRACT
This paper introduces a novel data-driven self-triggered control approach based on a hierarchical reinforcement learning framework for networked motor control systems. The approach divides the self-triggered control policy into a higher and a lower layer, with the higher-level policy guiding the lower-level policy's decision-making, thereby reducing the exploration space of the lower-level policy and improving the efficiency of the learning process. The data-driven framework is integrated with a dual-actor critic algorithm, using two interconnected neural networks to approximate the hierarchical policies. Within this framework, we use recurrent neural networks as the critic architecture, exploiting their temporal dynamics to better capture the dependencies between costs and thus improving the critic network's efficiency and accuracy in approximating the multi-step cumulative cost function. Additionally, we develop a pre-training method for the control policy networks to further improve learning efficiency. The effectiveness of our proposed method is validated through a series of numerical simulations.
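The abstract's specific architectural choice is a recurrent critic that exploits temporal dependencies between costs. A minimal, hedged PyTorch sketch of such a critic follows; the GRU architecture, layer sizes, and input layout are assumptions rather than the authors' exact design.

```python
# Hedged sketch of an RNN-based critic that maps a sequence of state/cost
# observations to an estimate of the cumulative cost, in the spirit of the
# abstract. Layer sizes, input dimensions, and the single-GRU choice are assumed.
import torch
import torch.nn as nn

class RecurrentCritic(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim); the last hidden state lets the critic
        # exploit temporal dependencies between successive costs.
        _, h_n = self.rnn(obs_seq)
        return self.head(h_n[-1]).squeeze(-1)   # (batch,) cumulative-cost estimates

critic = RecurrentCritic(obs_dim=6)
value = critic(torch.randn(8, 20, 6))           # 8 trajectories of length 20
```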
ABSTRACT
Infectious diseases, such as the Black Death, the Spanish Flu, and COVID-19, have accompanied human history and threatened public health, causing enormous numbers of infections and deaths. Because epidemics develop rapidly and have huge impact, designing interventions is one of the most critical tasks for policymakers responding to them. However, existing studies mainly focus on epidemic control with a single intervention, which severely compromises control effectiveness. In view of this, we propose HRL4EC, a Hierarchical Reinforcement Learning decision framework for multi-mode Epidemic Control with multiple interventions. We devise an epidemiological model, referred to as MID-SEIR, to explicitly describe the impact of multiple interventions on transmission, and use it as the environment for HRL4EC. To address the complexity introduced by multiple interventions, this work transforms the multi-mode intervention decision problem into a multi-level control problem and employs hierarchical reinforcement learning to find the optimal strategies. Finally, extensive experiments are conducted with real and simulated epidemic data to validate the effectiveness of the proposed method. We further analyze the experimental data in depth, draw a series of findings on epidemic intervention strategies, and visualize them accordingly, which can provide heuristic support for policymakers' pandemic response.
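MID-SEIR is the authors' own model and its exact form is not given in the abstract. As a hedged illustration of how multiple interventions can enter a compartmental model, the sketch below applies intervention factors multiplicatively to the transmission rate of a generic discrete-time SEIR update; all parameter values are illustrative.

```python
# Hedged sketch of a discrete-time SEIR update in which several interventions
# jointly scale the transmission rate. This is a generic compartmental model,
# not the authors' MID-SEIR; all parameter values are illustrative.
def seir_step(S, E, I, R, beta=0.3, sigma=0.2, gamma=0.1, interventions=()):
    """interventions: iterable of multiplicative factors in (0, 1],
    e.g. 0.6 for masking, 0.5 for mobility limits (assumed values)."""
    N = S + E + I + R
    for factor in interventions:
        beta *= factor                        # each active intervention damps transmission
    new_exposed   = beta * S * I / N
    new_infected  = sigma * E
    new_recovered = gamma * I
    return (S - new_exposed,
            E + new_exposed - new_infected,
            I + new_infected - new_recovered,
            R + new_recovered)

state = (990.0, 5.0, 5.0, 0.0)
for _ in range(30):                           # simulate 30 days under two interventions
    state = seir_step(*state, interventions=(0.6, 0.5))
```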
ABSTRACT
This paper proposes an air combat training framework based on hierarchical reinforcement learning to address non-convergence in training caused by the curse of dimensionality of the large state space in air combat tactical pursuit. With hierarchical reinforcement learning, the three-dimensional problem can be decomposed into two-dimensional problems, improving training performance compared with other baselines. To further improve overall learning performance, a meta-learning-based algorithm is established, and a corresponding reward function is designed to further improve the agent's performance in the air combat tactical pursuit scenario. The results show that the proposed framework achieves better performance than the baseline approach.
ABSTRACT
A hallmark of human intelligence, but challenging for reinforcement learning (RL) agents, is the ability to compositionally generalise, that is, to recompose familiar knowledge components in novel ways to solve new problems. For instance, when navigating in a city, one needs to know the location of the destination and how to operate a vehicle to get there, whether it be pedalling a bike or operating a car. In RL, these correspond to the reward function and transition function, respectively. To compositionally generalise, these two components need to be transferable independently of each other: multiple modes of transport can reach the same goal, and any given mode can be used to reach multiple destinations. Yet there are also instances where it can be helpful to learn and transfer entire structures, jointly representing goals and transitions, particularly whenever these recur in natural tasks (e.g., given a suggestion to get ice cream, one might prefer to bike, even in new towns). Prior theoretical work has explored how, in model-based RL, agents can learn and generalise task components (transition and reward functions). But a satisfactory account of how a single agent can simultaneously satisfy the two competing demands is still lacking. Here, we propose a hierarchical RL agent that learns and transfers individual task components as well as entire structures (particular compositions of components) by inferring both through a non-parametric Bayesian model of the task. It maintains a factorised representation of task components through a hierarchical Dirichlet process, but it also represents different possible covariances between these components through a standard Dirichlet process. We validate our approach on a variety of navigation tasks covering a wide range of statistical correlations between task components and show that it can also improve generalisation and transfer in more complex, hierarchical tasks with goal/subgoal structures. Finally, we end with a discussion of our work, including how this clustering algorithm could conceivably be implemented by cortico-striatal gating circuits in the brain.
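A core ingredient of the proposed agent is non-parametric Bayesian clustering of task components and structures. The sketch below illustrates only the Chinese-restaurant-process prior that underlies such Dirichlet-process clustering; the hierarchical reward/transition factorisation and the likelihood terms of the actual model are omitted, and the concentration parameter is an assumption.

```python
# Hedged sketch of the Chinese-restaurant-process prior behind Dirichlet-process
# clustering of tasks: a new task joins an existing cluster with probability
# proportional to its count, or opens a new cluster with probability
# proportional to alpha. Likelihood terms are omitted for brevity.
import numpy as np

def crp_assign(cluster_counts, alpha, rng):
    counts = np.asarray(cluster_counts, dtype=float)
    probs = np.append(counts, alpha)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))   # index == len(counts) means "new cluster"

rng = np.random.default_rng(0)
counts = [5, 2]                                   # two existing task clusters
k = crp_assign(counts, alpha=1.0, rng=rng)
```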
ABSTRACT
In the process of disease identification, as the number of diseases increases, the set of diseases and symptoms grows larger. However, existing computer-aided diagnosis systems do not fully address the curse of dimensionality caused by the growing dataset. To address this problem, we propose symptom filtering and a weighted network for deeper processing of the collected symptom information. Symptom filtering acts like a filter in signal transmission: it filters the collected symptom information, further reduces the dimensionality of the system's state space, and makes important symptoms more prominent. The weighted network, in turn, mines deeper disease information by modeling the channels of symptom information, amplifying important information and suppressing unimportant information. Compared with existing hierarchical reinforcement learning models, the feature extraction methods proposed in this paper help existing models improve their accuracy by more than 10%.
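The weighted network is described as modeling the channels of symptom information, amplifying important channels and suppressing unimportant ones. A hedged PyTorch sketch of a generic channel-weighting (squeeze-and-excitation-style) gate consistent with that description is shown below; it is not the authors' exact architecture, and all sizes are assumptions.

```python
# Hedged sketch of a channel-weighting gate over a symptom feature vector: a
# small network produces per-channel weights that amplify informative symptoms
# and suppress uninformative ones. Generic mechanism, not the authors' model.
import torch
import torch.nn as nn

class SymptomChannelWeighting(nn.Module):
    def __init__(self, n_symptoms: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(n_symptoms, n_symptoms // reduction),
            nn.ReLU(),
            nn.Linear(n_symptoms // reduction, n_symptoms),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)               # element-wise reweighting of symptoms

layer = SymptomChannelWeighting(n_symptoms=64)
weighted = layer(torch.rand(32, 64))          # batch of 32 symptom vectors
```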
ABSTRACT
In several papers published in Biological Cybernetics in the 1980s and 1990s, Kawato and colleagues proposed computational models explaining how internal models are acquired in the cerebellum. These models were later supported by neurophysiological experiments using monkeys and neuroimaging experiments involving humans. These early studies influenced neuroscience from basic sensory-motor control to higher cognitive functions. One of the most perplexing enigmas related to internal models is to understand the neural mechanisms that enable animals to learn large-dimensional problems with so few trials. Consciousness and metacognition (the ability to monitor one's own thoughts) may be part of the solution to this enigma. Based on literature reviews of the past 20 years, here we propose a computational neuroscience model of metacognition. The model comprises a modular hierarchical reinforcement-learning architecture of parallel, layered generative-inverse model pairs. In the prefrontal cortex, a distributed executive network called the "cognitive reality monitoring network" (CRMN) orchestrates conscious involvement of generative-inverse model pairs in perception and action. Based on mismatches between computations by generative and inverse models, as well as reward prediction errors, the CRMN computes a "responsibility signal" that gates selection and learning of pairs in perception, action, and reinforcement learning. A high responsibility signal is given to the pairs that best capture the external world, that are competent in movements (small mismatch), and that are capable of reinforcement learning (small reward prediction error). The CRMN selects pairs with higher responsibility signals as objects of metacognition, and consciousness is determined by the entropy of responsibility signals across all pairs. This model could lead to new-generation AI, which exhibits metacognition, consciousness, dimension reduction, selection of modules and corresponding representations, and learning from small samples. It may also lead to the development of a new scientific paradigm that enables the causal study of consciousness by combining the CRMN and decoded neurofeedback.
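As a hedged sketch of the responsibility computation described above, the snippet below turns per-pair mismatches and reward prediction errors into a normalized responsibility distribution and computes its entropy; the additive cost and softmax form are assumptions, not the authors' exact equations.

```python
# Hedged sketch of a "responsibility signal" across generative-inverse model
# pairs: pairs with small mismatch and small reward prediction error get high
# responsibility, and the entropy of the distribution is the quantity the model
# links to consciousness. Weights and the softmax form are assumptions.
import numpy as np

def responsibility(mismatch, rpe, temperature=1.0):
    cost = np.asarray(mismatch) + np.abs(np.asarray(rpe))
    z = -cost / temperature
    r = np.exp(z - z.max())
    return r / r.sum()

def responsibility_entropy(r):
    return float(-np.sum(r * np.log(r + 1e-12)))

r = responsibility(mismatch=[0.1, 0.8, 0.5], rpe=[0.05, 0.3, 0.2])
h = responsibility_entropy(r)                 # low entropy -> one pair dominates
```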
Subjects
Metacognition, Animals, Artificial Intelligence, Cognition, Reinforcement (Psychology), Reward
ABSTRACT
With the growth in Internet of Things (IoT) devices and network communications outpacing bandwidth growth, the resulting constraints must be overcome. Due to network complexity and the uncertainty of emergency distribution parameters in smart environments, using predetermined rules seems impractical. Reinforcement learning (RL), as a powerful machine learning approach, can handle such smart environments without a trainer or supervisor. Recently, we worked on bandwidth management in a smart environment with several fog fragments sharing limited bandwidth, where IoT devices may experience uncertain emergencies, in terms of both timing and sequence, that require more bandwidth for further higher-level communication. We introduced fog fragment cooperation using an RL approach under a predefined fixed threshold constraint. In this study, we extend that approach by removing the fixed threshold restriction through hierarchical reinforcement learning (HRL) and completing the cooperation qualification. At the first level of the proposed learning hierarchy, the best threshold level is learned over time, and the results are used by the second level, where the fog node learns the best device for helping an emergency device by temporarily lending bandwidth. Although adapting the method to an adaptive threshold and restricting fog fragment cooperation make the learning procedure more difficult, the HRL approach increases the method's efficiency in terms of both time and performance.
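A hedged sketch of the two-level scheme described above is given below: a high-level learner selects a threshold level and, conditioned on it, a low-level learner selects the device that lends bandwidth. The tabular, bandit-style updates and all sizes are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a two-level learner: the high level picks a bandwidth
# threshold level, the low level (given that threshold) picks which device
# lends bandwidth to an emergency device. Sizes and updates are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_thresholds, n_devices = 5, 8
q_high = np.zeros(n_thresholds)                     # value of each threshold level
q_low = np.zeros((n_thresholds, n_devices))         # device choice given threshold
alpha, eps = 0.1, 0.1

def choose(q_row):
    return int(rng.integers(len(q_row))) if rng.random() < eps else int(np.argmax(q_row))

def episode(env_reward):
    """env_reward(threshold, device) -> float; a stand-in for the fog simulation."""
    t = choose(q_high)                              # high level: pick threshold
    d = choose(q_low[t])                            # low level: pick lending device
    r = env_reward(t, d)
    q_low[t, d] += alpha * (r - q_low[t, d])        # bandit-style updates
    q_high[t] += alpha * (r - q_high[t])
    return r

for _ in range(1000):
    episode(lambda t, d: float(-abs(t - 2) - abs(d - 5)) + rng.normal(0, 0.1))
```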
Subjects
Deep Learning, Internet of Things, Machine Learning
ABSTRACT
Extensive studies have shown that many animals' capability of forming spatial representations for self-localization, path planning, and navigation relies on the functionalities of place and head-direction (HD) cells in the hippocampus. Although there are numerous hippocampal modeling approaches, only a few span the wide functionalities ranging from processing raw sensory signals to planning and action generation. This paper presents a vision-based navigation system that involves generating place and HD cells through learning from visual images, building topological maps based on learned cell representations, and performing navigation using hierarchical reinforcement learning. First, place and HD cells are trained from sequences of visual stimuli in an unsupervised learning fashion. A modified Slow Feature Analysis (SFA) algorithm is proposed to learn different cell types in an intentional way by restricting their learning to separate phases of spatial exploration. Then, to extract the encoded metric information from these unsupervised learning representations, a self-organized learning algorithm is adopted to learn from the emergent cell activities and to generate topological maps that reveal the topology of the environment and information about a robot's head direction, respectively. This enables the robot to perform self-localization and orientation detection based on the generated maps. Finally, goal-directed navigation is performed using reinforcement learning in continuous state spaces that are represented by the population activities of place cells. In particular, considering that the topological map provides a natural hierarchical representation of the environment, hierarchical reinforcement learning (HRL) is used to exploit this hierarchy to accelerate learning. The HRL operates on different spatial scales, where a high-level policy learns to select subgoals and a low-level policy learns over primitive actions to specialize in the selected subgoals. Experimental results demonstrate that our system is able to navigate a robot to the desired position effectively, and the HRL shows much better learning performance than standard RL in solving our navigation tasks.
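The cell representations above are learned with a modified Slow Feature Analysis; as a hedged illustration of the underlying computation only, the snippet below implements plain linear SFA (whitening followed by extraction of the directions with the slowest temporal variation). The paper's phase-restricted, vision-based variant is not reproduced.

```python
# Hedged sketch of linear Slow Feature Analysis: whiten the input signal, then
# keep the directions in which the whitened signal's temporal derivative has
# the smallest variance. Generic linear version; data and sizes are illustrative.
import numpy as np

def linear_sfa(X, n_components=2):
    """X: (time, features) signal; returns the n slowest projection vectors."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    d, E = np.linalg.eigh(cov)
    W_whiten = E @ np.diag(1.0 / np.sqrt(d + 1e-8)) @ E.T    # ZCA whitening matrix
    Z = Xc @ W_whiten
    dZ = np.diff(Z, axis=0)                                   # temporal derivative
    d2, E2 = np.linalg.eigh(np.cov(dZ, rowvar=False))
    slowest = E2[:, :n_components]                            # smallest-eigenvalue directions
    return W_whiten @ slowest

X = np.cumsum(np.random.default_rng(0).normal(size=(500, 10)), axis=0)  # toy random walk
W = linear_sfa(X)                                             # (10, 2) projection
slow_features = (X - X.mean(axis=0)) @ W
```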
Subjects
Deep Learning, Robotics/methods, Unsupervised Machine Learning, Biological Models, Spatial Navigation
ABSTRACT
Recent advances in computational reinforcement learning suggest that humans and animals can learn from different types of reinforcers in a hierarchically organised fashion. According to this theoretical framework, while humans learn to coordinate subroutines based on external reinforcers such as food rewards, simple actions within those subroutines are reinforced by an internal reinforcer called a pseudo-reward. Although the neural mechanisms underlying these processes are unknown, recent empirical evidence suggests that the medial prefrontal cortex (MPFC) is involved. To elucidate this issue, we measured a component of the human event-related brain potential, called the reward positivity, that is said to reflect a reward prediction error signal generated in the MPFC. Using a task paradigm involving reinforcers at two levels of hierarchy, we show that reward positivity amplitude is sensitive to the valence of low-level pseudo-rewards but, contrary to our expectation, is not modulated by high-level rewards. Further, reward positivity amplitude to low-level feedback is modulated by the goals of the higher level. These results, which were further replicated in a control experiment, suggest that the MPFC is involved in the processing of rewards at multiple levels of hierarchy.
Subjects
Electroencephalography/methods, Evoked Potentials/physiology, Psychological Feedback/physiology, Functional Neuroimaging/methods, Prefrontal Cortex/physiology, Psychomotor Performance/physiology, Reinforcement (Psychology), Reward, Adolescent, Adult, Humans, Young Adult
ABSTRACT
Humans routinely formulate plans in domains so complex that even the most powerful computers are taxed. To do so, they seem to avail themselves of many strategies and heuristics that efficiently simplify, approximate, and hierarchically decompose hard tasks into simpler subtasks. Theoretical and cognitive research has revealed several such strategies; however, little is known about their establishment, interaction, and efficiency. Here, we use model-based behavioral analysis to provide a detailed examination of the performance of human subjects in a moderately deep planning task. We find that subjects exploit the structure of the domain to establish subgoals in a way that achieves a nearly maximal reduction in the cost of computing values of choices, but then combine partial searches with greedy local steps to solve subtasks, and maladaptively prune the decision trees of subtasks in a reflexive manner upon encountering salient losses. Subjects come idiosyncratically to favor particular sequences of actions to achieve subgoals, creating novel complex actions or "options."
Subjects
Planning Techniques, Humans, Intelligence, Stochastic Processes
ABSTRACT
Foraging is a natural behavior that involves making sequential decisions to maximize rewards while minimizing the costs incurred when doing so. The prevalence of foraging across species suggests that a common brain computation underlies its implementation. Although the anterior cingulate cortex (ACC) is believed to contribute to foraging behavior, its specific role has been contentious, with predominant theories arguing that it encodes either environmental value or choice difficulty. Additionally, recent attempts to characterize foraging have taken place within the reinforcement learning framework, with increasingly complex models scaling with task complexity. Here we review reinforcement learning foraging models, highlighting the hierarchical structure of many foraging problems. We extend this literature by proposing that the ACC guides foraging according to principles of model-based hierarchical reinforcement learning. This idea holds that ACC function is organized hierarchically along a rostral-caudal gradient, with rostral structures monitoring the status and completion of high-level task goals (like finding food), and midcingulate structures overseeing the execution of task options (subgoals, like harvesting fruit) and lower-level actions (such as grabbing an apple).
Subjects
Decision Making, Cingulate Gyrus, Humans, Animals, Reinforcement (Psychology), Reward, Animal Behavior, Choice Behavior
ABSTRACT
In recent years, significant progress has been made in employing reinforcement learning to control legged robots. However, a major challenge arises with quadruped robots due to their continuous states and vast action space, making optimal control with simple reinforcement learning controllers particularly difficult. This paper introduces a hierarchical reinforcement learning framework based on the Deep Deterministic Policy Gradient (DDPG) algorithm to achieve optimal motion control for quadruped robots. The framework consists of a high-level planner responsible for generating ideal motion parameters, a low-level controller using model predictive control (MPC), and a trajectory generator. The agents within the high-level planner are trained to provide the ideal motion parameters for the low-level controller. The low-level controller uses MPC and PD controllers to generate the foot-end forces and calculates the joint motor torques through inverse kinematics. The simulation results show that the motion performance of the trained hierarchical framework is superior to that obtained using the DDPG method alone.
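A hedged sketch of the hierarchy's interface is shown below: a DDPG-style high-level actor maps the robot state to ideal motion parameters that are handed to a low-level MPC/PD layer. The dimensions, the meaning of the parameters, and the low-level controller itself are placeholders, not the authors' design.

```python
# Hedged sketch of the hierarchy described in the abstract: a DDPG-style
# high-level actor maps the robot state to ideal motion parameters, which are
# then handed to a low-level MPC/PD controller. Dimensions and parameter
# meanings are assumptions; the MPC itself is a placeholder.
import torch
import torch.nn as nn

class HighLevelActor(nn.Module):
    def __init__(self, state_dim=36, param_dim=4):
        # param_dim might cover, e.g., gait period, body height, velocity set-points (assumed)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, param_dim), nn.Tanh(),   # bounded motion parameters
        )

    def forward(self, state):
        return self.net(state)

def low_level_control(motion_params, robot_state):
    """Placeholder for the MPC + PD layer that turns motion parameters into
    foot-end forces and joint torques; not implemented here."""
    raise NotImplementedError

actor = HighLevelActor()
params = actor(torch.randn(1, 36))                  # ideal motion parameters for the low level
```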
ABSTRACT
The current paper proposes a hierarchical reinforcement learning (HRL) method to decompose a complex task into simpler sub-tasks and leverage those to improve the training of an autonomous agent in a simulated environment. For practical reasons (i.e., illustrative purposes, easy implementation, a user-friendly interface, and useful functionalities), we employ two Python frameworks called TextWorld and MiniGrid. MiniGrid functions as a 2D simulated representation of the real environment, while TextWorld functions as a high-level abstraction of this simulated environment. Training on this abstraction disentangles manipulation from navigation actions and allows us to design a dense reward function instead of a sparse reward function for the lower-level environment, which, as we show, improves the performance of training. Formal methods are used throughout the paper to establish that our algorithm is not prevented from deriving solutions.
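As a hedged illustration of the dense-versus-sparse reward distinction discussed above, the snippet below contrasts a sparse subgoal-completion reward with a dense reward based on progress toward the subgoal; the distance measure and coefficients are illustrative assumptions, not the paper's reward function.

```python
# Hedged sketch contrasting a sparse completion reward with a dense,
# progress-based reward for a grid-world (MiniGrid-like) low-level environment.
# The Manhattan distance and coefficients are illustrative assumptions.
def sparse_reward(agent_pos, subgoal_pos):
    return 1.0 if agent_pos == subgoal_pos else 0.0

def dense_reward(agent_pos, subgoal_pos, prev_pos, step_cost=0.01):
    def manhattan(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])
    progress = manhattan(prev_pos, subgoal_pos) - manhattan(agent_pos, subgoal_pos)
    bonus = 1.0 if agent_pos == subgoal_pos else 0.0
    return progress - step_cost + bonus

r = dense_reward(agent_pos=(2, 3), subgoal_pos=(5, 3), prev_pos=(1, 3))
```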
ABSTRACT
Humans have the exceptional ability to efficiently structure past knowledge during learning to enable fast generalization. Xia and Collins (2021) evaluated this ability in a hierarchically structured, sequential decision-making task, where participants could build "options" (strategy "chunks") at multiple levels of temporal and state abstraction. A quantitative model, the Option Model, captured the transfer effects observed in human participants, suggesting that humans create and compose hierarchical options and use them to explore novel contexts. However, it is not well understood how learning in a new context is attributed to new and old options (i.e., the credit assignment problem). In a new context with new contingencies, where participants can recompose some aspects of previously learned options, do they reliably create new options or overwrite existing ones? Does the credit assignment depend on how similar the new option is to an old one? In our experiment, two groups of participants (n=124 and n=104) learned hierarchically structured options, experienced different amounts of negative transfer in a new option context, and were subsequently tested on the previously learned options. Behavioral analysis showed that old options were successfully reused without interference, and new options were appropriately created and credited. This credit assignment did not depend on how similar the new option was to the old option, showing great flexibility and precision in human hierarchical learning. These behavioral results were captured by the Option Model, providing further evidence for option learning and transfer in humans.
ABSTRACT
Modern air defense battlefield situations are complex and varied, requiring high-speed computing capabilities and real-time situational processing for task assignment. Current methods struggle to balance the quality and speed of assignment strategies. This paper proposes a hierarchical reinforcement learning architecture for ground-to-air confrontation (HRL-GC) and an algorithm combining model predictive control with proximal policy optimization (MPC-PPO), which effectively combines the advantages of centralized and distributed approaches to improve training efficiency while ensuring the quality of the final decision. In a large-scale area air defense scenario, this paper validates the effectiveness and superiority of the HRL-GC architecture and the MPC-PPO algorithm, showing that the method can meet the quality and speed requirements of large-scale air defense task assignment.
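MPC-PPO builds on proximal policy optimization; as a hedged reference point only, the snippet below shows the standard PPO clipped surrogate objective. The MPC guidance and the hierarchical task-assignment structure of HRL-GC are not reproduced here.

```python
# Hedged sketch of the standard PPO clipped surrogate loss that MPC-PPO builds
# on; the MPC guidance and hierarchical structure of HRL-GC are not shown.
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.mean(torch.min(ratio * advantages, clipped * advantages))

loss = ppo_clip_loss(torch.randn(64), torch.randn(64), torch.randn(64))
```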
ABSTRACT
Social media have become an integral part of our lives, expanding our interlinking capabilities to new levels. There is plenty to be said about their positive effects. On the other hand, however, some serious negative implications of social media have been repeatedly highlighted in recent years, pointing to various threats to society and its more vulnerable members, such as teenagers in particular, ranging from much-discussed problems such as digital addiction and polarization to manipulative influences of algorithms and further to more teenager-specific issues (e.g., body stereotyping). The impact of social media, at both the individual and societal level, is characterized by the complex interplay between the users' interactions and the intelligent components of the platform. Thus, users' understanding of social media mechanisms plays a determinant role. We therefore propose a theoretical framework based on an adaptive "Social Media Virtual Companion" for educating and supporting an entire community of teenage students in interacting with social media environments in order to achieve desirable conditions, defined in terms of a community-specific measure of Collective Well-Being (CWB) designed in a participatory way. This Companion combines automatic processing with expert intervention and guidance. The virtual Companion will be powered by a Recommender System (CWB-RS) that will optimize a CWB metric instead of engagement or platform profit, which currently largely drives recommender systems, thereby disregarding any societal collateral effect. CWB-RS will optimize CWB both in the short term, by balancing the level of social media threats the users are exposed to, and in the long term, by adopting the role of an Intelligent Tutor System and enabling adaptive and personalized sequencing of playful learning activities. We put an emphasis on the experts and educators in the educationally managed social media community of the Companion. They play five key roles: (a) use the Companion in classroom-based educational activities; (b) guide the definition of the CWB; (c) provide a hierarchical structure of learning strategies, objectives, and activities that will support and contain the adaptive sequencing algorithms of the CWB-RS based on hierarchical reinforcement learning; (d) act as moderators of direct conflicts between the members of the community; and, finally, (e) monitor and address ethical and educational issues that are beyond the intelligent agent's competence and control. This framework offers a possible approach to understanding how to design social media systems and embedded educational interventions that favor a healthier and more positive society. Preliminary results on the performance of the Companion's components and studies of the underlying educational and psychological principles are presented.
ABSTRACT
Humans have the astonishing capacity to quickly adapt to varying environmental demands and to reach complex goals in the absence of extrinsic rewards. Part of what underlies this capacity is the ability to flexibly reuse and recombine previous experiences, and to plan future courses of action in a psychological space that is shaped by these experiences. Decades of research have suggested that humans use hierarchical representations for efficient planning and flexibility, but the origin of these representations has remained elusive. This study investigates how 73 participants learned hierarchical representations through experience, in a task in which they had to perform complex action sequences to obtain rewards. Complex action sequences were composed of simpler action sequences, which were not rewarded but whose completion was signaled to participants. We investigated the process by which participants learned to perform simpler action sequences and combined them into complex action sequences. After learning the action sequences, participants completed a transfer phase in which either simple or complex sequences were manipulated without notice. Relearning progressed more slowly when simple sequences were changed than when complex sequences were changed, consistent with a hierarchical representation in which lower levels are quickly consolidated, potentially stabilizing exploration, while higher levels remain malleable, with benefits for flexible recombination.
ABSTRACT
Despite continual debate for the past 30 years about the function of anterior cingulate cortex (ACC), its key contribution to neurocognition remains unknown. However, recent computational modeling work has provided insight into this question. Here we review computational models that illustrate three core principles of ACC function, related to hierarchy, world models, and cost. We also discuss four constraints on the neural implementation of these principles, related to modularity, binding, encoding, and learning and regulation. These observations suggest a role for ACC in hierarchical model-based hierarchical reinforcement learning (HMB-HRL), which instantiates a mechanism motivating the execution of high-level plans.
Subjects
Cingulate Gyrus, Reinforcement (Psychology), Humans, Learning
ABSTRACT
Existing mobile robots still cannot perform certain functions well. To address these problems, which include autonomous learning in path planning, slow convergence of path planning, and planned paths that are not smooth, neural networks can be used to enable the robot to perceive the environment and perform feature extraction, giving it a mapping from environment states to actions. By mapping current states to actions through Hierarchical Reinforcement Learning (HRL), the needs of mobile robots are met. On this basis, a path planning model for mobile robots can be constructed from neural networks and HRL. In this article, the proposed algorithm is compared with different path planning algorithms. It underwent a performance evaluation to obtain an optimal learning algorithm system, which was then tested in different environments and scenarios to obtain optimal learning conditions, thereby verifying the effectiveness of the proposed algorithm. Deep Deterministic Policy Gradient (DDPG), as a path planning algorithm for mobile robots based on neural networks and hierarchical reinforcement learning, performed better in all aspects than the other algorithms. Specifically, compared with Double Deep Q-Learning (DDQN), DDPG has a shorter path planning time and a reduced number of path steps. When an influence value is introduced, the algorithm shortens the convergence time by 91% compared with the Q-learning algorithm and improves the smoothness of the planned path by 79%. The algorithm also generalizes well to different scenarios. These results are significant for research on the guidance, precise positioning, and path planning of mobile robots.