Results 1 - 20 of 499
1.
Sci Rep ; 14(1): 22188, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39333598

ABSTRACT

In air-to-ground transmissions, the lifespan of the network is limited by the unmanned aerial vehicle's (UAV) battery capacity. Enhancing energy efficiency and minimizing outages among ground candidates are therefore critical to network functionality. UAV-aided transmission can greatly improve spectrum efficiency and coverage. Owing to their flexible deployment and high maneuverability, UAVs are a strong alternative in situations where Internet of Things (IoT) devices, located far from the terrestrial base station, consume excessive energy to attain the required information rate. Overcoming the shortcomings of conventional UAV-aided efficiency approaches is therefore important. This work designs an energy efficiency framework for UAV-assisted networks using a reinforcement learning mechanism. Optimizing UAV energy efficiency provides better wireless coverage to both static and mobile ground users. Existing reinforcement learning techniques optimize the system's energy efficiency rate using a 2D trajectory mechanism, which mitigates the interference arising in nearby UAV cells. The objective of the proposed framework is to maximize the energy efficiency of the UAV network by jointly optimizing the UAV's 3D trajectory, the energy consumed while accounting for interference, and the number of connected users. To this end, an Adaptive Deep Reinforcement Learning with Novel Loss Function (ADRL-NLF) framework is designed to provide a better energy efficiency rate for the UAV network, and its parameters are tuned using the Hybrid Energy Valley and Hermit Crab (HEVHC) algorithm. Extensive experiments demonstrate the effectiveness of the proposed energy efficiency model over classical energy efficiency frameworks for UAV networks.
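As a hedged illustration of the optimization target described above, the sketch below computes a bits-per-joule energy-efficiency reward for a UAV position under a simple log-rate channel model; the constants, channel model, and function names are assumptions for illustration, not the ADRL-NLF formulation.

```python
# Illustrative sketch only: a hypothetical energy-efficiency reward for a
# UAV agent, assuming a simple log-rate channel model. Names and constants
# (BANDWIDTH, tx_power, move_energy) are assumptions, not from the paper.
import numpy as np

BANDWIDTH = 1e6          # Hz, assumed channel bandwidth
NOISE_POWER = 1e-13      # W, assumed noise floor

def sum_rate(uav_pos, users, tx_power=0.5, interference=0.0):
    """Shannon sum-rate (bit/s) over connected users for a UAV at uav_pos."""
    rates = []
    for u in users:
        d = np.linalg.norm(uav_pos - u) + 1e-9          # 3D distance
        gain = 1e-4 / d**2                               # free-space-like gain
        sinr = tx_power * gain / (NOISE_POWER + interference)
        rates.append(BANDWIDTH * np.log2(1.0 + sinr))
    return sum(rates)

def ee_reward(uav_pos, users, move_energy, tx_power=0.5, interference=0.0):
    """Energy efficiency = delivered bits per joule; the quantity the
    3D-trajectory agent would maximize."""
    total_energy = move_energy + tx_power                # J per time step
    return sum_rate(uav_pos, users, tx_power, interference) / total_energy

users = np.random.uniform(0, 500, size=(8, 3)) * [1, 1, 0]  # ground users
print(ee_reward(np.array([250.0, 250.0, 100.0]), users, move_energy=80.0))
```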

2.
Sensors (Basel) ; 24(18)2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39338773

ABSTRACT

Due to the radial network structures, small cross-sectional lines, and light loads characteristic of existing AC distribution networks in mountainous areas, the development of active distribution networks (ADNs) in these regions has revealed significant issues with integrating distributed generators (DGs) and consuming renewable energy. To address this issue, this paper proposes a wide-range thyristor-controlled series compensation (TCSC)-based ADN and presents a deep reinforcement learning (DRL)-based optimal operation strategy. This strategy exploits the complementarity of hydropower, photovoltaic (PV) systems, and energy storage systems (ESSs) to enhance the capacity for consuming renewable energy. In the proposed ADN, a wide-range TCSC connects the sub-networks where the PV and hydropower systems are located, with ESSs configured for each renewable energy generator. The designed wide-range TCSC allows power reversal and improves power delivery efficiency, providing the conditions for optimized operation. The optimal operation issue is formulated as a Markov decision process (MDP) with a continuous action space and solved using the twin delayed deep deterministic policy gradient (TD3) algorithm. The objective is to maximize the consumption of renewable energy sources (RESs) and minimize line losses by coordinating the charging/discharging of the ESSs with the operation mode of the TCSC. Simulation results demonstrate the effectiveness of the proposed method.
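For readers unfamiliar with TD3, the sketch below shows the target-value computation at the core of the algorithm (twin critics plus target-policy smoothing) that such a dispatch agent would rely on; the toy networks and state/action dimensions are assumptions, not the authors' architecture.

```python
# Minimal TD3 target computation: twin critics curb overestimation, and
# clipped noise on the target action smooths the value estimate. The tiny
# networks below are placeholders, not the paper's dispatch model.
import torch
import torch.nn as nn

def td3_target(critic1_t, critic2_t, actor_t, next_state, reward, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        noise = (torch.randn_like(actor_t(next_state)) * noise_std
                 ).clamp(-noise_clip, noise_clip)       # policy smoothing
        next_action = (actor_t(next_state) + noise).clamp(-max_action, max_action)
        q1 = critic1_t(torch.cat([next_state, next_action], dim=-1))
        q2 = critic2_t(torch.cat([next_state, next_action], dim=-1))
        # twin-critic minimum curbs overestimation bias
        return reward + gamma * (1.0 - done) * torch.min(q1, q2)

state_dim, action_dim = 6, 2   # e.g., [SOCs, PV, hydro, load...] -> [ESS power, TCSC mode]
actor_t = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                        nn.Linear(32, action_dim), nn.Tanh())
critic = lambda: nn.Sequential(nn.Linear(state_dim + action_dim, 32),
                               nn.ReLU(), nn.Linear(32, 1))
c1, c2 = critic(), critic()
s = torch.randn(4, state_dim); r = torch.randn(4, 1); d = torch.zeros(4, 1)
print(td3_target(c1, c2, actor_t, s, r, d).shape)  # torch.Size([4, 1])
```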

3.
Sensors (Basel) ; 24(18)2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39338825

ABSTRACT

Next-generation mobile networks, such as those beyond the 5th generation (B5G) and 6th generation (6G), have diverse network resource demands. Network slicing (NS) and device-to-device (D2D) communication have emerged as promising solutions for network operators. NS is a candidate technology for this scenario, where a single network infrastructure is divided into multiple (virtual) slices to meet different service requirements. Combining D2D and NS can improve spectrum utilization, providing better performance and scalability. This paper addresses the challenging problem of dynamic resource allocation with wireless network slices and D2D communications using deep reinforcement learning (DRL) techniques. More specifically, we propose an approach named DDPG-KRP, based on the deep deterministic policy gradient (DDPG) with K-nearest neighbors (KNN) and reward penalization (RP) to eliminate undesirable actions, to determine the resource allocation policy that maximizes long-term rewards. Simulation results show that DDPG-KRP is an efficient solution for resource allocation in wireless networks with slicing, outperforming the other DRL algorithms considered.
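The sketch below illustrates a KNN action-refinement step and a reward-penalization rule consistent with the approach described (in the spirit of Wolpertinger-style agents); the discrete action grid, penalty value, and helper names are assumptions.

```python
# Sketch: the actor emits a continuous proto-action, which is snapped to
# the K nearest valid discrete resource allocations, and the critic picks
# the best. Undesirable actions receive a fixed penalty (RP).
import numpy as np

def knn_refine(proto_action, discrete_actions, q_values, k=3):
    """Map a continuous proto-action to the best of its K nearest
    discrete allocations according to the critic's Q estimates."""
    dists = np.linalg.norm(discrete_actions - proto_action, axis=1)
    candidates = np.argsort(dists)[:k]
    return candidates[np.argmax(q_values[candidates])]

def penalized_reward(reward, action_violates_slice_sla, penalty=-1.0):
    """Reward penalization (RP): undesirable actions (e.g., SLA
    violations across slices) receive a fixed negative reward."""
    return penalty if action_violates_slice_sla else reward

actions = np.array([[b, p] for b in range(5) for p in range(5)], float)  # (RBs, power)
proto = np.array([2.3, 1.7])
q = np.random.rand(len(actions))
print(actions[knn_refine(proto, actions, q)])
```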

4.
Sensors (Basel) ; 24(18)2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39338833

ABSTRACT

Vehicle-to-everything (V2X) communication is pivotal in enhancing cooperative awareness in vehicular networks. Typically, awareness is viewed as a vehicle's ability to perceive and share real-time kinematic information. We present a novel definition of awareness in V2X communications, conceptualizing it as a multi-faceted concept involving vehicle detection, tracking, and maintaining their safety distances. To enhance this awareness, we propose a deep reinforcement learning framework for the joint control of beacon rate and transmit power (DRL-JCBRTP). Our DRL-JCBRTP framework integrates LSTM-based actor networks and MLP-based critic networks within the Soft Actor-Critic (SAC) algorithm to effectively learn optimal policies. Leveraging local state information, the DRL-JCBRTP scheme uses an innovative reward function to increase the minimum awareness failure distance. Our SLMLab-Gym-VEINS simulations show that the DRL-JCBRTP scheme outperforms existing beaconing schemes in minimizing awareness failure probability and maximizing awareness distance, ultimately improving driving safety.
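As a rough illustration of the action and reward design described, the sketch below maps a squashed SAC action to a (beacon rate, transmit power) pair and computes an awareness-style reward; all ranges and shaping constants are assumptions, not the paper's reward function.

```python
# Hedged sketch of joint beacon-rate/transmit-power control in the spirit
# of DRL-JCBRTP. The exact objective in the paper targets the minimum
# awareness failure distance; the constants here are assumptions.
import numpy as np

def apply_action(action, rate_range=(1.0, 10.0), power_range=(10.0, 23.0)):
    """Map a squashed SAC action in [-1, 1]^2 to (beacon rate Hz, tx dBm)."""
    lo = np.array([rate_range[0], power_range[0]])
    hi = np.array([rate_range[1], power_range[1]])
    return lo + (np.asarray(action) + 1.0) * 0.5 * (hi - lo)

def awareness_reward(failure_dist, channel_busy_ratio, target_dist=150.0):
    """Encourage a large minimum awareness-failure distance while
    discouraging channel congestion caused by aggressive beaconing."""
    return min(failure_dist / target_dist, 1.0) - 0.5 * channel_busy_ratio

print(apply_action([0.2, -0.5]))       # [6.4, 13.25]
print(awareness_reward(120.0, 0.3))    # 0.65
```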

5.
J Neurosci Methods ; 412: 110292, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39299579

ABSTRACT

BACKGROUND: Due to the sparse encoding character of the human visual cortex and the scarcity of paired {image, fMRI} training samples, voxel selection is an effective means of reconstructing perceived images from fMRI. However, existing data-driven voxel selection methods have not achieved satisfactory results. NEW METHOD: Here, a novel deep reinforcement learning-guided sparse voxel (DRL-SV) decoding model is proposed to reconstruct perceived images from fMRI. We innovatively describe voxel selection as a Markov decision process (MDP), training agents to select voxels that are highly involved in specific visual encoding. RESULTS: Experimental results on two public datasets verify the effectiveness of the proposed DRL-SV, which can accurately select voxels highly involved in neural encoding, thereby improving the quality of visual image reconstruction. COMPARISON WITH EXISTING METHODS: We qualitatively and quantitatively compared our results with state-of-the-art (SOTA) methods, obtaining better reconstruction results. We also compared the proposed DRL-SV with traditional data-driven baseline methods, obtaining sparser voxel selections but better reconstruction performance. CONCLUSIONS: DRL-SV can accurately select voxels involved in visual encoding in few-shot settings, compared to data-driven voxel selection methods. The proposed decoding model provides a new avenue for improving the image reconstruction quality of the primary visual cortex.
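To make the MDP framing concrete, the toy sketch below has an agent grow a voxel mask one voxel at a time, rewarded by the gain in a decoding-quality proxy; the greedy agent and correlation-based score are stand-ins for the paper's trained DRL agent and reconstruction loss.

```python
# Toy sketch of voxel selection as an MDP: the state is the current
# selection mask, actions add one voxel, and the reward is the gain in a
# decoding-quality proxy. Data and scoring are assumptions.
import numpy as np

rng = np.random.default_rng(0)
fmri = rng.normal(size=(40, 200))        # 40 samples x 200 voxels
target = fmri[:, :5].sum(axis=1)         # only first 5 voxels are informative

def decode_score(mask):
    """Proxy for reconstruction quality: |corr| of mean selected signal."""
    if not mask.any():
        return 0.0
    return abs(np.corrcoef(fmri[:, mask].mean(axis=1), target)[0, 1])

mask = np.zeros(200, dtype=bool)
for step in range(10):                   # greedy agent: one voxel per step
    gains = []
    for v in np.flatnonzero(~mask):
        trial = mask.copy(); trial[v] = True
        gains.append((decode_score(trial) - decode_score(mask), v))
    reward, best = max(gains)
    mask[best] = True                    # MDP transition: add chosen voxel
print("selected voxels:", np.flatnonzero(mask))
```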

6.
Med Image Anal ; 99: 103345, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39293187

ABSTRACT

Spinal fusion surgery requires highly accurate implantation of pedicle screw implants, which must be conducted in critical proximity to vital structures with a limited view of the anatomy. Robotic surgery systems have been proposed to improve placement accuracy. Despite remarkable advances, current robotic systems still lack advanced mechanisms for continuously updating surgical plans during procedures, which hinders attaining higher levels of robotic autonomy. These systems adhere to conventional rigid registration concepts, relying on the alignment of preoperative planning to the intraoperative anatomy. In this paper, we propose a safe deep reinforcement learning (DRL) planning approach (SafeRPlan) for robotic spine surgery that leverages intraoperative observation for continuous path planning of pedicle screw placement. The main contributions of our method are (1) the capability to ensure safe actions by introducing an uncertainty-aware distance-based safety filter; (2) the ability to compensate for incomplete intraoperative anatomical information by encoding a-priori knowledge of anatomical structures with neural networks pre-trained on pre-operative images; and (3) the capability to generalize over unseen observation noise thanks to novel domain randomization techniques. Planning quality was assessed by quantitative comparison with the baseline approaches and the gold standard (GS), and by qualitative evaluation by expert surgeons. In experiments with human model datasets, our approach achieved over 5% higher safety rates than baseline approaches, even under realistic observation noise. To the best of our knowledge, SafeRPlan is the first safety-aware DRL planning approach specifically designed for robotic spine surgery.
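A minimal sketch of an uncertainty-aware distance-based safety filter of the kind described is shown below: a step is blocked when a pessimistic bound on the predicted distance to critical anatomy falls under a margin. The thresholds and the ensemble-based toy distance model are assumptions.

```python
# Minimal safety-filter sketch in the spirit of SafeRPlan: reject a
# proposed trajectory step if (mean distance - k * std) < safety margin.
import numpy as np

def safety_filter(action, state, dist_model, safe_margin_mm=2.0, k_sigma=2.0):
    """Return the action if safe, otherwise a zero (halt) action.

    dist_model(state, action) -> (mean_dist_mm, std_dist_mm), e.g. from an
    ensemble of networks pre-trained on pre-operative images.
    """
    mean_d, std_d = dist_model(state, action)
    if mean_d - k_sigma * std_d < safe_margin_mm:   # pessimistic bound
        return np.zeros_like(action)                # block unsafe step
    return action

# Stand-in distance model: ensemble disagreement as uncertainty.
def toy_dist_model(state, action):
    preds = [5.0 + 0.1 * i - 1.5 * np.linalg.norm(action) for i in range(5)]
    return float(np.mean(preds)), float(np.std(preds))

print(safety_filter(np.array([0.5, 0.0]), None, toy_dist_model))  # passes
print(safety_filter(np.array([2.5, 0.0]), None, toy_dist_model))  # blocked
```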

7.
Sci Rep ; 14(1): 21850, 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39300104

ABSTRACT

The task scheduling problem (TSP) is a huge challenge in the cloud computing paradigm, as the number of tasks arriving at the cloud application platform varies over time and the tasks have variable lengths and runtime capacities. These tasks may be generated from various heterogeneous resources, and their arrival at the cloud console directly affects the performance of the cloud paradigm by increasing makespan, energy consumption, and resource costs. Traditional task scheduling algorithms cannot handle such complex workloads. Many authors have developed task scheduling algorithms using metaheuristic techniques and hybrid approaches, but these yield only near-optimal solutions, and the TSP remains highly challenging and dynamic, as it is an NP-hard problem. Therefore, to tackle the TSP and schedule tasks effectively in the cloud paradigm, we formulated an adaptive task scheduler that segments all tasks arriving at the cloud console into subtasks and feeds them to a scheduler modeled by an Improved Asynchronous Advantage Actor-Critic algorithm (IA3C) to generate schedules. This scheduling process is carried out in two stages. In the first stage, all incoming tasks are segmented into subtasks; after segmentation, the subtasks are grouped according to size, execution time, and communication time, and fed to the (ATSIA3C) scheduler. In the second stage, the scheduler checks the above constraints and dispatches the groups onto VMs with suitable processing capacity residing in datacenters. The proposed ATSIA3C is simulated on CloudSim. Extensive simulations are conducted using both fabricated worklogs and real-time supercomputing worklogs. Our proposed mechanism is evaluated against baseline algorithms, i.e., RATS-HM, AINN-BPSO, and MOABCQ. The results show that the proposed ATSIA3C outperforms existing task schedulers, improving makespan by 70.49%, resource cost by 77.42%, and energy consumption by 74.24% in a multi-cloud environment.
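The first stage described above (segmentation and grouping) might look like the following sketch; the fixed-chunk segmentation rule and bucket edges are illustrative assumptions, not the ATSIA3C implementation.

```python
# Sketch of the first ATSIA3C stage: segment incoming tasks into subtasks,
# then group subtasks by estimated execution time before handing them to
# the RL scheduler. Segmentation rule and bucket edges are assumed.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class SubTask:
    task_id: int
    length_mi: float        # million instructions
    exec_time: float        # s, estimated
    comm_time: float        # s, estimated

def segment(task_id, length_mi, chunk_mi=500.0, mips=1000.0, comm=0.2):
    """Split one task into roughly fixed-size subtasks (assumed rule)."""
    n = max(1, round(length_mi / chunk_mi))
    per = length_mi / n
    return [SubTask(task_id, per, per / mips, comm) for _ in range(n)]

def group(subtasks, edges=(0.25, 0.5, 1.0)):
    """Bucket subtasks by execution time so similar work is scheduled together."""
    buckets = defaultdict(list)
    for st in subtasks:
        key = sum(st.exec_time > e for e in edges)   # 0..len(edges)
        buckets[key].append(st)
    return buckets

subs = segment(1, 2300.0) + segment(2, 400.0)
for k, v in sorted(group(subs).items()):
    print(f"bucket {k}: {len(v)} subtasks")
```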

8.
Artif Organs ; 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39289857

ABSTRACT

BACKGROUND: Improving the controllers of left ventricular assist device (LVAD) technology supporting heart failure (HF) patients has enormous impact, given the high prevalence and mortality of HF in the population. The use of reinforcement learning for control applications in LVADs remains minimally explored. This work introduces a preload-based deep reinforcement learning control for LVADs based on the proximal policy optimization algorithm. METHODS: The deep reinforcement learning control is built upon data derived from a deterministic high-fidelity cardiorespiratory simulator exposed to variations of total blood volume, heart rate, systemic vascular resistance, pulmonary vascular resistance, right ventricular end-systolic elastance, and left ventricular end-systolic elastance, to replicate realistic inter- and intra-patient variability of patients with severe HF supported by an LVAD. The control is trained to avoid ventricular suction and allow aortic valve opening by using left ventricular pressure signals: end-diastolic pressure, maximum pressure in the left ventricle (LV), and maximum pressure in the aorta. RESULTS: The results show that the controller obtained in this work, compared to the constant-speed LVAD alternative, ensures a more stable end-diastolic volume (EDV), with standard deviations of 5 mL and 9 mL, respectively, and a higher degree of aortic flow, with average flows of 1.1 L/min and 0.9 L/min, respectively. CONCLUSION: This work implements a deep reinforcement learning controller in a high-fidelity cardiorespiratory simulator, resulting in increased flow through the aortic valve and increased EDV stability compared to a constant-speed LVAD strategy.
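As a simplified, hedged illustration of preload-based control using the three listed pressure signals, the rule-based sketch below adjusts pump speed to avoid suction while keeping the aortic valve opening; thresholds, step sizes, and limits are assumed values, and the actual controller in the paper is a learned PPO policy rather than a rule set.

```python
# Illustrative control-step sketch (not the authors' trained policy): a
# preload-based LVAD speed adjustment driven by end-diastolic pressure and
# the maximum LV/aortic pressures. All limits are assumed values.
import numpy as np

def lvad_speed_update(speed_rpm, edp, p_lv_max, p_ao_max,
                      edp_low=5.0, edp_high=15.0, step_rpm=100.0,
                      speed_lims=(2000.0, 5000.0)):
    """Lower speed when preload (end-diastolic pressure) is low, to avoid
    ventricular suction; raise it when preload is high. Keep speed low
    enough that LV pressure can exceed aortic pressure (valve opening)."""
    if edp < edp_low or p_lv_max < p_ao_max - 10.0:   # suction / closed-valve risk
        speed_rpm -= step_rpm
    elif edp > edp_high:
        speed_rpm += step_rpm
    return float(np.clip(speed_rpm, *speed_lims))

print(lvad_speed_update(3000.0, edp=3.0, p_lv_max=90.0, p_ao_max=85.0))   # 2900.0
print(lvad_speed_update(3000.0, edp=18.0, p_lv_max=95.0, p_ao_max=90.0))  # 3100.0
```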

9.
Biomimetics (Basel) ; 9(9)2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39329570

ABSTRACT

Biologically inspired jumping robots exhibit exceptional movement capabilities and can quickly overcome obstacles. However, the stability and accuracy of jumping movements are significantly compromised by rapid changes in posture. Here, we propose a stable jumping control algorithm for a locust-inspired jumping robot based on deep reinforcement learning. The algorithm utilizes a training framework comprising two neural network modules (an actor network and a critic network) to enhance training performance. The framework controls jumping by directly mapping the robot's observations (robot position and velocity, obstacle position, target position, etc.) to its joint torques. The control policy increases randomness and exploration by introducing an entropy term into the policy function. Moreover, we designed a stage incentive mechanism to adjust the reward function dynamically, thereby improving the robot's jumping stability and accuracy. We established a locust-inspired jumping robot platform and conducted a series of jumping experiments in simulation. The results indicate that the robot could perform smooth, non-flip jumps, with the distance error from the target remaining below 3%. The robot consumed 44.6% less energy traveling the same distance by jumping than by walking. Additionally, the proposed algorithm exhibited a faster convergence rate and better convergence effects than other classical algorithms.
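The two ingredients highlighted above, an entropy term in the policy objective and a stage incentive reward, can be sketched as follows; the phase weights and coefficient values are assumptions.

```python
# Sketch: entropy-regularized actor loss (exploration) plus a stage
# incentive that reweights the reward across jump phases. Weights assumed.
import torch

def policy_loss(log_probs, advantages, entropy, alpha=0.01):
    """Actor loss = -E[log pi * A] - alpha * H(pi); the entropy bonus keeps
    the jumping policy stochastic early in training."""
    return -(log_probs * advantages).mean() - alpha * entropy.mean()

def stage_incentive(phase, dist_to_target, posture_err):
    """Dynamically weighted reward: accuracy dominates in flight/landing,
    posture stability dominates at take-off."""
    w = {"takeoff": (0.2, 0.8), "flight": (0.5, 0.5), "landing": (0.8, 0.2)}[phase]
    return -w[0] * dist_to_target - w[1] * posture_err

logp = torch.log(torch.tensor([0.4, 0.7]))
adv = torch.tensor([1.0, -0.5]); ent = torch.tensor([0.9, 0.8])
print(policy_loss(logp, adv, ent))
print(stage_incentive("landing", dist_to_target=0.1, posture_err=0.3))
```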

10.
Sensors (Basel) ; 24(17)2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39275469

ABSTRACT

Mobile Edge Computing (MEC) is crucial for reducing latency by bringing computational resources closer to the network edge, thereby enhancing the quality of services (QoS). However, the broad deployment of cloudlets poses challenges in efficient network slicing, particularly when traffic distribution is uneven. These challenges include managing diverse resource requirements across widely distributed cloudlets, minimizing resource conflicts and delays, and maintaining service quality amid fluctuating request rates. Addressing them requires intelligent strategies to predict request types (common or urgent), assess resource needs, and allocate resources efficiently. Emerging technologies like edge computing and 5G with network slicing can handle delay-sensitive IoT requests rapidly, but a robust mechanism for real-time resource and utility optimization remains necessary. To address these challenges, we designed an end-to-end network slicing approach that predicts common and urgent user requests through a T distribution. We formulated our problem as a multi-agent Markov decision process (MDP) and introduced a multi-agent soft actor-critic (MAgSAC) algorithm. This algorithm prevents the wastage of scarce resources by intelligently activating and deactivating virtual network function (VNF) instances, thereby balancing the allocation process. Our approach optimizes overall utility, balancing trade-offs between revenue, energy consumption costs, and latency. We evaluated MAgSAC through simulations, comparing it with the following six benchmark schemes: MAA3C, SACT, DDPG, S2Vec, Random, and Greedy. The results demonstrate that our approach optimizes utility by 30%, minimizes energy consumption costs by 12.4%, and reduces execution time by 21.7% compared to the closest related multi-agent approach, MAA3C.
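A sketch of the revenue/energy/latency utility trade-off described is given below; all coefficients are illustrative assumptions, intended only to show how deactivating idle VNF instances can raise utility.

```python
# Sketch of the utility trade-off MAgSAC optimizes, as described: revenue
# minus energy cost minus a latency penalty, with VNF activation decisions
# changing the energy term. All coefficients are assumptions.
def slice_utility(served_requests, active_vnfs, avg_latency_ms,
                  price=1.0, energy_cost_per_vnf=0.4, latency_weight=0.05):
    revenue = price * served_requests
    energy = energy_cost_per_vnf * active_vnfs      # idle VNFs waste energy
    latency_pen = latency_weight * avg_latency_ms
    return revenue - energy - latency_pen

# Deactivating an unneeded VNF instance raises utility if service holds:
print(slice_utility(50, active_vnfs=8, avg_latency_ms=20.0))  # 45.8
print(slice_utility(50, active_vnfs=6, avg_latency_ms=22.0))  # 46.5
```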

11.
Neural Netw ; 180: 106741, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39321563

ABSTRACT

State representations considerably accelerate learning speed and improve data efficiency for deep reinforcement learning (DRL), especially for visual tasks. Task-relevant state representations can focus on features relevant to the task and filter out irrelevant elements, further improving performance. However, task-relevant representations are typically obtained through model-based DRL methods, which involve the challenging task of learning a transition function, and inaccuracies in the learned transition function can degrade performance and negatively impact policy learning. In this paper, to address this issue, we propose a novel method of explainable task-relevant state representation (ETrSR) for model-free DRL that is direct, robust, and does not require learning a transition model. More specifically, the proposed ETrSR first disentangles features from the states using a beta variational autoencoder (β-VAE). A reward prediction model is then employed to bootstrap these features to be relevant to the task, and explainable states can be obtained by decoding the task-related features. Finally, we validate the proposed method on the CarRacing environment and various tasks in the DeepMind Control Suite (DMC), demonstrating its explainability for better understanding of the decision-making process and its outstanding performance, even in environments with strong distractions.
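The loss the abstract describes combines a β-VAE objective with a reward-prediction head; a compact sketch under assumed architecture sizes follows.

```python
# Compact sketch of the two losses ETrSR combines: a beta-VAE objective to
# disentangle state features, and a reward-prediction head that bootstraps
# the latent features to be task-relevant. Sizes and weights are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAEReward(nn.Module):
    def __init__(self, obs_dim=32, latent_dim=8):
        super().__init__()
        self.enc = nn.Linear(obs_dim, 2 * latent_dim)   # -> (mu, logvar)
        self.dec = nn.Linear(latent_dim, obs_dim)
        self.reward_head = nn.Linear(latent_dim, 1)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar, self.reward_head(z)

def etrsr_loss(model, obs, reward, beta=4.0, lam=1.0):
    recon, mu, logvar, r_hat = model(obs)
    recon_l = F.mse_loss(recon, obs)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    reward_l = F.mse_loss(r_hat.squeeze(-1), reward)   # task-relevance signal
    return recon_l + beta * kl + lam * reward_l

m = BetaVAEReward()
obs, r = torch.randn(16, 32), torch.randn(16)
print(etrsr_loss(m, obs, r))
```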

12.
Phys Med ; 125: 104498, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39163802

ABSTRACT

PURPOSE: The formulation and optimization of radiation therapy plans are complex and time-consuming processes that heavily rely on the expertise of medical physicists. Consequently, there is an urgent need for automated optimization methods. Recent advancements in reinforcement learning, particularly deep reinforcement learning (DRL), show great promise for automating radiotherapy planning. This review summarizes the current state of DRL applications in this field, evaluates their effectiveness, and identifies challenges and future directions. METHODS: A systematic search was conducted in Google Scholar, PubMed, IEEE Xplore, and Scopus using keywords such as "deep reinforcement learning", "radiation therapy", and "treatment planning". The extracted data were synthesized for an overview and critical analysis. RESULTS: The application of deep reinforcement learning in radiation therapy plan optimization can generally be divided into three categories: optimizing treatment planning parameters, directly optimizing machine parameters, and adaptive radiotherapy. From the perspective of disease sites, DRL has been applied to cervical cancer, prostate cancer, vestibular schwannoma, and lung cancer. Regarding types of radiation therapy, it has been used in HDRBT, IMRT, SBRT, VMAT, GK, and Cyberknife. CONCLUSIONS: Deep reinforcement learning technology has played a significant role in advancing the automated optimization of radiation therapy plans. However, there is still a considerable gap before it can be widely applied in clinical settings due to three main reasons: inefficiency, limited methods for quality assessment, and poor interpretability. To address these challenges, significant research opportunities exist in the future, such as constructing evaluators, parallelized training, and exploring continuous action spaces.


Subject(s)
Deep Learning; Radiotherapy Planning, Computer-Assisted; Radiotherapy Planning, Computer-Assisted/methods; Humans
13.
Sensors (Basel) ; 24(16)2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39204780

ABSTRACT

A map of the environment is the basis for a robot's navigation. Multi-robot collaborative autonomous exploration allows maps of unknown environments to be constructed rapidly, which is essential for application areas such as search and rescue missions. Traditional autonomous exploration methods are inefficient due to the problem of repeated exploration. For this reason, we propose a multi-robot autonomous exploration method based on the Transformer model, combined with a multi-agent deep reinforcement learning method to effectively improve exploration efficiency. We conducted experiments comparing the proposed method with existing methods in a simulation environment; the results showed that it performs well and exhibits a degree of generalization ability.

14.
Sensors (Basel) ; 24(16)2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39204975

ABSTRACT

Time-sensitive networking (TSN) technologies have garnered attention for supporting time-sensitive communication services, with recent interest extending to the wireless domain. However, adapting TSN to wireless networks faces challenges due to the competitive channel utilization of IEEE 802.11, necessitating exclusive channels for low-latency services. Additionally, traditional TSN scheduling algorithms may cause significant transmission delays under dynamic wireless conditions, which must be addressed. This paper proposes a wireless TSN model for IEEE 802.11 networks with exclusive channel access and a novel time-sensitive traffic scheduler, named the wireless intelligent scheduler (WISE), based on deep reinforcement learning. We designed a deep reinforcement learning (DRL) framework to learn the repetitive transmission patterns of time-sensitive traffic and address potential latency issues arising from changing wireless conditions. Within this framework, we identified the most suitable DRL model, yielding the WISE algorithm with the best performance. Experimental results indicate that the proposed mechanisms satisfy the timing requirements in up to 99.9% of cases under various wireless communication scenarios. In addition, they show that the processing delay is successfully kept within the specified time requirements and that the scalability of TSN streams is guaranteed.

15.
Sensors (Basel) ; 24(16)2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39205064

ABSTRACT

This study proposes a method named Hybrid Heuristic Proximal Policy Optimization (HHPPO) for online 3D bin-packing tasks, integrating heuristic bin-packing algorithms with the Proximal Policy Optimization (PPO) algorithm of deep reinforcement learning. Among the heuristic algorithms, an extreme point priority sorting method is proposed that sorts the generated extreme points according to their waste spaces to improve space utilization. In addition, a 3D grid representation of the container's space status is used, and partial support constraints are proposed to increase the possibilities for stacking objects and enhance overall space utilization. In the PPO algorithm, the heuristic algorithms are integrated, and the reward function and the action space of the policy network are designed so that the proposed method can effectively complete the online 3D bin-packing task. Experimental results show that the proposed method performs well on online 3D bin-packing tasks in simulation environments. In addition, an environment with image vision is constructed to show that the proposed method enables an actual robot manipulator to successfully and effectively complete the bin-packing task in a real environment.
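The extreme point priority sorting idea might be sketched as below: candidate placement points are ordered by a waste score so tighter placements are tried first; the waste metric here is an assumed simplification of the paper's definition.

```python
# Sketch of extreme-point priority sorting: lower projected waste first.
# The waste score (slack toward the container walls) is an assumption.
from dataclasses import dataclass

@dataclass
class Box:
    w: int; d: int; h: int

def waste(point, box, container=(10, 10, 10)):
    """Assumed waste score: unusable slack between the box and the
    container walls along each axis when placed at `point`."""
    x, y, z = point
    W, D, H = container
    if x + box.w > W or y + box.d > D or z + box.h > H:
        return float("inf")              # placement infeasible
    return (W - x - box.w) + (D - y - box.d) + (H - z - box.h)

def sort_extreme_points(points, box):
    """Extreme-point priority sorting: lowest projected waste first."""
    return sorted(points, key=lambda p: waste(p, box))

eps = [(0, 0, 0), (5, 0, 0), (0, 5, 0), (8, 8, 0)]
print(sort_extreme_points(eps, Box(4, 4, 4)))
```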

16.
Bioinspir Biomim ; 19(5)2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39163889

ABSTRACT

Autonomous ocean-exploring vehicles have begun to take advantage of onboard sensor measurements of water properties such as salinity and temperature to locate oceanic features in real time. Such targeted sampling strategies enable more rapid study of ocean environments by actively steering towards areas of high scientific value. Inspired by the ability of aquatic animals to navigate via flow sensing, this work investigates hydrodynamic cues for accomplishing targeted sampling using a palm-sized robotic swimmer. As a proof-of-concept analogy for tracking hydrothermal vent plumes in the ocean, the robot is tasked with locating the center of turbulent jet flows in a 13,000-liter water tank using data from onboard pressure sensors. To learn a navigation strategy, we first implemented reinforcement learning (RL) on a simulated version of the robot navigating in proximity to turbulent jets. After training, the RL algorithm discovered an effective strategy for locating the jets by following transverse velocity gradients sensed by pressure sensors located on opposite sides of the robot. When implemented on the physical robot, this gradient-following strategy enabled the robot to locate the turbulent plumes at more than twice the rate of random searching. Additionally, we found that navigation performance improved as the distance between the pressure sensors increased, which can inform the design of distributed flow sensors in ocean robots. Our results demonstrate the effectiveness and limits of flow-based navigation for autonomously locating hydrodynamic features of interest.
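The discovered gradient-following strategy can be sketched as a simple steering rule over the two opposing pressure readings; the gain and deadband below are assumptions, not the learned policy's parameters.

```python
# Sketch of gradient following: steer toward the side whose pressure
# sensor reports the stronger transverse signal. Constants are assumed.
def steer_command(p_left, p_right, gain=1.0, deadband=0.02):
    """Turn toward the stronger signal; go straight inside the deadband.
    Returns a normalized yaw-rate command in [-1, 1]."""
    grad = p_right - p_left                  # transverse pressure gradient
    if abs(grad) < deadband:
        return 0.0                           # no clear cue: keep heading
    cmd = gain * grad
    return max(-1.0, min(1.0, cmd))

print(steer_command(0.10, 0.25))   # 0.15 -> turn right toward the jet
print(steer_command(0.20, 0.21))   # inside deadband -> 0.0
```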


Subject(s)
Biomimetics; Fishes; Hydrodynamics; Oceans and Seas; Robotics; Swimming; Robotics/instrumentation; Animals; Fishes/physiology; Biomimetics/methods; Biomimetics/instrumentation; Swimming/physiology; Water Movements; Algorithms; Equipment Design; Computer Simulation
17.
Entropy (Basel) ; 26(8)2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39202118

ABSTRACT

With the popularity of the Internet and the increase in the level of information technology, cyber attacks have become an increasingly serious problem, posing a great threat to the security of individuals, enterprises, and the state. This has made network intrusion detection technology critically important. In this paper, a malicious traffic detection model is constructed based on an entropy-based decision tree classifier and the proximal policy optimisation (PPO) algorithm of deep reinforcement learning. First, the decision tree idea from machine learning is used to make a preliminary classification judgement on the dataset based on information entropy. The importance score of each feature in the classification work is calculated, and features with lower contributions are removed. The remaining features are then handed over to the PPO algorithm model for detection, with an entropy regularisation term introduced into the PPO update. Finally, the deep reinforcement learning algorithm continuously trains and updates the parameters during the detection process, yielding a detection model with higher accuracy. Experiments show that the binary classification accuracy of the malicious traffic detection model based on the deep reinforcement learning PPO algorithm reaches 99.17% on the CIC-IDS2017 dataset used in this paper.
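The entropy-based feature-screening stage has a direct counterpart in scikit-learn, sketched below with synthetic data standing in for CIC-IDS2017; the 1% importance cutoff is an assumption.

```python
# Sketch of the feature-screening stage described: fit an entropy-criterion
# decision tree, score feature importances, and drop low-contribution
# features before the PPO stage. Data and cutoff are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=0)           # stand-in for CIC-IDS2017
tree = DecisionTreeClassifier(criterion="entropy", max_depth=10, random_state=0)
tree.fit(X, y)

keep = tree.feature_importances_ > 0.01              # drop low contributors
X_reduced = X[:, keep]
print(f"kept {keep.sum()} of {X.shape[1]} features")
# X_reduced would then be fed to the PPO-based detector for training.
```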

18.
Risk Anal ; 2024 Aug 11.
Article in English | MEDLINE | ID: mdl-39128862

ABSTRACT

Urban flooding is among the costliest natural disasters worldwide. Timely and effective rescue path planning is crucial for minimizing loss of life and property. However, current research on path planning often fails to adequately consider the need to assess area risk uncertainties and bypass complex obstacles in flood rescue scenarios, presenting significant challenges for developing optimal rescue paths. This study proposes a deep reinforcement learning (RL) algorithm incorporating four main mechanisms to address these issues. Dual-priority experience replays and backtrack punishment mechanisms enhance the precise estimation of area risks. Concurrently, random noisy networks and dynamic exploration techniques encourage the agent to explore unknown areas in the environment, thereby improving sampling and optimizing strategies for bypassing complex obstacles. The study constructed multiple grid simulation scenarios based on real-world rescue operations in major urban flood disasters. These scenarios included uncertain risk values for all passable areas and an increased presence of complex elements, such as narrow passages, C-shaped barriers, and jagged paths, significantly raising the challenge of path planning. The comparative analysis demonstrated that only the proposed algorithm could bypass all obstacles and plan the optimal rescue path across nine scenarios. This research advances the theoretical progress for urban flood rescue path planning by extending the scale of scenarios to unprecedented levels. It also develops RL mechanisms adaptable to various extremely complex obstacles in path planning. Additionally, it provides methodological insights into artificial intelligence to enhance real-world risk management.
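One of the named mechanisms, backtrack punishment, can be sketched as an extra penalty for revisiting already-explored cells; the grid risks and penalty sizes below are illustrative assumptions.

```python
# Sketch of a backtrack-punishment mechanism consistent with the abstract:
# revisiting an explored cell during flood-rescue path planning incurs an
# extra penalty, sharpening the agent's estimate of area risk.
def step_reward(cell, visited, area_risk, step_cost=-0.1, backtrack_pen=-0.5):
    """Combine movement cost, the cell's (uncertain) risk estimate, and a
    punishment for backtracking onto visited cells."""
    r = step_cost - area_risk.get(cell, 0.0)
    if cell in visited:
        r += backtrack_pen          # discourage oscillating around obstacles
    visited.add(cell)
    return r

risk = {(1, 2): 0.8, (2, 2): 0.1}   # uncertain flood-risk estimates per cell
seen = set()
print(step_reward((2, 2), seen, risk))   # -0.2
print(step_reward((2, 2), seen, risk))   # -0.7 after backtracking
```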

19.
Heliyon ; 10(14): e33944, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39114005

ABSTRACT

It is challenging to accurately model the overall uncertainty of a power system connected to large-scale intermittent generation sources such as wind and photovoltaics, due to the inherent volatility, uncertainty, and indivisibility of renewable energy. Deep reinforcement learning (DRL) algorithms are introduced as a solution that avoids modeling these complex uncertainties and adapts to their fluctuations by interacting with the environment and using feedback to continuously improve its strategies. However, the large scale and uncertainty of the system lead to the sparse reward problem and a high-dimensional space issue in DRL. A hierarchical deep reinforcement learning (HDRL) scheme is designed that decomposes the solution process into two stages, using a reinforcement learning (RL) agent in the global stage and a heuristic algorithm in the local stage to find optimal dispatching decisions for power systems under uncertainty. Simulation studies show that the proposed HDRL scheme efficiently solves power system economic dispatch problems under both deterministic and uncertain scenarios, adapting to system uncertainty and coping with the volatility of uncertain factors while significantly improving the speed of online decision-making.
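The two-stage decomposition might look like the following sketch, with a tabular ε-greedy global agent choosing a coarse renewable-usage level and a proportional-sharing heuristic as the local stage; both stages are simplified assumptions, not the paper's formulation.

```python
# Sketch of the two-stage HDRL decomposition: a global RL agent picks a
# coarse dispatch target; a local heuristic refines unit outputs.
import numpy as np

def global_stage(q_table, state, n_actions, eps=0.1, rng=np.random.default_rng(0)):
    """Tabular epsilon-greedy choice of a coarse renewable-usage level."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_table[state]))

def local_stage(load, renewable_target, unit_caps):
    """Heuristic dispatch: use renewables up to the target, then share the
    residual load across conventional units in proportion to capacity."""
    residual = max(load - renewable_target, 0.0)
    caps = np.asarray(unit_caps, float)
    return np.minimum(caps, residual * caps / caps.sum())

q = np.zeros((4, 3))                       # 4 coarse states x 3 usage levels
a = global_stage(q, state=2, n_actions=3)
dispatch = local_stage(load=900.0, renewable_target=[0, 200, 400][a],
                       unit_caps=[300, 500, 400])
print(a, dispatch, dispatch.sum())
```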

20.
Front Robot AI ; 11: 1402846, 2024.
Article in English | MEDLINE | ID: mdl-39109322

ABSTRACT

Traditional spacecraft attitude control often relies heavily on the dimension and mass information of the spacecraft. In active debris removal scenarios, these characteristics cannot be known beforehand because the debris can take any shape or mass. Additionally, it is not possible to measure the mass of the combined system of satellite and debris object in orbit. Therefore, it is crucial to develop an adaptive satellite attitude control that can extract mass information about the satellite system from other measurements. The authors propose using deep reinforcement learning (DRL) algorithms, employing stacked observations to handle widely varying masses. The satellite is simulated in Basilisk software, and the control performance is assessed using Monte Carlo simulations. The results demonstrate the benefits of DRL with stacked observations compared to a classical proportional-integral-derivative (PID) controller for the spacecraft attitude control. The algorithm is able to adapt, especially in scenarios with changing physical properties.
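Observation stacking, the key adaptation mechanism named above, is commonly implemented as a wrapper that concatenates the last k observations so the policy can infer inertia changes from the response history; a sketch under a Gymnasium-style interface assumption follows (Basilisk specifics are not reproduced here).

```python
# Sketch of stacked observations for mass adaptation: the policy sees the
# last k attitude observations concatenated. Interface is assumed.
from collections import deque
import numpy as np

class ObservationStacker:
    def __init__(self, obs_dim, k=4):
        self.k = k
        self.buf = deque(maxlen=k)
        self.obs_dim = obs_dim

    def reset(self, obs):
        self.buf.clear()
        for _ in range(self.k):          # pad history with the first obs
            self.buf.append(np.asarray(obs, float))
        return self.stacked()

    def push(self, obs):
        self.buf.append(np.asarray(obs, float))
        return self.stacked()

    def stacked(self):
        return np.concatenate(self.buf)  # shape: (k * obs_dim,)

stacker = ObservationStacker(obs_dim=6, k=4)
s0 = stacker.reset(np.zeros(6))
s1 = stacker.push(np.ones(6) * 0.1)      # attitude error + rates at t=1
print(s0.shape, s1[-6:])                 # (24,) [0.1 ... 0.1]
```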
