ABSTRACT
This article develops a cooperative-critic learning-based secure tracking control (CLSTC) method for unknown nonlinear systems in the presence of multisensor faults. By introducing a low-pass filter, the sensor faults are transformed into "pseudo" actuator faults, and an augmented system that integrates the system state and the filter output is constructed. To reduce design costs, a joint neural network Luenberger observer (NNLO) structure is established by using neural network and input/output data of the system to identify unknown system dynamics and sensor faults online. To achieve the optimal secure tracking control, an augmented tracking system is formed by integrating the dynamics of tracking error, reference trajectory, and filter output. Then, a novel cost function is designed for the augmented tracking system, which employs the fault estimation and the discount factor. The Hamilton-Jacobi-Bellman equation is solved to obtain the CLSTC strategy through an adaptive critic structure with cooperative tuning laws. Besides, the Lyapunov stability theorem is utilized to prove that all signals of the closed-loop system converge to a small neighborhood of the equilibrium point. Simulation results demonstrate that the proposed control method has good fault tolerance performance and is suitable for solving secure control problems of nonlinear systems with various sensor faults.
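The discounted cost central to the design above can be sketched in a generic form (a representative template with assumed symbols: ξ for the augmented tracking state, λ > 0 for the discount factor, and quadratic weights Q, R; the paper's exact functional, which also embeds the fault estimate, is not reproduced here):

```latex
J\big(\xi(t)\big) = \int_{t}^{\infty} e^{-\lambda(\tau - t)}
  \Big[ \xi^{\top}(\tau)\, Q\, \xi(\tau) + u^{\top}(\tau)\, R\, u(\tau) \Big]\, \mathrm{d}\tau .
```

Minimizing a cost of this form subject to the augmented dynamics yields the Hamilton-Jacobi-Bellman equation that the cooperative critic structure then solves approximately.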
ABSTRACT
In the field of robot grasping detection, uncertain factors such as different shapes, distinct colors, diverse materials, and various poses make robot grasping very challenging. This article introduces an integrated robotic system designed to address the challenge of grasping numerous unknown objects within a scene from a set of α-channel images. We propose a lightweight and object-independent pixel-level generative adaptive residual depthwise separable convolutional neural network (GARDSCN) with an inference speed of around 28 ms, which can be applied to real-time grasping detection. It can effectively handle the grasping detection of unknown objects with different shapes and poses in various scenes and overcome the limitations of current robot grasping technology. The proposed network achieves 98.88% grasp detection accuracy on the Cornell dataset and 95.23% on the Jacquard dataset. To further verify its validity, grasping experiments are conducted on a physical Kinova Gen2 robot; the grasp success rate is 96.67% in the single-object scene and 94.10% in the multiobject cluttered scene.
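Pixel-level grasp networks of this kind typically emit per-pixel quality, angle, and width maps, from which the executed grasp is decoded. The following is a minimal sketch of that decoding step, not GARDSCN's actual interface; the map names and the returned grasp fields are assumptions for illustration.

```python
def decode_grasp(quality, angle, width):
    """Pick the pixel with the highest predicted grasp quality and read
    off the rotation angle and gripper width predicted at that pixel."""
    best, by, bx = max(
        (quality[y][x], y, x)
        for y in range(len(quality))
        for x in range(len(quality[0]))
    )
    return {"x": bx, "y": by, "angle": angle[by][bx],
            "width": width[by][bx], "quality": best}

# Toy 2x2 maps: the best grasp candidate sits at pixel (x=0, y=1).
quality = [[0.1, 0.2], [0.9, 0.3]]
angle   = [[0.0, 0.5], [1.2, 0.7]]
width   = [[10., 12.], [25., 14.]]
g = decode_grasp(quality, angle, width)
```

A real system would additionally map the pixel grasp back to camera and then robot coordinates before execution.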
ABSTRACT
This study investigates pigeon-like flexible flapping wings, which are known for their low energy consumption, high flexibility, and lightweight design. However, such flexible flapping-wing systems are prone to deformation and vibration during flight, leading to performance degradation. It is thus necessary to design a control method that effectively manages the vibration of flexible wings. This paper proposes an improved rigid finite element method (IRFE) to develop a dynamic visualization model of flexible flapping wings. Subsequently, an adaptive vibration controller is designed based on non-singular terminal sliding mode (NTSM) control and a fuzzy neural network (FNN) to effectively address system uncertainty and actuator failure. With the proposed control, stability of the closed-loop system is established using Lyapunov stability theory. Finally, a joint simulation using MapleSim and MATLAB/Simulink was conducted to verify the effectiveness and robustness of the proposed controller in terms of trajectory tracking and vibration suppression. The results demonstrate the great practical value of the proposed method in both military (low-altitude reconnaissance, urban operations, accurate delivery, etc.) and civil (field research, monitoring, disaster relief, etc.) applications.
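For reference, the non-singular terminal sliding surface underlying NTSM control is commonly written as follows (a textbook form with assumed symbols: tracking error e, design gain β > 0, and positive odd integers p, q; the paper's FNN-based adaptive terms are not reproduced):

```latex
s = e + \frac{1}{\beta}\,\dot{e}^{\,p/q}, \qquad 1 < \frac{p}{q} < 2 .
```

On s = 0 the tracking error converges to zero in finite time, and the exponent range 1 < p/q < 2 avoids the control singularity of conventional terminal sliding mode.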
ABSTRACT
Autonomous race driving poses a complex control challenge as vehicles must be operated at the edge of their handling limits to reduce lap times while respecting physical and safety constraints. This paper presents a novel reinforcement learning (RL)-based approach, incorporating the action mapping (AM) mechanism to manage state-dependent input constraints arising from limited tire-road friction. A numerical approximation method is proposed to implement AM, addressing the complex dynamics associated with the friction constraints. The AM mechanism also allows the learned driving policy to be generalized to different friction conditions. Experimental results in our developed race simulator demonstrate that the proposed AM-RL approach achieves superior lap times and better success rates compared to the conventional RL-based approaches. The generalization capability of driving policy with AM is also validated in the experiments.
ABSTRACT
Vision-and-language navigation requires an agent to navigate in a photo-realistic environment by following natural language instructions. Mainstream methods employ imitation learning (IL) to let the agent imitate the behavior of the teacher. The trained model will overfit the teacher's biased behavior, resulting in poor model generalization. Recently, researchers have sought to combine IL and reinforcement learning (RL) to overcome overfitting and enhance model generalization. However, these methods still face the problem of expensive trajectory annotation. We propose a hierarchical RL-based method, discovering intrinsic subgoals via hierarchical RL (DISH), which overcomes the generalization limitations of current methods and eliminates the need for expensive trajectory annotations. First, the high-level agent (manager) decomposes the complex navigation problem into simple intrinsic subgoals. Then, the low-level agent (worker) uses an intrinsic subgoal-driven attention mechanism for action prediction in a smaller state space. We place no constraints on the semantics that subgoals may convey, allowing the agent to autonomously learn intrinsic, more generalizable subgoals from navigation tasks. Furthermore, we design a novel history-aware discriminator (HAD) for the worker. The discriminator incorporates historical information into subgoal discrimination and provides the worker with additional intrinsic rewards to alleviate reward sparsity. Without labeled actions, our method provides supervision for the worker in the form of self-supervision by generating subgoals from the manager. The final results of multiple comparison experiments on the Room-to-Room (R2R) dataset show that our DISH can significantly outperform the baseline in accuracy and efficiency.
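A discriminator-derived intrinsic reward of the kind described above is often taken as the log-probability the discriminator assigns to the active subgoal. The sketch below is a toy stand-in, not the HAD architecture: the discriminator is abstracted to a single logit, and the clipping constant is an assumption.

```python
import math

def intrinsic_reward(score):
    """Intrinsic reward as the log-probability the discriminator assigns
    to the current subgoal given the (history, state) pair; `score` is
    the discriminator's raw logit for that subgoal."""
    p = 1.0 / (1.0 + math.exp(-score))  # sigmoid logit -> probability
    return math.log(max(p, 1e-8))       # clip to keep the log finite

# Higher discriminator confidence -> less negative intrinsic reward,
# which densifies the otherwise sparse navigation reward signal.
r_good = intrinsic_reward(4.0)
r_bad = intrinsic_reward(-4.0)
```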
ABSTRACT
Recent years have witnessed numerous technical breakthroughs in connected and autonomous vehicles (CAVs). On the one hand, these breakthroughs have significantly advanced the development of intelligent transportation systems (ITSs); on the other hand, these new traffic participants introduce more complex and uncertain elements to ITSs from the social space. Digital twins (DTs) provide real-time, data-driven, precise modeling for constructing the digital mapping of physical-world ITSs. Meanwhile, the metaverse integrates emerging technologies such as virtual reality/mixed reality, artificial intelligence, and DTs to model and explore how to realize improved sustainability, increased efficiency, and enhanced safety. More recently, as a leading effort toward general artificial intelligence, the concept of the foundation model was proposed and has achieved significant success, showing great potential to lay the cornerstone for diverse artificial intelligence applications across different domains. In this article, we explore big models as the embodied foundation intelligence for parallel driving in cyber-physical-social spaces, which integrates the metaverse and DTs to construct a parallel training space for CAVs, and present a comprehensive elucidation of the crucial characteristics and operational mechanisms. Beyond providing the infrastructure and foundation intelligence of big models for parallel driving, this article also discusses future trends, potential research directions, and the "6S" goals of parallel driving.
ABSTRACT
The structurally sensitive amide II infrared (IR) bands of proteins provide valuable information about the hydrogen bonding of protein secondary structures, which is crucial for understanding protein dynamics and associated functions. However, deciphering protein structures from experimental amide II spectra relies on time-consuming quantum chemical calculations on tens of thousands of representative configurations in solvent water. Currently, the accurate simulation of amide II spectra for whole proteins remains a challenge. Here, we present a machine learning (ML)-based protocol designed to efficiently simulate the amide II IR spectra of various proteins with an accuracy comparable to experimental results. This protocol stands out as a cost-effective and efficient alternative for studying protein dynamics, including the identification of secondary structures and the monitoring of protein hydrogen-bonding dynamics under different pH conditions and during the protein folding process. Our method provides a valuable tool in the field of protein research, focusing on the study of dynamic properties of proteins, especially those related to hydrogen bonding, using amide II IR spectroscopy.
Subject(s)
Amides, Artificial Intelligence, Amides/chemistry, Hydrogen Bonding, Spectrophotometry, Infrared/methods, Proteins/chemistry
ABSTRACT
In this article, a reinforcement learning (RL)-based strategy for unmanned surface vehicle (USV) path following control is developed. The proposed method learns integrated guidance and heading control policy, which directly maps the USV's navigation states to motor control commands. By introducing a twin-critic design and an integral compensator to the conventional deep deterministic policy gradient (DDPG) algorithm, the tracking accuracy and robustness of the controller can be significantly improved. Moreover, a pretrained neural network-based USV model is built to help the learning algorithm efficiently deal with unknown nonlinear dynamics. The self-learning and path following capabilities of the proposed method were validated in both simulations and real sea experiments. The results show that our control policy can achieve better performance than a traditional cascade control policy and a DDPG-based control policy.
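Two of the ingredients above can be sketched compactly: the twin-critic design (taking the smaller of two critic estimates when forming the TD target, as popularized by TD3) and an integral compensator on the tracking error. This is a minimal illustration, not the paper's implementation; the gains, the anti-windup limit, and the class interface are assumptions.

```python
def td_target(reward, q1_next, q2_next, gamma=0.99):
    """Twin-critic target: use the smaller of the two critic estimates
    to curb value overestimation when bootstrapping."""
    return reward + gamma * min(q1_next, q2_next)

class IntegralCompensator:
    """Accumulates cross-track error so the policy input carries an
    integral term that removes steady-state offset along the path."""
    def __init__(self, ki=0.05, limit=5.0):
        self.ki, self.limit, self.acc = ki, limit, 0.0

    def update(self, error, dt):
        self.acc += error * dt
        # Clamp the integral to avoid windup on long transients.
        self.acc = max(-self.limit, min(self.limit, self.acc))
        return self.ki * self.acc

y = td_target(1.0, 10.0, 8.0, gamma=0.9)   # bootstraps from min(10, 8)
comp = IntegralCompensator(ki=0.1)
u_i = comp.update(error=2.0, dt=1.0)
```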
ABSTRACT
Gastric cancer is a deadly disease, and gastric polyps are at high risk of becoming cancerous. Timely detection of gastric polyps is therefore of great importance and can effectively reduce the incidence of gastric cancer. At present, deep learning-based object detection methods are widely used in medical imaging. However, as the contrast between the background and the polyps is not strong in gastroscopic images, it is difficult to distinguish polyps of various sizes from the background. In this paper, to improve the detection performance for endoscopic gastric polyps, we propose an improved attentional feature fusion module. First, to enhance the contrast between the background and the polyps, we propose an attention module that enables the network to make full use of target location information: it suppresses interference from background information and highlights effective features. Therefore, on the basis of accurate positioning, the network can focus on determining whether the current location contains a gastric polyp or background. This module is then combined with our feature fusion module to form a new attentional feature fusion model that mitigates the effects of semantic differences during feature fusion, using multi-scale fusion information to obtain more accurate attention weights and improve the detection of polyps of different sizes. In this work, we conduct experiments on our own gastric polyp dataset. Experimental results show that the proposed attentional feature fusion module outperforms the common feature fusion module and reduces the cases where polyps are missed or misdetected.
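The core mechanic of attentional feature fusion, blending two feature inputs through a learned gate rather than plain addition, can be sketched element-wise as below. This is a generic illustration in the spirit of such modules, not the module proposed in the paper; the gate parameters w and bias are illustrative placeholders for what a trained network would learn.

```python
import math

def attentional_fuse(a, b, w=1.0, bias=0.0):
    """Blend two same-sized feature maps: an attention gate in (0, 1) is
    computed per element from the combined features, then used as a
    convex weight between the two inputs."""
    fused = []
    for x, y in zip(a, b):
        gate = 1.0 / (1.0 + math.exp(-(w * (x + y) + bias)))  # sigmoid
        fused.append(gate * x + (1.0 - gate) * y)
    return fused

# Each fused value lies between the two inputs, weighted by the gate.
out = attentional_fuse([2.0, -3.0], [0.5, 1.0])
```

In a real module, the gate would come from multi-scale pooled context rather than a single scalar transform, which is precisely where the paper's multi-scale fusion information enters.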
Subject(s)
Stomach Neoplasms, Humans, Stomach Neoplasms/diagnostic imaging
ABSTRACT
This article studies the detection of discontinuous false data-injection (FDI) attacks on cyber-physical systems (CPSs). Considering the unknown stochastic properties of the process noise and measurement noise, deep reinforcement learning is applied to designing an FDI attack detector. First, the discontinuous attack detection problem is modeled as a partially observable Markov decision process (POMDP) and a neural network is used to explore the POMDP. In the network, sliding observation windows, which are composed of offline fragments of historical data, are used as the input. An approach to designing the reward in the POMDP is provided to ensure the precision of the detection even when some state recognition errors occur. Second, sufficient conditions on attack frequency and duration are given to guarantee the applicability of the detector and the expected estimation performance. Finally, simulation examples illustrate the effectiveness of the attack detector.
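The sliding-window input format mentioned above is straightforward to construct: overlapping fixed-length slices of the measurement history, each fed to the detector network as one observation. A minimal sketch (window length and stride are assumptions):

```python
def sliding_windows(history, length):
    """Turn a measurement history into overlapping observation windows
    (stride 1), the input format a window-based detector consumes."""
    return [history[i:i + length] for i in range(len(history) - length + 1)]

windows = sliding_windows([0.1, 0.3, 0.2, 0.9, 0.8], length=3)
```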
ABSTRACT
This paper investigates visual navigation and control of a cooperative unmanned surface vehicle (USV)-unmanned aerial vehicle (UAV) system for marine search and rescue. First, a deep learning-based visual detection architecture is developed to extract positional information from the images taken by the UAV. With specially designed convolutional layers and spatial softmax layers, the visual positioning accuracy and computational efficiency are improved. Next, a reinforcement learning-based USV control strategy is proposed, which could learn a motion control policy with an enhanced ability to reject wave disturbances. The simulation experiment results show that the proposed visual navigation architecture can provide stable and accurate position and heading angle estimation in different weather and lighting conditions. The trained control policy also demonstrates satisfactory USV control ability under wave disturbances.
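The spatial softmax layer mentioned above is a standard construction: softmax the activations of a feature map, then return the probability-weighted pixel coordinates (a differentiable soft-argmax). A minimal sketch on a plain 2-D list:

```python
import math

def spatial_softmax(feature):
    """Soft-argmax over a 2-D feature map: softmax the activations,
    then return the expected (x, y) coordinates under that distribution."""
    flat = [v for row in feature for v in row]
    m = max(flat)                              # subtract max for stability
    exp = [math.exp(v - m) for v in flat]
    z = sum(exp)
    w = len(feature[0])
    ex = sum(p / z * (i % w) for i, p in enumerate(exp))
    ey = sum(p / z * (i // w) for i, p in enumerate(exp))
    return ex, ey

# A strong peak at pixel (x=2, y=1) dominates the expectation.
x, y = spatial_softmax([[0., 0., 0.],
                        [0., 0., 9.],
                        [0., 0., 0.]])
```

Because the output is an expectation rather than a hard argmax, gradients flow through it, which is what makes it attractive for visual positioning networks.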
ABSTRACT
Underwater dynamic target tracking technology has a wide application prospect in marine resource exploration, underwater engineering operations, naval battlefield monitoring, and underwater precision guidance. Aiming at the underwater dynamic target tracking problem, an autonomous underwater vehicle tracking control method based on trajectory prediction is studied. First, a deep learning-based target detection algorithm is developed. For images collected by a multibeam forward-looking sonar, this algorithm uses the YOLO v3 network to locate the target in a sonar image and obtain its position. Then, a time profit Elman neural network (TPENN) is constructed to predict the trajectory information of the dynamic target. Compared with an ordinary Elman neural network, its accuracy of dynamic target prediction is increased. Finally, underwater tracking of the dynamic target is realized using a model predictive controller (MPC), and the tracking result is stable and reliable. Through simulations and experiments, the proposed underwater dynamic target tracking control method is demonstrated to be effective and feasible.
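The baseline Elman recurrence underlying the predictor above is a hidden layer with a context unit that feeds back the previous hidden state. The scalar sketch below shows only this standard Elman step; TPENN's "time profit" modification is not reproduced, and the weights are illustrative.

```python
import math

def elman_step(x, h_prev, w_in, w_rec, w_out, b=0.0):
    """One step of a scalar Elman network: the context unit feeds the
    previous hidden state back in alongside the new input."""
    h = math.tanh(w_in * x + w_rec * h_prev + b)  # hidden/context update
    y = w_out * h                                  # linear readout
    return y, h

# Feeding a constant input: the context loop makes the output build up
# toward a steady state instead of jumping there immediately.
h = 0.0
outputs = []
for x in [1.0, 1.0, 1.0]:
    y, h = elman_step(x, h, w_in=0.8, w_rec=0.5, w_out=1.0)
    outputs.append(y)
```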
ABSTRACT
Reinforcement learning (RL) plays an essential role in the field of artificial intelligence but suffers from data inefficiency and model-shift issues. One possible solution to deal with such issues is to exploit transfer learning. However, interpretability problems and negative transfer may occur without explainable models. In this article, we define Relation Transfer as explainable and transferable learning based on graphical model representations, inferring the skeleton and relations among variables in a causal view and generalizing to the target domain. The proposed algorithm consists of the following three steps. First, we leverage a suitable causal discovery method to identify the causal graph based on the augmented source domain data. After that, we make inferences on the target model based on the prior causal knowledge. Finally, offline RL training on the target model is utilized as prior knowledge to improve policy training in the target domain. The proposed method can answer the question of what to transfer and realize zero-shot transfer across related domains in a principled way. To demonstrate the robustness of the proposed framework, we conduct experiments on four classical control problems as well as one simulated real-world application. Experimental results on both continuous and discrete cases demonstrate the efficacy of the proposed method.
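One concrete way a discovered causal graph answers "what to transfer" is by restricting the policy's inputs to the variables the graph marks as causally relevant. The sketch below is a simplified illustration of that idea, not the paper's algorithm; the variable names and parent set are invented for the example.

```python
def select_causal_features(state, parents):
    """Keep only the state variables the discovered causal graph marks
    as parents of the transition/reward: spurious variables are dropped
    before policy training, which is one guard against negative transfer."""
    return {k: v for k, v in state.items() if k in parents}

state = {"mass": 1.0, "color": 3, "velocity": 0.5}
parents = {"mass", "velocity"}   # output of a causal discovery step (assumed)
feat = select_causal_features(state, parents)
```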
ABSTRACT
Existing approaches to constrained-input optimal control problems mainly focus on systems with input saturation, whereas other constraints, such as combined inequality constraints and state-dependent constraints, are seldom discussed. In this article, a reinforcement learning (RL)-based algorithm is developed for constrained-input optimal control of discrete-time (DT) systems. The deterministic policy gradient (DPG) is introduced to iteratively search for the optimal solution to the Hamilton-Jacobi-Bellman (HJB) equation. To deal with input constraints, an action mapping (AM) mechanism is proposed. The objective of this mechanism is to transform the exploration space from the subspace generated by the given inequality constraints to the standard Cartesian product space, which can be searched effectively by existing algorithms. By using the proposed architecture, the learned policy can output control signals satisfying the given constraints, and the original reward function can be kept unchanged. A convergence analysis is given: it is shown that the iterative algorithm converges to the optimal solution of the HJB equation. In addition, the continuity of the iteratively estimated Q-function is investigated. Two numerical examples are provided to demonstrate the effectiveness of our approach.
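In one dimension, the essence of action mapping can be sketched as a smooth bijection from the unconstrained exploration variable onto a (possibly state-dependent) feasible interval, so any policy output is feasible by construction. This is a minimal illustration of the idea, not the paper's mapping; the tanh squashing and the example bound function are assumptions.

```python
import math

def action_map(a_raw, lo, hi):
    """Map an unconstrained exploration variable onto the feasible
    interval [lo, hi]: tanh squashes to (-1, 1), then an affine map
    lands strictly inside the constraint set."""
    t = math.tanh(a_raw)
    return lo + (hi - lo) * (t + 1.0) / 2.0

def bounds(speed, u_max=1.0):
    """State-dependent bounds (illustrative): tighter input limits at
    higher speed, mimicking a friction-style constraint."""
    half = u_max / (1.0 + speed)
    return -half, half

lo, hi = bounds(speed=3.0)
u = action_map(5.0, lo, hi)   # large raw action still maps inside bounds
```

Because the mapping is applied outside the reward, learning proceeds in the standard unconstrained space while every executed action respects the constraints.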
ABSTRACT
Due to the complexity of the ocean environment, an autonomous underwater vehicle (AUV) is disturbed by obstacles when performing tasks. Therefore, the research on underwater obstacle detection and avoidance is particularly important. Based on the images collected by a forward-looking sonar on an AUV, this article proposes an obstacle detection and avoidance algorithm. First, a deep learning-based obstacle candidate area detection algorithm is developed. This algorithm uses the You Only Look Once (YOLO) v3 network to determine obstacle candidate areas in a sonar image. Then, in the determined obstacle candidate areas, the obstacle detection algorithm based on the improved threshold segmentation algorithm is used to detect obstacles accurately. Finally, using the obstacle detection results obtained from the sonar images, an obstacle avoidance algorithm based on deep reinforcement learning (DRL) is developed to plan a reasonable obstacle avoidance path of an AUV. Experimental results show that the proposed algorithms improve obstacle detection accuracy and processing speed of sonar images. At the same time, the proposed algorithms ensure AUV navigation safety in a complex obstacle environment.
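Threshold segmentation inside a candidate area can be illustrated with the classic Otsu method, which picks the cut that maximizes the between-class variance of the two resulting pixel groups. This baseline sketch is not the paper's improved algorithm; the region-of-interest values below are invented sonar-like intensities.

```python
def otsu_threshold(pixels):
    """Classic Otsu threshold: choose the intensity cut that maximizes
    the between-class variance of the two resulting pixel groups."""
    levels = sorted(set(pixels))
    n = len(pixels)
    best_t, best_var = levels[0], -1.0
    for t in levels[:-1]:
        lo = [p for p in pixels if p <= t]
        hi = [p for p in pixels if p > t]
        w0, w1 = len(lo) / n, len(hi) / n
        m0, m1 = sum(lo) / len(lo), sum(hi) / len(hi)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Bright obstacle echoes (~200) against a dark sonar background (~20).
roi = [18, 22, 20, 19, 198, 202, 200, 21]
t = otsu_threshold(roi)
mask = [p > t for p in roi]   # True marks obstacle pixels
```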
ABSTRACT
This article presents a nearly optimal solution to the cooperative formation control problem for large-scale multiagent systems (MASs). First, although the multigroup technique is widely used to decompose large-scale problems, it does not by itself ensure consensus between different subgroups. Inspired by the hierarchical structure applied in the MAS, a hierarchical leader-following formation control structure with the multigroup technique is constructed, in which two layers and three types of agents are designed. Second, the adaptive dynamic programming technique is applied to the optimal formation control problem by establishing a performance index function. Based on the traditional generalized policy iteration (PI) algorithm, the multistep generalized policy iteration (MsGPI) algorithm is developed by modifying the policy evaluation step. The novel algorithm not only inherits the advantages of high convergence speed and low computational complexity of the generalized PI algorithm but also further accelerates convergence and reduces run time. Besides, stability, convergence, and optimality analyses are given for the proposed multistep PI algorithm. Afterward, a neural network-based actor-critic structure is built to approximate the iterative control policies and value functions. Finally, a large-scale formation control problem is provided to demonstrate the performance of the developed hierarchical leader-following formation control structure and the MsGPI algorithm.
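The multistep policy-evaluation idea can be written generically as an N-step evaluation recursion (a generic template with assumed symbols: utility U, iterative value function V_i, and policy μ_i; the paper's exact MsGPI update is not reproduced):

```latex
V_{i+1}(x_k) = \sum_{j=0}^{N-1} U\big(x_{k+j}, \mu_i(x_{k+j})\big) + V_i(x_{k+N}) .
```

With N = 1 this reduces to standard generalized policy iteration; a larger N propagates cost information over a longer horizon per iteration, which is the source of the accelerated convergence claimed above.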
ABSTRACT
Model-based reinforcement learning (RL) is regarded as a promising approach to tackle the challenges that hinder model-free RL. The success of model-based RL hinges critically on the quality of the predicted dynamic models. However, for many real-world tasks involving high-dimensional state spaces, current dynamics prediction models show poor performance in long-term prediction. To that end, we propose a novel two-branch neural network architecture with multi-timescale memory augmentation to handle long-term and short-term memory differently. Specifically, we follow previous works in introducing a recurrent neural network architecture to encode history observation sequences into a latent space, characterizing the long-term memory of agents. Different from previous works, we view the most recent observations as the short-term memory of agents and employ them to directly reconstruct the next frame, avoiding compounding error. This is achieved by introducing a self-supervised optical flow prediction structure to model the action-conditional feature transformation at the pixel level. The reconstructed observation is finally augmented by the long-term memory to ensure semantic consistency. Experimental results show that our approach is able to generate visually realistic long-term predictions in DeepMind maze navigation games and outperforms the prevalent state-of-the-art methods in prediction accuracy by a large margin. Furthermore, we also evaluate the usefulness of our world model by using the predicted frames to drive an imagination-augmented exploration strategy to improve the model-free RL controller.
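Flow-based next-frame reconstruction amounts to warping the most recent frame by a predicted displacement field. The sketch below uses nearest-neighbor sampling with integer flow on a tiny grid; a real model would predict sub-pixel flow and sample bilinearly, and the flow values here are invented.

```python
def warp(frame, flow):
    """Reconstruct the next frame by sampling each output pixel from the
    previous frame at the location its flow vector points back to."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)  # clamp to the frame border
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = frame[sy][sx]
    return out

frame = [[1, 2],
         [3, 4]]
# Content shifts one pixel left: each output samples its right neighbor.
flow = [[(0, 1), (0, 1)],
        [(0, 1), (0, 1)]]
nxt = warp(frame, flow)
```

Because the next frame is copied from real pixels rather than regenerated from a latent code, errors do not compound across the reconstruction itself, which is the motivation stated above.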
ABSTRACT
Based on the reinforcement learning mechanism, a data-based scheme is proposed to address the optimal control problem of discrete-time nonlinear switching systems. In contrast to conventional systems, in switching systems the control signal consists of the active mode (discrete) and the control inputs (continuous). First, the Hamilton-Jacobi-Bellman equation of the hybrid action space is derived, and a two-stage value iteration method is proposed to learn the optimal solution. In addition, a neural network structure is designed by decomposing the Q-function into the value function and the normalized advantage value function, which is quadratic with respect to the continuous control of the subsystems. In this way, the Q-function and the continuous policy can be updated simultaneously at each iteration step, so the training of hybrid policies is simplified to a one-step manner. Moreover, a convergence analysis of the proposed algorithm that accounts for approximation error is provided. Finally, the algorithm is evaluated on three simulation examples. Compared to related work, the results demonstrate the potential of our method.
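The quadratic advantage decomposition above (in the spirit of normalized advantage functions) can be sketched in the scalar case: Q(s, a) = V(s) - ½·p·(a - μ)², so the advantage is zero at a = μ and the greedy continuous action is μ in closed form. The numbers below are illustrative placeholders for learned network outputs.

```python
def q_value(v, mu, p, a):
    """Q decomposed into a state value plus a quadratic (normalized)
    advantage: maximal at a = mu, strictly lower elsewhere (p > 0)."""
    return v - 0.5 * p * (a - mu) ** 2

v, mu, p = 3.0, 0.4, 2.0
q_at_mu = q_value(v, mu, p, mu)        # advantage vanishes: Q = V
q_off = q_value(v, mu, p, mu + 1.0)    # penalized by the curvature p
```

This closed-form maximizer is what lets the continuous policy be read off directly at every iteration, without an inner optimization over actions.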
ABSTRACT
In this article, a novel hybrid multirobot motion planner that can be applied under no explicit communication and local observability conditions is presented. The planner is model-free and can realize the end-to-end mapping of multirobot state and observation information to final smooth and continuous trajectories. The planner has a front-end and back-end separated architecture. The front-end collaborative waypoint-searching module is designed based on the multiagent soft actor-critic (MASAC) algorithm under the centralized training with decentralized execution (CTDE) paradigm. The back-end trajectory optimization module is designed based on the minimal snap method with safety zone constraints; this module outputs the final dynamically feasible and executable trajectories. Finally, multiple groups of experimental results verify the effectiveness of the proposed motion planner.
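The back end fits polynomial segments that minimize a high-order derivative between waypoints. As a simplified stand-in, the sketch below evaluates the classic rest-to-rest minimum-jerk quintic between two waypoints; minimal snap proper minimizes the fourth derivative with 7th-order polynomials and corridor constraints, which is not reproduced here.

```python
def min_jerk(p0, p1, T, t):
    """Rest-to-rest minimum-jerk polynomial segment: zero velocity and
    acceleration at both endpoints, smooth monotone motion in between."""
    s = t / T                                  # normalized time in [0, 1]
    return p0 + (p1 - p0) * (10 * s**3 - 15 * s**4 + 6 * s**5)

start, goal, T = 0.0, 2.0, 1.0
samples = [min_jerk(start, goal, T, t / 10) for t in range(11)]
```

Chaining such segments through the front end's waypoints, with continuity enforced at the joints, yields the smooth executable trajectory the planner outputs.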
ABSTRACT
This article investigates the co-design problem of adaptive event-triggered schemes (AETSs) and an asynchronous fault detection filter (AFDF) for nonhomogeneous higher-level Markov jump systems involving the hidden Markov model (HMM), a higher-level Markov chain (MC), and conic-type nonlinearities. The variation of the system transition probability is captured by the designed higher-level MC. An HMM with another conditional transition probability is applied to detect higher-level Markov processes and make the system more practical. To balance the utilization of network resources against system performance, a novel AETS is proposed and used in the construction of the AFDF. By Lyapunov theory, sufficient conditions are given to ensure the existence of the AETS and AFDF. The proposed co-design not only achieves an appropriate tradeoff between network resource utilization and system performance but also reduces conservatism. Finally, a numerical example shows that the co-designed AFDF detects faults effectively.
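The core mechanic of an adaptive event-triggered scheme can be sketched as follows: a sample is transmitted over the network only when it deviates from the last transmitted one by more than a threshold, and the threshold itself is adapted over time. This is a generic illustration, not the paper's AETS; the threshold update law and constants are assumptions.

```python
def run_aets(measurements, sigma0=0.5, rho=0.9, floor=0.05):
    """Adaptive event trigger: release a sample only when it deviates
    from the last released one by more than the current threshold; the
    threshold is tightened (illustratively) after each release."""
    sigma = sigma0
    last = measurements[0]
    released = [0]                         # first sample always transmitted
    for k, y in enumerate(measurements[1:], start=1):
        if abs(y - last) > sigma:          # trigger condition satisfied
            released.append(k)
            last = y
            sigma = max(floor, rho * sigma)  # adapt the threshold
    return released

# Only the samples that change appreciably are sent to the filter.
events = run_aets([0.0, 0.1, 0.2, 1.0, 1.05, 2.0])
```

The fewer the released instants, the lighter the network load; the threshold adaptation is exactly the knob that trades this saving against detection performance.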