Results 1 - 20 of 37
1.
Sensors (Basel) ; 24(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38276391

ABSTRACT

In research on robot systems, path planning and obstacle avoidance are important directions, especially in unknown dynamic environments where flexibility and rapid decision making are required. In this paper, a state attention network (SAN) was developed to extract features representing the interaction between an intelligent robot and its obstacles. An auxiliary actor discriminator (AAD) was developed to calculate the probability of a collision. Goal-directed and gap-based navigation strategies were proposed to guide robotic exploration. The proposed policy was trained in simulated scenarios and updated with the Soft Actor-Critic (SAC) algorithm, and the robot executed actions depending on the AAD output. Heuristic knowledge (HK) was developed to prevent blind exploration by the robot. Compared with other methods, adopting our approach helps robots converge toward an optimal action strategy. Furthermore, it enables them to explore paths in unknown environments with fewer moving steps (a decrease of 33.9%) and achieve higher average rewards (an increase of 29.15%).
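The abstract does not give the AAD's implementation details, but the gating idea can be illustrated in a few lines. The sketch below, with illustrative names (policy_action, collision_prob, gap_heuristic_action) that are not from the paper, shows a policy action being executed only when the predicted collision probability stays below a threshold, with a gap-based fallback otherwise.

```python
# Hypothetical sketch of AAD-gated action selection; all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def policy_action(state):
    # Stand-in for the SAC policy: a bounded 2-D velocity command.
    return np.tanh(rng.normal(size=2))

def collision_prob(state, action):
    # Stand-in for the auxiliary actor discriminator (AAD):
    # in the paper this is a learned network; here a dummy score.
    return float(np.clip(np.abs(action).mean(), 0.0, 1.0))

def gap_heuristic_action(state):
    # Stand-in for the gap-based fallback: steer toward the widest free gap.
    return np.array([0.2, 0.0])

def select_action(state, threshold=0.5):
    a = policy_action(state)
    # Execute the policy action only if the predicted collision risk is low.
    return a if collision_prob(state, a) < threshold else gap_heuristic_action(state)

print(select_action(np.zeros(8)))
```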

2.
Sensors (Basel) ; 23(2)2023 Jan 09.
Article in English | MEDLINE | ID: mdl-36679561

ABSTRACT

Deep reinforcement learning (DRL) algorithms have been widely studied for sequential decision-making problems, and substantial progress has been achieved, especially in autonomous robotic skill learning. However, it remains difficult to deploy DRL methods in practical safety-critical robot systems, since a gap between the training and deployment environments always exists, and this issue becomes increasingly crucial in ever-changing environments. Aiming at efficient robotic skill transfer in dynamic environments, we present a meta-reinforcement learning algorithm based on a variational information bottleneck. More specifically, during the meta-training stage, the variational information bottleneck is first applied to infer the complete set of basic tasks for the whole task space, and the maximum-entropy regularized reinforcement learning framework is then used to learn the basic skills corresponding to those basic tasks. Once the training stage is completed, every task in the task space can be obtained as a nonlinear combination of the basic tasks, so the skills needed to accomplish a task can likewise be obtained by combining the basic skills. Empirical results on several highly nonlinear, high-dimensional robotic locomotion tasks show that the proposed variational information bottleneck regularized deep reinforcement learning algorithm can improve sample efficiency by 200-5000 times on new tasks. Furthermore, the proposed algorithm achieves substantial asymptotic performance improvement. The results indicate that the proposed meta-reinforcement learning framework makes a significant step toward deploying DRL-based algorithms in practical robot systems.
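A minimal sketch of the variational information bottleneck (VIB) regularizer the abstract describes: a KL term between the inferred task posterior q(z|tau) = N(mu, sigma^2) and a unit-Gaussian prior, added to the maximum-entropy RL loss with a weight beta. The variable names and the weight are our assumptions, not the paper's.

```python
# VIB regularizer sketch: KL( N(mu, sigma^2) || N(0, 1) ) added to the RL loss.
import numpy as np

def vib_kl(mu, log_sigma):
    # Closed-form KL for diagonal Gaussians, summed over latent dimensions.
    sigma2 = np.exp(2.0 * log_sigma)
    return 0.5 * np.sum(sigma2 + mu**2 - 1.0 - 2.0 * log_sigma)

mu = np.array([0.3, -0.1])
log_sigma = np.array([-0.5, -1.0])
beta = 1e-3                       # bottleneck weight (assumed hyperparameter)
rl_loss = 1.7                     # stand-in for the maximum-entropy RL loss
total_loss = rl_loss + beta * vib_kl(mu, log_sigma)
print(total_loss)
```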


Subjects
Robotic Surgical Procedures, Robotics, Robotics/methods, Algorithms, Acclimatization, Locomotion
3.
Sensors (Basel) ; 23(20)2023 Oct 23.
Article in English | MEDLINE | ID: mdl-37896743

ABSTRACT

An end-to-end approach to autonomous navigation based on deep reinforcement learning (DRL) with a survival penalty function is proposed in this paper. Two actor-critic (AC) frameworks, namely deep deterministic policy gradient (DDPG) and twin-delayed DDPG (TD3), are employed to enable a nonholonomic wheeled mobile robot (WMR) to navigate dynamic environments that contain obstacles and for which no maps are available. A comprehensive reward based on the survival penalty function is introduced; this approach effectively solves the sparse reward problem and enables the WMR to move toward its target. Consecutive episodes are connected to increase the cumulative penalty in scenarios involving obstacles; this method prevents training failure and enables the WMR to plan a collision-free path. Simulations are conducted for four scenarios (movement in an obstacle-free space, in a parking lot, at an intersection without and with a central obstacle, and in a space with multiple obstacles) to demonstrate the efficiency and operational safety of our method. For the same navigation environment, compared with the DDPG algorithm, the TD3 algorithm exhibits faster numerical convergence and higher stability in the training phase, as well as a higher task execution success rate in the evaluation phase.
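The abstract does not give the reward coefficients; the sketch below illustrates the general shape of a survival-penalty reward for map-less navigation: a dense progress term, a small per-step penalty that discourages stalling, a soft penalty near obstacles, and terminal bonuses. All constants are assumptions.

```python
# Hedged sketch of a survival-penalty-style navigation reward; values assumed.
import numpy as np

def reward(d_goal_prev, d_goal, d_obstacle, reached, collided,
           k_progress=2.0, survival_penalty=-0.05,
           r_goal=100.0, r_collision=-100.0, d_safe=0.4):
    if reached:
        return r_goal
    if collided:
        return r_collision
    r = k_progress * (d_goal_prev - d_goal)   # dense progress toward the target
    r += survival_penalty                     # per-step penalty: discourages stalling
    if d_obstacle < d_safe:                   # soft penalty when skirting obstacles
        r -= (d_safe - d_obstacle)
    return r

print(reward(d_goal_prev=3.0, d_goal=2.9, d_obstacle=0.8,
             reached=False, collided=False))
```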

4.
Sensors (Basel) ; 23(5)2023 Feb 24.
Article in English | MEDLINE | ID: mdl-36904758

ABSTRACT

A reconfigurable intelligent surface (RIS) is a development of conventional relay technology that can send a signal by reflecting the signal received from a transmitter to a receiver without additional power. RISs are a promising technology for future wireless communication because they improve the quality of the received signal, energy efficiency, and power allocation. In addition, machine learning (ML) is widely used in many technologies because it can create machines that mimic human reasoning with mathematical algorithms, without requiring direct human assistance. Meanwhile, reinforcement learning (RL), a subfield of ML, can be implemented to allow a machine to make decisions automatically based on real-time conditions. However, few studies have provided comprehensive information on RL algorithms, especially deep RL (DRL), for RIS technology. Therefore, in this study, we provide an overview of RISs and explain the operation and implementation of RL algorithms for optimizing the parameters of RIS technology. Optimizing RIS parameters can offer several benefits for communication systems, such as maximizing the sum rate, user power allocation, and energy efficiency, or minimizing the age of information. Finally, we highlight several issues to consider when implementing RL algorithms for RIS technology in future wireless communications and provide possible solutions.

5.
Sensors (Basel) ; 23(14)2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37514846

ABSTRACT

A proactive mobile network (PMN) is a novel architecture enabling extremely low-latency communication. This architecture employs an open-loop transmission mode that prohibits all real-time control feedback processes and uses virtual cell technology to allocate resources non-exclusively to users. However, such a design also introduces significant potential user interference and degrades communication reliability. In this paper, we propose introducing multi-reconfigurable intelligent surface (RIS) technology into the downlink process of the PMN to increase the network's capacity under interference. Since the PMN environment is complex and time-varying and accurate channel state information cannot be acquired in real time, it is challenging to manage RISs to serve the PMN effectively. We begin by formulating an optimization problem over the RIS phase shifts and reflection coefficients. Furthermore, motivated by recent developments in deep reinforcement learning (DRL), we propose an asynchronous advantage actor-critic (A3C)-based method for solving the problem by appropriately designing the action space, state space, and reward function. Simulation results indicate that deploying RISs within a region can significantly facilitate interference suppression. The proposed A3C-based scheme achieves a higher capacity than baseline schemes and approaches the upper limit as the number of RISs increases.
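As an illustration of the reward design mentioned above, the toy sketch below lets an agent's action set discrete RIS phase shifts and returns the resulting achievable rate as the reward. The channel model is a stand-in, not the paper's PMN model.

```python
# Toy RIS phase-shift reward: action picks 1-bit phases, reward is the rate.
import numpy as np

rng = np.random.default_rng(1)
N = 16                                             # RIS elements (assumed)
h = rng.normal(size=N) + 1j * rng.normal(size=N)   # Tx -> RIS channel
g = rng.normal(size=N) + 1j * rng.normal(size=N)   # RIS -> user channel

def reward(action_bits, noise_power=1.0):
    # action_bits in {0, 1}^N select 1-bit phase shifts {0, pi}.
    theta = np.exp(1j * np.pi * action_bits)
    snr = np.abs(np.sum(g * theta * h)) ** 2 / noise_power
    return np.log2(1.0 + snr)                      # achievable rate as the RL reward

action = rng.integers(0, 2, size=N)
print(reward(action))
```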

6.
Sensors (Basel) ; 23(18)2023 Sep 06.
Article in English | MEDLINE | ID: mdl-37765765

ABSTRACT

The fifth generation (5G) of mobile networks achieved tremendous success, which raises high hopes for the next generation, as evidenced by the sixth generation (6G) key performance indicators: ultra-reliable low-latency communication (URLLC), extremely high data rates, high energy and spectral efficiency, ultra-dense connectivity, integrated sensing and communication, and secure communication. Emerging technologies such as intelligent reflecting surfaces (IRSs), unmanned aerial vehicles (UAVs), and non-orthogonal multiple access (NOMA) can provide communications for massive numbers of users, at the price of high overhead and computational complexity, and thereby address the demanding 6G requirements. However, optimizing system functionality with these new technologies is hard for conventional mathematical solutions. Therefore, machine learning (ML) algorithms and their derivatives could be the right tools. The present study aims to offer a thorough and organized overview of the various ML, deep learning (DL), and reinforcement learning (RL) algorithms in the context of the emerging 6G technologies, motivated by the lack of research on the significance of these algorithms in this specific setting. It examines the potential of ML algorithms and their derivatives in optimizing emerging technologies to align with the visions and requirements of the 6G network, which is crucial to ushering in a new era of communication marked by substantial advancements. The study highlights potential challenges for wireless communications in 6G networks and suggests possible ML algorithms and their derivatives as solutions. Finally, the survey concludes that integrating ML algorithms and emerging technologies will play a vital role in developing 6G networks.

7.
Entropy (Basel) ; 25(2)2023 Feb 05.
Article in English | MEDLINE | ID: mdl-36832665

ABSTRACT

This article offers an optimal tracking control method using an event-triggered technique and the internal reinforcement Q-learning (IrQL) algorithm to address the tracking control issue of unknown nonlinear multi-agent systems (MASs). Relying on the internal reinforcement reward (IRR) formula, a Q-learning function is calculated, and the iterative IrQL method is then developed. In contrast to time-triggered mechanisms, an event-triggered algorithm reduces the transmission rate and computational load, since the controller is updated only when the predetermined triggering conditions are met. In addition, to implement the suggested system, a neural reinforce-critic-actor (RCA) network structure is created that can assess the performance indices and support online learning of the event-triggering mechanism. This strategy is intended to be data-driven, without requiring in-depth knowledge of the system dynamics. We develop an event-triggered weight-tuning rule that modifies the parameters of the actor neural network (ANN) only in response to triggering cases. In addition, a Lyapunov-based convergence study of the reinforce-critic-actor neural network (NN) is presented. Lastly, an example demonstrates the effectiveness and efficiency of the suggested approach.
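A minimal sketch of an event-triggering rule of the kind the abstract describes: the state is transmitted, and the controller (and actor-network weights) updated, only when the gap between the current state and the last transmitted state exceeds a threshold. The specific threshold form is an assumption.

```python
# Event-trigger sketch: update only when the state error crosses a bound.
import numpy as np

def should_trigger(x, x_last_sent, alpha=0.1):
    # Trigger when ||x - x_hat|| exceeds a state-dependent bound (assumed form).
    return np.linalg.norm(x - x_last_sent) > alpha * np.linalg.norm(x)

x_hat = np.array([1.0, 0.0])
for x in [np.array([1.02, 0.01]), np.array([1.3, 0.2])]:
    if should_trigger(x, x_hat):
        x_hat = x            # transmit the state and update the controller
        print("trigger:", x)
    else:
        print("skip:", x)    # no transmission, no controller computation
```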

8.
J Biomed Inform ; 128: 104049, 2022 04.
Article in English | MEDLINE | ID: mdl-35283266

ABSTRACT

Renal cell carcinoma (RCC) is one of the deadliest cancers and mainly consists of three subtypes: kidney clear cell carcinoma (KIRC), kidney papillary cell carcinoma (KIRP), and kidney chromophobe (KICH). Gene signature identification plays an important role in the precise classification of RCC subtypes and in personalized treatment. However, most existing gene selection methods focus on statically selecting the same informative genes for each subtype and fail to consider the heterogeneity of patients, which causes pattern differences among subtypes. In this work, to explore different informative gene subsets for each subtype, we propose a novel gene selection method, named sequential reinforcement active feature learning (SRAFL), which dynamically acquires different genes in each sample to identify the different gene signatures of each subtype. The proposed SRAFL method combines a cancer subtype classifier with a reinforcement learning (RL) agent, which sequentially selects the active genes in each sample from the three mixed RCC subtypes in a cost-sensitive manner. Moreover, module-based gene filtering is run before gene selection to filter out redundant genes. We evaluate the proposed SRAFL method mainly on mRNA and long non-coding RNA (lncRNA) expression profiles of RCC datasets from The Cancer Genome Atlas (TCGA). The experimental results demonstrate that the proposed method can automatically identify different gene signatures for different subtypes to accurately classify RCC subtypes. More importantly, we show for the first time that the proposed SRAFL method can account for the heterogeneity of samples in selecting different gene signatures for different RCC subtypes, which shows its potential for precision-based RCC care in the future.
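A hedged sketch of cost-sensitive sequential feature acquisition, the core idea behind SRAFL: per sample, an agent repeatedly chooses to acquire one more feature (paying a cost) or to stop and classify. The policy, classifier, and costs here are toy stand-ins, not the paper's trained agent.

```python
# Per-sample sequential feature acquisition sketch; all components are stand-ins.
import numpy as np

rng = np.random.default_rng(5)

def classify(observed):
    # Stand-in for the subtype classifier: confidence grows with evidence.
    return min(0.5 + 0.1 * len(observed), 0.99)

def acquire_features(sample, cost=0.02, conf_stop=0.9):
    observed = {}
    while classify(observed) < conf_stop and len(observed) < len(sample):
        # Policy stand-in: pick the next unobserved gene at random.
        idx = rng.choice([i for i in range(len(sample)) if i not in observed])
        observed[idx] = sample[idx]               # pay the acquisition cost
    reward = classify(observed) - cost * len(observed)
    return sorted(observed), reward

sample = rng.normal(size=20)                      # one patient's expression profile
genes, r = acquire_features(sample)
print("acquired genes:", genes, "reward:", round(r, 3))
```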


Subjects
Renal Cell Carcinoma, Kidney Neoplasms, Renal Cell Carcinoma/diagnosis, Renal Cell Carcinoma/genetics, Renal Cell Carcinoma/metabolism, Genome, Humans, Kidney Neoplasms/diagnosis, Kidney Neoplasms/genetics, Kidney Neoplasms/metabolism, Messenger RNA
9.
Sensors (Basel) ; 22(3)2022 Jan 22.
Article in English | MEDLINE | ID: mdl-35161591

ABSTRACT

Using reinforcement learning (RL) for the torque distribution of skid-steering vehicles has attracted increasing attention recently. Various RL-based torque distribution methods have been proposed for this classical vehicle control problem, achieving better performance than traditional control methods. However, most RL-based methods focus only on improving the performance of skid-steering vehicles, while actuator faults that may lead to unsafe conditions or catastrophic events are frequently omitted from existing control schemes. This study proposes a meta-RL-based fault-tolerant control (FTC) method to improve the tracking performance of vehicles in the case of actuator faults. Based on meta deep deterministic policy gradient (meta-DDPG), the proposed FTC method follows a representative gradient-based meta-learning workflow with an offline stage and an online stage. In the offline stage, an experience replay buffer covering various actuator faults is constructed to provide data for training the meta-model; the meta-trained model is then used in an online meta-RL update method that quickly adapts the control policy to actuator fault conditions. Simulations of four scenarios demonstrate that the proposed FTC method achieves high performance and adapts stably to actuator fault conditions.

10.
Sensors (Basel) ; 22(14)2022 Jul 21.
Article in English | MEDLINE | ID: mdl-35891130

ABSTRACT

In this paper, we present a design method for a wideband non-uniformly spaced linear array (NUSLA), with both symmetric and asymmetric geometries, using the modified reinforcement learning algorithm (MORELA). We designed a cost function that gives the beam pattern freedom by setting limits only on the beam width (BW) and side-lobe level (SLL), in order to satisfy the desired BW and SLL over the wide band. We added a scan-angle condition to the cost function to design scanned beam patterns, as the ability to scan a beam in a desired direction is important in various applications. To prevent possible pointing-angle errors for the asymmetric NUSLA, we employed a penalty function that pins the peak to the desired direction. MORELA, a reinforcement-learning-based algorithm for finding the global optimum of a cost function, is applied to optimize the spacing and weights of the NUSLA by minimizing the proposed cost function. The performance of the proposed scheme was verified by comparing it with existing heuristic optimization algorithms via computer simulations.
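A sketch of a MORELA-style cost function under our own assumptions: the pattern is penalized only when the side-lobe level (SLL) outside the allowed beam-width region exceeds its limit, and a penalty term pins the main-beam peak to the scan direction. The limits, weights, and array-factor model are illustrative, not the paper's.

```python
# Illustrative array-design cost: SLL limit outside the beam plus a pointing penalty.
import numpy as np

def array_factor_db(positions, weights, theta, scan_deg=0.0):
    k = 2 * np.pi                                   # spacings in wavelengths
    u = np.sin(np.radians(theta)) - np.sin(np.radians(scan_deg))
    af = np.abs(weights @ np.exp(1j * k * np.outer(positions, u)))
    return 20 * np.log10(af / af.max() + 1e-12)

def cost(positions, weights, scan_deg=0.0, sll_limit=-20.0, bw_limit=10.0, mu=100.0):
    theta = np.linspace(-90, 90, 721)
    pat = array_factor_db(positions, weights, theta, scan_deg)
    main = np.abs(theta - scan_deg) <= bw_limit / 2  # allowed main-beam region
    sll = pat[~main].max()                           # worst side lobe outside it
    peak_err = abs(theta[np.argmax(pat)] - scan_deg)
    return max(0.0, sll - sll_limit) + mu * peak_err # penalty pins the peak

pos = np.cumsum(np.full(8, 0.5))                     # 8 elements, half-wavelength
print(cost(pos, np.ones(8)))
```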


Subjects
Algorithms, Reinforcement (Psychology), Computer Simulation
11.
Sensors (Basel) ; 23(1)2022 Dec 22.
Article in English | MEDLINE | ID: mdl-36616694

ABSTRACT

In this study, we propose a method to automatically find features in a dataset that are effective for classification or prediction, using a new method based on multi-agent reinforcement learning with a guide agent. Each feature of the dataset is assigned a main agent and a guide agent, and these agents decide whether to select the feature. Main agents select the optimal features, and guide agents present the criteria for judging the main agents' actions. After the main and guide rewards are obtained for the features selected by the agents, each main agent that behaves differently from its guide agent updates its Q-values by calculating the learning reward delivered to the main agents. The behavior comparison helps the main agent decide whether its own behavior is correct, without using other algorithms. After performing this process for each episode, the features are finally selected. The proposed feature selection method uses multiple agents, reducing the number of actions each agent can perform and finding optimal features effectively and quickly. Finally, comparative experimental results on multiple datasets show that the proposed method can select effective features for classification and increase classification accuracy.
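A hedged sketch of the main/guide-agent update described above: each feature has a main agent with Q-values over {skip, select}, and agents whose action disagrees with the guide agent's criterion receive a learning reward and update their Q-value. The guide criterion, reward shaping, and learning rate are assumptions, not the paper's design.

```python
# Main/guide-agent feature selection sketch; update rule details are assumed.
import numpy as np

rng = np.random.default_rng(2)
n_features, alpha = 5, 0.1
Q = np.zeros((n_features, 2))                   # per-feature Q over {0: skip, 1: select}

def main_actions(eps=0.2):
    greedy = Q.argmax(axis=1)
    explore = rng.integers(0, 2, size=n_features)
    return np.where(rng.random(n_features) < eps, explore, greedy)

def guide_actions(main_reward, guide_reward):
    # Guide criterion: endorse selection only when the main reward beat the baseline.
    endorse = 1 if main_reward > guide_reward else 0
    return np.full(n_features, endorse)

a_main = main_actions()
main_r, guide_r = 0.83, 0.80                    # stand-in classification rewards
a_guide = guide_actions(main_r, guide_r)
learn_r = main_r - guide_r                      # learning reward for disagreeing agents
disagree = a_main != a_guide
Q[disagree, a_main[disagree]] += alpha * (learn_r - Q[disagree, a_main[disagree]])
print(Q)
```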


Subjects
Algorithms, Learning, Reward
12.
Sensors (Basel) ; 22(14)2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35890943

ABSTRACT

Reinforcement learning (RL), with both exploration and exploitation abilities, has been applied to games and shown to surpass human performance. This paper applies Deep Q-Network (DQN), which combines reinforcement learning and deep learning, to the real-time action response of the NS-SHAFT game, using Cheat Engine as the API for game information. On a personal computer, we build an experimental learning environment that automatically captures NS-SHAFT frames and provides them to the DQN, which decides among the actions of moving left, moving right, or staying in the same location; we also survey different parameters, such as the sampling frequency, reward function, and batch size. The experiments found that the relevant parameter settings have a certain degree of influence on the DQN learning effect. Moreover, we use Cheat Engine to locate and read the relevant values in the NS-SHAFT game, enabling the operation of the overall experimental platform and the calculation of the reward. Accordingly, we successfully establish an instant learning environment and instant game training for the NS-SHAFT game.
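A compact DQN sketch in PyTorch mirroring the three-action setup described above (move left, move right, stay). The state here is a flat feature vector rather than a raw captured frame, and all sizes and hyperparameters are assumptions.

```python
# Minimal DQN: epsilon-greedy action selection, replay buffer, frozen TD target.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 64, 3
q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)
gamma, eps = 0.99, 0.1

def act(state):
    if random.random() < eps:                      # epsilon-greedy exploration
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                          # TD target from the frozen net
        target = r + gamma * target_net(s2).max(1).values * (1 - done)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()

# One fake transition shows the expected replay format: (s, a, r, s', done).
s = torch.randn(STATE_DIM)
replay.append((s, torch.tensor(0.0), torch.tensor(1.0),
               torch.randn(STATE_DIM), torch.tensor(0.0)))
print(act(s))
```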


Subjects
Neural Networks (Computer), Reinforcement (Psychology), Humans, Reward
13.
Sensors (Basel) ; 22(15)2022 Jul 23.
Article in English | MEDLINE | ID: mdl-35897994

ABSTRACT

The underwater wireless sensor network is an important component of underwater three-dimensional monitoring systems. Due to the high bit error rate, high delay, low bandwidth, limited energy, and high dynamics of underwater networks, it is very difficult to realize efficient and reliable data transmission. Therefore, this paper posits that it is not enough to design the routing algorithm from the perspective of the transmission environment alone; the design of the data transmission algorithm should also be combined with the application. An edge-prediction-based adaptive data transmission algorithm (EP-ADTA) is proposed that can dynamically adapt to the needs of underwater monitoring applications and to changes in the transmission environment. EP-ADTA uses an end-edge-cloud architecture to define the underwater wireless sensor network. The algorithm uses communication nodes as agents, realizes monitoring data prediction and compression through edge prediction, dynamically selects transmission routes, and controls the data transmission accuracy based on reinforcement learning. The simulation results show that EP-ADTA can meet the accuracy requirements of underwater monitoring applications, dynamically adapt to changes in the transmission environment, and ensure efficient and reliable data transmission in underwater wireless sensor networks.

14.
Sensors (Basel) ; 22(3)2022 Jan 26.
Article in English | MEDLINE | ID: mdl-35161688

ABSTRACT

The industrial manufacturing sector is undergoing a tremendous revolution, moving from traditional production processes to intelligent techniques. Under this revolution, known as Industry 4.0 (I40), a robot is no longer static equipment but an active part of the factory workforce alongside human operators. Safety becomes crucial for humans and robots to ensure a smooth production run in such environments. The loss of moving robots during plant evacuation can be avoided with adequate safety induction. Operators receive frequent safety inductions on how to react in emergencies, but very little is done for robots. Our research proposes an experimental safety response mechanism for a small manufacturing plant, through which an autonomous robot learns the obstacle-free trajectory to the closest safety exit in emergencies. We implement a reinforcement learning (RL) algorithm, Q-learning, to enable the path-learning abilities of the robot. After obtaining the robot's optimal path selection options with Q-learning, we code the outcome as a rule-based system for the safety response. We also program a speech recognition system that lets operators react promptly, with a voice command, to an emergency that requires stopping all plant activities even when they are far away from the emergency stop (ESTOP) buttons. An ESTOP or a voice command sent directly to the factory's central controller can give the factory an emergency signal. We tested this functionality on real hardware, an S7-1200 Siemens programmable logic controller (PLC), and simulated a simple, small manufacturing environment to test our safety procedure. Our results show that the safety response mechanism successfully generates obstacle-free paths to the closest safety exits from all factory locations. Our research benefits any manufacturing SME intending to make initial use of autonomous moving robots (AMRs) in their factories. It also helps manufacturing SMEs using legacy devices such as traditional PLCs by offering intelligent strategies to incorporate state-of-the-art technologies such as speech recognition. Our research empowers SMEs to adopt advanced and innovative technological concepts within their operations.
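A tabular Q-learning sketch of the exit-finding task described above: a small grid stands in for the plant floor, blocked cells are obstacles, and an episode ends at the safety exit. The grid layout, rewards, and hyperparameters are our assumptions.

```python
# Tabular Q-learning on a toy plant-floor grid; the greedy rollout is the path.
import numpy as np

rng = np.random.default_rng(3)
ROWS, COLS = 4, 5
OBSTACLES, EXIT = {(1, 1), (2, 3)}, (0, 4)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # up, down, left, right
Q = np.zeros((ROWS, COLS, 4))
alpha, gamma, eps = 0.5, 0.95, 0.2

def step(s, a):
    r, c = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in OBSTACLES:
        return s, -5.0, False                       # bump: stay put, penalty
    if (r, c) == EXIT:
        return (r, c), 100.0, True                  # reached the safety exit
    return (r, c), -1.0, False                      # step cost favors short paths

for _ in range(2000):
    s, done = (3, 0), False
    while not done:
        a = rng.integers(4) if rng.random() < eps else int(Q[s].argmax())
        s2, reward, done = step(s, a)
        Q[s][a] += alpha * (reward + gamma * (0 if done else Q[s2].max()) - Q[s][a])
        s = s2

# Greedy rollout: the learned obstacle-free path from the start cell to the exit.
s, path = (3, 0), [(3, 0)]
while s != EXIT and len(path) < 20:
    s, _, _ = step(s, int(Q[s].argmax()))
    path.append(s)
print(path)
```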


Subjects
Robotics, Speech Perception, Algorithms, Humans, Industries
15.
Sensors (Basel) ; 21(13)2021 Jun 27.
Article in English | MEDLINE | ID: mdl-34199075

ABSTRACT

The demand for bandwidth-intensive and delay-sensitive services is surging daily with the development of 5G technology, resulting in fierce competition for scarce radio resources. Power-domain non-orthogonal multiple access (NOMA) technologies can dramatically improve system capacity and spectrum efficiency. Unlike existing NOMA scheduling that mainly focuses on fairness, this paper proposes a power control solution for uplink hybrid OMA and PD-NOMA in a doubly dynamic environment: dynamic and imperfect channel information together with random user-specific hierarchical quality of service (QoS). This paper models power control as a nonconvex stochastic optimization problem that aims to maximize system energy efficiency while guaranteeing hierarchical user QoS requirements. The problem is then formulated as a partially observable Markov decision process (POMDP). Owing to the difficulty of modeling time-varying scenes, the need for fast convergence, the required adaptability in a dynamic environment, and the continuity of the variables, a deep reinforcement learning (DRL)-based method is proposed. This paper also transforms the hierarchical QoS constraint under the NOMA successive interference cancellation (SIC) scheme to fit DRL. The simulation results verify the effectiveness and robustness of the proposed algorithm under a doubly uncertain environment. Compared with the baseline Particle Swarm Optimization (PSO) algorithm, the proposed DRL-based method demonstrates satisfying performance.

16.
Sensors (Basel) ; 21(23)2021 Nov 27.
Article in English | MEDLINE | ID: mdl-34883935

ABSTRACT

Modern radar jamming scenarios are complex and changeable. To improve the adaptability of frequency-agile radar under complex environmental conditions, reinforcement learning (RL) is introduced into radar anti-jamming research. Two aspects of the radar system do not conform to the Markov decision process (MDP), the basic theory of RL: firstly, the radar cannot confirm the jammer's interference rules in advance, so the environmental boundaries are unclear; secondly, the radar's frequency-agility characteristics do not meet the sequential-change requirements of the MDP. If existing RL algorithms are applied directly to the radar system, problems arise such as a low sample utilization rate, poor computational efficiency, and a large error oscillation amplitude. In this paper, an efficient RL model for adaptive frequency-agile radar anti-jamming is proposed. First, a radar-jammer system model based on a Markov game (MG) is established, and the Nash equilibrium point is determined and set as a dynamic environment boundary. Subsequently, the state and action structure of the RL model is improved to suit frequency-agile data processing. Experiments show that our proposal effectively improves the anti-jamming performance and efficiency of frequency-agile radar.

17.
Sensors (Basel) ; 21(17)2021 Sep 02.
Article in English | MEDLINE | ID: mdl-34502796

ABSTRACT

External disturbance poses the primary threat to robot balance in dynamic environments. This paper provides a learning-based control architecture for quadrupedal self-balancing that is adaptable to multiple unpredictable scenes of continuous external disturbance. Unlike conventional methods, which construct analytical models that explicitly reason about the balancing process, our work utilizes reinforcement learning and artificial neural networks to avoid opaque mathematical modeling. The control policy is composed of a neural network and a Tanh Gaussian policy head, which implicitly establishes a fuzzy mapping from proprioceptive signals to action commands. During the training process, the maximum-entropy method (the soft actor-critic algorithm) is employed to endow the policy with powerful exploration and generalization ability. The trained policy is validated in both simulations and realistic experiments with a customized quadruped robot. The results demonstrate that the policy can be easily transferred to the real world without elaborate configuration. Moreover, although the policy is trained under merely one specific vibration condition, it demonstrates robustness under conditions never encountered during training.
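A sketch of a Tanh Gaussian policy head as commonly used with soft actor-critic: sample from a diagonal Gaussian, squash through tanh to bound the action, and apply the change-of-variables correction to the log-probability. The network sizes are assumptions; the paper's exact architecture is not given in the abstract.

```python
# Tanh Gaussian policy head for SAC-style training, with log-prob correction.
import torch
import torch.nn as nn

class TanhGaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-5, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()                 # reparameterized sample (for SAC gradients)
        a = torch.tanh(u)                  # bounded action command
        # log pi(a) = log N(u) - sum log(1 - tanh(u)^2): the tanh Jacobian term.
        log_prob = dist.log_prob(u).sum(-1) - torch.log(1 - a.pow(2) + 1e-6).sum(-1)
        return a, log_prob

policy = TanhGaussianPolicy(obs_dim=12, act_dim=4)
action, logp = policy(torch.randn(12))
print(action, logp)
```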


Subjects
Neural Networks (Computer), Reinforcement (Psychology), Algorithms, Entropy, Learning
18.
Sensors (Basel) ; 20(17)2020 Aug 31.
Article in English | MEDLINE | ID: mdl-32878089

ABSTRACT

Currently, many intelligent building energy management systems (BEMSs) are emerging to save energy in new and existing buildings and to help realize a sustainable society worldwide. However, installing an intelligent BEMS in an existing building does not by itself realize an innovative and advanced society, because it often involves only simple equipment replacement (e.g., replacing old equipment or LED (Light Emitting Diode) lamps) and energy savings from a stand-alone system. Therefore, artificial intelligence (AI) is applied to the BEMS to implement intelligent energy optimization based on the latest ICT (Information and Communications Technologies). AI can analyze energy usage data, predict future energy requirements, and establish an appropriate energy saving policy. In this paper, we present a dynamic heating, ventilation, and air conditioning (HVAC) scheduling method that collects, analyzes, and infers from energy usage data to intelligently save energy in buildings based on reinforcement learning (RL). A hotel is used as the testbed in this study. The proposed method collects, analyzes, and infers from a building's IoT data to provide an energy saving policy, realizing a futuristic RL-based HVAC (heating) system. Through this process, a purpose-oriented energy saving methodology to achieve energy saving goals is proposed.
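An illustrative tabular RL sketch in the spirit of the HVAC scheduling described above: the state is a discretized room temperature, the actions are heater off/on, and the reward trades comfort against energy use. All values are assumptions, not the paper's system.

```python
# Toy Q-learning thermostat: comfort band reward minus energy cost.
import numpy as np

rng = np.random.default_rng(6)
TEMPS = np.arange(15, 26)              # discretized room temperature states (deg C)
Q = np.zeros((len(TEMPS), 2))          # actions: 0 = heater off, 1 = heater on
alpha, gamma, eps, comfort = 0.2, 0.9, 0.1, (20, 23)

def step(t_idx, a):
    drift = 1 if a == 1 else -1        # heating warms the room, idling cools it
    t2 = int(np.clip(t_idx + drift, 0, len(TEMPS) - 1))
    temp = TEMPS[t2]
    r = -0.5 * a                       # energy cost of running the heater
    r += 1.0 if comfort[0] <= temp <= comfort[1] else -1.0   # comfort band
    return t2, r

t = rng.integers(len(TEMPS))
for _ in range(5000):
    a = rng.integers(2) if rng.random() < eps else int(Q[t].argmax())
    t2, r = step(t, a)
    Q[t, a] += alpha * (r + gamma * Q[t2].max() - Q[t, a])
    t = t2

print("policy (0=off, 1=on) per temperature:", Q.argmax(axis=1))
```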

19.
Artif Intell Med ; 154: 102901, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38838400

ABSTRACT

There is evidence that reducing modifiable risk factors and strengthening medical and health interventions can reduce early mortality and economic losses from non-communicable diseases (NCDs). Machine learning (ML) algorithms have been successfully applied to preventing and controlling NCDs. Reinforcement learning (RL) is the most promising of these approaches because of its ability to dynamically adapt interventions to NCD disease progression and its commitment to achieving long-term intervention goals. This paper reviews the preferred algorithms, data sources, design details, and obstacles to clinical application in existing studies to facilitate the early application of RL algorithms in clinical practice research for NCD interventions. We screened 40 relevant papers for quantitative and qualitative analysis using the PRISMA review flow diagram. The results show that researchers tend to use Deep Q-Network (DQN) and Actor-Critic as well as their improved or hybrid algorithms to train and validate RL models on retrospective datasets. Often, the patient's physical condition is the main defining parameter of the state space, while interventions are the main defining parameter of the action space. Mostly, changes in the patient's physical condition are used as a basis for immediate rewards to the agent. Various attempts have been made to address the challenges to clinical application, and several approaches have been proposed from existing research. However, as there is currently no universally accepted solution, the use of RL algorithms in clinical practice for NCD interventions necessitates more comprehensive responses to the issues addressed in this paper, which are safety, interpretability, training efficiency, and the technical aspect of exploitation and exploration in RL algorithms.

20.
Sci Rep ; 14(1): 6671, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38509163

ABSTRACT

The Internet era is an era of information explosion. By 2022, global Internet users had reached more than 4 billion, and social media users had exceeded 3 billion. People face a flood of news content every day, and it is almost impossible to find the items of interest by browsing it all. Against this background, personalized news recommendation technology has been widely used, but it still needs further optimization and improvement. To better push news content of interest to different readers and further improve users' satisfaction with major news websites, this study proposes a new recommendation algorithm based on deep learning and reinforcement learning (RL). Deep learning excels at processing large-scale data and complex pattern recognition, but it often suffers from low sample efficiency in complex decision-making and sequential tasks. RL, in contrast, emphasizes learning optimal strategies through continuous trial and error while interacting with the environment, which makes it more suitable for scenarios requiring long-term decision-making; by feeding back the reward signal of each action, the system can better adapt to unknown environments and complex tasks, compensating for the relative shortcomings of deep learning in these respects. States are mapped to actions to solve the sequential decision problem in the news dissemination process. To enable the news recommendation system to consider dynamic changes in users' interest in news content, the Deep Deterministic Policy Gradient (DDPG) algorithm is applied to the news recommendation scenario, combining a Deep Q-Network with the policy network. On this basis, this paper puts forward a mode of intelligent news dissemination and push and proposes a push process for news communication information based on edge computing technology. Finally, a Q-Learning Area Under Curve (AUC) metric for RL models is proposed; this indicator can efficiently measure the strengths and weaknesses of RL models and facilitates comparing models and evaluating offline experiments. The results show that the DDPG algorithm improves the click-through rate by 2.586% compared with the conventional recommendation algorithm, indicating that the proposed algorithm has clear advantages in accurate recommendation. By optimizing the push mode of intelligent news dissemination, this paper effectively improves the efficiency of news dissemination and the user experience, while the application of intelligent edge technology in news communication brings new ideas and practices to the development of news communication methods, with important practical application prospects.
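A hedged sketch of one plausible reading of the proposed Q-Learning AUC metric: treat the model's Q-values for candidate news items as ranking scores and measure how well they separate clicked from non-clicked items offline. This interpretation, and the data below, are our assumptions.

```python
# Offline AUC over Q-values as ranking scores; the data is synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
clicked = rng.integers(0, 2, size=200)             # 1 = user clicked the item
q_values = clicked * 0.5 + rng.normal(size=200)    # stand-in for the critic's Q(s, a)

print("Q-Learning AUC:", roc_auc_score(clicked, q_values))
```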
