Results 1 - 20 of 167
1.
Sensors (Basel); 24(13), 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-39001102

ABSTRACT

Visible light communication (VLC) is a promising complement to its radio frequency (RF) counterpart for satisfying the high quality-of-service (QoS) requirements of intelligent vehicular communications by reusing LED street lights. In this paper, a hybrid handover scheme for vehicular VLC/RF communication networks is proposed to balance QoS and handover costs by considering vertical and horizontal handovers together, based on the vehicle's mobility state. A Markov decision process (MDP) is formulated to describe this hybrid handover problem, with a cost function balancing handover consumption, delay, and reliability. A value iteration algorithm is applied to solve for the optimal handover policy. Simulation results demonstrate the performance of the proposed hybrid handover scheme in comparison to benchmark schemes.
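The value-iteration step mentioned above can be sketched in a few lines. The states, actions, transition probabilities, and costs below are invented for illustration; they are not the paper's vehicular model:

```python
import numpy as np

# Toy handover MDP: states are (network, mobility) pairs, actions are "stay" or
# "handover". All probabilities and costs here are illustrative assumptions.
STATES = ["vlc_slow", "vlc_fast", "rf_slow", "rf_fast"]
ACTIONS = ["stay", "handover"]

# P[a][s][s'] : transition probabilities under each action (rows sum to 1).
P = {
    "stay": np.array([[0.8, 0.2, 0.0, 0.0],
                      [0.3, 0.6, 0.0, 0.1],
                      [0.0, 0.0, 0.7, 0.3],
                      [0.0, 0.1, 0.3, 0.6]]),
    "handover": np.array([[0.0, 0.0, 0.9, 0.1],
                          [0.0, 0.0, 0.2, 0.8],
                          [0.9, 0.1, 0.0, 0.0],
                          [0.1, 0.9, 0.0, 0.0]]),
}
# Immediate cost: handovers pay a switching cost; fast-moving VLC users pay a QoS cost.
COST = {"stay": np.array([0.0, 2.0, 0.5, 0.5]),
        "handover": np.array([1.0, 1.0, 1.0, 1.0])}

def value_iteration(gamma=0.9, tol=1e-8):
    """Return the optimal cost-to-go V and a greedy policy for the toy MDP."""
    v = np.zeros(len(STATES))
    while True:
        # Q-values for each action, then Bellman optimality backup (min over actions).
        q = np.stack([COST[a] + gamma * P[a] @ v for a in ACTIONS])
        v_new = q.min(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            policy = [ACTIONS[i] for i in q.argmin(axis=0)]
            return v_new, policy
        v = v_new
```

Since costs are non-negative and the discount factor is below one, the iteration is a contraction and always converges.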

2.
Front Artif Intell; 7: 1308031, 2024.
Article in English | MEDLINE | ID: mdl-39026967

ABSTRACT

This study focuses on a rescue mission problem, particularly enabling agents/robots to navigate efficiently in unknown environments. Technological advances, including manufacturing, sensing, and communication systems, have raised interest in using robots or drones for rescue operations. Effective rescue operations require quick identification of changes in the environment and/or locating the victims/injuries as soon as possible. Several techniques have been developed in recent years for autonomy in rescue missions, including motion planning, adaptive control, and more recently, reinforcement learning techniques. These techniques rely on full knowledge of the environment or the availability of simulators that can represent real environments during rescue operations. However, in practice, agents might have little or no information about the environment or the number or locations of injuries, preventing/limiting the application of most existing techniques. This study provides a probabilistic/Bayesian representation of the unknown environment, which jointly models the stochasticity in the agent's navigation and the environment uncertainty into a vector called the belief state. This belief state allows offline learning of the optimal Bayesian policy in an unknown environment without the need for any real data/interactions, which guarantees taking actions that are optimal given all available information. To address the large size of belief space, deep reinforcement learning is developed for computing an approximate Bayesian planning policy. The numerical experiments using different maze problems demonstrate the high performance of the proposed policy.
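The belief-state idea described above (a probability vector over environment hypotheses, updated as observations arrive) can be sketched with a simple Bayes update. The two maze "models" and their sensor likelihoods below are hypothetical:

```python
import numpy as np

# Minimal belief-state sketch (all numbers illustrative): the agent does not know
# which of K candidate environment models is the true one, so it maintains a
# probability vector ("belief") over them and updates it with Bayes' rule after
# each observation.
def belief_update(belief, obs, obs_likelihood):
    """belief: shape (K,) prior over models; obs_likelihood[k](obs) = P(obs | model k)."""
    posterior = belief * np.array([obs_likelihood[k](obs) for k in range(len(belief))])
    total = posterior.sum()
    if total == 0:                       # observation impossible under every model
        return np.full_like(belief, 1.0 / len(belief))
    return posterior / total

# Two hypothetical maze models: model 0 says the corridor ahead is open
# (sensor reads "open" 90% of the time), model 1 says it is blocked.
likelihoods = [lambda o: 0.9 if o == "open" else 0.1,
               lambda o: 0.2 if o == "open" else 0.8]
b = np.array([0.5, 0.5])
b = belief_update(b, "open", likelihoods)   # evidence favours model 0
```

Planning on the belief vector rather than on the raw state is what lets the policy be learned offline, before any real interaction with the environment.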

3.
Article in English | MEDLINE | ID: mdl-38836923

ABSTRACT

Forty percent of diabetics will develop chronic kidney disease (CKD) in their lifetimes, yet as many as 50% of these CKD cases may go undiagnosed. We developed screening recommendations, stratified by age and previous test history, for individuals with diagnosed diabetes and unknown proteinuria status across race and gender groups. To do this, we used a partially observable Markov decision process (POMDP) to identify whether a patient should be screened at each three-month interval from ages 30 to 85. Model inputs were drawn from nationally representative datasets, the medical literature, and a microsimulation that integrates this information into group-specific disease progression rates. We implemented the POMDP solution policy in the microsimulation to understand how this policy may affect health outcomes, and generated an easily implementable, non-belief-based approximate policy for clinical interpretability. We found that the status quo policy, which is to screen annually at all ages and for all races, is suboptimal for maximizing expected discounted future net monetary benefit (NMB). The POMDP policy suggests more frequent screening after age 40 in all race and gender groups, with screening 2-4 times a year for ages 61-70. Black individuals are recommended for screening more frequently than their White counterparts. This policy would increase NMB over the status quo policy by $1,000 to $8,000 per diabetic patient at a willingness-to-pay of $150,000 per quality-adjusted life year (QALY).

4.
Article in English | MEDLINE | ID: mdl-38766899

ABSTRACT

The intrinsic stochasticity of patients' response to treatment is a major consideration for clinical decision-making in radiation therapy. Markov models are powerful tools to capture this stochasticity and render effective treatment decisions. This paper provides an overview of the Markov models for clinical decision analysis in radiation oncology. A comprehensive literature search was conducted within MEDLINE using PubMed, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Only studies published from 2000 to 2023 were considered. Selected publications were summarized in two categories: (i) studies that compare two (or more) fixed treatment policies using Monte Carlo simulation and (ii) studies that seek an optimal treatment policy through Markov Decision Processes (MDPs). Relevant to the scope of this study, 61 publications were selected for detailed review. The majority of these publications (n = 56) focused on comparative analysis of two or more fixed treatment policies using Monte Carlo simulation. Classifications based on cancer site, utility measures and the type of sensitivity analysis are presented. Five publications considered MDPs with the aim of computing an optimal treatment policy; a detailed statement of the analysis and results is provided for each work. As an extension of Markov model-based simulation analysis, MDP offers a flexible framework to identify an optimal treatment policy among a possibly large set of treatment policies. However, the applications of MDPs to oncological decision-making have been understudied, and the full capacity of this framework to render complex optimal treatment decisions warrants further consideration.
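The first category of studies above, comparing fixed treatment policies by Monte Carlo simulation of a Markov model, typically looks like the following sketch. The three health states, transition matrices, and QALY weights are invented for illustration, not drawn from any reviewed study:

```python
import numpy as np

# Illustrative Markov cohort sketch: compare two fixed treatment policies by
# Monte Carlo simulation of patient trajectories over 3 health states.
STATES = ["stable", "progression", "dead"]
TRANSITIONS = {
    "policy_A": np.array([[0.90, 0.08, 0.02],
                          [0.00, 0.85, 0.15],
                          [0.00, 0.00, 1.00]]),
    "policy_B": np.array([[0.93, 0.05, 0.02],
                          [0.00, 0.80, 0.20],
                          [0.00, 0.00, 1.00]]),
}
UTILITY = np.array([0.9, 0.6, 0.0])   # QALY weight per cycle in each state

def simulate_qalys(policy, n_patients=5_000, n_cycles=40, seed=0):
    """Mean QALYs accumulated per simulated patient under a fixed policy."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_patients):
        state = 0                                    # everyone starts "stable"
        for _ in range(n_cycles):
            total += UTILITY[state]
            state = rng.choice(3, p=TRANSITIONS[policy][state])
    return total / n_patients

# Example comparison of the two fixed policies:
# qa, qb = simulate_qalys("policy_A"), simulate_qalys("policy_B")
```

An MDP extends this setup by letting the policy choose an action in every state rather than fixing the treatment sequence up front.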

5.
J Neurosci; 44(24), 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38670805

ABSTRACT

Reinforcement learning is a theoretical framework that describes how agents learn to select options that maximize rewards and minimize punishments over time. We often make choices, however, to obtain symbolic reinforcers (e.g., money, points) that are later exchanged for primary reinforcers (e.g., food, drink). Although symbolic reinforcers are ubiquitous in our daily lives and widely used in laboratory tasks because they can be motivating, the mechanisms by which they become motivating are less well understood. In the present study, we examined how monkeys learn to make choices that maximize fluid rewards through reinforcement with tokens. The question addressed here is how the value of a state, which is a function of multiple task features (e.g., the current number of accumulated tokens, choice options, task epoch, trials since the last delivery of primary reinforcer), drives motivation. We constructed a Markov decision process model that computes the value of task states given task features, which we then correlated with the motivational state of the animal. Fixation times, choice reaction times, and abort frequency were all significantly related to the values of task states during the tokens task (n = 5 monkeys, three males and two females). Furthermore, the model makes predictions for how neural responses could change on a moment-by-moment basis relative to changes in state value. Together, this task and model allow us to capture learning and behavior related to symbolic reinforcement.


Subjects
Choice Behavior, Macaca mulatta, Motivation, Reinforcement (Psychology), Reward, Animals, Motivation/physiology, Male, Choice Behavior/physiology, Reaction Time/physiology, Markov Chains, Female
6.
Catheter Cardiovasc Interv; 104(1): 84-91, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38639136

ABSTRACT

Cardiovascular devices are essential for the treatment of cardiovascular diseases including cerebrovascular, coronary, valvular, congenital, peripheral vascular and arrhythmic diseases. The regulation and surveillance of vascular devices in real-world practice, however, presents challenges during each individual product's life cycle. Four examples illustrate recent challenges and questions regarding safety, appropriate use and efficacy arising from FDA approved devices used in real-world practice. We outline potential pathways wherein providers, regulators and payors could potentially provide high-quality cardiovascular care, identify safety signals, ensure equitable device access, and study potential issues with devices in real-world practice.


Subjects
Device Approval, Postmarketing Product Surveillance, Humans, United States, Risk Factors, Patient Safety, United States Food and Drug Administration, Risk Assessment, Vascular Access Devices, Endovascular Procedures/instrumentation, Endovascular Procedures/adverse effects, Equipment Design, Cardiovascular Diseases/therapy, Cardiovascular Diseases/diagnosis
7.
Sci Rep; 14(1): 7207, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38531995

ABSTRACT

The innovative application of crowd intelligent devices (CIDs) in edge networks has garnered attention due to the rapid development of artificial intelligence and computer technology. This application offers users more reliable and low-latency computing services through computation offloading. However, the dynamic nature of network terminals and the limited coverage of edge servers pose challenges such as data loss and service interruption. Furthermore, the high-speed mobility of intelligent terminals in a dynamic edge network environment further complicates the design of computation offloading and service migration strategies. To address these challenges, this paper explores a computation offloading model of cluster intelligence collaboration in a heterogeneous network environment, in which multiple intelligent devices collaborate to provide computation offloading services for terminals. To accommodate various roles, a switching strategy of split-cluster group collaboration is introduced, assigning the cluster head, the alternate cluster head, and ordinary users to groups with different functions. Additionally, the paper formulates the optimal offloading strategy for group smart terminals as a Markov decision process, taking into account user mobility, service delay, service accuracy, and migration cost. To implement this strategy, the paper uses the deep reinforcement learning-based CCSMS algorithm. Simulation results demonstrate that the proposed edge network service migration strategy, rooted in groupwise cluster collaboration, effectively mitigates interruption delay and enhances service migration efficiency.

8.
Physiol Meas; 45(3), 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38430565

ABSTRACT

Objective. Unobtrusive long-term monitoring of cardiac parameters is important in a wide variety of clinical applications, such as the assessment of acute illness severity and unobtrusive sleep monitoring. Here we determined the accuracy and robustness of heartbeat detection by an accelerometer worn on the chest. Approach. We performed overnight recordings in 147 individuals (69 female, 78 male) referred to two sleep centers. Two methods for heartbeat detection in the acceleration signal were compared: a previously described approach based on local periodicity, and a novel extended method incorporating maximum a posteriori estimation and a Markov decision process to approach an optimal solution. Main results. The maximum a posteriori estimation significantly improved performance, with a mean absolute error for the estimation of inter-beat intervals of only 3.5 ms, and 95% limits of agreement of -1.7 to +1.0 beats per minute for heart rate measurement. Performance held during posture changes and was only weakly affected by the presence of sleep disorders and demographic factors. Significance. The new method may enable the use of a chest-worn accelerometer in a variety of applications, such as ambulatory sleep staging and in-patient monitoring.


Assuntos
Sono , Tórax , Humanos , Masculino , Feminino , Frequência Cardíaca , Monitorização Fisiológica , Acelerometria , Processamento de Sinais Assistido por Computador
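Maximum a posteriori estimation over a sequence, as used for heartbeat tracking above, is commonly computed with a Viterbi-style recursion. The sketch below is a generic MAP path decoder; the states, transition prior, and emission probabilities are invented for illustration and are not the authors' model:

```python
import numpy as np

def viterbi_map(log_emission, log_transition, log_prior):
    """MAP state path: log_emission is (T, K); log_transition is (K, K), prev -> next."""
    T, K = log_emission.shape
    delta = log_prior + log_emission[0]           # best log-probability ending in each state
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_transition  # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emission[t]
    path = [int(delta.argmax())]                  # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Hypothetical 3-state example: states stand in for quantized inter-beat intervals,
# the transition prior favours staying near the previous interval, and the noisy
# per-step evidence points at the sequence 0, 1, 1, 2.
trans = np.log([[0.80, 0.15, 0.05],
                [0.10, 0.75, 0.15],
                [0.05, 0.15, 0.80]])
emis = np.log([[0.8, 0.1, 0.1],
               [0.1, 0.8, 0.1],
               [0.1, 0.8, 0.1],
               [0.1, 0.1, 0.8]])
path = viterbi_map(emis, trans, np.log(np.full(3, 1 / 3)))
```

The smoothness prior is what lets the decoder reject spurious peaks that a purely local detector would accept.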
9.
Artif Intell Med; 149: 102806, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38462294

ABSTRACT

In this study, the start time of teleconsultations is optimized for the clinical departments of class A tertiary hospitals to improve service quality and efficiency. For this purpose, a general teleconsultation scheduling model is first formulated. In the formulation, the number of services (NS) is one of the objectives because of demand intermittency and service mobility: demand intermittency means that demand has zero size in several periods, and service mobility means that specialists move between clinical departments and the National Telemedicine Center of China to provide the service. For problem-solving, the general model is converted into a Markov decision process (MDP) by carefully defining the state, action, and reward. To solve the MDP, deep reinforcement learning (DRL) is applied to overcome the problem of inaccurate transition probabilities. To reduce the dimensions of the state-action space, a semi-fixed policy is developed and applied to the deep Q network (DQN), yielding a DQN with a semi-fixed policy (DQN-S). For efficient fitting, an early-stopping strategy is applied in DQN-S training. To verify the effectiveness of the proposed scheduling model and the DQN-S solution method, scheduling experiments were carried out on actual data of teleconsultation demand arrivals and service arrangements. The results show that DQN-S can improve the quality and efficiency of teleconsultations by reducing the average demand waiting time by 9%-41%, the number of services by 3%-42%, and the total cost of services by 3%-33%.


Subjects
Remote Consultation, Telemedicine, Reinforcement (Psychology), Algorithms, China
10.
Math Biosci Eng; 21(1): 1058-1081, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38303454

ABSTRACT

In this study, a car transfer planning system for parking lots was designed based on reinforcement learning. The system is an intelligent parking management system featuring autonomous decision-making, intelligent path planning, and efficient resource utilization. The problem is solved by constructing a Markov decision process and using a dynamic programming-based reinforcement learning algorithm. Unlike manual transfer planning, which relies on traditional heuristics, the system looks ahead and uses reinforcement learning to maximize its expected return. In the parking-lot setting considered here, the states of the two locations form a finite set. The system seeks a strategy that benefits the long-term development of the operation, prioritizing strategies with positive future impact over those focused solely on short-term gains. To evaluate strategies, the system relies on the expected return of a state from the present into the future, allowing a more comprehensive assessment of potential outcomes and ensuring the selection of strategies that align with long-term goals. Experimental results show that the system achieves high performance and robustness in car transfer planning for parking lots. By using reinforcement learning, parking lot management systems can make autonomous decisions and plan optimal paths to achieve efficient resource utilization and reduce parking time.

11.
Math Biosci Eng; 21(1): 1445-1471, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38303472

ABSTRACT

With the rise of Industry 4.0, manufacturing is shifting towards customization and flexibility, presenting new challenges in meeting rapidly evolving market and customer needs. To address these challenges, this paper proposes a novel approach to flexible job shop scheduling problems (FJSPs) through reinforcement learning (RL). The method uses an actor-critic architecture that merges value-based and policy-based approaches: the actor generates deterministic policies, while the critic evaluates policies and guides the actor toward the optimal policy. To construct the Markov decision process, a comprehensive feature set was used to accurately represent the system's state, and eight sets of actions were designed, inspired by traditional scheduling rules. The reward formulation indirectly measures the effectiveness of actions, promoting strategies that minimize job completion times and enhance adherence to scheduling constraints. The experimental evaluation thoroughly assessed the proposed framework through simulations on standard FJSP benchmarks, comparing it against several well-known heuristic scheduling rules, related RL algorithms, and intelligent algorithms. The results indicate that the proposed method consistently outperforms traditional approaches and exhibits exceptional adaptability and efficiency, particularly on large-scale datasets.
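The actor-critic pattern described above (the actor proposes actions, the critic's TD error guides both updates) can be illustrated with a minimal tabular version on a toy chain problem. The environment, reward, and hyperparameters below are invented and unrelated to the paper's FJSP model:

```python
import numpy as np

# Minimal tabular one-step actor-critic sketch: a 5-state chain where action 1
# moves right (reward 1 on reaching the end) and action 0 moves left. The critic
# learns state values V; the actor adjusts softmax preferences H using the TD
# error as the policy-gradient signal.
N_STATES = 5

def softmax(h):
    e = np.exp(h - h.max())
    return e / e.sum()

def train(episodes=500, alpha_v=0.1, alpha_pi=0.1, gamma=0.95, seed=0):
    rng = np.random.default_rng(seed)
    V = np.zeros(N_STATES)
    H = np.zeros((N_STATES, 2))            # actor preferences per state
    for _ in range(episodes):
        s = 0
        for _ in range(50):                # cap episode length
            probs = softmax(H[s])
            a = rng.choice(2, p=probs)
            s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
            done = s_next == N_STATES - 1
            r = 1.0 if done else 0.0
            td = r + (0.0 if done else gamma * V[s_next]) - V[s]
            V[s] += alpha_v * td           # critic update
            grad = -probs                  # d log pi(a|s) / d H[s] for softmax
            grad[a] += 1.0
            H[s] += alpha_pi * td * grad   # actor update
            s = s_next
            if done:
                break
    return V, H

V, H = train()   # the actor should come to prefer moving right
```

In the paper's setting, the tables V and H are replaced by neural networks over the job-shop state features, but the update structure is the same.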

12.
Sensors (Basel); 24(2), 2024 Jan 07.
Article in English | MEDLINE | ID: mdl-38257450

ABSTRACT

In heterogeneous wireless networked control systems (WNCSs), the age of information (AoI) of the actuation update and actuation update cost are important performance metrics. To reduce the monetary cost, the control system can wait for the availability of a WiFi network for the actuator and then conduct the update using a WiFi network in an opportunistic manner, but this leads to an increased AoI of the actuation update. In addition, since there are different AoI requirements according to the control priorities (i.e., robustness of AoI of the actuation update), these need to be considered when delivering the actuation update. To jointly consider the monetary cost and AoI with priority, this paper proposes a priority-aware actuation update scheme (PAUS) where the control system decides whether to deliver or delay the actuation update to the actuator. For the optimal decision, we formulate a Markov decision process model and derive the optimal policy based on Q-learning, which aims to maximize the average reward that implies the balance between the monetary cost and AoI with priority. Simulation results demonstrate that the PAUS outperforms the comparison schemes in terms of the average reward under various settings.
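The Q-learning decision described above can be illustrated with a toy deliver-vs-delay model. The state space, cost numbers, and WiFi-availability probability below are invented, not the paper's PAUS parameters:

```python
import random

# Tabular Q-learning sketch of the deliver-vs-delay trade-off (toy model).
# State: AoI level 0..4 plus whether WiFi is currently available. "deliver"
# resets AoI (cheap on WiFi, expensive on cellular); "delay" lets AoI grow.
WIFI_PROB, AOI_MAX = 0.3, 4

def step(aoi, wifi, action):
    if action == "deliver":
        reward = -(0.2 if wifi else 2.0) - 0.1 * aoi   # monetary cost + staleness
        aoi_next = 0
    else:
        aoi_next = min(aoi + 1, AOI_MAX)
        reward = -0.5 * aoi_next                       # aging penalty
    return aoi_next, random.random() < WIFI_PROB, reward

def q_learning(episodes=3000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    random.seed(seed)
    Q = {(a, w, act): 0.0 for a in range(AOI_MAX + 1)
         for w in (False, True) for act in ("deliver", "delay")}
    for _ in range(episodes):
        aoi, wifi = 0, random.random() < WIFI_PROB
        for _ in range(30):
            if random.random() < eps:                  # epsilon-greedy exploration
                act = random.choice(("deliver", "delay"))
            else:
                act = max(("deliver", "delay"), key=lambda a: Q[(aoi, wifi, a)])
            aoi2, wifi2, r = step(aoi, wifi, act)
            best_next = max(Q[(aoi2, wifi2, a)] for a in ("deliver", "delay"))
            Q[(aoi, wifi, act)] += alpha * (r + gamma * best_next - Q[(aoi, wifi, act)])
            aoi, wifi = aoi2, wifi2
    return Q
```

With these toy numbers, the learned policy tends to wait for WiFi at low AoI and deliver once AoI (or priority) makes further delay too costly, which is the qualitative behaviour the abstract describes.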

13.
Neural Netw; 169: 778-792, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38000180

ABSTRACT

With the development of artificial intelligence, robots are widely used in various fields, and grasp detection has become a focus of intelligent robot research. A dual-manipulator grasp detection model based on a Markov decision process is proposed in this paper to realize stable grasping of complex multiple objects. Based on the Markov decision process, a cross-entropy convolutional neural network and a full convolutional neural network are used to parameterize the grasp detection models of the two manipulators, a two-finger gripper and a vacuum sucker, for multiple unknown objects. A data set generated in a simulated environment is used to train the two grasp detection networks. By comparing the grasp quality of the best grasps output by the two detection networks, the network with the better detection performance for each of the two grasping methods is determined, and the dual-manipulator grasp detection model is constructed. Robot grasping experiments show that the proposed method achieves a 90.6% success rate, much higher than the other experimental groups, verifying the feasibility and superiority of the dual-manipulator grasp detection method based on the Markov decision process.


Subjects
Artificial Intelligence, Neural Networks (Computer), Fingers, Upper Extremity, Hand Strength
14.
Psychol Sport Exerc; 70: 102543, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37778404

ABSTRACT

Expert performers in time-constrained sports use a range of information sources to facilitate anticipatory and decision-making processes. However, research has often focused on responders such as batters, goalkeepers, defenders, and returners of serve, and failed to capture the complex interaction between opponents, in which responders can also manipulate probabilities in their favour. This investigation aimed to explore the interaction between top order batters and fast or medium paced bowlers in cricket and the information they use to inform their anticipatory and decision-making skills in Twenty20 competition. Eleven professional cricketers were interviewed (8 batters and 3 bowlers) using semi-structured questions and scenarios from Twenty20 matches. An inductive and deductive thematic analysis was conducted using the overarching themes of Situation Awareness (SA) and Option Awareness (OA). Within SA, the sub-themes identified related to the information sources used by bowlers and batters (i.e., stable contextual information, dynamic contextual information, kinematic information). Within OA, the sub-themes highlighted how cricketers use these information sources to understand the options available and the likelihood of success associated with each option (e.g., risk and reward, personal strengths). A sub-theme of 'responder manipulation' was also identified within OA, providing insight into how batters and bowlers interact in a cat-and-mouse manner to generate options that manipulate one another throughout the competition. A schematic was developed based on the study findings to illustrate the complex interaction between the anticipation and decision-making processes of professional top order batters and fast or medium paced bowlers in Twenty20 cricket.


Subjects
Cricket (Sport), Sports, Humans, Biomechanical Phenomena, Probability, Achievement
15.
BMC Med; 21(1): 359, 2023 Sep 19.
Article in English | MEDLINE | ID: mdl-37726729

ABSTRACT

BACKGROUND: During the COVID-19 pandemic, a variety of clinical decision support systems (CDSS) were developed to aid patient triage. However, research focusing on the interaction between decision support systems and human experts is lacking. METHODS: Thirty-two physicians were recruited to rate the survival probability of 59 critically ill patients by means of chart review. Subsequently, one of two artificial intelligence systems advised the physician of a computed survival probability. However, only one of these systems explained the reasons behind its decision-making. In the third step, physicians reviewed the chart once again to determine the final survival probability rating. We hypothesized that an explaining system would exhibit a higher impact on the physicians' second rating (i.e., higher weight-on-advice). RESULTS: The survival probability rating given by the physician after receiving advice from the clinical decision support system was a median of 4 percentage points closer to the advice than the initial rating. Weight-on-advice was not significantly different (p = 0.115) between the two systems (with vs without explanation for its decision). Additionally, weight-on-advice showed no difference according to time of day or between board-qualified and not yet board-qualified physicians. Self-reported post-experiment overall trust was awarded a median of 4 out of 10 points. When asked after the conclusion of the experiment, overall trust was 5.5/10 (non-explaining median 4 (IQR 3.5-5.5), explaining median 7 (IQR 5.5-7.5), p = 0.007). CONCLUSIONS: Although overall trust in the models was low, the median (IQR) weight-on-advice was high (0.33 (0.0-0.56)) and in line with published literature on expert advice. In contrast to the hypothesis, weight-on-advice was comparable between the explaining and non-explaining systems. In 30% of cases, weight-on-advice was 0, meaning the physician did not change their rating. The median of the remaining weight-on-advice values was 50%, suggesting that physicians either dismissed the recommendation or employed a "meeting halfway" approach. Newer technologies, such as clinical reasoning systems, may be able to augment the decision process rather than simply presenting unexplained bias.


Subjects
COVID-19, Clinical Decision Support Systems, Humans, Artificial Intelligence, COVID-19/diagnosis, Pandemics, Triage
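Weight-on-advice, as commonly defined in the judge-advisor literature, is straightforward to compute; the ratings in the example below are illustrative, not the study's data:

```python
# Weight-on-advice (WOA): WOA = (final - initial) / (advice - initial).
# 0 means the advice was ignored, 1 means it was adopted fully, and 0.5 is the
# "meeting halfway" pattern discussed in the abstract above.
def weight_on_advice(initial, advice, final):
    if advice == initial:
        return None          # undefined: the advice added no information
    return (final - initial) / (advice - initial)

# Hypothetical case: a physician rates survival at 40%, the CDSS advises 60%,
# and the final rating is 50%.
woa = weight_on_advice(40, 60, 50)
```

A WOA distribution concentrated at 0 and 0.5, as reported above, falls out naturally from this definition when physicians either dismiss the advice or split the difference.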
16.
Psychol Sport Exerc; 67: 102439, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37665892

ABSTRACT

The ability to make effective decisions is an important function of any football coach, whether during training, team selection, match-day performance, or post-match player evaluation. It is not yet known how elite Australian football coaches make decisions during matches, in time-constrained but well-resourced environments. This study is the first to explore the decision-making of elite Australian football coaches during matches, with the aim of identifying opportunities to improve the translation and implementation of research findings into the competitive match environment. Using semi-structured interviews and thematic analysis, a six-stage framework of the in-match decision-making of elite Australian football coaches was developed. The stages are (1) Opportunity trigger, (2) Understand the opportunity, (3) Determine the need for action, (4) Explore options, (5) Take action, and (6) Evaluate the decision. Coaches relied on subjective and objective sources of information and consulted with assistant coaches, performance analysts, and sport scientists. The findings enable researchers to ensure that future interventions to improve in-match decision-making are well integrated. They also provide an opportunity for coaches to reflect on their own decision-making process, identifying targeted areas for improvement in their own practice.


Subjects
Household Articles, Physicians, Humans, Australia, Team Sports
17.
Drug Discov Today; 28(10): 103734, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37572999

ABSTRACT

Effective portfolio management is crucial for innovation and sustaining revenue in pharmaceutical companies. This article holistically reviews trends, challenges, and approaches to pharmaceutical portfolio management and focuses, in particular, on cognitive biases in portfolio decision-making. Portfolio managers strongly rely on external innovation and face increasing competitive pressure and portfolio complexity. The ability to address biases and make robust decisions remains a challenge. Portfolio management practitioners most commonly face confirmation bias, champion bias, or misaligned incentives, which they seek to mitigate through expert input, team diversity, and rewarding truth-seeking. Ultimately, highest-quality portfolio management decision-making could be enabled by three factors: high-quality data, structured review processes, and comprehensive mitigating measures against biases in decision-making.


Subjects
Cognition, Decision Making, Bias, Pharmaceutical Preparations
18.
Front Neurorobot; 17: 1039644, 2023.
Article in English | MEDLINE | ID: mdl-37483541

ABSTRACT

This paper proposes a self-learning Monte Carlo tree search algorithm (SL-MCTS), which can continuously improve its problem-solving ability in single-player scenarios. SL-MCTS combines the MCTS algorithm with a two-branch neural network (PV-Network). The MCTS architecture balances exploration and exploitation in the search. The PV-Network replaces the rollout process of MCTS and predicts promising search directions and node values, which increases the convergence speed and search efficiency of MCTS. The paper proposes an effective method to assess the trajectory of the current model during self-learning by comparing its performance with that of its best-performing historical model; this method also encourages SL-MCTS to generate optimal solutions during self-learning. We evaluate the performance of SL-MCTS on a robot path planning scenario. The experimental results show that SL-MCTS far outperforms traditional MCTS and single-player MCTS algorithms in terms of path quality and time consumption; in particular, its time consumption is less than half that of traditional MCTS. SL-MCTS also performs comparably to other iterative search algorithms designed specifically for path planning tasks.

19.
Sensors (Basel); 23(10), 2023 May 17.
Article in English | MEDLINE | ID: mdl-37430735

ABSTRACT

This paper investigates the problem of buffer-aided relay selection to achieve reliable and secure communications in a two-hop amplify-and-forward (AF) network with an eavesdropper. Due to the fading of wireless signals and the broadcast nature of wireless channels, signals transmitted over the network may be undecodable at the receiver or may be intercepted by eavesdroppers. Most available buffer-aided relay selection schemes consider either reliability or security issues in wireless communications; rarely has work addressed both. This paper proposes a buffer-aided relay selection scheme based on deep Q-learning (DQL) that considers both reliability and security. Through Monte Carlo simulations, we verify the reliability and security performance of the proposed scheme in terms of the connection outage probability (COP) and secrecy outage probability (SOP), respectively. The simulation results show that a two-hop wireless relay network can achieve reliable and secure communications using the proposed scheme. We also performed comparison experiments against two benchmark schemes; the results indicate that our scheme outperforms the max-ratio scheme in terms of SOP.

20.
Accid Anal Prev; 190: 107179, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37385116

ABSTRACT

A large number of freeway accident disposals are well documented in accident reports and surveillance videos, but it is not easy to reuse the emergency experience recorded in past accidents. To reuse this experience for better emergency decision-making, this paper proposes a knowledge-based experience transfer method that transfers task-level freeway accident disposal experience via a multi-agent reinforcement learning algorithm with policy distillation. First, a Markov decision process is used to model the task-level emergency decision-making process for multiple types of freeway accident scenes. Then, an adaptive knowledge transfer method, the policy-distilled multi-agent deep deterministic policy gradient (PD-MADDPG) algorithm, is proposed to transfer experience from past freeway accident records to current accidents for fast decision-making and optimal onsite disposal. The performance of the proposed algorithm is evaluated on instantiated cases of accidents that occurred on freeways in Shaanxi Province, China. Besides achieving better emergency decision performance than various typical decision-making methods, the results show that decision makers with transferred knowledge obtain 65.22%, 11.37%, 9.23%, 7.76%, and 1.71% higher average reward than those without in the five studied cases, respectively, indicating that emergency experience transferred from past accidents contributes to fast emergency decision-making and optimal onsite accident disposal.


Subjects
Traffic Accidents, Algorithms, Humans, China