Results 1 - 20 of 2,924
1.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39256196

ABSTRACT

Using amino acid residues in peptide generation has solved several key problems, including precise control of amino acid sequence order, customized peptides for property modification, and large-scale peptide synthesis. Proteins contain unknown amino acid residues, and extracting them for the synthesis of drug-like peptides can create novel structures with unique properties, driving drug development. Computer-aided design of novel peptide drug molecules can address the high cost and low efficiency of the traditional drug discovery process. Previous studies were limited in enhancing the bioactivity and drug-likeness of peptide drugs because they placed little emphasis on the connectivity relationships within amino acid structures. We therefore propose a reinforcement learning-driven generation model based on graph attention mechanisms for peptide generation. The graph attention mechanism allows the model to effectively capture the connectivity structure between amino acid residues in a peptide, while reinforcement learning's strength in guiding optimal sequence search provides a novel approach to peptide design and optimization. The model introduces an actor-critic framework with real-time feedback loops to achieve a dynamic balance between attributes, so that the generation of multiple peptides can be customized for specific targets and the affinity between peptides and targets can be enhanced. Experimental results demonstrate that the generated drug-like peptides meet specified absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties and bioactivity with a success rate of over 90%, thereby significantly accelerating drug-like peptide generation.
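To make the actor-critic loop described above concrete, the following is a minimal, hypothetical sketch of a single update step in which the reward mixes ADMET and bioactivity scores; the actor, critic, next_state_fn, and reward_fn interfaces are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def actor_critic_step(actor, critic, optimizer, state, next_state_fn, reward_fn, gamma=0.99):
    """One illustrative update. `next_state_fn` applies the sampled residue/edge
    action to the peptide graph; `reward_fn` mixes ADMET and bioactivity scores.
    Both are assumed interfaces, not the paper's code."""
    logits = actor(state)                      # e.g. graph-attention encoder -> action logits
    value = critic(state).squeeze(-1)          # scalar state value
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    next_state, done = next_state_fn(state, action)
    reward = reward_fn(next_state)             # e.g. 0.5 * ADMET score + 0.5 * bioactivity
    with torch.no_grad():
        bootstrap = torch.zeros(()) if done else critic(next_state).squeeze(-1)
        target = reward + gamma * bootstrap
    advantage = (target - value).detach()
    loss = -dist.log_prob(action) * advantage + F.mse_loss(value, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return next_state, done
```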


Subject(s)
Peptides , Peptides/chemistry , Amino Acid Sequence , Drug Discovery , Drug Design , Algorithms , Computer-Aided Design , Humans
2.
Neural Netw ; 180: 106675, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39241435

ABSTRACT

The next basket recommendation task aims to predict the items in the user's next basket by modeling the user's basket sequence. Existing next basket recommendation methods focus on improving recommendation performance, and most are black-box models that ignore the importance of providing explanations to improve user satisfaction. Furthermore, most next basket recommendation methods are designed for consumer users, and few are tailored to the characteristics of business users. To address these problems, we propose a Knowledge Reinforced Explainable Next Basket Recommendation (KRE-NBR) model. Specifically, we construct a basket-based knowledge graph and obtain pretrained embeddings of entities that contain the rich information of the knowledge graph. To obtain high-quality user predictive vectors, we fuse user pretrained embeddings, user basket sequence level embeddings, and user repurchase embeddings. One highlight of the user repurchase embeddings is that they are able to model business user repurchase behavior. To make the next basket recommendations explainable, we use reinforcement learning for path reasoning to find the items to recommend in the next basket and to generate recommendation explanations at the same time. To the best of our knowledge, this is the first work to provide recommendation explanations for next basket recommendations. Extensive experiments on real datasets show that the recommendation performance of our proposed approach outperforms several state-of-the-art baselines.

3.
Elife ; 13, 2024 Sep 06.
Article in English | MEDLINE | ID: mdl-39240757

ABSTRACT

Theoretical computational models are widely used to describe latent cognitive processes. However, these models do not explain data equally well across participants, with some individuals showing a larger predictive gap than others. In the current study, we examined the use of theory-independent models, specifically recurrent neural networks (RNNs), to classify the source of a predictive gap in the observed data of a single individual. This approach aims to identify whether the low predictability of behavioral data is mainly due to noisy decision-making or to misspecification of the theoretical model. First, we used computer simulation in the context of reinforcement learning to demonstrate that RNNs can be used to identify model misspecification in simulated agents with varying degrees of behavioral noise. Specifically, both prediction performance and the number of RNN training epochs (i.e., the point of early stopping) can be used to estimate the amount of stochasticity in the data. Second, we applied our approach to an empirical dataset in which the actions of low IQ participants, compared with high IQ participants, were less predictable by a well-known theoretical model (i.e., Daw's hybrid model for the two-step task). Both the predictive gap and the point of early stopping of the RNN suggested that model misspecification is similar across individuals. This led us to the provisional conclusion that low IQ participants are mostly noisier than their high IQ peers, rather than being more poorly captured (misspecified) by the theoretical model. We discuss the implications and limitations of this approach, considering the growing literature on both theoretical and data-driven computational modeling in decision-making science.


Subject(s)
Choice Behavior , Neural Networks, Computer , Humans , Choice Behavior/physiology , Computer Simulation , Stochastic Processes , Reinforcement, Psychology , Male , Female , Decision Making/physiology , Adult , Young Adult
4.
J Imaging Inform Med ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39249582

ABSTRACT

PelviNet introduces a groundbreaking multi-agent convolutional network architecture tailored for enhancing pelvic image registration. This innovative framework leverages shared convolutional layers, enabling synchronized learning among agents and ensuring an exhaustive analysis of intricate 3D pelvic structures. The architecture combines max pooling, parametric ReLU activations, and agent-specific layers to optimize both individual and collective decision-making processes. A communication mechanism efficiently aggregates outputs from these shared layers, enabling agents to make well-informed decisions by harnessing combined intelligence. PelviNet's evaluation centers on both quantitative accuracy metrics and visual representations to elucidate agents' performance in pinpointing optimal landmarks. Empirical results demonstrate PelviNet's superiority over traditional methods, achieving an average image-wise error of 2.8 mm, a subject-wise error of 3.2 mm, and a mean Euclidean distance error of 3.0 mm. These quantitative results highlight the model's efficiency and precision in landmark identification, crucial for medical contexts such as radiation therapy, where exact landmark identification significantly influences treatment outcomes. By reliably identifying critical structures, PelviNet advances pelvic image analysis and offers potential enhancements for broader medical imaging applications, marking a significant step forward in computational healthcare.

5.
Mach Learn ; 113(7): 3961-3997, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39221170

ABSTRACT

There is growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem, as it learns from each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an "optimized" intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm's stochasticity. We illustrate our methodology with a case study analyzing data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization, both across all users and within specific users in the study.
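As a rough illustration of the resampling idea, the sketch below rebuilds the distribution of a simple "personalization" statistic by re-running a stochastic fitting procedure on data resampled from a user's own history; the statistic, the fit_rl interface, and the decision rule are hypothetical placeholders rather than the paper's methodology.

```python
import numpy as np

rng = np.random.default_rng(0)

def personalization_stat(action_probs):
    # Toy statistic: how much treatment probabilities vary across states.
    return float(np.var(action_probs))

def resampled_stat(user_history, fit_rl):
    # Re-run the stochastic RL fitting on a bootstrap resample of the user's
    # own decision times; fit_rl is assumed to return state-wise action
    # probabilities (hypothetical interface).
    idx = rng.integers(0, len(user_history), size=len(user_history))
    boot = [user_history[i] for i in idx]
    return personalization_stat(fit_rl(boot))

def appears_to_personalize(user_history, fit_rl, observed_probs, n_boot=500, alpha=0.05):
    null = np.array([resampled_stat(user_history, fit_rl) for _ in range(n_boot)])
    return personalization_stat(observed_probs) > np.quantile(null, 1 - alpha)
```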

6.
Biostatistics ; 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39226534

ABSTRACT

Major depressive disorder (MDD), a leading cause of years of life lived with disability, presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks in the laboratory that involve making choices or responding to stimuli associated with different outcomes, such as gains or losses. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing (e.g., reward sensitivity) and characterize how patients make decisions in behavioral tasks. Recent findings suggest the inadequacy of characterizing reward learning solely on the basis of a single RL model; instead, decision-making may switch between multiple strategies. An important scientific question is how the dynamics of decision-making strategies affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task within the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel RL-HMM (hidden Markov model) framework for analyzing reward-based decision-making. Our model accommodates switching between two distinct decision-making strategies under an HMM: subjects either make decisions based on the RL model or opt for random choices. We account for a continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient expectation-maximization (EM) algorithm for parameter estimation and use a nonparametric bootstrap for inference. Extensive simulation studies validate the finite-sample performance of our method. We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL than healthy controls, and that engagement is associated with brain activity in the negative affect circuitry during an emotional conflict task.
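A minimal sketch of the two-regime idea follows: each trial's choice likelihood is computed under a softmax RL policy and under a random-choice regime, and a hidden Markov forward pass tracks which regime the subject is in. The parameter values and two-action setup are illustrative assumptions; the actual model also handles a continuous RL state space and time-varying transition probabilities.

```python
import numpy as np

def choice_likelihoods(q_values, choice, beta=3.0, n_actions=2):
    # Likelihood of the observed choice under the RL (softmax) regime
    # and under the random-choice regime.
    p_rl = np.exp(beta * q_values)
    p_rl /= p_rl.sum()
    return p_rl[choice], 1.0 / n_actions

def forward_filter(obs_lik, trans, init):
    """obs_lik: (T, 2) per-trial likelihoods for the [RL, random] regimes;
    trans: 2x2 regime transition matrix; init: initial regime probabilities."""
    alpha = init * obs_lik[0]
    alpha /= alpha.sum()
    filtered = [alpha]
    for t in range(1, len(obs_lik)):
        alpha = (alpha @ trans) * obs_lik[t]
        alpha /= alpha.sum()
        filtered.append(alpha)
    return np.array(filtered)   # filtered P(regime | data up to trial t)
```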

7.
Neural Netw ; 179: 106552, 2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39089154

ABSTRACT

Multi-agent reinforcement learning (MARL) effectively improves the learning speed of agents in sparse reward tasks with the guidance of subgoals. However, existing works sever the consistency of the learning objectives between the subgoal generation and subgoal-reaching stages, thereby significantly inhibiting the effectiveness of subgoal learning. To address this problem, we propose a novel Potential field Subgoal-based Multi-Agent reinforcement learning (PSMA) method, which introduces the potential field (PF) to unify the two-stage learning objectives. Specifically, we design a state-to-PF representation model that describes agents' states as potential fields, allowing easy measurement of the interaction effect for both allied and enemy agents. With the PF representation, a subgoal selector is designed to automatically generate multiple subgoals for each agent, drawn from the experience replay buffer that contains both individual and total PF values. Based on the determined subgoals, we define an intrinsic reward function to guide agents to reach their respective subgoals while maximizing the joint action-value. Experimental results show that our method outperforms state-of-the-art MARL methods on both StarCraft II micro-management (SMAC) and Google Research Football (GRF) tasks with sparse reward settings.
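As a toy illustration of a potential-field-based intrinsic reward, the sketch below rewards an agent for moving its potential-field representation closer to that of its assigned subgoal; the inverse-distance field and the progress-based reward are illustrative assumptions, not PSMA's exact formulation.

```python
import numpy as np

def potential_field(unit_positions, unit_charges, grid):
    """Sum of inverse-distance contributions of each unit evaluated on a set
    of 2-D grid points (illustrative definition)."""
    field = np.zeros(len(grid))
    for pos, charge in zip(unit_positions, unit_charges):
        d = np.linalg.norm(grid - pos, axis=1) + 1e-6
        field += charge / d
    return field

def intrinsic_reward(state_pf, subgoal_pf, prev_distance, weight=0.1):
    # Reward progress of the agent's current field toward the subgoal's field.
    distance = np.linalg.norm(state_pf - subgoal_pf)
    return weight * (prev_distance - distance), distance
```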

8.
Neural Netw ; 179: 106543, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39089158

ABSTRACT

Recent successes in robot learning have significantly enhanced autonomous systems across a wide range of tasks. However, such systems are prone to generating similar or identical solutions, limiting users' ability to make the robot behave according to their intentions. These limited robot behaviors may lead to collisions and potential harm to humans. To resolve these limitations, we introduce a semi-autonomous teleoperation framework that enables users to operate a robot by selecting a high-level command, referred to as an option. Our approach aims to provide effective and diverse options through a learned policy, thereby enhancing the efficiency of the proposed framework. In this work, we propose a quality-diversity (QD) based sampling method that simultaneously optimizes both the quality and diversity of options using reinforcement learning (RL). Additionally, we present a mixture of latent variable models to learn multiple policy distributions defined as options. In experiments, we show that the proposed method achieves superior performance in terms of the success rate and diversity of the options in simulation environments. We further demonstrate that our method outperforms manual keyboard control in terms of task completion time in cluttered real-world environments.

9.
Proc Biol Sci ; 291(2028): 20241141, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39110908

ABSTRACT

Learning is a taxonomically widespread process by which animals change their behavioural responses to stimuli as a result of experience. In this way, it plays a crucial role in the development of individual behaviour and underpins substantial phenotypic variation within populations. Nevertheless, the impact of learning in social contexts on evolutionary change is not well understood. Here, we develop game theoretical models of competition for resources in small groups (e.g. producer-scrounger and hawk-dove games) in which actions are controlled by reinforcement learning and show that biases in the subjective valuation of different actions readily evolve. Moreover, in many cases, the convergence stable levels of bias exist at fitness minima and therefore lead to disruptive selection on learning rules and, potentially, to the evolution of genetic polymorphisms. Thus, we show how reinforcement learning in social contexts can be a driver of evolutionary diversification. In addition, we consider the evolution of ability in our games, showing that learning can also drive disruptive selection on the ability to perform a task.
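To make the mechanism concrete, here is a toy sketch of reinforcement learning with a subjective valuation bias in a hawk-dove-style game: selection acts on realized payoffs while learning acts on payoffs distorted by a heritable bias. Payoff values, the learning rule, and the random opponent are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)

def hawk_dove_payoff(a, b, V=2.0, C=3.0):
    # a, b in {0: dove, 1: hawk}; illustrative resource value V and fight cost C.
    if a == 1 and b == 1:
        return (V - C) / 2.0
    if a == 1 and b == 0:
        return V
    if a == 0 and b == 1:
        return 0.0
    return V / 2.0

def lifetime_fitness(bias, n_rounds=200, lr=0.1, temp=1.0):
    """bias[a] is the subjective bonus the learner adds when updating the
    value of action a; fitness accumulates the true payoffs."""
    values = np.zeros(2)
    fitness = 0.0
    for _ in range(n_rounds):
        probs = np.exp(values / temp)
        probs /= probs.sum()
        a = rng.choice(2, p=probs)
        b = rng.integers(2)                                 # simplistic random opponent
        payoff = hawk_dove_payoff(a, b)
        fitness += payoff                                   # selection sees the true payoff
        values[a] += lr * (payoff + bias[a] - values[a])    # learning sees the biased payoff
    return fitness
```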


Subject(s)
Biological Evolution , Competitive Behavior , Game Theory , Learning , Animals , Reinforcement, Psychology
10.
Heliyon ; 10(14): e34065, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39108911

ABSTRACT

Synchronization in complex networks is a ubiquitous and important phenomenon with implications in various fields. Excessive synchronization may lead to undesired consequences, making desynchronization techniques essential. Exploiting the Proximal Policy Optimization algorithm, this work studies reinforcement learning-based pinning control strategies for synchronization suppression in global coupling networks and two types of irregular coupling networks: the Watts-Strogatz small-world networks and the Barabási-Albert scale-free networks. We investigate the impact of the ratio of controlled nodes and the role of key nodes selected by the LeaderRank algorithm on the performance of synchronization suppression. Numerical results demonstrate the effectiveness of the reinforcement learning-based pinning control strategy in different coupling schemes of the complex networks, revealing a critical ratio of the pinned nodes and the superior performance of a newly proposed hybrid pinning strategy. The results provide valuable insights for suppressing and optimizing network synchronization behavior efficiently.

11.
Neural Netw ; 179: 106565, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39111159

ABSTRACT

In cooperative multi-agent reinforcement learning, agents jointly optimize a centralized value function based on the rewards shared by all agents and learn decentralized policies through value function decomposition. Although such a learning framework is considered effective, estimating individual contributions from the shared rewards, which is essential for learning highly cooperative behaviors, is difficult. It becomes even more challenging when reinforcement and punishment, which respectively help increase or decrease specific behaviors of agents, coexist, because the processes of maximizing reinforcement and minimizing punishment often conflict in practice. This study proposes a novel exploration scheme called multi-agent decomposed reward-based exploration (MuDE), which preferentially explores the action spaces associated with positive sub-rewards based on a modified reward decomposition scheme, thus effectively exploring action spaces not reachable by existing exploration schemes. We evaluate MuDE on a challenging set of StarCraft II micromanagement and modified predator-prey tasks extended to include reinforcement and punishment. The results show that MuDE accurately estimates sub-rewards and outperforms state-of-the-art approaches in both convergence speed and win rates.

12.
Psychon Bull Rev ; 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39112905

ABSTRACT

Adults struggle to learn non-native speech categories in many experimental settings (Goto, Neuropsychologia, 9(3), 317-323, 1971), but learn efficiently in a video game paradigm where non-native speech sounds have functional significance (Lim & Holt, Cognitive Science, 35(7), 1390-1405, 2011). Behavioral and neural evidence from this and other paradigms points toward the involvement of reinforcement learning mechanisms in speech category learning (Harmon, Idemaru, & Kapatsinski, Cognition, 189, 76-88, 2019; Lim, Fiez, & Holt, Proceedings of the National Academy of Sciences, 116, 201811992, 2019). We formalize this hypothesis computationally and implement a deep reinforcement learning network to map between environmental input and actions. Comparing with a supervised model of learning, we show that the reinforcement network closely matches aspects of human behavior in two experiments: learning of synthesized auditory noise tokens and improvement in speech sound discrimination. Both models perform comparably, and the similarity in the output of each model leads us to believe that there is little inherent computational benefit to a reward-based learning mechanism. We suggest instead that the specific neural circuitry engaged by the paradigm, and links between the striatum and superior temporal areas, play a critical role in effective learning.
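The sketch below shows the kind of reward-driven categorization described above in its simplest REINFORCE form: the agent maps acoustic features to an action and is reinforced only when the action is appropriate for the sound's category. The network size, feature dimensionality, and number of actions are illustrative assumptions rather than the authors' architecture.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(13, 32), nn.ReLU(), nn.Linear(32, 4))  # 13 acoustic features -> 4 actions
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(features, correct_action):
    """features: tensor of shape (13,); correct_action: index of the action
    that yields reward for this sound's (hidden) category."""
    logits = policy(features)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = 1.0 if action.item() == correct_action else 0.0   # functional feedback, no explicit label
    loss = -dist.log_prob(action) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```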

13.
Heliyon ; 10(14): e33944, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39114005

ABSTRACT

It is challenging to accurately model the overall uncertainty of a power system connected to large-scale intermittent generation sources such as wind and photovoltaic generation, due to the inherent volatility, uncertainty, and indivisibility of renewable energy. Deep reinforcement learning (DRL) algorithms are introduced as a solution to avoid modeling these complex uncertainties and to adapt to fluctuating uncertainty by interacting with the environment and using feedback to continuously improve their strategies. However, the large-scale nature and uncertainty of the system lead to a sparse reward problem and a high-dimensional space issue in DRL. A hierarchical deep reinforcement learning (HDRL) scheme is designed to decompose the solution process into two stages, using a reinforcement learning (RL) agent in the global stage and a heuristic algorithm in the local stage to find optimal dispatching decisions for power systems under uncertainty. Simulation studies show that the proposed HDRL scheme efficiently solves power system economic dispatch problems under both deterministic and uncertain scenarios, thanks to its adaptation to system uncertainty and its ability to cope with the volatility of uncertain factors, while significantly improving the speed of online decision-making.

14.
J Cheminform ; 16(1): 95, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39118113

ABSTRACT

Designing compounds with a range of desirable properties is a fundamental challenge in drug discovery. In pre-clinical early drug discovery, novel compounds are often designed based on an already existing promising starting compound through structural modifications for further property optimization. Recently, transformer-based deep learning models have been explored for the task of molecular optimization by training on pairs of similar molecules. This provides a starting point for generating molecules similar to a given input molecule, but has limited flexibility regarding user-defined property profiles. Here, we evaluate the effect of reinforcement learning on transformer-based molecular generative models. The generative model can be considered a pre-trained model with knowledge of the chemical space close to an input compound, while reinforcement learning can be viewed as a tuning phase that steers the model towards chemical space with user-specified desirable properties. The evaluation of two distinct tasks, molecular optimization and scaffold discovery, suggests that reinforcement learning can guide the transformer-based generative model towards the generation of more compounds of interest. Additionally, the impact of pre-trained models, learning steps, and learning rates is investigated. Scientific contribution: Our study investigates the effect of reinforcement learning on a transformer-based generative model initially trained to generate molecules similar to starting molecules. The reinforcement learning framework is applied to facilitate multiparameter optimization of starting molecules. This approach allows more flexibility for optimizing user-specific property profiles and helps find more ideas of interest.
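For orientation, the sketch below shows a common way such an RL tuning phase is implemented for sequence-based molecular generators (a REINVENT-style regularized policy gradient); the sampling and likelihood interfaces, the scoring function, and the sigma weight are assumptions for illustration, not this paper's code.

```python
import torch

def rl_tuning_step(agent, prior, optimizer, start_molecule, scorer, sigma=60.0, batch_size=32):
    """agent/prior are assumed to expose sample() and log_likelihood(); scorer maps
    a generated SMILES string to a desirability score in [0, 1] (all hypothetical)."""
    smiles, agent_ll = agent.sample(start_molecule, batch_size)     # agent log-likelihoods keep gradients
    with torch.no_grad():
        prior_ll = prior.log_likelihood(smiles)                     # frozen pre-trained model
        scores = torch.tensor([scorer(s) for s in smiles])          # user-defined property profile
    augmented_ll = prior_ll + sigma * scores                        # reward-shaped target likelihood
    loss = ((augmented_ll - agent_ll) ** 2).mean()                  # pull agent toward high-scoring chemistry
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return scores.mean().item()
```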

15.
Front Robot AI ; 11: 1375490, 2024.
Article in English | MEDLINE | ID: mdl-39104806

ABSTRACT

Safety-critical domains often employ autonomous agents that follow a sequential decision-making setup, whereby the agent follows a policy to dictate the appropriate action at each step. AI practitioners often employ reinforcement learning algorithms to allow an agent to find the best policy. However, sequential systems often lack clear and immediate signs of wrong actions, with consequences visible only in hindsight, making it difficult for humans to understand system failure. In reinforcement learning, this is referred to as the credit assignment problem. To effectively collaborate with an autonomous system, particularly in a safety-critical setting, explanations should enable a user to better understand the policy of the agent and predict system behavior, so that users are cognizant of potential failures and these failures can be diagnosed and mitigated. However, humans are diverse and have innate biases or preferences that may enhance or impair the utility of a policy explanation of a sequential agent. Therefore, in this paper, we designed and conducted a human-subjects experiment to identify the factors that influence the perceived usability and the objective usefulness of policy explanations for reinforcement learning agents in a sequential setting. Our study had two factors: the modality of policy explanation shown to the user (Tree, Text, Modified Text, and Programs) and the "first impression" of the agent, i.e., whether the user saw the agent succeed or fail in the introductory calibration video. Our findings characterize a preference-performance tradeoff: participants perceived language-based policy explanations to be significantly more usable, but they were better able to objectively predict the agent's behavior when provided an explanation in the form of a decision tree. Our results demonstrate that user-specific factors, such as computer science experience (p < 0.05), and situational factors, such as watching the agent crash (p < 0.05), can significantly impact the perception and usefulness of the explanation. This research provides key insights to alleviate prevalent issues regarding inappropriate compliance and reliance, which are exponentially more detrimental in safety-critical settings, providing a path forward for XAI developers in future work on policy explanations.

16.
Neural Netw ; 179: 106579, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39096749

ABSTRACT

How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a realistic and challenging problem in visual reinforcement learning. Recently, unsupervised representation learning methods based on bisimulation metrics, contrast, prediction, and reconstruction have shown the ability to extract task-relevant information. However, due to the lack of appropriate mechanisms for extracting task information in the prediction-, contrast-, and reconstruction-related approaches, and the limitations of bisimulation-related methods in domains with sparse rewards, it remains difficult for these methods to be effectively extended to environments with distractions. To alleviate these problems, in this paper, action sequences, which contain task-intensive signals, are incorporated into representation learning. Specifically, we propose a Sequential Action-induced invariant Representation (SAR) method, which decouples the controlled part (i.e., task-relevant information) and the uncontrolled part (i.e., task-irrelevant information) of noisy observations through sequential actions, thereby extracting effective representations related to decision tasks. To achieve this, the characteristic function of the action sequence's probability distribution is modeled to specifically optimize the state encoder. We conduct extensive experiments on the distracting DeepMind Control suite and achieve the best performance against strong baselines. We also demonstrate the effectiveness of our method at disregarding task-irrelevant information by applying SAR to real-world CARLA-based autonomous driving with natural distractions. Finally, we provide an analysis of generalization based on generalization decay and t-SNE visualization. Code and demo videos are available at https://github.com/DMU-XMU/SAR.git.

17.
Curr Opin Behav Sci ; 56, 2024 Apr.
Article in English | MEDLINE | ID: mdl-39130377

ABSTRACT

Repetitive negative thinking (RNT) is a transdiagnostic construct that encompasses rumination and worry, yet precisely what is shared between rumination and worry remains unclear. To clarify this, we develop a meta-control account of RNT. Meta-control refers to the reinforcement and control of mental behavior via computations similar to those that reinforce and control motor behavior. We propose that rumination and worry are coarse terms for failure in meta-control, just as tripping and falling are coarse terms for failure in motor control. We delineate four meta-control stages and the risk factors that increase the chance of failure at each, including open-ended thoughts (stage 1), individual differences influencing subgoal execution (stage 2) and switching (stage 3), and challenges inherent to learning adaptive mental behavior (stage 4). Distinguishing these stages therefore elucidates the diverse processes that lead to the same behavior of excessive RNT. Our account also subsumes prominent clinical accounts of RNT within a computational cognitive neuroscience framework.

18.
PeerJ Comput Sci ; 10: e2159, 2024.
Article in English | MEDLINE | ID: mdl-39145250

ABSTRACT

In the contemporary landscape of digitalization and technological advancement, the auction industry is undergoing a metamorphosis and assuming a pivotal role as a transactional paradigm. As a mechanism for pricing commodities or services, the procedural intricacies and efficiency of auctions directly influence market dynamics and participant engagement. Harnessing the advancing capabilities of artificial intelligence (AI), the auction sector is proactively integrating AI methodologies to improve efficiency and enrich user interactions. This study addresses the price prediction challenge within the auction domain, introducing an RL-GRU framework for price interval analysis. The framework first performs quantitative feature extraction of commodities through a GRU, then orchestrates dynamic interactions within the model's environment via reinforcement learning, and finally accomplishes interval division and recognition of auction commodity prices through a classification module. The framework achieves precision exceeding 90% on publicly available and internally curated datasets with five price intervals, and exhibits superior performance with eight intervals, contributing valuable technical insights for future work on auction price interval prediction.

19.
Sensors (Basel) ; 24(15), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39123865

ABSTRACT

Efficient and reliable data routing is critical in Advanced Metering Infrastructure (AMI) within Smart Grids, dictating the overall network performance and resilience. This paper introduces Q-RPL, a novel Q-learning-based Routing Protocol designed to enhance routing decisions in AMI deployments based on wireless mesh technologies. Q-RPL leverages the principles of Reinforcement Learning (RL) to dynamically select optimal next-hop forwarding candidates, adapting to changing network conditions. The protocol operates on top of the standard IPv6 Routing Protocol for Low-Power and Lossy Networks (RPL), integrating it with intelligent decision-making capabilities. Through extensive simulations carried out in real map scenarios, Q-RPL demonstrates a significant improvement in key performance metrics such as packet delivery ratio, end-to-end delay, and compliant factor compared to the standard RPL implementation and other benchmark algorithms found in the literature. The adaptability and robustness of Q-RPL mark a significant advancement in the evolution of routing protocols for Smart Grid AMI, promising enhanced efficiency and reliability for future intelligent energy systems. The findings of this study also underscore the potential of Reinforcement Learning to improve networking protocols.
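As a simplified illustration of the Q-learning layer such a protocol adds on top of RPL parent selection, the sketch below maintains a Q-value per (node, next-hop) pair and updates it from routing feedback; the state abstraction, reward definition, and epsilon-greedy exploration are illustrative assumptions rather than Q-RPL's exact design.

```python
import random
from collections import defaultdict

Q = defaultdict(float)                 # Q[(node, next_hop)] -> estimated routing value
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def choose_next_hop(node, candidates):
    # Candidates would come from the RPL parent set; occasionally explore.
    if random.random() < EPSILON:
        return random.choice(candidates)
    return max(candidates, key=lambda nh: Q[(node, nh)])

def update_q(node, next_hop, reward, downstream_candidates):
    # The reward could combine delivery success, end-to-end delay, and link quality.
    best_downstream = max((Q[(next_hop, nh)] for nh in downstream_candidates), default=0.0)
    Q[(node, next_hop)] += ALPHA * (reward + GAMMA * best_downstream - Q[(node, next_hop)])
```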

20.
Sensors (Basel) ; 24(15), 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39124010

ABSTRACT

The ability to make informed decisions in complex scenarios is crucial for intelligent automotive systems. Traditional expert rules and other methods often fall short in complex contexts. Recently, reinforcement learning has garnered significant attention due to its superior decision-making capabilities. However, inaccurate target network estimation limits its decision-making ability in complex scenarios. This paper focuses on the underestimation phenomenon and proposes an end-to-end autonomous driving decision-making method based on an improved TD3 algorithm. The method employs a forward camera to capture data. By introducing a new critic network to form a triple-critic structure and combining it with a target maximization operation, the underestimation problem in the TD3 algorithm is addressed. A multi-timestep averaging method is then used to address the policy instability caused by the new single critic. In addition, this paper uses the CARLA platform to construct multi-vehicle unprotected left turn and congested lane-center driving scenarios to validate the algorithm. The results demonstrate that our method surpasses the baseline DDPG and TD3 algorithms in convergence speed, estimation accuracy, and policy stability.
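For context, standard TD3 computes its target from the minimum of two critics, which tends to underestimate values; the sketch below shows that baseline alongside one plausible (hypothetical) way a third critic and a maximization operation could temper the pessimism. The abstract does not specify the exact combination rule, so this is illustrative only.

```python
import torch

def td3_target(reward, not_done, q1, q2, gamma=0.99):
    # Standard TD3: clipped double-Q target, prone to underestimation.
    return reward + gamma * not_done * torch.min(q1, q2)

def triple_critic_target(reward, not_done, q1, q2, q3, gamma=0.99):
    # Hypothetical triple-critic variant: take the max over pairwise minima,
    # so the target is less pessimistic than a single min over all critics.
    pairwise_min = torch.stack([torch.min(q1, q2), torch.min(q2, q3), torch.min(q1, q3)])
    return reward + gamma * not_done * pairwise_min.max(dim=0).values
```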
