ABSTRACT
Human-level driving is the ultimate goal of autonomous driving. As the top-level decision-making aspect of autonomous driving, behavior decision establishes short-term driving behavior strategies by evaluating road structures, adhering to traffic rules, and analyzing the intentions of other traffic participants. Existing behavior decision approaches are primarily rule-based and exhibit insufficient generalization when faced with new, unseen driving scenarios. In this paper, we propose a novel behavior decision method that leverages the inherent generalization and commonsense reasoning abilities of visual language models (VLMs) to learn and simulate the behavior decision process of human driving. We constructed a novel instruction-following dataset containing a large number of image-text instructions paired with corresponding driving behavior labels to support the learning of the Drive Large Language and Vision Assistant (DriveLLaVA) and to enhance the transparency and interpretability of the entire decision process. DriveLLaVA is fine-tuned on this dataset using the Low-Rank Adaptation (LoRA) approach, which efficiently reduces the number of trainable parameters and significantly lowers training costs. We conducted extensive experiments on a large-scale instruction-following dataset; compared with state-of-the-art methods, DriveLLaVA demonstrated excellent behavior decision performance. DriveLLaVA is capable of handling various complex driving scenarios, showing strong robustness and generalization.
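As a concrete illustration of the LoRA fine-tuning step, the following Python sketch attaches low-rank adapters to the attention projections of a HuggingFace-style language model backbone using the peft library. The checkpoint path, target module names, and rank are illustrative assumptions; the actual DriveLLaVA backbone and training loop are not specified in the abstract.

# Minimal LoRA fine-tuning setup, assuming a HuggingFace-style backbone and
# the `peft` library. "path/to/base-vlm" is a placeholder, not the actual
# DriveLLaVA checkpoint.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path/to/base-vlm")

lora_cfg = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling of the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adapt attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()        # adapters only; the backbone stays frozen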
Subjects
Automobile Driving, Decision Making, Humans, Automobile Driving/psychology, Decision Making/physiology, Algorithms, Language
ABSTRACT
Reinforcement Learning (RL) methods are regarded as effective for designing autonomous driving policies. However, even when RL policies are trained to convergence, ensuring their robust safety remains a challenge, particularly on long-tail data. Decision-making based on RL must therefore adequately account for potential shifts in the data distribution. This paper presents a framework for highway autonomous driving decisions that prioritizes both safety and robustness. Using the proposed Replay Buffer Constrained Policy Optimization (RECPO) method, this framework updates RL policies to maximize rewards while ensuring that the policies always remain within safety constraints. We incorporate importance sampling to collect and store data in a replay buffer during agent operation, allowing data from old policies to be reused for training new policy models and thus mitigating potential catastrophic forgetting. Additionally, we formulate the highway autonomous driving decision problem as a Constrained Markov Decision Process (CMDP) and apply the proposed RECPO method to train and optimize highway driving policies. Finally, we deploy our method in the CARLA simulation environment and compare its performance in typical highway scenarios against traditional CPO, a current advanced strategy based on the Deep Deterministic Policy Gradient (DDPG), and IDM + MOBIL (the Intelligent Driver Model combined with the Minimizing Overall Braking Induced by Lane changes model). The results show that our framework significantly improves convergence speed, safety, and decision-making stability, achieving a zero collision rate in highway autonomous driving.
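To make the constrained, importance-sampled update concrete, here is a minimal PyTorch sketch of a Lagrangian-style surrogate loss over replay-buffer data. It only illustrates the ingredients named in the abstract (importance weights for reusing old-policy data plus a safety-cost constraint); the exact RECPO update rule is not detailed here, so the function name, arguments, and penalty form are assumptions.

# Schematic importance-sampled, constraint-aware surrogate loss (PyTorch).
# A sketch of the ideas described, not the authors' exact RECPO update.
import torch

def constrained_surrogate_loss(logp_new, logp_old, reward_adv, cost_adv,
                               lam, avg_cost, cost_limit):
    # Importance weights let data collected under an old policy be reused.
    ratio = torch.exp(logp_new - logp_old)
    reward_term = -(ratio * reward_adv).mean()   # maximize the reward surrogate
    cost_term = (ratio * cost_adv).mean()        # surrogate change in safety cost
    # Lagrangian penalty keeps the expected cost within its limit.
    return reward_term + lam * (avg_cost + cost_term - cost_limit)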
ABSTRACT
Highly integrated information sharing among people, vehicles, roads, and cloud systems, together with the rapid development of autonomous driving technologies, has driven the evolution of automobiles from simple "transportation tools" to interconnected "intelligent systems". The intelligent cockpit is a comprehensive application space for new technologies in intelligent vehicles, encompassing the domains of driving control, riding comfort, and infotainment. It provides drivers and passengers with safe, comfortable, and pleasant driving experiences and serves as the gateway for traditional automobile manufacturing to upgrade toward an intelligent automotive industry ecosystem. It is the optimal convergence point for the intelligence, connectivity, electrification, and sharing of automobiles. Currently, the form, functions, and interaction methods of the intelligent cockpit are gradually changing, transitioning from the traditional "human adapts to the vehicle" paradigm to "vehicle adapts to the human", and evolving toward natural interactive services in which "humans and vehicles mutually adapt". This article reviews the definitions, intelligence levels, functional domains, and technical frameworks of intelligent automotive cockpits. In addition, drawing on the core mechanisms of human-machine interaction in intelligent cockpits, it proposes an intelligent-cockpit human-machine interaction process and summarizes the current state of the key technologies involved. Finally, it analyzes the challenges currently faced in the field of intelligent cockpits and forecasts future trends in intelligent cockpit technologies.
ABSTRACT
The centralized coordination of Connected and Automated Vehicles (CAVs) at unsignalized intersections aims to enhance traffic efficiency, driving safety, and passenger comfort. Autonomous Intersection Management (AIM) systems introduce a novel approach for centralized coordination. However, existing rule-based and optimization methods often face the challenges of poor generalization and low computational efficiency when dealing with complex traffic environments and highly dynamic traffic conditions. Additionally, current Reinforcement Learning (RL)-based methods encounter difficulties around policy inference and safety. To address these issues, this study proposes Constraint-Guided Behavior Transformer for Safe Reinforcement Learning (CoBT-SRL), which uses transformers as the policy network to achieve efficient decision-making for vehicle driving behaviors. This method leverages the ability of transformers to capture long-range dependencies and improve data sample efficiency by using historical states, actions, and reward and cost returns to predict future actions. Furthermore, to enhance policy exploration performance, a sequence-level entropy regularizer is introduced to encourage policy exploration while ensuring the safety of policy updates. Simulation results indicate that CoBT-SRL exhibits stable training progress and converges effectively. CoBT-SRL outperforms other RL methods and vehicle intersection coordination schemes (VICS) based on optimal control in terms of traffic efficiency, driving safety, and passenger comfort.
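The following PyTorch sketch shows one way a transformer policy can consume per-timestep (return, state, action) tokens from the driving history and predict the next action, in the spirit of the behavior transformer described. Dimensions, the token layout, and the omission of a causal attention mask are simplifying assumptions, not details taken from the paper.

# Sketch of a transformer policy conditioned on histories of states, actions,
# and reward/cost returns-to-go. Illustrative only; CoBT-SRL's exact design
# (including its causal masking and safety constraints) is not reproduced here.
import torch
import torch.nn as nn

class BehaviorTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=3, n_heads=4):
        super().__init__()
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_returns = nn.Linear(2, d_model)   # [reward-to-go, cost-to-go]
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, act_dim)

    def forward(self, states, actions, returns):
        # Interleave (returns, state, action) tokens per timestep.
        tokens = torch.stack(
            [self.embed_returns(returns),
             self.embed_state(states),
             self.embed_action(actions)], dim=2
        ).flatten(1, 2)                              # (B, 3*T, d_model)
        h = self.encoder(tokens)                     # causal mask omitted for brevity
        # Predict the next action from each state token.
        return self.action_head(h[:, 1::3])          # (B, T, act_dim)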
ABSTRACT
Urban traffic congestion poses significant economic and environmental challenges worldwide. To mitigate these issues, Adaptive Traffic Signal Control (ATSC) has emerged as a promising solution. Recent advances in deep reinforcement learning (DRL) have further enhanced ATSC's capabilities. This paper introduces a novel DRL-based ATSC approach, the Sequence Decision Transformer (SDT), which combines DRL with attention mechanisms and leverages the capabilities of sequence decision models, akin to those used in advanced natural language processing, adapted here to the complexities of urban traffic management. First, the ATSC problem is modeled as a Markov Decision Process (MDP), with the observation space, action space, and reward function carefully defined. We then propose SDT, specifically tailored to solve this MDP. The SDT model uses a transformer-based encoder-decoder architecture in an actor-critic structure: the encoder processes observations and outputs both encoded representations for the decoder and value estimates for parameter updates, while the decoder, acting as the policy network, outputs the agent's actions. Proximal Policy Optimization (PPO) is used to update the policy network from historical data, improving decision-making in ATSC. This approach significantly reduces training time, handles larger observation spaces, captures dynamic changes in traffic conditions more accurately, and increases traffic throughput. Finally, the SDT model is trained and evaluated in synthetic scenarios by comparing the number of vehicles, average speed, and queue length against three baselines: PPO, a DQN tailored for ATSC, and FRAP, a state-of-the-art ATSC algorithm. SDT shows improvements of 26.8%, 150%, and 21.7% over traditional ATSC algorithms, and 18%, 30%, and 15.6% over FRAP. This research underscores the potential of integrating Large Language Models (LLMs) with DRL for traffic management, offering a promising solution to urban congestion.
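Since the abstract states that the policy is updated with Proximal Policy Optimization, a minimal clipped-surrogate loss is sketched below. The coefficients are common defaults rather than the paper's settings, and the SDT encoder-decoder itself is abstracted away behind the inputs.

# Minimal PPO-style clipped actor-critic loss (PyTorch), matching the
# optimization described; values/advantages come from the critic head.
import torch

def ppo_loss(logp_new, logp_old, advantages, values, returns,
             clip_eps=0.2, value_coef=0.5):
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = (values - returns).pow(2).mean()   # critic regression target
    return policy_loss + value_coef * value_loss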
ABSTRACT
Adaptive cruise control (ACC) enables efficient, safe, and intelligent vehicle control by autonomously adjusting speed and maintaining a safe following distance from the vehicle ahead. This paper proposes a novel adaptive cruise system, the Safety-First Reinforcement Learning Adaptive Cruise Control (SFRL-ACC). The system leverages the model-free nature and high real-time inference efficiency of Deep Reinforcement Learning (DRL) to overcome the modeling difficulty and low computational efficiency of current optimization-based ACC methods, while maintaining their safety advantages and optimizing ride comfort. First, we cast the ACC problem as a safe DRL formulation, a Constrained Markov Decision Process (CMDP), by carefully designing the state, action, reward, and cost functions. We then propose the Projected Constrained Policy Optimization (PCPO)-based ACC algorithm SFRL-ACC, which is specifically tailored to solve this CMDP. PCPO incorporates safety constraints that further restrict the trust region formed by the Kullback-Leibler (KL) divergence, enabling DRL policy updates that maximize performance while keeping safety costs within their limit bounds. Finally, we train an SFRL-ACC policy and compare its computation time, traffic efficiency, ride comfort, and safety with state-of-the-art MPC-based ACC control methods. The experimental results demonstrate the superiority of the proposed method in all of these performance aspects.
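For concreteness, here is a NumPy sketch of a two-step PCPO-style update of the kind the abstract refers to: a KL-trust-region reward step followed by a projection back onto the linearized safety constraint. The gradients, Fisher matrix, trust-region size, and constraint slack are assumed to be supplied by the surrounding RL machinery; this is a generic rendering of PCPO, not the exact SFRL-ACC implementation.

# Two-step PCPO-style update (NumPy). Assumed inputs:
#   g: reward-surrogate gradient,  a: cost-surrogate gradient,
#   H: Fisher (KL) matrix,  delta: trust-region size,  c: constraint slack.
import numpy as np

def pcpo_style_update(theta, g, a, H, delta, c):
    H_inv = np.linalg.inv(H)
    # Step 1: largest reward-improving step whose KL divergence stays within delta.
    step = np.sqrt(2.0 * delta / (g @ H_inv @ g)) * (H_inv @ g)
    theta_half = theta + step
    # Step 2: if the linearized cost constraint a^T (theta - theta_k) + c <= 0
    # is violated, project back using the KL (Fisher) metric.
    violation = a @ (theta_half - theta) + c
    if violation > 0:
        theta_half = theta_half - (violation / (a @ H_inv @ a)) * (H_inv @ a)
    return theta_half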
ABSTRACT
Introduction: The rapid development of animal husbandry has brought many problems, such as ecological environmental pollution and damage to public health. The resource utilization of livestock manure is the key way to address this crisis and turn waste into treasure. Methods: Based on the theory of perceived value, this paper uses a multi-group structural equation model to explore the driving mechanism of perceived value on the resource utilization behavior of livestock manure. Results and discussion: The results showed that: (1) The resource utilization behavior of livestock manure follows the logic of "cognitive level → cognitive trade-off → perceived value → behavioral intention → behavioral performance". Perceived benefit and perceived risk have positive and negative driving effects on perceived value, respectively; perceived value has a positive driving effect on behavioral intention; and behavioral intention has a positive driving effect on utilization behavior. (2) Among the observed variables of perceived benefit, ecological benefit has the greatest impact; among the observed variables of perceived risk, economic risk has the greatest impact; among the observed variables of perceived value, significance cognition has the greatest influence; and among the observed variables of behavioral intention, utilization intention has the greatest influence. (3) Perceived value has a differential effect on the manure resource utilization behavior of farmers with different degrees of part-time farming, and the driving effect is more pronounced for full-time farmers. Conclusions: It is therefore necessary to improve the resource utilization system for livestock manure, broaden the channels for realizing the value of manure resources, strengthen technical assistance and policy subsidies, and tailor policies to local conditions so as to raise farmers' overall perceived value.
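A minimal sketch of the path model described, written for the semopy library, is shown below. The variable names are placeholders for the survey constructs, and the multi-group comparison is approximated by fitting the same specification separately to the part-time and full-time farmer subsamples rather than reproducing the authors' multi-group estimation.

# Sketch of the perceived-value path model using semopy (lavaan-style syntax).
# Column names are placeholders for the survey constructs.
import pandas as pd
from semopy import Model

spec = """
perceived_value ~ perceived_benefit + perceived_risk
behavioral_intention ~ perceived_value
utilization_behavior ~ behavioral_intention
"""

def fit_group(data: pd.DataFrame):
    model = Model(spec)
    model.fit(data)
    return model.inspect()          # path coefficients and significance tests

# Rough multi-group comparison: fit each farmer subsample separately, e.g.
# print(fit_group(df[df.group == "full_time"]))
# print(fit_group(df[df.group == "part_time"]))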
ABSTRACT
BACKGROUND AND AIMS: There are no comparative studies on the efficacy of hepatic resection (HR) versus CyberKnife stereotactic body radiation therapy (CK-SBRT) plus transhepatic arterial chemotherapy embolization (TACE) in the treatment of large hepatocellular carcinoma (HCC). This study therefore aimed to compare the efficacy of HR and CK-SBRT+TACE in large HCC. METHODS: A total of 116 patients were selected from November 2011 to December 2016; 50 were allocated to the CK-SBRT+TACE group and 66 to the HR group. The Kaplan-Meier method was applied to calculate overall survival (OS) and progression-free survival (PFS) rates, and propensity score matching was performed to control for baseline differences between the groups. RESULTS: Thirty-six matched pairs were selected from the CK-SBRT+TACE and HR groups. After propensity score matching, the 1-, 2-, and 3-year OS rates were 83.3%, 77.8%, and 66.7% in the HR group and 80.6%, 72.2%, and 52.8% in the CK-SBRT+TACE group, respectively. The 1-, 2-, and 3-year PFS rates were 71.6%, 57.3%, and 42.3% in the HR group and 66.1%, 45.8%, and 39.3% in the CK-SBRT+TACE group, respectively (OS: p=0.143; PFS: p=0.445). A high platelet count and a low alpha-fetoprotein level were identified as factors associated with improved OS and PFS. CONCLUSIONS: CK-SBRT+TACE achieved local effects similar to those of HR in HCC patients with a single large lesion, and the rate of liver injury was acceptable in both groups.
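As a sketch of the survival comparison used here and in the following studies (after propensity matching), the lifelines library can estimate Kaplan-Meier curves and run a log-rank test between two treatment groups. Column names are placeholders, and the propensity score matching step itself is omitted.

# Kaplan-Meier OS comparison between two matched groups using lifelines.
# Column names ("months", "death", "group") are placeholders.
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

def compare_os(df: pd.DataFrame):
    a = df[df["group"] == "HR"]
    b = df[df["group"] == "CK-SBRT+TACE"]
    kmf = KaplanMeierFitter()
    for name, grp in [("HR", a), ("CK-SBRT+TACE", b)]:
        kmf.fit(grp["months"], event_observed=grp["death"], label=name)
        print(kmf.survival_function_)          # stepwise OS estimates
    # Log-rank test for the difference between the two survival curves.
    result = logrank_test(a["months"], b["months"],
                          event_observed_A=a["death"],
                          event_observed_B=b["death"])
    print(result.p_value)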
ABSTRACT
Purpose: The aim of this study was to evaluate the efficacy and safety of CyberKnife stereotactic body radiation therapy in hepatocellular carcinoma (HCC) patients with decompensated cirrhosis. Methods: From March 2011 to December 2015, 32 HCC patients who refused or were ineligible for other treatments were treated with CyberKnife stereotactic body radiation therapy. Of these patients, 17 had a Child-Pugh score of 7 (53.13%), 7 a score of 8 (21.87%), 4 a score of 9 (12.50%), and 4 a score of 10 (12.50%). A total dose of 45-54 Gy in 5-10 fractions was delivered according to the location of the lesions. Results: The median follow-up period was 30 months (range, 8-46 months). By July 2019, tumors had recurred or metastasized in 17 patients. The 1-, 2-, and 3-year overall survival rates were 84.4%, 61.8%, and 46.0%, respectively. The 1-, 2-, and 3-year local control rates were all 92.9%. The 1-, 2-, and 3-year progression-free survival rates were 73.8%, 44.6%, and 33.4%, respectively. Conclusions: CyberKnife stereotactic body radiation therapy was an effective option for HCC patients with decompensated cirrhosis, and the rate of liver injury was acceptable in this study.
ABSTRACT
BACKGROUND: CyberKnife stereotactic body radiation therapy (CK-SBRT) has been applied to hepatocellular carcinoma (HCC) patients for several years. The aim of this study was to compare the efficacy of hepatic resection (HR) and CK-SBRT in treatment-naive small hepatocellular carcinoma (sHCC) patients with hepatitis virus-related cirrhosis over a 5-year follow-up. MATERIALS AND METHODS: This retrospective cohort study included 317 treatment-naive sHCC patients (246 men and 71 women) with hepatitis B or C virus-related cirrhosis who were treated with HR (n = 195) or CK-SBRT (n = 122) from November 2011 to December 2015. Cumulative overall survival (OS) and progression-free survival (PFS) rates were calculated using the Kaplan-Meier method. RESULTS: After propensity score matching, 104 patients were selected from each group for further analysis. The 1-, 2-, 3-, and 5-year OS rates were 96.2%, 89.4%, 85.5%, and 70.7% in the HR group and 93.3%, 89.4%, 83.7%, and 71.0% in the CK-SBRT group, respectively. The 1-, 2-, 3-, and 5-year PFS rates were 78.8%, 64.3%, 56.4%, and 47.3% in the HR group and 84.5%, 67.8%, 58.9%, and 49.0% in the CK-SBRT group, respectively. No significant difference was found between the two groups in OS or PFS (OS, p = 0.673; PFS, p = 0.350), and no deaths occurred due to toxicity or complications of HR or CK-SBRT. CONCLUSION: CK-SBRT could be an effective alternative to HR for treatment-naive sHCC patients with hepatitis-related cirrhosis, especially those with higher Child-Pugh (CP) scores and lower platelet (PLT) counts; PLT counts should be factored into the survival evaluation of HCC treatment.