ABSTRACT
BACKGROUND: Clinical diagnoses are typically made by following a series of steps recommended by guidelines that are authored by colleges of experts. Accordingly, guidelines play a crucial role in rationalizing clinical decisions. However, they suffer from limitations, as they are designed to cover the majority of the population and often fail to account for patients with uncommon conditions. Moreover, updating them is slow and expensive, making them unsuitable for emerging diseases and new medical practices. METHODS: Inspired by guidelines, we formulate the task of diagnosis as a sequential decision-making problem and study the use of Deep Reinforcement Learning (DRL) algorithms to learn the optimal sequence of actions to perform in order to obtain a correct diagnosis from Electronic Health Records (EHRs), which we name a diagnostic decision pathway. We apply DRL to synthetic yet realistic EHRs and develop two clinical use cases: anemia diagnosis, where the decision pathways follow a decision tree schema, and Systemic Lupus Erythematosus (SLE) diagnosis, which follows a weighted criteria score. We particularly evaluate the robustness of our approaches to noise and missing data, as these frequently occur in EHRs. RESULTS: In both use cases, even with imperfect data, our best DRL algorithms exhibit competitive performance compared to traditional classifiers, with the added advantage of progressively generating a pathway to the suggested diagnosis, which can both guide and explain the decision-making process. CONCLUSION: DRL offers the opportunity to learn personalized decision pathways for diagnosis. Our two use cases illustrate the advantages of this approach: they generate step-by-step pathways that are explainable, and their performance is competitive when compared to state-of-the-art methods.
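To make the sequential-decision framing above concrete, the sketch below shows one way a diagnosis MDP over EHR features could look: at each step the agent either reveals one lab value or commits to a diagnosis. The environment name, feature list, and reward values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of framing diagnosis as a sequential decision problem.
# The class name (AnemiaDiagnosisEnv), the feature list, and the reward values
# are illustrative assumptions, not the authors' actual environment.
import numpy as np

class AnemiaDiagnosisEnv:
    """At each step the agent either orders one lab value or commits to a diagnosis."""

    LABS = ["hemoglobin", "mcv", "ferritin", "reticulocytes"]   # hypothetical feature set
    DIAGNOSES = ["iron_deficiency", "vitamin_b12_deficiency", "no_anemia"]

    def __init__(self, ehr_row, true_label):
        self.ehr = ehr_row            # dict: lab name -> value (may be missing)
        self.label = true_label
        self.revealed = np.full(len(self.LABS), np.nan)  # unknown values start masked

    def reset(self):
        self.revealed[:] = np.nan
        return self.revealed.copy()

    def step(self, action):
        # Actions 0..len(LABS)-1 reveal a lab; the remaining actions emit a final diagnosis.
        if action < len(self.LABS):
            self.revealed[action] = self.ehr.get(self.LABS[action], np.nan)
            return self.revealed.copy(), -0.1, False, {}      # small cost per ordered test
        diagnosis = self.DIAGNOSES[action - len(self.LABS)]
        reward = 1.0 if diagnosis == self.label else -1.0
        return self.revealed.copy(), reward, True, {}

# Usage: env = AnemiaDiagnosisEnv({"hemoglobin": 9.1, "mcv": 72.0}, "iron_deficiency")
```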
ABSTRACT
Urban traffic congestion poses significant economic and environmental challenges worldwide. To mitigate these issues, Adaptive Traffic Signal Control (ATSC) has emerged as a promising solution. Recent advancements in deep reinforcement learning (DRL) have further enhanced ATSC's capabilities. This paper introduces a novel DRL-based ATSC approach named the Sequence Decision Transformer (SDT), which combines DRL with attention mechanisms and leverages the capabilities of sequence decision models, akin to those used in natural language processing, adapted here to tackle the complexities of urban traffic management. Firstly, the ATSC problem is modeled as a Markov Decision Process (MDP), with the observation space, action space, and reward function carefully defined. Subsequently, we propose SDT, specifically tailored to solve the MDP problem. The SDT model uses a transformer-based architecture with an encoder and decoder in an actor-critic structure. The encoder processes observations and outputs both encoded data for the decoder and value estimates for parameter updates. The decoder, as the policy network, outputs the agent's actions. Proximal Policy Optimization (PPO) is used to update the policy network based on historical data, enhancing decision-making in ATSC. This approach significantly reduces training times, effectively manages larger observation spaces, captures dynamic changes in traffic conditions more accurately, and enhances traffic throughput. Finally, the SDT model is trained and evaluated in synthetic scenarios by comparing the number of vehicles, average speed, and queue length against three baselines, including PPO, a DQN tailored for ATSC, and FRAP, a state-of-the-art ATSC algorithm. SDT shows improvements of 26.8%, 150%, and 21.7% over traditional ATSC algorithms, and 18%, 30%, and 15.6% over FRAP. This research underscores the potential of integrating Large Language Models (LLMs) with DRL for traffic management, offering a promising solution to urban congestion.
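The following sketch illustrates the kind of transformer encoder-decoder actor-critic described above: the encoder produces both a value estimate for PPO updates and a memory for the decoder, which acts as the policy network. Dimensions, the single learned decoder query, and the head layouts are assumptions for illustration, not the exact SDT architecture.

```python
# Sketch of an encoder-decoder actor-critic in the spirit of SDT.
# Sizes, the single-token decoder query, and the head layouts are assumptions.
import torch
import torch.nn as nn

class EncoderDecoderActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.value_head = nn.Linear(d_model, 1)           # critic: state value for PPO updates
        self.policy_head = nn.Linear(d_model, n_actions)  # actor: signal-phase logits
        self.query = nn.Parameter(torch.zeros(1, 1, d_model))  # learned decoder query token

    def forward(self, obs_seq):                 # obs_seq: (batch, seq_len, obs_dim)
        memory = self.encoder(self.embed(obs_seq))
        value = self.value_head(memory.mean(dim=1))           # pooled encoder output -> V(s)
        dec_out = self.decoder(self.query.expand(obs_seq.size(0), -1, -1), memory)
        logits = self.policy_head(dec_out.squeeze(1))          # action distribution logits
        return logits, value

# Usage: logits, value = EncoderDecoderActorCritic(8, 4)(torch.randn(2, 12, 8))
```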
ABSTRACT
This paper investigates the single agile optical satellite scheduling problem, which has received increasing attention due to the rapid growth in earth observation requirements. Owing to the complicated constraints and considerable solution space of this problem, conventional exact methods and heuristic methods, which are sensitive to the problem scale, incur high computational expense. Thus, an efficient approach is needed to solve this problem, and this paper proposes a deep reinforcement learning algorithm with a local attention mechanism. A mathematical model is first established to describe this problem, which considers a series of complex constraints and takes the profit ratio of completed tasks as the optimization objective. Then, a neural network framework with an encoder-decoder structure is adopted to generate high-quality solutions, and a local attention mechanism is designed to improve the generation of solutions. In addition, an adaptive learning rate strategy is proposed to guide the actor-critic training algorithm to dynamically adjust the learning rate during training to enhance the training effectiveness of the proposed network. Finally, extensive experiments verify that the proposed algorithm outperforms the comparison algorithms in terms of solution quality, generalization performance, and computational efficiency.
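As a rough illustration of an adaptive learning rate strategy of the kind mentioned above, the snippet below shrinks or grows the optimizer's learning rate based on whether the recent average reward (here, the profit ratio) is improving. The window, factors, and bounds are assumptions, not the paper's exact rule.

```python
# Illustrative adaptive learning-rate rule for actor-critic training: shrink the
# learning rate when the recent average reward stops improving, grow it slightly
# when it improves. Thresholds and factors are assumptions, not the paper's strategy.
import torch

def adapt_learning_rate(optimizer, recent_rewards, window=50,
                        shrink=0.9, grow=1.05, min_lr=1e-5, max_lr=1e-3):
    """Adjust the optimizer's learning rate from the trend of recent episode rewards."""
    if len(recent_rewards) < 2 * window:
        return
    prev = sum(recent_rewards[-2 * window:-window]) / window
    curr = sum(recent_rewards[-window:]) / window
    factor = grow if curr > prev else shrink
    for group in optimizer.param_groups:
        group["lr"] = float(min(max(group["lr"] * factor, min_lr), max_lr))

# Usage (hypothetical): call adapt_learning_rate(optimizer, episode_rewards) after each episode.
```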
ABSTRACT
Treatment planning for chronic diseases is a critical task in medical artificial intelligence, particularly in traditional Chinese medicine (TCM). However, generating optimized sequential treatment strategies for patients with chronic diseases in different clinical encounters remains a challenging issue that requires further exploration. In this study, we proposed a TCM herbal prescription planning framework based on deep reinforcement learning for chronic disease treatment (PrescDRL). PrescDRL is a sequential herbal prescription optimization model that focuses on long-term effectiveness rather than achieving maximum reward at every step, thereby ensuring better patient outcomes. We constructed a high-quality benchmark dataset for sequential diagnosis and treatment of diabetes and evaluated PrescDRL against this benchmark. Our results showed that PrescDRL achieved a higher curative effect, with the single-step reward improving by 117% and 153% compared to doctors. Furthermore, PrescDRL outperformed the benchmark in prescription prediction, with precision improving by 40.5% and recall improving by 63%. Overall, our study demonstrates the potential of using artificial intelligence to improve clinical intelligent diagnosis and treatment in TCM.
ABSTRACT
With the rapid advancement of drone technology, the efficient distribution of drones has garnered significant attention. Central to this discourse is the energy consumption of drones, a critical metric for assessing energy-efficient distribution strategies. Accordingly, this study delves into the energy consumption factors affecting drone distribution. A primary challenge in drone distribution lies in devising optimal, energy-efficient routes for drones. However, traditional routing algorithms, predominantly heuristic-based, exhibit certain limitations. These algorithms often rely on heuristic rules and expert knowledge, which can constrain their ability to escape local optima. Motivated by these shortcomings, we propose a novel multi-agent deep reinforcement learning algorithm that integrates a drone energy consumption model, namely EMADRL. The EMADRL algorithm first formulates the drone routing problem within a multi-agent reinforcement learning framework. It then designs a policy network model comprising multiple agent networks, tailored to address the node adjacency and masking complexities typical of multi-depot vehicle routing problems. Training utilizes policy gradient algorithms and attention mechanisms. Furthermore, local and sampling search strategies are introduced to enhance solution quality. Extensive experimentation demonstrates that EMADRL consistently achieves high-quality solutions swiftly. A comparative analysis against contemporary algorithms reveals EMADRL's superior energy efficiency, with average energy savings of 5.96% and maximum savings reaching 12.45%. Thus, this approach offers a promising new avenue for optimizing energy consumption in last-mile distribution scenarios.
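The masking idea referenced above can be sketched as follows: before sampling the next node, the policy excludes nodes that were already visited or that are infeasible under the drone's remaining energy budget. The energy-feasibility test and variable names are illustrative assumptions.

```python
# Sketch of node masking for a routing policy: already-visited nodes and nodes
# beyond the remaining battery budget are excluded before sampling the next node.
import torch

def masked_next_node(scores, visited, remaining_energy, energy_to_node):
    """scores: (n_nodes,) raw attention scores; returns a sampled feasible node index."""
    infeasible = visited | (energy_to_node > remaining_energy)   # boolean mask per node
    masked_scores = scores.masked_fill(infeasible, float("-inf"))
    probs = torch.softmax(masked_scores, dim=-1)
    return torch.multinomial(probs, 1).item()

# Usage with made-up numbers:
scores = torch.randn(5)
visited = torch.tensor([True, False, False, True, False])
energy_to_node = torch.tensor([1.0, 2.5, 0.8, 1.2, 3.0])
print(masked_next_node(scores, visited, remaining_energy=2.0, energy_to_node=energy_to_node))
```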
ABSTRACT
Improving the success rate of autonomous underwater vehicle (AUV) path planning while reducing travel time as much as possible is a challenging and crucial problem in practical AUV applications in complex ocean current environments. Traditional reinforcement learning algorithms lack exploration of the environment, and the strategies learned by the agent may not generalize well to other, different environments. To address these challenges, we propose a novel AUV path planning algorithm named the Noisy Dueling Double Deep Q-Network (ND3QN) algorithm, which generalizes the traditional D3QN algorithm by modifying the reward function and introducing a noisy network. In simulation experiments conducted on realistic terrain and ocean currents, and compared with classical algorithms [e.g., Rapidly-exploring Random Trees Star (RRT*), DQN, and D3QN], the proposed ND3QN algorithm demonstrates a higher success rate of AUV path planning, shorter travel time, and smoother paths.
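For readers unfamiliar with the components of ND3QN, the compact sketch below shows the three standard ingredients it builds on: a noisy linear layer for exploration, a dueling value/advantage head, and the double-DQN target. Layer sizes and the noise initialization are common defaults, not the paper's settings.

```python
# Compact sketch of the ingredients behind ND3QN: a noisy linear layer (exploration
# without epsilon-greedy), a dueling head, and the double-DQN target.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    def __init__(self, in_f, out_f, sigma0=0.5):
        super().__init__()
        self.mu_w = nn.Parameter(torch.empty(out_f, in_f).uniform_(-1, 1) / in_f ** 0.5)
        self.sigma_w = nn.Parameter(torch.full((out_f, in_f), sigma0 / in_f ** 0.5))
        self.mu_b = nn.Parameter(torch.zeros(out_f))
        self.sigma_b = nn.Parameter(torch.full((out_f,), sigma0 / in_f ** 0.5))

    def forward(self, x):  # fresh independent Gaussian noise each call
        eps_w = torch.randn_like(self.sigma_w)
        eps_b = torch.randn_like(self.sigma_b)
        return F.linear(x, self.mu_w + self.sigma_w * eps_w, self.mu_b + self.sigma_b * eps_b)

class DuelingQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = NoisyLinear(hidden, 1)              # V(s)
        self.advantage = NoisyLinear(hidden, n_actions)  # A(s, a)

    def forward(self, obs):
        h = self.body(obs)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)      # dueling aggregation

def double_dqn_target(online, target, next_obs, reward, done, gamma=0.99):
    # Online net selects the next action, target net evaluates it (double DQN).
    best = online(next_obs).argmax(dim=-1, keepdim=True)
    q_next = target(next_obs).gather(-1, best).squeeze(-1)
    return reward + gamma * (1.0 - done) * q_next
```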
ABSTRACT
Data transmission for Virtual Reality (VR) plays an important role in delivering a powerful VR experience, and it places increasing demands on both high bandwidth and low latency. Emerging 6G technologies such as Software-Defined Networking (SDN) and resource slicing are promising for addressing the transmission requirements of VR users. Efficient resource management becomes essential to ensure a satisfactory user experience. The integration of Deep Reinforcement Learning (DRL) allows for dynamic network resource balancing, minimizing communication latency and maximizing data transmission rates wirelessly. Employing slicing techniques further aids in managing distributed resources across the network for different services such as enhanced Mobile Broadband (eMBB) and Ultra-Reliable and Low Latency Communications (URLLC). The proposed VR-based SDN system model for 6G cellular networks facilitates centralized administration of resources, enhancing communication between VR users. This solution seeks to contribute to the effective and streamlined resource management essential for VR video transmission in 6G cellular networks. The utilization of DRL approaches is presented as an alternative solution, showcasing significant performance and feature distinctions through comparative results. Our results show that implementing DRL-based strategies leads to a considerable improvement in the resource management process, as well as a higher achievable data rate and lower latency, in dynamic and large-scale networks.
ABSTRACT
Enhancing patient response to immune checkpoint inhibitors (ICIs) is crucial in cancer immunotherapy. We aim to create a data-driven mathematical model of the tumor immune microenvironment (TIME) and utilize deep reinforcement learning (DRL) to optimize patient-specific ICI therapy combined with chemotherapy (ICC). Using patients' genomic and transcriptomic data, we develop an ordinary differential equations (ODEs)-based TIME dynamic evolutionary model to characterize interactions among chemotherapy, ICIs, immune cells, and tumor cells. A DRL agent is trained to determine the personalized optimal ICC therapy. Numerical experiments with real-world data demonstrate that the proposed TIME model can predict ICI therapy response. The DRL-derived personalized ICC therapy outperforms predefined fixed schedules. For tumors with extremely low CD8+ T cell infiltration ('extremely cold tumors'), the DRL agent recommends high-dosage chemotherapy alone. For tumors with higher CD8+ T cell infiltration ('cold' and 'hot tumors'), an appropriate chemotherapy dosage induces CD8+ T cell proliferation, enhancing ICI therapy outcomes. Specifically, for 'hot tumors', chemotherapy and ICI are administered simultaneously, while for 'cold tumors', a mid-dosage of chemotherapy makes the TIME 'hotter' before ICI administration. However, in several 'cold tumors' with rapid resistant tumor cell growth, ICC eventually fails. This study highlights the potential of utilizing real-world clinical data and a DRL algorithm to develop personalized optimal ICC by understanding the complex biological dynamics of a patient's TIME. Our ODE-based TIME dynamic evolutionary model offers a theoretical framework for determining the best use of ICIs, and the proposed DRL agent may guide personalized ICC schedules.
Subject(s)
Immune Checkpoint Inhibitors, Neoplasms, Tumor Microenvironment, Humans, Tumor Microenvironment/immunology, Immune Checkpoint Inhibitors/therapeutic use, Immune Checkpoint Inhibitors/pharmacology, Neoplasms/drug therapy, Neoplasms/immunology, CD8-Positive T-Lymphocytes/immunology, CD8-Positive T-Lymphocytes/drug effects, Precision Medicine, Immunotherapy
ABSTRACT
Crawling robots are a focus of intelligent inspection research, and the main feature of this type of robot is the flexibility of in-plane attitude adjustment. The crawling robot HIT_Spibot is a new type of steam generator heat transfer tube inspection robot with a unique mobility capability different from traditional quadrupedal robots. This paper introduces a hierarchical motion planning approach for HIT_Spibot, aiming to achieve efficient and agile maneuverability. The proposed method integrates three distinct planners to handle complex motion tasks: a nonlinear optimization-based base motion planner, a TOPSIS-based base orientation planner, and a Mask-D3QN (MD3QN) algorithm-based gait motion planner. Initially, the robot's base and foot workspaces were delineated through envelope analysis, followed by trajectory computation using Lagrangian methods. Subsequently, the TOPSIS algorithm was employed to establish an evaluation framework for base orientation planning. Finally, the MD3QN algorithm was trained to select foot-points that facilitate robot movement along predefined paths. Experimental results demonstrated the method's adaptability across diverse tube structures, showcasing robust performance even in environments with random obstacles. Compared to the D3QN algorithm, MD3QN achieved a 100% success rate, enhanced average overall scores by 6.27%, reduced average stride lengths by 39.04%, and attained a stability rate of 58.02%. These results not only validate the effectiveness and practicality of the method but also showcase the significant potential of HIT_Spibot in the field of industrial inspection.
ABSTRACT
BACKGROUND: Improving the controllers of left ventricular assist device (LVAD) technology supporting heart failure (HF) patients has enormous impact, given the high prevalence and mortality of HF in the population. The use of reinforcement learning for control applications in LVAD remains minimally explored. This work introduces a preload-based deep reinforcement learning control for LVAD based on the proximal policy optimization algorithm. METHODS: The deep reinforcement learning control is built upon data derived from a deterministic high-fidelity cardiorespiratory simulator exposed to variations of total blood volume, heart rate, systemic vascular resistance, pulmonary vascular resistance, right ventricular end-systolic elastance, and left ventricular end-systolic elastance, to replicate realistic inter- and intra-patient variability of patients with severe HF supported by LVAD. The deep reinforcement learning control obtained in this work is trained to avoid ventricular suction and allow aortic valve opening by using left ventricular pressure signals: end-diastolic pressure, maximum pressure in the left ventricle (LV), and maximum pressure in the aorta. RESULTS: The results show that the controller obtained in this work, compared to the constant-speed LVAD alternative, ensures a more stable end-diastolic volume (EDV), with standard deviations of 5 mL and 9 mL, respectively, and a higher degree of aortic flow, with average flows of 1.1 L/min and 0.9 L/min, respectively. CONCLUSION: This work implements a deep reinforcement learning controller in a high-fidelity cardiorespiratory simulator, resulting in increased flow through the aortic valve and increased EDV stability when compared to a constant-speed LVAD strategy.
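For reference, the proximal policy optimization objective used to train such a controller is typically the clipped surrogate loss sketched below; the clipping coefficient and value-loss weight shown are common defaults rather than values from this work.

```python
# Standard PPO clipped surrogate loss with a value-regression term.
# clip_eps and value_coef are common defaults, not values from the paper.
import torch

def ppo_loss(log_prob, old_log_prob, advantage, value, value_target,
             clip_eps=0.2, value_coef=0.5):
    ratio = torch.exp(log_prob - old_log_prob)                    # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    policy_loss = -torch.min(unclipped, clipped).mean()           # clipped surrogate
    value_loss = (value - value_target).pow(2).mean()              # critic regression
    return policy_loss + value_coef * value_loss
```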
ABSTRACT
State representations considerably accelerate learning speed and improve data efficiency for deep reinforcement learning (DRL), especially for visual tasks. Task-relevant state representations can focus on features relevant to the task and filter out irrelevant elements, further improving performance. However, task-relevant representations are typically obtained through model-based DRL methods, which involve the challenging task of learning a transition function. Moreover, inaccuracies in the learned transition function can lead to performance degradation and negatively impact the learning of the policy. In this paper, to address this issue, we propose a novel method of explainable task-relevant state representation (ETrSR) for model-free DRL that is direct, robust, and does not require learning a transition model. More specifically, the proposed ETrSR first disentangles the features from the states based on the beta variational autoencoder (β-VAE). Then, a reward prediction model is employed to bootstrap these features to be relevant to the task, and the explainable states can be obtained by decoding the task-related features. Finally, we validate our proposed method on the CarRacing environment and various tasks in the DeepMind control suite (DMC), which demonstrates both the explainability, for better understanding of the decision-making process, and the outstanding performance of the proposed method, even in environments with strong distractions.
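The two losses implied by ETrSR can be sketched as follows: a β-VAE reconstruction/KL term that disentangles features, plus a reward-prediction head that bootstraps those features toward task relevance. The encoder/decoder internals and weightings are placeholders, not the paper's exact formulation.

```python
# Sketch of the two training signals: a beta-VAE loss for disentangled latents and
# a reward-prediction loss that pulls the latents toward task relevance.
# The beta value and loss weighting are illustrative assumptions.
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    recon = F.mse_loss(x_recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl          # beta > 1 encourages disentangled latents

def task_relevance_loss(reward_head, z, reward):
    # Predicting the reward from latent z bootstraps z to keep task-relevant factors.
    return F.mse_loss(reward_head(z), reward)
```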
ABSTRACT
In air-to-ground transmission, the lifespan of the network is bounded by the lifespan of the unmanned aerial vehicle (UAV) because of its limited battery capacity. Thus, enhancing energy efficiency and minimizing outages for ground candidates are significant factors for network functionality. UAV-aided transmission can greatly enhance spectrum efficiency and coverage. Because of their flexible deployment and high maneuverability, UAVs can be the best alternative in situations where Internet of Things (IoT) systems, located far from the terrestrial base station, use more energy to attain the essential information rate. It is therefore important to overcome the shortcomings of conventional UAV-aided energy efficiency approaches. This work aims to design an innovative energy efficiency framework for UAV-assisted networks using a reinforcement learning mechanism. Optimizing energy efficiency in the UAV offers better wireless coverage to static and mobile ground users. Existing reinforcement learning techniques optimize the energy efficiency of the system by employing a 2D trajectory mechanism, which mitigates the interference arising in nearby UAV cells. The main objective of the recommended framework is to maximize the energy efficiency of the UAV network by jointly optimizing the UAV 3D trajectory, the energy utilized while accounting for interference, and the number of connected users. Hence, an efficient Adaptive Deep Reinforcement Learning with Novel Loss Function (ADRL-NLF) framework is designed to provide a better energy efficiency rate for the UAV network. Moreover, the parameters of ADRL are tuned using the Hybrid Energy Valley and Hermit Crab (HEVHC) algorithm. Various experimental observations are performed to assess the effectiveness of the recommended energy efficiency model for UAV-based networks over classical energy efficiency frameworks in UAV networks.
ABSTRACT
Biologically inspired jumping robots exhibit exceptional movement capabilities and can quickly overcome obstacles. However, the stability and accuracy of jumping movements are significantly compromised by rapid changes in posture. Here, we propose a stable jumping control algorithm for a locust-inspired jumping robot based on deep reinforcement learning. The algorithm utilizes a training framework comprising two neural network modules (actor network and critic network) to enhance training performance. The framework can control jumping by directly mapping the robot's observations (robot position and velocity, obstacle position, target position, etc.) to its joint torques. The control policy increases randomness and exploration by introducing an entropy term to the policy function. Moreover, we designed a stage incentive mechanism to adjust the reward function dynamically, thereby improving the robot's jumping stability and accuracy. We established a locust-inspired jumping robot platform and conducted a series of jumping experiments in simulation. The results indicate that the robot could perform smooth and non-flip jumps, with the distance error from the target remaining below 3%. The robot consumed 44.6% less energy to travel the same distance by jumping compared with walking. Additionally, the proposed algorithm exhibited a faster convergence rate and improved convergence effects compared with other classical algorithms.
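The two mechanisms named above, an entropy term in the policy objective and a stage-dependent reward, can be illustrated roughly as below. The stage names, bonus values, and distance-based shaping are assumptions, not the paper's exact design.

```python
# Illustrative sketch: entropy-regularized actor loss plus a stage-dependent
# ("stage incentive") reward. All constants and stage names are assumptions.
import torch

def actor_loss_with_entropy(log_prob, advantage, entropy, alpha=0.01):
    # Maximizing entropy keeps the jumping policy exploratory during training.
    return -(log_prob * advantage).mean() - alpha * entropy.mean()

def stage_incentive_reward(stage, dist_to_target, flipped):
    # stage: "takeoff" | "flight" | "landing" (hypothetical discretization)
    reward = -dist_to_target                      # dense shaping toward the target
    if stage == "takeoff":
        reward += 0.1                             # small bonus for leaving the ground
    elif stage == "landing":
        reward += 1.0 if dist_to_target < 0.05 else 0.0
    if flipped:
        reward -= 5.0                             # penalize flipping over
    return reward
```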
ABSTRACT
Due to the radial network structures, small cross-sectional lines, and light loads characteristic of existing AC distribution networks in mountainous areas, the development of active distribution networks (ADNs) in these regions has revealed significant issues with integrating distributed generators (DGs) and consuming renewable energy. Focusing on this issue, this paper proposes a wide-range thyristor-controlled series compensation (TCSC)-based ADN and presents a deep reinforcement learning (DRL)-based optimal operation strategy. This strategy takes into account the complementarity of hydropower, photovoltaic (PV) systems, and energy storage systems (ESSs) to enhance the capacity for consuming renewable energy. In the proposed ADN, a wide-range TCSC connects the sub-networks where PV and hydropower systems are located, with ESSs configured for each renewable energy generation site. The designed wide-range TCSC allows for power reversal and improves power delivery efficiency, providing the conditions for optimal operation. The optimal operation problem is formulated as a Markov decision process (MDP) with a continuous action space and solved using the twin delayed deep deterministic policy gradient (TD3) algorithm. The objective is to maximize the consumption of renewable energy sources (RESs) and minimize line losses by coordinating the charging/discharging of ESSs with the operation mode of the TCSC. The simulation results demonstrate the effectiveness of the proposed method.
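For context, the TD3 update used to solve the continuous-action MDP above combines target policy smoothing, a minimum over twin critics, and delayed actor updates, as sketched below; network classes, noise levels, and update frequencies are illustrative defaults.

```python
# Condensed sketch of a TD3 target computation: target policy smoothing and the
# minimum over twin target critics. Noise levels and limits are common defaults.
import torch

def td3_target(critic1_t, critic2_t, actor_t, next_state, reward, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    noise = (torch.randn_like(actor_t(next_state)) * noise_std).clamp(-noise_clip, noise_clip)
    next_action = (actor_t(next_state) + noise).clamp(-act_limit, act_limit)  # smoothed target action
    q_next = torch.min(critic1_t(next_state, next_action),
                       critic2_t(next_state, next_action))                    # twin-critic minimum
    return reward + gamma * (1.0 - done) * q_next

# The actor is updated only every few critic updates (delayed policy updates), e.g.:
# if step % policy_delay == 0: actor_loss = -critic1(state, actor(state)).mean()
```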
ABSTRACT
Next-generation mobile networks, such as those beyond the 5th generation (B5G) and 6th generation (6G), have diverse network resource demands. Network slicing (NS) and device-to-device (D2D) communication have emerged as promising solutions for network operators. NS is a candidate technology for this scenario, where a single network infrastructure is divided into multiple (virtual) slices to meet different service requirements. Combining D2D and NS can improve spectrum utilization, providing better performance and scalability. This paper addresses the challenging problem of dynamic resource allocation with wireless network slices and D2D communications using deep reinforcement learning (DRL) techniques. More specifically, we propose an approach named DDPG-KRP based on deep deterministic policy gradient (DDPG) with K-nearest neighbors (KNNs) and reward penalization (RP) for undesirable action elimination to determine the resource allocation policy maximizing long-term rewards. The simulation results show that the DDPG-KRP is an efficient solution for resource allocation in wireless networks with slicing, outperforming other considered DRL algorithms.
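The two additions named in DDPG-KRP can be sketched roughly as follows: the actor's continuous output is mapped to its K nearest discrete allocation actions and the critic picks the best of them, while undesirable allocations are discouraged through a reward penalty. K, the penalty value, and the discrete action table are assumptions for illustration.

```python
# Sketch of KNN action mapping plus reward penalization on top of DDPG.
# The action table, critic interface, and penalty are illustrative assumptions.
import torch

def knn_discrete_action(proto_action, action_table, critic, state, k=5):
    """proto_action: continuous actor output; action_table: (N, act_dim) discrete candidates."""
    dists = torch.cdist(proto_action.unsqueeze(0), action_table).squeeze(0)
    knn_idx = dists.topk(k, largest=False).indices                # K nearest discrete actions
    q_vals = torch.stack([critic(state, action_table[i]) for i in knn_idx])
    return knn_idx[q_vals.argmax()].item()                        # refine the choice with the critic

def penalized_reward(base_reward, violates_slice_sla, penalty=1.0):
    # Reward penalization (RP): discourage allocations that break slice requirements.
    return base_reward - penalty if violates_slice_sla else base_reward
```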
ABSTRACT
Vehicle-to-everything (V2X) communication is pivotal in enhancing cooperative awareness in vehicular networks. Typically, awareness is viewed as a vehicle's ability to perceive and share real-time kinematic information. We present a novel definition of awareness in V2X communications, conceptualizing it as a multi-faceted concept involving vehicle detection, tracking, and maintaining their safety distances. To enhance this awareness, we propose a deep reinforcement learning framework for the joint control of beacon rate and transmit power (DRL-JCBRTP). Our DRL-JCBRTP framework integrates LSTM-based actor networks and MLP-based critic networks within the Soft Actor-Critic (SAC) algorithm to effectively learn optimal policies. Leveraging local state information, the DRL-JCBRTP scheme uses an innovative reward function to increase the minimum awareness failure distance. Our SLMLab-Gym-VEINS simulations show that the DRL-JCBRTP scheme outperforms existing beaconing schemes in minimizing awareness failure probability and maximizing awareness distance, ultimately improving driving safety.
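A minimal sketch of an LSTM-based actor that jointly outputs a beacon rate and a transmit power, in the spirit of DRL-JCBRTP, is shown below; the output ranges, hidden sizes, and deterministic tanh squashing are assumptions, and the stochastic sampling, critics, and temperature tuning of full SAC are omitted for brevity.

```python
# Sketch of an LSTM actor emitting a joint (beacon rate, transmit power) action.
# Ranges (Hz, dBm) and sizes are illustrative; a full SAC actor would output a
# distribution and sample with reparameterization.
import torch
import torch.nn as nn

class BeaconPowerActor(nn.Module):
    def __init__(self, obs_dim, hidden=64,
                 rate_range=(1.0, 10.0), power_range=(0.0, 23.0)):  # hypothetical Hz / dBm bounds
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)        # [beacon-rate action, tx-power action]
        self.rate_range, self.power_range = rate_range, power_range

    def forward(self, obs_seq):                 # obs_seq: (batch, time, obs_dim)
        out, _ = self.lstm(obs_seq)
        raw = torch.tanh(self.head(out[:, -1]))            # squash to (-1, 1)
        def scale(x, lo, hi):
            return lo + (x + 1.0) * 0.5 * (hi - lo)
        rate = scale(raw[:, 0], *self.rate_range)
        power = scale(raw[:, 1], *self.power_range)
        return rate, power

# Usage: rate, power = BeaconPowerActor(obs_dim=6)(torch.randn(1, 20, 6))
```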
ABSTRACT
Mobile Edge Computing (MEC) is crucial for reducing latency by bringing computational resources closer to the network edge, thereby enhancing the quality of services (QoS). However, the broad deployment of cloudlets poses challenges in efficient network slicing, particularly when traffic distribution is uneven. These challenges include managing diverse resource requirements across widely distributed cloudlets, minimizing resource conflicts and delays, and maintaining service quality amid fluctuating request rates. Addressing them requires intelligent strategies to predict request types (common or urgent), assess resource needs, and allocate resources efficiently. Emerging technologies like edge computing and 5G with network slicing can handle delay-sensitive IoT requests rapidly, but a robust mechanism for real-time resource and utility optimization remains necessary. To address these challenges, we designed an end-to-end network slicing approach that predicts common and urgent user requests through a T distribution. We formulated our problem as a multi-agent Markov decision process (MDP) and introduced a multi-agent soft actor-critic (MAgSAC) algorithm. This algorithm prevents the wastage of scarce resources by intelligently activating and deactivating virtual network function (VNF) instances, thereby balancing the allocation process. Our approach aims to optimize overall utility, balancing trade-offs between revenue, energy consumption costs, and latency. We evaluated our method, MAgSAC, through simulations, comparing it with the following six benchmark schemes: MAA3C, SACT, DDPG, S2Vec, Random, and Greedy. The results demonstrate that our approach, MAgSAC, optimizes utility by 30%, minimizes energy consumption costs by 12.4%, and reduces execution time by 21.7% compared to the closest related multi-agent approach, MAA3C.
ABSTRACT
Spinal fusion surgery requires highly accurate implantation of pedicle screw implants, which must be conducted in critical proximity to vital structures with a limited view of the anatomy. Robotic surgery systems have been proposed to improve placement accuracy. Despite remarkable advances, current robotic systems still lack advanced mechanisms for continuously updating surgical plans during procedures, which hinders attaining higher levels of robotic autonomy. These systems adhere to conventional rigid registration concepts, relying on the alignment of preoperative planning to the intraoperative anatomy. In this paper, we propose a safe deep reinforcement learning (DRL) planning approach (SafeRPlan) for robotic spine surgery that leverages intraoperative observation for continuous path planning of pedicle screw placement. The main contributions of our method are (1) the capability to ensure safe actions by introducing an uncertainty-aware distance-based safety filter; (2) the ability to compensate for incomplete intraoperative anatomical information by encoding a-priori knowledge of anatomical structures with neural networks pre-trained on pre-operative images; and (3) the capability to generalize over unseen observation noise thanks to novel domain randomization techniques. Planning quality was assessed by quantitative comparison with the baseline approaches and the gold standard (GS), and by qualitative evaluation by expert surgeons. In experiments with human model datasets, our approach was capable of achieving over 5% higher safety rates compared to baseline approaches, even under realistic observation noise. To the best of our knowledge, SafeRPlan is the first safety-aware DRL planning approach specifically designed for robotic spine surgery.
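The uncertainty-aware distance-based safety filter can be sketched as follows: an action is vetoed whenever the predicted distance to a critical structure, reduced by an uncertainty margin, falls below a safety threshold. The predictor interface and thresholds are assumptions, not the SafeRPlan internals.

```python
# Minimal sketch of an uncertainty-aware distance-based safety filter. The
# distance_predictor interface, margin, and k_sigma are illustrative assumptions.
import numpy as np

def safety_filter(candidate_action, distance_predictor, safe_margin_mm=2.0, k_sigma=2.0):
    """distance_predictor(action) -> (mean_dist_mm, std_dist_mm), e.g. from an ensemble."""
    mean_dist, std_dist = distance_predictor(candidate_action)
    worst_case = mean_dist - k_sigma * std_dist          # pessimistic distance estimate
    return candidate_action if worst_case >= safe_margin_mm else None  # None = fall back to a safe action

# Usage with a dummy predictor:
print(safety_filter(np.array([0.1, -0.2, 0.05]), lambda a: (4.5, 0.8)))
```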
ABSTRACT
The task scheduling problem (TSP) is a major challenge in the cloud computing paradigm, as the number of tasks arriving at a cloud application platform varies over time and the tasks have variable lengths and runtime requirements. These tasks may be generated by various heterogeneous resources, and their arrival at the cloud console directly affects the performance of the cloud paradigm by increasing makespan, energy consumption, and resource costs. Traditional task scheduling algorithms cannot handle these types of complex workloads. Many authors have developed task scheduling algorithms using metaheuristic techniques and hybrid approaches, but these yield only near-optimal solutions, and TSP remains a highly challenging and dynamic scenario as it is an NP-hard problem. Therefore, to tackle TSP in the cloud computing paradigm and schedule tasks effectively, we formulate an adaptive task scheduler that segments all tasks arriving at the cloud console into subtasks and feeds them to a scheduler modeled by an Improved Asynchronous Advantage Actor-Critic (IA3C) algorithm to generate schedules. This scheduling process is carried out in two stages. In the first stage, all incoming tasks are segmented into subtasks; these subtasks are then grouped according to their size, execution time, and communication time and fed to the ATSIA3C scheduler. In the second stage, the scheduler checks the above constraints and dispatches the subtasks onto suitably provisioned VMs residing in data centers. The proposed ATSIA3C is simulated in CloudSim. Extensive simulations are conducted using both fabricated worklogs and real-time supercomputing worklogs. Our proposed mechanism is evaluated against baseline algorithms, i.e., RATS-HM, AINN-BPSO, and MOABCQ. The results show that the proposed ATSIA3C outperforms existing task schedulers, improving makespan by 70.49%, resource cost by 77.42%, and energy consumption by 74.24% in a multi-cloud environment.
ABSTRACT
BACKGROUND: Due to the sparse encoding character of the human visual cortex and the scarcity of paired training samples for {images, fMRIs}, voxel selection is an effective means of reconstructing perceived images from fMRI. However, existing data-driven voxel selection methods have not achieved satisfactory results. NEW METHOD: Here, a novel deep reinforcement learning-guided sparse voxel (DRL-SV) decoding model is proposed to reconstruct perceived images from fMRI. We innovatively describe voxel selection as a Markov decision process (MDP), training agents to select voxels that are highly involved in specific visual encoding. RESULTS: Experimental results on two public datasets verify the effectiveness of the proposed DRL-SV, which can accurately select voxels highly involved in neural encoding, thereby improving the quality of visual image reconstruction. COMPARISON WITH EXISTING METHODS: We qualitatively and quantitatively compared our results with state-of-the-art (SOTA) methods, obtaining better reconstruction results. We also compared the proposed DRL-SV with traditional data-driven baseline methods, obtaining sparser voxel selections but better reconstruction performance. CONCLUSIONS: DRL-SV can accurately select voxels involved in visual encoding in few-shot settings, compared to data-driven voxel selection methods. The proposed decoding model provides a new avenue to improving the image reconstruction quality of the primary visual cortex.
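A toy sketch of voxel selection as an MDP, in the spirit described above, is given below: the state is the current selection mask, each action adds one voxel, and the reward is the resulting gain in an encoding/reconstruction score. The scoring function and the random action choice are placeholders for the paper's reconstruction network and trained agent.

```python
# Toy sketch of voxel selection as an MDP. The score function and random action
# choice are placeholders, not the paper's reconstruction network or agent.
import numpy as np

def voxel_selection_episode(n_voxels, score_fn, budget, rng=np.random.default_rng(0)):
    mask = np.zeros(n_voxels, dtype=bool)
    prev_score, rewards = score_fn(mask), []
    for _ in range(budget):
        action = rng.integers(n_voxels)         # stand-in for the trained agent's choice
        mask[action] = True
        score = score_fn(mask)
        rewards.append(score - prev_score)      # reward = improvement from adding the voxel
        prev_score = score
    return mask, rewards

# Usage with a dummy score: voxels in the first half help, the others do not.
mask, rewards = voxel_selection_episode(100, lambda m: float(m[:50].sum() - 0.2 * m[50:].sum()), 10)
```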