Results 1 - 20 of 503
1.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35348602

ABSTRACT

Proteins with desired functions and properties are important in fields such as nanotechnology and biomedicine. De novo protein design enables the production of previously unseen proteins from the ground up and is widely regarded as key to addressing pressing societal challenges. The recent introduction of deep learning into design methods has had a transformative influence and points to a promising and exciting future direction. In this review, we survey the major aspects of current advances in deep-learning-based design procedures and, through notable cases, illustrate their novelty in comparison with conventional knowledge-based approaches. We not only describe deep learning developments in structure-based protein design and direct sequence design but also highlight recent applications of deep reinforcement learning to protein design. Future perspectives on design goals, challenges, and opportunities are also discussed comprehensively.


Subjects
Deep Learning, Knowledge Bases, Proteins
2.
Biotechnol Bioeng ; 121(9): 2868-2880, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38812405

ABSTRACT

Reinforcement learning (RL), a subset of machine learning (ML), could optimize and control biomanufacturing processes, such as improving the production of therapeutic cells. Here, the process of CAR T-cell activation by antigen-presenting beads and their subsequent expansion is formulated in silico. The simulation is used as an environment to train RL-agents to dynamically control the number of beads in culture, maximizing the population of robust effector cells at the end of the culture. The agent makes periodic decisions to incrementally add beads or remove them completely. The simulation is designed to operate in OpenAI Gym, enabling testing of different environments, cell types, RL-agent algorithms, and state inputs to the RL-agent. RL-agent training is demonstrated with three different algorithms (PPO, A2C, and DQN), each sampling three different state input types (tabular, image, mixed); PPO-tabular performs best for this simulation environment. Using this approach, training of the RL-agent on different cell types is demonstrated, resulting in unique control strategies for each type. Sensitivity to input noise (sensor performance), the number of control-step interventions, and the advantages of pre-trained RL-agents are also evaluated. We thus present an RL framework to maximize the population of robust effector cells in CAR T-cell therapy production.
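
A minimal, hypothetical sketch of the setup described above: a Gym-style environment exposing bead addition/removal as discrete actions, trained with PPO via stable-baselines3. The growth dynamics, reward, and all constants are invented placeholders, not the authors' simulator.

```python
# Hypothetical sketch only: a Gym-style bead-control environment trained
# with PPO. The cell-growth model and reward are invented placeholders.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class BeadControlEnv(gym.Env):
    """Toy stand-in for the CAR T-cell activation/expansion simulator."""
    def __init__(self, horizon=14):
        super().__init__()
        # Actions: 0 = no change, 1 = incremental bead addition, 2 = remove all beads
        self.action_space = spaces.Discrete(3)
        # Tabular state: [cell count, bead count, day of culture]
        self.observation_space = spaces.Box(0.0, np.inf, shape=(3,), dtype=np.float32)
        self.horizon = horizon

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.cells, self.beads, self.day = 1e5, 0.0, 0
        return self._obs(), {}

    def _obs(self):
        return np.array([self.cells, self.beads, self.day], dtype=np.float32)

    def step(self, action):
        if action == 1:
            self.beads += 1e5                   # add an increment of beads
        elif action == 2:
            self.beads = 0.0                    # complete bead removal
        stim = self.beads / (self.beads + 1e5)  # saturating bead stimulation
        self.cells *= 1.0 + 0.5 * stim          # placeholder expansion model
        self.day += 1
        terminated = self.day >= self.horizon
        reward = self.cells / 1e6 if terminated else 0.0  # terminal yield reward
        return self._obs(), reward, terminated, False, {}

model = PPO("MlpPolicy", BeadControlEnv(), verbose=0)
model.learn(total_timesteps=50_000)
```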


Subjects
Machine Learning, T-Lymphocytes, T-Lymphocytes/immunology, Humans, Computer Simulation, Lymphocyte Activation, Chimeric Antigen Receptors/immunology, Adoptive Immunotherapy/methods, Cell Culture Techniques/methods
3.
Diabetes Obes Metab ; 26(5): 1555-1566, 2024 May.
Article in English | MEDLINE | ID: mdl-38263540

ABSTRACT

Postprandial glucose control can be challenging for individuals with type 1 diabetes, and this can be attributed to many factors, including suboptimal therapy parameters (carbohydrate ratios, correction factors, basal doses) owing to physiological changes, meal macronutrients, and engagement in postprandial physical activity. This narrative review examines the postprandial glucose-management strategies tested in clinical trials, including adjusting therapy settings, bolusing for meal macronutrients, adjusting pre-exercise and postexercise meal boluses for postprandial physical activity, and other therapeutic options, for individuals on open-loop and closed-loop therapies, and then discusses their challenges and future avenues. Despite advancements in insulin delivery devices such as closed-loop systems and decision-support systems, many individuals with type 1 diabetes still struggle to manage their glucose levels. The main challenge is the lack of personalized recommendations, resulting in suboptimal postprandial glucose control. We suggest that postprandial glucose control can be improved by (i) providing personalized recommendations for meal macronutrients and postprandial activity; (ii) including behavioural recommendations; (iii) using other personalized therapeutic approaches (e.g. glucagon-like peptide-1 receptor agonists, sodium-glucose co-transporter inhibitors, amylin analogues, inhaled insulin) in addition to insulin therapy; and (iv) integrating an interpretability report that explains to individuals the changes in treatment and the behavioural recommendations. In addition, we suggest a future avenue of precision recommendations for individuals with type 1 diabetes that exploits the potential of deep reinforcement learning and foundation models (such as GPT and BERT), employing different data modalities, including diabetes-related and external background factors (i.e. behavioural, environmental, biological, and abnormal events).


Subjects
Diabetes Mellitus, Type 1, Humans, Diabetes Mellitus, Type 1/drug therapy, Glucose/therapeutic use, Blood Glucose, Hypoglycemic Agents/therapeutic use, Artificial Intelligence, Precision Medicine, Insulin/therapeutic use, Postprandial Period
4.
Artif Organs ; 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39289857

ABSTRACT

BACKGROUND: Improving the controllers of left ventricular assist devices (LVADs) supporting heart failure (HF) patients would have enormous impact, given the high prevalence and mortality of HF in the population. The use of reinforcement learning for control applications in LVADs remains minimally explored. This work introduces a preload-based deep reinforcement learning control for LVADs based on the proximal policy optimization algorithm. METHODS: The deep reinforcement learning control is built upon data derived from a deterministic high-fidelity cardiorespiratory simulator exposed to variations of total blood volume, heart rate, systemic vascular resistance, pulmonary vascular resistance, right ventricular end-systolic elastance, and left ventricular end-systolic elastance, to replicate realistic inter- and intra-patient variability of patients with severe HF supported by an LVAD. The deep reinforcement learning control is trained to avoid ventricular suction and allow aortic valve opening by using left ventricular pressure signals: end-diastolic pressure, maximum pressure in the left ventricle (LV), and maximum pressure in the aorta. RESULTS: The results show that the controller obtained in this work, compared with the constant-speed LVAD alternative, ensures a more stable end-diastolic volume (EDV), with standard deviations of 5 mL versus 9 mL, and a higher aortic flow, averaging 1.1 L/min versus 0.9 L/min. CONCLUSION: This work implements a deep reinforcement learning controller in a high-fidelity cardiorespiratory simulator, increasing flow through the aortic valve and EDV stability compared with a constant-speed LVAD strategy.
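
As a concrete illustration of the training objective, one plausible reward shaping for "avoid suction, allow aortic valve opening" from the three pressure signals named above might look as follows; the thresholds and weights are our assumptions, not values from the paper.

```python
# Illustrative reward shaping only; thresholds and weights are assumptions.
def lvad_reward(p_ed: float, p_lv_max: float, p_ao_max: float,
                suction_threshold: float = 1.0) -> float:
    """p_ed: end-diastolic pressure (mmHg); p_lv_max / p_ao_max: peak
    left-ventricular / aortic pressures over the last cardiac cycle."""
    reward = 0.0
    if p_ed < suction_threshold:   # end-diastolic pressure too low: suction risk
        reward -= 10.0
    if p_lv_max > p_ao_max:        # LV pressure exceeds aortic: valve opens
        reward += 1.0
    return reward
```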

5.
Risk Anal ; 2024 Aug 11.
Article in English | MEDLINE | ID: mdl-39128862

ABSTRACT

Urban flooding is among the costliest natural disasters worldwide. Timely and effective rescue path planning is crucial for minimizing loss of life and property. However, current research on path planning often fails to adequately consider the need to assess area risk uncertainties and bypass complex obstacles in flood rescue scenarios, which makes developing optimal rescue paths difficult. This study proposes a deep reinforcement learning (RL) algorithm incorporating four main mechanisms to address these issues. Dual-priority experience replay and a backtrack punishment mechanism enhance the precise estimation of area risks. Concurrently, random noisy networks and dynamic exploration techniques encourage the agent to explore unknown areas of the environment, improving sampling and optimizing strategies for bypassing complex obstacles. The study constructed multiple grid simulation scenarios based on real-world rescue operations in major urban flood disasters. These scenarios included uncertain risk values for all passable areas and an increased presence of complex elements, such as narrow passages, C-shaped barriers, and jagged paths, significantly raising the difficulty of path planning. The comparative analysis demonstrated that only the proposed algorithm could bypass all obstacles and plan the optimal rescue path across all nine scenarios. This research advances the theory of urban flood rescue path planning by extending the scale of scenarios to unprecedented levels. It also develops RL mechanisms adaptable to a variety of extremely complex obstacles in path planning, and it provides methodological insights into using artificial intelligence to enhance real-world risk management.
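
For intuition, a toy version of the backtrack punishment idea on a risk grid might look like the following sketch; the movement model, penalty value, and risk costs are illustrative assumptions.

```python
# Illustrative sketch of "backtrack punishment": penalize revisiting cells
# so the agent's value estimates of area risk stay informative. The grid,
# penalty, and risk values are invented for illustration.
import numpy as np

def step_with_backtrack_penalty(pos, action, visited, risk_grid,
                                backtrack_penalty=-0.5):
    moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up/down/left/right
    dr, dc = moves[action]
    r = min(max(pos[0] + dr, 0), risk_grid.shape[0] - 1)    # clamp to grid
    c = min(max(pos[1] + dc, 0), risk_grid.shape[1] - 1)
    reward = -risk_grid[r, c]          # uncertain area risk enters as a cost
    if (r, c) in visited:
        reward += backtrack_penalty    # punish revisiting an explored cell
    visited.add((r, c))
    return (r, c), reward

risk_grid = np.random.rand(10, 10)
visited = {(0, 0)}
pos, reward = step_with_backtrack_penalty((0, 0), 3, visited, risk_grid)
```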

6.
Sensors (Basel) ; 24(18)2024 Sep 18.
Article in English | MEDLINE | ID: mdl-39338773

ABSTRACT

Due to the radial network structures, small cross-sectional lines, and light loads characteristic of existing AC distribution networks in mountainous areas, the development of active distribution networks (ADNs) in these regions faces significant issues with integrating distributed generation (DG) and consuming renewable energy. To address this issue, this paper proposes a wide-range thyristor-controlled series compensation (TCSC)-based ADN and presents a deep reinforcement learning (DRL)-based optimal operation strategy. The strategy exploits the complementarity of hydropower, photovoltaic (PV) systems, and energy storage systems (ESSs) to enhance the capacity for consuming renewable energy. In the proposed ADN, a wide-range TCSC connects the sub-networks where the PV and hydropower systems are located, with an ESS configured for each renewable energy generator. The designed wide-range TCSC allows power reversal and improves power delivery efficiency, providing the conditions for optimized operation. The optimal operation problem is formulated as a Markov decision process (MDP) with a continuous action space and solved using the twin delayed deep deterministic policy gradient (TD3) algorithm. The objective is to maximize the consumption of renewable energy sources (RESs) and minimize line losses by coordinating the charging/discharging of the ESSs with the operation mode of the TCSC. The simulation results demonstrate the effectiveness of the proposed method.
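
A minimal TD3 training sketch for a continuous-action control problem of this kind, using stable-baselines3 with a standard Gymnasium task as a stand-in, since the paper's ADN simulator is not public:

```python
# Generic TD3 setup for a continuous-action MDP; Pendulum-v1 is only a
# stand-in for the ADN dispatch environment described in the abstract.
import numpy as np
import gymnasium as gym
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("Pendulum-v1")
n_actions = env.action_space.shape[0]
# Exploration noise on the deterministic policy, as is usual for TD3.
noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
model = TD3("MlpPolicy", env, action_noise=noise, learning_rate=3e-4, verbose=0)
model.learn(total_timesteps=20_000)
```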

7.
Sensors (Basel) ; 24(13)2024 Jun 29.
Article in English | MEDLINE | ID: mdl-39001014

ABSTRACT

The segmented-mirror co-phase error identification technique based on supervised learning has, compared with other methods, the advantages of simple application conditions, no dependence on custom sensors, fast calculation, and low computing-power requirements. However, it is often difficult to achieve high accuracy with this method in practice because the training model differs from the actual model. A reinforcement learning algorithm does not need to model the real system it operates on, yet it retains the advantages of supervised learning. In this paper, we therefore placed a mask on the pupil plane of the segmented telescope's optical system and, drawing on the wide spectrum, point spread function, and modulation transfer function of the optical system, used deep reinforcement learning, without modeling the optical system, to propose a large-range, high-precision automatic piston-error co-phasing method with multiple-submirror parallelization. Simulation experiments indicate that the method is effective.

8.
Sensors (Basel) ; 24(13)2024 Jul 04.
Article in English | MEDLINE | ID: mdl-39001116

ABSTRACT

This study investigates the dynamic deployment of unmanned aerial vehicles (UAVs) using edge computing in a forest fire scenario, considering the dynamically changing characteristics of forest fires and the correspondingly varying resource requirements. Accordingly, this paper models a two-timescale UAV dynamic deployment scheme that accounts for dynamic changes in both the number and the positions of UAVs. On the slow timescale, we use a gated recurrent unit (GRU) to predict the number of future users and determine the number of UAVs based on the resource requirements; UAVs with low energy are replaced accordingly. On the fast timescale, a deep-reinforcement-learning-based UAV position deployment algorithm is designed to enable low-latency processing of computational tasks by adjusting the UAV positions in real time to meet the ground devices' computational demands. The simulation results demonstrate that the proposed scheme achieves better prediction accuracy and that the number and positions of the UAVs adapt to changes in resource demand while reducing task execution delays.

9.
Sensors (Basel) ; 24(16)2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39204780

ABSTRACT

A map of the environment is the basis for a robot's navigation. Multi-robot collaborative autonomous exploration allows maps of unknown environments to be constructed rapidly, which is essential for application areas such as search and rescue missions. Traditional autonomous exploration methods are inefficient because of repeated exploration of the same areas. For this reason, we propose a multi-robot autonomous exploration method based on the Transformer model, combined with multi-agent deep reinforcement learning to effectively improve exploration efficiency. We conducted experiments comparing the proposed method with existing methods in a simulation environment; the results showed that it performs well and exhibits a degree of generalization ability.

10.
Sensors (Basel) ; 24(7)2024 Mar 31.
Article in English | MEDLINE | ID: mdl-38610458

ABSTRACT

With the growing maritime economy, ensuring the quality of communication for maritime users has become imperative. A maritime communication system based on nearshore base stations enhances the communication rate of maritime users through dynamic resource allocation. A virtual-queue-based deep reinforcement learning beam allocation scheme is proposed in this paper, aiming to maximize the communication rate. In particular, to reduce the complexity of resource management, we employ a grid-based method to discretize the maritime environment. The combinatorial optimization problem of grid and beam allocation under unknown channel state information is modeled as a sequential decision process of resource allocation. The nearshore base station is modeled as a learning agent that continuously interacts with the environment to optimize beam allocation using deep reinforcement learning. Furthermore, the virtual queue method guarantees that grids with poor channel state information are still serviced. Finally, simulation results show that the proposed beam allocation scheme increases the communication rate.
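
The virtual-queue mechanism, as commonly formulated in Lyapunov-style resource allocation, can be sketched as a simple backlog update; the variable names are illustrative, not taken from the paper.

```python
# Generic virtual-queue backlog update: a grid that is under-served
# accumulates backlog, which raises its scheduling priority over time and
# guarantees poorly served grids are eventually allocated a beam.
def update_virtual_queue(q: float, required_rate: float, served_rate: float) -> float:
    """q: current backlog; rates in the same units (e.g., bits/slot)."""
    return max(q + required_rate - served_rate, 0.0)

q = 0.0
for served in [0.2, 0.1, 1.5]:          # example service history for one grid
    q = update_virtual_queue(q, required_rate=0.5, served_rate=served)
```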

11.
Sensors (Basel) ; 24(18)2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39338825

ABSTRACT

Next-generation mobile networks, such as those beyond the 5th generation (B5G) and 6th generation (6G), have diverse network resource demands. Network slicing (NS) and device-to-device (D2D) communication have emerged as promising solutions for network operators. NS is a candidate technology for this scenario, in which a single network infrastructure is divided into multiple (virtual) slices to meet different service requirements. Combining D2D and NS can improve spectrum utilization, providing better performance and scalability. This paper addresses the challenging problem of dynamic resource allocation with wireless network slices and D2D communications using deep reinforcement learning (DRL) techniques. More specifically, we propose an approach named DDPG-KRP, based on the deep deterministic policy gradient (DDPG) with K-nearest neighbors (KNN) and reward penalization (RP) for eliminating undesirable actions, to determine a resource allocation policy that maximizes long-term rewards. The simulation results show that DDPG-KRP is an efficient solution for resource allocation in wireless networks with slicing, outperforming the other DRL algorithms considered.
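
Reading "DDPG with KNN and reward penalization" in the usual Wolpertinger-style way, the discrete-action selection step might look like the sketch below; the details are our assumptions, not the authors' exact procedure.

```python
# Assumed sketch: map the DDPG actor's continuous proto-action onto the k
# nearest valid discrete allocations, penalize undesirable candidates, and
# pick the best remaining one by Q-value.
import numpy as np

def knn_action_selection(proto_action, discrete_actions, q_value_fn,
                         k=5, penalty=-1e3, is_undesirable=None):
    """discrete_actions: (N, d) array of valid resource-allocation vectors."""
    dists = np.linalg.norm(discrete_actions - proto_action, axis=1)
    candidates = np.argsort(dists)[:k]                 # k nearest neighbors
    scores = np.array([q_value_fn(discrete_actions[i]) for i in candidates])
    if is_undesirable is not None:                     # reward penalization
        mask = np.array([is_undesirable(discrete_actions[i]) for i in candidates])
        scores = np.where(mask, penalty, scores)
    return discrete_actions[candidates[np.argmax(scores)]]
```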

12.
Sensors (Basel) ; 24(19)2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39409435

ABSTRACT

This paper investigates the single agile optical satellite scheduling problem, which has received increasing attention due to the rapid growth in Earth observation requirements. Owing to the complicated constraints and considerable solution space of this problem, conventional exact methods and heuristic methods, which are sensitive to the problem scale, incur high computational expense. An efficient approach is therefore needed, and this paper proposes a deep reinforcement learning algorithm with a local attention mechanism. A mathematical model is first established to describe the problem, considering a series of complex constraints and taking the profit ratio of completed tasks as the optimization objective. A neural network with an encoder-decoder structure is then adopted to generate high-quality solutions, with a local attention mechanism designed to improve solution generation. In addition, an adaptive learning rate strategy is proposed to guide the actor-critic training algorithm in dynamically adjusting the learning rate during training, enhancing the training effectiveness of the proposed network. Finally, extensive experiments verify that the proposed algorithm outperforms the comparison algorithms in terms of solution quality, generalization performance, and computational efficiency.
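
One plausible reading of an adaptive learning-rate strategy is to shrink the rate when the training signal plateaus, for example with PyTorch's ReduceLROnPlateau; the trigger metric and hyperparameters below are assumptions, not the paper's scheme.

```python
# Assumed sketch: adapt the actor-critic learning rate to training progress
# by halving it when the mean episode reward stops improving.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=20)

def train_one_epoch() -> float:
    """Stand-in for the actor-critic update; returns mean episode reward."""
    return 0.0

for epoch in range(100):
    scheduler.step(train_one_epoch())   # reward plateau triggers LR decay
```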

13.
Sensors (Basel) ; 24(13)2024 Jun 26.
Article in English | MEDLINE | ID: mdl-39000919

ABSTRACT

Reinforcement Learning (RL) methods are regarded as effective for designing autonomous driving policies. However, even when RL policies are trained to convergence, ensuring their robust safety remains a challenge, particularly on long-tail data, so RL-based decision-making must adequately account for potential shifts in the data distribution. This paper presents a framework for highway autonomous driving decisions that prioritizes both safety and robustness. Utilizing the proposed Replay Buffer Constrained Policy Optimization (RECPO) method, this framework updates RL strategies to maximize rewards while ensuring that the policies always remain within safety constraints. We incorporate importance sampling techniques to collect and store data in a replay buffer during agent operation, allowing data from old policies to be reused when training new policy models and thus mitigating potential catastrophic forgetting. Additionally, we transform the highway autonomous driving decision problem into a Constrained Markov Decision Process (CMDP) and apply the proposed RECPO for training, optimizing highway driving policies. Finally, we deploy our method in the CARLA simulation environment and compare its performance in typical highway scenarios against traditional CPO, current advanced strategies based on Deep Deterministic Policy Gradient (DDPG), and IDM + MOBIL (Intelligent Driver Model and the model for minimizing overall braking induced by lane changes). The results show that our framework significantly enhances model convergence speed, safety, and decision-making stability, achieving a zero-collision rate in highway autonomous driving.
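
RECPO itself is not public, but the CMDP formulation it builds on is standard; a generic Lagrangian-relaxation sketch of "maximize reward subject to a safety-cost budget" is shown below. This is the textbook dual-ascent view, not the authors' exact update.

```python
# Generic CMDP sketch: dual ascent on a Lagrange multiplier that trades off
# reward against a safety cost (e.g., collision risk) with budget d.
def lagrangian_update(lmbda: float, avg_cost: float, cost_budget: float,
                      lr_lambda: float = 0.01) -> float:
    """Increase lambda when cost exceeds its budget, relax it otherwise."""
    return max(lmbda + lr_lambda * (avg_cost - cost_budget), 0.0)

def shaped_reward(reward: float, cost: float, lmbda: float) -> float:
    """Penalized return that the policy update then maximizes."""
    return reward - lmbda * cost
```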

14.
Sensors (Basel) ; 24(14)2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39066074

ABSTRACT

Edge servers frequently manage their own offline digital twin (DT) services, in addition to caching online digital twin services. However, current research often overlooks the impact of offline caching services on memory and computation resources, which can hinder the efficiency of online service task processing on edge servers. In this study, we concentrated on service caching and task offloading within a collaborative edge computing system by emphasizing the integrated quality of service (QoS) for both online and offline edge services. We considered the resource usage of both online and offline services, along with incoming online requests. To maximize the overall QoS utility, we established an optimization objective that rewards the throughput of online services while penalizing offline services that miss their soft deadlines. We formulated this as a utility maximization problem, which was proven to be NP-hard. To tackle this complexity, we reframed the optimization problem as a Markov decision process (MDP) and introduced a joint optimization algorithm for service caching and task offloading by leveraging the deep Q-network (DQN). Comprehensive experiments revealed that our algorithm enhanced the utility by at least 14.01% compared with the baseline algorithms.

15.
Sensors (Basel) ; 24(16)2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39205064

ABSTRACT

This study proposes a method named Hybrid Heuristic Proximal Policy Optimization (HHPPO) for online 3D bin-packing tasks. The method integrates heuristic bin-packing algorithms with the Proximal Policy Optimization (PPO) deep reinforcement learning algorithm. On the heuristic side, an extreme-point priority sorting method ranks the generated extreme points by their wasted space to improve space utilization. In addition, a 3D grid representation of the container's space status is used, and partial support constraints are proposed to increase the possibilities for stacking objects and enhance overall space utilization. In the PPO algorithm, the heuristics are integrated, and the reward function and the action space of the policy network are designed so that the proposed method can effectively complete the online 3D bin-packing task. Experimental results show that the proposed method performs well on online 3D bin-packing tasks in simulation. In addition, a vision-based environment is constructed to show that the proposed method enables a real robot manipulator to complete the bin-packing task successfully and effectively.
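
A hypothetical sketch of the extreme-point priority sorting idea: rank feasible placement corners by the waste they would create, smallest first. The waste metric here is a simple assumption, not the paper's definition.

```python
# Assumed sketch of extreme-point priority sorting for 3D bin packing.
def sort_extreme_points(extreme_points, item_dims, container, waste_fn=None):
    """extreme_points: list of (x, y, z) candidate corners;
    item_dims: (w, d, h) of the item; container: (W, D, H)."""
    def default_waste(pt):
        # Leftover volume between the placed item and the container walls;
        # a crude proxy for wasted space, used only for illustration.
        return ((container[0] - (pt[0] + item_dims[0]))
                * (container[1] - (pt[1] + item_dims[1]))
                * (container[2] - (pt[2] + item_dims[2])))
    waste = waste_fn or default_waste
    feasible = [p for p in extreme_points
                if all(p[i] + item_dims[i] <= container[i] for i in range(3))]
    return sorted(feasible, key=waste)   # smallest waste first

points = sort_extreme_points([(0, 0, 0), (5, 0, 0)], (2, 2, 2), (10, 10, 10))
```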

16.
Sensors (Basel) ; 24(4)2024 Feb 17.
Article in English | MEDLINE | ID: mdl-38400445

ABSTRACT

With the advent of the IoT, cities will soon be populated by autonomous vehicles and managed by intelligent systems capable of actively interacting with city infrastructures and vehicles. In this work, we propose a model based on reinforcement learning that teaches autonomous connected vehicles how to save resources while navigating in such an environment. In particular, we focus on budget savings in the context of auction-based intersection management systems. We trained several models with deep Q-learning under varying traffic conditions to find the variant offering the best trade-off between saved currency and trip times. Afterward, we compared the performance of our model with previously proposed and random strategies, including under adverse traffic conditions. Our model appears to be robust and saves a considerable amount of currency without significantly increasing waiting time in traffic. For example, the learner bidder saves at least 20% of its budget in heavy traffic and up to 74% in lighter traffic with respect to a standard bidder, and around three times as much as a random bidder. The results and discussion suggest that the proposal could be adopted in a foreseeable real-life scenario.

17.
Sensors (Basel) ; 24(4)2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38400483

ABSTRACT

Optimizing jamming strategies is crucial for enhancing the performance of cognitive jamming systems in dynamic electromagnetic environments. The emergence of frequency-agile radars, capable of changing the carrier frequency within or between pulses, poses significant challenges for a jammer trying to make intelligent decisions and adapt to the dynamic environment. This paper investigates intelligent jamming decision-making algorithms for intra-pulse frequency-agile radar using deep reinforcement learning. Intra-pulse frequency-agile radar achieves frequency agility at the sub-pulse level, creating a large frequency-agility space, which makes it difficult for traditional jamming decision-making methods to rapidly learn its changing patterns through interaction. By employing gated recurrent units (GRUs) to capture long-term dependencies in sequence data, together with an attention mechanism, this paper proposes GA-Dueling DQN (GRU-Attention-based Dueling Deep Q Network) for jamming frequency selection. Simulation results indicate that the proposed method outperforms traditional Q-learning, DQN, and Dueling DQN methods in terms of jamming effectiveness. It exhibits the fastest convergence speed and reduced reliance on prior knowledge, highlighting its significant advantages in jamming subpulse-level frequency-agile radar.
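
An architectural sketch in the spirit of GA-Dueling DQN, combining a GRU encoder, multi-head attention, and dueling value/advantage streams; the layer sizes and head counts are illustrative, not the paper's.

```python
# Illustrative GRU + attention + dueling Q-network; dimensions are assumed.
import torch
import torch.nn as nn

class GADuelingDQN(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.value = nn.Linear(hidden, 1)              # state-value stream
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream

    def forward(self, seq):                 # seq: (batch, time, obs_dim)
        h, _ = self.gru(seq)                # long-term sequence dependencies
        a, _ = self.attn(h, h, h)           # self-attention over the sequence
        feat = a[:, -1]                     # last-step summary feature
        v, adv = self.value(feat), self.advantage(feat)
        return v + adv - adv.mean(dim=1, keepdim=True)  # dueling combination

q = GADuelingDQN(obs_dim=16, n_actions=10)
print(q(torch.randn(2, 8, 16)).shape)       # -> torch.Size([2, 10])
```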

18.
Sensors (Basel) ; 24(5)2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38475167

ABSTRACT

The fast development of sensors in wireless sensor networks (WSNs) brings the challenge of meeting strict low-energy-consumption requirements, and peer-to-peer (P2P) communication has become an important way to break this bottleneck. However, interference among sensors sharing the spectrum, together with power limitations, seriously constrains WSN improvement. Therefore, in this paper, we propose a deep reinforcement learning-based energy consumption optimization for P2P communication in WSNs. Specifically, P2P sensors (PUs) are treated as agents that share the spectrum of authorized sensors (AUs); an authorized sensor has permission to access specific data or systems, whereas a P2P sensor communicates directly with other sensors without needing a central server. Each agent can control its power and select resources to avoid interference. Moreover, we use a double deep Q network (DDQN) algorithm to help the agent learn more detailed features of the interference. Simulation results show that the proposed algorithm outperforms the deep Q network scheme and the traditional algorithm, effectively lowering the energy consumption of P2P communication in WSNs.
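
For reference, the double DQN target that distinguishes DDQN from vanilla DQN (the online network selects the next action, the target network evaluates it, reducing overestimation bias) can be written as follows; the tensor shapes are generic.

```python
# Standard double DQN target computation (not specific to this paper).
import torch

def ddqn_target(reward, next_obs, done, online_net, target_net, gamma=0.99):
    """reward, done: (B,) float tensors; next_obs: (B, obs_dim)."""
    with torch.no_grad():
        best_action = online_net(next_obs).argmax(dim=1, keepdim=True)  # select
        next_q = target_net(next_obs).gather(1, best_action).squeeze(1)  # evaluate
        return reward + gamma * (1.0 - done) * next_q
```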

19.
Sensors (Basel) ; 24(8)2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38676003

ABSTRACT

With the emergence of wireless rechargeable sensor networks (WRSNs), wirelessly recharging nodes using mobile charging vehicles (MCVs) has become a reality. However, existing approaches overlook the effective integration of node energy replenishment and mobile data collection. In this paper, we propose a joint energy replenishment and data collection scheme (D-JERDG) for WRSNs based on deep reinforcement learning. By capitalizing on the high mobility of unmanned aerial vehicles (UAVs), D-JERDG enables continuous visits to the cluster head node in each cluster, facilitating data collection and range-based charging. First, D-JERDG utilizes the K-means algorithm to partition the network into multiple clusters, and a cluster head selection algorithm based on an improved dynamic routing protocol elects cluster heads according to the remaining energy and geographical locations of the cluster members. The simulated annealing (SA) algorithm then determines the shortest flight path. Subsequently, the multiobjective deep deterministic policy gradient (MODDPG) DRL model is employed to control and optimize the UAV's instantaneous heading and speed, effectively planning UAV hover points. By redesigning the reward function, joint optimization of multiple objectives such as node death rate, UAV throughput, and average flight energy consumption is achieved. Extensive simulation results show that the proposed D-JERDG achieves joint optimization of multiple objectives and exhibits significant advantages over the baseline in terms of throughput, time utilization, and charging cost, among other indicators.
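
A sketch of the clustering stage as described: K-means partitions the nodes, then each cluster elects the head with the best trade-off between remaining energy and centrality. The scoring weights are our assumption, not the paper's protocol.

```python
# Assumed sketch of K-means clustering plus energy/location-aware
# cluster-head election for a WRSN.
import numpy as np
from sklearn.cluster import KMeans

def elect_cluster_heads(positions, energies, n_clusters=5, w_energy=0.7):
    """positions: (N, 2) node coordinates; energies: (N,) residual energy."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(positions)
    heads = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        center = positions[idx].mean(axis=0)
        dist = np.linalg.norm(positions[idx] - center, axis=1)
        # Favor high residual energy; penalize distance from cluster center.
        score = (w_energy * energies[idx] / energies[idx].max()
                 - (1 - w_energy) * dist / (dist.max() + 1e-9))
        heads.append(idx[np.argmax(score)])
    return labels, heads

labels, heads = elect_cluster_heads(np.random.rand(50, 2), np.random.rand(50))
```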

20.
Sensors (Basel) ; 24(8)2024 Apr 12.
Article in English | MEDLINE | ID: mdl-38676106

ABSTRACT

In this paper, we consider an integrated sensing, communication, and computation (ISCC) system to alleviate spectrum congestion and the computation burden. Specifically, while serving communication users, a base station (BS) actively senses targets and collaborates seamlessly with the edge server to concurrently process the acquired sensing data for efficient target recognition. A significant challenge in edge computing systems arises from the inherent uncertainty in computations, mainly stemming from the unpredictable complexity of tasks. With this in mind, we address computation uncertainty by formulating a robust communication and computing resource allocation problem in ISCC systems. The goal is to minimize total energy consumption while adhering to perception and delay constraints, achieved by optimizing transmit beamforming, the offloading ratio, and computing resource allocation, effectively managing the trade-offs between local execution and edge computing. To solve this problem, we employ a Markov decision process (MDP) in conjunction with the proximal policy optimization (PPO) algorithm, establishing an adaptive learning strategy. The proposed algorithm stands out for its rapid training speed, ensuring compliance with the latency requirements of perception and computation applications. Simulation results highlight its robustness and effectiveness within ISCC systems compared to baseline approaches.
