Pesquisa | BVS Doenças Infecciosas e Parasitárias

1.

Estimation of optimal treatment regimes with electronic medical record data using the residual life value estimator.

Rhodes, Grace; Davidian, Marie; Lu, Wenbin.

Biostatistics ; 2024 Feb 09.

Artigo em Inglês | MEDLINE | ID: mdl-38332633

RESUMO

Clinicians and patients must make treatment decisions at a series of key decision points throughout disease progression. A dynamic treatment regime is a set of sequential decision rules that return treatment decisions based on accumulating patient information, like that commonly found in electronic medical record (EMR) data. When applied to a patient population, an optimal treatment regime leads to the most favorable outcome on average. Identifying optimal treatment regimes that maximize residual life is especially desirable for patients with life-threatening diseases such as sepsis, a complex medical condition that involves severe infections with organ dysfunction. We introduce the residual life value estimator (ReLiVE), an estimator for the expected value of cumulative restricted residual life under a fixed treatment regime. Building on ReLiVE, we present a method for estimating an optimal treatment regime that maximizes expected cumulative restricted residual life. Our proposed method, ReLiVE-Q, conducts estimation via the backward induction algorithm Q-learning. We illustrate the utility of ReLiVE-Q in simulation studies, and we apply ReLiVE-Q to estimate an optimal treatment regime for septic patients in the intensive care unit using EMR data from the Multiparameter Intelligent Monitoring Intensive Care database. Ultimately, we demonstrate that ReLiVE-Q leverages accumulating patient information to estimate personalized treatment regimes that optimize a clinically meaningful function of residual life.

2.

A Q-learning method based on coarse-to-fine potential energy surface for locating transition state and reaction pathway.

Xu, Wenjun; Zhao, Yanling; Chen, Jialu; Wan, Zhongyu; Yan, Dadong; Zhang, Xinghua; Zhang, Ruiqin.

J Comput Chem ; 45(8): 487-497, 2024 Mar 30.

Artigo em Inglês | MEDLINE | ID: mdl-37966714

RESUMO

Transition state (TS) on the potential energy surface (PES) plays a key role in determining the kinetics and thermodynamics of chemical reactions. Inspired by the fact that the dynamics of complex systems are always driven by rare but significant transition events, we herein propose a TS search method in accordance with the Q-learning algorithm. Appropriate reward functions are set for a given PES to optimize the reaction pathway through continuous trial and error, and then the TS can be obtained from the optimized reaction pathway. The validity of this Q-learning method with reasonable settings of Q-value table including actions, states, learning rate, greedy rate, discount rate, and so on, is exemplified in 2 two-dimensional potential functions. In the applications of the Q-learning method to two chemical reactions, it is demonstrated that the Q-learning method can predict consistent TS and reaction pathway with those by ab initio calculations. Notably, the PES must be well prepared before using the Q-learning method, and a coarse-to-fine PES scanning scheme is thus introduced to save the computational time while maintaining the accuracy of the Q-learning prediction. This work offers a simple and reliable Q-learning method to search for all possible TS and reaction pathway of a chemical reaction, which may be a new option for effectively exploring the PES in an extensive search manner.

3.

Integrating randomized and observational studies to estimate optimal dynamic treatment regimes.

Batorsky, Anna; Anstrom, Kevin J; Zeng, Donglin.

Biometrics ; 80(2)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38804219

RESUMO

Sequential multiple assignment randomized trials (SMARTs) are the gold standard for estimating optimal dynamic treatment regimes (DTRs), but are costly and require a large sample size. We introduce the multi-stage augmented Q-learning estimator (MAQE) to improve efficiency of estimation of optimal DTRs by augmenting SMART data with observational data. Our motivating example comes from the Back Pain Consortium, where one of the overarching aims is to learn how to tailor treatments for chronic low back pain to individual patient phenotypes, knowledge which is lacking clinically. The Consortium-wide collaborative SMART and observational studies within the Consortium collect data on the same participant phenotypes, treatments, and outcomes at multiple time points, which can easily be integrated. Previously published single-stage augmentation methods for integration of trial and observational study (OS) data were adapted to estimate optimal DTRs from SMARTs using Q-learning. Simulation studies show the MAQE, which integrates phenotype, treatment, and outcome information from multiple studies over multiple time points, more accurately estimates the optimal DTR, and has a higher average value than a comparable Q-learning estimator without augmentation. We demonstrate this improvement is robust to a wide range of trial and OS sample sizes, addition of noise variables, and effect sizes.

Assuntos

Simulação por Computador , Dor Lombar , Estudos Observacionais como Assunto , Ensaios Clínicos Controlados Aleatórios como Assunto , Humanos , Estudos Observacionais como Assunto/estatística & dados numéricos , Ensaios Clínicos Controlados Aleatórios como Assunto/estatística & dados numéricos , Dor Lombar/terapia , Tamanho da Amostra , Resultado do Tratamento , Modelos Estatísticos , Biometria/métodos

4.

Accommodating misclassification effects on optimizing dynamic treatment regimes with Q-learning.

Khadem Charvadeh, Yasin; Yi, Grace Y.

Stat Med ; 43(3): 578-605, 2024 02 10.

Artigo em Inglês | MEDLINE | ID: mdl-38213277

RESUMO

Research on dynamic treatment regimes has enticed extensive interest. Many methods have been proposed in the literature, which, however, are vulnerable to the presence of misclassification in covariates. In particular, although Q-learning has received considerable attention, its applicability to data with misclassified covariates is unclear. In this article, we investigate how ignoring misclassification in binary covariates can impact the determination of optimal decision rules in randomized treatment settings, and demonstrate its deleterious effects on Q-learning through empirical studies. We present two correction methods to address misclassification effects on Q-learning. Numerical studies reveal that misclassification in covariates induces non-negligible estimation bias and that the correction methods successfully ameliorate bias in parameter estimation.

Assuntos

Regras de Decisão Clínica , Aprendizado de Máquina , Humanos

5.

A sequential, multiple assignment, randomized trial design with a tailoring function.

Hartman, Holly; Schipper, Matthew; Kidwell, Kelley.

Stat Med ; 2024 Jul 08.

Artigo em Inglês | MEDLINE | ID: mdl-38973591

RESUMO

We present a trial design for sequential multiple assignment randomized trials (SMARTs) that use a tailoring function instead of a binary tailoring variable allowing for simultaneous development of the tailoring variable and estimation of dynamic treatment regimens (DTRs). We apply methods for developing DTRs from observational data: tree-based regression learning and Q-learning. We compare this to a balanced randomized SMART with equal re-randomization probabilities and a typical SMART design where re-randomization depends on a binary tailoring variable and DTRs are analyzed with weighted and replicated regression. This project addresses a gap in clinical trial methodology by presenting SMARTs where second stage treatment is based on a continuous outcome removing the need for a binary tailoring variable. We demonstrate that data from a SMART using a tailoring function can be used to efficiently estimate DTRs and is more flexible under varying scenarios than a SMART using a tailoring variable.

6.

Modulation of ventromedial orbitofrontal cortical glutamatergic activity affects the explore-exploit balance and influences value-based decision-making.

Barnes, Samuel A; Dillon, Daniel G; Young, Jared W; Thomas, Michael L; Faget, Lauren; Yoo, Ji Hoon; Der-Avakian, Andre; Hnasko, Thomas S; Geyer, Mark A; Ramanathan, Dhakshin S.

Cereb Cortex ; 33(10): 5783-5796, 2023 05 09.

Artigo em Inglês | MEDLINE | ID: mdl-36472411

RESUMO

The balance between exploration and exploitation is essential for decision-making. The present study investigated the role of ventromedial orbitofrontal cortex (vmOFC) glutamate neurons in mediating value-based decision-making by first using optogenetics to manipulate vmOFC glutamate activity in rats during a probabilistic reversal learning (PRL) task. Rats that received vmOFC activation during informative feedback completed fewer reversals and exhibited reduced reward sensitivity relative to rats. Analysis with a Q-learning computational model revealed that increased vmOFC activity did not affect the learning rate but instead promoted maladaptive exploration. By contrast, vmOFC inhibition increased the number of completed reversals and increased exploitative behavior. In a separate group of animals, calcium activity of vmOFC glutamate neurons was recorded using fiber photometry. Complementing our results above, we found that suppression of vmOFC activity during the latter part of rewarded trials was associated with improved PRL performance, greater win-stay responding and selecting the correct choice on the next trial. These data demonstrate that excessive vmOFC activity during reward feedback disrupted value-based decision-making by increasing the maladaptive exploration of lower-valued options. Our findings support the premise that pharmacological interventions that normalize aberrant vmOFC glutamate activity during reward feedback processing may attenuate deficits in value-based decision-making.

Assuntos

Córtex Pré-Frontal , Recompensa , Ratos , Animais , Córtex Pré-Frontal/fisiologia , Reversão de Aprendizagem/fisiologia , Glutamatos , Tomada de Decisões/fisiologia

7.

Advancing ASD identification with neuroimaging: a novel GARL methodology integrating Deep Q-Learning and generative adversarial networks.

Zhou, Yujing; Jia, Guangbo; Ren, Yingtong; Ren, Yingxin; Xiao, Zhifeng; Wang, Yinmei.

BMC Med Imaging ; 24(1): 186, 2024 Jul 25.

Artigo em Inglês | MEDLINE | ID: mdl-39054419

RESUMO

Autism Spectrum Disorder (ASD) is a neurodevelopmental condition that affects an individual's behavior, speech, and social interaction. Early and accurate diagnosis of ASD is pivotal for successful intervention. The limited availability of large datasets for neuroimaging investigations, however, poses a significant challenge to the timely and precise identification of ASD. To address this problem, we propose a breakthrough approach, GARL, for ASD diagnosis using neuroimaging data. GARL innovatively integrates the power of GANs and Deep Q-Learning to augment limited datasets and enhance diagnostic precision. We utilized the Autistic Brain Imaging Data Exchange (ABIDE) I and II datasets and employed a GAN to expand these datasets, creating a more robust and diversified dataset for analysis. This approach not only captures the underlying sample distribution within ABIDE I and II but also employs deep reinforcement learning for continuous self-improvement, significantly enhancing the capability of the model to generalize and adapt. Our experimental results confirmed that GAN-based data augmentation effectively improved the performance of all prediction models on both datasets, with the combination of InfoGAN and DQN's GARL yielding the most notable improvement.

Assuntos

Transtorno do Espectro Autista , Aprendizado Profundo , Neuroimagem , Humanos , Transtorno do Espectro Autista/diagnóstico por imagem , Neuroimagem/métodos , Criança , Redes Neurais de Computação , Masculino , Encéfalo/diagnóstico por imagem

8.

Intelligent Resource Allocation Scheme Using Reinforcement Learning for Efficient Data Transmission in VANET.

Kim, Jin-Woo; Kim, Jae-Wan; Lee, Jaeho.

Sensors (Basel) ; 24(9)2024 Apr 26.

Artigo em Inglês | MEDLINE | ID: mdl-38732859

RESUMO

Vehicular ad hoc networks (VANETs) use multiple channels to communicate using wireless access in vehicular environment (WAVE) standards to provide a variety of vehicle-related applications. The current IEEE 802.11p WAVE communication channel structure is composed of one control channel (CCH) and several service channels (SCHs). SCHs are used for non-safety data transmission, while the CCH is used for broadcasting beacons, control, and safety. WAVE devices transmit data that alternate between CCHs and SCHs, and each channel is active for a duration called the CCH interval (CCHI) and SCH interval (SCHI), respectively. Currently, both intervals are fixed at 50 ms. However, fixed-length intervals cannot effectively respond to dynamically changing traffic loads. Additionally, when many vehicles are simultaneously using the limited channel resources for data transmission, the network performance significantly degrades due to numerous packet collisions. Herein, we propose an adaptive resource allocation technique for efficient data transmission. The technique dynamically adjusts the SCHI and CCHI to improve network performance. Moreover, to reduce data collisions and optimize the network's backoff distribution, the proposed scheme applies reinforcement learning (RL) to provide an intelligent channel access algorithm. The simulation results demonstrate that the proposed scheme can ensure high throughputs and low transmission delays.

9.

Two-Layer Edge Intelligence for Task Offloading and Computing Capacity Allocation with UAV Assistance in Vehicular Networks.

Bi, Xiaodan; Zhao, Lian.

Sensors (Basel) ; 24(6)2024 Mar 14.

Artigo em Inglês | MEDLINE | ID: mdl-38544128

RESUMO

With the exponential growth of wireless devices and the demand for real-time processing, traditional server architectures face challenges in meeting the ever-increasing computational requirements. This paper proposes a collaborative edge computing framework to offload and process tasks efficiently in such environments. By equipping a moving unmanned aerial vehicle (UAV) as the mobile edge computing (MEC) server, the proposed architecture aims to release the burden on roadside units (RSUs) servers. Specifically, we propose a two-layer edge intelligence scheme to allocate network computing resources. The first layer intelligently offloads and allocates tasks generated by wireless devices in the vehicular system, and the second layer utilizes the partially observable stochastic game (POSG), solved by duelling deep Q-learning, to allocate the computing resources of each processing node (PN) to different tasks. Meanwhile, we propose a weighted position optimization algorithm for the UAV movement in the system to facilitate task offloading and task processing. Simulation results demonstrate the improved performance by applying the proposed scheme.

10.

Optimizing Human-Robot Teaming Performance through Q-Learning-Based Task Load Adjustment and Physiological Data Analysis.

Korivand, Soroush; Galvani, Gustavo; Ajoudani, Arash; Gong, Jiaqi; Jalili, Nader.

Sensors (Basel) ; 24(9)2024 Apr 28.

Artigo em Inglês | MEDLINE | ID: mdl-38732923

RESUMO

The transition to Industry 4.0 and 5.0 underscores the need for integrating humans into manufacturing processes, shifting the focus towards customization and personalization rather than traditional mass production. However, human performance during task execution may vary. To ensure high human-robot teaming (HRT) performance, it is crucial to predict performance without negatively affecting task execution. Therefore, to predict performance indirectly, significant factors affecting human performance, such as engagement and task load (i.e., amount of cognitive, physical, and/or sensory resources required to perform a particular task), must be considered. Hence, we propose a framework to predict and maximize the HRT performance. For the prediction of task performance during the development phase, our methodology employs features extracted from physiological data as inputs. The labels for these predictions-categorized as accurate performance or inaccurate performance due to high/low task load-are meticulously crafted using a combination of the NASA TLX questionnaire, records of human performance in quality control tasks, and the application of Q-Learning to derive task-specific weights for the task load indices. This structured approach enables the deployment of our model to exclusively rely on physiological data for predicting performance, thereby achieving an accuracy rate of 95.45% in forecasting HRT performance. To maintain optimized HRT performance, this study further introduces a method of dynamically adjusting the robot's speed in the case of low performance. This strategic adjustment is designed to effectively balance the task load, thereby enhancing the efficiency of human-robot collaboration.

Assuntos

Robótica , Análise e Desempenho de Tarefas , Humanos , Robótica/métodos , Feminino , Masculino , Análise de Dados , Sistemas Homem-Máquina , Adulto , Carga de Trabalho

11.

Priority-Aware Actuation Update Scheme in Heterogeneous Industrial Networks.

Kyung, Yeunwoong; Sung, Jihoon; Ko, Haneul; Song, Taewon; Kim, Youngjun.

Sensors (Basel) ; 24(2)2024 Jan 07.

Artigo em Inglês | MEDLINE | ID: mdl-38257450

RESUMO

In heterogeneous wireless networked control systems (WNCSs), the age of information (AoI) of the actuation update and actuation update cost are important performance metrics. To reduce the monetary cost, the control system can wait for the availability of a WiFi network for the actuator and then conduct the update using a WiFi network in an opportunistic manner, but this leads to an increased AoI of the actuation update. In addition, since there are different AoI requirements according to the control priorities (i.e., robustness of AoI of the actuation update), these need to be considered when delivering the actuation update. To jointly consider the monetary cost and AoI with priority, this paper proposes a priority-aware actuation update scheme (PAUS) where the control system decides whether to deliver or delay the actuation update to the actuator. For the optimal decision, we formulate a Markov decision process model and derive the optimal policy based on Q-learning, which aims to maximize the average reward that implies the balance between the monetary cost and AoI with priority. Simulation results demonstrate that the PAUS outperforms the comparison schemes in terms of the average reward under various settings.

12.

OAS Deep Q-Learning-Based Fast and Smooth Control Method for Traffic Signal Transition in Urban Arterial Tidal Lanes.

Dong, Luxi; Xie, Xiaolan; Lu, Jiali; Feng, Liangyuan; Zhang, Lieping.

Sensors (Basel) ; 24(6)2024 Mar 13.

Artigo em Inglês | MEDLINE | ID: mdl-38544109

RESUMO

To address traffic flow fluctuations caused by changes in traffic signal control schemes on tidal lanes and maintain smooth traffic operations, this paper proposes a method for controlling traffic signal transitions on tidal lanes. Firstly, the proposed method includes designing an intersection overlap phase scheme based on the traffic flow conflict matrix in the tidal lane scenario and a fast and smooth transition method for key intersections based on the flow ratio. The aim of the control is to equalize average queue lengths and minimize average vehicle delays for different flow directions at the intersection. This study also analyses various tidal lane scenarios based on the different opening states of the tidal lanes at related intersections. The transitions of phase offsets are emphasized after a comprehensive analysis of transition time and smoothing characteristics. In addition, this paper proposes a coordinated method for tidal lanes to optimize the phase offset at arterial intersections for smooth and rapid transitions. The method uses Deep Q-Learning, a reinforcement learning algorithm for optimal action selection (OSA), to develop an adaptive traffic signal transition control and enhance its efficiency. Finally, a simulation experiment using a traffic control interface is presented to validate the proposed approach. This study shows that this method leads to smoother and faster traffic signal transitions across different tidal lane scenarios compared to the conventional method. Implementing this solution can benefit intersection groups by reducing traffic delays, improving traffic efficiency, and decreasing air pollution caused by congestion.

13.

More Than 30â¯000-fold Field Enhancement of Terahertz Nanoresonators Enabled by Rapid Inverse Design.

Lee, Hyoung-Taek; Kim, Jeonghoon; Lee, Joon Sue; Yoon, Mina; Park, Hyeong-Ryeol.

Nano Lett ; 23(24): 11685-11692, 2023 Dec 27.

Artigo em Inglês | MEDLINE | ID: mdl-38060838

RESUMO

The rapid development of 6G communications using terahertz (THz) electromagnetic waves has created a demand for highly sensitive THz nanoresonators capable of detecting these waves. Among the potential candidates, THz nanogap loop arrays show promising characteristics but require significant computational resources for accurate simulation. This requirement arises because their unit cells are 10 times smaller than millimeter wavelengths, with nanogap regions that are 1â¯000â¯000 times smaller. To address this challenge, we propose a rapid inverse design method using physics-informed machine learning, employing double deep Q-learning with an analytical model of the THz nanogap loop array. In â¼39 h on a middle-level personal computer, our approach identifies the optimal structure through 200â¯000 iterations, achieving an experimental electric field enhancement of 32â¯000 at 0.2 THz, 300% stronger than prior results. Our analytical model-based approach significantly reduces the amount of computational resources required, offering a practical alternative to numerical simulation-based inverse design for THz nanodevices.

14.

Prefrontal transcranial magnetic stimulation boosts response vigour during reinforcement learning in healthy adults.

Biernacki, Kathryn; Myers, Catherine E; Cole, Sally; Cavanagh, James F; Baker, Travis E.

Eur J Neurosci ; 57(4): 680-691, 2023 02.

Artigo em Inglês | MEDLINE | ID: mdl-36550631

RESUMO

A 10-Hz repetitive transcranial magnetic stimulation to the left dorsal lateral prefrontal cortex has been shown to increase dopaminergic activity in the dorsal striatum, a region strongly implicated in reinforcement learning. However, the behavioural influence of this effect remains largely unknown. We tested the causal effects of 10-Hz stimulation on behavioural and computational characteristics of reinforcement learning. A total of 40 healthy individuals were randomized into active and sham (placebo) stimulation groups. Each participant underwent one stimulation session (1500 pulses) in which stimulation was applied over the left dorsal lateral prefrontal cortex using a robotic arm. Participants then completed a reinforcement learning task sensitive to striatal dopamine functioning. Participants' choices were modelled using a reinforcement learning model (Q-learning) that calculates separate learning rates associated with positive and negative reward prediction errors. Subjects receiving active stimulation exhibited increased reward rate (number of correct responses per second of task activity) compared with those in sham. Computationally, although no group differences were observed, the active group displayed a higher learning rate for correct trials (αG) compared with incorrect trials (αL). Finally, when tested with novel pairs of stimuli, the active group displayed extremely fast reaction times, and a trend towards a higher reward rate. This study provided specific behavioural and computational accounts of altered striatal-mediated behaviour, particularly response vigour, induced by a proposed increase of dopamine activity by 10-Hz stimulation to the left dorsal lateral prefrontal cortex. Together, these findings bolster the use of repetitive transcranial magnetic stimulation to target neurocognitive disturbances attributed to the dysregulation of dopaminergic-striatal circuits.

Assuntos

Dopamina , Estimulação Magnética Transcraniana , Humanos , Adulto , Dopamina/farmacologia , Reforço Psicológico , Aprendizagem/fisiologia , Recompensa , Córtex Pré-Frontal/fisiologia

15.

Simplified Deep Reinforcement Learning Approach for Channel Prediction in Power Domain NOMA System.

Gaballa, Mohamed; Abbod, Maysam.

Sensors (Basel) ; 23(21)2023 Nov 06.

Artigo em Inglês | MEDLINE | ID: mdl-37960708

RESUMO

In this work, the impact of implementing Deep Reinforcement Learning (DRL) in predicting the channel parameters for user devices in a Power Domain Non-Orthogonal Multiple Access system (PD-NOMA) is investigated. In the channel prediction process, DRL based on deep Q networks (DQN) algorithm will be developed and incorporated into the NOMA system so that this developed DQN model can be employed to estimate the channel coefficients for each user device in NOMA system. The developed DQN scheme will be structured as a simplified approach to efficiently predict the channel parameters for each user in order to maximize the downlink sum rates for all users in the system. In order to approximate the channel parameters for each user device, this proposed DQN approach is first initialized using random channel statistics, and then the proposed DQN model will be dynamically updated based on the interaction with the environment. The predicted channel parameters will be utilized at the receiver side to recover the desired data. Furthermore, this work inspects how the channel estimation process based on the simplified DQN algorithm and the power allocation policy, can both be integrated for the purpose of multiuser detection in the examined NOMA system. Simulation results, based on several performance metrics, have demonstrated that the proposed simplified DQN algorithm can be a competitive algorithm for channel parameters estimation when compared to different benchmark schemes for channel estimation processes such as deep neural network (DNN) based long-short term memory (LSTM), RL based Q algorithm, and channel estimation scheme based on minimum mean square error (MMSE) procedure.

16.

A Study on the Impact of Integrating Reinforcement Learning for Channel Prediction and Power Allocation Scheme in MISO-NOMA System.

Gaballa, Mohamed; Abbod, Maysam; Aldallal, Ammar.

Sensors (Basel) ; 23(3)2023 Jan 26.

Artigo em Inglês | MEDLINE | ID: mdl-36772422

RESUMO

In this study, the influence of adopting Reinforcement Learning (RL) to predict the channel parameters for user devices in a Power Domain Multi-Input Single-Output Non-Orthogonal Multiple Access (MISO-NOMA) system is inspected. In the channel prediction-based RL approach, the Q-learning algorithm is developed and incorporated into the NOMA system so that the developed Q-model can be employed to predict the channel coefficients for every user device. The purpose of adopting the developed Q-learning procedure is to maximize the received downlink sum-rate and decrease the estimation loss. To satisfy this aim, the developed Q-algorithm is initialized using different channel statistics and then the algorithm is updated based on the interaction with the environment in order to approximate the channel coefficients for each device. The predicted parameters are utilized at the receiver side to recover the desired data. Furthermore, based on maximizing the sum-rate of the examined user devices, the power factors for each user can be deduced analytically to allocate the optimal power factor for every user device in the system. In addition, this work inspects how the channel prediction based on the developed Q-learning model, and the power allocation policy, can both be incorporated for the purpose of multiuser recognition in the examined MISO-NOMA system. Simulation results, based on several performance metrics, have demonstrated that the developed Q-learning algorithm can be a competitive algorithm for channel estimation when compared to different benchmark schemes such as deep learning-based long short-term memory (LSTM), RL based actor-critic algorithm, RL based state-action-reward-state-action (SARSA) algorithm, and standard channel estimation scheme based on minimum mean square error procedure.

17.

Manipulating XXY Planar Platform Positioning Accuracy by Computer Vision Based on Reinforcement Learning.

Huang, Yi-Cheng; Chan, Yung-Chun.

Sensors (Basel) ; 23(6)2023 Mar 10.

Artigo em Inglês | MEDLINE | ID: mdl-36991742

RESUMO

With the rise of Industry 4.0 and artificial intelligence, the demand for industrial automation and precise control has increased. Machine learning can reduce the cost of machine parameter tuning and improve high-precision positioning motion. In this study, a visual image recognition system was used to observe the displacement of an XXY planar platform. Ball-screw clearance, backlash, nonlinear frictional force, and other factors affect the accuracy and reproducibility of positioning. Therefore, the actual positioning error was determined by inputting images captured by a charge-coupled device camera into a reinforcement Q-learning algorithm. Time-differential learning and accumulated rewards were used to perform Q-value iteration to enable optimal platform positioning. A deep Q-network model was constructed and trained through reinforcement learning for effectively estimating the XXY platform's positioning error and predicting the command compensation according to the error history. The constructed model was validated through simulations. The adopted methodology can be extended to other control applications based on the interaction between feedback measurement and artificial intelligence.

18.

Optimizing Forecasted Activity Notifications with Reinforcement Learning.

Fikry, Muhammad; Inoue, Sozo.

Sensors (Basel) ; 23(14)2023 Jul 19.

Artigo em Inglês | MEDLINE | ID: mdl-37514804

RESUMO

In this paper, we propose the notification optimization method by providing multiple alternative times as a reminder for a forecasted activity with and without probabilistic considerations for the activity that needs to be completed and needs notification. It is important to consider various factors when sending notifications to people after obtaining the results of the forecasted activity. We should not send notifications only when we have forecasted results because future daily activities are unpredictable. Therefore, it is important to strike a balance between providing useful reminders and avoiding excessive interruptions, especially for low probabilities of forecasted activity. Our study investigates the impact of the low probability of forecasted activity and optimizes the notification time with reinforcement learning. We also show the gaps between forecasted activities that are useful for self-improvement by people for the balance of important tasks, such as tasks completed as planned and additional tasks to be completed. For evaluation, we utilize two datasets: the existing dataset and data we collected in the field with the technology we have developed. In the data collection, we have 23 activities from six participants. To evaluate the effectiveness of these approaches, we assess the percentage of positive responses, user response rate, and response duration as performance criteria. Our proposed method provides a more effective way to optimize notifications. By incorporating the probability level of activity that needs to be done and needs notification into the state, we achieve a better response rate than the baseline, with the advantage of reaching 27.15%, as well as than the other criteria, which are also improved by using probability.

19.

Biologically Inspired Complete Coverage Path Planning Algorithm Based on Q-Learning.

Tan, Xiangquan; Han, Linhui; Gong, Hao; Wu, Qingwen.

Sensors (Basel) ; 23(10)2023 May 11.

Artigo em Inglês | MEDLINE | ID: mdl-37430561

RESUMO

Complete coverage path planning requires that the mobile robot traverse all reachable positions in the environmental map. Aiming at the problems of local optimal path and high path coverage ratio in the complete coverage path planning of the traditional biologically inspired neural network algorithm, a complete coverage path planning algorithm based on Q-learning is proposed. The global environment information is introduced by the reinforcement learning method in the proposed algorithm. In addition, the Q-learning method is used for path planning at the positions where the accessible path points are changed, which optimizes the path planning strategy of the original algorithm near these obstacles. Simulation results show that the algorithm can automatically generate an orderly path in the environmental map, and achieve 100% coverage with a lower path repetition ratio.

20.

Adaptive Trust Threshold Model Based on Reinforcement Learning in Cooperative Spectrum Sensing.

Xie, Gang; Zhou, Xincheng; Gao, Jinchun.

Sensors (Basel) ; 23(10)2023 May 14.

Artigo em Inglês | MEDLINE | ID: mdl-37430665

RESUMO

In cognitive radio systems, cooperative spectrum sensing (CSS) can effectively improve the sensing performance of the system. At the same time, it also provides opportunities for malicious users (MUs) to launch spectrum-sensing data falsification (SSDF) attacks. This paper proposes an adaptive trust threshold model based on a reinforcement learning (ATTR) algorithm for ordinary SSDF attacks and intelligent SSDF attacks. By learning the attack strategies of different malicious users, different trust thresholds are set for honest and malicious users collaborating within a network. The simulation results show that our ATTR algorithm can filter out a set of trusted users, eliminate the influence of malicious users, and improve the detection performance of the system.

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA