Results 1 - 20 of 202
1.
Sci Rep ; 14(1): 22885, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39358373

ABSTRACT

Predicting rock tunnel squeezing in underground projects is challenging due to its intricate and unpredictable nature. This study proposes an innovative approach to enhance the accuracy and reliability of tunnel squeezing prediction. The proposed method combines ensemble learning techniques with Q-learning and online Markov chain integration. A deep learning model is trained on a comprehensive database comprising tunnel parameters including diameter (D), burial depth (H), support stiffness (K), and tunneling quality index (Q). Multiple deep learning models are trained concurrently, leveraging ensemble learning to capture diverse patterns and improve prediction performance. Integration of the Q-learning-Online Markov Chain further refines predictions. The online Markov chain analyzes historical sequences of tunnel parameters and squeezing class transitions, establishing transition probabilities between different squeezing classes. The Q-learning algorithm optimizes decision-making by learning the optimal policy for transitioning between tunnel states. The proposed model is evaluated using a dataset from various tunnel construction projects, assessing performance through metrics like accuracy, precision, recall, and F1-score. Results demonstrate the efficiency of the ensemble deep learning model combined with the Q-learning-Online Markov Chain in predicting surrounding rock tunnel squeezing. This approach offers insights into parameter interrelationships and dynamic squeezing characteristics, enabling proactive planning and the implementation of support measures to mitigate tunnel squeezing hazards and ensure underground structure safety. Experimental results show the model achieves a prediction accuracy of 98.11%, surpassing individual CNN and RNN models, with an AUC value of 0.98.
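
The coupling described above can be pictured with a small tabular sketch: an online Markov chain built from empirical squeezing-class transition counts feeds the expected next-state value into a Q-learning update. The class labels, reward shape, and hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical setup: 5 squeezing classes as states, actions = predicted next class.
n_classes = 5
rng = np.random.default_rng(0)

transition_counts = np.ones((n_classes, n_classes))  # online Markov chain (Laplace prior)
Q = np.zeros((n_classes, n_classes))                 # Q[state, action]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def observed_sequence(T=500):
    """Stand-in for historical squeezing-class transitions along tunnel sections."""
    s = rng.integers(n_classes)
    for _ in range(T):
        s_next = (s + rng.choice([-1, 0, 1])) % n_classes
        yield s, s_next
        s = s_next

for s, s_next in observed_sequence():
    transition_counts[s, s_next] += 1                 # update the Markov chain online
    P = transition_counts[s] / transition_counts[s].sum()
    a = rng.integers(n_classes) if rng.random() < epsilon else int(np.argmax(Q[s]))
    reward = 1.0 if a == s_next else -abs(a - s_next)  # assumed reward: penalize class error
    # expected next-state value is taken under the empirical transition probabilities
    Q[s, a] += alpha * (reward + gamma * P @ Q.max(axis=1) - Q[s, a])

print("Transition probabilities:\n", transition_counts / transition_counts.sum(1, keepdims=True))
print("Greedy predicted next class per state:", Q.argmax(axis=1))
```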

2.
Sensors (Basel) ; 24(18)2024 Sep 22.
Article in English | MEDLINE | ID: mdl-39338867

ABSTRACT

With the rapid development of mobile edge computing (MEC) and wireless power transfer (WPT) technologies, the MEC-WPT system makes it possible to provide high-quality data processing services for end users. However, in a real-world WPT-MEC system, the channel gain decreases with the transmission distance, leading to a "double near and far effect" in the joint transmission of wireless energy and data, which affects the quality of the data processing service for end users. Consequently, it is essential to design a reasonable system model to overcome the "double near and far effect" and to reasonably schedule multi-dimensional resources such as energy, communication, and computing to guarantee high-quality data processing services. First, this paper designs a relay-collaboration WPT-MEC resource scheduling model to improve wireless energy utilization efficiency. The optimization goal is to minimize the normalized sum of the total communication delay and total energy consumption while meeting multiple resource constraints. Second, this paper employs a BK-means algorithm to cluster the end terminals to guarantee effective energy reception and adapts the whale optimization algorithm with an adaptive mechanism (AWOA) for mobile-vehicle path planning to reduce energy waste. Third, this paper proposes an immune differential enhanced deep deterministic policy gradient (IDDPG) algorithm to realize efficient scheduling of the multiple resources and minimize the optimization goal. Finally, simulation experiments are carried out on different datasets, and the results demonstrate the validity of the designed scheduling model and the proposed IDDPG algorithm.

3.
Sci Rep ; 14(1): 21406, 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39271735

ABSTRACT

Non-orthogonal Multiple Access (NOMA) techniques offer potential enhancements in spectral efficiency for 5G and 6G wireless networks, facilitating broader network access. Central to realizing optimal system performance are factors like joint power control, user grouping, and decoding order. This study investigates power control and user grouping to optimize spectral efficiency in NOMA uplink systems, aiming to reduce computational complexity. While previous research on this integrated optimization has identified several near-optimal solutions, they often come with considerable system and computational overheads. To address this, this study employed an improved Grey Wolf Optimizer (GWO), a nature-inspired metaheuristic optimization method. Although GWO is effective, it can sometimes converge prematurely and might lack diversity. To enhance its performance, this study introduces a new version of GWO that integrates competitive learning, Q-learning, and greedy selection. Competitive learning introduces competition among agents, balancing exploration and exploitation and preserving diversity. Q-learning guides the search based on past experiences, enhancing adaptability and preventing redundant exploration of sub-optimal regions. Greedy selection ensures the retention of the best solutions after each iteration. The synergistic integration of these three components substantially enhances the performance of the standard GWO. This algorithm was used to manage power control and user grouping in NOMA systems, aiming to strengthen system performance while limiting computational demands. The effectiveness of the proposed algorithm was validated through numerical evaluations. Simulation results revealed that, when applied to the joint problem in NOMA uplink systems, it surpasses the spectral efficiency of conventional orthogonal multiple access. Moreover, the proposed approach demonstrated superior performance compared to the standard GWO and other state-of-the-art algorithms, achieving reduced system complexity under identical constraints.
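
A rough Python sketch of a GWO step with greedy selection is given below; the competitive-learning and Q-learning components are only hinted at by a bandit-style choice between two update operators, and the objective, bounds, and hyperparameters are placeholders rather than the paper's NOMA formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere(x):                      # placeholder objective (stand-in for spectral efficiency)
    return np.sum(x ** 2)

dim, n_wolves, iters = 10, 20, 200
X = rng.uniform(-5, 5, (n_wolves, dim))
fit = np.apply_along_axis(sphere, 1, X)
Q_op = np.zeros(2)                  # crude Q-values over two search operators
alpha_lr, eps = 0.1, 0.2

for t in range(iters):
    a = 2 * (1 - t / iters)                         # GWO control parameter, decays to 0
    order = np.argsort(fit)
    alpha_w, beta_w, delta_w = X[order[:3]]
    op = rng.integers(2) if rng.random() < eps else int(np.argmax(Q_op))
    for i in range(n_wolves):
        if op == 0:                 # standard GWO encircling update around the three leaders
            cand = np.mean([leader - (2 * a * rng.random(dim) - a) *
                            np.abs(2 * rng.random(dim) * leader - X[i])
                            for leader in (alpha_w, beta_w, delta_w)], axis=0)
        else:                       # exploratory perturbation around the best wolf
            cand = alpha_w + rng.normal(0, a + 1e-3, dim)
        cand_fit = sphere(cand)
        if cand_fit < fit[i]:       # greedy selection: keep the better solution
            X[i], fit[i] = cand, cand_fit
    reward = -fit.min()
    Q_op[op] += alpha_lr * (reward - Q_op[op])      # bandit-style credit for the chosen operator

print("best objective:", fit.min())
```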

4.
Biomed Phys Eng Express ; 10(6)2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39178885

ABSTRACT

This work proposes a novel technique called Enhanced JAYA (EJAYA) assisted Q-Learning for the classification of pulmonary diseases, such as pneumonia and tuberculosis (TB) sub-classes, using chest x-ray images. The work introduces fuzzy lattice formation to handle feature extraction from real-time (non-linear and non-stationary) data using the Schrödinger equation. Feature-based adaptive classification is made possible through the Q-learning algorithm, wherein optimal Q-value selection is performed via the EJAYA optimization algorithm. Fuzzy lattices are formed from x-ray image pixels, and the lattice kinetic energy (K.E.) is calculated using the Schrödinger equation. Feature-vector lattices with the highest K.E. were used as input features for the classifier. The classifier was employed for pneumonia classification (normal, mild, and severe) and tuberculosis detection (present or absent). A total of 3000 images were used for pneumonia classification, yielding an accuracy, sensitivity, specificity, precision, and F-score of 97.90%, 98.43%, 97.25%, 97.78%, and 98.10%, respectively. For tuberculosis, 600 samples were used; the achieved accuracy, sensitivity, specificity, precision, and F-score were 95.50%, 96.39%, 94.40%, 95.52%, and 95.95%, respectively. Computational times were 40.96 s and 39.98 s for pneumonia and TB classification, respectively. Classifier learning rates (training accuracies) for the pneumonia classes (normal, mild, and severe) were 97.907%, 95.375%, and 96.391%, respectively, and for tuberculosis (present and absent) were 96.928% and 95.905%, respectively. The results were compared with contemporary classification techniques, showing the superiority of the proposed approach in terms of accuracy and speed of classification. The technique could serve as a fast and accurate tool for automated pneumonia and tuberculosis classification.


Subjects
Algorithms, Fuzzy Logic, Pneumonia, Humans, Pneumonia/diagnostic imaging, Pneumonia/classification, Machine Learning, Lung Diseases/diagnostic imaging, Lung Diseases/classification, Sensitivity and Specificity, Tuberculosis/diagnosis, Tuberculosis/diagnostic imaging, Reproducibility of Results, Image Processing, Computer-Assisted/methods
5.
Sensors (Basel) ; 24(15)2024 Jul 27.
Article in English | MEDLINE | ID: mdl-39123927

ABSTRACT

The transmission environment of underwater wireless sensor networks is open, and important transmission data can be easily intercepted, interfered with, and tampered with by malicious nodes. Malicious nodes can be mixed in the network and are difficult to distinguish, especially in time-varying underwater environments. To address this issue, this article proposes a GAN-based trusted routing algorithm (GTR). GTR defines the trust feature attributes and trust evaluation matrix of underwater network nodes, constructs the trust evaluation model based on a generative adversarial network (GAN), and achieves malicious node detection by establishing a trust feature profile of a trusted node, which improves the detection performance for malicious nodes in underwater networks under unlabeled and imbalanced training data conditions. GTR combines the trust evaluation algorithm with the adaptive routing algorithm based on Q-Learning to provide an optimal trusted data forwarding route for underwater network applications, improving the security, reliability, and efficiency of data forwarding in underwater networks. GTR relies on the trust feature profile of trusted nodes to distinguish malicious nodes and can adaptively select the forwarding route based on the status of trusted candidate next-hop nodes, which enables GTR to better cope with the changing underwater transmission environment and more accurately detect malicious nodes, especially unknown malicious node intrusions, compared to baseline algorithms. Simulation experiments showed that, compared to baseline algorithms, GTR can provide a better malicious node detection performance and data forwarding performance. Under the condition of 15% malicious nodes and 10% unknown malicious nodes mixed in, the detection rate of malicious nodes by the underwater network configured with GTR increased by 5.4%, the error detection rate decreased by 36.4%, the packet delivery rate increased by 11.0%, the energy tax decreased by 11.4%, and the network throughput increased by 20.4%.
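
To illustrate the routing side only, the following sketch runs tabular Q-learning over a toy topology in which next-hop candidates are filtered by a trust score; the topology, trust values, and reward terms are invented and do not reproduce the GAN-based trust model or the full GTR algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

n_nodes, sink = 8, 7
neighbors = {i: [j for j in range(n_nodes) if j != i and abs(i - j) <= 2] for i in range(n_nodes)}
trust = rng.uniform(0.3, 1.0, n_nodes)        # output of the (assumed) trust evaluation model
trust[3] = 0.05                                # a node flagged as likely malicious
Q = np.zeros((n_nodes, n_nodes))               # Q[current node, next hop]
alpha, gamma, eps = 0.2, 0.9, 0.1

for _ in range(2000):
    s = rng.integers(n_nodes - 1)
    for _ in range(n_nodes):
        cand = [j for j in neighbors[s] if trust[j] > 0.2]   # only trusted candidate next hops
        if not cand:
            break
        a = rng.choice(cand) if rng.random() < eps else max(cand, key=lambda j: Q[s, j])
        # assumed reward: reach the sink, otherwise prefer trusted, low-hop forwarding
        r = 10.0 if a == sink else trust[a] - 1.0
        Q[s, a] += alpha * (r + gamma * Q[a].max() - Q[s, a])
        if a == sink:
            break
        s = a

print("preferred next hop from node 0:", int(np.argmax(Q[0])))
```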

6.
Neural Netw ; 180: 106667, 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39216294

ABSTRACT

This paper addresses the tracking control problem of nonlinear discrete-time multi-agent systems (MASs). First, a local neighborhood error system (LNES) is constructed. Then, a novel tracking algorithm based on asynchronous iterative Q-learning (AIQL) is developed, which can transform the tracking problem into the optimal regulation of the LNES. The AIQL-based algorithm maintains two Q-values, Q_i^A and Q_i^B, for each agent i, where Q_i^A is used for improving the control policy and Q_i^B is used for evaluating the value of the control policy. Moreover, a convergence analysis of the LNES is given. It is shown that the LNES converges to 0 and the tracking problem is solved. A neural network-based actor-critic framework is used to implement AIQL. The critic network of AIQL is composed of two neural networks, which are used for approximating Q_i^A and Q_i^B, respectively. Finally, simulation results are given to verify the performance of the developed algorithm. It is shown that the AIQL-based tracking algorithm has a lower cost value and faster convergence speed than the IQL-based tracking algorithm.
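
A single-agent, tabular caricature of keeping two Q-tables, one used to select actions (policy improvement) and one used to evaluate them, is sketched below; it is closer to double Q-learning than to the paper's neural actor-critic implementation of AIQL, and the toy tracking-style environment is assumed.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_actions = 6, 3
QA = np.zeros((n_states, n_actions))   # used for improving (selecting) the policy
QB = np.zeros((n_states, n_actions))   # used for evaluating the chosen actions
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(s, a):
    """Toy tracking-style environment: drive the state toward 0."""
    s_next = max(0, min(n_states - 1, s + (a - 1)))
    return s_next, -abs(s_next)

for episode in range(3000):
    s = rng.integers(n_states)
    for _ in range(20):
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(QA[s]))
        s_next, r = step(s, a)
        # asynchronous roles: QA is improved greedily, QB evaluates QA's greedy action
        a_star = int(np.argmax(QA[s_next]))
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
        QB[s, a] += alpha * (r + gamma * QA[s_next, a_star] - QB[s, a])
        s = s_next

print("greedy policy:", QA.argmax(axis=1))
```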

7.
Sensors (Basel) ; 24(16)2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39204967

ABSTRACT

Task scheduling is a critical challenge in cloud computing systems, greatly impacting their performance. It is a nondeterministic polynomial-time hard (NP-hard) problem, which complicates the search for near-optimal solutions. Five major uncertainty parameters, i.e., security, traffic, workload, availability, and price, influence task scheduling decisions. The primary rationale for selecting these uncertainty parameters lies in the challenge of accurately measuring their values, as empirical estimations often diverge from the actual values. The interval-valued Pythagorean fuzzy set (IVPFS) is a promising mathematical framework for dealing with parametric uncertainties. The Dyna-Q+ algorithm is an updated form of the Dyna-Q agent designed for dynamic computing environments; it provides bonus rewards to states that have not been exploited recently. In this paper, the Dyna-Q+ agent is enriched with the IVPFS mathematical framework to make intelligent task scheduling decisions. The performance of the proposed IVPFS Dyna-Q+ task scheduler is tested using the CloudSim 3.3 simulator. The execution time is reduced by 90%, the makespan is also reduced by 90%, the operation cost stays below 50%, and the resource utilization rate is improved by 95%, with all of these metrics meeting the desired standards or expectations. The results are further validated using an expected-value analysis, which confirms the good performance of the task scheduler. A better balance between exploration and exploitation through rigorous action-based learning is achieved by the Dyna-Q+ agent.
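
The bonus-reward mechanism of Dyna-Q+ itself is standard and can be sketched as follows (per Sutton and Barto's formulation, with an exploration bonus kappa*sqrt(tau) for long-unvisited state-action pairs during planning); the IVPFS uncertainty modelling and CloudSim scheduling of the paper are not reproduced here.

```python
import random
import numpy as np

random.seed(0)
n_states, n_actions = 10, 2
Q = np.zeros((n_states, n_actions))
model = {}                                   # learned model: (s, a) -> (r, s')
last_visit = np.zeros((n_states, n_actions))
alpha, gamma, kappa, eps, planning_steps = 0.1, 0.95, 1e-3, 0.1, 10
t = 0

def env(s, a):
    """Toy chain environment: reward 1 for reaching the last state."""
    s_next = min(n_states - 1, s + 1) if a == 1 else max(0, s - 1)
    return (1.0 if s_next == n_states - 1 else 0.0), s_next

s = 0
for _ in range(5000):
    t += 1
    a = random.randrange(n_actions) if random.random() < eps else int(np.argmax(Q[s]))
    r, s_next = env(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    model[(s, a)] = (r, s_next)
    last_visit[s, a] = t
    for _ in range(planning_steps):          # planning from simulated experience with a bonus reward
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        bonus = kappa * np.sqrt(t - last_visit[ps, pa])
        Q[ps, pa] += alpha * (pr + bonus + gamma * Q[ps_next].max() - Q[ps, pa])
    s = 0 if s_next == n_states - 1 else s_next

print("learned Q:\n", Q.round(2))
```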

8.
Neurobiol Aging ; 142: 8-16, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39029360

ABSTRACT

This study explores the impact of aging on reinforcement learning in mice, focusing on changes in learning rates and behavioral strategies. A 5-armed bandit task (5-ABT) and a computational Q-learning model were used to evaluate the positive and negative learning rates and the inverse temperature across three age groups (3, 12, and 18 months). Results showed a significant decline in the negative learning rate of 18-month-old mice, which was not observed for the positive learning rate. This suggests that older mice maintain the ability to learn from successful experiences while decreasing the ability to learn from negative outcomes. We also observed a significant age-dependent variation in inverse temperature, reflecting a shift in action selection policy. Middle-aged mice (12 months) exhibited higher inverse temperature, indicating a higher reliance on previous rewarding experiences and reduced exploratory behaviors, when compared to both younger and older mice. This study provides new insights into aging research by demonstrating that there are age-related differences in specific components of reinforcement learning, which exhibit a non-linear pattern.
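
The computational model referred to here is the standard bandit Q-learning model with separate learning rates for positive and negative prediction errors and a softmax with inverse temperature; a minimal simulation sketch is below, with reward probabilities and parameter values chosen for illustration rather than taken from the fitted mouse data.

```python
import numpy as np

rng = np.random.default_rng(4)

n_arms, trials = 5, 300
p_reward = np.array([0.2, 0.3, 0.5, 0.7, 0.9])   # assumed 5-ABT reward probabilities
alpha_pos, alpha_neg, beta = 0.4, 0.1, 3.0        # e.g. an "aged" profile: low negative learning rate
Q = np.zeros(n_arms)
choices = []

for _ in range(trials):
    p_choice = np.exp(beta * Q) / np.exp(beta * Q).sum()   # softmax with inverse temperature beta
    a = rng.choice(n_arms, p=p_choice)
    r = float(rng.random() < p_reward[a])
    delta = r - Q[a]                                        # reward prediction error
    Q[a] += (alpha_pos if delta > 0 else alpha_neg) * delta
    choices.append(a)

print("final Q-values:", Q.round(2),
      "| best-arm choice rate:", np.mean(np.array(choices[-100:]) == 4))
```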


Subjects
Aging, Animals, Aging/psychology, Aging/physiology, Male, Mice, Inbred C57BL, Reinforcement, Psychology, Behavior, Animal, Probability Learning, Mice, Exploratory Behavior/physiology
9.
Stat Med ; 43(21): 4055-4072, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-38973591

ABSTRACT

We present a trial design for sequential multiple assignment randomized trials (SMARTs) that uses a tailoring function instead of a binary tailoring variable, allowing for simultaneous development of the tailoring variable and estimation of dynamic treatment regimens (DTRs). We apply methods for developing DTRs from observational data: tree-based regression learning and Q-learning. We compare this to a balanced randomized SMART with equal re-randomization probabilities and a typical SMART design where re-randomization depends on a binary tailoring variable and DTRs are analyzed with weighted and replicated regression. This project addresses a gap in clinical trial methodology by presenting SMARTs in which second-stage treatment is based on a continuous outcome, removing the need for a binary tailoring variable. We demonstrate that data from a SMART using a tailoring function can be used to efficiently estimate DTRs and that this design is more flexible under varying scenarios than a SMART using a binary tailoring variable.
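
Q-learning for a two-stage DTR proceeds by backward induction: fit a stage-2 regression, plug in the maximizing treatment to form a pseudo-outcome, then fit a stage-1 regression on that pseudo-outcome. The sketch below does this with linear models on simulated SMART-like data; the data-generating mechanism and the use of plain linear regression (rather than the paper's tree-based learning or tailoring-function design) are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 2000
X1 = rng.normal(size=n)                        # baseline covariate
A1 = rng.choice([-1, 1], n)                    # stage-1 randomized treatment
X2 = 0.5 * X1 + 0.3 * A1 + rng.normal(size=n)  # continuous intermediate outcome
A2 = rng.choice([-1, 1], n)                    # stage-2 randomized treatment
Y = X1 + A1 * X1 + A2 * (X2 - 0.5) + rng.normal(size=n)

# Stage 2: regress Y on (X2, A2, X2*A2); the optimal A2 maximizes the fitted Q2
Z2 = np.column_stack([X2, A2, X2 * A2])
q2 = LinearRegression().fit(Z2, Y)
def q2_pred(x2, a2): return q2.predict(np.column_stack([x2, a2, x2 * a2]))
V2 = np.maximum(q2_pred(X2, np.full(n, 1)), q2_pred(X2, np.full(n, -1)))   # pseudo-outcome

# Stage 1: regress the stage-2 value on (X1, A1, X1*A1)
Z1 = np.column_stack([X1, A1, X1 * A1])
q1 = LinearRegression().fit(Z1, V2)

coef = q1.coef_
print("stage-1 rule: treat A1=+1 when", f"{coef[1]:.2f} + {coef[2]:.2f}*X1 > 0")
```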


Subjects
Randomized Controlled Trials as Topic, Humans, Randomized Controlled Trials as Topic/methods, Research Design, Models, Statistical, Regression Analysis, Computer Simulation
10.
BMC Med Imaging ; 24(1): 186, 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39054419

ABSTRACT

Autism Spectrum Disorder (ASD) is a neurodevelopmental condition that affects an individual's behavior, speech, and social interaction. Early and accurate diagnosis of ASD is pivotal for successful intervention. The limited availability of large datasets for neuroimaging investigations, however, poses a significant challenge to the timely and precise identification of ASD. To address this problem, we propose a breakthrough approach, GARL, for ASD diagnosis using neuroimaging data. GARL innovatively integrates the power of GANs and Deep Q-Learning to augment limited datasets and enhance diagnostic precision. We utilized the Autism Brain Imaging Data Exchange (ABIDE) I and II datasets and employed a GAN to expand these datasets, creating a more robust and diversified dataset for analysis. This approach not only captures the underlying sample distribution within ABIDE I and II but also employs deep reinforcement learning for continuous self-improvement, significantly enhancing the capability of the model to generalize and adapt. Our experimental results confirmed that GAN-based data augmentation effectively improved the performance of all prediction models on both datasets, with the GARL configuration combining InfoGAN and DQN yielding the most notable improvement.


Subjects
Autism Spectrum Disorder, Deep Learning, Neuroimaging, Humans, Autism Spectrum Disorder/diagnostic imaging, Neuroimaging/methods, Child, Neural Networks, Computer, Male, Brain/diagnostic imaging
11.
PeerJ Comput Sci ; 10: e2034, 2024.
Article in English | MEDLINE | ID: mdl-38855215

ABSTRACT

Student dropout prediction (SDP) in educational research has gained prominence for its role in analyzing student learning behaviors through time series models. Traditional methods often focus singularly on either prediction accuracy or earliness, leading to sub-optimal interventions for at-risk students. This issue underlines the necessity for methods that effectively manage the trade-off between accuracy and earliness. Recognizing the limitations of existing methods, this study introduces a novel approach leveraging multi-objective reinforcement learning (MORL) to optimize the trade-off between prediction accuracy and earliness in SDP tasks. By framing SDP as a partial sequence classification problem, we model it through a multiple-objective Markov decision process (MOMDP), incorporating a vectorized reward function that maintains the distinctiveness of each objective, thereby preventing information loss and enabling more nuanced optimization strategies. Furthermore, we introduce an advanced envelope Q-learning technique to foster a comprehensive exploration of the solution space, aiming to identify Pareto-optimal strategies that accommodate a broader spectrum of preferences. The efficacy of our model has been rigorously validated through comprehensive evaluations on real-world MOOC datasets. These evaluations have demonstrated our model's superiority, outperforming existing methods in achieving optimal trade-off between accuracy and earliness, thus marking a significant advancement in the field of SDP.
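
A much-simplified tabular sketch of multi-objective Q-learning with a vectorized (accuracy-like, earliness-like) reward and a small set of preference vectors is shown below; the true envelope update optimizes over preferences jointly, which is only loosely approximated here, and the environment, reward shapes, and preferences are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

n_steps, n_actions, n_obj = 10, 2, 2           # action 0: wait for more of the sequence, 1: predict now
prefs = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
Q = np.zeros((len(prefs), n_steps + 1, n_actions, n_obj))   # vector-valued Q per preference
alpha, gamma, eps = 0.1, 0.95, 0.1

for _ in range(5000):
    w_idx = rng.integers(len(prefs))
    w = prefs[w_idx]
    t = 0
    while t < n_steps:
        scal = Q[w_idx, t] @ w                              # scalarize only for action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(scal))
        if a == 1:                                          # predicting: accuracy grows with t, earliness shrinks
            r = np.array([t / n_steps, 1 - t / n_steps])
            Q[w_idx, t, a] += alpha * (r - Q[w_idx, t, a])  # terminal vector update
            break
        a_next = int(np.argmax(Q[w_idx, t + 1] @ w))
        target = gamma * Q[w_idx, t + 1, a_next]            # zero reward while waiting
        Q[w_idx, t, a] += alpha * (target - Q[w_idx, t, a])
        t += 1

for i, w in enumerate(prefs):
    stop = next((t for t in range(n_steps) if np.argmax(Q[i, t] @ w) == 1), n_steps)
    print(f"preference {w}: predicts at step {stop}")
```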

12.
Biomimetics (Basel) ; 9(6)2024 May 21.
Article in English | MEDLINE | ID: mdl-38921187

ABSTRACT

In the complex and dynamic landscape of cyber threats, organizations require sophisticated strategies for managing Cybersecurity Operations Centers and deploying Security Information and Event Management systems. Our study enhances these strategies by integrating the precision of well-known biomimetic optimization algorithms-namely Particle Swarm Optimization, the Bat Algorithm, the Gray Wolf Optimizer, and the Orca Predator Algorithm-with the adaptability of Deep Q-Learning, a reinforcement learning technique that leverages deep neural networks to teach algorithms optimal actions through trial and error in complex environments. This hybrid methodology targets the efficient allocation and deployment of network intrusion detection sensors while balancing cost-effectiveness with essential network security imperatives. Comprehensive computational tests show that versions enhanced with Deep Q-Learning significantly outperform their native counterparts, especially in complex infrastructures. These results highlight the efficacy of integrating metaheuristics with reinforcement learning to tackle complex optimization challenges, underscoring Deep Q-Learning's potential to boost cybersecurity measures in rapidly evolving threat environments.

13.
Comput Biol Med ; 178: 108694, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38870728

ABSTRACT

Telemedicine is an emerging development in the healthcare domain, where Internet of Things (IoT) fiber-optic technology assists telemedicine applications in improving overall digital healthcare performance for society. Telemedicine applications include bowel-disease monitoring based on fiber-optic laser endoscopy, fiber-optic illumination for gastrointestinal disease, remote doctor-patient communication, and remote surgery. However, many existing systems are not effective, and their approaches based on deep reinforcement learning have not obtained optimal results. This paper presents a fiber-optic IoT healthcare system based on deep-reinforcement-learning combinatorial constraint scheduling for hybrid telemedicine applications. In the proposed system, we propose the adaptive security deep Q-learning network (ASDQN) algorithm to execute all telemedicine applications under their given quality-of-service constraints (deadline, latency, security, and resources). To solve the problem, we exploited different fiber-optic endoscopy datasets with image, video, and numeric data for telemedicine applications. The objective is to minimize the overall latency of telemedicine applications (e.g., local, communication, and edge nodes) and maximize the overall rewards during offloading and scheduling on different nodes. The simulation results show that ASDQN meets the QoS and objectives of all telemedicine applications and outperforms the existing state-action-reward-state-action (SARSA) and deep Q-learning network (DQN) policies during execution and scheduling on different nodes.


Subjects
Deep Learning, Internet of Things, Telemedicine, Humans, Fiber Optic Technology, Algorithms
14.
Comput Biol Med ; 175: 108447, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38691912

ABSTRACT

Deep vein thrombosis (DVT) represents a critical health concern due to its potential to lead to pulmonary embolism, a life-threatening complication. Early identification and prediction of DVT are crucial to prevent thromboembolic events and implement timely prophylactic measures in high-risk individuals. This study aims to examine the risk determinants associated with acute lower extremity DVT in hospitalized individuals. Additionally, it introduces an innovative approach by integrating Q-learning augmented colony predation search ant colony optimizer (QL-CPSACO) into the analysis. This algorithm, then combined with support vector machines (SVM), forms a bQL-CPSACO-SVM feature selection model dedicated to crafting a clinical risk prognostication model for DVT. The effectiveness of the proposed algorithm's optimization and the model's accuracy are assessed through experiments utilizing the CEC 2017 benchmark functions and predictive analyses on the DVT dataset. The experimental results reveal that the proposed model achieves an outstanding accuracy of 95.90% in predicting DVT. Key parameters such as D-dimer, normal plasma prothrombin time, prothrombin percentage activity, age, previously documented DVT, leukocyte count, and thrombocyte count demonstrate significant value in the prognostication of DVT. The proposed method provides a basis for risk assessment at the time of patient admission and offers substantial guidance to physicians in making therapeutic decisions.


Subjects
Support Vector Machine, Venous Thrombosis, Humans, Female, Male, Algorithms, Middle Aged, Hospitalization, Aged, Risk Factors, Risk Assessment, Adult
15.
Sensors (Basel) ; 24(9)2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38732859

ABSTRACT

Vehicular ad hoc networks (VANETs) use multiple channels to communicate using wireless access in vehicular environments (WAVE) standards to provide a variety of vehicle-related applications. The current IEEE 802.11p WAVE communication channel structure is composed of one control channel (CCH) and several service channels (SCHs). SCHs are used for non-safety data transmission, while the CCH is used for broadcasting beacons, control, and safety messages. WAVE devices alternate between the CCH and SCHs when transmitting data, and each channel is active for a duration called the CCH interval (CCHI) or SCH interval (SCHI), respectively. Currently, both intervals are fixed at 50 ms. However, fixed-length intervals cannot effectively respond to dynamically changing traffic loads. Additionally, when many vehicles are simultaneously using the limited channel resources for data transmission, the network performance significantly degrades due to numerous packet collisions. Herein, we propose an adaptive resource allocation technique for efficient data transmission. The technique dynamically adjusts the SCHI and CCHI to improve network performance. Moreover, to reduce data collisions and optimize the network's backoff distribution, the proposed scheme applies reinforcement learning (RL) to provide an intelligent channel access algorithm. The simulation results demonstrate that the proposed scheme can ensure high throughputs and low transmission delays.

16.
Sensors (Basel) ; 24(9)2024 Apr 28.
Article in English | MEDLINE | ID: mdl-38732923

ABSTRACT

The transition to Industry 4.0 and 5.0 underscores the need for integrating humans into manufacturing processes, shifting the focus towards customization and personalization rather than traditional mass production. However, human performance during task execution may vary. To ensure high human-robot teaming (HRT) performance, it is crucial to predict performance without negatively affecting task execution. Therefore, to predict performance indirectly, significant factors affecting human performance, such as engagement and task load (i.e., amount of cognitive, physical, and/or sensory resources required to perform a particular task), must be considered. Hence, we propose a framework to predict and maximize the HRT performance. For the prediction of task performance during the development phase, our methodology employs features extracted from physiological data as inputs. The labels for these predictions-categorized as accurate performance or inaccurate performance due to high/low task load-are meticulously crafted using a combination of the NASA TLX questionnaire, records of human performance in quality control tasks, and the application of Q-Learning to derive task-specific weights for the task load indices. This structured approach enables the deployment of our model to exclusively rely on physiological data for predicting performance, thereby achieving an accuracy rate of 95.45% in forecasting HRT performance. To maintain optimized HRT performance, this study further introduces a method of dynamically adjusting the robot's speed in the case of low performance. This strategic adjustment is designed to effectively balance the task load, thereby enhancing the efficiency of human-robot collaboration.


Subjects
Robotics, Task Performance and Analysis, Humans, Robotics/methods, Female, Male, Data Analysis, Man-Machine Systems, Adult, Workload
17.
Sci Rep ; 14(1): 10838, 2024 May 12.
Article in English | MEDLINE | ID: mdl-38735996

ABSTRACT

Given the complexity of issuing, verifying, and trading green power certificates in China, along with the challenges posed by policy changes, it is crucial to ensure that China's green certificate market trading system is supported by proper mechanisms and technical infrastructure. This study presents a green power certificate trading (GC-TS) architecture based on an equilibrium strategy, which enhances the quoting efficiency and multi-party collaboration capability of green certificate trading by introducing Q-learning and smart contracts and by effectively integrating a multi-agent Nash trading strategy. Firstly, we integrate green certificate trading with electricity and carbon asset trading, constructing pricing strategies for the green certificate, carbon, and electricity trading markets; secondly, we design a certificate-electricity-carbon efficiency model based on ensuring the consistency of the green certificate, green electricity, and carbon markets; then, to achieve diversified green certificate trading, we establish a multi-agent reinforcement learning game equilibrium model. Additionally, we propose a joint clearing mechanism that integrates Nash Q-learning quoting with smart-contract dynamic trading. Experiments show that trading prices increased by 20% and the transaction success rate increased 30-fold, with an analysis of trading performance for groups of 3, 5, 7, and 9 trading agents exhibiting high consistency and redundancy. Compared with models integrating only smart contracts, the proposed approach achieves higher convergence efficiency of trading quotes.

18.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38804219

ABSTRACT

Sequential multiple assignment randomized trials (SMARTs) are the gold standard for estimating optimal dynamic treatment regimes (DTRs), but are costly and require a large sample size. We introduce the multi-stage augmented Q-learning estimator (MAQE) to improve efficiency of estimation of optimal DTRs by augmenting SMART data with observational data. Our motivating example comes from the Back Pain Consortium, where one of the overarching aims is to learn how to tailor treatments for chronic low back pain to individual patient phenotypes, knowledge which is lacking clinically. The Consortium-wide collaborative SMART and observational studies within the Consortium collect data on the same participant phenotypes, treatments, and outcomes at multiple time points, which can easily be integrated. Previously published single-stage augmentation methods for integration of trial and observational study (OS) data were adapted to estimate optimal DTRs from SMARTs using Q-learning. Simulation studies show the MAQE, which integrates phenotype, treatment, and outcome information from multiple studies over multiple time points, more accurately estimates the optimal DTR, and has a higher average value than a comparable Q-learning estimator without augmentation. We demonstrate this improvement is robust to a wide range of trial and OS sample sizes, addition of noise variables, and effect sizes.


Subjects
Computer Simulation, Low Back Pain, Observational Studies as Topic, Randomized Controlled Trials as Topic, Humans, Observational Studies as Topic/statistics & numerical data, Randomized Controlled Trials as Topic/statistics & numerical data, Low Back Pain/therapy, Sample Size, Treatment Outcome, Models, Statistical, Biometry/methods
19.
Neural Netw ; 175: 106274, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38583264

ABSTRACT

In this paper, an adjustable Q-learning scheme is developed to solve the discrete-time nonlinear zero-sum game problem, which can accelerate the convergence rate of the iterative Q-function sequence. First, the monotonicity and convergence of the iterative Q-function sequence are analyzed under some conditions. Moreover, by employing neural networks, the model-free tracking control problem for zero-sum games can be solved. Second, two practical algorithms are designed to guarantee convergence with accelerated learning. In one algorithm, an adjustable acceleration phase is added to the iteration process of Q-learning, which can be adaptively terminated with a convergence guarantee. In another algorithm, a novel acceleration function is developed, which adjusts the relaxation factor to ensure convergence. Finally, through a simulation example with a practical physical background, the strong performance of the developed algorithm is demonstrated with neural networks.
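
The relaxation-factor idea can be illustrated with a small zero-sum Q-iteration in which the Bellman backup is blended with the previous iterate; the dynamics, costs, and the fixed relaxation factor below are assumptions, whereas the paper adjusts the factor adaptively.

```python
import numpy as np

rng = np.random.default_rng(7)
n_s, n_u, n_w = 5, 3, 3                       # states, control actions, disturbance actions
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_u, n_w))   # transition kernel P[s, u, w, s']
R = rng.normal(size=(n_s, n_u, n_w))                      # stage reward (control maximizes, disturbance minimizes)
gamma, omega = 0.9, 1.05                       # omega is the relaxation factor
# a conservative contraction condition is |1 - omega| + omega * gamma < 1

Q = np.zeros((n_s, n_u, n_w))
for k in range(500):
    V = Q.max(axis=1).min(axis=1)              # game value proxy over pure strategies (min_w max_u Q)
    T_Q = R + gamma * P @ V                    # zero-sum Bellman backup
    Q_new = (1 - omega) * Q + omega * T_Q      # relaxed (blended) iteration
    if np.max(np.abs(Q_new - Q)) < 1e-8:
        break
    Q = Q_new

print(f"converged after {k + 1} iterations; V = {Q.max(1).min(1).round(3)}")
```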


Subjects
Algorithms, Neural Networks, Computer, Nonlinear Dynamics, Computer Simulation, Humans, Machine Learning
20.
Sensors (Basel) ; 24(6)2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38544109

ABSTRACT

To address traffic flow fluctuations caused by changes in traffic signal control schemes on tidal lanes and maintain smooth traffic operations, this paper proposes a method for controlling traffic signal transitions on tidal lanes. Firstly, the proposed method includes designing an intersection overlap phase scheme based on the traffic flow conflict matrix in the tidal lane scenario and a fast and smooth transition method for key intersections based on the flow ratio. The aim of the control is to equalize average queue lengths and minimize average vehicle delays for different flow directions at the intersection. This study also analyses various tidal lane scenarios based on the different opening states of the tidal lanes at related intersections. The transitions of phase offsets are emphasized after a comprehensive analysis of transition time and smoothing characteristics. In addition, this paper proposes a coordinated method for tidal lanes to optimize the phase offset at arterial intersections for smooth and rapid transitions. The method uses Deep Q-Learning, a reinforcement learning algorithm for optimal action selection (OSA), to develop an adaptive traffic signal transition control and enhance its efficiency. Finally, a simulation experiment using a traffic control interface is presented to validate the proposed approach. This study shows that this method leads to smoother and faster traffic signal transitions across different tidal lane scenarios compared to the conventional method. Implementing this solution can benefit intersection groups by reducing traffic delays, improving traffic efficiency, and decreasing air pollution caused by congestion.
