Results 1 - 20 of 31
1.
J Imaging Inform Med ; 2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39020159

ABSTRACT

Large labeled datasets bring significant performance improvements, but acquiring labeled medical data is particularly challenging because annotation is laborious, time-consuming, and requires medical expertise. Semi-supervised learning has been employed to leverage unlabeled data. However, the quality and quantity of annotated data strongly influence the performance of the semi-supervised model. Selecting informative samples through active learning is therefore crucial and can improve model performance. We propose a unified semi-supervised active learning architecture (RL-based SSAL) that alternates between training a semi-supervised network and performing active sample selection. The semi-supervised model is first trained for sample selection, and the selected label-required samples are annotated and added to the previously labeled dataset for subsequent semi-supervised training. To learn to select the most informative samples, we adopt a policy-learning-based approach that treats sample selection as a decision-making process. A novel reward function based on the product of predictive confidence and uncertainty is designed, aiming to select samples with both high confidence and high uncertainty. Comparisons with a semi-supervised baseline on a collected lumbar disc herniation dataset demonstrate the effectiveness of the proposed RL-based SSAL, with improvements of over 3% across different amounts of labeled data. Comparisons with other active learning methods and ablation studies confirm the superiority of the proposed policy-learning-based sample selection and reward function. Trained with only 200 labeled samples, our model achieves an accuracy of 89.32%, comparable to the performance obtained with the entire labeled dataset, demonstrating its significant advantage.
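
The reward described above, the product of predictive confidence and uncertainty, can be sketched directly from softmax outputs. The following is a minimal illustration, not the paper's implementation; the entropy-based uncertainty measure and the top-k selection step are assumptions.

```python
import numpy as np

def selection_reward(probs: np.ndarray) -> np.ndarray:
    """Reward for candidate samples: product of predictive confidence and uncertainty.

    probs: (N, C) softmax outputs of the semi-supervised model for N unlabeled samples.
    """
    confidence = probs.max(axis=1)                        # highest class probability
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    uncertainty = entropy / np.log(probs.shape[1])        # normalize entropy to [0, 1]
    return confidence * uncertainty                       # high only when both are high

# Example: pick the top-k samples to send for annotation
probs = np.random.dirichlet(alpha=[1.0] * 3, size=1000)   # stand-in for model predictions
top_k = np.argsort(selection_reward(probs))[::-1][:200]
```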

2.
Int J Neural Syst ; 34(7): 2450037, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38655914

ABSTRACT

Vision and proprioception have fundamental sensory mismatches in delivering locational information, and such mismatches are critical factors limiting the efficacy of motor learning. However, it is still not clear how, and to what extent, this mismatch limits motor learning outcomes. To further the understanding of the effect of sensory mismatch on motor learning outcomes, a reinforcement learning algorithm and a simplified biomechanical elbow-joint model were employed to mimic the motor learning process in a computational environment. By applying the reinforcement learning algorithm to an elbow-flexion motor learning task, the simulation successfully explained how visual-proprioceptive mismatch limits motor learning outcomes in terms of motor control accuracy and task completion speed. The larger the perceived angular offset between the two sensory modalities, the lower the motor control accuracy. Likewise, the more similar the peak reward amplitudes of the two sensory modalities, the lower the motor control accuracy. In addition, the simulation results suggest that an insufficient exploration rate limits task completion speed, while an excessive exploration rate limits motor control accuracy. This speed-accuracy trade-off shows that a moderate exploration rate could serve as another important factor in motor learning.


Subjects
Proprioception, Reinforcement (Psychology), Visual Perception, Humans, Proprioception/physiology, Visual Perception/physiology, Learning/physiology, Elbow Joint/physiology, Psychomotor Performance/physiology, Biomechanical Phenomena/physiology, Computer Simulation, Motor Activity/physiology
3.
Comput Biol Med ; 169: 107877, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38157774

ABSTRACT

Although existing deep reinforcement learning-based approaches have achieved some success in image augmentation tasks, their effectiveness and adequacy for data augmentation in intelligent medical image analysis are still unsatisfactory. Therefore, we propose a novel Adaptive Sequence-length based Deep Reinforcement Learning (ASDRL) model for Automatic Data Augmentation (AutoAug) in intelligent medical image analysis. The improvements of ASDRL-AutoAug are two-fold: (i) To remedy the problem of some augmented images being invalid, we construct a more accurate reward function based on different variations of the augmentation trajectories. This reward function assesses the validity of each augmentation transformation more accurately by introducing different information about the validity of the augmented images. (ii) Then, to alleviate the problem of insufficient augmentation, we further propose a more intelligent automatic stopping mechanism (ASM). ASM feeds a stop signal to the agent automatically by judging the adequacy of image augmentation. This ensures that each transformation before stopping the augmentation can smoothly improve the model performance. Extensive experimental results on three medical image segmentation datasets show that (i) ASDRL-AutoAug greatly outperforms the state-of-the-art data augmentation methods in medical image segmentation tasks, (ii) the proposed improvements are both effective and essential for ASDRL-AutoAug to achieve superior performance, and the new reward evaluates the transformations more accurately than existing reward functions, and (iii) we also demonstrate that ASDRL-AutoAug is adaptive for different images in terms of sequence length, as well as generalizable across different segmentation models.

4.
J Cheminform ; 15(1): 120, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38093324

ABSTRACT

Developing compounds with novel structures is important for the production of new drugs. From an intellectual property perspective, confirming the patent status of newly developed compounds is essential, particularly for pharmaceutical companies. Recent advances in artificial intelligence (AI) have made it possible to generate large numbers of compounds. However, confirming the patent status of these generated molecules has been a challenge, because there are no free, easy-to-use tools for determining in a timely manner whether generated compounds are novel with respect to patents, and no appropriate reference databases for pharmaceutical patents exist. In this study, two public databases, SureChEMBL and Google Patents Public Datasets, were used to create a reference database of drug-related patented compounds using international patent classification. An exact structure search system was constructed using InChIKey and a relational database system to rapidly search for compounds in the reference database. Because drug-related patented compounds are a good source for a generative AI model to learn useful chemical structures, they were used as the training data. Furthermore, molecule generation was successfully directed by increasing or decreasing the number of generated patented compounds through the incorporation of patent status (i.e., patented or not) into learning. Using patent status enabled the generation of novel molecules with high drug-likeness. Generation using generative AI with patent information should help to efficiently propose compounds that are novel in terms of pharmaceutical patents. Scientific contribution: This study developed a new molecule-generation method that takes into account the patent status of molecules, a feature that has rarely been considered but is important in drug discovery. The method enables the generation of novel, highly drug-like molecules based on pharmaceutical patents and will help in the efficient development of effective drug compounds.
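
The exact structure search described above can be approximated with RDKit, assuming a build with InChI support; the reference key set below is a toy stand-in for the SureChEMBL/Google Patents-derived database, not the study's actual data.

```python
from rdkit import Chem

# Hypothetical reference set of InChIKeys extracted from patent databases
patented_keys = {"BSYNRYMUTXBXSQ-UHFFFAOYSA-N"}  # aspirin, for illustration only

def is_patented(smiles: str) -> bool:
    """Exact-structure check: convert a generated molecule to its InChIKey
    and look it up in the reference database of patented compounds."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False  # invalid structure; treat as not found
    key = Chem.MolToInchiKey(mol)  # requires RDKit built with InChI support
    return key in patented_keys

print(is_patented("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin -> True in this toy set
```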

5.
Neural Netw ; 167: 847-864, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37741067

ABSTRACT

Adversarial imitation learning (AIL) is a powerful method for automated decision systems because it trains a policy efficiently by mimicking expert demonstrations. However, implicit bias is present in the reward function of these algorithms, which leads to sample inefficiency. To solve this issue, an algorithm referred to as Mutual Information Generative Adversarial Imitation Learning (MI-GAIL) is proposed to correct these biases. In this study, we propose two guidelines for designing an unbiased reward function. Based on these guidelines, we shape the reward function from the discriminator by adding auxiliary information from a potential-based reward function. The primary insight is that the potential-based reward function provides more accurate rewards for the actions identified by the two guidelines. We compare our algorithm with state-of-the-art imitation learning algorithms on a family of continuous control tasks. Experimental results show that MI-GAIL addresses the bias in AIL reward functions and further improves sample efficiency and training stability.
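
A common way to realize the shaping described above is to add a potential-based term to a GAIL-style discriminator reward; the sketch below assumes that standard form and is not MI-GAIL's exact reward.

```python
import numpy as np

def shaped_reward(d_sa: float, phi_s: float, phi_next: float,
                  gamma: float = 0.99, eps: float = 1e-8) -> float:
    """AIL reward from the discriminator plus a potential-based shaping term.

    d_sa:  discriminator output D(s, a) in (0, 1); higher means more expert-like.
    phi_*: potential values of the current and next state.
    The shaping term gamma * phi(s') - phi(s) leaves the optimal policy unchanged
    (Ng et al., 1999), which is the usual motivation for potential-based shaping.
    """
    base = -np.log(1.0 - d_sa + eps)      # standard GAIL-style reward
    shaping = gamma * phi_next - phi_s    # potential-based auxiliary term
    return base + shaping
```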


Subjects
Implicit Bias, Imitative Behavior, Learning, Algorithms, Policies
6.
Comput Biol Med ; 164: 107253, 2023 09.
Article in English | MEDLINE | ID: mdl-37536094

ABSTRACT

Spike sorting is the basis for analyzing spike firing patterns encoded in high-dimensional information spaces. Because high-density microelectrode arrays record multiple neurons simultaneously, the collected data often suffer from two problems: overlapping spikes and differing neuronal firing rates, both of which belong to the multi-class imbalance problem. Since deep reinforcement learning (DRL) can assign targeted attention to categories through reward functions, we propose ImbSorter to perform spike sorting under multi-class imbalance. We describe spike sorting as a Markov decision sequence and construct a dynamic reward function (DRF) based on the inter-class imbalance ratios to improve the sensitivity of the agent to minority classes. The agent is eventually guided by the optimal strategy to classify spikes. We consider the Wave_Clus dataset, which contains overlapping spikes and diverse noise levels, and a macaque dataset, which has a multi-scale imbalance. ImbSorter is compared with classical DRL architectures, traditional machine learning algorithms, and advanced overlapping-spike sorting techniques on these two datasets. ImbSorter obtained improved Macro_F1 results. The results show that ImbSorter has a promising ability to resist overlapping and noise interference, and it exhibits high stability and strong performance when processing spikes with different degrees of skewed distribution.
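
One plausible reading of the dynamic reward function is to weight per-class rewards by the inter-class imbalance ratios so that rarer spike classes matter more; the sketch below illustrates that idea with assumed weights and is not the paper's DRF.

```python
import numpy as np

def imbalance_rewards(class_counts: np.ndarray) -> np.ndarray:
    """Per-class reward weights from inter-class imbalance ratios:
    rarer spike classes earn a larger reward when classified correctly."""
    ratios = class_counts.max() / class_counts            # imbalance ratio of each class
    return ratios / ratios.sum() * len(class_counts)      # normalized weights, mean of 1

def step_reward(pred: int, label: int, weights: np.ndarray) -> float:
    """Reward for one classification decision in the Markov decision sequence."""
    return weights[label] if pred == label else -weights[label]

weights = imbalance_rewards(np.array([5000, 800, 120]))   # toy spike-count distribution
print(step_reward(pred=2, label=2, weights=weights))      # minority-class hit -> large reward
```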


Subjects
Neurons, Computer-Assisted Signal Processing, Action Potentials/physiology, Neurons/physiology, Microelectrodes, Algorithms
7.
Neural Netw ; 167: 104-117, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37647740

ABSTRACT

The implementation of robotic reinforcement learning is hampered by problems such as an unspecified reward function and high training costs. Many previous works have used cross-domain policy transfer to obtain the policy for the problem domain. However, these approaches require paired and aligned dynamics trajectories or other interactions with the environment. We propose a cross-domain dynamics alignment framework for problem-domain policy acquisition that transfers a policy trained in the source domain to the problem domain. Our framework learns dynamics alignment across two domains that differ in the agents' physical parameters (armature, rotation range, or torso mass) or morphologies (limbs). Most importantly, we learn dynamics alignment between the two domains using unpaired and unaligned dynamics trajectories. For these two scenarios, we propose a cross-physics-domain policy adaptation algorithm (CPD) and a cross-morphology-domain policy adaptation algorithm (CMD) based on our cross-domain dynamics alignment framework. To improve the performance of the policy in the source domain so that a better policy can be transferred to the problem domain, we also propose the Boltzmann TD3 (BTD3) algorithm. We conduct diverse experiments on continuous-control domains to demonstrate the performance of our approaches. Experimental results show that our approaches obtain better policies and higher rewards in the problem domains even when the problem-domain dataset is small.


Subjects
Algorithms, Learning, Physics, Policies, Reinforcement (Psychology)
8.
Neural Netw ; 164: 419-427, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37187108

ABSTRACT

Although reinforcement learning (RL) has made numerous breakthroughs in recent years, addressing reward-sparse environments remains challenging and requires further exploration. Many studies improve agent performance by introducing state-action pairs experienced by an expert. However, such strategies depend heavily on the quality of the expert demonstration, which is rarely optimal in real-world environments, and they struggle to learn from sub-optimal demonstrations. In this paper, a self-imitation learning algorithm based on task-space division is proposed to acquire efficient, high-quality demonstrations during training. To assess trajectory quality, well-designed criteria are defined in the task space for identifying better demonstrations. The results show that the proposed algorithm improves the success rate of robot control and achieves a high mean Q value per step. The proposed framework shows great potential to learn from demonstrations generated by the agent's own policy, and it can be used in sparse-reward environments where the task space can be divided.
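
A simple way to picture demonstration selection over a divided task space is to keep the best self-generated trajectory per region; the region key and quality criterion below are placeholders for the paper's criteria, not its actual implementation.

```python
import numpy as np

class RegionalDemoBuffer:
    """Keep the best self-generated trajectory per task-space region (a sketch)."""

    def __init__(self, bins_per_dim: int = 4):
        self.bins = bins_per_dim
        self.best = {}  # region key -> (quality, trajectory)

    def _region(self, goal: np.ndarray) -> tuple:
        # Discretize a goal normalized to [0, 1] into a grid cell of the task space.
        return tuple(np.clip((goal * self.bins).astype(int), 0, self.bins - 1))

    def maybe_add(self, goal: np.ndarray, trajectory: list, quality: float) -> None:
        key = self._region(goal)
        if key not in self.best or quality > self.best[key][0]:
            self.best[key] = (quality, trajectory)  # replace with the better demonstration

    def demonstrations(self) -> list:
        return [traj for _, traj in self.best.values()]
```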


Subjects
Algorithms, Artificial Intelligence, Reinforcement (Psychology), Reward
9.
Front Psychiatry ; 14: 1093784, 2023.
Article in English | MEDLINE | ID: mdl-36896348

ABSTRACT

Objective: Internet gaming disorder (IGD) can seriously impair an individual's physical and mental health. However, unlike most people suffering from substance addiction, individuals with IGD may recover without any professional intervention. Understanding the brain mechanisms of natural recovery from IGD may provide new insight into how to prevent addiction and implement more targeted interventions. Methods: Sixty individuals with IGD were scanned using resting-state fMRI to assess brain-region changes associated with IGD. After 1 year, 19 individuals no longer met the IGD criteria and were considered recovered (RE-IGD), 23 still met the criteria (PER-IGD), and 18 left the study. Resting-state brain activity was compared between the 19 RE-IGD and 23 PER-IGD individuals using regional homogeneity (ReHo). Structural and cue-craving functional MRI data were also collected to further support the resting-state results. Results: The resting-state fMRI results revealed decreased activity in brain regions responsible for reward and inhibitory control [including the orbitofrontal cortex (OFC), the precuneus and the dorsolateral prefrontal cortex (DLPFC)] in the PER-IGD individuals compared to the RE-IGD individuals. In addition, significant positive correlations were found between mean ReHo values in the precuneus and self-reported craving scores for gaming, in both the PER-IGD and the RE-IGD individuals. Furthermore, the brain-structure and cue-craving analyses showed similar group differences, specifically in brain regions associated with reward processing and inhibitory control (including the DLPFC, anterior cingulate gyrus, insula, OFC, precuneus, and superior frontal gyrus). Conclusion: These findings indicate that the brain regions responsible for reward processing and inhibitory control differ in PER-IGD individuals, which may have consequences for natural recovery. Our study provides neuroimaging evidence that spontaneous brain activity may influence natural recovery from IGD.

10.
Front Neuroinform ; 17: 1096053, 2023.
Article in English | MEDLINE | ID: mdl-36756212

ABSTRACT

To address the poor robustness and adaptability of traditional control methods across different situations, the deep deterministic policy gradient (DDPG) algorithm is improved by designing a hybrid reward function that superimposes several different reward terms. In addition, the experience replay mechanism of DDPG is improved by combining priority sampling with uniform sampling to accelerate convergence. Finally, it is verified in a simulation environment that the improved DDPG algorithm can achieve accurate control of robot-arm motion. The experimental results show that the improved DDPG algorithm converges in a shorter time, with an average success rate of 91.27% in the robotic-arm reaching task. Compared with the original DDPG algorithm, it adapts more robustly to the environment.
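
The combined priority/uniform sampling can be sketched as drawing part of each replay batch proportionally to transition priorities and the rest uniformly; the 50/50 split below is an assumption, not the paper's setting.

```python
import numpy as np

def mixed_sample(priorities: np.ndarray, batch_size: int,
                 frac_prioritized: float = 0.5,
                 rng: np.random.Generator = np.random.default_rng()) -> np.ndarray:
    """Indices for one replay batch: part priority-proportional, part uniform."""
    n_pri = int(batch_size * frac_prioritized)
    p = priorities / priorities.sum()
    pri_idx = rng.choice(len(priorities), size=n_pri, p=p, replace=True)   # prioritized part
    uni_idx = rng.integers(0, len(priorities), size=batch_size - n_pri)    # uniform part
    return np.concatenate([pri_idx, uni_idx])
```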

11.
Int J Neural Syst ; 32(9): 2250038, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35989578

ABSTRACT

Hippocampal pyramidal cells and interneurons play a key role in spatial navigation. In goal-directed behavior associated with rewards, the spatial firing pattern of pyramidal cells is modulated by the animal's moving direction toward a reward, with a dependence on auditory, olfactory, and somatosensory stimuli for head orientation. Additionally, interneurons in the CA1 region of the hippocampus, monosynaptically connected to CA1 pyramidal cells, are modulated by a complex set of interacting brain regions related to reward and recall. The computational framework of reinforcement learning (RL) has been widely used to investigate spatial navigation, and in turn has been increasingly used to study rodent learning associated with reward. Rewards in RL drive the discovery of a desired behavior through the integration of two streams of neural activity: trial-and-error interactions with the external environment to achieve a goal, and intrinsic motivation, primarily driven by the brain's reward system, which accelerates learning. Recognizing the potential benefit of this neurally grounded reward design for novel RL architectures, we propose an RL algorithm based on Q-learning with a biomimetic perspective (neuro-inspired RL) to decode rodent movement trajectories. The reward function, inspired by the neuronal information processing uncovered in the hippocampus, combines the preferred firing direction of pyramidal cells as the extrinsic reward signal with the coupling between pyramidal cell-interneuron pairs as the intrinsic reward signal. Our experimental results demonstrate that the neuro-inspired RL, with its combined use of extrinsic and intrinsic rewards, outperforms other spatial decoding algorithms, including RL methods that use a single reward function. The new RL algorithm helps accelerate learning convergence and improves prediction accuracy for movement trajectories.
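
The combined extrinsic/intrinsic reward fits naturally into a tabular Q-learning update; the sketch below assumes a simple weighted sum, with the weight beta as an illustrative hyperparameter rather than the paper's value.

```python
import numpy as np

def q_update(Q: np.ndarray, s: int, a: int, s_next: int,
             r_ext: float, r_int: float,
             alpha: float = 0.1, gamma: float = 0.95, beta: float = 0.5) -> None:
    """One tabular Q-learning step with a combined reward.

    r_ext: extrinsic term (e.g. derived from the preferred firing direction)
    r_int: intrinsic term (e.g. derived from pyramidal cell-interneuron coupling)
    beta:  assumed weight of the intrinsic term
    """
    r = r_ext + beta * r_int
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```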


Subjects
Reward, Spatial Navigation, Animals, Learning/physiology, Neurons/physiology, Reinforcement (Psychology)
12.
Sensors (Basel) ; 22(12)2022 Jun 17.
Article in English | MEDLINE | ID: mdl-35746364

ABSTRACT

Although the design of the reward function is one of the main elements of reinforcement learning, it is often not given enough attention when reinforcement learning is applied in practice, which leads to unsatisfactory performance. In this study, a reward function matrix is proposed for training various decision-making modes, with an emphasis on decision-making styles and, further, on incentives and punishments. Additionally, we model the traffic scene as a graph to better represent the interaction between vehicles, and adopt a graph convolutional network (GCN) to extract features of the graph structure so that connected autonomous vehicles can perform decision-making directly. Furthermore, we combine the GCN with deep Q-learning and multi-step double deep Q-learning to train four decision-making modes, named the graph convolutional deep Q-network (GQN) and the multi-step double graph convolutional deep Q-network (MDGQN). In simulation, the superiority of the reward function matrix is demonstrated by comparison with a baseline, and evaluation metrics are proposed to verify the performance differences among decision-making modes. Results show that, by adjusting the weight values in the reward function matrix, the trained decision-making modes can satisfy a variety of driving requirements, including task completion rate, safety, comfort, and completion efficiency. Finally, the decision-making modes trained by MDGQN performed better than those trained by GQN in an uncertain highway-exit scene.
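
A reward function matrix of this kind can be pictured as one weight row per decision-making mode applied to per-step reward components; the component names and weights below are illustrative, not the paper's values.

```python
import numpy as np

# Rows: decision-making modes; columns: reward components (illustrative only).
components = ["task_completion", "safety", "comfort", "efficiency"]
reward_matrix = np.array([
    [1.0, 2.0, 0.5, 0.5],   # conservative mode: punish unsafe actions heavily
    [1.0, 1.0, 0.5, 2.0],   # aggressive mode: reward fast exit completion
])

def step_reward(mode: int, signals: np.ndarray) -> float:
    """Scalar reward for one step: the mode's weight row dotted with the
    per-component incentive/punishment signals observed at that step."""
    return float(reward_matrix[mode] @ signals)

print(step_reward(mode=0, signals=np.array([0.0, -1.0, 0.2, 0.1])))  # unsafe step
```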


Subjects
Automobile Driving, Reward, Benchmarking, Learning, Uncertainty
13.
Sensors (Basel) ; 22(9)2022 May 08.
Article in English | MEDLINE | ID: mdl-35591271

ABSTRACT

When the traditional Deep Deterministic Policy Gradient (DDPG) algorithm is used for mobile robot path planning, the robot's limited observation of the environment makes training of the path-planning model inefficient and convergence slow. In this paper, Long Short-Term Memory (LSTM) is introduced into the DDPG network so that the previous and current states of the mobile robot are combined to determine its actions, and a Batch Norm layer is added after each layer of the Actor network. At the same time, the reward function is optimized to guide the mobile robot toward the target point more quickly. To improve learning efficiency, different normalization methods are used to normalize the distance and angle between the mobile robot and the target point, which serve as the input of the DDPG network. When the model outputs the robot's next action, mixed noise composed of Gaussian noise and Ornstein-Uhlenbeck (OU) noise is added. Finally, experiments are conducted in a simulation environment built with a ROS system and the Gazebo platform. The results show that the proposed algorithm accelerates the convergence of DDPG, improves the generalization ability of the path-planning model, and increases the efficiency and success rate of mobile robot path planning.
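
The mixed exploration noise can be sketched as a weighted sum of an Ornstein-Uhlenbeck process and Gaussian noise added to the actor output; the mixing weight and OU parameters below are assumptions, not the paper's settings.

```python
import numpy as np

class MixedNoise:
    """Exploration noise added to the actor's output: Ornstein-Uhlenbeck plus Gaussian."""

    def __init__(self, dim: int, theta: float = 0.15, sigma_ou: float = 0.2,
                 sigma_gauss: float = 0.1, mix: float = 0.5, dt: float = 1e-2):
        self.theta, self.sigma_ou, self.sigma_gauss = theta, sigma_ou, sigma_gauss
        self.mix, self.dt = mix, dt
        self.x = np.zeros(dim)

    def sample(self) -> np.ndarray:
        # OU process: temporally correlated noise that drifts back toward zero.
        self.x += (-self.theta * self.x * self.dt
                   + self.sigma_ou * np.sqrt(self.dt) * np.random.randn(*self.x.shape))
        gauss = self.sigma_gauss * np.random.randn(*self.x.shape)
        return self.mix * self.x + (1.0 - self.mix) * gauss

noise = MixedNoise(dim=2)
action = np.clip(np.array([0.3, -0.1]) + noise.sample(), -1.0, 1.0)  # actor output + noise
```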


Subjects
Robotics, Algorithms, Computer Simulation, Long-Term Memory, Policies, Robotics/methods
14.
Addiction ; 117(1): 19-32, 2022 01.
Article in English | MEDLINE | ID: mdl-33861888

ABSTRACT

AIMS: To estimate the aggregated effect sizes of reward-related decision-making deficits in internet gaming disorder (IGD) and to explore potential moderators of the variability in effect sizes across studies. DESIGN: Review of peer-reviewed studies comparing reward-related decision-making performance between IGD and control participants, identified via the PubMed, Web of Science and ProQuest databases. Random-effects modeling was conducted using Hedges' g as the effect size (ES). The effects of decision-making situation, valence, sample type, testing environment, IGD severity and self-reported impulsivity on decision-making differences were examined by moderator analyses. SETTING: No restrictions on location. PARTICIPANTS: Twenty-four studies (20 independent samples) were included in the meta-analysis, comprising 604 IGD and 641 control participants and 35 ESs. MEASURES: Reward-related decision-making differences between IGD and control groups. FINDINGS: The overall ES for decision-making deficits in IGD was small (g = -0.45, P < 0.01). The effects were comparable across risky, ambiguous and inter-temporal decision-making. Larger aggregate ESs were identified for pure-gain and mixed decision-making compared with pure-loss decision-making. Studies based on clinical and community samples showed similar effects. No significant difference was observed between purely behavioral studies and those with additional measurements. Decision-making alterations were not closely associated with IGD severity or self-reported impulsivity differences at the study level. CONCLUSIONS: Internet gaming disorder appears to be consistently associated with reward-related decision-making deficits.
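
For reference, Hedges' g is Cohen's d multiplied by a small-sample correction factor; a minimal computation from two groups of raw scores:

```python
import numpy as np

def hedges_g(x: np.ndarray, y: np.ndarray) -> float:
    """Hedges' g: the standardized mean difference (Cohen's d) with the
    small-sample bias correction used in random-effects meta-analysis."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                        / (nx + ny - 2))
    d = (x.mean() - y.mean()) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * (nx + ny) - 9.0)   # Hedges' correction factor J
    return d * correction

# An IGD group scoring lower than controls yields a negative g, matching the sign
# convention of the pooled estimate reported above (g = -0.45).
```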


Subjects
Addictive Behavior, Video Games, Humans, Impulsive Behavior, Internet, Internet Addiction Disorder, Reward
15.
Neural Netw ; 145: 260-270, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34781214

ABSTRACT

Learning complex tasks from scratch is challenging and often impossible for humans as well as for artificial agents. Instead, a curriculum can be used, which decomposes a complex task - the target task - into a sequence of source tasks. Each source task is a simplified version of the next source task with increasing complexity. Learning then occurs gradually by training on each source task while using knowledge from the curriculum's prior source tasks. In this study, we present a new algorithm that combines curriculum learning with Hindsight Experience Replay (HER), to learn sequential object manipulation tasks for multiple goals and sparse feedback. The algorithm exploits the recurrent structure inherent in many object manipulation tasks and implements the entire learning process in the original simulation without adjusting it to each source task. We test our algorithm on three challenging throwing tasks in simulation and show significant improvements compared to vanilla-HER.
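
Hindsight Experience Replay, which the curriculum builds on, relabels stored transitions with goals achieved later in the episode so that sparse feedback still yields successful examples; a generic sketch (the transition fields and sparse reward are illustrative, not the paper's interface):

```python
import random

def sparse_reward(achieved, goal):
    """0 if the goal is reached, -1 otherwise (the usual sparse feedback)."""
    return 0.0 if achieved == goal else -1.0

def her_relabel(episode, k=4, reward_fn=sparse_reward):
    """'Future' strategy HER: for each transition, also store k copies whose goal
    is replaced by a goal achieved later in the same episode."""
    relabeled = []
    for t, tr in enumerate(episode):            # tr: dict with state, action, next_state,
        relabeled.append(tr)                    #     achieved_goal, goal, reward
        future = episode[t:]                    # candidate goals achieved afterwards
        for _ in range(k):
            new_goal = random.choice(future)["achieved_goal"]
            relabeled.append(dict(tr, goal=new_goal,
                                  reward=reward_fn(tr["achieved_goal"], new_goal)))
    return relabeled
```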


Subjects
Curriculum, Learning, Algorithms, Computer Simulation, Humans
16.
Sensors (Basel) ; 21(19)2021 Sep 30.
Article in English | MEDLINE | ID: mdl-34640890

ABSTRACT

In recent years, machine learning for trading has been widely studied. Trading decisions must determine both the direction and the size of a position based on market conditions. However, no research so far has considered variable position sizes in models developed for trading. In this paper, we propose a deep reinforcement learning model named LSTM-DDPG to make trading decisions with variable position sizes. Specifically, we model the trading process as a Partially Observable Markov Decision Process, in which a long short-term memory (LSTM) network extracts market-state features and the deep deterministic policy gradient (DDPG) framework makes trading decisions concerning the direction and variable size of the position. We test the LSTM-DDPG model on IF300 (index futures of the China stock market) data, and the results show that LSTM-DDPG with variable positions performs better in terms of return and risk than models with fixed or few-level positions. In addition, the investment potential of the model is better exploited by a differential Sharpe ratio reward function than by a profit-based reward function.
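
The differential Sharpe ratio reward mentioned above has a standard incremental form (Moody & Saffell); the sketch below follows that form, with the decay rate eta as an assumed hyperparameter.

```python
class DifferentialSharpe:
    """Per-step reward based on the differential Sharpe ratio: an online
    approximation of how the latest return changes the Sharpe ratio, used
    as the RL reward instead of raw profit."""

    def __init__(self, eta: float = 0.01):
        self.eta = eta   # decay rate of the exponential moving estimates
        self.A = 0.0     # moving estimate of the first moment of returns
        self.B = 0.0     # moving estimate of the second moment of returns

    def reward(self, r: float) -> float:
        dA, dB = r - self.A, r * r - self.B
        denom = (self.B - self.A ** 2) ** 1.5
        dsr = (self.B * dA - 0.5 * self.A * dB) / denom if denom > 1e-12 else 0.0
        self.A += self.eta * dA
        self.B += self.eta * dB
        return dsr
```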


Subjects
Investments, Long-Term Memory, Forecasting, Machine Learning, Policies
17.
Neurosci Biobehav Rev ; 131: 192-210, 2021 12.
Article in English | MEDLINE | ID: mdl-34537265

ABSTRACT

There is a need for innovation with respect to therapeutics in psychiatry. Available evidence indicates that the trace amine-associated receptor 1 (TAAR1) agonist SEP-363856 is promising, as it improves measures of cognitive and reward function in schizophrenia. Hedonic and cognitive impairments are transdiagnostic and constitute major burdens in mood disorders. Herein, we systematically review the behavioural and genetic literature documenting the role of TAAR1 in reward and cognitive function, and propose a mechanistic model of TAAR1's functions in the brain. Notably, TAAR1 activity confers antidepressant-like effects, enhances attention and response inhibition, and reduces compulsive reward seeking without impairing normal function. Further characterization of the responsible mechanisms suggests ion-homeostatic, metabolic, neurotrophic, and anti-inflammatory enhancements in the limbic system. Multiple lines of evidence establish the viability of TAAR1 as a biological target for the treatment of mood disorders. Furthermore, the evidence suggests a role for TAAR1 in reward and cognitive function, which is attributed to a cascade of events that are relevant to the cellular integrity and function of the central nervous system.


Subjects
Mood Disorders, G-Protein-Coupled Receptors, Humans, Limbic System/metabolism, Mood Disorders/drug therapy, G-Protein-Coupled Receptors/metabolism, Reward
18.
Accid Anal Prev ; 161: 106355, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34461394

ABSTRACT

Using simulation models to conduct safety assessments can have several advantages as it enables the evaluation of the safety of various design and traffic management options before actually making changes. However, limited studies have developed microsimulation models for the safety evaluation of active road users such as pedestrians. This can be attributed to the limited ability of simulation models to capture the heterogeneity in pedestrian behavior and their complex collision avoidance mechanisms. Therefore, the objective of this study is to develop an agent-based framework to realistically model pedestrian behavior in near misses and to improve the understanding of pedestrian evasive action mechanisms in interactions with vehicles. Pedestrian-vehicle conflicts are modeled using the Markov Decision Process (MDP) framework. A continuous Gaussian Process Inverse Reinforcement Learning (GP-IRL) approach is implemented to retrieve pedestrians' reward functions and infer their collision avoidance mechanisms in conflict situations. Video data from a congested intersection in Shanghai, China is used as a case study. Trajectories of pedestrians and vehicles involved in traffic conflicts were extracted with computer vision algorithms. A Deep Reinforcement Learning (DRL) model is used to estimate optimal pedestrian policies in traffic conflicts. Results show that the developed model predicted pedestrian trajectories and their evasive action mechanisms (i.e., swerving maneuver and speed changing) in conflict situations with high accuracy. As well, the model provided predictions of the post encroachment time (PET) conflict indicator that strongly correlated with the corresponding values of the field-measured conflicts. This study is a crucial step in developing a safety-oriented microsimulation tool for pedestrians in mixed traffic conditions.
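
The post-encroachment time (PET) indicator used above is simply the gap between the first road user leaving the conflict area and the second one reaching it; a minimal computation (the example times are invented for illustration):

```python
def post_encroachment_time(t_first_exit: float, t_second_entry: float) -> float:
    """Post Encroachment Time (PET): time between the first road user leaving the
    conflict area and the second one entering it; a smaller PET means a more
    severe conflict. Both times are in seconds on a common clock."""
    return t_second_entry - t_first_exit

# Example: pedestrian clears the conflict zone at t = 12.4 s and the vehicle
# reaches it at t = 13.1 s, giving PET = 0.7 s (a severe near miss).
print(post_encroachment_time(12.4, 13.1))
```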


Subjects
Near Miss, Pedestrians, Traffic Accidents/prevention & control, China, Humans, Safety, Walking
19.
Int J Mol Sci ; 22(14)2021 Jul 14.
Article in English | MEDLINE | ID: mdl-34299139

ABSTRACT

Acupuncture affects the central nervous system via the regulation of neurotransmitter transmission. We previously showed that Shenmen (HT7) acupoint stimulation decreased cocaine-induced dopamine release in the nucleus accumbens. Here, we used the intracranial self-stimulation (ICSS) paradigm to evaluate whether HT7 stimulation regulates brain reward function in rats. We found that HT7 stimulation triggered a rightward shift of the frequency-rate curve and elevated ICSS thresholds. However, HT7 stimulation did not affect the threshold-lowering effects produced by cocaine. These results indicate that HT7 stimulation effectively regulates the ICSS thresholds of the medial forebrain bundle only in drug-naïve rats.


Subjects
Acupuncture Therapy/methods, Cocaine/administration & dosage, Electric Stimulation/methods, Medial Forebrain Bundle/physiology, Reward, Self Stimulation/physiology, Local Anesthetics/administration & dosage, Animals, Male, Medial Forebrain Bundle/drug effects, Rats, Sprague-Dawley Rats, Self Stimulation/drug effects
20.
Sensors (Basel) ; 21(8)2021 Apr 07.
Article in English | MEDLINE | ID: mdl-33916995

ABSTRACT

One of the critical challenges in deploying cleaning robots is achieving complete coverage of the target area. Current tiling robots for area coverage have fixed forms and are limited to cleaning only certain areas. Reconfigurable systems are a creative answer to this optimal coverage problem: a tiling robot can cover the entire area by reconfiguring into different shapes according to the area's needs. For the sequencing of navigation, it is essential to have a structure that allows the robot to extend its coverage range while saving energy, i.e., to cover larger areas completely with the fewest required actions. This paper presents a complete path planning (CPP) method for hTetran, a polyabolo-tiled robot, based on a TSP-based reinforcement learning optimization. The approach simultaneously produces robot shapes and sequential trajectories while maximizing the reward of the trained reinforcement learning (RL) model within the predefined polyabolo-based tileset. To this end, a reinforcement learning-based traveling salesman problem (TSP) formulation was trained with the proximal policy optimization (PPO) algorithm using the complementary learning computation of the TSP sequencing. The results of the proposed RL-TSP-based CPP for hTetran were compared, in terms of energy and time spent, with conventional tiled hypothetical models in which the TSP is solved through an evolutionary ant colony optimization (ACO) approach. The CPP generates a near Pareto-optimal trajectory that improves the robot's navigation in the real environment, with less energy and time spent than the conventional techniques.
