ABSTRACT
This paper investigates the single agile optical satellite scheduling problem, which has received increasing attention due to the rapid growth in earth observation requirements. Owing to the complicated constraints and large solution space of this problem, conventional exact and heuristic methods are sensitive to the problem scale and incur high computational cost. An efficient approach is therefore needed, and this paper proposes a deep reinforcement learning algorithm with a local attention mechanism. A mathematical model is first established to describe this problem, which considers a series of complex constraints and takes the profit ratio of completed tasks as the optimization objective. Then, a neural network framework with an encoder-decoder structure is adopted to generate high-quality solutions, and a local attention mechanism is designed to improve the generation of solutions. In addition, an adaptive learning rate strategy is proposed to guide the actor-critic training algorithm to dynamically adjust the learning rate in the training process to enhance the training effectiveness of the proposed network. Finally, extensive experiments verify that the proposed algorithm outperforms the comparison algorithms in terms of solution quality, generalization performance, and computation efficiency.
ABSTRACT
Introduction: The rate of adjustment in a movement, driven by feedback error, is referred to as the adaptation rate, and the rate of recovery of a newly adapted movement to its unperturbed condition is called the de-adaptation rate. The rates of adaptation and de-adaptation are dependent on the training mechanism and intrinsic factors such as the participant's sensorimotor abilities. This study investigated the facilitation of the motor adaptation and de-adaptation processes for spatiotemporal features of an asymmetric gait pattern by sequentially applying split-belt treadmill (SBT) and asymmetric rhythmic auditory cueing (ARAC). Methods: Two sessions tested the individual gait characteristics of SBT and ARAC, and the remaining four sessions consisted of applying the two interventions sequentially during training. The adjustment process to the second intervention is referred to as "re-adaptation" and is driven by feedback error associated with the second intervention. Results: Ten healthy individuals participated in the randomized six-session trial. Spatiotemporal asymmetries during the adaptation and post-adaptation (when intervention is removed) stages were fitted into a two-component exponential model that reflects the explicit and implicit adaptation processes. A double component was shown to fit better than a single-component model. The decay constants of the model were indicative of the corresponding timescales and compared between trials. Results revealed that the explicit (fast) component of adaptation to ARAC was reduced for step length and step time when applied after SBT. Contrarily, the explicit component of adaptation to SBT was increased when it was applied after ARAC for step length. Additionally, the implicit (slow) component of adaptation to SBT was inhibited when applied incongruently after ARAC for step time. 
Discussion: These outcomes show that the role of working motor memory as a translational tool between different gait interventions is dependent on (i) the adaptation mechanisms associated with the interventions, (ii) the targeted motor outcome of the interventions; the effects of factors (i) and (ii) are specific to the explicit and implicit components of the adaptation processes; these effects are unique to spatial and temporal gait characteristics.
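The two-component exponential model mentioned above can be sketched as follows. This is a minimal illustration, not the authors' fitting code: the function names, amplitudes, and time constants are hypothetical values chosen only to show how a fast (explicit) and a slow (implicit) component combine.

```python
import math

def two_component_decay(stride, a_fast, tau_fast, a_slow, tau_slow):
    """Asymmetry at a given stride as the sum of a fast and a slow exponential."""
    return a_fast * math.exp(-stride / tau_fast) + a_slow * math.exp(-stride / tau_slow)

# Hypothetical parameters, chosen only for illustration (not from the study).
PARAMS = dict(a_fast=0.10, tau_fast=5.0, a_slow=0.05, tau_slow=100.0)

def fast_share(stride):
    """Fraction of the remaining asymmetry carried by the fast component."""
    fast = PARAMS["a_fast"] * math.exp(-stride / PARAMS["tau_fast"])
    return fast / two_component_decay(stride, **PARAMS)

if __name__ == "__main__":
    for stride in (0, 10, 50):
        print(stride, round(two_component_decay(stride, **PARAMS), 4), round(fast_share(stride), 4))
```

With these assumed values, the fast component dominates the first few strides and the slow component accounts for nearly all of the remaining asymmetry later on, which is why the two decay constants can be read as separate timescales.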
ABSTRACT
Biological accounts of reinforcement learning posit that dopamine encodes reward prediction errors (RPEs), which are multiplied by a learning rate to update state or action values. These values are thought to be represented by corticostriatal synaptic weights, which are updated by dopamine-dependent plasticity. This suggests that dopamine release reflects the product of the learning rate and RPE. Here, we characterize dopamine encoding of learning rates in the nucleus accumbens core (NAcc) in a volatile environment. Using a task with semi-observable states offering different rewards, we find that rats adjust how quickly they initiate trials across states using RPEs. Computational modeling and behavioral analyses show that learning rates are higher following state transitions and scale with trial-by-trial changes in beliefs about hidden states, approximating normative Bayesian strategies. Notably, dopamine release in the NAcc encodes RPEs independent of learning rates, suggesting that dopamine-independent mechanisms instantiate dynamic learning rates.
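The update described in the first sentence, an RPE multiplied by a learning rate, corresponds to the standard delta rule. A minimal sketch, with a hypothetical function name and illustrative numbers that are not taken from the study:

```python
def update_value(value, reward, learning_rate):
    """Delta-rule update: move the value toward the reward by learning_rate * RPE."""
    rpe = reward - value  # reward prediction error
    return value + learning_rate * rpe, rpe

if __name__ == "__main__":
    # The same RPE produces a larger value change when the learning rate is higher.
    for alpha in (0.1, 0.5):
        new_value, rpe = update_value(0.2, 1.0, alpha)
        print(alpha, rpe, new_value)
```

The sketch separates the two quantities the abstract distinguishes: the RPE is the same in both calls, while the size of the value change depends on the learning rate.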
Subjects
Dopamine , Learning , Nucleus Accumbens , Reward , Dopamine/metabolism , Animals , Male , Rats , Nucleus Accumbens/metabolism , Nucleus Accumbens/physiology , Learning/physiology , Bayes Theorem , Long-Evans Rats , Sprague-Dawley Rats
ABSTRACT
Convolutional neural networks (CNNs) have recently become popular for multi-domain image classification. However, most existing methods suffer from poor performance, especially in accuracy and convergence across various datasets. Herein, we propose an algorithm for multi-domain image classification that introduces a novel adaptive learning rate rule to the conventional CNN. Specifically, we adopt the CNN to extract rich feature representations. Given that the hyperparameters of the learning rate have a positive effect on the prediction error, the Egret Swarm Optimization Algorithm (ESOA) is introduced to update the learning rate, which can jump out of local extrema during exploration. Combined with quadratic interpolation, the objective function can then be approximated by a polynomial, thereby improving prediction accuracy. To verify the robustness of the proposed algorithm, we conducted comprehensive image classification experiments on public datasets from five domains. The highest accuracy, 97.15%, was obtained on the test set. The performance of our method on 24 benchmark functions (CEC2017 and CEC2022) is compared with Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Whale Optimization Algorithm (WOA), Catch Fish Optimization Algorithm (CFOA), GOOSE Algorithm (GO), and ESOA. In both benchmark sets, the performance metric values of our algorithm rank first, especially on all unimodal functions, in contrast with the other baseline algorithms.
ABSTRACT
Addressing the issues of prolonged training times and low recognition rates in large model applications, this paper proposes a weight training method based on entropy gain for weight initialization and dynamic adjustment of the learning rate using the multilayer perceptron (MLP) model as an example. Initially, entropy gain was used to replace random initial values for weight initialization. Subsequently, an incremental learning rate strategy was employed for weight updates. The model was trained and validated using the MNIST handwritten digit dataset. The experimental results showed that, compared to random initialization, the proposed initialization method improves training effectiveness by 39.8% and increases the maximum recognition accuracy by 8.9%, demonstrating the feasibility of this method in large model applications.
ABSTRACT
Cognitive control refers to the ability to override prepotent response tendencies to achieve goal-directed behavior. On the other hand, reinforcement learning refers to the learning of actions through feedback and reward. Although cognitive control and reinforcement learning are often viewed as opposing forces in driving behavior, recent theories have emphasized possible similarities in their underlying processes. With this study, we aimed to investigate whether a similar time window of integration could be observed during the learning of control on the one hand, and the learning rate in reinforcement learning paradigms on the other. To this end, we performed a correlational analysis on a large public dataset (n = 522) including data from two reinforcement learning tasks, i.e., a probabilistic selection task and a probabilistic Wisconsin Card Sorting Task (WCST), and data from a classic conflict task (i.e., the Stroop task). Results showed expected correlations between the time scale of control indices and learning rate in the probabilistic WCST. Moreover, the learning-rate parameters of the two reinforcement learning tasks did not correlate with each other. Together, these findings suggest a reliance on a shared learning mechanism between these two traditionally distinct domains, while at the same time emphasizing that value updating processes can still be very task-specific. We speculate that updating processes in the Stroop and WCST may be more related because both tasks require task-specific updating of stimulus features (e.g., color, word meaning, pattern, shape), as opposed to stimulus identity.
ABSTRACT
A radial basis function neural network PID controller under fuzzy rules (FUZZY-RBF-PID) was designed for the electro-hydraulic position servo system under the influence of uncertain factors such as abrupt load changes and load stiffness changes. Firstly, the mathematical model of the system is established, and frequency domain and time domain analyses of the system are carried out. Secondly, based on the analysis results, a radial basis function (RBF) neural network PID controller is designed, and fuzzy rules are used to adjust the learning rate of the PID parameters in the RBF neural network learning algorithm in real time. Thirdly, the simulation results show that under the action of the FUZZY-RBF-PID controller, the unit step response of the system has high steady-state accuracy and fast response speed, and under the condition of large load stiffness, the system recovers to the steady-state value faster after being disturbed. At the same time, when the input is a 10 Hz sinusoidal signal, the system under the action of the FUZZY-RBF-PID controller shows no obvious phase lag, and the tracking error is minimal. The proposed method can effectively improve the comprehensive performance of the electro-hydraulic position servo system under the influence of uncertain factors.
ABSTRACT
This study explores the impact of aging on reinforcement learning in mice, focusing on changes in learning rates and behavioral strategies. A 5-armed bandit task (5-ABT) and a computational Q-learning model were used to evaluate the positive and negative learning rates and the inverse temperature across three age groups (3, 12, and 18 months). Results showed a significant decline in the negative learning rate of 18-month-old mice, which was not observed for the positive learning rate. This suggests that older mice maintain the ability to learn from successful experiences while decreasing the ability to learn from negative outcomes. We also observed a significant age-dependent variation in inverse temperature, reflecting a shift in action selection policy. Middle-aged mice (12 months) exhibited higher inverse temperature, indicating a higher reliance on previous rewarding experiences and reduced exploratory behaviors, when compared to both younger and older mice. This study provides new insights into aging research by demonstrating that there are age-related differences in specific components of reinforcement learning, which exhibit a non-linear pattern.
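A Q-learning model with separate positive and negative learning rates and an inverse temperature, as described above, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's model code: the function names and all numeric values are hypothetical.

```python
import math

def softmax_policy(q_values, inverse_temperature):
    """Choice probabilities; a higher inverse temperature means less exploration."""
    scaled = [inverse_temperature * q for q in q_values]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def asymmetric_q_update(q_values, arm, reward, alpha_pos, alpha_neg):
    """Update the chosen arm with alpha_pos after a positive RPE, alpha_neg otherwise."""
    rpe = reward - q_values[arm]
    alpha = alpha_pos if rpe >= 0 else alpha_neg
    updated = list(q_values)
    updated[arm] += alpha * rpe
    return updated

if __name__ == "__main__":
    q = [0.5] * 5  # five arms, as in a 5-armed bandit task
    print(asymmetric_q_update(q, arm=2, reward=1.0, alpha_pos=0.4, alpha_neg=0.1))
    print(asymmetric_q_update(q, arm=2, reward=0.0, alpha_pos=0.4, alpha_neg=0.1))
    print(softmax_policy([1.0, 0.0, 0.0, 0.0, 0.0], inverse_temperature=5.0))
```

In this formulation, a lower negative learning rate means unrewarded choices lower the chosen arm's value more slowly, and a higher inverse temperature concentrates choices on the arm with the highest current value.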
Subjects
Aging , Animals , Aging/psychology , Aging/physiology , Male , Inbred C57BL Mice , Psychological Reinforcement , Animal Behavior , Probability Learning , Mice , Exploratory Behavior/physiology
ABSTRACT
A Bayesian method based on the learning rate parameter η is called a generalized Bayesian method. In this study, joint hybrid censored type I and type II samples from k exponential populations were examined to determine the influence of the parameter η on the estimation results. To investigate the selection effects of the learning rate and the loss parameters on the estimation results, we considered two additional loss functions in the Bayesian approach: the linear and the generalized entropy loss functions. We then compared the generalized Bayesian algorithm with the traditional Bayesian algorithm. We performed Monte Carlo simulations to compare the performance of the estimation results with the losses and different values of η . The effects of different losses with different values and learning rate parameters are examined using an example.
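The role of η can be illustrated in a deliberately simplified setting. The sketch below assumes a single exponential population, a complete (uncensored) sample, and a conjugate gamma prior on the rate, none of which matches the joint hybrid censoring scheme studied above. Under these assumptions, raising the likelihood to the power η gives a Gamma(a + ηn, b + ηΣx) generalized posterior. The function names are hypothetical.

```python
def generalized_posterior(data, eta, prior_shape, prior_rate):
    """Gamma posterior parameters when the exponential likelihood is raised to eta."""
    n = len(data)
    return prior_shape + eta * n, prior_rate + eta * sum(data)

def posterior_mean(shape, rate):
    """Bayes estimate of the exponential rate under squared error loss."""
    return shape / rate

if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0]  # illustrative data, not from the study
    for eta in (0.0, 0.5, 1.0):
        shape, rate = generalized_posterior(sample, eta, prior_shape=2.0, prior_rate=1.0)
        print(eta, shape, rate, posterior_mean(shape, rate))
```

Setting η = 1 recovers the traditional Bayesian posterior, while η = 0 ignores the data and returns the prior, so η controls how strongly the sample is weighted against the prior.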
ABSTRACT
Cosmetics and topical medications, such as gels, foams, creams, and lotions, are viscoelastic substances that are applied to the skin or mucous membranes. The human perception of these materials is complex and involves multiple sensory modalities. Traditional panel-based sensory evaluations have limitations due to individual differences in sensory receptors and factors such as age, race, and gender. Therefore, this study proposes a deep-learning-based method for systematically analyzing and effectively identifying the physical properties of cosmetic gels. Time-series friction signals generated by rubbing the gels were measured. These signals were preprocessed through short-time Fourier transform (STFT) and continuous wavelet transform (CWT), respectively, and the frequency factors that change over time were distinguished and analyzed. The deep learning model employed a ResNet-based convolution neural network (CNN) structure with optimization achieved through a learning rate scheduler. The optimized STFT-based 2D CNN model outperforms the CWT-based 2D and 1D CNN models. The optimized STFT-based 2D CNN model also demonstrated robustness and reliability through k-fold cross-validation. This study suggests the potential for an innovative approach to replace traditional expert panel evaluations and objectively assess the user experience of cosmetics.
Subjects
Cosmetics , Deep Learning , Fourier Analysis , Gels , Cosmetics/chemistry , Gels/chemistry , Humans , Computer Neural Networks
ABSTRACT
Research in reinforcement learning indicates that animals respond differently to positive and negative reward prediction errors, which can be calculated by assuming a learning rate bias. Many studies have shown that humans and other animals have a learning rate bias during learning, but it is unclear whether and how the bias changes throughout the entire learning process. Here, we recorded the behavioral data and the local field potentials (LFPs) in the striatum of five pigeons performing a probabilistic learning task. Reinforcement learning models with and without learning rate biases were used to dynamically fit the pigeons' choice behavior and estimate the option values. Furthermore, the correlation between the striatal LFP power and the model-estimated option values was explored. We found that the pigeons' learning rate bias shifted from negative to positive during the learning process, and the striatal gamma (31 to 80 Hz) power correlated with the option values modulated by the dynamic learning rate bias. In conclusion, our results support the hypothesis that pigeons employ a dynamic learning strategy in the learning process from both behavioral and neural aspects, providing valuable insights into reinforcement learning mechanisms of non-human animals.
ABSTRACT
Background: The orbitofrontal cortex (OFC) is essential for decision making, and functional disruptions within the OFC are evident in schizophrenia. Postnatal phencyclidine (PCP) administration in rats is a neurodevelopmental manipulation that induces schizophrenia-relevant cognitive impairments. We aimed to determine whether manipulating OFC glutamate cell activity could ameliorate postnatal PCP-induced deficits in decision making. Methods: Male and female Wistar rats (n = 110) were administered saline or PCP on postnatal days 7, 9, and 11. In adulthood, we expressed YFP (yellow fluorescent protein) (control), ChR2 (channelrhodopsin-2) (activation), or eNpHR 3.0 (enhanced halorhodopsin) (inhibition) in glutamate neurons within the ventromedial OFC (vmOFC). Rats were tested on the probabilistic reversal learning task once daily for 20 days while we manipulated the activity of vmOFC glutamate cells. Behavioral performance was analyzed using a Q-learning computational model of reinforcement learning. Results: Compared with saline-treated rats expressing YFP, PCP-treated rats expressing YFP completed fewer reversals, made fewer win-stay responses, and had lower learning rates. We induced similar performance impairments in saline-treated rats by activating vmOFC glutamate cells (ChR2). Strikingly, PCP-induced performance deficits were ameliorated when the activity of vmOFC glutamate cells was inhibited (halorhodopsin). Conclusions: Postnatal PCP-induced deficits in decision making are associated with hyperactivity of vmOFC glutamate cells. Thus, normalizing vmOFC activity may represent a potential therapeutic target for decision-making deficits in patients with schizophrenia.
ABSTRACT
Parkinson's Disease (PD) is a common disorder of the central nervous system. The Unified Parkinson's Disease Rating Scale (UPDRS) is commonly used to track PD symptom progression because it captures the presence and severity of symptoms. To model the relationship between speech signal properties and UPDRS scores, this study develops a new method using Neuro-Fuzzy (ANFIS) and Optimized Learning Rate Learning Vector Quantization (OLVQ1). ANFIS is developed for different Membership Functions (MFs). The method is evaluated using the Parkinson's telemonitoring dataset, which includes a total of 5875 voice recordings from 42 individuals (28 men and 14 women) in the early stages of PD. The dataset comprises 16 vocal features, Motor-UPDRS, and Total-UPDRS. The method is compared with other learning techniques. The results show that OLVQ1 combined with ANFIS provided the best results in predicting Motor-UPDRS and Total-UPDRS. The lowest Root Mean Square Error (RMSE) values (UPDRS (Total)=0.5732; UPDRS (Motor)=0.5645) and highest R-squared values (UPDRS (Total)=0.9876; UPDRS (Motor)=0.9911) are obtained by this method. The results are discussed and directions for future studies are presented.
i. ANFIS and OLVQ1 are combined to predict UPDRS.
ii. OLVQ1 is used for PD data segmentation.
iii. ANFIS is developed for different MFs to predict Motor-UPDRS and Total-UPDRS.
ABSTRACT
The sharpness-aware minimization (SAM) optimizer has been extensively explored as it can generalize better for training deep neural networks by introducing extra perturbation steps to flatten the landscape of deep learning models. Integrating SAM with adaptive learning rate and momentum acceleration, dubbed AdaSAM, has already been explored empirically to train large-scale deep neural networks without theoretical guarantee due to the triple difficulties in analyzing the coupled perturbation step, adaptive learning rate and momentum step. In this paper, we try to analyze the convergence rate of AdaSAM in the stochastic non-convex setting. We theoretically show that AdaSAM admits an O(1/√(bT)) convergence rate, which achieves linear speedup property with respect to mini-batch size b. Specifically, to decouple the stochastic gradient steps with the adaptive learning rate and perturbed gradient, we introduce the delayed second-order momentum term to decompose them to make them independent while taking an expectation during the analysis. Then we bound them by showing the adaptive learning rate has a limited range, which makes our analysis feasible. To the best of our knowledge, we are the first to provide the non-trivial convergence rate of SAM with an adaptive learning rate and momentum acceleration. Finally, we conduct experiments on several NLP tasks and a synthetic task, which show that AdaSAM could achieve superior performance compared with SGD, AMSGrad, and SAM optimizers.
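The extra perturbation step that SAM introduces can be sketched as follows. This is plain SAM on a toy quadratic objective, written under stated assumptions: it omits the adaptive learning rate and momentum terms that distinguish AdaSAM, and the function names, the objective, and the values of lr and rho are hypothetical.

```python
import math

def sam_step(weights, grad_fn, lr, rho, eps=1e-12):
    """One SAM update: ascend to a nearby perturbed point, then descend with its gradient."""
    grad = grad_fn(weights)
    norm = math.sqrt(sum(g * g for g in grad))
    # Perturbation step: move a distance rho along the normalized gradient direction.
    perturbed = [w + rho * g / (norm + eps) for w, g in zip(weights, grad)]
    grad_at_perturbed = grad_fn(perturbed)
    # Descent step: apply the perturbed-point gradient to the original weights.
    return [w - lr * g for w, g in zip(weights, grad_at_perturbed)]

def quadratic_grad(weights):
    """Gradient of the toy objective f(w) = sum of w_i squared."""
    return [2.0 * w for w in weights]

if __name__ == "__main__":
    print(sam_step([1.0, 0.0], quadratic_grad, lr=0.1, rho=0.1))
```

The coupling the abstract refers to is visible even here: the gradient used for the descent step depends on the perturbation, which in turn depends on the gradient at the current weights.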
Subjects
Computer Neural Networks , Motion (Physics)
ABSTRACT
Learning audio-visual associations is foundational to a number of real-world skills, such as reading acquisition or social communication. Characterizing individual differences in such learning has therefore been of interest to researchers in the field. Here, we present a novel audio-visual associative learning task designed to efficiently capture inter-individual differences in learning, with the added feature of using non-linguistic stimuli, so as to unconfound language and reading proficiency of the learner from their more domain-general learning capability. By fitting trial-by-trial performance in our novel learning task using simple-to-use statistical tools, we demonstrate the expected inter-individual variability in learning rate as well as high precision in its estimation. We further demonstrate that such measured learning rate is linked to working memory performance in Italian-speaking (N = 58) and French-speaking (N = 51) adults. Finally, we investigate the extent to which learning rate in our task, which measures cross-modal audio-visual associations while mitigating familiarity confounds, predicts reading ability across participants with different linguistic backgrounds. The present work thus introduces a novel non-linguistic audio-visual associative learning task that can be used across languages. In doing so, it brings a new tool to researchers in the various domains that rely on multi-sensory integration from reading to social cognition or socio-emotional learning.
Subjects
Language , Learning , Adult , Humans , Linguistics , Short-Term Memory , Cognition
ABSTRACT
Wheat leaf diseases are considered to be the foremost threat to wheat yield. In the realm of crop disease detection, convolutional neural networks (CNNs) have emerged as important tools. The training strategy and the initial learning rate are key factors that impact the performance and training speed of CNN models. This study employed six training strategies, namely Adam, SGD, Adam + StepLR, SGD + StepLR, Warm-up + Cosine annealing + SGD, and Warm-up + Cosine annealing + Adam, with three initial learning rates (0.05, 0.01, and 0.001). Using the wheat stripe rust, wheat powdery mildew, and healthy wheat datasets, five lightweight CNN models, namely MobileNetV3, ShuffleNetV2, GhostNet, MnasNet, and EfficientNetV2, were evaluated. The results showed that upon combining SGD + StepLR with the initial learning rate of 0.001, MnasNet obtained the highest recognition accuracy of 98.65%. The accuracy increased by 1.1% as compared to that obtained with the training strategy with a fixed learning rate, and the size of the parameters was only 19.09 M. These results indicated that MnasNet was appropriate for porting to the mobile terminal and efficient for automatically identifying wheat leaf diseases.
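The two schedule families named above, StepLR and warm-up followed by cosine annealing, can be sketched as plain functions of the epoch. This is an illustrative sketch only: the step size, decay factor, warm-up length, and epoch count are hypothetical, since the abstract reports only the initial learning rates.

```python
import math

def step_lr(initial_lr, epoch, step_size, gamma):
    """StepLR: multiply the learning rate by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)

def warmup_cosine_lr(initial_lr, epoch, warmup_epochs, total_epochs, min_lr=0.0):
    """Linear warm-up to initial_lr, then cosine annealing down to min_lr."""
    if epoch < warmup_epochs:
        return initial_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (initial_lr - min_lr) * (1 + math.cos(math.pi * progress))

if __name__ == "__main__":
    for epoch in (0, 5, 10, 25, 50):
        print(epoch,
              step_lr(0.001, epoch, step_size=10, gamma=0.1),
              warmup_cosine_lr(0.01, epoch, warmup_epochs=5, total_epochs=50))
```

StepLR keeps the rate constant between discrete drops, whereas the warm-up and cosine schedule changes it every epoch, which is the main practical difference between the strategies compared in the study.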
ABSTRACT
Background: The examination, counting, and classification of white blood cells (WBCs), also known as leukocytes, are essential processes in the diagnosis of many disorders, including leukemia, a kind of blood cancer characterized by the uncontrolled proliferation of cancerous leukocytes in the bone marrow. Blood smears can be chemically or microscopically studied to better understand hematological diseases and blood disorders. Detecting, identifying, and categorizing the many blood cell types is essential for disease diagnosis and therapy planning, which poses both a theoretical and a practical issue. However, methods based on deep learning (DL) have greatly helped blood cell classification. Materials and Methods: Images of blood cells in a microscopic smear were collected from GitHub, a public source that uses the MIT license. An end-to-end computer-aided diagnosis (CAD) system for leukocytes has been created and implemented as part of this study. The introduced system comprises image preprocessing and enhancement, image segmentation, feature extraction and selection, and WBC classification. By combining the DenseNet-161 and the cyclical learning rate (CLR), we contribute an approach that speeds up hyperparameter optimization. We also offer the one-cycle technique to rapidly optimize all hyperparameters of DL models to boost training performance. Results: The dataset has been split into two sets: approximately 80% of the data (9,966 images) for the training set and 20% (2,487 images) for the validation set. The validation set has 623, 620, 620, and 624 eosinophil, lymphocyte, monocyte, and neutrophil images, whereas the training set has 2,497, 2,483, 2,487, and 2,499, respectively. The suggested method has 100% accuracy on the training set of images and 99.8% accuracy on the testing set.
Conclusion: Using a combination of the recently developed pretrained convolutional neural network (CNN), DenseNet, and the one fit cycle policy, this study describes a technique of training for the classification of WBCs for leukemia detection. The proposed method is more accurate compared to the state of the art.
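A cyclical learning rate can be sketched with the commonly used triangular policy, in which the rate rises linearly from a lower bound to an upper bound and back again. This is a minimal sketch under stated assumptions: the abstract does not say which CLR variant or bounds were used, so the function name, bounds, and step size below are hypothetical.

```python
import math

def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate: one full cycle spans 2 * step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)  # distance from the peak, in [0, 1]
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

if __name__ == "__main__":
    for iteration in (0, 50, 100, 150, 200):
        print(iteration, triangular_clr(iteration, base_lr=0.001, max_lr=0.01, step_size=100))
```

The one-cycle technique mentioned above follows the same idea with a single rise and fall over the whole training run rather than repeated cycles.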
ABSTRACT
BACKGROUND: Substance use disorders (SUDs) represent a major public health risk. Yet, our understanding of the mechanisms that maintain these disorders remains incomplete. In a recent computational modeling study, we found initial evidence that SUDs are associated with slower learning rates from negative outcomes and less value-sensitive choice (low "action precision"), which could help explain continued substance use despite harmful consequences. METHODS: Here we aimed to replicate and extend these results in a pre-registered study with a new sample of 168 individuals with SUDs and 99 healthy comparisons (HCs). We performed the same computational modeling and group comparisons as in our prior report (doi: 10.1016/j.drugalcdep.2020.108208) to confirm previously observed effects. After completing all pre-registered replication analyses, we then combined the previous and current datasets (N = 468) to assess whether differences were transdiagnostic or driven by specific disorders. RESULTS: Replicating prior results, SUDs showed slower learning rates for negative outcomes in both Bayesian and frequentist analyses (partial η² = .02). Previously observed differences in action precision were not confirmed. Learning rates for positive outcomes were also similar between groups. Logistic regressions including all computational parameters as predictors in the combined datasets could differentiate several specific disorders from HCs, but could not differentiate most disorders from each other. CONCLUSIONS: These results provide robust evidence that individuals with SUDs adjust behavior more slowly in the face of negative outcomes than HCs. They also suggest this effect is common across several different SUDs. Future research should examine its neural basis and whether learning rates could represent a new treatment target or moderator of treatment outcome.
Subjects
Substance-Related Disorders , Humans , Bayes Theorem , Substance-Related Disorders/complications
ABSTRACT
Previous research suggests that mnemonic discrimination (i.e., the ability to discriminate between previously encountered and novel stimuli even when they are highly similar) improves substantially during childhood. To further understand the development of mnemonic discrimination during childhood, the current study had 4-year-old children, 6-year-old children, and young adults complete the forced-choice Mnemonic Similarity Task (MST). The forced-choice MST offers a significant advantage in the context of developmental research because it is not sensitive to age-related differences in response criteria and includes three test formats that are theorized to be supported by different cognitive processes. A target (i.e., a previously encountered item) is paired with either a novel item (A-X), a corresponding lure (A-A'; i.e., an item mnemonically similar to the target), or a non-corresponding lure (A-B'; i.e., an item mnemonically similar to a different previously encoded item). We observed that 4-year-olds performed more poorly than 6-year-olds on the A-X and A-A' test formats, whereas both 4- and 6-year-olds performed more poorly than young adults on the A-B' test format. The MINERVA 2.2 computational model effectively accounted for these age-related differences. The model suggested that 4-year-olds have a lower learning rate (i.e., probability of encoding stimulus features) than 6-year-olds and young adults and that both 4- and 6-year-olds have greater encoding variability than young adults. These findings provide new insight into possible mechanisms underlying memory development during childhood and serve as the basis for multiple avenues of future research.
Subjects
Child Development , Choice Behavior , Discrimination Learning , Child Psychology , Humans , Preschool Child , Child , Young Adult , Reaction Time , Male , Female , Psychological Models , Aging
ABSTRACT
This study compared the effects of two education methods, face-to-face (F2F) and e-learning, on learning, retention, and interest in English language courses. Participants were EFL students studying at Islamic Azad University in the academic year 2021-2022. A multiple-stage cluster-sampling method was used to select the target participants. Three hundred and twenty EFL learners participated in the study. Students were studying in different majors: accounting, economics, psychology, physical education, law, management, and sociology. Two English tests were applied: a teacher-made VST (Vocabulary Size Test) and an achievement test (including reading comprehension and grammar questions). A questionnaire was also applied to measure the students' learning interest in the F2F and online learning groups. The study found significant differences in learning outcomes related to students' English learning and vocabulary retention rates. The e-learning group, which participated in online sessions through the Learning Management Systems (LMS) platform, outperformed the F2F group. Another critical finding was that learners' interest in learning English was higher in e-learning classes than in the F2F group. In addition, scores on all constructs of interest (feeling happy, attention, interest, and participation) were higher in the e-learning group than in the F2F group. Language teachers, university instructors, educators, syllabus designers, school administrators, and policymakers might rethink their teaching approaches and incorporate e-learning into the curriculum to meet their students' needs.