Results 1 - 20 of 37
1.
Article in English | MEDLINE | ID: mdl-38728125

ABSTRACT

Continuous-time reinforcement learning (CT-RL) methods hold great promise for real-world applications. Adaptive dynamic programming (ADP)-based CT-RL algorithms, especially their theoretical developments, have achieved great success. However, these methods have not been demonstrated on realistic or meaningful learning control problems. The goal of this work is therefore to introduce a suite of new excitable integral reinforcement learning (EIRL) algorithms for control of CT affine nonlinear systems. This work develops a new excitation framework that improves persistence of excitation (PE) and numerical performance via input/output insights from classical control. Furthermore, when the system dynamics afford a physically motivated partition into distinct dynamical loops, the proposed methods break the control problem into smaller subproblems, reducing complexity. By leveraging the known affine nonlinear dynamics, the methods achieve well-behaved system responses and considerable data efficiency. The work provides convergence, solution optimality, and closed-loop stability guarantees for the proposed methods, and it demonstrates these guarantees on a significant application problem: control of an unstable, nonminimum-phase hypersonic vehicle (HSV).
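The excitation idea can be illustrated with the common sum-of-sinusoids probing signal used to promote persistence of excitation during data collection. This is a generic sketch, not the paper's EIRL excitation framework; the amplitudes and frequencies below are illustrative.

```python
import numpy as np

def probing_signal(t, amplitudes, freqs):
    """Sum-of-sinusoids exploration input, a common way to promote
    persistence of excitation (PE) during the learning phase."""
    return sum(a * np.sin(w * t) for a, w in zip(amplitudes, freqs))

# Inject the probe on top of the nominal control during data collection.
t = np.linspace(0.0, 10.0, 1001)
u_probe = probing_signal(t, amplitudes=[0.5, 0.3, 0.2], freqs=[1.0, 3.7, 9.3])
```

Incommensurate frequencies keep the regressor data rich enough for the least-squares steps inside integral RL methods.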

2.
IEEE Trans Neural Netw Learn Syst ; 35(3): 3156-3167, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37027592

ABSTRACT

We aim to create a transfer reinforcement learning framework that allows the design of learning controllers to leverage prior knowledge, extracted from previously learned tasks and previous data, to improve the learning performance of new tasks. Toward this goal, we formalize knowledge transfer by expressing knowledge in the value function in our problem construct, which we refer to as reinforcement learning with knowledge shaping (RL-KS). Unlike most transfer learning studies, which are empirical in nature, our results include not only simulation verifications but also an analysis of algorithm convergence and solution optimality. Also unlike the well-established potential-based reward shaping methods, which are built on proofs of policy invariance, our RL-KS approach allows us to advance toward a new theoretical result on positive knowledge transfer. Furthermore, our contributions include two principled ways, covering a range of realization schemes, to represent prior knowledge in RL-KS. We provide extensive and systematic evaluations of the proposed RL-KS method. The evaluation environments include not only classical RL benchmark problems but also a challenging task: real-time control of a robotic lower limb with a human user in the loop.
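For contrast, a minimal sketch of the classical potential-based reward shaping baseline that the abstract distinguishes RL-KS from; the goal-distance potential here is a made-up example, not the paper's knowledge representation.

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99, done=False):
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).
    With phi forced to 0 at terminal states, the optimal policy of the
    shaped MDP provably matches that of the original one."""
    phi_next = 0.0 if done else phi(s_next)
    return r + gamma * phi_next - phi(s)

# Toy potential: negative distance to a hypothetical goal state 10.
phi = lambda s: -abs(10 - s)
r_shaped = shaped_reward(r=0.0, s=3, s_next=4, phi=phi, gamma=1.0)
# Moving one step toward the goal earns +1 of shaping bonus.
```

RL-KS, per the abstract, goes beyond this policy-invariant scheme to target positive knowledge transfer rather than mere invariance.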

3.
Article in English | MEDLINE | ID: mdl-37027747

ABSTRACT

This exposition discusses continuous-time reinforcement learning (CT-RL) for the control of affine nonlinear systems. We review four seminal methods that are the centerpieces of the most recent results on CT-RL control. We survey the theoretical results of the four methods, highlighting their fundamental importance and successes by including discussions on problem formulation, key assumptions, algorithm procedures, and theoretical guarantees. Subsequently, we evaluate the performance of the control designs to provide analyses and insights on the feasibility of these design methods for applications from a control designer's point of view. Through systematic evaluations, we point out when theory diverges from practical controller synthesis. Furthermore, we introduce a new quantitative analytical framework to diagnose the observed discrepancies. Based on the analyses and the insights gained through quantitative evaluations, we point out potential future research directions to unleash the potential of CT-RL control algorithms in addressing the identified challenges.

4.
Article in English | MEDLINE | ID: mdl-37018713

ABSTRACT

The tuning of robotic prosthesis control is essential to provide personalized assistance to individual prosthesis users. Emerging automatic tuning algorithms have shown promise in easing the device personalization procedure. However, very few automatic tuning algorithms consider user preference as the tuning goal, which may limit the adoption of the robotic prosthesis. In this study, we propose and evaluate a novel prosthesis control tuning framework for a robotic knee prosthesis, which enables user-preferred robot behavior in the device tuning process. The framework consists of 1) a User-Controlled Interface that allows the user to select their preferred knee kinematics in gait and 2) a reinforcement learning-based algorithm for tuning high-dimensional prosthesis control parameters to meet the desired knee kinematics. We evaluated the performance of the framework along with the usability of the developed user interface. In addition, we used the developed framework to investigate whether amputee users exhibit a preference between different profiles during walking and whether they can differentiate between their preferred profile and other profiles when blinded. The results showed the effectiveness of the developed framework in tuning 12 robotic knee prosthesis control parameters while meeting the user-selected knee kinematics. A blinded comparative study showed that users can accurately and consistently identify their preferred prosthetic knee control profile. Further, we preliminarily examined the gait biomechanics of the prosthesis users when walking with different prosthesis control settings and did not find a clear difference between walking with the preferred prosthesis control and walking with normative gait control parameters. This study may inform future translation of this novel prosthesis tuning framework for home or clinical use.

5.
IEEE Trans Neural Netw Learn Syst ; 34(8): 4249-4260, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34739383

ABSTRACT

We introduce an innovative solution approach to the challenging dynamic load-shedding problem, which directly affects the stability of large power grids. Our proposed deep Q-network for load shedding (DQN-LS) determines the optimal load-shedding strategy to maintain power system stability by taking into account both spatial and temporal information of a dynamically operating power system, by using a convolutional long short-term memory (ConvLSTM) network to automatically capture dynamic features that are translation-invariant in short-term voltage instability, and by introducing a new reward function design. The overall goal of the proposed DQN-LS is to provide real-time, fast, and accurate load-shedding decisions that increase the quality and probability of voltage recovery. To demonstrate the efficacy of our proposed approach and its scalability to large-scale, complex dynamic problems, we test it on the China Southern Grid (CSG); the results clearly show superior voltage recovery performance with the proposed DQN-LS under different and uncertain power system fault conditions. Neither the scale of the problem addressed, nor the load-shedding performance obtained, nor the DQN-LS approach itself has been demonstrated previously.
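A minimal sketch of the generic DQN bootstrap target that such a design builds on; the ConvLSTM feature extractor and the paper's specific reward design are omitted, and the numbers are illustrative.

```python
import numpy as np

def dqn_targets(rewards, q_next, dones, gamma=0.99):
    """Standard DQN bootstrap target: y = r + gamma * max_a' Q(s', a'),
    with the bootstrap term masked out at terminal transitions."""
    return rewards + gamma * (1.0 - dones) * q_next.max(axis=1)

rewards = np.array([1.0, 0.0])
q_next  = np.array([[0.2, 0.5],   # Q(s', .) for transition 1
                    [1.0, 0.3]])  # Q(s', .) for transition 2
dones   = np.array([0.0, 1.0])    # second transition is terminal
y = dqn_targets(rewards, q_next, dones, gamma=0.9)
# y = [1 + 0.9 * 0.5, 0.0] = [1.45, 0.0]
```

In a load-shedding setting, `q_next` would come from the target network evaluated on the next grid state, and the reward would encode voltage recovery quality.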

6.
IEEE Trans Biomed Eng ; 70(5): 1634-1642, 2023 05.
Article in English | MEDLINE | ID: mdl-36417736

ABSTRACT

Automatically personalizing complex control of robotic prostheses to improve gait performance, such as gait symmetry, is challenging. Recently, human-in-the-loop (HIL) optimization and reinforcement learning (RL) have shown promise in achieving optimized control of wearable robots for each individual user. However, HIL optimization methods lack scalability to high-dimensional spaces, while RL has mostly focused on optimizing robot kinematic performance. Thus, we propose a novel hierarchical framework to personalize robotic knee prosthesis control and improve overall gait performance. Specifically, in this study the framework was implemented to simultaneously design target knee kinematics and tune 12 impedance control parameters for improved symmetry of propulsive impulse in walking. In our proposed framework, HIL optimization is used to identify the optimal target knee kinematics with respect to symmetry improvement, while RL is leveraged to yield an optimal policy for tuning the impedance parameters in high-dimensional space to match the kinematics target. The proposed framework was validated on human subjects walking with a robotic knee prosthesis. The results showed that our design successfully shaped the target knee kinematics as well as configured 12 impedance control parameters to improve the propulsive impulse symmetry of the human users. The knee kinematics that yielded the best propulsion symmetry did not preserve the normative knee kinematics profile observed in non-disabled individuals, suggesting that restoration of normative joint biomechanics in walking does not necessarily optimize the gait performance of human-prosthesis systems. This new framework for prosthesis control personalization may be extended to other wearable devices or different gait performance optimization goals in the future.


Subject(s)
Knee Prosthesis, Robotic Surgical Procedures, Robotics, Humans, Gait, Walking, Knee Joint/surgery, Biomechanical Phenomena
7.
IEEE Trans Neural Netw Learn Syst ; 33(8): 4139-4144, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33534714

ABSTRACT

In this work, time-driven learning refers to a machine learning method that updates parameters in a prediction model continuously as new data arrive. Among existing approximate dynamic programming (ADP) and reinforcement learning (RL) algorithms, direct heuristic dynamic programming (dHDP) has been shown to be an effective tool, as demonstrated in solving several complex learning control problems. It continuously updates the control policy and the critic as system states continuously evolve. It is therefore desirable to prevent the time-driven dHDP from updating due to insignificant system events such as noise. Toward this goal, we propose a new event-driven dHDP. By constructing a Lyapunov function candidate, we prove the uniform ultimate boundedness (UUB) of the system states and of the weights in the critic and control policy networks. Consequently, we show that the approximate control and cost-to-go function approach Bellman optimality within a finite bound. We also illustrate how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
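The event-driven idea can be sketched as a simple novelty gate on the state: learning fires only when the state has moved appreciably since the last update. This is a generic illustration, not the paper's Lyapunov-based triggering condition, and the threshold is hypothetical.

```python
import numpy as np

class EventTrigger:
    """Gate parameter updates on state novelty: update only when the state
    has moved more than `threshold` from the last triggering state, so
    noise-sized fluctuations do not cause learning updates."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.last_event_state = None

    def fire(self, x):
        x = np.asarray(x, dtype=float)
        if self.last_event_state is None or \
           np.linalg.norm(x - self.last_event_state) > self.threshold:
            self.last_event_state = x
            return True
        return False

trig = EventTrigger(threshold=0.1)
events = [trig.fire([0.0]), trig.fire([0.01]), trig.fire([0.5])]
# -> [True, False, True]: the tiny (noise-sized) change is ignored.
```

In an event-driven dHDP, the actor and critic weight updates would run only on the `True` events, reducing computation between significant system changes.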

8.
IEEE Trans Neural Netw Learn Syst ; 33(10): 5873-5887, 2022 10.
Article in English | MEDLINE | ID: mdl-33956634

ABSTRACT

We are motivated by the real challenges presented in a human-robot system to develop new designs that are data efficient and that carry system-level performance guarantees such as stability and optimality. Existing approximate/adaptive dynamic programming (ADP) results that consider system performance theoretically do not readily provide practically useful learning control algorithms for this problem, and reinforcement learning (RL) algorithms that address data efficiency usually lack performance guarantees for the controlled system. This study fills these important voids by introducing innovative features to the policy iteration algorithm. We introduce flexible policy iteration (FPI), which can flexibly and organically integrate experience replay and supplemental values from prior experience into the RL controller. We show system-level performance, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system. We demonstrate the effectiveness of FPI via realistic simulations of the human-robot system. Note that the problem we face in this study may be difficult to address by design methods based on classical control theory, as it is nearly impossible to obtain a customized mathematical model of a human-robot system either online or offline. The results we have obtained also indicate the great potential of RL control for solving realistic and challenging problems with high-dimensional control inputs.
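A rough sketch of how experience replay can drive the policy evaluation step inside policy iteration. This toy TD(0) loop on a two-state chain is an assumption-laden stand-in, not FPI's actual update.

```python
import random
import numpy as np

def replayed_value_update(V, replay, rng, alpha=0.1, gamma=0.9, batch=4):
    """One sweep of TD(0) policy evaluation driven by transitions sampled
    from a replay buffer -- the kind of data reuse that a replay-based
    policy iteration scheme builds in."""
    for s, r, s_next in rng.sample(replay, min(batch, len(replay))):
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

# Two-state toy chain: state 0 -> state 1 (reward 1), state 1 absorbing.
replay = [(0, 1.0, 1), (1, 0.0, 1)] * 10
rng = random.Random(0)
V = np.zeros(2)
for _ in range(200):
    V = replayed_value_update(V, replay, rng)
# V[0] approaches 1.0 while V[1] stays at 0.
```

Reusing stored transitions this way is what buys data efficiency; the guarantees claimed in the abstract come from the additional structure FPI imposes on these updates.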


Subject(s)
Robotic Surgical Procedures, Robotics, Humans, Computer Simulation, Neural Networks (Computer), Policies
9.
Article in English | MEDLINE | ID: mdl-34458654

ABSTRACT

Robotic lower-limb prostheses aim to replicate the power-generating capability of biological joints during locomotion to empower individuals with lower-limb loss. However, recent clinical trials have not demonstrated clear advantages of these devices over traditional passive devices. We believe this is partly because the current designs of robotic prosthesis controllers and the clinical methods for fitting and training individuals to use them do not ensure good coordination between the prosthesis and user. Accordingly, we advocate for new holistic approaches in which human motor control and intelligent prosthesis control function as one system (defined as human-prosthesis symbiosis). We hope engineers and clinicians will work closely to achieve this symbiosis, thereby improving the functionality and acceptance of robotic prostheses and users' quality of life.

10.
IEEE Trans Neural Syst Rehabil Eng ; 28(4): 904-913, 2020 04.
Article in English | MEDLINE | ID: mdl-32149646

ABSTRACT

With advances in robotic prostheses, researchers attempt to improve amputees' gait performance (e.g., gait symmetry) beyond restoring normative knee kinematics/kinetics. Yet, little is known about how prosthesis mechanics/control influence wearer-prosthesis gait performance, such as gait symmetry and stability. This study aimed to investigate the influence of robotic transfemoral prosthesis mechanics on human wearers' gait symmetry. The investigation was enabled by our previously designed reinforcement learning (RL) supplementary control, which simultaneously tuned 12 control parameters that determined the prosthesis mechanics throughout a gait cycle. The RL control design facilitated safe exploration of prosthesis mechanics with the human in the loop. Subjects were recruited and walked with a robotic transfemoral prosthesis on a treadmill while the RL controller tuned the control parameters. Stance time symmetry, step length symmetry, and bilateral anteroposterior (AP) impulses were measured. The data analysis showed that changes in robotic knee mechanics led to movement variations in both lower limbs and therefore in temporal-spatial gait symmetry measures. Consistent across all subjects, inter-limb AP impulse measurements explained gait symmetry: stance time symmetry was significantly correlated with the net inter-limb AP impulse, and step length symmetry was significantly correlated with braking and propulsive impulse symmetry. The results suggest that it is possible to personalize transfemoral prosthesis control for improved temporal-spatial gait symmetry. However, adjusting prosthesis mechanics alone was insufficient to maximize gait symmetry. Rather, achieving gait symmetry may require coordination between the wearer's motor control of the intact limb and adaptive control of the prosthetic joints. The results also indicate that the RL-based prosthesis tuning system is a potential tool for studying wearer-prosthesis interactions.
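A common way to quantify the inter-limb symmetry measures discussed above is a normalized symmetry index, applicable to stance time, step length, or AP impulse per limb. This is a generic sketch with hypothetical stance times, not necessarily the exact metric used in the study.

```python
def symmetry_index(intact, prosthetic):
    """Normalized inter-limb symmetry index: 0 means perfect symmetry,
    positive values mean the intact side's measure is larger."""
    return (intact - prosthetic) / (0.5 * (intact + prosthetic))

# Hypothetical per-limb stance times in seconds.
si = symmetry_index(intact=0.62, prosthetic=0.58)
# -> 0.04 / 0.60, i.e. the intact limb's stance is about 6.7% longer.
```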


Subject(s)
Amputees, Artificial Limbs, Biomechanical Phenomena, Gait, Humans, Knee Joint, Prosthesis Design, Walking
11.
IEEE Trans Cybern ; 50(6): 2346-2356, 2020 Jun.
Article in English | MEDLINE | ID: mdl-30668514

ABSTRACT

Robotic prostheses deliver greater function than passive prostheses, but we face the challenge of tuning a large number of control parameters in order to personalize the device for individual amputee users. This problem is not easily solved by traditional control designs or the latest robotic technology, and reinforcement learning (RL) is naturally appealing. The recent, unprecedented success of AlphaZero demonstrated RL as a feasible, large-scale problem solver. However, the prosthesis-tuning problem involves several unaddressed issues: it does not have a known and stable model, its continuous states and controls may incur the curse of dimensionality, and the human-prosthesis system is constantly subject to measurement noise, environmental change, and human-body-caused variations. In this paper, we demonstrate the feasibility of direct heuristic dynamic programming, an approximate dynamic programming (ADP) approach, for automatically tuning the 12 robotic knee prosthesis parameters to meet individual human users' needs. We tested the ADP-tuner on two subjects (one able-bodied subject and one amputee subject) walking at a fixed speed on a treadmill. The ADP-tuner learned to reach target gait kinematics in an average of 300 gait cycles, or 10 min of walking. We observed improved ADP tuning performance when we transferred a previously learned ADP controller to a new learning session with the same subject. To the best of our knowledge, our approach to personalizing robotic prostheses is the first implementation of online ADP learning control in a clinical problem involving human subjects.


Subject(s)
Exoskeleton Device, Knee Prosthesis, Reinforcement (Psychology), Adult, Algorithms, Amputees/rehabilitation, Biomechanical Phenomena/physiology, Gait/physiology, Humans, Male, Computer-Assisted Signal Processing, Young Adult
12.
J Neurophysiol ; 121(1): 50-60, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30379632

ABSTRACT

To better understand the neural cortical underpinnings that explain behavioral differences in learning rate, we recorded single-unit activity in primary motor (M1) and secondary motor (M2) areas while rats learned to perform a directional (left or right) operant visuomotor association task. Analysis of neural activity during the early portion of the cue period showed that neural modulation in the motor cortex was most strongly associated with two task factors: the previous trial outcome (success or error) and the current trial's directional choice (left or right). Furthermore, the fast learners, defined as those who had steeper learning curves and required fewer learning sessions to reach criterion performance, encoded the previous trial outcome factor more strongly than the directional choice factor. Conversely, the slow learners encoded directional choice more strongly than previous trial outcome. These differences in task factor encoding were observed in both the percentage of neurons and the neural modulation depth. These results suggest that fast learning is accompanied by a stronger component of previous trial outcome in the modulation representation present in motor cortex, which therefore may be a contributing factor to behavioral differences in learning rate. NEW & NOTEWORTHY We chronically recorded neural activity as rats learned a visuomotor directional choice task from a naive state. Learning rates varied. Single-unit neural modulation of two motor areas revealed that the fast learners encoded previous trial outcome more strongly than directional choice, whereas the reverse was true for slow learners. This finding provides novel evidence that rat learning rate is strongly correlated with the strength of neural modulation by previous trial outcome in motor cortex.


Asunto(s)
Retroalimentación Psicológica/fisiología , Individualidad , Aprendizaje/fisiología , Actividad Motora/fisiología , Corteza Motora/fisiología , Percepción Visual/fisiología , Potenciales de Acción , Animales , Atención/fisiología , Conducta de Elección/fisiología , Electrodos Implantados , Masculino , Neuronas/fisiología , Ratas Long-Evans , Procesamiento de Señales Asistido por Computador
13.
eNeuro ; 5(5)2018.
Article in English | MEDLINE | ID: mdl-30221189

ABSTRACT

Advances in calcium imaging have made it possible to record from an increasingly larger number of neurons simultaneously. Neuroscientists can now routinely image hundreds to thousands of individual neurons. An emerging technical challenge that parallels the advancement in imaging a large number of individual neurons is the processing of correspondingly large datasets. One important step is the identification of individual neurons. Traditional methods rely mainly on manual or semimanual inspection, which cannot be scaled for processing large datasets. To address this challenge, we focused on developing an automated segmentation method, which we refer to as automated cell segmentation by adaptive thresholding (ACSAT). ACSAT works with a time-collapsed image and includes an iterative procedure that automatically calculates global and local threshold values during successive iterations based on the distribution of image pixel intensities. Thus, the algorithm is capable of handling variations in morphological details and in fluorescence intensities in different calcium imaging datasets. In this paper, we demonstrate the utility of ACSAT by testing it on 500 simulated datasets, two wide-field hippocampus datasets, a wide-field striatum dataset, a wide-field cell culture dataset, and a two-photon hippocampus dataset. For the simulated datasets with ground truth, ACSAT achieved >80% recall and precision when the signal-to-noise ratio was at least ∼24 dB.
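The global-threshold stage of such a pipeline can be illustrated with a standard isodata-style iteration on the pixel intensity distribution. This is a simplified stand-in for ACSAT's combined global/local procedure, run here on a synthetic time-collapsed frame.

```python
import numpy as np

def adaptive_threshold(img, tol=0.5):
    """Iterative (isodata-style) global threshold: alternately split the
    pixels at T and reset T to the midpoint of the two class means,
    until T stops moving by more than `tol`."""
    T = img.mean()
    while True:
        lo, hi = img[img <= T], img[img > T]
        T_new = 0.5 * (lo.mean() + hi.mean())
        if abs(T_new - T) < tol:
            return T_new
        T = T_new

# Synthetic frame: dim background with one bright 5x5 "cell".
img = np.full((32, 32), 10.0)
img[10:15, 10:15] = 100.0
T = adaptive_threshold(img)
mask = img > T
# The mask recovers exactly the 25 bright pixels.
```

ACSAT, per the abstract, additionally refines thresholds locally per candidate region, which is what handles uneven fluorescence across a real field of view.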


Subject(s)
Calcium/metabolism, Hippocampus/metabolism, Computer-Assisted Image Processing, Neuroimaging, Neurons/metabolism, Algorithms, Animals, Cultured Cells, Female, Computer-Assisted Image Processing/methods, C57BL Inbred Mice
14.
IEEE Trans Neural Netw Learn Syst ; 29(7): 2794-2807, 2018 07.
Article in English | MEDLINE | ID: mdl-28600262

ABSTRACT

Policy iteration approximate dynamic programming (DP) is an important algorithm for solving optimal decision and control problems. In this paper, we focus on the problem associated with policy approximation in policy iteration approximate DP for discrete-time nonlinear systems using infinite-horizon undiscounted value functions. Taking policy approximation error into account, we demonstrate asymptotic stability of the control policy under our problem setting, show boundedness of the value function during each policy iteration step, and introduce a new sufficient condition for the value function to converge to a bounded neighborhood of the optimal value function. Aiming for practical implementation of an approximate policy, we consider using Volterra series, which have been extensively covered in the controls literature for their good theoretical properties and success in practical applications. We illustrate the effectiveness of the main ideas developed in this paper using several examples, including a practical problem of excitation control of a hydrogenerator.
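A minimal sketch of a truncated second-order Volterra-style expansion used as policy features: the approximate policy u = w @ features(x) is linear in the weights w, which keeps policy fitting a least-squares problem. The exact parameterization in the paper may differ.

```python
import numpy as np

def volterra_features(x):
    """Truncated second-order expansion of the state: a constant term,
    the linear terms, and all quadratic cross terms."""
    x = np.asarray(x, dtype=float)
    quad = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return np.concatenate(([1.0], x, quad))

feats = volterra_features([2.0, 3.0])
# -> [1, 2, 3, 4, 6, 9]  (constant, x1, x2, x1^2, x1*x2, x2^2)
```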

15.
IEEE Trans Neural Netw Learn Syst ; 28(9): 2215-2220, 2017 09.
Article in English | MEDLINE | ID: mdl-27416607

ABSTRACT

This brief presents a novel application of adaptive dynamic programming (ADP) for optimal adaptive control of powered lower limb prostheses, a type of wearable robot that assists the motor function of limb amputees. Current control of these robotic devices typically relies on finite state impedance control (FS-IC), which lacks adaptability to the user's physical condition. As a result, joint impedance settings are often customized manually and heuristically in clinics, which greatly hinders the wide use of these advanced medical devices. This simulation study aimed to demonstrate the feasibility of ADP for automatic tuning of the twelve knee joint impedance parameters over a complete gait cycle to achieve balanced walking. Given that accurate models of human walking dynamics are difficult to obtain, model-free ADP control algorithms were considered. First, direct heuristic dynamic programming (dHDP) was applied to the control problem, and its performance was evaluated in OpenSim, an often-used dynamic walking simulator. For comparison, we selected another established ADP algorithm, neural fitted Q with continuous action (NFQCA). In both cases, the ADP controllers learned to control the right knee joint and achieved balanced walking, but dHDP outperformed NFQCA in a 200-gait-cycle test.
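Finite state impedance control can be sketched per gait phase: each phase holds a triplet (stiffness, damping, equilibrium angle), so four phases times three parameters gives the twelve knee parameters mentioned above. The phase values below are hypothetical, not from the paper.

```python
def impedance_torque(theta, dtheta, phase_params):
    """FS-IC knee torque for the active gait phase:
    tau = k * (theta_e - theta) - b * dtheta, where the phase holds
    stiffness k, damping b, and equilibrium angle theta_e."""
    k, b, theta_e = phase_params
    return k * (theta_e - theta) - b * dtheta

# One hypothetical phase setting (units: Nm/rad, Nm*s/rad, rad).
stance_flexion = (120.0, 2.0, 0.25)
tau = impedance_torque(theta=0.30, dtheta=0.1, phase_params=stance_flexion)
# -> 120 * (0.25 - 0.30) - 2 * 0.1 = -6.2 Nm
```

The ADP tuner described in the abstract would adjust the (k, b, theta_e) triplets of all phases rather than compute the torque itself.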

16.
IEEE Trans Cybern ; 47(8): 2110-2120, 2017 Aug.
Article in English | MEDLINE | ID: mdl-27925603

ABSTRACT

In this paper, we present a novel adaptive consensus algorithm for a class of nonlinear multiagent systems with time-varying asymmetric state constraints. As such, our contribution goes a step beyond the usual consensus stabilization result to show that the states of the agents remain within a user-defined, time-varying bound. To prove our new results, the original multiagent system is transformed into a new one. Stabilization and consensus of the transformed states are sufficient to ensure consensus of the original networked agents without violating the predefined asymmetric time-varying state constraints. A single neural network (NN), whose weights are tuned online, is used in our design to approximate the unknown functions in the agents' dynamics. To account for the NN approximation residual, reconstruction error, and external disturbances, a robust term is introduced into the approximating system equation. Additionally, in our design each agent only exchanges information with its neighbor agents, and thus the proposed consensus algorithm is decentralized. The theoretical results are proved via Lyapunov synthesis. Finally, simulations are performed on a nonlinear multiagent system to illustrate the performance of our consensus design scheme.
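For reference, the bare consensus protocol that such designs extend: each agent moves toward its neighbors using only locally exchanged states. This sketch omits the paper's NN approximation of unknown dynamics and the time-varying state constraints; the four-agent ring is illustrative.

```python
import numpy as np

def consensus_step(x, adjacency, eps=0.2):
    """One step of the standard decentralized consensus protocol:
    x_i <- x_i + eps * sum_j a_ij * (x_j - x_i), i.e. x <- x - eps * L x."""
    return x + eps * (adjacency @ x - adjacency.sum(axis=1) * x)

# Ring of 4 agents.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])
for _ in range(100):
    x = consensus_step(x, A)
# All states converge to the initial average, 2.5.
```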

17.
IEEE Trans Cybern ; 46(1): 85-95, 2016 Jan.
Article in English | MEDLINE | ID: mdl-25667363

ABSTRACT

In this paper, we present a novel tracking controller for a class of uncertain nonaffine systems with time-varying asymmetric output constraints. First, the original nonaffine constrained (in the sense of the output signal) control system is transformed into an output-feedback control problem for an unconstrained affine system in normal form. As a result, stabilization of the transformed system is sufficient to ensure constraint satisfaction. It is subsequently shown that output tracking is achieved without violating the predefined asymmetric time-varying output constraints. Therefore, we are able to quantify the system performance bounds as functions of time in both the transient and steady-state stages. Furthermore, the transformed system is linear with respect to a new input signal, and the traditional backstepping scheme is avoided, which greatly simplifies the synthesis. All signals in the closed-loop system are proved to be semiglobally uniformly ultimately bounded via Lyapunov synthesis. Finally, simulation results are presented to illustrate the performance of the proposed controller.


Subject(s)
Neural Networks (Computer), Nonlinear Dynamics, Computer-Assisted Signal Processing, Theoretical Models, Torque, Wind
18.
IEEE Trans Neural Netw Learn Syst ; 27(8): 1748-61, 2016 08.
Article in English | MEDLINE | ID: mdl-26087500

ABSTRACT

The emergence of smart grids has posed great challenges to traditional power system control given the multitude of new risk factors. This paper proposes an online supplementary learning controller (OSLC) design method to supplement traditional power system controllers in coping with the dynamic power grid. The proposed OSLC is a supplementary controller based on approximate dynamic programming, which works alongside an existing power system controller. By introducing an action-dependent cost function as the optimization objective, the proposed OSLC is a nonidentifier-based method that provides online optimal control adaptively as measurement data become available. The online learning of the OSLC enjoys the policy-search efficiency of policy iteration and the data efficiency of the least squares method. For the proposed OSLC, the stability of the controlled system during learning, the monotonic nature of the performance measure of the iterative supplementary controller, and the convergence of the iterative supplementary controller are proved. Furthermore, the efficacy of the proposed OSLC is demonstrated on a challenging power system frequency control problem in the presence of high penetration of wind generation.
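The least-squares flavor of the evaluation step can be illustrated with standard least-squares temporal difference (LSTD) learning; this is a generic sketch on a trivial chain, not the OSLC's action-dependent formulation.

```python
import numpy as np

def lstd_weights(transitions, phi, gamma=0.9):
    """Least-squares TD: solve A w = b with
    A = sum phi(s) (phi(s) - gamma * phi(s'))^T and b = sum phi(s) * r,
    fitting the value function from a batch of measured transitions."""
    d = len(phi(transitions[0][0]))
    A, b = np.zeros((d, d)), np.zeros(d)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += f * r
    return np.linalg.solve(A, b)

# Two-state chain: 0 -> 1 (reward 1), 1 -> 1 (reward 0), tabular features.
phi = lambda s: np.eye(2)[s]
w = lstd_weights([(0, 1.0, 1), (1, 0.0, 1)], phi, gamma=0.9)
# -> V = [1.0, 0.0]
```

Fitting from a batch in one solve, rather than stochastic updates, is what gives least-squares methods their data efficiency as measurements accumulate.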

19.
Annu Int Conf IEEE Eng Med Biol Soc ; 2016: 5071-5074, 2016 Aug.
Article in English | MEDLINE | ID: mdl-28269408

ABSTRACT

In this study, we developed and tested a novel adaptive controller for powered transfemoral prostheses. Adaptive dynamic programming (ADP) was implemented within the prosthesis control to complement the existing finite state impedance control (FS-IC) in a prototypic active transfemoral prosthesis (ATP). The ADP controller interacts with the human user-prosthesis system, observes the prosthesis user's dynamic states during walking, and learns to personalize user performance via online adaptation to meet the individual user's objectives. The new ADP controller was preliminarily tested on one able-bodied subject walking on a treadmill. The test objective was for the user to approach normative knee kinematics by tuning the FS-IC impedance parameters via ADP. The results showed that ADP was able to adjust the prosthesis controller to generate the desired normative knee kinematics within 10 minutes. Meanwhile, the FS-IC impedance parameters converged at the end of the adaptive tuning procedure while maintaining the desired human-prosthesis performance. This study demonstrated the feasibility of ADP for adaptive control of a powered lower limb prosthesis. Future research efforts will address several important issues in order to validate the system on amputees. To achieve this goal, human user-centered performance objective functions will be developed, tested, and used in this adaptive controller design.


Subject(s)
Algorithms, Artificial Limbs, Femur/surgery, Amputees, Biomechanical Phenomena, Electric Impedance, Humans, Prosthesis Design, Walking
20.
Front Syst Neurosci ; 9: 28, 2015.
Article in English | MEDLINE | ID: mdl-25798093

ABSTRACT

Animals learn to choose a proper action among alternatives to improve their odds of success in food foraging and other activities critical for survival. Through trial and error, they learn correct associations between their choices and external stimuli. While a neural network that underlies such a learning process has been identified at a high level, it is still unclear how individual neurons and a neural ensemble adapt as learning progresses. In this study, we monitored the activity of single units in the rat medial and lateral agranular (AGm and AGl, respectively) areas as rats learned to make a left or right side lever press in response to a left or right side light cue. We noticed that rat movement parameters during the performance of the directional choice task quickly became stereotyped during the first 2-3 days or sessions, but learning the directional choice problem took weeks to occur. Accompanying the rats' behavioral performance adaptation, we observed neural modulation by directional choice in recorded single units. Our analysis shows that ensemble mean firing rates in the cue-on period did not change significantly as learning progressed, and the ensemble mean rate difference between left and right side choices did not show a clear trend of change either. However, the spatiotemporal firing patterns of the neural ensemble exhibited improved discriminability between the two directional choices through learning. These results suggest a spatiotemporal neural coding scheme in a motor cortical neural ensemble that may be responsible for and contribute to learning the directional choice task.
