Results 1 - 6 of 6
1.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 5534-5548, 2023 May.
Article in English | MEDLINE | ID: mdl-36260585

ABSTRACT

Solving the Hamilton-Jacobi-Bellman equation is important in many domains, including control, robotics, and economics. For continuous control in particular, solving this differential equation, and its extension the Hamilton-Jacobi-Isaacs equation, is important because it yields the optimal policy that achieves the maximum reward on a given task. In the case of the Hamilton-Jacobi-Isaacs equation, which includes an adversary controlling the environment and minimizing the reward, the obtained policy is also robust to perturbations of the dynamics. In this paper we propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI). These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems to derive the optimal policy and optimal adversary in closed form. This analytic expression simplifies the differential equations and enables us to solve for the optimal value function using value iteration over continuous states and actions, as well as in the adversarial case. Notably, the resulting algorithms do not require discretization of states or actions. We apply the resulting algorithms to the Furuta pendulum and cartpole and show that both obtain the optimal policy. The Sim2Real robustness experiments on the physical systems show that the policies successfully achieve the task in the real world. When changing the masses of the pendulum, we observe that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi.
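
To make the closed-form claim concrete, the sketch below shows the standard continuous-time derivation, assuming control-affine dynamics \dot{x} = a(x) + B(x)u and a separable reward r(x, u) = q_c(x) - g_c(u) with a strictly convex action cost g_c; the notation is illustrative and not taken verbatim from the paper.

```latex
% Minimal sketch (assumed notation): HJB equation for control-affine dynamics
% and separable reward, and the resulting closed-form optimal action.
\begin{align*}
  \rho V^{*}(x) &= \max_{u}\Big[\, q_c(x) - g_c(u)
      + \nabla_x V^{*}(x)^{\top}\big(a(x) + B(x)\,u\big) \Big] \\
  0 &= -\nabla g_c(u^{*}) + B(x)^{\top}\nabla_x V^{*}(x)
      && \text{(first-order optimality condition)} \\
  u^{*}(x) &= \nabla \tilde{g}_c\big(B(x)^{\top}\nabla_x V^{*}(x)\big)
      && \text{where } \tilde{g}_c \text{ is the convex conjugate of } g_c.
\end{align*}
```

The adversarial (Hamilton-Jacobi-Isaacs) case follows the same pattern, with the adversary's closed-form perturbation obtained from its own first-order condition.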

2.
Surg Innov ; 30(1): 94-102, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35503302

ABSTRACT

Background. The revolutions in AI hold tremendous capacity to augment human achievements in surgery, but robust integration of deep learning algorithms with high-fidelity surgical simulation remains a challenge. We present a novel application of reinforcement learning (RL) for automating surgical maneuvers in a graphical simulation.
Methods. In the Unity3D game engine, the Machine Learning-Agents package was integrated with the NVIDIA FleX particle simulator to develop autonomously behaving, RL-trained scissors. Proximal Policy Optimization (PPO) was used to reward desired behaviors such as movement along a desired trajectory and optimized cutting maneuvers along the deformable, tissue-like object. Constant and proportional reward functions were tested, and TensorFlow analytics was used to inform hyperparameter tuning and evaluate performance.
Results. The RL-trained scissors reliably manipulated the rendered tissue, which was simulated with soft-tissue properties. A desirable trajectory of the autonomously behaving scissors was achieved along 1 axis. Proportional rewards performed better than constant rewards. Cumulative reward and PPO metrics did not consistently improve across RL-trained scissors in the setting of movement across 2 axes (horizontal and depth).
Conclusion. Game engines hold promising potential for the design and implementation of RL-based solutions to simulated surgical subtasks. Task completion was sufficiently achieved for one-dimensional movement in simulations with and without tissue rendering. Further work is needed to optimize network architecture and parameter tuning for increasing complexity.
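
As a rough illustration of the reward comparison (not the authors' implementation; the function names, tolerance, and scale are hypothetical), a constant reward pays a fixed bonus only near the desired trajectory point, whereas a proportional reward decays smoothly with distance and therefore provides a denser learning signal:

```python
import numpy as np

def constant_reward(tip_pos, target, tol=0.01, bonus=1.0):
    """Hypothetical constant reward: fixed bonus whenever the scissors tip
    is within a tolerance of the current desired trajectory point."""
    return bonus if np.linalg.norm(tip_pos - target) < tol else 0.0

def proportional_reward(tip_pos, target, scale=1.0):
    """Hypothetical proportional reward: larger as the scissors tip moves
    closer to the desired trajectory point, decaying with distance."""
    return scale / (1.0 + np.linalg.norm(tip_pos - target))

# Per-step comparison along a straight cut in the x direction (placeholder values).
target = np.array([0.10, 0.00, 0.00])
for tip in (np.array([0.10, 0.00, 0.00]), np.array([0.15, 0.02, 0.00])):
    print(constant_reward(tip, target), proportional_reward(tip, target))
```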


Subject(s)
Robotic Surgical Procedures , Humans , Reinforcement, Psychology , Reward , Algorithms , Computer Simulation
3.
Article in English | MEDLINE | ID: mdl-36331649

ABSTRACT

A key challenge in offline reinforcement learning (RL) is how to ensure that the learned offline policy is safe, especially in safety-critical domains. In this article, we focus on learning a distributional value function in offline RL and optimizing a worst-case criterion of returns. However, optimizing a distributional value function in offline RL can be hard, since the quantile-crossing issue is severe and the distribution-shift problem needs to be addressed. To this end, we propose a monotonic quantile network (MQN) with conservative quantile regression (CQR) for risk-averse policy learning. First, we propose an MQN to learn the distribution over returns with non-crossing guarantees on the quantiles. Then, we perform CQR by penalizing the quantile estimates for out-of-distribution (OOD) actions to address the distribution shift in offline RL. Finally, we learn a worst-case policy by optimizing the conditional value-at-risk (CVaR) of the distributional value function. Furthermore, we provide a theoretical analysis of the fixed-point convergence of our method. We conduct experiments in both risk-neutral and risk-sensitive offline settings, and the results show that our method obtains safe and conservative behaviors in robotic locomotion tasks.
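
The three ingredients can be sketched as follows, under assumed parameterizations (non-crossing quantiles via accumulated softplus increments, a mean-gap conservatism term, and CVaR as the mean of the lowest quantiles); this is an illustration, not the authors' exact architecture or loss:

```python
import torch
import torch.nn.functional as F

def monotonic_quantiles(raw: torch.Tensor) -> torch.Tensor:
    """One common way to obtain non-crossing quantiles: predict a base value
    plus non-negative increments and accumulate them."""
    base, deltas = raw[..., :1], F.softplus(raw[..., 1:])
    return torch.cat([base, base + torch.cumsum(deltas, dim=-1)], dim=-1)

def cvar_from_quantiles(quantiles: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    """CVaR_alpha of a return distribution given N sorted quantiles:
    the mean of the lowest alpha-fraction of quantile values."""
    n = quantiles.shape[-1]
    k = max(1, int(alpha * n))
    return quantiles[..., :k].mean(dim=-1)

def conservative_penalty(q_ood: torch.Tensor, q_data: torch.Tensor,
                         weight: float = 1.0) -> torch.Tensor:
    """Hypothetical conservatism term in the spirit of CQR: push quantile
    estimates for out-of-distribution actions below those of dataset actions."""
    return weight * (q_ood.mean() - q_data.mean())

# Example: 32 quantiles for a batch of 4 state-action pairs.
z = monotonic_quantiles(torch.randn(4, 32))
print(cvar_from_quantiles(z, alpha=0.1).shape)  # torch.Size([4])
```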

4.
Auton Robots ; 46(1): 115-147, 2022.
Article in English | MEDLINE | ID: mdl-34366568

ABSTRACT

Assistive robot arms enable people with disabilities to conduct everyday tasks on their own. These arms are dexterous and high-dimensional; however, the interfaces people must use to control their robots are low-dimensional. Consider teleoperating a 7-DoF robot arm with a 2-DoF joystick. The robot is helping you eat dinner, and currently you want to cut a piece of tofu. Today's robots assume a pre-defined mapping between joystick inputs and robot actions: in one mode the joystick controls the robot's motion in the x-y plane, in another mode the joystick controls the robot's z-yaw motion, and so on. But this mapping misses out on the task you are trying to perform! Ideally, one joystick axis should control how the robot stabs the tofu, and the other axis should control different cutting motions. Our insight is that we can achieve intuitive, user-friendly control of assistive robots by embedding the robot's high-dimensional actions into low-dimensional and human-controllable latent actions. We divide this process into three parts. First, we explore models for learning latent actions from offline task demonstrations, and formalize the properties that latent actions should satisfy. Next, we combine learned latent actions with autonomous robot assistance to help the user reach and maintain their high-level goals. Finally, we learn a personalized alignment model between joystick inputs and latent actions. We evaluate our resulting approach in four user studies where non-disabled participants reach marshmallows, cook apple pie, cut tofu, and assemble dessert. We then test our approach with two disabled adults who leverage assistive devices on a daily basis.
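
A minimal sketch of the decoding step is shown below, assuming a 2-DoF latent/joystick input and a 7-DoF arm; the state dimension, network sizes, and class name are placeholders rather than the models used in the paper:

```python
import torch
import torch.nn as nn

class LatentActionDecoder(nn.Module):
    """Hypothetical decoder: maps a low-dimensional latent/joystick input z
    (e.g., 2-DoF) and the current robot state s to a high-dimensional robot
    action a (e.g., 7 joint velocities)."""
    def __init__(self, latent_dim=2, state_dim=14, action_dim=7, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, z, s):
        return self.net(torch.cat([z, s], dim=-1))

# Training sketch: an encoder embeds demonstrated (state, action) pairs into z,
# and the decoder is trained to reconstruct the demonstrated action, so that
# each latent axis captures a task-relevant motion.
decoder = LatentActionDecoder()
a = decoder(torch.zeros(1, 2), torch.zeros(1, 14))  # 1 x 7 action
```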

5.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3883-3894, 2022 Jul.
Article in English | MEDLINE | ID: mdl-33513098

ABSTRACT

Unsupervised landmark learning is the task of learning semantic keypoint-like representations without expensive keypoint annotations. A popular approach is to factorize an image into a pose and an appearance data stream, then reconstruct the image from the factorized components. The pose representation should capture a set of consistent and tightly localized landmarks in order to facilitate reconstruction of the input image. Ultimately, we wish for our learned landmarks to focus on the foreground object of interest. However, reconstructing the entire image forces the model to allocate landmarks to the background. Using a motion-based foreground assumption, this work explores the effects of factorizing the reconstruction task into separate foreground and background reconstructions in an unsupervised way, allowing the model to condition only the foreground reconstruction on the unsupervised landmarks. Our experiments demonstrate that the proposed factorization yields landmarks that focus on the foreground object of interest, as measured against ground-truth foreground masks. Furthermore, the rendered background quality is also improved, as ill-suited landmarks are no longer forced to model this content. We demonstrate this improvement via higher image fidelity in a video-prediction task. Code is available at https://github.com/NVIDIA/UnsupervisedLandmarkLearning.
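
One way to write the factorized reconstruction (the symbols below are assumed notation, not the paper's): a motion-derived foreground mask M gates which pixels the landmark-conditioned decoder must explain, while a separate branch reconstructs the background:

```latex
% Assumed notation: M is a motion-derived foreground mask, f_fg is conditioned
% on the unsupervised landmarks, and f_bg reconstructs the background only.
\begin{equation*}
  \hat{I} \;=\; M \odot f_{\mathrm{fg}}(\text{landmarks},\, \text{appearance})
  \;+\; (1 - M) \odot f_{\mathrm{bg}}(I),
  \qquad
  \mathcal{L}_{\mathrm{rec}} \;=\; \lVert \hat{I} - I \rVert .
\end{equation*}
```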

6.
J Appl Clin Med Phys ; 16(1): 5168, 2015 Jan 08.
Article in English | MEDLINE | ID: mdl-25679174

ABSTRACT

The purpose of this study was to evaluate the radiation attenuation properties of PC-ISO, a commercially available, biocompatible, sterilizable 3D printing material, and its suitability for customized, single-use gynecologic (GYN) brachytherapy applicators that have the potential to accurately guide seeds through linear and curved internal channels. A custom radiochromic film dosimetry apparatus was 3D-printed in PC-ISO with a single catheter channel and a slit to hold a film segment. The apparatus was designed specifically to test geometry pertinent to the use of this material in a clinical setting. A brachytherapy dose plan was computed to deliver a cylindrical dose distribution to the film. The dose plan used an 192Ir source and was normalized to 1500 cGy at 1 cm from the channel. The material was evaluated by comparing the film exposure to an identical test done in water. The Hounsfield unit (HU) distributions were computed from a CT scan of the apparatus and compared to the HU distribution of water and that of a commercial GYN cylinder applicator. The depth-dose curve of PC-ISO as measured by the radiochromic film was within 1% of that of water between 1 cm and 6 cm from the channel. The mean HU was -10 for PC-ISO and -1 for water. As expected, the honeycombed structure produced by the PC-ISO 3D printing process created a moderate spread of HU values, but the mean was comparable to water. PC-ISO is sufficiently water-equivalent to be compatible with our HDR brachytherapy planning system and clinical workflow, and it is therefore suitable for creating custom GYN brachytherapy applicators. Our current clinical practice includes the use of custom PC-ISO GYN applicators when doing so can improve the patient's treatment.
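
The water-equivalence check amounts to two simple comparisons, sketched below with placeholder numbers rather than the study's measurements: the percent difference of the PC-ISO depth-dose curve relative to water, and the mean HU over a region of interest in the CT scan.

```python
import numpy as np

# Hypothetical depth-dose samples (cGy) read from radiochromic film at matching
# depths in PC-ISO and in water; the values are placeholders, not the study's data.
depths_cm = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
dose_pciso = np.array([1500.0, 610.0, 340.0, 215.0, 150.0, 110.0])
dose_water = np.array([1500.0, 605.0, 338.0, 214.0, 149.5, 109.5])

# Percent difference of the PC-ISO depth-dose curve relative to water.
pct_diff = 100.0 * (dose_pciso - dose_water) / dose_water
print(dict(zip(depths_cm, np.round(pct_diff, 2))))

# Mean Hounsfield units over a region of interest in the CT scan of the
# applicator; the array is a placeholder ROI, not the scanned data.
hu_pciso_roi = np.random.default_rng(0).normal(-10, 40, size=10_000)
print("mean HU (PC-ISO ROI):", round(hu_pciso_roi.mean(), 1))
```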


Subject(s)
Brachytherapy/instrumentation , Brachytherapy/methods , Film Dosimetry , Genital Neoplasms, Female/radiotherapy , Iridium Radioisotopes/therapeutic use , Radiotherapy Planning, Computer-Assisted/methods , Radiotherapy, Intensity-Modulated/methods , Computer Simulation , Female , Humans , Monte Carlo Method , Radiotherapy Dosage , Tomography, X-Ray Computed