The usual reinforcement learning paradigm requires a problem to be specified as a Markov decision process: with each transition, the environment emits a (meaningful) reward signal that depends only on the current state and action. In reality, however, not every problem can be formulated this way. For an episodic task, the reward may only become meaningful once a whole episode has been completed. Moreover, the reward may not be Markovian at all, depending instead on the entire trajectory. In such cases, step-based (deep) reinforcement learning algorithms such as PPO or SAC often yield sub-optimal performance or fail completely. A different approach is to parameterize whole trajectories or sub-trajectories using movement primitives and to learn the parameters of this “high-level” policy. The actions for the system are then determined by a “low-level” trajectory-tracking controller such as a PD controller.
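To make the two-level setup concrete, here is a minimal sketch (not any particular published algorithm) of how a high-level policy's output could define a whole trajectory and how a low-level PD controller could track it. All names, the radial-basis-function parameterization, and the gain values are illustrative assumptions for this example.

```python
import numpy as np

def rbf_trajectory(w, n_steps=100, width=0.02):
    """Desired positions q_des(t) on [0, 1] as a normalized weighted sum of
    Gaussian basis functions; the weights w play the role of the high-level
    policy's action (an illustrative parameterization, not a specific MP model)."""
    t = np.linspace(0.0, 1.0, n_steps)
    centers = np.linspace(0.0, 1.0, len(w))
    phi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * width))
    phi /= phi.sum(axis=1, keepdims=True)  # normalize basis activations per step
    return phi @ w

def pd_action(q_des, qd_des, q, qd, kp=50.0, kd=5.0):
    """Low-level tracking controller: action from position and velocity errors
    (gains kp, kd chosen arbitrarily for the example)."""
    return kp * (q_des - q) + kd * (qd_des - qd)

# A single high-level action (5 weights) defines a whole desired trajectory.
w = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
q_des = rbf_trajectory(w)
qd_des = np.gradient(q_des, 1.0 / (len(q_des) - 1))

# One low-level control step for a system currently at q = 0 with qd = 0.
u = pd_action(q_des[0], qd_des[0], q=0.0, qd=0.0)
```

Note that the learner only ever chooses `w` once per episode; the per-step actions `u` come from the fixed controller, which is what decouples the learning problem from the step-based reward structure.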
In this seminar, we introduce the concept of movement primitives as parameterized trajectory descriptors and present different algorithms for optimizing their parameters. Results are shown on continuous state- and action-space tasks from the domain of robotics.