The usual reinforcement learning paradigm requires a problem to be specified as a Markov decision process: with each transition, the environment emits a (meaningful) reward signal that depends only on the current state and action. In reality, however, not every problem can be formulated this way. For an episodic task, the reward may only become meaningful once a whole episode has been completed. Moreover, the reward may not be Markovian at all, depending instead on the entire trajectory. In such cases, step-based (deep) reinforcement learning algorithms such as PPO or SAC often yield sub-optimal performance or fail completely. A different approach is to parameterize whole trajectories or sub-trajectories using movement primitives and to learn the parameters of this “high-level” policy. The actions for the system are then determined by a “low-level” trajectory-tracking controller such as a PD controller.
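To make the two-level setup concrete, here is a minimal sketch (not any particular published algorithm) of how a high-level policy's output could define a whole trajectory and how a low-level PD controller could track it. All names, the radial-basis-function parameterization, and the gain values are illustrative assumptions for this example.

```python
import numpy as np

def rbf_trajectory(w, n_steps=100, width=0.02):
    """Desired positions q_des(t) on [0, 1] as a normalized weighted sum of
    Gaussian basis functions; the weights w play the role of the high-level
    policy's action (an illustrative parameterization, not a specific MP model)."""
    t = np.linspace(0.0, 1.0, n_steps)
    centers = np.linspace(0.0, 1.0, len(w))
    phi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * width))
    phi /= phi.sum(axis=1, keepdims=True)  # normalize basis activations per step
    return phi @ w

def pd_action(q_des, qd_des, q, qd, kp=50.0, kd=5.0):
    """Low-level tracking controller: action from position and velocity errors
    (gains kp, kd chosen arbitrarily for the example)."""
    return kp * (q_des - q) + kd * (qd_des - qd)

# A single high-level action (5 weights) defines a whole desired trajectory.
w = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
q_des = rbf_trajectory(w)
qd_des = np.gradient(q_des, 1.0 / (len(q_des) - 1))

# One low-level control step for a system currently at q = 0 with qd = 0.
u = pd_action(q_des[0], qd_des[0], q=0.0, qd=0.0)
```

Note that the learner only ever chooses `w` once per episode; the per-step actions `u` come from the fixed controller, which is what decouples the learning problem from the step-based reward structure.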
In this seminar, we introduce the concept of movement primitives as parameterized trajectory descriptors and present different algorithms for optimizing their parameters. Results are shown on continuous state- and action-space tasks from the domain of robotics.