Safe and efficient deep reinforcement learning

This 2-day dive into deep RL for decision and control is suitable for engineers who want to solve real-world control problems with efficient and safe methods.
This workshop is always offered back-to-back with a one-day training on Machine Learning & Control. Because the applications overlap, we recommend attending the two together; a discount applies in this case.

The task of making a sequence of decisions in an optimal way is ubiquitous in industrial applications. From controlling plasma in a fusion reactor to driving cars or playing games, most domains require sequences of decisions to be made automatically. In recent years, more and more tasks that were not traditionally thought of as sequential decision-making have been formulated as such, for example chip design or video compression.

This training gives an introduction to deep reinforcement learning (RL) for practitioners, with particular focus on learning safe policies (relevant in applications with constrained control spaces) and on sample-efficient algorithms. We will discuss the different forms that a decision process can take, how a variety of tasks can be formulated as such processes, and a selection of the many RL methods that can be used to solve them optimally.

As in the Machine Learning & Control training, we will analyse stability during training as well as the gap between simulation and reality (sim2real).

Learning outcomes

Participants will learn about the most important recent advances in deep RL and develop an intuition for when and how to use RL techniques (and, perhaps more importantly, when not to). Topics covered include:

  • Basics of deep model-free and model-based RL
  • Which problems are suitable for an RL approach
  • Exploration vs. exploitation
  • On-policy, off-policy and offline learning
  • Various deep RL algorithms
  • Effects of reward engineering
  • Sample efficiency, safety and robustness in RL


  • “Classical” deep RL: different variants of Q-learning
  • Combining Q-learning with planning: self-play
  • Continuous actions: DDPG and SAC
  • Learning safe policies in constrained control spaces
  • Sample efficiency in model-free RL
  • Imitation learning and offline RL
  • Software for RL 1: designing environments and evaluation scenarios
  • Software for RL 2: parallelization, vectorization
  • Basics of deep model-based RL: an introduction to MuZero
