Double Gumbel Q-Learning

Reference

Double Gumbel Q-Learning, David Yu-Tung Hui, Aaron C. Courville, Pierre-Luc Bacon. Advances in Neural Information Processing Systems (2023)

Abstract

We show that Deep Neural Networks introduce two heteroscedastic Gumbel noise sources into Q-Learning. To account for these noise sources, we propose Double Gumbel Q-Learning, a Deep Q-Learning algorithm applicable for both discrete and continuous control. In discrete control, we derive a closed-form expression for the loss function of our algorithm. In continuous control, this loss function is intractable and we therefore derive an approximation with a hyperparameter whose value regulates pessimism in Q-Learning. We present a default value for our pessimism hyperparameter that enables DoubleGum to outperform DDPG, TD3, SAC, XQL, quantile regression, and Mixture-of-Gaussian Critics in aggregate over 33 tasks from DeepMind Control, MuJoCo, MetaWorld, and Box2D and show that tuning this hyperparameter may further improve sample efficiency.

Content citing this item

Seminar

David Yu-Tung Hui, MILA, will talk about his work presented at NeurIPS 2023, Double Gumbel Q-Learning, a Deep Q-Learning algorithm …

Reinforcement Learning

Jan 25, 2024
