Fitted Q-iteration by Advantage Weighted Regression | TransferLab

Reference

Fitted Q-iteration by Advantage Weighted Regression, Gerhard Neumann, Jan Peters. Advances in Neural Information Processing Systems (2008)


Abstract

Recently, fitted Q-iteration (FQI)-based methods have become more popular due to their increased sample efficiency, a more stable learning process, and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces, which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic: it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias, and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using soft-greedy action selection, the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high-dimensional action spaces.
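The core idea, replacing a greedy maximization over continuous actions with a regression on sampled actions weighted by their advantage, can be illustrated with a small sketch. This is not the authors' algorithm: it assumes exponential weights exp(A / tau) with a hypothetical temperature `tau`, a one-dimensional state and action, and a linear mean policy `mu(s) = b + k * s`. The paper's exact weighting (including how advantages are normalized) may differ, and the function names below are illustrative only.

```python
import math
from typing import List, Tuple


def soft_greedy_weights(advantages: List[float], tau: float) -> List[float]:
    """Boltzmann-style weights exp(A / tau) (assumed form, not the paper's exact one).

    Advantages are shifted by their maximum before exponentiating for numerical
    stability; this rescales all weights by a constant, which does not change
    the weighted least-squares solution below.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    a_max = max(advantages)
    return [math.exp((a - a_max) / tau) for a in advantages]


def advantage_weighted_regression(
    states: List[float],
    actions: List[float],
    advantages: List[float],
    tau: float = 1.0,
) -> Tuple[float, float]:
    """Fit a linear mean policy mu(s) = b + k * s by advantage-weighted least squares.

    Instead of searching for argmax_a Q(s, a), each sampled (s, a) pair is
    weighted by exp(A(s, a) / tau), so high-advantage actions dominate the fit.
    Returns (b, k).
    """
    w = soft_greedy_weights(advantages, tau)
    sw = sum(w)
    # Weighted means of states and actions.
    s_bar = sum(wi * si for wi, si in zip(w, states)) / sw
    a_bar = sum(wi * ai for wi, ai in zip(w, actions)) / sw
    # Closed-form weighted least squares for a single regressor plus intercept.
    cov = sum(wi * (si - s_bar) * (ai - a_bar) for wi, si, ai in zip(w, states, actions))
    var = sum(wi * (si - s_bar) ** 2 for wi, si in zip(w, states))
    if var == 0.0:
        raise ValueError("weighted states have zero variance; slope is undefined")
    k = cov / var
    b = a_bar - k * s_bar
    return b, k


if __name__ == "__main__":
    # Toy data: the advantage peaks at a = 2 * s, so the fit should recover k ~ 2, b ~ 0.
    states = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
    actions = [0.0, 1.0, 2.0, 0.0, 4.0, 1.0]
    advantages = [-(a - 2.0 * s) ** 2 for s, a in zip(states, actions)]
    print(advantage_weighted_regression(states, actions, advantages, tau=0.1))
```

A small `tau` makes the weighting close to greedy (only the best sampled actions matter), while a large `tau` approaches an unweighted regression over all sampled actions; the regression itself stays a cheap closed-form fit regardless of the action dimension's search cost.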

Content citing this item

Pill

Advantage-Induced Policy Alignment

Building on the classic results on reward weighted regression and its more recent adaptation to deep learning, a new algorithm called …

Reinforcement Learning

Jul 14, 2023
