In Chinese, Tianshou means “divinely ordained” and is derived from the gift of being born. Tianshou is a reinforcement learning platform, and the essence of reinforcement learning is to learn not from an explicit teacher (as in supervised learning) but by oneself, through constant interaction with the environment. It thus complements the scope of sensAI: The Python Library for Sensible AI, which is named after the Japanese term for teacher (“sensei”).
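To make "learning through interaction" concrete, here is a minimal sketch in plain Python. It does not use Tianshou's API. The corridor environment, the `step` and `train` helpers, and all constants are hypothetical and chosen only for illustration. The agent is never told the correct action. It acts at random, observes rewards, and applies tabular Q-learning updates (with a learning rate of 1, which is only reasonable here because the toy environment is deterministic).

```python
import random

# Hypothetical toy environment (not part of Tianshou): a 5-state corridor.
N_STATES = 5          # states 0..4; reaching state 4 ends the episode
LEFT, RIGHT = 0, 1
GAMMA = 0.9           # discount factor (assumed value)


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=20, seed=0):
    """Tabular Q-learning driven by a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice((LEFT, RIGHT))          # explore, no teacher
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] = target                   # learning rate of 1
            state = next_state
    return q


q_table = train()
# Greedy action in every non-terminal state after training.
greedy_policy = [
    max((LEFT, RIGHT), key=lambda a: q_table[s][a])
    for s in range(N_STATES - 1)
]
print(greedy_policy)
```

Under these assumptions the script prints `[1, 1, 1, 1]`: the greedy policy moves right in every state, even though no labeled examples of correct actions were ever provided.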
Supported Algorithms
Tianshou supports a wide variety of algorithms:
- Deep Q-Network (DQN)
- Double DQN
- Dueling DQN
- Branching DQN
- Categorical DQN (C51)
- Rainbow DQN (Rainbow)
- Quantile Regression DQN (QRDQN)
- Implicit Quantile Network (IQN)
- Fully-parameterized Quantile Function (FQF)
- Policy Gradient (PG)
- Natural Policy Gradient (NPG)
- Advantage Actor-Critic (A2C)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (DDPG)
- Twin Delayed DDPG (TD3)
- Soft Actor-Critic (SAC)
- Randomized Ensembled Double Q-Learning (REDQ)
- Discrete Soft Actor-Critic (SAC-Discrete)
- Vanilla Imitation Learning
- Batch-Constrained deep Q-Learning (BCQ)
- Conservative Q-Learning (CQL)
- Twin Delayed DDPG with Behavior Cloning (TD3+BC)
- Discrete Batch-Constrained deep Q-Learning (BCQ-Discrete)
- Discrete Conservative Q-Learning (CQL-Discrete)
- Discrete Critic Regularized Regression (CRR-Discrete)
- Generative Adversarial Imitation Learning (GAIL)
- Prioritized Experience Replay (PER)
- Generalized Advantage Estimator (GAE)
- Posterior Sampling Reinforcement Learning (PSRL)
- Intrinsic Curiosity Module (ICM)
- Hindsight Experience Replay (HER)