Learning to learn and to optimize

  1. From Gradient Descent to Stochastic Gradient Descent (SGD)
  2. RMSProp and Adam
  3. Learning to learn by gradient descent by gradient descent
  4. Learning to optimize

References