In recent years, model-free reinforcement learning has achieved great success in controlling physically simulated agents, largely thanks to the development of powerful algorithms such as proximal policy optimisation (PPO) [Sch17P] and soft actor-critic (SAC) [Haa18aS]. Nevertheless, these algorithms often remain prohibitively expensive due to their high sample complexity: even simple control problems may require millions of training steps.
Most 3D physics engines treat objects (such as robots) as “trees” of rigid bodies, each body having 6 degrees of freedom (position and rotation), with positions and velocities subject to constraints, e.g. to prevent the bodies from penetrating each other. Resolving contacts under such constraints can be formulated as a “linear complementarity problem” (LCP), which has been studied extensively over many decades [Fea08R], with the most recent work focusing on numerical stability and speed. Different simulators, such as MuJoCo, PhysX or Bullet, treat contacts and friction in different ways, with higher accuracy often coming at the cost of throughput.
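To make the complementarity conditions concrete, here is a toy solver for a two-contact LCP using projected Gauss-Seidel, the iteration many real-time engines use for contact forces. The matrix `A` and vector `b` are made-up stand-ins for what a real engine would assemble from contact Jacobians and masses; this is a sketch, not any particular engine's implementation.

```python
def solve_lcp(A, b, iters=100):
    """Find lam >= 0 such that w = A @ lam + b >= 0 and lam . w = 0."""
    n = len(b)
    lam = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # Residual of row i with the current guess ...
            r = b[i] + sum(A[i][j] * lam[j] for j in range(n))
            # ... re-solve for lam[i] and project onto lam[i] >= 0.
            lam[i] = max(0.0, lam[i] - r / A[i][i])
    return lam

# Two hypothetical contact points; A plays the role of the contact-space
# mass matrix and must be positive definite for the iteration to converge.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [-1.0, 2.0]
lam = solve_lcp(A, b)
w = [b[i] + sum(A[i][j] * lam[j] for j in range(2)) for i in range(2)]
```

Complementarity has a physical reading: either a contact is separating (`w[i] > 0`) and exerts no force (`lam[i] = 0`), or it is active (`w[i] = 0`) and pushes with `lam[i] > 0`, never both.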
One recent line of development concerns making the parameters of such engines differentiable. Just as automatic differentiation libraries like TensorFlow or PyTorch enable efficient differentiation of complex functions, and can thus train a neural network’s parameters to minimise a loss function, the parameters of a physics engine can be optimised to find the best friction, body structure, initial position, joint torques, etc. for a robot to solve a given task. In the simplest case, minimising a loss through gradient descent on the simulation parameters is enough to solve the control problem, without the need for a neural network. This constitutes a fully model-based approach to robotic control.
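As an illustration of this model-based recipe, the following sketch (in JAX, with made-up dynamics and hyperparameters) simulates a point mass under gravity with explicit Euler steps and runs gradient descent on its initial velocity until it lands on a target, with no neural network involved:

```python
import jax
import jax.numpy as jnp

DT, STEPS = 0.01, 100
G = jnp.array([0.0, -9.81])  # gravity

def simulate(v0):
    """Roll the dynamics forward and return the final position."""
    def step(carry, _):
        pos, vel = carry
        vel = vel + DT * G
        pos = pos + DT * vel
        return (pos, vel), None
    (pos, _), _ = jax.lax.scan(step, (jnp.zeros(2), v0), None, length=STEPS)
    return pos

target = jnp.array([2.0, 0.5])

def loss(v0):
    return jnp.sum((simulate(v0) - target) ** 2)

# Gradient descent directly on a simulation parameter: here, v0.
v0 = jnp.array([1.0, 1.0])
grad_fn = jax.jit(jax.grad(loss))
for _ in range(200):
    v0 = v0 - 0.5 * grad_fn(v0)

final_loss = float(loss(v0))
```

Because the whole rollout is a differentiable JAX function, `jax.grad` backpropagates through every physics step, and the quadratic loss drives `v0` to a solution in a handful of iterations.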
Nevertheless, model-free approaches still hold the highest potential for efficiency and accuracy in highly complex environments. Differentiable environments will greatly help in reducing the drop in performance (the so-called sim-to-real gap) that models trained in a virtual environment suffer when deployed in the real world. In particular, it becomes possible to fine-tune the friction and contact forces of custom-made environments directly from video input, as done in a recent paper [Jat21G].
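A minimal sketch of what such fine-tuning could look like, using a synthetic “observed” trajectory in place of real video and an illustrative friction model: a block slides to a halt under Coulomb-like friction, and gradient descent on the trajectory mismatch recovers the friction coefficient.

```python
import jax
import jax.numpy as jnp

DT, STEPS, G = 0.02, 50, 9.81

def rollout(mu, v0=3.0):
    """Positions of a sliding block decelerated by friction mu * g."""
    def step(carry, _):
        x, v = carry
        v = jnp.maximum(v - DT * mu * G, 0.0)  # friction never reverses motion
        x = x + DT * v
        return (x, v), x
    _, xs = jax.lax.scan(step, (0.0, v0), None, length=STEPS)
    return xs

mu_true = 0.3
observed = rollout(mu_true)  # stand-in for a trajectory extracted from video

def loss(mu):
    return jnp.mean((rollout(mu) - observed) ** 2)

mu = 0.1  # initial guess for the unknown friction coefficient
grad_fn = jax.jit(jax.grad(loss))
for _ in range(300):
    mu = mu - 0.05 * grad_fn(mu)

mu_hat = float(mu)
```

This is system identification by gradient descent: the same machinery that trains a policy can calibrate the simulator itself against real measurements.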
Brax [Fre21B] is one of the newest differentiable rigid-body simulators. Written in JAX, it places the engine and the RL optimizer together on the same GPU/TPU chip, speeding up RL training by factors of 100–1000×. Such speed comes at a slight cost in accuracy, as contacts and friction are computed within the simplifying framework of “spring joints”. Other simulators, like MuJoCo, rely on more accurate methods, but are much slower or not fully differentiable. The choice of one simulator over another depends on the problem at hand: typically, planning and navigation do not require complex contact modelling, while manipulation and grasping do.
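The “spring” flavour of contact handling can be illustrated with a penalty force: when a body penetrates the ground it is pushed back by a stiff spring plus a damper, instead of solving an LCP. The constants below are illustrative choices, not Brax’s actual defaults.

```python
import jax
import jax.numpy as jnp

DT, G = 0.001, 9.81
STIFFNESS, DAMPING = 1e4, 10.0  # illustrative spring-contact constants

def step(carry, _):
    z, v = carry
    depth = jnp.maximum(-z, 0.0)                     # penetration below z = 0
    f_contact = STIFFNESS * depth - DAMPING * v * (depth > 0)
    v = v + DT * (f_contact - G)                     # unit mass
    z = z + DT * v                                   # semi-implicit Euler
    return (z, v), z

# Drop a unit-mass ball from 1 m and let it bounce for 5 simulated seconds.
(zf, vf), zs = jax.lax.scan(step, (1.0, 0.0), None, length=5000)
max_penetration = float(-jnp.min(zs))
```

The trade-off is visible in `max_penetration`: a spring contact always allows a little interpenetration (a few centimetres here), which an LCP-based solver would forbid, but every step is a cheap, branch-free, differentiable update that maps perfectly onto a GPU/TPU.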
If you feel like giving Brax a try, you can run it for free in this colab notebook. You can try it out on the OpenAI Gym environments and even train some PyTorch models. Another example concerns the analytic policy gradient, i.e. training a neural network directly with the gradients returned by the simulator. It is interesting to note how the differentiability of the simulator can be leveraged for all sorts of hybrid approaches between classic control and neural networks. Finally, if you want to dig deeper, you can build a brand-new environment and run it on your local machine: Brax is open-source!
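As a miniature of the analytic-policy-gradient idea, the sketch below trains a one-parameter linear policy on a single-integrator point mass by differentiating the rollout cost directly with `jax.grad`, rather than estimating it from sampled returns. The dynamics, cost weights and learning rate are illustrative choices, not anything from Brax itself.

```python
import jax
import jax.numpy as jnp

DT, STEPS = 0.05, 100

def rollout_cost(gain, x0=1.0):
    """Total quadratic cost of the policy u = -gain * x over one rollout."""
    def step(x, _):
        u = -gain * x                 # linear state-feedback "policy"
        x = x + DT * u                # single-integrator dynamics
        return x, (x ** 2 + 0.01 * u ** 2) * DT
    _, costs = jax.lax.scan(step, x0, None, length=STEPS)
    return jnp.sum(costs)

gain = 0.0
grad_fn = jax.jit(jax.grad(rollout_cost))
for _ in range(200):
    # The gradient flows through the entire simulated trajectory.
    gain = gain - 0.005 * grad_fn(gain)

initial_cost = float(rollout_cost(0.0))
final_cost = float(rollout_cost(gain))
```

Swapping the scalar `gain` for a neural network’s parameters gives the general recipe; the same differentiable rollout also lets you mix in classic control, e.g. learning only a residual on top of a hand-designed controller.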