Using trl you can run one of the most popular Deep RL algorithms, PPO, in a distributed manner or on a single device! We leverage accelerate from the Hugging Face ecosystem to make this possible, so that any user can scale up the experiments up to an interesting scale. Fine-tuning a la...