Owing to the introduction of the traditional PID control parameters (as constrained by experience knowledge), the exploration space for the RL is greatly reduced, meaning that the optimal control policy can be learned quickly. Based on the goal of completing a lap driving task, the results for ...