Since the state value function $V_{\psi_i}(s_t)$ in SAC approximates the sum of entropy-augmented rewards, the entropy-augmented reward is also used in the advantage function. SAC with the proposed adaptive algorithm is summarized in Algorithm A2.

5. Experiment in the Environments ...