Dec 25, 2022
5 stories
1 save
This covers the SAC algorithm which uses entropy in its reward function to maximize exploration
It contains PPO and TRPO and I describe trust region methods can enhance the optimization process in deep learning
DDPG is an extension of Policy gradient algorithms which is an Actor-critic model. You will learn about the improvements of DDPG over vanilla PG algorithms
Policy gradient algorithms encourage exploration over value-iteration methods. Know about its advantages and disadvantages in this article