POLICY GRADIENTS IN DEEP REINFORCEMENT LEARNING

Published in Analytics Vidhya

9 min read · Jun 13, 2021

In 2016, AlphaGo, a deep reinforcement learning agent, beat Lee Sedol, a professional Go player of 9 dan rank (the highest professional rank in Go). Go had long been regarded as a game beyond the reach of computer algorithms because of its artistic, intuitive character. But AlphaGo, developed by Google DeepMind, beat Lee Sedol 4–1 in a five-game match. The result came as a shock to people all over the world, and the tension was apparent in the fact that Lee Sedol took a cigarette break during one of the games.

Written by Astarag Mohapatra