RMSProp

The RMSProp optimization algorithm was first introduced by Hinton [1]. The algorithm keeps track of a moving average of the squared gradient:

$$s = \beta s + (1 - \beta) \left[ \nabla_\theta \mathcal{L}(\hat{Y}, Y) \right]^2$$

where $\beta$ is the decay rate of the moving average and the gradient of the loss function $\nabla_\theta \mathcal{L}(\hat{Y}, Y)$ is computed with the backpropagation algorithm. The parameters are then updated according to

$$\theta = \theta - \frac{\alpha \, \nabla_\theta \mathcal{L}(\hat{Y}, Y)}{\sqrt{s} + \epsilon}$$

where $\alpha$ is the learning rate and $\epsilon$ is a small quantity added to ensure numerical stability. Note that the square root of $s$ and the division are applied element-wise. By default, the hyperparameters of the optimizer are set to

$$\beta = 0.9, \qquad \epsilon = 10^{-8}$$
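
For illustration, the following is a minimal NumPy sketch of one RMSProp step implementing the two update equations above. The learning rate value `alpha=0.01` and the toy quadratic objective in the usage example are assumptions made for this sketch; the text above does not prescribe a default learning rate.

```python
import numpy as np

def rmsprop_update(theta, grad, s, alpha=0.01, beta=0.9, eps=1e-8):
    """Perform one RMSProp step on the parameters theta.

    theta, grad and s are NumPy arrays of the same shape; all operations
    below are element-wise, matching the update rule above.
    alpha=0.01 is an assumed learning rate for this example.
    """
    # Moving average of the squared gradient: s = beta * s + (1 - beta) * grad^2
    s = beta * s + (1.0 - beta) * grad ** 2
    # Parameter update: theta = theta - alpha * grad / (sqrt(s) + eps)
    theta = theta - alpha * grad / (np.sqrt(s) + eps)
    return theta, s

# Usage: minimize the toy objective f(theta) = sum(theta^2), whose gradient is 2 * theta.
theta = np.array([1.0, -2.0, 3.0])
s = np.zeros_like(theta)
for _ in range(1000):
    grad = 2.0 * theta
    theta, s = rmsprop_update(theta, grad, s)
print(theta)  # all entries should be close to 0
```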

References

[1] Lecture notes: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf, accessed November 2019.