RMSProp
The RMSProp optimization algorithm was first introduced by Hinton [1]. It keeps a moving average of the squared gradient:

$$s = \beta s + (1 - \beta)\left[\nabla_\theta L(\hat{Y}, Y)\right]^2$$

where $\beta$ is the decay rate of the moving average and the gradient of the loss function $\nabla_\theta L(\hat{Y}, Y)$ is computed with the backpropagation algorithm. The parameters are then updated according to

$$\theta = \theta - \alpha \, \frac{\nabla_\theta L(\hat{Y}, Y)}{\sqrt{s} + \epsilon}$$

where $\alpha$ is the learning rate and $\epsilon$ is a small quantity added to ensure numerical stability. Note that $\sqrt{s}$ is applied element-wise. By default, the hyperparameters of the optimizer are set to

$$\beta = 0.9, \qquad \epsilon = 10^{-8}$$
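The two updates above translate directly into code. Below is a minimal NumPy sketch of a single RMSProp step, assuming the gradient has already been computed (e.g. by backpropagation); the function name, the toy objective, and the learning rate used in the example are illustrative choices, not prescribed by the algorithm.

```python
import numpy as np

def rmsprop_update(theta, grad, s, alpha, beta=0.9, eps=1e-8):
    """One RMSProp step (illustrative sketch, not a library API).

    theta : parameter array
    grad  : gradient of the loss w.r.t. theta (e.g. from backpropagation)
    s     : moving average of the squared gradient
    alpha : learning rate
    """
    # s = beta * s + (1 - beta) * grad^2  (moving average of the squared gradient)
    s = beta * s + (1.0 - beta) * grad**2
    # theta = theta - alpha * grad / (sqrt(s) + eps); sqrt and division are
    # element-wise, and eps keeps the denominator away from zero.
    theta = theta - alpha * grad / (np.sqrt(s) + eps)
    return theta, s

# Toy usage: minimise f(theta) = ||theta - 3||^2, whose gradient is 2 * (theta - 3).
theta = np.zeros(3)
s = np.zeros_like(theta)
for _ in range(500):
    grad = 2.0 * (theta - 3.0)
    theta, s = rmsprop_update(theta, grad, s, alpha=0.01)
print(theta)  # close to [3. 3. 3.]
```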
References
[1] G. Hinton, lecture notes: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf, accessed November 2019.