RMSProp

The RMSProp optimization algorithm was first introduced by Hinton [1]. It keeps track of a moving average of the squared gradient:

$$s = \beta s + (1-\beta)[\nabla_{\theta}\mathcal{L}(\hat{Y},Y)]^2$$

where $\beta$ is the decay rate of the moving average and the gradient of the loss function $\nabla_{\theta}\mathcal{L}(\hat{Y},Y)$ is computed with the backpropagation algorithm. The parameters are then updated according to

$$\theta = \theta - \alpha \frac{\nabla_{\theta}\mathcal{L}(\hat{Y},Y)}{\sqrt{s}+\epsilon}$$

where $\alpha$ is the learning rate and $\epsilon$ is a small quantity added to ensure numerical stability. Note that $\sqrt{s}$ is applied element-wise. By default, the hyperparameters of the optimizer are set to

$$\begin{align} \beta &= 0.9 \newline \epsilon &= 10^{-8} \end{align} $$
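
As a minimal sketch of the update rule above (assuming a NumPy setting; the learning rate $\alpha = 0.01$ and the toy quadratic objective are illustrative choices, not part of the defaults), a single RMSProp step can be written as:

```python
import numpy as np

def rmsprop_update(theta, grad, s, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMSProp step: update the squared-gradient average s and the parameters theta."""
    # Moving average of the squared gradient (element-wise)
    s = beta * s + (1 - beta) * grad**2
    # Scale the gradient by the square root of the moving average
    theta = theta - alpha * grad / (np.sqrt(s) + eps)
    return theta, s

# Example usage: minimize f(theta) = theta_0^2 + 10 * theta_1^2  (hypothetical objective)
theta = np.array([1.0, 1.0])
s = np.zeros_like(theta)
for _ in range(500):
    grad = np.array([2.0 * theta[0], 20.0 * theta[1]])  # analytic gradient of f
    theta, s = rmsprop_update(theta, grad, s)
print(theta)  # both components approach 0
```

Because each coordinate of the gradient is divided by the root of its own running average, steep directions take smaller effective steps than shallow ones, which is the motivation for the element-wise scaling in the update rule.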

References

[1] G. Hinton, CSC321 lecture notes: http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf, accessed November 2019.