ReLU

Rectified Linear Unit

The rectified linear unit (ReLU) is one of the most widely used activation functions. It is defined as follows:

$$g(x) = \max(0, x)$$

That is, the function returns the input value if it is positive and zero otherwise.
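
As a minimal sketch, the function can be implemented element-wise with NumPy (the helper name `relu` is illustrative, not part of any particular library):

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): keeps positive inputs, zeroes out the rest.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
# -> [0.  0.  0.  1.5 3. ]
```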

Its derivative can be expressed as:

$$\frac{\partial g}{\partial x} = \begin{cases} 1 & \text{if } x > 0 \newline \text{undefined} & \text{if } x = 0 \newline 0 & \text{if } x < 0 \end{cases}$$

Since the derivative is undefined at $x=0$, it can be arbitrarily set to 1 at this point, giving

$$\frac{\partial g}{\partial x} = \begin{cases} 1 & \text{if } x \geq 0 \newline 0 & \text{if } x < 0 \end{cases}$$
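
Under this convention, the derivative can be sketched as follows (again, the helper name `relu_grad` is illustrative):

```python
import numpy as np

def relu_grad(x):
    # Derivative of ReLU with the convention g'(0) = 1.
    return np.where(x >= 0, 1.0, 0.0)

print(relu_grad(np.array([-2.0, 0.0, 3.0])))
# -> [0. 1. 1.]
```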

The ReLU activation function has been shown to help with the vanishing gradient problem that may arise when training very deep networks [1].
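
The intuition can be illustrated with a toy calculation: during backpropagation, the gradient is multiplied by the derivative of the activation at every layer, so derivatives smaller than 1 shrink it exponentially with depth. The sketch below assumes a hypothetical chain of 20 layers all evaluated at the same pre-activation value, and compares the logistic sigmoid's derivative with ReLU's (it is not taken from the cited paper):

```python
import numpy as np

def sigmoid_grad(x):
    # Derivative of the logistic sigmoid; its maximum value is 0.25.
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU with the convention g'(0) = 1.
    return np.where(x >= 0, 1.0, 0.0)

# Backpropagation multiplies one activation derivative per layer.
# Toy chain of 20 layers, all evaluated at pre-activation x = 2.0.
x = 2.0
print(np.prod([sigmoid_grad(x)] * 20))  # ~2.6e-20: the gradient vanishes
print(np.prod([relu_grad(x)] * 20))     # 1.0: the gradient passes through unchanged
```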

References

[1] Glorot X., Bordes A., Bengio Y., “Deep Sparse Rectifier Neural Networks”, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, 2011, Fort Lauderdale, FL, USA.