Mean Absolute Error

The mean absolute error loss function averages the absolute differences between the predicted and the true values. If the predicted values are $\hat{Y} = \begin{bmatrix} \boldsymbol{\hat{y}}^{(1)} & \boldsymbol{\hat{y}}^{(2)} & \dots & \boldsymbol{\hat{y}}^{(m)} \end{bmatrix}$ and the true values are $Y = \begin{bmatrix} \boldsymbol{y}^{(1)} & \boldsymbol{y}^{(2)} & \dots & \boldsymbol{y}^{(m)} \end{bmatrix}$, the mean absolute error is given by

$$\mathcal{L}(\hat{Y},Y)=\frac{1}{m}\sum_{i=1}^m|\boldsymbol{\hat{y}}^{(i)}-\boldsymbol{y}^{(i)}|$$

where $m$ is the number of samples in the mini-batch. Since the derivative of the absolute value is given by

$$\frac{d|x|}{dx} = \frac{x}{|x|} = \begin{cases} 1 & \text{if } x > 0 \newline \text{undefined} & \text{if } x = 0 \newline -1 & \text{if } x < 0 \end{cases}$$

the gradient of $\mathcal{L}$ with respect to $\hat{Y}$ is given elementwise by

$$\nabla_{\hat{Y}}\mathcal{L}(\hat{Y}, Y)=\begin{cases}\frac{1}{m} &\text{if } \boldsymbol{\hat{y}}^{(i)} \ge \boldsymbol{y}^{(i)} \newline -\frac{1}{m} &\text{if } \boldsymbol{\hat{y}}^{(i)} < \boldsymbol{y}^{(i)} \end{cases}$$

where the value at $\boldsymbol{\hat{y}}^{(i)} = \boldsymbol{y}^{(i)}$, at which the absolute value is not differentiable, is arbitrarily set to $\frac{1}{m}$.
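
The loss and its gradient can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `mae_loss` and `mae_grad` are hypothetical, samples are assumed to be stored as the columns of `Y_hat` and `Y`, and the absolute value of a vector is assumed to mean the sum of the absolute values of its components, which is the reading consistent with gradient entries of $\pm\frac{1}{m}$.

```python
import numpy as np


def mae_loss(Y_hat, Y):
    """Mean absolute error for samples stored as columns, shape (n_outputs, m)."""
    m = Y.shape[1]  # number of samples in the mini-batch
    # Assumption: |.| of a vector is the sum of the absolute values of its components.
    return np.sum(np.abs(Y_hat - Y)) / m


def mae_grad(Y_hat, Y):
    """Gradient of mae_loss with respect to Y_hat (same shape as Y_hat)."""
    m = Y.shape[1]
    # +1/m where y_hat >= y (the tie is set to +1/m, as in the text), -1/m otherwise.
    return np.where(Y_hat >= Y, 1.0, -1.0) / m


# Four samples with one output each.
Y_hat = np.array([[2.0, 0.0, 3.0, 1.0]])
Y = np.array([[1.0, 2.0, 3.0, 5.0]])

loss = mae_loss(Y_hat, Y)  # (1 + 2 + 0 + 4) / 4 = 1.75
grad = mae_grad(Y_hat, Y)  # [[0.25, -0.25, 0.25, -0.25]]
```

The third sample has zero error, so its gradient entry comes from the arbitrary tie-breaking choice rather than from the derivative itself.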