# Evaluation metrics for regression

## Introduction

An informed choice of metric helps define an appropriate loss to optimize during training. Rigorous evaluation of generalization performance, using techniques such as cross-validation, lets us identify which trained model is superior to the alternatives and should be deployed for a given task. Making these informed choices during training and testing requires a clear understanding of evaluation metrics.

In this article we will cover metrics for evaluating the performance of machine learning models for regression. We will also comment on their suitability for various tasks and scenarios. We have a separate comprehensive article on evaluation metrics for classification.

## Prerequisites

To understand the various evaluation metrics for regression, we recommend familiarity with the concepts in:

• Probability: A sound understanding of conditional and marginal probabilities and Bayes' theorem is desirable.
• Introduction to machine learning: An introduction to basic concepts in machine learning such as classification, training instances, features, and feature types.

Follow the above links to first get acquainted with the corresponding concepts.

## Problem setting

To evaluate the performance of a model, we compare the predictions of the model to the actual target values on a set of examples. If the set of examples has been used for training the model, then we are effectively measuring the performance on the training set. If the set of examples has not been used for training the model (the so-called unseen or held-out examples), then the metric measures the performance on the test set.

In regression, the goal of the predictive model is to predict a continuous-valued output for a given multivariate instance. We need to predict a real-valued output $\hat{y} \in \real$ that is as close as possible to the true target $y \in \real$. The hat $\hat{ }$ denotes that $\hat{y}$ is an estimate, to distinguish it from the true value.

## Mean squared error (MSE)

The most popular evaluation metric for regression problems is the mean squared error (MSE). As the name implies, it is the mean of the squared differences between the actual and predicted values of the target variable.

If $y_\nlabeledsmall$ indicates the actual value of the target variable and $\yhat_\nlabeledsmall$ denotes the predicted value, for $\nlabeledsmall=1,\ldots,\nlabeled$, then the mean squared error of these predictions is

$$\text{MSE} = \frac{1}{\nlabeled} \sum_{\nlabeledsmall=1}^\nlabeled \left(y_\nlabeledsmall - \yhat_\nlabeledsmall\right)^2$$
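The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `mean_squared_error` and the sample values are assumptions made up for this example.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")
    squared_diffs = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return sum(squared_diffs) / len(y_true)


# Hypothetical actual and predicted target values for four examples.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

# Differences are 1, 0, -2, -1, so the squared differences are 1, 0, 4, 1.
mse = mean_squared_error(y_true, y_pred)
print(mse)  # 1.5
```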

Due to the squaring of the errors, examples with large differences between the actual and predicted values dominate the overall error. This can be a problem if the evaluation set contains outliers. In such cases, the mean absolute error (MAE) is often a better choice, as we outline next.
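To see how a single example can dominate the MSE, consider the following small sketch. The data are invented for illustration: four examples with an error of 1 and one outlier with an error of 38.

```python
def squared_errors(y_true, y_pred):
    """Per-example squared differences between actual and predicted values."""
    return [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]


# Hypothetical evaluation set; the last example is an outlier.
y_true = [10.0, 12.0, 11.0, 13.0, 50.0]
y_pred = [11.0, 11.0, 12.0, 12.0, 12.0]

errors = squared_errors(y_true, y_pred)  # [1, 1, 1, 1, 1444]

mse = sum(errors) / len(errors)
mse_without_outlier = sum(errors[:-1]) / len(errors[:-1])
outlier_share = errors[-1] / sum(errors)

print(f"MSE with outlier:    {mse:.1f}")
print(f"MSE without outlier: {mse_without_outlier:.1f}")
print(f"Share of total squared error from the outlier: {outlier_share:.1%}")
```

In this made-up set, one example out of five accounts for more than 99% of the total squared error, so the MSE says little about how the model behaves on the other four.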

## Mean absolute error (MAE)

As the name implies, the mean absolute error (MAE) is computed as the mean of the absolute differences between the predicted and actual value of the target variable.

If $y_\nlabeledsmall$ indicates the actual value of the target variable and $\yhat_\nlabeledsmall$ denotes the predicted value, for $\nlabeledsmall=1,\ldots,\nlabeled$, then the MAE of these predictions is

$$\text{MAE} = \frac{1}{\nlabeled} \sum_{\nlabeledsmall=1}^\nlabeled \left| y_\nlabeledsmall - \yhat_\nlabeledsmall \right|$$

Unlike the mean squared error, the MAE weighs each error in proportion to its magnitude rather than its square, so it is less sensitive to outliers in the evaluation set. Moreover, it is expressed in the same units as the target variable and therefore has a simpler interpretation than MSE. For example, if we are predicting the time to arrival (in minutes) for flights at an airport, then an MAE of 5 should be understood as the statement "On average, the model's prediction is off by 5 minutes from the actual arrival time." Such simple statements cannot be made with respect to MSE, which is measured in squared units (here, minutes squared).
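The flight example can be sketched in plain Python as follows. The arrival times are hypothetical, chosen so that the MAE comes out to 5 minutes, and the function names are assumptions made for this illustration. The second part adds one badly mispredicted flight to compare how strongly each metric reacts.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Hypothetical times to arrival in minutes (actual vs. predicted).
actual = [12.0, 30.0, 5.0, 18.0, 45.0]
predicted = [10.0, 24.0, 9.0, 25.0, 51.0]

mae = mean_absolute_error(actual, predicted)  # in minutes
mse = mean_squared_error(actual, predicted)   # in minutes squared
print(f"MAE = {mae:.1f} min, MSE = {mse:.1f} min^2")

# Add one flight whose prediction is off by 100 minutes.
actual_out = actual + [120.0]
predicted_out = predicted + [20.0]

mae_out = mean_absolute_error(actual_out, predicted_out)
mse_out = mean_squared_error(actual_out, predicted_out)
print(f"MAE grew by a factor of {mae_out / mae:.1f}")
print(f"MSE grew by a factor of {mse_out / mse:.1f}")
```

Both metrics increase when the outlier is added, but the MSE grows by a far larger factor than the MAE, which is the sense in which the MAE is the less outlier-sensitive of the two.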