Problem setting
In regression, the goal of the predictive model is to predict a continuous-valued output for a given multivariate instance.
In this article, for simplicity, we will work with real-valued input observations.
Consider such an instance \( \vx \in \real^\ndim \), a vector consisting of \( \ndim \) features, \(\vx = [x_1, x_2, \ldots, x_\ndim] \).
We need to predict a real-valued output \( \hat{y} \in \real \) that is as close as possible to the true target \( y \in \real \).
The hat \( \hat{ } \) denotes that \( \hat{y} \) is an estimate, distinguishing it from the true value.
In the standard linear regression model with Gaussian noise, the true target \( y \) is related to the input \( \vx \) through some function \( f: \real^\ndim \to \real \) such that
$$ y = f(\vx) + \epsilon $$
where \( \epsilon \) is zero-mean Gaussian noise with variance \( \sigma^2 \); that is, \( \epsilon \sim \Gauss(0,\sigma^2) \).
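The generative model above can be sketched in a few lines of Python. Note that the concrete choice of \( f \) below (a linear function with fixed weights) and the noise level are illustrative assumptions, not specified in the text.

```python
import random

# Illustrative ground-truth function: a linear f(x) = w . x.
# The weights w and noise level sigma are assumptions for this sketch.
w = [2.0, -1.0, 0.5]

def f(x):
    # Deterministic part of the model: the dot product w . x.
    return sum(wi * xi for wi, xi in zip(w, x))

def sample_target(x, sigma=0.3, rng=random.Random(0)):
    # y = f(x) + eps, with eps ~ N(0, sigma^2).
    eps = rng.gauss(0.0, sigma)
    return f(x) + eps
```

Each call to `sample_target` draws a fresh noise term, so repeated queries at the same \( \vx \) yield different observed targets scattered around \( f(\vx) \).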
The predictive model is inferred from a collection of supervised observations provided as tuples \( (\vx_i,y_i) \) containing the instance vector \( \vx_i \) and the true target variable \( y_i \).
This collection of labeled observations is known as the training set \( \labeledset = \set{(\vx_1,y_1), \ldots, (\vx_\nlabeled,y_\nlabeled)} \).
Typically, these examples are assumed to be independent and identically distributed (i.i.d.) random variables.
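To make the setup concrete, the sketch below draws an i.i.d. training set from a one-dimensional instance of the model and fits it with ordinary least squares, the standard estimator for linear regression with Gaussian noise. The true function \( f(x) = 2x + 1 \), the noise level, and the uniform input distribution are all assumptions chosen for illustration.

```python
import random

rng = random.Random(7)

def sample_training_set(n):
    # Draw n i.i.d. examples: x ~ Uniform(-1, 1), y = f(x) + eps.
    # Assumed ground truth for this sketch: f(x) = 2x + 1, eps ~ N(0, 0.1^2).
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data

def fit_ols(data):
    # Closed-form least-squares estimates of slope and intercept.
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept
```

With a few hundred samples, the estimated slope and intercept land close to the assumed truth of 2 and 1, illustrating how the predictive model is recovered from the training set.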