The model selection recipe
Model selection is, in principle, straightforward:
- To choose suitable hyperparameter settings, select the settings that give the model the best predictive performance.
- When choosing among several model families, again, select the family that (after hyperparameter tuning) achieves the best predictive performance (see the sketch after this list).
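Written out as code, the recipe looks something like the sketch below (the helper names `select_model` and `estimate_performance` are illustrative, not taken from any library): loop over the candidate families and their hyperparameter settings, score each candidate, and keep the best. How `estimate_performance` should actually work is exactly what the rest of this section is about.

```python
from itertools import product

def select_model(candidates, estimate_performance, X, y):
    """Return the best (family name, hyperparameters, score) triple.

    candidates: dict mapping a family name to (constructor, hyperparameter grid);
    estimate_performance: callable scoring a model on (X, y), higher is better.
    """
    best_name, best_params, best_score = None, None, None
    for name, (build, grid) in candidates.items():
        keys = list(grid)
        # Try every combination of hyperparameter values in this family's grid.
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            model = build(**params)
            score = estimate_performance(model, X, y)
            if best_score is None or score > best_score:
                best_name, best_params, best_score = name, params, score
    return best_name, best_params, best_score
```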
It seems that the most important step in model selection is actually estimating predictive performance.
We have already covered an extensive list of evaluation metrics for classification and, similarly, for regression.
So, we have metrics for predictive performance.
But how do we measure it?
A naive approach would be to evaluate predictive performance on the training data.
The problem with this approach is that the model has already seen all examples from the training set.
The model may have simply memorized a direct mapping from each input instance to its target value, without learning a general pattern behind this mapping.
Such a model will have superb predictive performance on the training data, but miserable performance on future unseen examples.
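As a quick illustration (a sketch assuming scikit-learn and one of its bundled toy datasets), an unconstrained decision tree scored this naive way reports near-perfect accuracy, precisely because it can memorize the training data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

def estimate_performance_naive(model, X, y):
    # Fit on the data, then score on the very same data.
    model.fit(X, y)
    return accuracy_score(y, model.predict(X))

# An unconstrained decision tree can memorize the training set outright,
# so the naive estimate comes out near a perfect 1.0, telling us nothing
# about how the model will do on unseen examples.
print(estimate_performance_naive(DecisionTreeClassifier(random_state=0), X, y))
```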
An alternative strategy might involve splitting the dataset into two parts — a training set and a testing set.
As the name implies, we train the model on the training set and evaluate its predictive performance on the testing set.
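A hold-out estimator in this style might look like the following sketch, again assuming scikit-learn; the 25% test fraction is an arbitrary illustrative choice, not a rule.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def estimate_performance_holdout(model, X, y, test_size=0.25, seed=0):
    # Hold out a portion of the data that the model never sees during training.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    model.fit(X_train, y_train)
    # Score only on the held-out testing set.
    return accuracy_score(y_test, model.predict(X_test))
```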
Although better than the previous naive approach, this train-test splitting strategy still has a problem: the estimated predictive performance is specific to the particular testing set.
If the test set is not big enough, it may not represent the variety of data the model may encounter in the future.
We need a better strategy, one that ensures the estimated predictive performance generalizes across multiple testing sets instead of a single test set. Cross-validation offers exactly such a strategy.
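As a preview, and assuming scikit-learn, a cross-validated estimator in the same style might look like the sketch below: the data is split into several folds, each fold serves as the testing set once, and the resulting scores are averaged, so the estimate no longer hinges on a single split.

```python
from sklearn.model_selection import cross_val_score

def estimate_performance_cv(model, X, y, k=5):
    # Split the data into k folds; each fold serves once as the testing set
    # while the remaining folds form the training set. Average the k scores.
    scores = cross_val_score(model, X, y, cv=k, scoring="accuracy")
    return scores.mean()
```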