Introduction
An informed choice of metric helps define an appropriate loss to optimize when training a model for a given task. Rigorous evaluation of generalization performance, using techniques such as cross-validation, makes it possible to identify which trained model is superior to the alternatives and should therefore be deployed for that task. Making these informed choices during training and testing requires a clear understanding of evaluation metrics.
In this article, we will cover the many metrics used to evaluate the performance of machine learning models for classification. We will also comment on their suitability for various tasks and scenarios. We have a separate comprehensive article on evaluation metrics for regression.