Classification
Classification is the task of assigning categories (or classes) to given instances automatically.
The machine learning model that has been trained to achieve such a goal is known as a classifier.
Classification falls in the realm of supervised learning — the sub-field of machine learning that enables models to be trained by observing labeled or supervised examples.
For example, to learn a classifier to identify spam emails, each supervised example will be a tuple consisting of the email information (text, subject, from, to) and its category (spam or no spam).
Depending on the number of categories and their relationships, classification problems fall into several types.
- Binary classification: An instance must belong to exactly one among two categories. The classifier itself is known as a binary classifier.
- Multi-class classification: An instance must belong to exactly one among many (more than two) categories. In a multi-class scenario, the categories are mutually exclusive.
- Multi-labeled classification: An instance may simultaneously belong to more than one category from among several categories. Thus, in a multi-labeled set up, the categories are not mutually exclusive.
Mathematically, we can express the classification problem as follows:
If \( \vx \) denotes an \( \ndim\)-dimensional input instance, then the goal of classification is to assign \( \vx \) to the appropriate category (or categories, in the case of multi-labeled) from among \( \nclass \) categories \( \set{C_1, \ldots, C_\nclass} \), where \( \nclass \ge 2 \).
The classifier is trained over a collection of labeled observations provided as tuples \( (\vx_i,y_i) \) containing the instance vector \( \vx_i \) and the true target variable \( y_i \in \set{C_1,\ldots,C_\nclass}\).
This collection of \( \nlabeled \) labeled observations is known as the labeled training set, or simply the training set, \( \labeledset = \set{(\vx_1,y_1), \ldots (\vx_\nlabeled,y_\nlabeled)} \).
Below, we list some of the most popular approaches to classification. Follow the corresponding links to study the classifiers in further detail.
The binary classifiers in the above list can be adapted to support multi-class or multi-labeled classification scenarios through the one-vs-one or one-vs-rest strategies.