Types of tasks in machine learning


\(\DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\asterisk}{\ast} \newcommand{\sup}{\text{sup}} \newcommand{\inf}{\text{inf}} \newcommand{\min}{\text{min}\;} \newcommand{\max}{\text{max}\;} \newcommand{\maxunder}[1]{\underset{#1}{\max}} \newcommand{\minunder}[1]{\underset{#1}{\min}} \newcommand{\real}{\mathbb{R}} \newcommand{\natural}{\mathbb{N}} \newcommand{\integer}{\mathbb{Z}} \newcommand{\rational}{\mathbb{Q}} \newcommand{\irrational}{\mathbb{I}} \newcommand{\complex}{\mathbb{C}} \newcommand{\cardinality}[1]{|#1|} \newcommand{\vec}[1]{\mathbf{#1}} \newcommand{\mat}[1]{\mathbf{#1}} \newcommand{\star}[1]{#1^*} \newcommand{\inv}[1]{#1^{-1}} \newcommand{\indicator}[1]{\mathcal{I}(#1)} \renewcommand{\BigO}[1]{\mathcal{O}(#1)} \renewcommand{\BigOsymbol}{\mathcal{O}} \renewcommand{\smallo}[1]{\mathcal{o}(#1)} \renewcommand{\smallosymbol}[1]{\mathcal{o}} \newcommand{\set}[1]{\mathbb{#1}} \newcommand{\complement}[1]{#1^c} \newcommand{\powerset}[1]{\mathcal{P}(#1)} \newcommand{\setdiff}{\setminus} \newcommand{\setsymmdiff}{\oplus} \newcommand{\dash}[1]{#1^{'}} \newcommand{\permutation}[2]{{}_{#1} \mathrm{ P }_{#2}} \newcommand{\combination}[2]{{}_{#1} \mathrm{ C }_{#2}} \newcommand{\prob}[1]{P(#1)} \newcommand{\pmf}[1]{P(#1)} \newcommand{\pdf}[1]{p(#1)} \newcommand{\cdf}[1]{F(#1)} \newcommand{\expect}[2]{E_{#1}\left[#2\right]} \newcommand{\entropy}[1]{\mathcal{H}\left[#1\right]} \newcommand{\expe}[1]{\mathrm{e}^{#1}} \newcommand{\textexp}[1]{\text{exp}\left(#1\right)} \def\independent{\perp\!\!\!\perp} \def\notindependent{\not\!\independent} \newcommand{\yhat}{\hat{y}} \newcommand{\vs}{\vec{s}} \newcommand{\vt}{\vec{t}} \newcommand{\vu}{\vec{u}} \newcommand{\vv}{\vec{v}} \newcommand{\vw}{\vec{w}} \newcommand{\vx}{\vec{x}} \newcommand{\vy}{\vec{y}} \newcommand{\vz}{\vec{z}} \newcommand{\va}{\vec{a}} \newcommand{\vb}{\vec{b}} \newcommand{\vc}{\vec{c}} \newcommand{\vd}{\vec{d}} \newcommand{\ve}{\vec{e}} 
\newcommand{\vg}{\vec{g}} \newcommand{\vh}{\vec{h}} \newcommand{\vi}{\vec{i}} \newcommand{\vk}{\vec{k}} \newcommand{\vo}{\vec{o}} \newcommand{\vp}{\vec{p}} \newcommand{\vq}{\vec{q}} \newcommand{\vr}{\vec{r}} \newcommand{\vs}{\vec{s}} \newcommand{\vmu}{\vec{\mu}} \newcommand{\vsigma}{\vec{\sigma}} \newcommand{\vphi}{\vec{\phi}} \newcommand{\vtau}{\vec{\tau}} \newcommand{\vtheta}{\vec{\theta}} \newcommand{\mA}{\mat{A}} \newcommand{\mB}{\mat{B}} \newcommand{\mC}{\mat{C}} \newcommand{\mD}{\mat{D}} \newcommand{\mE}{\mat{E}} \newcommand{\mH}{\mat{H}} \newcommand{\mK}{\mat{K}} \newcommand{\mP}{\mat{P}} \newcommand{\mQ}{\mat{Q}} \newcommand{\mR}{\mat{R}} \newcommand{\mS}{\mat{S}} \newcommand{\mU}{\mat{U}} \newcommand{\mV}{\mat{V}} \newcommand{\mW}{\mat{W}} \newcommand{\mX}{\mat{X}} \newcommand{\mY}{\mat{Y}} \newcommand{\mZ}{\mat{Z}} \newcommand{\mI}{\mat{I}} \newcommand{\mLambda}{\mat{\Lambda}} \newcommand{\mSigma}{\mat{\Sigma}} \newcommand{\mTheta}{\mat{\theta}} \newcommand{\setsymb}[1]{#1} \newcommand{\sA}{\setsymb{A}} \newcommand{\sB}{\setsymb{B}} \newcommand{\sC}{\setsymb{C}} \newcommand{\sO}{\setsymb{O}} \newcommand{\sP}{\setsymb{P}} \newcommand{\sQ}{\setsymb{Q}} \newcommand{\sH}{\setsymb{H}} \newcommand{\sX}{\setsymb{X}} \newcommand{\sY}{\setsymb{Y}} \newcommand{\norm}[2]{||{#1}||_{#2}} \newcommand{\infnorm}[1]{\norm{#1}{\infty}} \newcommand{\fillinblank}{\text{ }\underline{\text{ ? 
}}\text{ }} \newcommand{\lbrace}{\left\{} \newcommand{\rbrace}{\right\}} \newcommand{\set}[1]{\lbrace #1 \rbrace} \newcommand{\seq}[1]{\left( #1 \right)} \newcommand{\ndim}{N} \newcommand{\ndimsmall}{n} \newcommand{\dataset}{\mathbb{D}} \newcommand{\ndata}{D} \newcommand{\ndatasmall}{d} \newcommand{\labeledset}{\mathbb{L}} \newcommand{\nlabeled}{L} \newcommand{\nlabeledsmall}{l} \newcommand{\unlabeledset}{\mathbb{U}} \newcommand{\nunlabeled}{U} \newcommand{\nunlabeledsmall}{u} \newcommand{\nclass}{M} \newcommand{\nclasssmall}{m} \newcommand{\loss}{\mathcal{L}} \newcommand{\sign}{\text{sign}} \newcommand{\Gauss}{\mathcal{N}} \newcommand{\hadamard}{\circ} \newcommand{\doh}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\dox}[1]{\doh{#1}{x}} \newcommand{\doy}[1]{\doh{#1}{y}} \newcommand{\doxx}[1]{\doh{#1}{x^2}} \newcommand{\doyy}[1]{\doh{#1}{y^2}} \newcommand{\doxy}[1]{\frac{\partial #1}{\partial x \partial y}} \newcommand{\doyx}[1]{\frac{\partial #1}{\partial y \partial x}} \newcommand{\qed}{\tag*{$\blacksquare$}}\)


Introduction

Machine learning is a broad field with many approaches to a wide range of tasks. In this article, we describe some of the tasks most commonly addressed with machine learning and point to suitable approaches for handling each of them.

Prerequisites

To understand the variety of tasks in machine learning, we recommend familiarity with the concepts in

Introduction to machine learning

Follow the above link to first get acquainted with the corresponding concepts.

Classification

Classification is the task of automatically assigning categories (or classes) to given instances. A machine learning model trained for this purpose is known as a classifier. Classification falls in the realm of supervised learning, the sub-field of machine learning in which models are trained by observing labeled, or supervised, examples. For example, to learn a classifier that identifies spam emails, each supervised example is a tuple consisting of the email information (text, subject, from, to) and its category (spam or not spam).

Depending on the number of categories and their relationships, classification problems fall into several types.

Binary classification: An instance must belong to exactly one among two categories. The classifier itself is known as a binary classifier.
Multi-class classification: An instance must belong to exactly one among many (more than two) categories. In a multi-class scenario, the categories are mutually exclusive.
Multi-labeled classification: An instance may simultaneously belong to more than one category from among several categories. Thus, in a multi-labeled set up, the categories are not mutually exclusive.
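The difference between the three settings shows up in the shape of the target attached to each instance. The following is a minimal sketch; the category names and the helper `to_indicator` are hypothetical and chosen only for illustration.

```python
# Hypothetical category names, used only for illustration.
binary_categories = ["spam", "not spam"]
news_categories = ["sports", "politics", "technology"]

# Binary: exactly one of two categories per instance.
binary_target = "spam"

# Multi-class: exactly one of many mutually exclusive categories.
multiclass_target = "politics"

# Multi-labeled: any subset of the categories may apply at once.
multilabel_target = {"politics", "technology"}


def to_indicator(labels, categories):
    """Encode a set of labels as a 0/1 vector with one slot per category."""
    return [1 if c in labels else 0 for c in categories]


multiclass_vector = to_indicator({multiclass_target}, news_categories)
multilabel_vector = to_indicator(multilabel_target, news_categories)
```

In the multi-class encoding exactly one slot is set, whereas the multi-labeled encoding may set several slots at once.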

Mathematically, we can express the classification problem as follows: If $ \vx $ denotes an $ \ndim$-dimensional input instance, then the goal of classification is to assign $ \vx $ to the appropriate category (or categories, in the case of multi-labeled) from among $ \nclass $ categories $ \set{C_1, \ldots, C_\nclass} $, where $ \nclass \ge 2 $.

The classifier is trained over a collection of labeled observations provided as tuples $ (\vx_i,y_i) $ containing the instance vector $ \vx_i $ and the true target variable $ y_i \in \set{C_1,\ldots,C_\nclass}$. This collection of $ \nlabeled $ labeled observations is known as the labeled training set, or simply the training set, $ \labeledset = \set{(\vx_1,y_1), \ldots, (\vx_\nlabeled,y_\nlabeled)} $.
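To make the notation concrete, here is a minimal sketch of training and applying a classifier on a labeled set of $ (\vx_i, y_i) $ tuples. It assumes a nearest-centroid rule, picked only because it fits in a few lines; the data and the names `train_centroids` and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(training_set):
    """Compute the mean vector of each category from (x_i, y_i) tuples."""
    sums = {}
    counts = defaultdict(int)
    for x, y in training_set:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def classify(centroids, x):
    """Assign x to the category whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Toy labeled training set with N = 2 features and M = 3 categories.
labeled_set = [
    ([1.0, 1.2], "C1"), ([0.8, 1.0], "C1"),
    ([4.0, 4.2], "C2"), ([4.2, 3.9], "C2"),
    ([1.0, 4.0], "C3"), ([0.9, 4.3], "C3"),
]
centroids = train_centroids(labeled_set)
prediction = classify(centroids, [3.9, 4.0])
```

Here the new instance lies next to the two training examples of `C2`, so that is the category it receives.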

Below, we list some of the most popular approaches to classification. Follow the corresponding links to study the classifiers in further detail.

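The binary classifiers in the above list can be adapted to multi-class classification through the one-vs-one or one-vs-rest strategies. For multi-labeled classification, a common adaptation is to train one binary classifier per category and apply each of them independently.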
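The one-vs-rest strategy can be sketched as follows: train one binary scorer per category, treating that category as positive and all others as negative, then pick the category whose scorer is most confident. The binary scorer used here (distance to the negative centroid minus distance to the positive centroid) is a toy stand-in, and all function names and data are assumptions made for illustration.

```python
import math


def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def train_binary(positives, negatives):
    """Toy binary classifier: the returned score is positive when x lies
    closer to the positive centroid than to the negative centroid."""
    pos_c, neg_c = mean(positives), mean(negatives)
    return lambda x: math.dist(x, neg_c) - math.dist(x, pos_c)


def train_one_vs_rest(training_set):
    """Train one binary scorer per category: that category vs. all others."""
    categories = sorted({y for _, y in training_set})
    scorers = {}
    for c in categories:
        positives = [x for x, y in training_set if y == c]
        negatives = [x for x, y in training_set if y != c]
        scorers[c] = train_binary(positives, negatives)
    return scorers


def predict(scorers, x):
    """Multi-class decision: the category with the highest binary score."""
    return max(scorers, key=lambda c: scorers[c](x))


labeled_set = [
    ([1.0, 1.2], "C1"), ([0.8, 1.0], "C1"),
    ([4.0, 4.2], "C2"), ([4.2, 3.9], "C2"),
    ([1.0, 4.0], "C3"), ([0.9, 4.3], "C3"),
]
scorers = train_one_vs_rest(labeled_set)
prediction = predict(scorers, [3.9, 4.0])
```

One-vs-rest needs only one scorer per category, whereas one-vs-one trains a scorer for every pair of categories and lets them vote.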

Regression

Regression is the task of assigning a real-valued output to an input instance. For example, we may need to predict the selling price, a real number, of a house given its location, area, lot-size, number of bedrooms, bathrooms, and installed amenities. Just like classification, regression models are also trained using the supervised learning approach to machine learning.

Mathematically, we can express the regression problem as follows: If $ \vx $ denotes an $ \ndim$-dimensional input instance, then the goal of regression is to predict a real-valued output $ y \in \real $ for the input $ \vx $.

The regression model is trained over a collection of supervised observations provided as tuples $ (\vx_i,y_i) $ containing the instance vector $ \vx_i $ and the true target variable $ y_i \in \real $. This collection of $ \nlabeled $ labeled observations is known as the training set, $ \labeledset = \set{(\vx_1,y_1), \ldots, (\vx_\nlabeled,y_\nlabeled)} $.
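As a minimal sketch of fitting a regression model to such a training set, the snippet below uses ordinary least squares for the special case of a single input feature ($ \ndim = 1 $). The data values and the name `fit_line` are hypothetical; least squares is just one possible choice of model.

```python
def fit_line(training_set):
    """Ordinary least squares for one feature: y is modeled as slope * x + intercept."""
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    var_x = sum((x - mean_x) ** 2 for x, _ in training_set)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: (house area, selling price) in arbitrary units.
training_set = [(1.0, 3.1), (1.5, 4.4), (2.0, 6.1), (2.5, 7.4), (3.0, 9.0)]
slope, intercept = fit_line(training_set)

# Predict a real-valued output for a previously unseen input.
predicted_price = slope * 2.2 + intercept
```

Unlike a classifier, the model returns a number on a continuous scale, so predictions for unseen inputs can fall between the target values observed in training.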

In some problem settings, the output is itself a multi-dimensional vector $ \vy \in \real^\nclass $. Such scenarios are known as multi-output regression problems.

Below, we list some of the most popular approaches to regression. Follow the corresponding links to study the regression models in further detail.

Clustering

Clustering involves the assignment of input instances into groups, or clusters, of similar instances. For example, we may wish to automatically group news items coming from disparate sources into clusters of related news, each to be summarized by a single headline. Clustering is a form of unsupervised learning: it does not rely on prior supervision or labels about the assignment of individual instances to groups.

Mathematically, we can express the clustering problem as follows: Consider observations represented as vectors, for example $ \vx \in \real^\ndim $, that is, vectors consisting of $ \ndim $ features, $\vx = [x_1, x_2, \ldots, x_\ndim] $. A collection of such observations is provided in the unlabeled set $ \unlabeledset = \set{\vx_1,\ldots,\vx_\nunlabeled} $. The goal of clustering is to automatically infer groups of examples $ G_1, \ldots, G_\nclass $ such that the examples belonging to a single group, say $ G_i $, are similar in some sense. The groups $ G_1, \ldots, G_\nclass $ are not pre-defined. They are automatically discovered as part of the clustering process.
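The following sketch illustrates how groups can be discovered from an unlabeled set. It assumes a k-means style procedure (alternating assignment and center-update steps) with Euclidean distance as the notion of similarity; the data, the fixed initial centers, and the function name `kmeans` are hypothetical.

```python
import math


def kmeans(points, initial_centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = [list(c) for c in initial_centers]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the closest center for every point.
        assignments = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: recompute each center from its current members.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centers


# Toy unlabeled set: no group information is supplied.
unlabeled_set = [
    [1.0, 1.1], [0.9, 0.8], [1.2, 1.0],
    [5.0, 5.1], [5.2, 4.9], [4.8, 5.0],
]
assignments, centers = kmeans(
    unlabeled_set, initial_centers=[unlabeled_set[0], unlabeled_set[3]]
)
```

Note that the group indices carry no meaning of their own; only the partition of the examples matters.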

Below, we list some of the most popular approaches to clustering. Follow the corresponding links to study the specifics of a clustering algorithm in further detail.

Density estimation

Density estimation is the task of modeling a probability density function over some feature space so that the density at any given input instance can be estimated. For example, given the locations of previously confirmed underground oil reserves, a density estimator may be learnt that provides the likelihood of an oil reserve at any given input location, conditioned on the historical observations. Just like clustering, density estimation is a form of unsupervised learning.

Mathematically, we can express the density estimation problem as follows: Consider observations represented as vectors, for example $ \vx \in \real^\ndim $, that is, vectors consisting of $ \ndim $ features, $\vx = [x_1, x_2, \ldots, x_\ndim] $. A collection of such observations is provided in the unlabeled set $ \unlabeledset = \set{\vx_1,\ldots,\vx_\nunlabeled} $. The goal of density estimation is to automatically infer the probability density function that best aligns with the observations in $ \unlabeledset $, so that the density $ p(\vx) $ can be estimated as accurately as possible for any instance $ \vx $ in that feature space.
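As an illustration, the sketch below estimates a density from one-dimensional observations ($ \ndim = 1 $) by placing a Gaussian bump on every observation and averaging them, an approach commonly called kernel density estimation. The sample values, the bandwidth of 0.3, and the name `gaussian_kde` are assumptions made for this example.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function p(x): the average of Gaussian kernels of the
    given bandwidth centred on each observed sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Toy unlabeled observations: most of the mass sits near 1, some near 3.
unlabeled_set = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2]
p = gaussian_kde(unlabeled_set, bandwidth=0.3)

dense_region = p(1.05)   # close to many observations
sparse_region = p(2.0)   # far from all observations
```

The estimate is higher where observations are concentrated and, like any probability density function, integrates to one over the feature space.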

Below, we list some of the most popular approaches to density estimation. Follow the corresponding links to study the specifics of a density estimation algorithm in further detail.

Dimensionality reduction

Dimensionality reduction, as the name implies, involves transforming a multivariate input instance into an output instance with fewer dimensions than the input, while retaining the information in the instance that is relevant to the task at hand. For example, we may wish to reduce a 10-dimensional dataset to a 2-dimensional dataset for easy visualization as a scatter plot, ideally while retaining the natural groupings among the input instances in the two-dimensional space. Dimensionality reduction may involve an unsupervised or a supervised learning strategy, depending on whether categorical labels are used to inform the choice of the reduced dimensions.

Mathematically, we can express the dimensionality reduction problem as follows: Consider observations represented as vectors, for example $ \vx \in \real^\ndim $, that is, vectors consisting of $ \ndim $ features, $\vx = [x_1, x_2, \ldots, x_\ndim] $. In dimensionality reduction, we wish to transform these into output vectors with $ \nclass $ dimensions, $ \vy \in \real^\nclass $, such that $ \nclass < \ndim $, and typically $ \nclass \ll \ndim $.
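A minimal unsupervised sketch of this transformation is given below. It reduces 3-dimensional instances to a single dimension by projecting them onto the direction of largest variance, found by power iteration on the covariance matrix (the idea behind principal component analysis). The data and function names are hypothetical, and the fixed iteration count assumes the data has one clearly dominant direction.

```python
import math


def first_principal_direction(data, iterations=100):
    """Estimate the unit vector along which the data varies the most."""
    n, dim = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    centered = [[v - m for v, m in zip(row, means)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    # Power iteration: repeatedly apply the covariance matrix and normalize.
    w = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(dim)) for i in range(dim)]
        length = math.sqrt(sum(v * v for v in w))
        w = [v / length for v in w]
    return means, w


def project(data, means, w):
    """Map every N-dimensional instance to one coordinate along w."""
    return [sum((v - m) * wi for v, m, wi in zip(row, means, w)) for row in data]


# Toy data: three features, with most variation shared by the first two.
data = [
    [2.0, 1.9, 0.1], [4.0, 4.2, -0.1], [6.0, 5.8, 0.0],
    [8.0, 8.1, 0.2], [10.0, 9.9, -0.2],
]
means, w = first_principal_direction(data)
reduced = project(data, means, w)
```

Each instance is now described by a single number, yet the ordering of the instances along their main direction of variation is preserved.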

Below, we list some of the most popular approaches to dimensionality reduction. Follow the corresponding links to study the specifics of a dimensionality reduction algorithm in further detail.

In addition to these, the clustering approaches that we saw earlier can also be considered dimensionality reduction approaches, where each example is reduced to a single dimension: the cluster it is assigned to.
