Introduction to restricted Boltzmann machines (RBMs)

This learning module has many interactive demos. It is easier to work with them on a larger screen. Bookmark and revisit if you are currently on a small screen device.

\(\DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\asterisk}{\ast} \newcommand{\sup}{\text{sup}} \newcommand{\inf}{\text{inf}} \newcommand{\min}{\text{min}\;} \newcommand{\max}{\text{max}\;} \newcommand{\maxunder}[1]{\underset{#1}{\max}} \newcommand{\minunder}[1]{\underset{#1}{\min}} \newcommand{\real}{\mathbb{R}} \newcommand{\natural}{\mathbb{N}} \newcommand{\integer}{\mathbb{Z}} \newcommand{\rational}{\mathbb{Q}} \newcommand{\irrational}{\mathbb{I}} \newcommand{\complex}{\mathbb{C}} \newcommand{\cardinality}[1]{|#1|} \newcommand{\vec}[1]{\mathbf{#1}} \newcommand{\mat}[1]{\mathbf{#1}} \newcommand{\star}[1]{#1^*} \newcommand{\inv}[1]{#1^{-1}} \newcommand{\indicator}[1]{\mathcal{I}(#1)} \renewcommand{\BigO}[1]{\mathcal{O}(#1)} \renewcommand{\BigOsymbol}{\mathcal{O}} \renewcommand{\smallo}[1]{\mathcal{o}(#1)} \renewcommand{\smallosymbol}[1]{\mathcal{o}} \newcommand{\set}[1]{\mathbb{#1}} \newcommand{\complement}[1]{#1^c} \newcommand{\powerset}[1]{\mathcal{P}(#1)} \newcommand{\setdiff}{\setminus} \newcommand{\setsymmdiff}{\oplus} \newcommand{\dash}[1]{#1^{'}} \newcommand{\permutation}[2]{{}_{#1} \mathrm{ P }_{#2}} \newcommand{\combination}[2]{{}_{#1} \mathrm{ C }_{#2}} \newcommand{\prob}[1]{P(#1)} \newcommand{\pmf}[1]{P(#1)} \newcommand{\pdf}[1]{p(#1)} \newcommand{\cdf}[1]{F(#1)} \newcommand{\expect}[2]{E_{#1}\left[#2\right]} \newcommand{\entropy}[1]{\mathcal{H}\left[#1\right]} \newcommand{\expe}[1]{\mathrm{e}^{#1}} \newcommand{\textexp}[1]{\text{exp}\left(#1\right)} \def\independent{\perp\!\!\!\perp} \def\notindependent{\not\!\independent} \newcommand{\yhat}{\hat{y}} \newcommand{\vs}{\vec{s}} \newcommand{\vt}{\vec{t}} \newcommand{\vu}{\vec{u}} \newcommand{\vv}{\vec{v}} \newcommand{\vw}{\vec{w}} \newcommand{\vx}{\vec{x}} \newcommand{\vy}{\vec{y}} \newcommand{\vz}{\vec{z}} \newcommand{\va}{\vec{a}} \newcommand{\vb}{\vec{b}} \newcommand{\vc}{\vec{c}} \newcommand{\vd}{\vec{d}} \newcommand{\ve}{\vec{e}} \newcommand{\vg}{\vec{g}} \newcommand{\vh}{\vec{h}} \newcommand{\vi}{\vec{i}} \newcommand{\vk}{\vec{k}} \newcommand{\vo}{\vec{o}} \newcommand{\vp}{\vec{p}} \newcommand{\vq}{\vec{q}} \newcommand{\vr}{\vec{r}} \newcommand{\vs}{\vec{s}} \newcommand{\vmu}{\vec{\mu}} \newcommand{\vsigma}{\vec{\sigma}} \newcommand{\vphi}{\vec{\phi}} \newcommand{\vtau}{\vec{\tau}} \newcommand{\vtheta}{\vec{\theta}} \newcommand{\mA}{\mat{A}} \newcommand{\mB}{\mat{B}} \newcommand{\mC}{\mat{C}} \newcommand{\mD}{\mat{D}} \newcommand{\mE}{\mat{E}} \newcommand{\mH}{\mat{H}} \newcommand{\mK}{\mat{K}} \newcommand{\mP}{\mat{P}} \newcommand{\mQ}{\mat{Q}} \newcommand{\mR}{\mat{R}} \newcommand{\mS}{\mat{S}} \newcommand{\mU}{\mat{U}} \newcommand{\mV}{\mat{V}} \newcommand{\mW}{\mat{W}} \newcommand{\mX}{\mat{X}} \newcommand{\mY}{\mat{Y}} \newcommand{\mZ}{\mat{Z}} \newcommand{\mI}{\mat{I}} \newcommand{\mLambda}{\mat{\Lambda}} \newcommand{\mSigma}{\mat{\Sigma}} \newcommand{\mTheta}{\mat{\theta}} \newcommand{\setsymb}[1]{#1} \newcommand{\sA}{\setsymb{A}} \newcommand{\sB}{\setsymb{B}} \newcommand{\sC}{\setsymb{C}} \newcommand{\sO}{\setsymb{O}} \newcommand{\sP}{\setsymb{P}} \newcommand{\sQ}{\setsymb{Q}} \newcommand{\sH}{\setsymb{H}} \newcommand{\sX}{\setsymb{X}} \newcommand{\sY}{\setsymb{Y}} \newcommand{\norm}[2]{||{#1}||_{#2}} \newcommand{\infnorm}[1]{\norm{#1}{\infty}} \newcommand{\fillinblank}{\text{ }\underline{\text{ ? }}\text{ }} \newcommand{\lbrace}{\left\{} \newcommand{\rbrace}{\right\}} \newcommand{\set}[1]{\lbrace #1 \rbrace} \newcommand{\seq}[1]{\left( #1 \right)} \newcommand{\ndim}{N} \newcommand{\ndimsmall}{n} \newcommand{\dataset}{\mathbb{D}} \newcommand{\ndata}{D} \newcommand{\ndatasmall}{d} \newcommand{\labeledset}{\mathbb{L}} \newcommand{\nlabeled}{L} \newcommand{\nlabeledsmall}{l} \newcommand{\unlabeledset}{\mathbb{U}} \newcommand{\nunlabeled}{U} \newcommand{\nunlabeledsmall}{u} \newcommand{\nclass}{M} \newcommand{\nclasssmall}{m} \newcommand{\loss}{\mathcal{L}} \newcommand{\sign}{\text{sign}} \newcommand{\Gauss}{\mathcal{N}} \newcommand{\hadamard}{\circ} \newcommand{\doh}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\dox}[1]{\doh{#1}{x}} \newcommand{\doy}[1]{\doh{#1}{y}} \newcommand{\doxx}[1]{\doh{#1}{x^2}} \newcommand{\doyy}[1]{\doh{#1}{y^2}} \newcommand{\doxy}[1]{\frac{\partial #1}{\partial x \partial y}} \newcommand{\doyx}[1]{\frac{\partial #1}{\partial y \partial x}} \newcommand{\qed}{\tag*{$\blacksquare$}}\)

        Restricted Boltzmann machines (RBMs)
        Deep Learning
      

Introduction

A restricted Boltzmann machine (RBM), originally invented under the name harmonium, is a popular building block for deep probabilistic models. For example, they are the constituents of deep belief networks that started the recent surge in deep learning advances in 2006.

RBMs specify joint probability distributions over random variables, both visible and latent, using an energy function, similar to Boltzmann machines, but with some restrictions. Hence the name. In this article, we will introduce Boltzmann machines and their extension to RBMs.

Prerequisites

To understand RBMs, we recommend familiarity with the concepts in

Probability: A sound understanding of conditional and marginal probabilities and Bayes Theorem is desirable.
Introduction to machine learning: An introduction to basic concepts in machine learning such as classification, training instances, features, and feature types.

Follow the above links to first get acquainted with the corresponding concepts.

Boltzmann machines

A Boltzmann machine is a parametric model for the joint probability of binary random variables.

Consider an $ \ndim$-dimensional binary random variable $ \vx \in \set{0,1}^\ndim $ with an unknown distribution. The joint probability of such a random variable using the Boltzmann machine model is calculated as

\begin{equation} \prob{\vx} = \frac{\expe{-E(\vx)}}{Z} \label{eqn:bm} \end{equation}

Here, $ Z $ is a normalization term, also known as the partition function that ensures $ \sum_{\vx} \prob{\vx} = 1 $.

The function $ E: \ndim \to 1 $ is a parametric function known as the energy function. It is defined as

\begin{equation} E(\vx) = -\vx^T \mW \vx - \vb^T \vx \label{eqn:energy} \end{equation}

with the parameters $ \mW $ and $ \vb $.

Boltzmann machine with hidden variables

The Boltzmann machine model for binary variables readily extends to scenarios where the variables are only partially observable. Say, the random variable $ \vx $ consists of a elements that are observable (or visible) $ \vv $ and the elements that are latent (or hidden) $ \vh $.

Retaining the same formulation for the joint probability of $ \vx $, we can now define the energy function of $ \vx $ with specialized parameters for the two kinds of variables, indicated below with corresponding subscripts.

\begin{aligned} E(\vx) &= E(\vv, \vh) \\\\ &= -\vv^T \mW_v \vv - \vb_v^T \vv -\vh^T \mW_h \vh - \vb_h^T - \vv^T \mW_{vh} \vh \label{eqn:energy-hidden} \end{aligned}

Restricted Boltzmann machine (RBM)

RBMs are undirected probabilistic graphical models for jointly modeling visible and hidden variables. They are a specialized version of Boltzmann machine with a restriction — there are no links among visible variables and among hidden variables.

As a result, the energy function of RBM has two fewer terms than in Equation \ref{eqn:energy-hidden}

\begin{aligned} E(\vv, \vh) &= - \vb_v^T \vv - \vb_h^T - \vv^T \mW_{vh} \vh \label{eqn:energy-rbm} \end{aligned}

Note that the quadratic terms for the self-interaction among the visible variables ($ -\vv^T \mW_v \vv $) and those among the hidden variables ($-\vh^T \mW_h \vh $ ) are not included in the RBM energy function. Hence the name restricted Boltzmann machines.

Using this modified energy function, the joint probability of the variables is

\begin{equation} \prob{v=\vv, h=\vh} = \frac{\expe{-E(\vv, \vh)}}{Z} \label{eqn:rbm} \end{equation}

The partition function is a summation over the probabilities of all possible instantiations of the variables

$$ Z = \sum_{\vv} \sum_{\vh} \prob{v=\vv, h=\vh} $$

Training

Training an RBM involves the discovery of optimal parameters $ \vb, \vc $ and $ \mW_{vh} $ of the the model. You can notice that the partition function is intractable due to the enumeration of all possible values of the hidden states. Therefore, typically RBMs are trained using approximation methods meant for models with intractable partition functions, with necessary terms being calculated using sampling methods such as Gibb sampling.

Please support us

Help us create more engaging and effective content and keep it free of paywalls and advertisements!

Please donate

Subscribe for article updates

Stay up to date with new material for free.