Multimodal regression
GMMs are well suited for modeling multimodal data, that is, data whose probability density has multiple peaks.
Since the same input value can correspond to several distinct outputs, the conditional distribution \( p(\vy|\vx) \) can itself have multiple peaks, so we can fit a GMM at each point in the input space.
To enable this, the \( k \)-th Gaussian component at the input \( \vx \) has mean \( \vmu_k(\vx) \) and covariance \( \mSigma_k(\vx) \), both functions of the input.
Thus, the conditional distribution \( p(\vy|\vx) \) using such a GMM is
\begin{equation}
p(\vy|\vx) = \sum_{k=1}^K P(c=k | \vx) \Gauss(\vy;\vmu_k(\vx),\mSigma_k(\vx))
\label{eqn:mdn-pdf}
\end{equation}
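For instance, with \( K = 2 \) components and hypothetical mixture weights \( P(c=1|\vx) = 0.3 \) and \( P(c=2|\vx) = 0.7 \) at a particular input \( \vx \), the conditional density is
\[
p(\vy|\vx) = 0.3\,\Gauss(\vy;\vmu_1(\vx),\mSigma_1(\vx)) + 0.7\,\Gauss(\vy;\vmu_2(\vx),\mSigma_2(\vx)).
\]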
Note that each input example \( \vx \) induces its own GMM, since the mixture parameters are conditioned on the input.
To compute the conditional distribution in Equation \eqref{eqn:mdn-pdf}, we need three quantities for every mixture component \( k = 1, \ldots, K \): the mixture weight \( P(c=k|\vx) \), the mean \( \vmu_k(\vx) \), and the covariance \( \mSigma_k(\vx) \).
A mixture density network outputs exactly these quantities, so that the conditional distribution in Equation \eqref{eqn:mdn-pdf} can be evaluated at any input.
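To make this concrete, the following sketch shows one possible implementation of Equation \eqref{eqn:mdn-pdf}: a network that maps \( \vx \) to the mixture weights, means, and covariances, together with the log-density used as a training loss. This is a minimal illustration, assuming PyTorch and restricting \( \mSigma_k(\vx) \) to be diagonal for simplicity; the names \texttt{MDN} and \texttt{mdn\_log\_prob} are hypothetical, not from the text.
\begin{verbatim}
import math
import torch
import torch.nn as nn

class MDN(nn.Module):
    # Minimal mixture density network: one hidden layer,
    # K diagonal-Gaussian components. (Illustrative sketch.)
    def __init__(self, in_dim, out_dim, K, hidden=64):
        super().__init__()
        self.K, self.out_dim = K, out_dim
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.logits = nn.Linear(hidden, K)               # unnormalized log P(c=k|x)
        self.mu = nn.Linear(hidden, K * out_dim)         # means mu_k(x)
        self.log_sigma = nn.Linear(hidden, K * out_dim)  # log std devs of Sigma_k(x)

    def forward(self, x):
        h = self.body(x)
        return (self.logits(h),
                self.mu(h).view(-1, self.K, self.out_dim),
                self.log_sigma(h).view(-1, self.K, self.out_dim))

def mdn_log_prob(logits, mu, log_sigma, y):
    # log p(y|x) = logsumexp_k [ log P(c=k|x) + log N(y; mu_k(x), Sigma_k(x)) ]
    log_pi = torch.log_softmax(logits, dim=-1)       # (B, K)
    y = y.unsqueeze(1)                               # (B, 1, D), broadcast over K
    log_comp = (-0.5 * ((y - mu) / log_sigma.exp())**2
                - log_sigma
                - 0.5 * math.log(2 * math.pi)).sum(dim=-1)  # (B, K)
    return torch.logsumexp(log_pi + log_comp, dim=-1)       # (B,)

# Training minimizes the negative log-likelihood of the conditional density:
model = MDN(in_dim=1, out_dim=1, K=3)
x, y = torch.randn(8, 1), torch.randn(8, 1)
loss = -mdn_log_prob(*model(x), y).mean()
\end{verbatim}
Working with log-densities and \texttt{logsumexp}, rather than multiplying the mixture terms directly, is the standard way to keep the computation numerically stable.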