## Multivariate Gaussian as class-conditional density

In the case of linear discriminant analysis, we model the class-conditional density \( P(\vx | C_m) \) as a multivariate Gaussian.

$$ P(\vx|C_m) = \frac{1}{(2\pi)^{d/2} |\mSigma_m|^{1/2}} \expe{-\frac{1}{2}(\vx - \vmu_m)^T \mSigma_m^{-1} (\vx - \vmu_m)} $$

Here, \( \vmu_m \) is the mean of the training examples of class \( m \), \( \mSigma_m \) is the covariance of those training examples, and \( d \) is the dimensionality of \( \vx \).

In the case of linear discriminant analysis, **the covariance is assumed to be the same for all the classes**.
That is, \( \mSigma_m = \mSigma \) for all \( m \).
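These quantities can be estimated directly from labelled data: per-class sample means, and a single covariance pooled across classes. A minimal NumPy sketch (the function name `lda_estimates` is my own, not from the text):

```python
import numpy as np

def lda_estimates(X, y):
    """Per-class means and a single pooled covariance, as LDA assumes.

    X: (n, d) feature matrix; y: (n,) integer class labels.
    """
    classes = np.unique(y)
    means = {m: X[y == m].mean(axis=0) for m in classes}
    # Pool the within-class scatter: sum (x - mu_m)(x - mu_m)^T over all
    # samples, divided by (n - #classes) for an unbiased estimate.
    n, d = X.shape
    scatter = np.zeros((d, d))
    for m in classes:
        diff = X[y == m] - means[m]
        scatter += diff.T @ diff
    sigma = scatter / (n - len(classes))
    return means, sigma
```

Pooling the scatter (rather than averaging per-class covariances) weights each class by its sample count, which is the usual maximum-likelihood-style estimate under the shared-covariance assumption.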

In comparing two classes, say \( C_p \) and \( C_q \), it suffices to check the sign of the log-ratio of posteriors

$$ \log \frac{P(C_p | \vx)}{P(C_q | \vx)} $$

Let's look at this log-ratio in further detail by expanding it with appropriate substitutions.

\begin{align}
\log \frac{P(C_p | \vx)}{P(C_q | \vx)} &= \log \frac{P(C_p)}{P(C_q)} + \log \frac{P(\vx|C_p)}{P(\vx|C_q)} \\\\
&= \log \frac{P(C_p)}{P(C_q)} - \frac{1}{2}\left[ (\vx - \vmu_p)^T \mSigma^{-1} (\vx - \vmu_p) - (\vx - \vmu_q)^T \mSigma^{-1} (\vx - \vmu_q) \right] \\\\
&= \log\frac{P(C_p)}{P(C_q)} - \frac{1}{2}(\vmu_p + \vmu_q)^T \mSigma^{-1} (\vmu_p - \vmu_q) + \vx^T \mSigma^{-1}(\vmu_p - \vmu_q)
\label{eqn:log-ratio-expand}
\end{align}

This equation is linear in \( \vx \), hence the name *linear* discriminant analysis.
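The linearity can be checked numerically: the closed-form expression above should agree with the log-ratio computed directly from the two Gaussian densities. A small NumPy sketch (function names are mine, and uniform priors are assumed as defaults):

```python
import numpy as np

def log_ratio_linear(x, mu_p, mu_q, sigma, prior_p=0.5, prior_q=0.5):
    """log P(C_p|x)/P(C_q|x) via the closed-form linear expression."""
    sigma_inv = np.linalg.inv(sigma)
    bias = (np.log(prior_p / prior_q)
            - 0.5 * (mu_p + mu_q) @ sigma_inv @ (mu_p - mu_q))
    return bias + x @ sigma_inv @ (mu_p - mu_q)

def log_ratio_direct(x, mu_p, mu_q, sigma, prior_p=0.5, prior_q=0.5):
    """Same quantity, computed from the two Gaussian densities directly."""
    def log_gauss(x, mu):
        d = len(mu)
        diff = x - mu
        _, logdet = np.linalg.slogdet(sigma)  # stable log-determinant
        return -0.5 * (d * np.log(2 * np.pi) + logdet
                       + diff @ np.linalg.inv(sigma) @ diff)
    return np.log(prior_p / prior_q) + log_gauss(x, mu_p) - log_gauss(x, mu_q)
```

Since `log_ratio_linear` has the form \( \vw^T\vx + b \), the decision boundary \( \vw^T\vx + b = 0 \) is a hyperplane.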

The normalizing factors cancelled in the division since, under the shared-covariance assumption, both equal \( (2\pi)^{d/2} |\mSigma|^{1/2} \).
The quadratic term \( \vx^T\mSigma^{-1}\vx \) also appeared in both exponents and cancelled, leaving a classifier that is linear in \( \vx \).
Neither cancellation happens if \( \mSigma_p \ne \mSigma_q \); that extension is known as **quadratic discriminant analysis**.
Of course, quadratic discriminant analysis is then no longer a linear classifier, due to the surviving quadratic term \( -\frac{1}{2}\vx^T(\mSigma_p^{-1} - \mSigma_q^{-1})\vx \) (the log-determinants also no longer cancel).
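The effect of dropping the shared-covariance assumption can be seen numerically: with equal covariances the log-ratio is affine along any line through \( \vx \)-space, while with distinct covariances the quadratic term survives. A hedged sketch (function name mine, uniform priors assumed):

```python
import numpy as np

def qda_log_ratio(x, mu_p, mu_q, sigma_p, sigma_q, prior_p=0.5, prior_q=0.5):
    """log P(C_p|x)/P(C_q|x) when each class keeps its own covariance.

    Neither the determinants nor the quadratic terms cancel, so the
    result is quadratic in x through x^T (Sigma_p^{-1} - Sigma_q^{-1}) x.
    """
    def log_gauss(x, mu, sigma):
        d = len(mu)
        diff = x - mu
        _, logdet = np.linalg.slogdet(sigma)
        return -0.5 * (d * np.log(2 * np.pi) + logdet
                       + diff @ np.linalg.inv(sigma) @ diff)
    return (np.log(prior_p / prior_q)
            + log_gauss(x, mu_p, sigma_p) - log_gauss(x, mu_q, sigma_q))
```

Passing the same matrix for both `sigma_p` and `sigma_q` recovers the linear LDA log-ratio, which is one way to sanity-check an implementation.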