Training logistic regression
Training a logistic regression classifier amounts to finding suitable values for the parameters \( \vw \) and \( b \).
The parameters are chosen to maximize the likelihood of the observed data, i.e., the training set, a procedure known as maximum likelihood estimation (MLE).
Suppose \( \labeledset = \set{(\vx_1, y_1), \ldots, (\vx_\nlabeled, y_\nlabeled)} \) denotes the training set consisting of \( \nlabeled \) training instances.
Assume that \( y_\nlabeledsmall = 1 \) if \( \vx_\nlabeledsmall \) belongs to the class \( C_1 \), and \( y_\nlabeledsmall = 0 \) if it belongs to the class \( C_2 \).
The likelihood in the case of logistic regression is
$$ P(\labeledset|\vw) = \prod_{\nlabeledsmall=1}^\nlabeled P(C_1|\vx_\nlabeledsmall)^{y_\nlabeledsmall} \left(1 - P(C_1|\vx_\nlabeledsmall)\right)^{1 - y_\nlabeledsmall} $$
Note that the factor \(P(C_1|\vx_\nlabeledsmall)^{y_\nlabeledsmall} \) is active when the instance \( \vx_\nlabeledsmall \) belongs to the class \( C_1 \): in that case \( y_\nlabeledsmall = 1 \), so this factor reduces to \( P(C_1|\vx_\nlabeledsmall) \) while the other factor equals one.
Conversely, the factor \( \left(1 - P(C_1|\vx_\nlabeledsmall)\right)^{1 - y_\nlabeledsmall} \) is active when the training instance \( \vx_\nlabeledsmall \) belongs to the class \(C_2\), since \( y_\nlabeledsmall = 0 \) in that case.
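The role of the two factors can be checked numerically. The following minimal sketch (the function names `sigmoid` and `likelihood_term` are illustrative, not from any particular library) evaluates one Bernoulli likelihood term:

```python
import math

def sigmoid(z):
    # Logistic function, mapping a real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def likelihood_term(p, y):
    # Bernoulli likelihood term p^y * (1 - p)^(1 - y):
    # reduces to p when y = 1 and to (1 - p) when y = 0.
    return (p ** y) * ((1.0 - p) ** (1 - y))
```

For example, with \( P(C_1|\vx_\nlabeledsmall) = 0.8 \), the term contributes \( 0.8 \) when \( y_\nlabeledsmall = 1 \) and \( 0.2 \) when \( y_\nlabeledsmall = 0 \).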
With the likelihood function defined this way, training proceeds by MLE using standard numerical optimization methods.
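In practice, one maximizes the logarithm of this likelihood rather than the likelihood itself, which turns the product into a sum and is numerically more stable:

$$ \log P(\labeledset|\vw) = \sum_{\nlabeledsmall=1}^\nlabeled \left[ y_\nlabeledsmall \log P(C_1|\vx_\nlabeledsmall) + \left(1 - y_\nlabeledsmall\right) \log\left(1 - P(C_1|\vx_\nlabeledsmall)\right) \right] $$

Maximizing this log-likelihood is equivalent to minimizing the familiar cross-entropy loss, which is its negation.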
In the case of logistic regression, however, there is no closed-form solution to the MLE optimization problem.
Hence, iterative approaches such as minibatch stochastic gradient descent may be used.
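As an illustration, minibatch stochastic gradient descent on the negative log-likelihood can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the function name, hyperparameter defaults, and data layout are all illustrative choices.

```python
import math
import random

def sigmoid(z):
    # Logistic function, mapping a real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=100, batch_size=4, seed=0):
    """Minibatch SGD on the negative log-likelihood of logistic regression.

    X is a list of feature vectors (lists of floats), y a list of 0/1 labels.
    Returns the learned parameters (w, b). All names and defaults here are
    illustrative assumptions, not a library API.
    """
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)  # visit the training set in a fresh random order
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            gw = [0.0] * d
            gb = 0.0
            for i in batch:
                p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
                err = p - y[i]  # gradient of the per-instance negative log-likelihood w.r.t. the score
                for j in range(d):
                    gw[j] += err * X[i][j]
                gb += err
            # Average the minibatch gradient and take one descent step.
            for j in range(d):
                w[j] -= lr * gw[j] / len(batch)
            b -= lr * gb / len(batch)
    return w, b
```

On a toy one-dimensional dataset such as `X = [[0.0], [1.0], [2.0], [3.0]]`, `y = [0, 0, 1, 1]`, the learned weight is positive and the predicted probabilities cross 0.5 between the two classes.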