Parameter constraints
One way to regularize a deep neural network is to constrain its parameter values, for example by adding a penalty on a suitable norm of the model's weights to the training objective.
If \( \loss \) denotes the unregularized loss of the network, we add a regularization term \( \Omega(\mTheta) \) on the parameters \( \mTheta \) of the model:
$$ \loss_{\text{regularized}} = \loss + \alpha \Omega(\mTheta) $$
where \( \alpha \geq 0 \) is a hyperparameter that controls the strength of the regularization term.
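To make the recipe concrete, here is a minimal PyTorch sketch of adding \( \alpha \Omega(\mTheta) \) to the task loss before backpropagation. The model, data, and names such as `omega` and `alpha` are illustrative assumptions, not part of the text.

```python
import torch
import torch.nn as nn

# Illustrative model, data, and task loss; any network and loss would do.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
criterion = nn.MSELoss()
x, y = torch.randn(64, 10), torch.randn(64, 1)

def omega(params, norm_fn):
    """Omega(Theta): sum a chosen norm over all parameter tensors."""
    return sum(norm_fn(w) for w in params)

alpha = 1e-4                                  # regularization strength
loss = criterion(model(x), y)                 # unregularized loss L
# Example choice of norm: squared L2, discussed below.
reg_loss = loss + alpha * omega(model.parameters(), lambda w: w.pow(2).sum())
reg_loss.backward()                           # gradients now include the penalty term
```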
A popular form of penalty on the weights is the squared \( L_2 \) norm, also known as weight decay in neural networks, which is applied to each weight parameter in the network.
$$ \Omega(\mTheta) = \sum_{\vw \in \mTheta} \norm{\vw}{2}^2 $$
where the squared \( L_2 \) norm is defined as \( \norm{\vw}{2}^2 = \vw^T\vw \).
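As a sketch under the same assumptions as the snippet above, the squared \( L_2 \) penalty corresponds to the norm function shown below; in PyTorch the same effect is usually obtained through the optimizer's `weight_decay` argument rather than an explicit penalty term.

```python
import torch

def l2_sq(w):
    # Squared L2 norm of one weight tensor: ||w||_2^2 = w^T w.
    return w.pow(2).sum()

# Built-in shortcut (equivalent up to a conventional factor of 1/2):
# SGD with weight_decay adds alpha * w to each parameter's gradient,
# which is the gradient of (alpha / 2) * ||w||_2^2.
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)
```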
Sparsity in the model parameters can be encouraged by using the \( L_1 \) norm instead, as in lasso regression.
$$ \Omega(\mTheta) = \sum_{\vw \in \mTheta} \norm{\vw}{1} $$
where the \( L_1 \) norm is defined as \( \norm{\vw}{1} = \sum_{i} |w_i| \).
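Analogously, a sketch of the \( L_1 \) penalty under the same assumptions as before: because its gradient has constant magnitude, it pushes individual weights toward exactly zero, which is what produces sparsity.

```python
import torch

def l1(w):
    # L1 norm of one weight tensor: ||w||_1 = sum_i |w_i|.
    return w.abs().sum()

# Plugged into the generic pattern from the first sketch:
# reg_loss = loss + alpha * omega(model.parameters(), l1)
```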
For more details on the effect of \( L_2 \) and \( L_1 \) regularization in machine learning models, refer to our comprehensive article on norm-based regularization in machine learning.