SGD demo: Himmelblau's function
So far, our function had only a single minimum.
Such functions, albeit easy to understand, are rare in machine learning.
Let us study a function with multiple minima.
We introduced Himmelblau's function in the module on Calculus:
$$ f(x,y) = (x^2 + y - 11)^2 + (x + y^2 - 7)^2 $$
Himmelblau's function is nonconvex, with four minima (marked in green) and one local maximum (marked in blue).
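As a quick sanity check on the formula, here is a minimal Python sketch (the function name is ours, not from the demo code) that evaluates \( f \) at the known minimum \( (3, 2) \), where it is exactly zero.

```python
def himmelblau(x, y):
    # Himmelblau's function as defined above.
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

print(himmelblau(3.0, 2.0))  # 0.0 -- (3, 2) is one of the four minima
```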
In this demo, we have decomposed Himmelblau's function into its two summands, \( (x^2 + y - 11)^2 \) and \( (x+y^2-7)^2 \), so that each SGD step can be demonstrated using the partial gradient of just one summand.
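To make the decomposition concrete, the sketch below (Python with NumPy, an illustration rather than the demo's actual code) defines the gradient of each summand separately; a single SGD step will use only one of them.

```python
import numpy as np

# Gradients of the two summands of Himmelblau's function.
def grad_f1(p):
    """Gradient of (x^2 + y - 11)^2 at point p = (x, y)."""
    x, y = p
    r = x**2 + y - 11
    return np.array([4 * x * r, 2 * r])

def grad_f2(p):
    """Gradient of (x + y^2 - 7)^2 at point p = (x, y)."""
    x, y = p
    r = x + y**2 - 7
    return np.array([2 * r, 4 * y * r])
```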
SGD converges on Himmelblau's function if the learning rate is chosen well and dampening is used to shrink the step size with each iteration, which prevents oscillations.
As expected, though, it takes a more serpentine route to a minimum than gradient descent does. Interestingly, if the parameters are set appropriately, SGD converges to the closest minimum.
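Putting the pieces together, here is a minimal sketch of the SGD loop just described, reusing `grad_f1` and `grad_f2` from the previous snippet. The learning rate, dampening factor, step count, and random seed are illustrative choices, not the values used in the demo.

```python
import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
partial_grads = [grad_f1, grad_f2]      # gradients of the two summands (defined above)

def sgd_himmelblau(start, lr=0.01, dampening=0.99, steps=500):
    """SGD on Himmelblau's function, using one summand's gradient per step."""
    p = np.array(start, dtype=float)
    for _ in range(steps):
        g = partial_grads[rng.integers(2)](p)  # gradient of a randomly chosen summand
        p -= lr * g                            # step against that partial gradient
        lr *= dampening                        # dampening: shrink the step size each iteration
    return p

# Where the iterate ends up depends on the learning rate, the dampening
# factor, and the random choice of summand in each step.
print(sgd_himmelblau([0.0, 0.0]))
```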
However, if the parameters are not chosen suitably, SGD may not converge to any optimum. SGD relies more heavily on well-chosen learning rates and dampening factors than gradient descent does, a point that is important to keep in mind when implementing SGD-based solutions. This sensitivity has driven interest in variants of SGD that need fewer parameters or simply work out-of-the-box. We will investigate these in an upcoming section dedicated to SGD and its popular variants.