See how the amount of data per gradient update affects the descent path.
The loss surface comes from linear regression and is convex, so every method converges; watch how their paths differ.
Batch GD uses all 100 points per step (accurate but slow). SGD uses 1 point (fast but noisy). Mini-batch uses a small subset per step, balancing both.
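The sketch below shows the three update rules on a synthetic 1-D linear regression with 100 points; the learning rate, epoch count, and batch size of 10 are illustrative assumptions, not values taken from the demo.

```python
import numpy as np

# Synthetic data: 100 points from y = 3x + 0.5 plus noise (illustrative values).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(scale=0.1, size=100)
X_b = np.hstack([X, np.ones((100, 1))])  # add a bias column

def grad(w, Xb, yb):
    """Gradient of mean squared error over the given (mini-)batch."""
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

def descend(batch_size, lr=0.1, epochs=50):
    """Gradient descent that samples `batch_size` points per update."""
    w = np.zeros(2)
    path = [w.copy()]
    for _ in range(epochs):
        idx = rng.permutation(100)
        for start in range(0, 100, batch_size):
            batch = idx[start:start + batch_size]
            w -= lr * grad(w, X_b[batch], y[batch])
            path.append(w.copy())  # record the descent path
    return np.array(path)

batch_path = descend(batch_size=100)  # batch GD: all points, 1 smooth update per epoch
sgd_path   = descend(batch_size=1)    # SGD: 1 point, 100 noisy updates per epoch
mini_path  = descend(batch_size=10)   # mini-batch: a middle ground

# All three end near the true parameters [3.0, 0.5]; the recorded paths differ.
print(batch_path[-1], sgd_path[-1], mini_path[-1])
```

Plotting each returned path over the loss surface reproduces the qualitative picture above: a smooth batch trajectory, a jittery SGD trajectory, and a mini-batch trajectory in between.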