See how the amount of data per gradient update affects the descent path.
The loss surface comes from linear regression and is convex, so every method converges; watch how their paths differ.
Batch GD uses all 100 points per step (accurate but slow). SGD uses 1 point (fast but noisy). Mini-batch uses a small subset per step, balancing both.
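The sketch below shows the three update rules on a synthetic 1-D linear regression with 100 points; the learning rate, epoch count, and batch size of 10 are illustrative assumptions, not values taken from the demo.

```python
import numpy as np

# Synthetic data: 100 points from y = 3x + 0.5 plus noise (illustrative values).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(scale=0.1, size=100)
X_b = np.hstack([X, np.ones((100, 1))])  # add a bias column

def grad(w, Xb, yb):
    """Gradient of mean squared error over the given (mini-)batch."""
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

def descend(batch_size, lr=0.1, epochs=50):
    """Gradient descent that samples `batch_size` points per update."""
    w = np.zeros(2)
    path = [w.copy()]
    for _ in range(epochs):
        idx = rng.permutation(100)
        for start in range(0, 100, batch_size):
            batch = idx[start:start + batch_size]
            w -= lr * grad(w, X_b[batch], y[batch])
            path.append(w.copy())  # record the descent path
    return np.array(path)

batch_path = descend(batch_size=100)  # batch GD: all points, 1 smooth update per epoch
sgd_path   = descend(batch_size=1)    # SGD: 1 point, 100 noisy updates per epoch
mini_path  = descend(batch_size=10)   # mini-batch: a middle ground

# All three end near the true parameters [3.0, 0.5]; the recorded paths differ.
print(batch_path[-1], sgd_path[-1], mini_path[-1])
```

Plotting each returned path over the loss surface reproduces the qualitative picture above: a smooth batch trajectory, a jittery SGD trajectory, and a mini-batch trajectory in between.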