The Update Rule
θ = θ - α · ∇L(θ)
If α is too large, the step overshoots the minimum; if it is too small, convergence takes forever.
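To make the trade-off concrete, here is a minimal sketch of the update rule on a one-dimensional quadratic loss. The loss function, starting point, and learning-rate values are illustrative choices, not the demo's actual surface.

```python
import numpy as np

def gradient_descent(grad, theta0, alpha, steps=50):
    """Apply theta <- theta - alpha * grad(theta) repeatedly and record the path."""
    theta = theta0
    path = [theta]
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
        path.append(theta)
    return np.array(path)

# Illustrative loss L(theta) = theta^2, so grad L(theta) = 2 * theta.
grad = lambda theta: 2.0 * theta

for alpha in (0.01, 0.12, 1.05):   # too small, reasonable, too large
    path = gradient_descent(grad, theta0=5.0, alpha=alpha)
    print(f"alpha={alpha:<5} final theta={path[-1]:+.4f}  final loss={path[-1]**2:.4f}")
```

Running this shows the three regimes from the text: α = 0.01 barely moves in 50 steps, α = 0.12 drives the loss to nearly zero, and α = 1.05 makes each step overshoot further than the last, so the loss blows up.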
Live Statistics
The stats panel tracks Loss, Gradient Magnitude, Effective LR, and Step Size, alongside Loss History and Learning Rates charts.
Ready to Explore?
See how learning rate affects gradient descent. Too small creeps slowly. Too large explodes. Find the sweet spot.
💡 Drag to rotate • Scroll to zoom • Try the presets!
Controls: a Learning Rate (α) slider (currently 0.1200) and a Compare Multiple Rates option.
LR Schedule (Decay)
Reduces the learning rate over time: αₜ = α₀ / (1 + 0.05t)
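The decay formula is easy to write down directly. A minimal sketch of this inverse-time schedule, assuming the slider's current 0.12 as α₀ and the 0.05 decay constant from the formula:

```python
def inverse_time_decay(alpha0, t, decay_rate=0.05):
    """Inverse-time decay: alpha_t = alpha0 / (1 + decay_rate * t)."""
    return alpha0 / (1.0 + decay_rate * t)

# Illustrative values: starting from alpha0 = 0.12, the effective
# learning rate shrinks as the step count t grows.
alpha0 = 0.12
for t in (0, 10, 50, 100):
    print(f"step {t:>3}: alpha = {inverse_time_decay(alpha0, t):.4f}")
```

Large early steps cross the surface quickly; the shrinking rate then allows finer steps near the minimum without overshooting.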
✓ Converging
Loss is decreasing steadily. The optimizer is making good progress toward the minimum.
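A status like this can be derived from the recent loss history. The sketch below shows one plausible rule with illustrative thresholds; the demo's actual criteria are not shown here.

```python
def convergence_status(loss_history, window=10, tol=1e-3, blowup=1e6):
    """Classify recent progress from the loss history (thresholds are illustrative)."""
    if len(loss_history) < window + 1:
        return "Warming up"
    recent, previous = loss_history[-1], loss_history[-1 - window]
    if recent > blowup or recent != recent:   # exploded or NaN
        return "Diverging"
    if previous - recent > tol * max(abs(previous), 1.0):
        return "Converging"                   # loss is decreasing steadily
    return "Stalled"

# Example: a steadily decreasing loss curve is reported as converging.
losses = [1.0 / (1 + 0.3 * t) for t in range(30)]
print(convergence_status(losses))  # -> Converging
```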