C6: Softmax & Cross-Entropy

From raw scores to probabilities to loss

Pipeline: Logits → Softmax → Probabilities → Cross-Entropy → Loss
Logits (raw scores from the network):
  z₀ = 2.00    z₁ = 1.00    z₂ = 0.10

Exponentiate, e^(z/T) with temperature T = 1 (i.e. e^z):
  e^z₀ = 7.39    e^z₁ = 2.72    e^z₂ = 1.11    Σ = 11.21

Softmax (normalize so the values sum to 1): pᵢ = e^zᵢ / Σⱼ e^zⱼ
  p₀ = 0.659    p₁ = 0.242    p₂ = 0.099    Σ = 1.000
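
A minimal NumPy sketch of these two steps (the array names are illustrative, not from any particular library):

```python
import numpy as np

# Logits from the pipeline above.
z = np.array([2.00, 1.00, 0.10])

# Exponentiate: e^z (temperature T = 1).
exp_z = np.exp(z)                 # ≈ [7.39, 2.72, 1.11], sum ≈ 11.21

# Normalize to probabilities (softmax).
p = exp_z / exp_z.sum()           # ≈ [0.659, 0.242, 0.099], sums to 1

# In practice, exponentiate z - z.max() instead to avoid overflow;
# the resulting probabilities are identical.
print(p.round(3), p.sum())
```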
-log(p) for the correct class (class 0 here):
  p_correct = 0.659
  log(p_correct) = -0.42   (natural log)
  -log(p_correct) = 0.42

Loss (cross-entropy with a one-hot target y):
  L = -log(p_y) = -log(0.659) ≈ 0.42
  ⚠️ The penalty grows steeply as p_y → 0, so confident wrong predictions are punished harshly.
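
A small self-contained sketch of the loss computation, assuming (as in the pipeline above) that class 0 is the correct label:

```python
import numpy as np

z = np.array([2.00, 1.00, 0.10])
p = np.exp(z) / np.exp(z).sum()      # softmax probabilities from the previous step

# One-hot target: class 0 is the correct label.
y = np.array([1.0, 0.0, 0.0])

# With a one-hot target, cross-entropy reduces to -log of the correct class's probability.
loss = -np.log(p[0])                 # natural log, ≈ 0.417

# Equivalent general form: L = -Σᵢ yᵢ · log(pᵢ)
loss_general = -(y * np.log(p)).sum()
print(round(loss, 2), round(loss_general, 2))   # 0.42 0.42
```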
Gradients with respect to the logits: ∂L/∂z = p - y, where y is the one-hot target
(differentiating L = -log(p_y) through the softmax gives this compact form):
  ∂L/∂z₀ = 0.659 - 1 = -0.341
  ∂L/∂z₁ = 0.242 - 0 = +0.242
  ∂L/∂z₂ = 0.099 - 0 = +0.099
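
The gradient can be checked numerically with the same arrays (again a sketch, not a full training step):

```python
import numpy as np

z = np.array([2.00, 1.00, 0.10])
p = np.exp(z) / np.exp(z).sum()      # softmax probabilities
y = np.array([1.0, 0.0, 0.0])        # one-hot target, class 0 correct

# Gradient of softmax + cross-entropy w.r.t. the logits: ∂L/∂z = p - y.
grad = p - y
print(grad.round(3))                 # [-0.341  0.242  0.099]

# The entries sum to zero: the correct class's logit is pushed up,
# the others are pushed down in proportion to their probability.
assert abs(grad.sum()) < 1e-9
```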