1. The Setup: When Functions Multiply
You've learned to differentiate single functions like x², sin(x), and eˣ. But what happens when two functions are multiplied together?
Consider a function like: h(x) = f(x) · g(x)
For example:
- h(x) = x² · sin(x)
- h(x) = eˣ · x
- h(x) = (x + 1) · (x² - 3)
Each of these is the product of two simpler functions. To find h'(x), we need a new rule.
2. Why Not Just Multiply Derivatives?
Here's the trap that catches nearly everyone at first: "If h(x) = f(x) · g(x), surely h'(x) = f'(x) · g'(x)?"
Let's test this with a simple example.
Take h(x) = x · x = x². We know the derivative of x² is 2x. That's our ground truth.
Now, let's try the "just multiply derivatives" approach:
- f(x) = x, so f'(x) = 1
- g(x) = x, so g'(x) = 1
- "Multiply derivatives": 1 × 1 = 1
But wait — we said h'(x) should be 2x, not 1!
The "multiply derivatives" approach is completely wrong. This isn't a small error; we got 1 instead of 2x. So what's actually happening when we differentiate a product?
3. The Rectangle Analogy: Seeing the Product Rule
This is the key insight that makes the product rule intuitive. Think of f(x) · g(x) as the area of a rectangle.
Imagine a rectangle where:
- Width = f(x)
- Height = g(x)
- Area = f(x) · g(x)
Now, here's the crucial question: When x changes slightly, how does the area change?
Both Dimensions Change Simultaneously
When x increases by a tiny amount Δx:
- The width changes from f(x) to f(x) + Δf
- The height changes from g(x) to g(x) + Δg
The change in area gives us three pieces:
- f · Δg — A horizontal strip: keeping width f, height increases by Δg
- g · Δf — A vertical strip: keeping height g, width increases by Δf
- Δf · Δg — A tiny corner: both changes together
The Vanishing Corner
Here's the beautiful part. That corner piece (Δf · Δg) is tiny squared — it's the product of two infinitesimally small quantities. As Δx → 0, this term vanishes much faster than the others.
When we take the limit, the corner disappears entirely:
h'(x) = lim(Δx→0) [f·Δg + g·Δf + Δf·Δg] / Δx = f(x)·g'(x) + g(x)·f'(x)
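If you want to watch the corner vanish numerically, here is a small sketch (using f(x) = x² and g(x) = sin(x) purely as an example) that prints the strip terms and the corner term for shrinking Δx:

```python
import numpy as np

# Example rectangle: width f(x) = x², height g(x) = sin(x).
f = lambda x: x**2
g = np.sin
x0 = 1.0

for dx in (1e-1, 1e-2, 1e-3):
    df = f(x0 + dx) - f(x0)           # change in width
    dg = g(x0 + dx) - g(x0)           # change in height
    strips = f(x0) * dg + g(x0) * df  # the two strips, shrink like dx
    corner = df * dg                  # the corner, shrinks like dx**2
    print(f"dx={dx:g}  strips={strips:.6f}  corner={corner:.8f}")
```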
4. The Product Rule Formula
For h(x) = f(x) · g(x):
h'(x) = f'(x) · g(x) + f(x) · g'(x)
Or equivalently: (fg)' = f'g + fg'
Leibniz Notation
Using Leibniz's notation with u and v:
d(uv)/dx = u · dv/dx + v · du/dx
This is elegant because it shows the symmetry — each function gets its turn being differentiated while the other stays put.
The Mnemonic
There's a classic memory aid:
"First times derivative of second, plus second times derivative of first."
Or more rhythmically: "Left d-right, plus right d-left."
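If you have SymPy available, it can confirm the rule symbolically for arbitrary functions f and g. This is a short optional sketch; nothing here depends on a particular choice of f or g:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')(x)
g = sp.Function('g')(x)

lhs = sp.diff(f * g, x)                      # (fg)'
rhs = sp.diff(f, x) * g + f * sp.diff(g, x)  # f'g + fg'

print(sp.simplify(lhs - rhs))  # 0, so the two expressions agree
```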
5. Worked Examples
Example 1: h(x) = x² · sin(x)
Identify the parts:
- f(x) = x² → f'(x) = 2x
- g(x) = sin(x) → g'(x) = cos(x)
Apply the product rule:
h'(x) = 2x · sin(x) + x² · cos(x)
Final answer: h'(x) = 2x·sin(x) + x²·cos(x)
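As a sanity check, a central-difference estimate at a sample point should match the formula. A minimal sketch; the point x₀ = 0.7 is arbitrary:

```python
import numpy as np

# h(x) = x²·sin(x); compare the product-rule answer with a central-difference estimate.
h = lambda x: x**2 * np.sin(x)
h_prime = lambda x: 2 * x * np.sin(x) + x**2 * np.cos(x)

x0, dx = 0.7, 1e-6
numeric = (h(x0 + dx) - h(x0 - dx)) / (2 * dx)
print(numeric, h_prime(x0))  # the two values agree to ~6 decimal places
```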
Example 2: h(x) = eˣ · x
Identify the parts:
- f(x) = eˣ → f'(x) = eˣ
- g(x) = x → g'(x) = 1
Apply the product rule:
h'(x) = eˣ · x + eˣ · 1 = eˣ(x + 1)
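A quick symbolic check with SymPy confirms the factored form (a small optional sketch, not needed for the derivation):

```python
import sympy as sp

x = sp.symbols('x')
derivative = sp.diff(sp.exp(x) * x, x)                # x·eˣ + eˣ
print(sp.simplify(derivative - sp.exp(x) * (x + 1)))  # 0, so it matches eˣ(x + 1)
```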
Example 3: Three Functions (Extended Product Rule)
For h(x) = f(x) · g(x) · k(x), the pattern extends naturally:
h'(x) = f'(x)·g(x)·k(x) + f(x)·g'(x)·k(x) + f(x)·g(x)·k'(x)
Each function takes a turn being differentiated.
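Here is a numerical spot-check of the three-term rule for the concrete choice h(x) = x² · sin(x) · eˣ (a sketch; the test point is arbitrary):

```python
import numpy as np

# h(x) = x² · sin(x) · eˣ and its three-term derivative from the extended rule.
h = lambda x: x**2 * np.sin(x) * np.exp(x)
h_prime = lambda x: (2 * x * np.sin(x) * np.exp(x)    # f' g k
                     + x**2 * np.cos(x) * np.exp(x)   # f g' k
                     + x**2 * np.sin(x) * np.exp(x))  # f g k'

x0, dx = 0.5, 1e-6
numeric = (h(x0 + dx) - h(x0 - dx)) / (2 * dx)
print(numeric, h_prime(x0))  # the two values agree closely
```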
6. Common Mistakes to Avoid
Mistake 1: Multiplying the Derivatives
❌ Wrong: (fg)' = f' · g'
✅ Right: (fg)' = f'g + fg'
Mistake 2: Forgetting One Term
❌ Wrong: (x² · sin(x))' = 2x · cos(x)
✅ Right: (x² · sin(x))' = 2x · sin(x) + x² · cos(x)
Always count your terms. Product rule = two terms. Three functions = three terms.
Mistake 3: Confusing Product Rule with Chain Rule
The product rule handles f(x) · g(x) — two separate functions multiplied.
The chain rule handles f(g(x)) — one function inside another.
These are different situations requiring different rules.
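A short SymPy comparison makes the distinction visible: x² · sin(x) is a product, while sin(x²) is a composition, and their derivatives look quite different (illustrative sketch only):

```python
import sympy as sp

x = sp.symbols('x')
print(sp.diff(x**2 * sp.sin(x), x))  # product rule:  x**2*cos(x) + 2*x*sin(x)
print(sp.diff(sp.sin(x**2), x))      # chain rule:    2*x*cos(x**2)
```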
7. Why This Matters for ML/AI
The product rule isn't just calculus trivia — it's essential for understanding how neural networks learn.
Gating Mechanisms in LSTMs
Long Short-Term Memory networks use gates that multiply signals together. Each gate is a sigmoid-valued factor that scales another signal, for example:
gated output = σ(gate input) · hidden state
When backpropagation computes gradients through this multiplication, it applies the product rule. The gradient flows through both the sigmoid path and the hidden state path.
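As a toy illustration (not the full LSTM equations), consider a single gated multiplication out = σ(a) · h and compute its gradients by hand; the names a, h, and gate are hypothetical placeholders:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy gated multiplication: out = σ(a) · h
a, h = 0.3, 2.0
gate = sigmoid(a)
out = gate * h

# Backward pass with upstream gradient dL/dout = 1.
d_out = 1.0
d_h = gate * d_out                     # gradient along the hidden-state path
d_a = h * d_out * gate * (1.0 - gate)  # gradient along the sigmoid path (chain rule on σ)

print(out, d_h, d_a)
```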
Attention Mechanisms
In attention layers (the heart of Transformers), the output is itself a product:
output = softmax(QKᵀ / √dₖ) · V
The attention weights multiply the values. During training, gradients must flow backward through this product.
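In matrix form the same splitting applies: if the output is weights @ values, the gradient with respect to each factor is the upstream gradient multiplied by the other factor (transposed appropriately). A minimal NumPy sketch, with made-up shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((2, 3))   # attention weights (softmax step omitted in this sketch)
values = rng.random((3, 4))    # value vectors

out = weights @ values         # attention output

d_out = rng.random(out.shape)  # some upstream gradient dL/dout
d_weights = d_out @ values.T   # gradient w.r.t. the weights: the other factor shows up
d_values = weights.T @ d_out   # gradient w.r.t. the values: likewise

print(d_weights.shape, d_values.shape)  # (2, 3) (3, 4)
```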
The Backpropagation Connection
When an automatic differentiation system encounters a multiplication node during backpropagation, it implements the product rule automatically:
Forward:  z = x * y
Backward: ∂L/∂x = y * (∂L/∂z)
          ∂L/∂y = x * (∂L/∂z)
Each input's gradient is the other input times the upstream gradient. This is the product rule in action.
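A minimal sketch of such a multiplication node might look like this (hypothetical class, not any particular framework's API):

```python
class Multiply:
    """Minimal multiplication node, the way an autodiff engine might implement one."""

    def forward(self, x, y):
        self.x, self.y = x, y  # cache each input: it is needed for the other's gradient
        return x * y

    def backward(self, d_z):
        # Product rule: each input's gradient is the other input times the upstream gradient.
        return self.y * d_z, self.x * d_z


node = Multiply()
z = node.forward(3.0, 4.0)  # 12.0
print(node.backward(1.0))   # (4.0, 3.0)
```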
Summary
The Product Rule: When differentiating h(x) = f(x) · g(x), the derivative is h'(x) = f'(x)·g(x) + f(x)·g'(x).
The Rectangle Intuition: Area changes because both dimensions change. The total change is the horizontal strip (f · Δg) plus the vertical strip (g · Δf). The corner vanishes in the limit.
The Mnemonic: "First times derivative of second, plus second times derivative of first."
In ML: Every multiplication in a neural network — gating, attention, weighted losses — requires the product rule during backpropagation. The gradient splits and flows through both multiplicands.