C1 Neural Network Operations

Forward Pass Step-by-Step

Data flowing through layers: input → weighted sum → activation → output. See the actual numbers at every step.

What is a Forward Pass?

Imagine a factory assembly line. Raw materials enter one end, pass through several workstations where they're shaped and modified, and a finished product comes out the other end. A neural network works the same way.

The forward pass is the journey data takes from input to output.

When you give a neural network an image and it tells you "cat," that answer came from a forward pass. When a language model predicts the next word, that prediction came from a forward pass. Every time a neural network produces an output, it has performed a forward pass.

Here's the key insight: a forward pass is just arithmetic. Multiplication, addition, and simple functions—repeated many times. Nothing more mysterious than that.

At each layer, three things happen:

  1. Multiply inputs by weights
  2. Add a bias
  3. Apply an activation function
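In code, those three steps for a single neuron fit in a couple of lines. Here is a minimal Python sketch (the name neuron_output and the activation argument are illustrative, not from any particular library):

def neuron_output(inputs, weights, bias, activation):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))  # 1. multiply inputs by weights and sum
    return activation(weighted_sum + bias)                      # 2. add bias, 3. apply activation

The rest of this lesson walks through these steps with concrete numbers.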

The Single Neuron

A neuron is the basic computing unit of a neural network. Despite the biological name, it's just a small mathematical function. Here's what a single neuron does:

Step 1: Receive Multiple Inputs

A neuron receives several numbers as input. Let's say our neuron receives three inputs: x₁ = 2, x₂ = 3, x₃ = 1

Step 2: Multiply Each Input by a Weight

Every input has an associated weight—a number that determines how much that input matters. Let's say our weights are: w₁ = 0.5, w₂ = -0.3, w₃ = 0.8

We multiply each input by its weight:

  • w₁ × x₁ = 0.5 × 2 = 1.0
  • w₂ × x₂ = -0.3 × 3 = -0.9
  • w₃ × x₃ = 0.8 × 1 = 0.8

Step 3: Sum All Weighted Inputs

Add up all those products:

1.0 + (-0.9) + 0.8 = 0.9

This is the weighted sum. If you remember dot products from our vectors lesson, you'll recognize this: we just computed the dot product of the input vector [2, 3, 1] and the weight vector [0.5, -0.3, 0.8].
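If you have NumPy handy, you can check this in one line (just a quick verification, not required for the lesson):

import numpy as np

np.dot([2, 3, 1], [0.5, -0.3, 0.8])   # 0.9 (up to floating-point rounding)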

Step 4: Add a Bias

The bias is a single number added to the weighted sum. Think of it as the neuron's "baseline" or "offset." Let's say our bias is: b = 0.1

0.9 + 0.1 = 1.0

Step 5: Apply an Activation Function

Finally, we pass this result through an activation function. Using ReLU (Rectified Linear Unit): if the number is negative, output 0; otherwise, output the number unchanged.

ReLU(1.0) = 1.0

This is our neuron's output: 1.0
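Here is the whole five-step calculation as a short Python sketch, assuming NumPy is available (the variable names are just for this example); it reproduces the numbers above:

import numpy as np

x = np.array([2.0, 3.0, 1.0])     # step 1: inputs x1, x2, x3
w = np.array([0.5, -0.3, 0.8])    # weights w1, w2, w3
b = 0.1                           # bias

weighted_sum = np.dot(w, x)       # steps 2-3: 1.0 - 0.9 + 0.8 = 0.9
z = weighted_sum + b              # step 4: 0.9 + 0.1 = 1.0
output = np.maximum(0.0, z)       # step 5: ReLU(1.0) = 1.0
print(output)                     # 1.0 (up to floating-point rounding)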

Why Weights and Biases?

Weights: The Importance Dial

Each weight controls how much its corresponding input influences the output:

  • Large positive weight (like 2.0): "This input is very important—when it goes up, push my output up strongly"
  • Small positive weight (like 0.1): "This input matters a little"
  • Negative weight (like -0.5): "This input matters, but inversely—when it goes up, push my output down"
  • Zero weight: "Ignore this input completely"

Bias: Shifting the Threshold

The bias shifts when the neuron "activates."

  • Positive bias (+2): The neuron is "eager"—it produces output even with weak or zero inputs
  • Negative bias (-2): The neuron is "reluctant"—inputs need to be stronger to overcome this deficit
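A tiny numeric sketch of that effect (the signals and biases here are made up purely for illustration):

def relu(z):
    return max(0.0, z)

weak_signal = 0.5
print(relu(weak_signal + 2.0))    # eager neuron (bias +2): outputs 2.5 even for a weak signal
print(relu(weak_signal - 2.0))    # reluctant neuron (bias -2): outputs 0.0, stays silent
print(relu(3.0 - 2.0))            # a stronger signal (3.0) overcomes the negative bias: 1.0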

Activation Functions: Adding Non-Linearity

Without activation functions, stacking layers would be pointless. Two linear transformations in a row can always be collapsed into a single linear transformation. No matter how many layers you stack, the network could only learn linear relationships.

Activation functions add non-linearity—the ability to learn curves, boundaries, and complex patterns.

Common activation functions:

Function | Formula                | Output Range
ReLU     | max(0, x)              | 0 if negative, x if positive
Sigmoid  | 1/(1 + e⁻ˣ)            | always between 0 and 1
Tanh     | (eˣ - e⁻ˣ)/(eˣ + e⁻ˣ)  | always between -1 and 1
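In code, each of these is a one-liner. A small Python sketch using the standard math module:

import math

def relu(x):
    return max(0.0, x)                    # 0 for negative inputs, x otherwise

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))     # squashes any input into the range (0, 1)

def tanh(x):
    return math.tanh(x)                   # squashes any input into the range (-1, 1)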

From Neuron to Layer

A single neuron is limited—it can only output one number. Real neural networks have layers containing many neurons working in parallel.

Here's the insight that connects to your matrix multiplication knowledge:

  • One neuron: dot product of inputs with weights, add bias
  • Many neurons: matrix multiplication of inputs with weight matrix, add bias vector

output = activation(W · x + b)

This is exactly the matrix multiplication you learned! The weight matrix transforms the input vector into a new vector; then we add the bias vector and apply the activation function to each element.
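As a sketch with NumPy (the function name dense_layer is illustrative; W @ x is NumPy's matrix-vector multiplication):

import numpy as np

def dense_layer(x, W, b, activation):
    return activation(W @ x + b)   # weight matrix times input vector, plus bias, then activation

For a layer with 2 inputs and 3 neurons, W would have shape (3, 2) and b shape (3,); the output is a vector with one number per neuron.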

Complete Worked Example

Let's trace every single number through a complete network:

Network Architecture: 2 inputs → 2 hidden → 1 output

Input: x = [0.5, 0.8]

Layer 1 (Hidden Layer)

Weight matrix and biases:

W₁ = | 0.4   0.3 |      b₁ = | 0.1  |
     | -0.2  0.5 |           | -0.1 |

Step 1.1: Matrix Multiplication

Hidden neuron 1: 0.4 × 0.5 + 0.3 × 0.8 = 0.44
Hidden neuron 2: -0.2 × 0.5 + 0.5 × 0.8 = 0.30
Result: [0.44, 0.30]

Step 1.2: Add Bias

[0.44 + 0.1, 0.30 + (-0.1)] = [0.54, 0.20]

Step 1.3: Apply ReLU

ReLU([0.54, 0.20]) = [0.54, 0.20]  (both positive)

Layer 2 (Output Layer)

W₂ = | 0.6  -0.4 |      b₂ = | 0.2 |

Step 2.1: Matrix Multiplication

0.6 × 0.54 + (-0.4) × 0.20 = 0.324 - 0.08 = 0.244

Step 2.2: Add Bias

0.244 + 0.2 = 0.444

Step 2.3: Apply Sigmoid

sigmoid(0.444) = 1 / (1 + e^(-0.444)) ≈ 0.609

Final Output: 0.609
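The whole trace can be reproduced in a few lines of NumPy. This is a sketch: the variable names mirror the matrices above, and the printed values match the hand calculation up to rounding:

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([0.5, 0.8])                   # input
W1 = np.array([[0.4, 0.3], [-0.2, 0.5]])    # hidden-layer weights
b1 = np.array([0.1, -0.1])                  # hidden-layer biases
W2 = np.array([[0.6, -0.4]])                # output-layer weights
b2 = np.array([0.2])                        # output-layer bias

h = relu(W1 @ x + b1)       # hidden activations: [0.54, 0.20]
y = sigmoid(W2 @ h + b2)    # final output: [0.609]
print(h, y)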

Why This Matters for ML/AI

1. Prediction = One Forward Pass

When you use a trained model, you're running forward passes. Every ChatGPT response, every image classification, every recommendation—it's forward passes producing outputs.

2. Inference Time = Sum of Layer Computations

The time to get a prediction depends on: number of layers (depth), size of each layer (width), and complexity of operations. Bigger matrices = more multiplications = slower inference.

3. Forward Pass is Prerequisite for Backpropagation

Training a neural network requires backpropagation—computing how to adjust weights to reduce errors. But backprop needs the forward pass first:

  1. Forward pass: compute output
  2. Calculate error: compare output to correct answer
  3. Backward pass: trace error back through layers to compute weight updates

Key Takeaways

  1. A forward pass is data flowing through layers from input to output
  2. Each neuron computes: weighted sum + bias + activation
  3. Weights determine importance of each input; biases shift thresholds
  4. Activation functions add non-linearity—without them, stacking layers is pointless
  5. Matrix multiplication handles entire layers at once—it's not just convenient, it's computationally efficient
  6. The entire process is just arithmetic: multiply, add, apply simple functions, repeat

In your visualization, you'll see these numbers flowing through the network. Watch how different weights create different transformations. Notice how negative weighted sums become zero after ReLU. See the data reshape itself at each layer until a final answer emerges.

Next: C2 — Backpropagation: How Networks Learn from Their Mistakes