When a matrix transforms space, most vectors get twisted around in complicated ways. They rotate, stretch, and end up pointing in completely different directions. But hidden in every matrix are special vectors that refuse to rotate. They might stretch or shrink, they might even flip backwards, but they stubbornly stay on the same line.
These special vectors are eigenvectors, and the amount they stretch is their eigenvalue. Understanding them unlocks deep insights into what matrices actually do, and they're fundamental to techniques like PCA that you'll use constantly in machine learning.
1. The Key Intuition
Imagine applying a matrix transformation to every vector in 2D space. Picture the whole grid warping, stretching, rotating. Most arrows get completely reoriented.
But look carefully. There are usually one or two directions where arrows stay pointing the same way. They might get longer or shorter, but they don't spin to a new angle. These are the eigenvector directions.
Think of it this way: if you apply a transformation to an eigenvector, you get the same vector back, just scaled up or down. The transformation can't knock it off its line.
Regular vector: A × v = w (w points somewhere different from v)
Eigenvector: A × v = λv (result is just v scaled by λ)
The transformation looks at an eigenvector and says, "I can only make you bigger or smaller, I can't turn you."
2. Definition Made Visual
The formal definition is elegantly simple:
Av = λv
In words: when matrix A transforms vector v, the result is v scaled by factor λ.
Here, v is an eigenvector and λ (lambda) is its corresponding eigenvalue.
Let's unpack what this means geometrically. Take a transformation matrix and a candidate vector. Apply the matrix to the vector. If the output is pointing in the same direction (or exactly opposite), you've found an eigenvector. The ratio of the output length to input length is the eigenvalue.
// Check: is v an eigenvector of the 2x2 matrix A?
function isEigenvector(A: number[][], v: [number, number], eps = 1e-9): boolean {
  // Apply the transformation: result = A * v
  const result = [
    A[0][0] * v[0] + A[0][1] * v[1],
    A[1][0] * v[0] + A[1][1] * v[1],
  ];
  // Parallel to v (same or opposite direction) exactly when the 2D cross product is zero
  return Math.abs(result[0] * v[1] - result[1] * v[0]) < eps;
}
Most vectors fail this test. They get rotated. Only the special eigenvector directions pass.
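Here's the same test with actual numbers, as a minimal NumPy sketch. It uses the matrix [[4, 2], [1, 3]] that reappears in the next section; [2, 1] happens to be one of its eigenvectors, while [1, 0] is not.
import numpy as np

A = np.array([[4, 2], [1, 3]])

def is_eigenvector(A, v, eps=1e-9):
    result = A @ v
    # The 2D cross product is ~0 exactly when result is parallel to v
    return abs(result[0] * v[1] - result[1] * v[0]) < eps

print(is_eigenvector(A, np.array([1, 0])))  # False: A rotates this vector to [4, 1]
print(is_eigenvector(A, np.array([2, 1])))  # True: A @ [2, 1] = [10, 5] = 5 * [2, 1]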
3. Finding Eigenvectors (The High-Level Process)
You don't need to compute eigenvalues by hand in practice (libraries do it), but understanding the process clarifies what's happening.
Step 1: Find the eigenvalues
We want Av = λv, which rearranges to:
Av - λv = 0
(A - λI)v = 0
This equation says: the matrix (A - λI) transforms v to the zero vector. For this to happen with a non-zero v, the matrix (A - λI) must be "singular" (it collapses space). That happens when its determinant is zero:
det(A - λI) = 0
This gives us a polynomial equation in λ. For a 2×2 matrix, it's a quadratic with (usually) two solutions. For an n×n matrix, it's a degree-n polynomial with n solutions (counting complex numbers and multiplicities).
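For a 2×2 matrix the characteristic polynomial works out to λ² - trace(A)·λ + det(A) = 0, so the eigenvalues fall out of a root solver. A minimal sketch, using the same example matrix as the NumPy snippet below:
import numpy as np

A = np.array([[4, 2], [1, 3]])

# Characteristic polynomial of a 2x2 matrix: lambda^2 - trace*lambda + det = 0
trace = np.trace(A)       # 7
det = np.linalg.det(A)    # 10
print(np.roots([1, -trace, det]))  # approximately [5. 2.] (order may vary)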
Step 2: Find the eigenvectors
For each eigenvalue λ, solve:
(A - λI)v = 0
This is a system of linear equations. The solutions are the eigenvectors corresponding to that eigenvalue.
# In practice, use numpy
import numpy as np
A = np.array([[4, 2], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues) # [5. 2.]
print("Eigenvectors:\n", eigenvectors) # columns are eigenvectors
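It's worth checking the defining equation directly: every column of eigenvectors should satisfy Av = λv. A quick sketch, continuing from the snippet above:
# Verify A @ v == lambda * v for each eigenpair
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]              # i-th eigenvector (a column)
    lam = eigenvalues[i]                # its eigenvalue
    print(np.allclose(A @ v, lam * v))  # True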
4. Visual Examples
Let's build intuition by examining specific matrices. Your Three.js visualization will make these transformations tangible.
Scaling Matrix: The Easy Case
const scaleMatrix = [
  [3, 0],
  [0, 2]
];
This scales x by 3 and y by 2. The eigenvectors are obvious: the x-axis direction [1, 0] and y-axis direction [0, 1]. Why? Because scaling along an axis doesn't rotate vectors on that axis.
Eigenvalues: λ₁ = 3, λ₂ = 2
Eigenvectors: v₁ = [1, 0], v₂ = [0, 1]
The eigenvectors are the axes, and eigenvalues are the scale factors. Diagonal matrices are the simplest case.
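A quick NumPy check, as a sketch (the eigenvectors may come back in a different order or with flipped signs, which is still the same direction):
import numpy as np

scale = np.array([[3, 0], [0, 2]])
vals, vecs = np.linalg.eig(scale)
print(vals)  # [3. 2.]
print(vecs)  # identity matrix: the eigenvectors are the coordinate axes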
Rotation Matrix: No Real Eigenvectors
const rotate90 = [
  [0, -1],
  [1, 0]
];
This rotates everything 90°. No vector keeps pointing the same direction after a 90° rotation. Every vector gets turned.
For pure rotation, there are no real eigenvectors. (Technically, there are complex eigenvectors, but let's stay in the real world for intuition.)
This is important: not every matrix has real eigenvectors. Matrices with significant rotation components may only have complex eigenvalues.
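You can see this numerically, as a sketch: NumPy reports purely imaginary eigenvalues for the 90° rotation.
import numpy as np

rotate90 = np.array([[0, -1], [1, 0]])
vals, vecs = np.linalg.eig(rotate90)
print(vals)  # [0.+1.j 0.-1.j] -- purely imaginary, so no real eigenvector directions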
Shear Matrix: One Eigenvector
const shearX = [
  [1, 1],
  [0, 1]
];
This shears space horizontally. Think of a rectangle becoming a parallelogram.
Vectors along the x-axis don't move at all: with y = 0, the shear maps (x, 0) straight back to (x, 0). The x-axis direction [1, 0] is an eigenvector with eigenvalue 1.
But what about the y direction? Vectors pointing up get pushed sideways, so [0, 1] is not an eigenvector.
For shear matrices, there's only one linearly independent eigenvector direction, and its eigenvalue is 1 (no scaling).
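NumPy confirms the repeated eigenvalue; a quick sketch (the second reported eigenvector is numerically almost identical to the first, so there's really only one independent direction):
import numpy as np

shear_x = np.array([[1, 1], [0, 1]])
vals, vecs = np.linalg.eig(shear_x)
print(vals)  # [1. 1.] -- eigenvalue 1, repeated
print(vecs)  # both columns lie (essentially) along [1, 0]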
Symmetric Matrix: Orthogonal Eigenvectors
const symmetric = [
  [2, 1],
  [1, 2]
];
Symmetric matrices (where A = Aᵀ) are special. Their eigenvectors are always perpendicular (orthogonal) to each other, and eigenvalues are always real.
For this matrix:
- λ₁ = 3 with eigenvector [1, 1] (normalized: [0.707, 0.707])
- λ₂ = 1 with eigenvector [1, -1] (normalized: [0.707, -0.707])
These eigenvectors are at 90° to each other. Symmetric matrices are incredibly important in ML because covariance matrices are symmetric, and this orthogonality property is why PCA works so elegantly.
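For symmetric matrices, np.linalg.eigh is the specialized routine to reach for; here's a sketch that also checks the orthogonality claim:
import numpy as np

symmetric = np.array([[2, 1], [1, 2]])
vals, vecs = np.linalg.eigh(symmetric)  # eigh: for symmetric/Hermitian matrices
print(vals)                             # [1. 3.] (eigh returns ascending order)
print(np.dot(vecs[:, 0], vecs[:, 1]))   # ~0.0: the eigenvectors are orthogonal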
5. Eigenvalue Meaning
The eigenvalue tells you what happens to vectors along the eigenvector direction:
λ > 1: Stretching
Vectors get longer. An eigenvalue of 3 means vectors triple in length. In PCA terms, this direction has high variance.
0 < λ < 1: Compression
Vectors get shorter. An eigenvalue of 0.5 halves vector lengths. This direction is being squeezed.
λ = 1: Unchanged
Vectors keep their length. The transformation preserves this direction completely.
λ < 0: Flip and Scale
Vectors reverse direction and scale. An eigenvalue of -2 means flip and double length. Watch the visualization carefully to see this flip.
λ = 0: Collapse
This is critical. An eigenvalue of 0 means vectors in that direction get crushed to zero. The matrix loses a dimension. It's not invertible (you can't undo a collapse). In ML, this indicates redundancy or degeneracy in your data.
# Quick eigenvalue interpretation
eigenvalues = [3.2, 1.0, 0.5, 0.001, -2.0, 0.0]
for i, ev in enumerate(eigenvalues):
    if ev > 1:
        print(f"Direction {i}: expands by {ev}x")
    elif ev == 1:
        print(f"Direction {i}: unchanged")
    elif ev > 0:
        print(f"Direction {i}: compresses to {ev}x")
    elif ev == 0:
        print(f"Direction {i}: collapses (matrix is singular)")
    else:
        print(f"Direction {i}: flips and scales by {abs(ev)}x")
6. Why This Matters for ML/AI
Eigenvalues and eigenvectors aren't just mathematical curiosities. They're workhorses in machine learning.
PCA: Principal Component Analysis
PCA is the most direct application. You have high-dimensional data and want to reduce dimensions while preserving the most important information.
The algorithm:
- Compute the covariance matrix of your data
- Find its eigenvectors and eigenvalues
- Sort by eigenvalue (largest first)
- The top eigenvectors are your principal components
Why does this work? The covariance matrix captures how features vary together. Its eigenvectors point in the directions of maximum variance. The eigenvalues tell you how much variance each direction captures.
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 200 samples with 100 features
high_dim_data = np.random.randn(200, 100)

# Under the hood, this computes eigenvectors of the covariance matrix
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(high_dim_data)
# pca.components_ are the eigenvectors (principal directions)
# pca.explained_variance_ corresponds to the eigenvalues
The eigenvalue tells you the importance of each principal component. If the first two eigenvalues are much larger than the rest, your data is effectively 2D even if you started with 100 features.
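To tie this back to the eigen-machinery, here's a by-hand sketch of the same algorithm: center the data, eigen-decompose the covariance matrix, and project onto the top directions. (The random high_dim_data here is just a stand-in with the same shape as above.)
import numpy as np

def pca_by_hand(X, n_components=2):
    # 1. Center the data
    X_centered = X - X.mean(axis=0)
    # 2. Covariance matrix of the features
    cov = np.cov(X_centered.T)
    # 3. Eigen-decomposition (eigh, since covariance matrices are symmetric)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort by eigenvalue, largest first, and keep the top components
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    # 5. Project the data onto the principal directions
    return X_centered @ components

high_dim_data = np.random.randn(200, 100)  # stand-in data
reduced = pca_by_hand(high_dim_data)       # shape (200, 2)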
Gradient Descent Stability
When analyzing neural network optimization, the eigenvalues of the Hessian matrix (second derivatives of the loss) tell you about the loss surface.
- Large eigenvalues: steep directions, need small learning rate
- Small eigenvalues: flat directions, training is slow
- Negative eigenvalues: you're at a saddle point, not a minimum
- All positive eigenvalues: you're at a local minimum
The condition number (ratio of largest to smallest eigenvalue) indicates how "elongated" the loss surface is. A high condition number means optimization is difficult because different directions need different step sizes.
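As a sketch of how you'd measure this, here's the condition number of a toy Hessian with one steep and one flat direction:
import numpy as np

# Toy Hessian of a quadratic loss: one steep direction, one flat direction
H = np.array([[100.0, 0.0],
              [0.0, 1.0]])

eigs = np.linalg.eigvalsh(H)    # eigvalsh: Hessians are symmetric
print(eigs.max() / eigs.min())  # 100.0 -- elongated loss surface, hard to optimize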
SVD: Singular Value Decomposition
SVD is closely related to eigen-decomposition and arguably more important in practice. For any matrix M (not just square ones):
M = UΣVᵀ
The singular values (diagonal of Σ) are the square roots of the eigenvalues of MᵀM. SVD powers:
- Recommender systems (Netflix prize winning solution)
- Latent semantic analysis (topic modeling)
- Image compression
- Pseudoinverse computation (solving overdetermined systems)
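Tying this back to eigenvalues: the singular values of M really are the square roots of the eigenvalues of MᵀM, which is easy to check numerically. A minimal sketch:
import numpy as np

M = np.random.randn(5, 3)                  # any rectangular matrix
U, S, Vt = np.linalg.svd(M, full_matrices=False)

# Eigenvalues of M^T M (symmetric, so eigvalsh), sorted largest first
eigs = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]
print(np.allclose(S, np.sqrt(eigs)))       # True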
7. Connection to PCA: Directions of Maximum Variance
Let's make the PCA connection crystal clear.
Your data points form a cloud in high-dimensional space. That cloud has a shape. Maybe it's elongated like a cigar, stretched more in some directions than others.
The eigenvectors of the covariance matrix point in the directions where the cloud stretches most. The eigenvalues tell you how much it stretches.
Visualize a 2D cloud of points shaped like an ellipse. The covariance matrix has two eigenvectors: one along the long axis of the ellipse, one along the short axis. The eigenvalue for the long axis is larger because there's more variance (spread) in that direction.
When you do PCA and keep only the top principal component, you're projecting your data onto the long axis of the ellipse, capturing the direction of maximum spread.
# Visual intuition
import numpy as np

# Generate elliptical (correlated) data
np.random.seed(42)
data = np.random.randn(100, 2) @ np.array([[3, 1], [1, 2]])  # stretched ellipse

# Compute the 2x2 covariance matrix
cov = np.cov(data.T)

# eigh is the right tool for a symmetric matrix; sort so the largest eigenvalue comes first
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# eigenvectors[:, 0] points along the major axis (largest eigenvalue)
# eigenvectors[:, 1] points along the minor axis (smallest eigenvalue)
This is why PCA is so effective for dimensionality reduction. It finds the directions that matter most (highest variance) and lets you discard the rest with minimal information loss.
Wrapping Up
Eigenvectors are the "natural directions" of a transformation. They reveal the structure hidden inside a matrix, showing you which directions matter and by how much.
For your Three.js visualization, show arrows before and after transformation. Most arrows rotate and stretch. But the eigenvector arrows? They only stretch (or flip), stubbornly staying on their line. That visual will cement the intuition better than any formula.
The eigenvalue is the stretch factor. Large eigenvalues mean important directions; small eigenvalues mean directions you might safely ignore. Zero eigenvalues mean the matrix is collapsing space.
In machine learning, this translates directly to variance in your data, stability of your optimization, and the spectral properties of graphs. Master eigenvectors and you've mastered a tool that appears across all of quantitative computing.
Next, we'll apply these ideas to build and train our first neural network, where matrix transformations (learned as weights) chain together to create powerful function approximators.