Discovering PCA: An Interactive Geometric Exploration

Principal Component Analysis (PCA) is often introduced through linear algebra and matrix computations. We can also approach PCA geometrically: it finds new axes that best capture the variation in data. By experimenting with different directions, we can discover why PCA chooses the particular axes it does.
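The idea of experimenting with different directions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the interactive tool: the data set is a made-up correlated cloud, and the helper name projected_variance is hypothetical. The sketch tries many candidate directions, keeps the one with the largest projected variance, and compares it with the leading eigenvector of the covariance matrix.

```python
import numpy as np

# Hypothetical 2D data set (not from the tool): a correlated Gaussian cloud.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.5], [1.5, 1.0]], size=500)
Xc = X - X.mean(axis=0)  # center the data so every candidate line passes through the mean


def projected_variance(points, theta):
    """Variance of the points projected onto the unit direction at angle theta."""
    direction = np.array([np.cos(theta), np.sin(theta)])
    return np.var(points @ direction)


# Try many candidate directions between 0 and 180 degrees.
angles = np.linspace(0.0, np.pi, 1801)
variances = np.array([projected_variance(Xc, t) for t in angles])
best_angle = angles[np.argmax(variances)]

# Compare with the leading eigenvector of the covariance matrix.
cov = np.cov(Xc, rowvar=False, bias=True)  # bias=True divides by n, matching np.var
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues come back in ascending order
pc1 = eigvecs[:, -1]
pc1_angle = np.arctan2(pc1[1], pc1[0]) % np.pi  # the sign of a direction is arbitrary

print(f"best angle by trial: {np.degrees(best_angle):.1f} degrees")
print(f"angle of pc1:        {np.degrees(pc1_angle):.1f} degrees")
```

The two printed angles should agree up to the resolution of the angle grid, which suggests that the direction found by trial and error is the one PCA selects as its first axis.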

This interactive tool lets you explore PCA step by step. First, you can try drawing your own lines and see how well they capture the spread of the data. Then, you can project the data onto the principal components and compare the result with projections onto the original axes. In the end, you’ll see how PCA emerges as a powerful method for reducing the dimensionality of data.

PCA Geometry Explorer

Let’s uncover the hidden logic of PCA!

Window 1: Scatterplot with Adjustable Lines
Window 2: Projection on PC Plane
Window 3: Variation Explained by Each Axis

To guide your exploration, reflect on these questions as you interact with the PCA Geometry Explorer:

  1. What does the meter measure in Window 1? How can you optimize it when adjusting the first and second lines? (Hint: Think about the distances of each data point to the chosen line(s) and how small or large those distances are overall.)
  2. In Window 2, how do the projections on pc1 and pc2 look compared to projections on the x and y axes? Which axes capture more spread?
  3. In Window 3, compare variation explained by x, y, pc1, and pc2. Why do you think pc1 explains the most variation, while pc2 explains less?
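The hint in question 1 can be explored numerically. The sketch below assumes that the meter reports the total squared perpendicular distance from the points to a line through the mean; the tool may define its meter differently, and the function name line_meter and the sample data are hypothetical. For each candidate angle it computes both the squared distances to the line and the squared spread along the line.

```python
import numpy as np

# Hypothetical 2D data set (not from the tool): a correlated Gaussian cloud.
rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.5], [1.5, 1.0]], size=400)
Xc = X - X.mean(axis=0)  # center the data so the line passes through the mean


def line_meter(points, theta):
    """Assumed meter: (sum of squared distances to the line, sum of squared spread along it)."""
    direction = np.array([np.cos(theta), np.sin(theta)])
    along = points @ direction                      # coordinate of each point along the line
    residual = points - np.outer(along, direction)  # perpendicular offset from the line
    return float(np.sum(residual**2)), float(np.sum(along**2))


total = float(np.sum(Xc**2))  # fixed by the data, independent of the chosen line
readings = {}
for degrees in (0, 30, 60, 90):
    dist_sq, spread_sq = line_meter(Xc, np.radians(degrees))
    readings[degrees] = (dist_sq, spread_sq)
    print(f"{degrees:3d} deg  distance^2 = {dist_sq:8.1f}  "
          f"spread^2 = {spread_sq:8.1f}  sum = {dist_sq + spread_sq:8.1f}")
```

By the Pythagorean theorem the two quantities always add up to the same total, so under this assumption the line that minimizes the distances is also the line that maximizes the spread of the projections.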

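What does PCA do geometrically, and why is the first principal component the most important? PCA finds new, mutually perpendicular axes aligned with the main directions of spread in the data. The first component is the direction along which the projected data has the largest variance, so it captures more variation than any other single axis. In two dimensions, the second component is perpendicular to the first and captures the remaining variation.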
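These claims can be checked with a short Python sketch. It is a minimal illustration on made-up data, not the tool's implementation; it obtains the principal components from an eigendecomposition of the sample covariance matrix and reports the share of total variance along x, y, pc1, and pc2.

```python
import numpy as np

# Hypothetical 2D data set (not from the tool): a correlated Gaussian cloud.
rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.5], [1.5, 1.0]], size=400)
Xc = X - X.mean(axis=0)

cov = np.cov(Xc, rowvar=False)           # 2x2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
pc2, pc1 = eigvecs[:, 0], eigvecs[:, 1]  # so the last column is the first component

total_var = np.trace(cov)  # total variance is the same in any rotated coordinate system
explained = {
    "x": cov[0, 0] / total_var,
    "y": cov[1, 1] / total_var,
    "pc1": eigvals[1] / total_var,
    "pc2": eigvals[0] / total_var,
}
for axis, share in explained.items():
    print(f"{axis:>3}: {share:.1%} of total variance")

print("pc1 . pc2 =", round(float(pc1 @ pc2), 12))  # perpendicular axes have zero dot product
```

The shares for x and y sum to 100%, and so do the shares for pc1 and pc2. The rotation only redistributes the same total variance so that as much of it as possible falls on the first axis.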