Manifold Learning, MDS, and PCA
π Manifold Learning, MDS, and PCA
Unsupervised Learning β Dimensionality Reduction
Lecture-style notes with full mathematical derivations β¨
π Data Manifold
What is a Manifold?
- A manifold is a low-dimensional surface embedded in a high-dimensional space.
- Although data lives in a high-dimensional space, it often occupies only a small subspace.
Example: MNIST
- Each image: 28 Γ 28 = 784-dimensional
- Total possible images: \(256^{28 \times 28}\)
- In reality:
- Digits concentrate around 10 semantic clusters (0β9)
- Data lies on low-dimensional manifolds
β‘οΈ High dimension β high intrinsic dimension
π§ Manifold Learning
Key Idea
Machine learning often tries to rediscover the hidden low-dimensional manifold.
Formally:
Given data: \(x^{(1)}, x^{(2)}, \dots, x^{(N)} \in \mathbb{R}^D\)
Find an embedding: \(f(x^{(i)}) \in \mathbb{R}^d, \quad d \ll D\)
such that spatial relationships are approximately preserved.
π Multidimensional Scaling (MDS)
Objective
Given a dissimilarity measure \(\rho(x^{(i)}, x^{(j)})\), find embedded points \(f(x^{(i)})\) minimizing distortion:
\[\sum_{i < j} R\Big( \| f(x^{(i)}) - f(x^{(j)}) \|, \rho(x^{(i)}, x^{(j)}) \Big)\]π Principal Component Analysis (PCA)
PCA as a Special Case of MDS
- \[\rho(x^{(i)}, x^{(j)}) = \|x^{(i)} - x^{(j)}\|_2\]
- \[R(a,b) = (a - b)^2\]
β‘οΈ PCA maximally preserves variance
βοΈ PCA Algorithm
Step 1: Zero-centering
\(\bar{x} = \frac{1}{N}\sum_{i=1}^N x^{(i)}, \quad \tilde{x}^{(i)} = x^{(i)} - \bar{x}\)
Step 2: Covariance
\(\hat{\Sigma} = \frac{1}{N}\sum_{i=1}^N \tilde{x}^{(i)}\tilde{x}^{(i)T}\)
Step 3: Eigen Decomposition
\(\hat{\Sigma} = U\Lambda U^T\)
Step 4: Projection
\(z^{(i)} = U_k^T \tilde{x}^{(i)}\)
π§ Summary
- Manifold learning uncovers intrinsic structure
- MDS preserves pairwise distances
- PCA preserves variance optimally