Scikit-learn Cheatsheet: Manifold Learning

Manifold learning is an approach to non-linear dimensionality reduction. It assumes that data lies along a low-dimensional “manifold” embedded in high-dimensional space.

Key Algorithms

  1. TSNE (t-distributed Stochastic Neighbor Embedding):
    • Most popular for visualization. Keeps similar points together and dissimilar points apart.
    • Note: Not for feature engineering (output doesn’t preserve global distances or scale).
  2. Isomap:
    • Seeks a low-dimensional embedding that maintains “geodesic distances” between all points.
  3. LLE (Locally Linear Embedding):
    • Recovers global structure from locally linear fits.
  4. MDS (Multidimensional Scaling):
    • Aims to preserve the distances between points as much as possible.
  5. SpectralEmbedding:
    • Uses an eigendecomposition of the graph Laplacian to find a low-dimensional representation.
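A minimal sketch of how the neighbor-graph estimators above are used in practice, on scikit-learn's S-curve toy dataset (the parameter values here are illustrative, not tuned):

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding

# A 3-D "S"-shaped sheet: 500 points lying on a 2-D manifold.
X, color = make_s_curve(n_samples=500, random_state=0)

estimators = {
    "Isomap": Isomap(n_neighbors=10, n_components=2),
    "LLE": LocallyLinearEmbedding(n_neighbors=10, n_components=2),
    "Spectral": SpectralEmbedding(n_components=2, n_neighbors=10),
}

# Each estimator "unrolls" the sheet into an (n_samples, 2) array.
results = {name: est.fit_transform(X) for name, est in estimators.items()}

for name, X_2d in results.items():
    print(name, X_2d.shape)
```

All of these share the same fit_transform interface, so they can be swapped in and out when exploring a dataset.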

Best Practices

  • Standardize features (e.g. StandardScaler) before embedding; distance-based methods are sensitive to feature scale.
  • For very high-dimensional data, reduce to ~50 dimensions with PCA first; this speeds up t-SNE and suppresses noise.
  • Results are sensitive to n_neighbors (Isomap, LLE, SpectralEmbedding) and perplexity (t-SNE); try several values.
  • Only Isomap and LocallyLinearEmbedding implement transform() for unseen samples; TSNE, MDS, and SpectralEmbedding only offer fit_transform().
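One practical difference worth knowing: only some manifold estimators can embed unseen samples. A short sketch (parameter values are illustrative) using Isomap, which, unlike TSNE, implements transform():

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)
X_train, X_test = train_test_split(X, random_state=0)

# Fit the scaler and the embedding on training data only.
scaler = StandardScaler().fit(X_train)
iso = Isomap(n_neighbors=5, n_components=2)
X_train_2d = iso.fit_transform(scaler.transform(X_train))

# Isomap (and LocallyLinearEmbedding) can project new samples;
# TSNE, MDS, and SpectralEmbedding would have to be refit instead.
X_test_2d = iso.transform(scaler.transform(X_test))
```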

Computational Complexity

  • Most manifold learners build a pairwise-distance matrix or a k-neighbors graph, so they scale roughly quadratically (or worse) with the number of samples.
  • TSNE defaults to method='barnes_hut' (approximately O(n log n)); method='exact' is O(n^2) and only practical for small datasets.
  • For large datasets, subsample or pre-reduce dimensionality (PCA/TruncatedSVD) before embedding.
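A small sketch of TSNE's method parameter, which governs its complexity: 'barnes_hut' (the default) is an approximation running in roughly O(n log n), while 'exact' computes all pairwise gradients in O(n^2). The data and parameter values here are illustrative:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X_small = rng.normal(size=(200, 10))  # small synthetic dataset

# Default: Barnes-Hut approximation, roughly O(n log n) per iteration.
X_bh = TSNE(n_components=2, method="barnes_hut", perplexity=20,
            random_state=0).fit_transform(X_small)

# Exact gradients: O(n^2) per iteration, feasible only for small n.
X_exact = TSNE(n_components=2, method="exact", perplexity=20,
               random_state=0).fit_transform(X_small)

print(X_bh.shape, X_exact.shape)
```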

Code Snippet: t-SNE for Visualization

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

X, y = load_digits(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# 1. Apply t-SNE (n_iter was renamed to max_iter in scikit-learn 1.5)
tsne = TSNE(n_components=2, perplexity=30, max_iter=1000, random_state=42)
X_embedded = tsne.fit_transform(X_scaled)

# 2. Plot the 2-D embedding, colored by digit class
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, cmap='tab10')
plt.colorbar(label='digit class')
plt.show()

Credits: This cheatsheet is based on the scikit-learn documentation and examples, which are licensed under the BSD 3-Clause License. Copyright (c) 2007 - 2026 The scikit-learn developers. All rights reserved.