
Scikit-learn Cheatsheet: Decomposition

Matrix decomposition techniques are used for dimensionality reduction, feature extraction, and signal separation.

What can be done?

Dimensionality Reduction: Reduce the number of features while preserving as much variance as possible (PCA).
Feature Extraction: Identify latent patterns (NMF, Factor Analysis).
Blind Source Separation: Separate mixed signals (ICA).
Denoising: Project data onto the leading principal components and map it back, discarding low-variance (mostly noise) directions.
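A minimal sketch of PCA denoising, assuming Gaussian noise is added artificially to the digits data so the result can be compared with the clean images. The noise level (`scale=4.0`) and the choice of 20 components are illustrative assumptions, not recommended values.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
X_noisy = X + rng.normal(scale=4.0, size=X.shape)  # artificial Gaussian noise

# Keep the 20 leading components, then map back to pixel space
pca = PCA(n_components=20).fit(X_noisy)
X_denoised = pca.inverse_transform(pca.transform(X_noisy))

# Compare both versions against the clean data
mse_noisy = np.mean((X_noisy - X) ** 2)
mse_denoised = np.mean((X_denoised - X) ** 2)
print(f"MSE noisy: {mse_noisy:.2f}, MSE denoised: {mse_denoised:.2f}")
```

Most of the noise lives in the discarded directions, so the reconstruction is closer to the clean images than the noisy input is.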

Key Algorithms

PCA (Principal Component Analysis):
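- Linearly transforms data to a new coordinate system of orthogonal axes (Principal Components), ordered by the variance they capture.
- IncrementalPCA: Fits in mini-batches via `partial_fit`, so data that does not fit in memory can be processed (out-of-core).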
- KernelPCA: Uses kernels to find non-linear principal components.
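A short sketch of both PCA variants. For IncrementalPCA the "batches" are simply slices of an in-memory array, standing in for chunks read from disk. For KernelPCA the concentric-circles dataset and `gamma=10` are illustrative assumptions chosen because linear PCA cannot unfold that structure.

```python
import numpy as np
from sklearn.datasets import load_digits, make_circles
from sklearn.decomposition import IncrementalPCA, KernelPCA

# IncrementalPCA: feed the data chunk by chunk with partial_fit
X, _ = load_digits(return_X_y=True)
ipca = IncrementalPCA(n_components=10)
for batch in np.array_split(X, 10):  # each chunk must have >= n_components rows
    ipca.partial_fit(batch)
X_ipca = ipca.transform(X)
print(X_ipca.shape)  # (1797, 10)

# KernelPCA: an RBF kernel finds non-linear components for two nested circles
X_circ, y_circ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X_circ)
```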
NMF (Non-negative Matrix Factorization):
- All values in the decomposition are non-negative.
- Ideal for part-based representations (e.g., topics in text, parts of faces).
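To illustrate the part-based idea, this sketch factorizes the (non-negative) digits pixels as `X ~ W @ H`. The number of parts (16) and the `nndsvda` initialization are assumptions for the example; `H` holds the non-negative basis images and `W` the non-negative weights that combine them additively.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF

X, _ = load_digits(return_X_y=True)  # pixel intensities 0..16, all non-negative

nmf = NMF(n_components=16, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)  # per-sample weights, shape (n_samples, 16)
H = nmf.components_       # non-negative "parts" (basis images), shape (16, 64)

# Each digit is approximated as an additive combination of parts
X_approx = W @ H
rel_err = np.linalg.norm(X - X_approx) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```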
FastICA (Independent Component Analysis):
- Finds components that are maximally independent (not just uncorrelated).
- Famous for the “Cocktail Party Problem” (separating voices).
DictionaryLearning:
- Learns a “dictionary” (basis vectors) such that data can be represented as sparse combinations of these vectors.
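A minimal sketch of dictionary learning plus sparse coding on a subset of the digits data. The subset size, 32 atoms, `max_iter=20`, and the choice of Orthogonal Matching Pursuit limited to 5 atoms per sample are assumptions made to keep the example small and fast.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import DictionaryLearning

X, _ = load_digits(return_X_y=True)
X_sub = X[:300]  # small subset to keep fitting fast

dl = DictionaryLearning(
    n_components=32,              # number of dictionary atoms
    max_iter=20,
    transform_algorithm="omp",    # sparse coding via Orthogonal Matching Pursuit
    transform_n_nonzero_coefs=5,  # at most 5 atoms per sample
    random_state=0,
)
dl.fit(X_sub)
D = dl.components_                # learned dictionary, shape (32, 64)
codes = dl.transform(X_sub)       # sparse codes, shape (300, 32)

# Largest number of atoms used by any sample (bounded by 5)
print(np.count_nonzero(codes, axis=1).max())
X_approx = codes @ D              # each sample as a sparse combination of atoms
```

`transform` is called separately after `fit` so that the codes come from the configured `transform_algorithm`.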

Theoretical Background

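SVD: PCA is computed from the Singular Value Decomposition of the mean-centered data matrix; the right singular vectors are the principal axes. Most PCA variants are built on an exact, randomized, or incremental SVD.
Variance Explained: The fraction of total variance captured by each component (`explained_variance_ratio_` in scikit-learn); its cumulative sum is the usual criterion for deciding how many components to keep.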
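The link between SVD, PCA, and explained variance can be checked numerically. This sketch centers the digits data, runs NumPy's SVD, and compares the result with `PCA(svd_solver="full")`. Axes are compared only for the top 10 components and up to sign, since SVD fixes singular vectors only up to sign and the trailing near-zero singular values of this dataset are not unique.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# PCA via SVD of the mean-centered data matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

pca = PCA(svd_solver="full").fit(X)
same_s = np.allclose(s, pca.singular_values_)
same_axes = np.allclose(np.abs(Vt[:10]), np.abs(pca.components_[:10]), atol=1e-6)

# Explained variance ratio: s_i^2 / sum_j s_j^2
ratio = s**2 / np.sum(s**2)
same_ratio = np.allclose(ratio, pca.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio exceeds 95%
k = int(np.searchsorted(np.cumsum(ratio), 0.95, side="right") + 1)
print(same_s, same_axes, same_ratio, k)
```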

Computational Complexity

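PCA: $O(\min(n^2 p, n p^2))$ for the exact (full SVD) solver, where $n$ is the number of samples and $p$ the number of features. The randomized solver (`svd_solver='randomized'`) approximates only the leading components and is much faster when few components are needed from high-dimensional data.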
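A quick check that the randomized solver approximates the exact one well when only a few components are requested. Keeping 10 components of the digits data is an illustrative assumption; timings are not shown because they depend on the machine.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

exact = PCA(n_components=10, svd_solver="full").fit(X)
approx = PCA(n_components=10, svd_solver="randomized", random_state=0).fit(X)

# Largest relative difference between exact and randomized singular values
rel_diff = np.max(
    np.abs(exact.singular_values_ - approx.singular_values_) / exact.singular_values_
)
print(f"max relative difference: {rel_diff:.2e}")
```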

Code Snippet: PCA, NMF & ICA

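```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, NMF, FastICA

X, _ = load_digits(return_X_y=True)

# 1. Standard PCA
# n_components can be an int (number of components) or a float in (0, 1)
# (fraction of variance to keep)
pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X)
print(f"Reduced from {X.shape[1]} to {X_pca.shape[1]} features")

# 2. NMF (requires non-negative input; digit pixels range from 0 to 16)
nmf = NMF(n_components=10, init='random', max_iter=500, random_state=0)
X_nmf = nmf.fit_transform(X)

# 3. ICA for signal separation
# Toy mixture: two independent sources combined by a mixing matrix
t = np.linspace(0, 8, 2000)
S_true = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
A = np.array([[1.0, 0.5], [0.5, 2.0]])
X_signals = S_true @ A.T  # observed (mixed) signals
ica = FastICA(n_components=2, random_state=0)
S_source = ica.fit_transform(X_signals)  # estimated sources
```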

Credits: This cheatsheet is based on the scikit-learn documentation and examples, which are licensed under the BSD 3-Clause License. Copyright (c) 2007 - 2026 The scikit-learn developers. All rights reserved.
