Kernel approximation allows you to use kernel-based methods (like SVM with RBF kernel) on large datasets by projecting features into a finite-dimensional space where linear models can be used.
Instead of an exact kernel method like SVC, train a fast linear model (e.g. SGDClassifier or Ridge) on the approximated feature map.

| Transformer | What it does |
| --- | --- |
| RBFSampler | Maps features to a space where the dot product approximates the RBF kernel $K(x, y) = \exp(-\gamma \| x-y \|^2)$ (random Fourier features). |
| Nystroem | Finds a low-rank approximation of the (potentially infinite) kernel matrix. More accurate than RBFSampler for many datasets, but requires keeping a subset of the training samples in memory for the transformation. |
| PolynomialCountSketch | Approximates polynomial kernels; used the same way as RBFSampler. |

from sklearn.kernel_approximation import RBFSampler, Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
# 1. Random Fourier Features (X is your feature matrix)
rbf_feature = RBFSampler(gamma=1, n_components=100, random_state=1)
X_features = rbf_feature.fit_transform(X)  # shape: (n_samples, 100)
# 2. Pipeline for Large Scale Learning
# This allows using "RBF-like" SVC on millions of rows
pipe = Pipeline([
('kernel_approx', Nystroem(kernel='rbf', n_components=300)),
('linear_model', SGDClassifier(loss='hinge'))  # hinge loss + kernel approx ≈ linear SVM
])
pipe.fit(X_train, y_train)
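To see that the approximation actually works, the sketch below (using synthetic random data, so the sizes and `gamma` value are illustrative) checks that dot products of RBFSampler features reproduce the exact RBF kernel matrix:

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(50, 4)  # 50 synthetic samples, 4 features

# Exact RBF kernel matrix K(x, y) = exp(-gamma * ||x - y||^2)
K_exact = rbf_kernel(X, gamma=1.0)

# Random Fourier features: K ≈ Z @ Z.T, error shrinks as n_components grows
Z = RBFSampler(gamma=1.0, n_components=5000, random_state=0).fit_transform(X)
K_approx = Z @ Z.T

err = np.abs(K_exact - K_approx).mean()
print(f"mean absolute error: {err:.4f}")
```

Increasing `n_components` tightens the approximation at the cost of a wider feature matrix; 100-500 components are often enough for downstream classification even though the kernel entries are only loosely matched.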
Credits: This cheatsheet is based on the scikit-learn documentation and examples, which are licensed under the BSD 3-Clause License. Copyright (c) 2007 - 2026 The scikit-learn developers. All rights reserved.