The model_selection module provides tools to split data, tune hyperparameters, and evaluate model performance.
**Cross-validation splitters**
- KFold: standard k-fold split.
- StratifiedKFold: preserves class proportions in each fold (essential for classification).
- TimeSeriesSplit: respects temporal order (train on the past, test on the future).

**Hyperparameter search**
- GridSearchCV: exhaustive search over specified parameter values.
- RandomizedSearchCV: samples parameter values from distributions (faster, often as good as grid search).
- HalvingGridSearchCV: efficient search using "successive halving" (poor candidates are dropped early); experimental, requires `from sklearn.experimental import enable_halving_search_cv`.

**Visualization**
- LearningCurveDisplay: plots score vs. training set size.
- ValidationCurveDisplay: plots score vs. a single hyperparameter.
- RocCurveDisplay, PrecisionRecallDisplay: ROC and precision-recall curves.

**Scoring**
- Pass scoring='roc_auc' or scoring='f1' to optimize for metrics other than accuracy.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# 1. Set up the parameter grid
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto']
}

# 2. Set up the search
cv = StratifiedKFold(n_splits=5)
# refit=True (the default): refits the best model on the whole training set
grid = GridSearchCV(SVC(), param_grid, cv=cv, scoring='accuracy', refit=True)

# 3. Execute (X_train, y_train: your training data)
grid.fit(X_train, y_train)
print("Best Params:", grid.best_params_)
best_model = grid.best_estimator_
```
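RandomizedSearchCV, mentioned in the list above, trades exhaustiveness for speed by sampling a fixed number of parameter combinations. A minimal sketch on the built-in iris data; the loguniform ranges and n_iter value are illustrative assumptions, not recommendations:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Sample C and gamma from log-uniform distributions instead of a fixed grid
# (ranges here are illustrative)
param_distributions = {
    'C': loguniform(1e-2, 1e2),
    'gamma': loguniform(1e-4, 1e0),
}

search = RandomizedSearchCV(
    SVC(kernel='rbf'),
    param_distributions,
    n_iter=10,                       # only 10 sampled combinations are tried
    cv=StratifiedKFold(n_splits=3),
    scoring='accuracy',
    random_state=0,                  # reproducible sampling
)
search.fit(X, y)
print("Best Params:", search.best_params_)
```

Because candidates are sampled, continuous distributions (like loguniform) can be explored directly rather than discretized into a grid.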
```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import LearningCurveDisplay

# estimator: any scikit-learn estimator, e.g. the best_model found above
display = LearningCurveDisplay.from_estimator(
    estimator, X, y, cv=5, scoring='accuracy',
    train_sizes=np.linspace(0.1, 1.0, 5)  # 10% to 100% of the training set
)
plt.show()  # from_estimator already draws the plot
```
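The scoring options listed above also work outside of grid search. A short sketch using cross_val_score with scoring='roc_auc' on the built-in breast-cancer data; the scaler-plus-logistic-regression pipeline is an illustrative choice, not part of the original example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scale features, then fit a logistic regression (illustrative model choice)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# scoring='roc_auc' ranks each fold by area under the ROC curve
# instead of plain accuracy
scores = cross_val_score(clf, X, y, cv=5, scoring='roc_auc')
print("Mean AUC:", scores.mean())
```

Any string accepted by `scoring` here (e.g. 'f1', 'roc_auc') can also be passed to GridSearchCV and RandomizedSearchCV.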
Credits: This cheatsheet is based on the scikit-learn documentation and examples, which are licensed under the BSD 3-Clause License. Copyright (c) 2007 - 2026 The scikit-learn developers. All rights reserved.