Feature selection reduces the number of input variables, which can mitigate overfitting, improve accuracy, and decrease computational cost.
SelectKBest (Filter):
Scores each feature independently with a univariate statistical test: f_classif (ANOVA F-test), chi2 (requires non-negative features), or mutual_info_classif.

RFE (Recursive Feature Elimination - Wrapper):
Repeatedly fits an estimator and prunes the weakest features until the requested number remains. RFECV: automatically finds the optimal number of features using cross-validation.

SelectFromModel (Embedded):
Keeps features whose importance, read from the feature_importances_ or coef_ attribute of a fitted estimator, exceeds a threshold. Typical estimators: Lasso (L1 penalty) or RandomForest.

SequentialFeatureSelector (Wrapper):
Greedily adds (forward) or removes (backward) one feature at a time based on cross-validated score.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.ensemble import RandomForestClassifier
# 1. Filter: Select top 10 features via ANOVA
selector = SelectKBest(score_func=f_classif, k=10)
# 2. Wrapper: Recursive Feature Elimination
# Requires an estimator exposing coef_ or feature_importances_
# (e.g. linear models, RandomForestClassifier)
rfe = RFE(estimator=RandomForestClassifier(), n_features_to_select=5)
# 3. Embedded: Select from Model
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
sfm = SelectFromModel(Lasso(alpha=0.1))
# Integrating into a Pipeline, so selection is fit on training data only
pipe = Pipeline([
    ('feature_selection', selector),
    ('classification', RandomForestClassifier())
])
pipe.fit(X_train, y_train)  # X_train, y_train: your training split
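RFECV from the list above can be sketched like this; the synthetic dataset and the LogisticRegression estimator are placeholder assumptions, not part of the original snippet:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Placeholder synthetic data: 15 features, only 5 informative
X, y = make_classification(n_samples=200, n_features=15,
                           n_informative=5, random_state=0)

# Recursively eliminate one feature per step; the final feature
# count is chosen automatically by 5-fold cross-validation
rfecv = RFECV(estimator=LogisticRegression(max_iter=1000), step=1, cv=5)
rfecv.fit(X, y)

print(rfecv.n_features_)  # number of features chosen by CV
print(rfecv.support_)     # boolean mask of retained features
```

After fitting, `rfecv.transform(X)` returns the data reduced to the selected columns.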
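SequentialFeatureSelector, mentioned above but not shown in the snippet, greedily grows (or shrinks) a feature subset. A minimal sketch, assuming synthetic data and a KNeighborsClassifier as the scoring estimator:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

# Placeholder synthetic data: 10 features
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Forward selection: add one feature at a time, keeping the addition
# that most improves the cross-validated score, until 3 are selected
sfs = SequentialFeatureSelector(KNeighborsClassifier(),
                                n_features_to_select=3,
                                direction='forward')
sfs.fit(X, y)
X_selected = sfs.transform(X)  # shape (200, 3)
```

Setting `direction='backward'` instead starts from all features and removes the least useful one at each step.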
Credits: This cheatsheet is based on the scikit-learn documentation and examples, which are licensed under the BSD 3-Clause License. Copyright (c) 2007 - 2026 The scikit-learn developers. All rights reserved.