Scikit-learn Cheatsheet: Compose (Pipelines & Meta-Estimators)

The compose module provides tools to combine multiple estimators into a single one, facilitating complex workflows and preventing data leakage.
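The data-leakage point can be illustrated with cross-validation: when the scaler lives inside a Pipeline, it is re-fitted on each training fold only, so no statistics from held-out data leak into preprocessing. A minimal sketch (the toy data and `make_pipeline` construction are illustrative, not from the original):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = (X[:, 0] + 0.1 * rng.randn(100) > 0).astype(int)

# The scaler is fit inside each CV training fold, never on the held-out
# fold, so validation data cannot influence the preprocessing.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5)
```

Scaling X manually before calling cross_val_score would, by contrast, compute the mean and variance over the full dataset, including the validation folds.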

Key Tools

  1. Pipeline:
    • Chains multiple steps. All steps except the last must be transformers; the final step can be any estimator (e.g., a classifier or regressor).
    • Ensures that transformers are fitted on the training data only and then applied to the test data, preventing leakage.
  2. ColumnTransformer:
    • Routes specific columns to specific transformers.
    • Essential for handling mixed-type data (numeric + categorical).
  3. FeatureUnion:
    • Concatenates the results of multiple transformer objects.
  4. TransformedTargetRegressor:
    • Wraps a regressor to apply a transformation to the target $y$ before fitting and an inverse transformation after predicting.
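The last two tools above can be sketched together; the synthetic data, PCA/scaler combination, and log/exp target transform below are illustrative choices, not from the original:

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = np.exp(X @ rng.rand(5))  # strictly positive target

# FeatureUnion: concatenate PCA components with the scaled originals
union = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("scaled", StandardScaler()),
])
X_combined = union.fit_transform(X)  # shape: (100, 2 + 5)

# TransformedTargetRegressor: fit Ridge on log(y), predictions are
# mapped back to the original scale via exp
ttr = TransformedTargetRegressor(
    regressor=Ridge(),
    func=np.log,
    inverse_func=np.exp,
)
ttr.fit(X, y)
preds = ttr.predict(X)  # on the original (positive) scale
```

Note that func and inverse_func must actually be inverses of each other; TransformedTargetRegressor verifies this on a data subsample by default.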

Code Snippet: Pipeline & ColumnTransformer

from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression

# 1. Define transformers for different column types
numeric_features = ["age", "fare"]
numeric_transformer = Pipeline(steps=[("scaler", StandardScaler())])

categorical_features = ["embarked", "sex"]
categorical_transformer = OneHotEncoder(handle_unknown="ignore")

# 2. Bundle them in a ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features),
    ]
)

# 3. Create the final Pipeline
clf = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", LogisticRegression())]
)

# 4. Use it as a single estimator
# (X_train, X_test, y_train, y_test are assumed to come from an
# earlier train_test_split on the raw feature DataFrame)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
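Because the pipeline behaves as a single estimator, its nested parameters can be tuned with scikit-learn's "step__param" naming convention. A self-contained sketch mirroring the snippet above (the synthetic Titanic-like data and the C grid are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = pd.DataFrame({
    "age": rng.uniform(1, 80, 200),
    "fare": rng.uniform(5, 500, 200),
    "embarked": rng.choice(["S", "C", "Q"], 200),
    "sex": rng.choice(["male", "female"], 200),
})
y = (X["fare"] > 100).astype(int)  # toy target for illustration

preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["age", "fare"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["embarked", "sex"]),
    ]
)
clf = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", LogisticRegression())]
)

# Nested parameters are addressed as "<step_name>__<param_name>"
grid = GridSearchCV(clf, {"classifier__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

Tuning the whole pipeline this way means the preprocessing is re-fitted inside every cross-validation fold, which keeps the search leakage-free.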

Credits: This cheatsheet is based on the scikit-learn documentation and examples, which are licensed under the BSD 3-Clause License. Copyright (c) 2007 - 2026 The scikit-learn developers. All rights reserved.