Scikit-learn Cheatsheet: Ensemble Methods

Ensemble methods combine the predictions of several base estimators to improve generalizability and robustness over a single estimator.

Key Algorithms

  1. Bagging (Averaging):
    • RandomForestClassifier/Regressor: Builds many deep trees on bootstrap samples of the data (with random feature subsets at each split) and averages their predictions.
  2. Boosting (Sequential):
    • GradientBoostingClassifier/Regressor: Fits new models to the residuals of previous models.
    • HistGradientBoostingClassifier/Regressor: Histogram-based and much faster on large datasets; similar in spirit to LightGBM.
    • AdaBoost: Focuses more on samples that previous models misclassified.
  3. Voting:
    • VotingClassifier: Combines different models via majority vote (hard) or average probability (soft).
  4. Stacking:
    • StackingClassifier: Trains a “final estimator” (meta-learner) to combine predictions of base learners.
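
The residual-fitting idea behind boosting (item 2 above) can be sketched by hand. This is a minimal illustration, not scikit-learn's actual GradientBoosting implementation: starting from the mean prediction, each stage fits a small tree to the current residuals and adds a damped version of its output.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data: y = x^2 plus noise
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.3, size=200)

# Start from the mean, then repeatedly fit a shallow tree to the residuals
prediction = np.full_like(y, y.mean())
learning_rate = 0.1
for _ in range(50):
    residuals = y - prediction
    stage = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * stage.predict(X)

# Training error shrinks as stages accumulate
mse = np.mean((y - prediction) ** 2)
```

Each stage only has to correct what the ensemble so far gets wrong, which is why boosting builds models sequentially rather than in parallel.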

Code Snippet: Random Forest & HistGradientBoosting

from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier

# 1. Random Forest (Great baseline)
rf = RandomForestClassifier(n_estimators=100, max_depth=None, n_jobs=-1)
# n_jobs=-1 uses all CPU cores

# 2. HistGradientBoosting (Fast for large datasets)
# Categorical support is built-in!
hgb = HistGradientBoostingClassifier(max_iter=100, learning_rate=0.1)

# 3. Voting Ensemble
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

clf1 = LogisticRegression()
clf2 = RandomForestClassifier()
clf3 = SVC(probability=True)

eclf = VotingClassifier(
    estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)], 
    voting='soft'
)
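
Stacking appears in the list above but not in the snippet. A minimal sketch, using a synthetic dataset from `make_classification` purely for illustration: base learners' cross-validated predictions become the input features of a final LogisticRegression meta-learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Out-of-fold predictions from the base learners (cv=5) feed the final estimator
stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(random_state=0)),
                ('svc', SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,
)
scores = cross_val_score(stack, X, y, cv=3)
```

Unlike voting, the meta-learner is trained to weight the base models, so it can learn which of them to trust on which regions of the input.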

Credits: This cheatsheet is based on the scikit-learn documentation and examples, which are licensed under the BSD 3-Clause License. Copyright (c) 2007 - 2026 The scikit-learn developers. All rights reserved.