Ensemble methods combine the predictions of several base estimators to improve generalizability and robustness over a single estimator.
- RandomForestClassifier/Regressor: Builds many deep trees on random subsets of samples/features and averages their results.
- GradientBoostingClassifier/Regressor: Fits each new model to the residuals of the previous models.
- HistGradientBoostingClassifier/Regressor: Modern, extremely fast histogram-based version, similar to LightGBM.
- AdaBoostClassifier/Regressor: Puts more weight on samples that previous models misclassified.
- VotingClassifier: Combines different models via majority vote (hard) or averaged probabilities (soft).
- StackingClassifier: Trains a "final estimator" (meta-learner) to combine the predictions of base learners.

from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score
# 1. Random Forest (Great baseline)
rf = RandomForestClassifier(n_estimators=100, max_depth=None, n_jobs=-1)
# max_depth=None grows each tree until its leaves are pure; n_jobs=-1 uses all CPU cores
# 2. HistGradientBoosting (Fast for large datasets)
# Categorical support is built-in!
hgb = HistGradientBoostingClassifier(max_iter=100, learning_rate=0.1)
# 3. Voting Ensemble
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
clf1 = LogisticRegression()
clf2 = RandomForestClassifier()
clf3 = SVC(probability=True)
eclf = VotingClassifier(
estimators=[('lr', clf1), ('rf', clf2), ('svc', clf3)],
voting='soft'
)
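The ensemble above still needs to be fitted. A minimal sketch of training both the soft-voting ensemble and a StackingClassifier (listed above but not shown in code) on the same base learners; the synthetic dataset and split are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative synthetic dataset and holdout split
X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimators = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(random_state=0)),
    ('svc', SVC(probability=True, random_state=0)),  # probability=True needed for soft voting
]

# Soft voting: averages the predicted class probabilities of the base models
eclf = VotingClassifier(estimators=estimators, voting='soft').fit(X_train, y_train)

# Stacking: a final_estimator learns how to combine the base learners'
# cross-validated predictions instead of simply averaging them
sclf = StackingClassifier(
    estimators=estimators, final_estimator=LogisticRegression()
).fit(X_train, y_train)

print(f"Voting:   {eclf.score(X_test, y_test):.3f}")
print(f"Stacking: {sclf.score(X_test, y_test):.3f}")
```

Stacking often edges out voting when the base models make different kinds of errors, at the cost of the extra cross-validation done internally during fitting.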
Credits: This cheatsheet is based on the scikit-learn documentation and examples, which are licensed under the BSD 3-Clause License. Copyright (c) 2007 - 2026 The scikit-learn developers. All rights reserved.