Scikit-learn Cheatsheet: Probability Calibration

Probability calibration is the process of adjusting the predicted probabilities of a classifier so they better reflect the actual likelihood of an event.

Calibration Methods

  1. Platt Scaling (method='sigmoid'):
    • Fits a logistic regression model to the classifier’s outputs.
    • Best for Support Vector Machines (SVM) and when training data is small.
  2. Isotonic Regression (method='isotonic'):
    • Fits a non-parametric, non-decreasing function.
    • More powerful than Platt scaling but requires more data (~1000+ samples) and is prone to overfitting.
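The trade-off between the two methods can be sketched on synthetic data. This is a minimal comparison, assuming a Gaussian Naive Bayes base estimator (a classifier known to produce overconfident probabilities) and data from make_classification; the exact Brier scores depend on the dataset and seed.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=5000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for method in ("sigmoid", "isotonic"):
    clf = CalibratedClassifierCV(GaussianNB(), method=method, cv=5)
    clf.fit(X_train, y_train)
    probs = clf.predict_proba(X_test)[:, 1]
    # Brier score: mean squared error of the probabilities; lower is better
    scores[method] = brier_score_loss(y_test, probs)
```

With enough samples, isotonic typically matches or beats sigmoid here; on small datasets the ranking can flip because isotonic overfits.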

Code Snippet

from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.model_selection import train_test_split

X, y = load_your_data()
X_train, X_test, y_train, y_test = train_test_split(X, y)

# 1. Base Estimator (e.g., SVM which is notoriously uncalibrated)
base_clf = SVC(kernel='linear', C=1.0)

# 2. Wrap with Calibration
# method='sigmoid' (Platt) or 'isotonic'
calibrated_clf = CalibratedClassifierCV(base_clf, method='sigmoid', cv=5)
calibrated_clf.fit(X_train, y_train)

# 3. Predict Probabilities
probs = calibrated_clf.predict_proba(X_test)[:, 1]

# 4. Evaluation: Calibration Curve
prob_true, prob_pred = calibration_curve(y_test, probs, n_bins=10)
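calibration_curve bins the predicted probabilities (10 bins above) and, for each bin, returns the observed fraction of positives (prob_true) against the mean predicted probability (prob_pred); a well-calibrated model keeps the two close. A self-contained sanity check, using synthetic probabilities rather than the model above, shows the ideal case:

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)                      # predicted probabilities
y = (rng.uniform(size=10_000) < p).astype(int)    # labels drawn with exactly those probabilities

prob_true, prob_pred = calibration_curve(y, p, n_bins=10)
# For perfectly calibrated probabilities, prob_true tracks prob_pred in every bin
```

Plotting prob_true against prob_pred gives the reliability diagram; the diagonal is perfect calibration, and systematic deviation above or below it indicates under- or over-confidence.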

Credits: This cheatsheet is based on the scikit-learn documentation and examples, which are licensed under the BSD 3-Clause License. Copyright (c) 2007 - 2026 The scikit-learn developers. All rights reserved.