Scikit-learn Overview & Index

Scikit-learn is a widely used open-source Python library for machine learning, built on NumPy, SciPy, and matplotlib. It provides simple and efficient tools for predictive data analysis.

Core Capabilities

Scikit-learn is organized into several key areas:

1. Supervised Learning

2. Unsupervised Learning

3. Model Building & Selection
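
As a rough illustration of the three areas above, the sketch below uses one representative from each. The specific choices (the iris dataset, LogisticRegression, KMeans, and cross_val_score) are assumptions picked for brevity, not something this index prescribes.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumption: iris is used only as a small built-in example dataset.
X, y = load_iris(return_X_y=True)

# 1. Supervised learning: fit on features and labels.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# 2. Unsupervised learning: fit on features only, no labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)

# 3. Model selection: score a fresh estimator with 5-fold cross-validation.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())
```

The supervised estimator receives y and the clustering estimator does not; otherwise both follow the same fit-based interface.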

Typical Workflow (The fit/predict Pattern)

Almost all objects in Scikit-learn share a uniform interface:

  1. Estimators: model.fit(X_train, y_train)
  2. Predictors: model.predict(X_test)
  3. Transformers: transformer.transform(X) or transformer.fit_transform(X)
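
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Prepare data (assumption: iris stands in for your own X and y)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 2. Initialize (random_state fixed so results are reproducible)
model = RandomForestClassifier(random_state=42)

# 3. Fit on the training split
model.fit(X_train, y_train)

# 4. Predict on the held-out split
predictions = model.predict(X_test)

# 5. Evaluate
print(accuracy_score(y_test, predictions))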
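
The transformer part of the interface listed above can be sketched in the same way. This is a minimal example, assuming StandardScaler as the transformer and a small made-up array as data; any other scikit-learn transformer follows the same fit_transform/transform calls.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (an assumption for illustration): two features on different scales.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

scaler = StandardScaler()

# fit_transform learns the per-column mean and standard deviation from the
# training data and scales that data in one step.
X_train_scaled = scaler.fit_transform(X_train)

# transform reuses the statistics learned from the training data, so the
# test set is scaled consistently without refitting.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)
```

Calling fit_transform on the training data and only transform on the test data keeps test-set statistics from leaking into the fitted transformer.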

Detailed Cheatsheets

For deeper dives into specific modules, see the detailed cheatsheets maintained in the sklearn/ directory.