Scikit-learn Overview & Index
Scikit-learn is a premier Python library for machine learning, built on top of NumPy, SciPy, and matplotlib. It provides simple and efficient tools for predictive data analysis.
Core Capabilities
Scikit-learn is organized into several key areas:
1. Supervised Learning
- Classification: Identifying which category an object belongs to (e.g., SVM, Random Forest, Logistic Regression).
- Regression: Predicting a continuous-valued attribute (e.g., Ridge, Lasso). See also Tree and SVM for regressor variants.
- Ensemble Methods: Combining the predictions of several base estimators (e.g., Boosting, Bagging).
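As a small illustration of the supervised side, a minimal sketch (dataset and estimator choices here are illustrative, not prescribed by this cheatsheet) showing that a regressor uses the same fit/score interface as a classifier:

```python
# Illustrative sketch: Ridge regression on a built-in toy dataset.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = Ridge(alpha=1.0)            # L2-regularized linear regression
reg.fit(X_train, y_train)         # same fit() interface as classifiers
score = reg.score(X_test, y_test) # R^2 on held-out data
print(score)
```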
2. Unsupervised Learning
- Clustering: Automatic grouping of similar objects into sets (e.g., K-Means, Spectral Clustering).
- Decomposition: Reducing the number of random variables to consider (e.g., PCA, ICA).
- Covariance Estimation: Estimating the covariance structure among features, including robust and shrinkage variants (e.g., Ledoit-Wolf, Minimum Covariance Determinant).
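The unsupervised tools above can be sketched on a toy dataset (the dataset and hyperparameters here are illustrative assumptions):

```python
# Illustrative sketch: clustering and decomposition without using labels.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features

# K-Means: group the samples into 3 clusters
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# PCA: project the 4 features onto 2 principal components
X_2d = PCA(n_components=2).fit_transform(X)

print(labels.shape, X_2d.shape)  # (150,) (150, 2)
```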
3. Model Building & Selection
- Preprocessing: Feature extraction and normalization.
- Model Selection: Comparing, validating, and choosing parameters and models (e.g., Grid Search, Cross-Validation).
- Pipeline/Compose: Chaining estimators and transformers.
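Putting these three areas together, a hedged sketch (the scaler, estimator, and parameter grid are illustrative choices) of chaining a transformer and an estimator in a Pipeline, then tuning it with grid search. Note that pipeline step parameters are addressed with the `<step>__<param>` naming convention:

```python
# Illustrative sketch: preprocessing + estimator in one Pipeline, tuned via GridSearchCV.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),  # preprocessing step
    ("svm", SVC()),               # final estimator
])

# Step parameters are referenced as <step_name>__<parameter>
grid = GridSearchCV(pipe, {"svm__C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```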
Typical Workflow (The fit/predict Pattern)
Almost all objects in Scikit-learn share a uniform interface:
- Estimators:
model.fit(X_train, y_train)
- Predictors:
model.predict(X_test)
- Transformers:
transformer.transform(X) or transformer.fit_transform(X)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# 0. Load data and split into train/test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# 1. Initialize
model = RandomForestClassifier(random_state=42)
# 2. Fit
model.fit(X_train, y_train)
# 3. Predict
predictions = model.predict(X_test)
# 4. Evaluate
print(accuracy_score(y_test, predictions))
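The transformer side of the interface can be sketched the same way (the scaler and toy array below are illustrative assumptions):

```python
# Illustrative sketch: fit_transform() learns parameters and applies them in one call.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # equivalent to scaler.fit(X).transform(X)
print(X_scaled.mean(axis=0))        # each column standardized to mean 0
```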
Detailed Cheatsheets
For deeper dives into specific modules, see the following:
- Data Handling: Datasets, Preprocessing, Impute
- Supervised: Linear Models, SVM, Tree, Neighbors, Neural Networks
- Unsupervised: Manifold, Mixture Models, Biclustering
- Advanced: Feature Selection, Inspection, Kernel Approximation
- Meta-Estimators: Multiclass, Multioutput, Calibration
- Developer Guide: Developing Estimators
Maintained in the sklearn/ directory.