Scikit-learn Cheatsheet: Biclustering

Biclustering (also known as co-clustering or two-mode clustering) is a data mining technique that clusters the rows and columns of a matrix simultaneously. Each resulting bicluster is a subset of rows paired with a subset of columns.

What can be done?

Algorithms in scikit-learn

  1. SpectralCoclustering:
    • Finds biclusters whose values are higher than those in the other rows and columns; each row and each column belongs to exactly one bicluster.
    • Typically used for document-word clustering (identifying topics and their associated documents).
  2. SpectralBiclustering:
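    • Assumes the data has a hidden checkerboard structure: rows and columns can each be partitioned so that every block formed by a row cluster and a column cluster is roughly constant.
    • Normalizes the data to make the checkerboard pattern apparent.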
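
As a complement to the list above, here is a minimal sketch of SpectralBiclustering on synthetic checkerboard data. The matrix shape, the 4 x 3 cluster grid, the noise level, and the choice of method="log" are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard

# Synthetic 4 x 3 checkerboard; shape and noise level are arbitrary choices.
data, rows, columns = make_checkerboard(
    shape=(120, 90), n_clusters=(4, 3), noise=5, shuffle=True, random_state=0
)

# n_clusters may be a (n_row_clusters, n_column_clusters) tuple.
# method selects the normalization: "bistochastic" (default), "scale", or "log".
model = SpectralBiclustering(n_clusters=(4, 3), method="log", random_state=0)
model.fit(data)

row_labels = model.row_labels_        # one row-cluster label per row
column_labels = model.column_labels_  # one column-cluster label per column
n_biclusters = model.rows_.shape[0]   # one bicluster per (row, column) cluster pair
```

Unlike SpectralCoclustering, rows and columns are labeled independently here, so every row cluster is paired with every column cluster.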

Theoretical Background

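The spectral approach can be motivated by treating the data matrix as a bipartite graph, with rows and columns as the two sets of vertices and matrix entries as edge weights. Spectral graph theory then reduces the partitioning problem to a Singular Value Decomposition (SVD) of a normalized version of the matrix, whose leading singular vectors give an approximate partition rather than a guaranteed optimal one.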
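
The idea can be illustrated with a simplified two-cluster sketch in plain NumPy. This is not scikit-learn's implementation: the toy matrix is made up, and thresholding the second singular vectors at zero is a stand-in (assumed here for brevity) for the k-means step that a general k-cluster method would need.

```python
import numpy as np

# Toy matrix: two dense blocks on a weak uniform background (values are made up).
A = np.full((6, 8), 0.1)
A[:3, :4] += 1.0
A[3:, 4:] += 1.0

# Normalize by row and column sums: A_n = D1^(-1/2) A D2^(-1/2).
d1 = A.sum(axis=1)
d2 = A.sum(axis=0)
A_n = A / np.sqrt(np.outer(d1, d2))

# Singular values come back in descending order. The leading pair of singular
# vectors is trivial, so the second pair carries the partition information.
U, s, Vt = np.linalg.svd(A_n, full_matrices=False)
row_embedding = U[:, 1] / np.sqrt(d1)
col_embedding = Vt[1, :] / np.sqrt(d2)

# With only two clusters, thresholding at zero stands in for k-means.
row_labels = (row_embedding > 0).astype(int)
col_labels = (col_embedding > 0).astype(int)
```

Rows and columns that receive the same label form one bicluster. The weak background keeps the second singular value distinct from the first, which makes the split well defined in this toy case.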

Computational Complexity

Application Examples

Pros & Cons

Pros

Code Snippet

import numpy as np
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

# 1. Generate sample data
data, rows, columns = make_biclusters(
    shape=(300, 300), n_clusters=5, noise=5, shuffle=True, random_state=0
)

# 2. Fit the model
# n_clusters: The number of biclusters to find.
model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(data)

# 3. Access results
# model.row_labels_: Which row cluster each row belongs to
# model.column_labels_: Which column cluster each column belongs to
# model.biclusters_: Tuple (rows_, columns_) of boolean indicator arrays,
#   shaped (n_clusters, n_rows) and (n_clusters, n_columns)

# Rearranging data to visualize biclusters
fit_data = data[np.argsort(model.row_labels_)]
fit_data = fit_data[:, np.argsort(model.column_labels_)]
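
A fitted model can also be inspected one bicluster at a time and compared with the known ground truth. The sketch below repeats the setup so that it runs on its own; the choice of bicluster 0 is arbitrary, and scoring against the arrays returned by make_biclusters is only possible because the data is synthetic.

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters
from sklearn.metrics import consensus_score

# Same synthetic setup as the snippet above.
data, rows, columns = make_biclusters(
    shape=(300, 300), n_clusters=5, noise=5, shuffle=True, random_state=0
)
model = SpectralCoclustering(n_clusters=5, random_state=0).fit(data)

# Inspect a single bicluster (index 0 is an arbitrary choice).
row_idx, col_idx = model.get_indices(0)   # row and column indices of bicluster 0
n_rows_0, n_cols_0 = model.get_shape(0)   # its size as (n_rows, n_cols)
submatrix = model.get_submatrix(0, data)  # the corresponding block of data

# Compare found biclusters with the ground truth; 1.0 means a perfect match.
score = consensus_score(model.biclusters_, (rows, columns))
print(f"consensus score: {score:.3f}")
```

On real data there is no ground truth to score against, so inspecting the submatrices or plotting the rearranged matrix is the usual sanity check.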

Credits: This cheatsheet is based on the scikit-learn documentation and examples, which are licensed under the BSD 3-Clause License. Copyright (c) 2007 - 2026 The scikit-learn developers. All rights reserved.