A clustering example for omnibenchmark

How to run

  1. Install omnibenchmark by following our tutorial.
  2. Clone this repository, which holds the benchmark definition: git clone git@github.com:omnibenchmark/clustering_example.git
  3. Move into the cloned repository: cd clustering_example
  4. Run the benchmark locally, with some parallelism: ob run benchmark -b CLUSTERING.YAML --local --threads 6. Here CLUSTERING.YAML stands for the Clustering.yml specification that matches the backend you are running with (conda, easybuild, apptainer, etc.). More details about the available backends.

Clustbench attribution

by Marek Gagolewski, modified by Izaskun Mallona

Data disclaimer

Some datasets are commented out to speed up calculations.

From "Are cluster validity measures (in)valid?":

The original benchmark battery consists of 79 data instances, however 16 datasets are accompanied by labels that yield ; they were omitted for their computation would be too lengthy (namely: mnist/digits, mnist/fashion, other/chameleon_t7_10k, other/chameleon_t8_8k, sipu/a1, sipu/a2, sipu/a3, sipu/birch1, sipu/birch2, sipu/d31, sipu/s1, sipu/s2, sipu/s3, sipu/s4, sipu/worms_2, sipu/worms_64). Also uci/glass has been removed as one of its 25-near-neighbour graph’s connected components was too small for the NN-based methods to succeed. This leaves us with 62 datasets in total, see Table 1.

A benchmark YAML such as the one at 0a88c91, run with 30 cores, should complete about half of the tasks in ~4 h and reach 97% completion in ~8 h.

Summary

Software backends

In envs: conda, apptainer, easybuild (lmod modules)

Warnings

Note that we run each clustering with the true number of clusters ± 2. Sometimes the true number is small, e.g. k=3, so part of that range is not a valid number of clusters. In that case we run k=2, k=2, k=3, k=4, k=5, filling with k=2 as needed and recomputing the same values multiple times, so that runtimes are comparable across datasets regardless of their true number of clusters.
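The padding rule can be sketched in a few lines of Python. This is only an illustration, not the code used by the benchmark modules: the function name k_grid and its parameters radius and min_k are hypothetical, and it assumes that the window is the five consecutive values around the true k and that any value below 2 is replaced by 2.

```python
def k_grid(true_k, radius=2, min_k=2):
    """Return the 2 * radius + 1 values of k tried for one dataset.

    Hypothetical helper: the name and parameters are assumptions made for
    this sketch, not part of omnibenchmark or clustbench.
    """
    # Consecutive window centred on the true number of clusters.
    window = range(true_k - radius, true_k + radius + 1)
    # Values below min_k are not valid cluster counts, so raise them to min_k.
    # Duplicates are kept on purpose: every dataset gets the same number of runs.
    return [max(k, min_k) for k in window]


if __name__ == "__main__":
    for true_k in (2, 5):
        print(true_k, k_grid(true_k))
```

Under these assumptions a dataset with a true k of 2 is clustered with k=2 three times, then k=3 and k=4, while a dataset with a true k of 5 gets five distinct values. Both produce five runs per method, which is what keeps runtimes comparable.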

Also, the modules by Daniel are not fully incorporated into Gagolewski's flow.
