
Ckmeans

Repeatable unidimensional data clustering inspired by Ckmeans.1d.dp

Installation

Install the gem and add it to the application's Gemfile by executing:

bundle add ckmeans

If bundler is not being used to manage dependencies, install the gem by executing:

gem install ckmeans

Usage

Basic Clustering

# Fixed cluster count (K known in advance)
Ckmeans::Clusterer.new(data, 3).clusters
Ckmedian::Clusterer.new(data, 3).clusters

# Automatic K selection (tries K from kmin to kmax, picks optimal)
Ckmeans::Clusterer.new(data, 1, 10).clusters
Ckmedian::Clusterer.new(data, 1, 10).clusters
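
To make the idea of optimal one-dimensional clustering concrete, here is a minimal pure-Ruby sketch of the dynamic-programming approach that Ckmeans.1d.dp popularized, for a fixed K. It is not the gem's implementation: the helper name optimal_1d_kmeans is hypothetical, and the plain O(K * n^2) recurrence is an assumption chosen for readability. It only illustrates why the result is repeatable: sorted 1D data can be split into contiguous segments, and the best split can be found exactly instead of by random restarts.

```ruby
# Illustrative sketch only, not the gem's code.
# optimal_1d_kmeans is a hypothetical helper name.
def optimal_1d_kmeans(values, k)
  xs = values.sort
  n = xs.size
  raise ArgumentError, "k must be between 1 and #{n}" unless k.between?(1, n)

  # Prefix sums let us compute any segment's squared error in O(1).
  sum = [0.0]
  sum_sq = [0.0]
  xs.each do |x|
    sum << sum.last + x
    sum_sq << sum_sq.last + x * x
  end

  # Sum of squared distances to the mean for xs[i..j] (inclusive).
  sse = lambda do |i, j|
    s = sum[j + 1] - sum[i]
    q = sum_sq[j + 1] - sum_sq[i]
    [q - s * s / (j - i + 1), 0.0].max
  end

  # cost[m][j]: best total error for xs[0..j] split into m + 1 clusters.
  # start[m][j]: index where the last of those clusters begins.
  cost = Array.new(k) { Array.new(n, Float::INFINITY) }
  start = Array.new(k) { Array.new(n, 0) }
  n.times { |j| cost[0][j] = sse.call(0, j) }

  (1...k).each do |m|
    (m...n).each do |j|
      (m..j).each do |i|
        candidate = cost[m - 1][i - 1] + sse.call(i, j)
        next unless candidate < cost[m][j]

        cost[m][j] = candidate
        start[m][j] = i
      end
    end
  end

  # Walk the stored boundaries backwards to recover the clusters.
  clusters = []
  j = n - 1
  (k - 1).downto(0) do |m|
    i = start[m][j]
    clusters.unshift(xs[i..j])
    j = i - 1
  end
  clusters
end

optimal_1d_kmeans([20.1, 20.2, 25.5, 25.6, 30.1, 30.2], 3)
# => [[20.1, 20.2], [25.5, 25.6], [30.1, 30.2]]
```

Because the input is sorted first and the recurrence is deterministic, the same data always yields the same clusters.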

Choosing Between Ckmeans and Ckmedian

  • Ckmeans - Minimizes squared distances (L2). Good for normally distributed data.
  • Ckmedian - Minimizes absolute distances (L1). More robust to outliers and data bursts.

# For clean numerical data
temperatures = [20.1, 20.2, 25.5, 25.6, 30.1, 30.2]
Ckmeans::Clusterer.new(temperatures, 1, 5).clusters
# => [[20.1, 20.2], [25.5, 25.6], [30.1, 30.2]]

# For data with outliers (e.g., photo timestamps with bursts)
timestamps = photos.map(&:taken_at).map(&:to_i)
Ckmedian::Clusterer.new(timestamps, 1, 20).clusters
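
The difference between the two objectives can be seen on a single cluster without the gem at all. The sketch below uses hypothetical helpers (mean, median, l1_cost, l2_cost) to show that the mean is the center that minimizes squared distances while the median minimizes absolute distances, and that one outlier drags the mean much further than the median.

```ruby
# Hypothetical helpers for illustration; none of these are part of the gem.
def mean(xs)
  xs.sum.to_f / xs.size
end

def median(xs)
  sorted = xs.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Sum of absolute distances to a center (the L1 objective).
def l1_cost(xs, center)
  xs.sum { |x| (x - center).abs }
end

# Sum of squared distances to a center (the L2 objective).
def l2_cost(xs, center)
  xs.sum { |x| (x - center)**2 }
end

burst = [1, 2, 3, 4, 100] # four close values plus one outlier

mean(burst)   # => 22.0
median(burst) # => 3.0

l2_cost(burst, mean(burst))   # => 7610.0 (mean wins on squared distance)
l2_cost(burst, median(burst)) # => 9415.0
l1_cost(burst, median(burst)) # => 101.0  (median wins on absolute distance)
l1_cost(burst, mean(burst))   # => 156.0
```

The median stays inside the dense group, which is why an L1 objective tends to keep bursts together instead of reshaping clusters around a few extreme values.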

Stable Estimation (Recommended for Edge Cases)

By default, both algorithms use a fast heuristic to estimate K. For datasets with many duplicates, tight clusters, or outliers, pass :stable as the fourth argument for more robust estimation:

# Stable estimation (uses statistical mixture models)
Ckmeans::Clusterer.new(data, 1, 10, :stable).clusters
Ckmedian::Clusterer.new(data, 1, 10, :stable).clusters

When to use :stable:

  • Small to medium datasets (< 1000 points)
  • Many duplicate values
  • Clusters with very different sizes
  • Photo/event timeline clustering (bursts and gaps)

Expert users: :stable is an alias for :gmm (Gaussian Mixture Model) in Ckmeans and :lmm (Laplace Mixture Model) in Ckmedian.
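
For readers unfamiliar with the two mixture families, the following sketch compares the log-density of a single Gaussian component with that of a single Laplace component. It says nothing about how the gem fits or scores its models, which the README does not describe; the function names and the unit scale parameters are assumptions made for illustration. It only shows the general property that a Gaussian penalizes distance from the center quadratically while a Laplace distribution penalizes it linearly, so far-away points are less surprising under the Laplace model.

```ruby
# Hypothetical helpers for illustration; not part of the gem's API.

# Log-density of a normal distribution with the given mean and sigma.
def gaussian_log_pdf(x, mean, sigma)
  z = (x - mean).to_f / sigma
  -0.5 * z * z - Math.log(sigma * Math.sqrt(2 * Math::PI))
end

# Log-density of a Laplace distribution with the given location and scale.
def laplace_log_pdf(x, location, scale)
  -(x - location).abs.to_f / scale - Math.log(2 * scale)
end

# Compare how strongly each model penalizes points further from the center.
[0, 1, 5, 10].each do |distance|
  g = gaussian_log_pdf(distance, 0, 1)
  l = laplace_log_pdf(distance, 0, 1)
  puts format("distance %2d  gaussian %8.3f  laplace %8.3f", distance, g, l)
end
```

At a distance of 10 scale units the Gaussian log-density has dropped by about 50 relative to its peak, while the Laplace log-density has dropped by only 10.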

License

The gem is available as open source under the terms of the LGPL v3 License.

References