CluDA
The aim of CLuDA is to group the data points into clusters such that similar items are lumped together in the same cluster, using different classification supervised or unsupervised learning techniques.
#Installation
gem install cluda
#Usage
In the current version it only exist Kmeans as a clustering algorithm, but in future updates the idea is to have several options to choose for clustering.
CluDA is prepared to use any clustering algorithm that is implemented within it and call the method 'classify' to get the output. Classify is has 2 mandatory parameters and 2 optionals:
Cluda::X.classify( list, k: K, distance_method: DISTANCE, max_iterations: MAX )
Mandatory:
- list => List of points that you wish to classify
Optional:
- k => Number of clusters. 1 (default)
- centroids => If you wish to work with specific initial centroids
- distance_method => Should be a string in lowercase and can be: * 'euclidean' (default) * 'manhattan' * 'chebyshev'
- be_smart => In case is necessary CluDA will create new centroids to the set passed as parameter. False (default)
- margin_distance_percentage => In case using Smart Clustering be careful with the distances for the centroids. Cluda will create as many centroids as it sees from the data. This parameter is a way to control the number of clusters. Should be a number between 0 and 1. 0 (default)
- max_iterations => Natural > 0 for local minimums. 50 (default)
The output will always be an hash with the centroids and the points clustered to the corresponding centroid.
##KMeans
Anytime that you want to use it, simply follow Cluda by the 'Kmeans' class. Showed in the example above:
require 'cluda'
...
points = [ { x: 1, y: 1}, { x: 2, y: 1}, { x: 1, y: 2}, { x: 2, y: 2}, { x: 4, y: 6}, { x: 5, y: 7}, { x: 5, y: 6}, { x: 5, y: 5}, { x: 6, y: 6}, { x: 6, y: 5}]
Cluda::Kmeans.classify( points, k: 1)
...
Output
=> {{:x=>4, :y=>5}=>
[{:x=>1, :y=>1},
{:x=>2, :y=>1},
{:x=>1, :y=>2},
{:x=>2, :y=>2},
{:x=>4, :y=>6},
{:x=>5, :y=>7},
{:x=>5, :y=>6},
{:x=>5, :y=>5},
{:x=>6, :y=>6},
{:x=>6, :y=>5}]}
Other examples followed by the outputs:
Cluda::Kmeans.classify( points, k: 2, distance_method: 'euclidean' )
Output
=> {{:x=>1, :y=>1}=>
[{:x=>1, :y=>1}, {:x=>2, :y=>1}, {:x=>1, :y=>2}, {:x=>2, :y=>2}],
{:x=>5, :y=>6}=>
[{:x=>4, :y=>6},
{:x=>5, :y=>7},
{:x=>5, :y=>6},
{:x=>5, :y=>5},
{:x=>6, :y=>6},
{:x=>6, :y=>5}]}
Cluda::Kmeans.classify( points, k: 2, distance_method: 'manhattan' )
Output
=> {{:x=>5, :y=>6}=>
[{:x=>4, :y=>6},
{:x=>5, :y=>7},
{:x=>5, :y=>6},
{:x=>5, :y=>5},
{:x=>6, :y=>6},
{:x=>6, :y=>5}],
{:x=>1, :y=>1}=>
[{:x=>1, :y=>1}, {:x=>2, :y=>1}, {:x=>1, :y=>2}, {:x=>2, :y=>2}]}
Cluda::Kmeans.classify( points, k: 2, distance_method: 'chebyshev' )
Output
=> {{:x=>1, :y=>1}=>
[{:x=>1, :y=>1}, {:x=>2, :y=>1}, {:x=>1, :y=>2}, {:x=>2, :y=>2}],
{:x=>5, :y=>6}=>
[{:x=>4, :y=>6},
{:x=>5, :y=>7},
{:x=>5, :y=>6},
{:x=>5, :y=>5},
{:x=>6, :y=>6},
{:x=>6, :y=>5}]}