Just Enumerable Stats¶ ↑
I had some tricky stuff in Statisticus and Sirb that were useful, but I ended up using someone else’s library for plainold calculations on a single enumerable. What a shame, I thought. So I extracted things out and made a simpler library that I’d use for my simpler needs.
I also included a FixedRange class for traversing floating point ranges a little more easily and a nearlyexact copy of the whole library in its own module for containing these methods in your own container. See the Known Issues section below and the specs for that module.
Usage¶ ↑
Would I release a gem without an IRB application included? Probably not. This gem has one, and it’s called jes:
jes Loading Just Enumerable Stats version: 0.0.1
Looking at the library through jes, you can see that we have all the usual goodies:
>> [1,2,3].mean => 2 >> [1,2,3].std => 1.0 >> [1,2,3].cor [2,3,5] => 0.981980506061966
The list of methods are:

average

avg

cartesian_product

categories

category_values

compliment

cor

correlation

count_if

covariance

cum_max

cum_min

cum_prod

cum_sum

cumulative_max

cumulative_min

cumulative_product

cumulative_sum

default_block

default_block=

dichotomize

euclidian_distance

exclusive_not

intersect

is_numeric?

max

max_index

max_of_lists

mean

median

min

min_index

min_of_lists

new_sort

order

pearson_correlation

permutations

product

quantile

rand_in_range

range

range_as_range

range_class

range_instance

rank

set_range

set_range_class

sigma_pairs

standard_deviation

std

sum

tanimoto_correlation

tanimoto_pairs

to_f!

to_pairs

union

var

variance

yield_transpose
One of the more interesting methods is yield_transpose:
[1,2,3].yield_transpose([5,5,5], [2,2,2]) { e e.product } # => [10, 20, 30]
The yield_transpose:

makes a list of lists (read matrix) out of the main object and one or more other lists

yields a block on the columns of the list
In this case, it multiplies 1 * 5 * 2, 2 * 5 * 2, and 3 * 5 * 2 to get the final result.
There are a lot of other interesting tools that do this (RNum, Matrix), but the ones I know about aren’t as flexible as this simple implementation.
Another interesting feature is the default block getter and setter. Sometimes I need to filter, scale, or normalize a result. I can do that in the default block and still hold on to the original value Ultimately, it’s more expensive to do things this way (every computation has to also go through a filter), but it’s a little simpler sometimes. An example:
a = [1,2,3] a.default_block = lambda {e e * 2} a.sum # => [2,4,6] a.std # => 2.0 instead of 1.0
Scaling¶ ↑
There are a few new features for scaling data:
>> a = [1,2,3] => [1, 2, 3] >> a.scale(2) => [2, 4, 6] >> a => [1, 2, 3] >> a.scale!(2) => [2, 4, 6] >> a => [2, 4, 6] >> a.scale {e e  1} => [1, 3, 5] >> a => [2, 4, 6] >> a.scale! {e e  1} => [1, 3, 5] >> a => [1, 3, 5] >> a.scale_between(3,4) => [3.0, 3.5, 4.0] >> a => [1, 3, 5] >> a.scale_between!(3,4) => [3.0, 3.5, 4.0] >> a => [3.0, 3.5, 4.0] >> a = [1,2,3] => [1, 2, 3] >> a.normalize => [0.166666666666667, 0.333333333333333, 0.5] >> a => [1, 2, 3] >> a.normalize! => [0.166666666666667, 0.333333333333333, 0.5] >> a => [0.166666666666667, 0.333333333333333, 0.5] >> a = [5,0,5] => [5, 0, 5] >> a.scale_to_sigmoid => [0.00669285092428486, 0.5, 0.993307149075715] >> a => [5, 0, 5] >> a.scale_to_sigmoid! => [0.00669285092428486, 0.5, 0.993307149075715] >> a => [0.00669285092428486, 0.5, 0.993307149075715]
Basically:

scale can scale by a number or with a block. The block is a transformation for a single element.

scale_between sets the minimum and maximum values, and keeps each value proportionate to each other.

normalize calculates the percentage of an element to the whole.

scale_to_sigmoid uses the sigmoid function to scale a set between 0 and 1 with a Gaussian distribution on the numbers.
Categories¶ ↑
Once I started using this gem with my distribution table classes, I needed to have flexible categories on an enumerable. What that looks like is:
Loading Just Enumerable Stats version: 0.0.4 >> a = [1,2,3] => [1, 2, 3] >> a.categories => [1, 2, 3] >> a.category_values => {1=>[1], 2=>[2], 3=>[3]} >> a.set_range_class FixedRange, 1, 3, 0.5 => FixedRange >> a.categories => [1.0, 1.5, 2.0, 2.5, 3.0] >> a.category_values(true) => {1.0=>[1], 1.5=>[], 2.0=>[2], 2.5=>[], 3.0=>[3]} >> a.set_range({ ?> "<3" => lambda{e e < 3}, ?> "3" => lambda{e e == 3} >> }) => ["<3", "3"] >> a.categories => ["<3", "3"] >> a.category_values(true) => {"<3"=>[1, 2], "3"=>[3]} >> a.count_if {e e < 3} => 2 >> a.count_if {e e == 3} => 1 >> a.dichotomize(2, :small, :big) => [:small, :big] >> a.categories => [:small, :big] >> a.category_values[:small] => [1, 2] >> a.category_values[:big] => [3]
OK, here we go:

If you have facets installed (sudo gem install facets), it will use a Dictionary instead of a Hash, keeping the order of the hash values consistent with the order they were loaded. It’s nice to have an ordered hash, so go ahead and install that.

The categories default as all the unique values in the enumerable

category_values is a hash or dictionary of values for each category. The values are returned as an array. This is cached, so call it like a.category_values(true) to reset the hash.

set_range_class sets an arbitrary class to use for calculating a range. It should respond to map that will return an array of all values in the range. The arguments after FixedRange are the arguments I want to use when instantiating the class. This is an arbitrary list as well. I could just as easily have typed a.set_range_class FixedRange, a.min, a.max, 0.5 with similar results.

FixedRange is a Range that can work with floating numbers. Range.new(1.0, 3.0).map chokes but FixedRange.new(1.0, 3.0).map does not.

set_range takes an arbitrary hash of lambdas and sets the categories to the keys of that hash.

The category_values calculated with a hash of lambdas makes a very flexible set interface for enumerables. There is no rule that the categories setup this way have to be mutually exclusive or collectively exhaustive (MECE), so interesting data sets can be setup here. However, MECE is generally a good guideline for most analysis.

count_if is just like a.select_all{e e < 3}.size, but a little more obvious.

dichotomize just splits the categories into two, with the first category less than or equal to the split value provided (2 in our case)
Obtrusiveness¶ ↑
This gem won’t override methods, but it puts a lot into the Enumerable namespace. It’s almost as abusive as ActiveSupport. I’ll be testing this in a Rails environment to make sure this plays nicely with ActiveSupport. The issue there was that ActiveSupport also wanted to override sum on an enumerable. If you are in an environment where jes isn’t overriding methods, then you’re going to have to use the jes prefix for the calls. So, @a._jes_sum will work, as will @a._jes_standard_deviation, @a._jes_covariance, etc. I’m realizing that I always want to use this library in all of my new stuff, because it simplifies things so much. So, this is going to be an evermore important point.
Installation¶ ↑
sudo gem install davidrichardsjust_enumerable_stats
Dependencies¶ ↑
There’s an optional dependency on facets/dictionary. You’ll have fewer surprises if you use a Dictionary instead of a hash for categories.
Known Issues¶ ↑

I don’t like the quantile methods. I found a different approach that I think is cleaner, that I should implement when I get the time.

This isn’t really for any Enumerable. It’s only tested on Arrays, though I’m pretty sure a lot of my repositories and other custom Enumerables will work well with this. Most importantly, a Hash will fall on its face here, so don’t try it. If you need labeled data, keep an eye out for Marginal, a gem I’m cleaning up that offers loglinear methods on cross tables. That gem will use this gem, so whatever goodies I add here will be available there. Also, there is data_frame out right now that uses this gem. That is a simpler and very useful gem for labeled data.

I imagine the scope of this gem may grow by about a third more methods. It’s not supposed to be an exhaustive list. TeguGears was developed to build these kinds of methods and have them work nicely with other tools. So, anything more than elementary statistics should become a TeguGears class.

I should probably rename the range methods.
COPYRIGHT¶ ↑
Copyright © 2009 David Richards. See LICENSE for details.