Enumerable Stats
A Ruby gem that extends Enumerable with statistical methods, making it easy to calculate descriptive statistics and detect outliers in your data collections.
Installation
Add this line to your application's Gemfile:
gem 'enumerable-stats'
And then execute:
bundle install
Or install it yourself as:
gem install enumerable-stats
Usage
Simply require the gem and all Enumerable objects (Arrays, Ranges, Sets, etc.) will have the statistical methods available.
require 'enumerable-stats'
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
puts data.mean # => 5.5
puts data.median # => 5.5
puts data.variance # => 9.17
Statistical Methods
Basic Statistics
#mean
Calculates the arithmetic mean (average) of the collection.
[1, 2, 3, 4, 5].mean # => 3.0
[10, 20, 30].mean # => 20.0
[-1, 0, 1].mean # => 0.0
#median
Calculates the median (middle value) of the collection.
[1, 2, 3, 4, 5].median # => 3 (odd number of elements)
[1, 2, 3, 4].median # => 2.5 (even number of elements)
[5, 1, 3, 2, 4].median # => 3 (automatically sorts)
[].median # => nil (empty collection)
#percentile(percentile)
Calculates the specified percentile of the collection using linear interpolation. This is equivalent to the "linear" method used by many statistical software packages (R-7/Excel method).
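Under the hood, the R-7 rule places the requested percentile at a fractional rank between two order statistics and interpolates linearly between them. A minimal sketch of that rule (illustrative only, not the gem's source):

```ruby
# R-7 / Excel-style percentile (illustrative sketch; the gem provides #percentile).
def percentile_r7(values, p)
  sorted = values.sort
  rank = (sorted.size - 1) * (p / 100.0) # fractional, zero-based rank
  lower, upper = rank.floor, rank.ceil
  sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower)
end

percentile_r7([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 25) # => 3.25
```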
# Basic percentile calculations
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data.percentile(0) # => 1 (minimum value)
data.percentile(25) # => 3.25 (first quartile)
data.percentile(50) # => 5.5 (median)
data.percentile(75) # => 7.75 (third quartile)
data.percentile(100) # => 10 (maximum value)
# Performance monitoring percentiles
response_times = [45, 52, 48, 51, 49, 47, 53, 46, 50, 54]
p50 = response_times.percentile(50) # => 49.5ms (median response time)
p95 = response_times.percentile(95) # => 53.55ms (95th percentile - outlier threshold)
p99 = response_times.percentile(99) # => 53.91ms (99th percentile - extreme outliers)
puts "50% of requests complete within #{p50}ms"
puts "95% of requests complete within #{p95}ms"
puts "99% of requests complete within #{p99}ms"
# Works with any numeric data
scores = [78, 85, 92, 88, 76, 94, 82, 89, 91, 87]
puts "Top 10% threshold: #{scores.percentile(90)}" # => 92.4
puts "Bottom 25% cutoff: #{scores.percentile(25)}" # => 80.5
#variance
Calculates the sample variance of the collection.
[1, 2, 3, 4, 5].variance # => 2.5
[5, 5, 5, 5].variance # => 0.0 (no variation)
#standard_deviation
Calculates the sample standard deviation (square root of variance).
[1, 2, 3, 4, 5].standard_deviation # => 1.58
[5, 5, 5, 5].standard_deviation # => 0.0
Statistical Testing Methods
#t_value(other)
Calculates the t-statistic for comparing the means of two samples using Welch's t-test formula, which doesn't assume equal variances. Used in hypothesis testing to determine if two groups have significantly different means.
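The statistic is the difference in sample means scaled by the combined standard error, using each sample's own variance. A minimal sketch of the formula, built on the gem's mean and variance (illustrative only, not the gem's source):

```ruby
# Welch's t statistic (illustrative sketch; the gem provides #t_value).
def welch_t(a, b)
  # Combined standard error without assuming equal variances
  se = Math.sqrt(a.variance / a.count + b.variance / b.count)
  (a.mean - b.mean) / se
end
```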
# A/B test: comparing conversion rates
control_group = [0.12, 0.11, 0.13, 0.12, 0.14, 0.11, 0.12] # mean ≈ 0.121
test_group = [0.15, 0.16, 0.14, 0.17, 0.15, 0.18, 0.16] # mean ≈ 0.159
t_stat = control_group.t_value(test_group)
puts t_stat # => -5.72 (negative means test_group > control_group)
# Performance comparison: API response times
baseline = [100, 120, 110, 105, 115, 108, 112] # Slower responses
optimized = [85, 95, 90, 88, 92, 87, 89] # Faster responses
t_stat = baseline.t_value(optimized)
puts t_stat # => 7.41 (positive means baseline > optimized, which is bad for response times)
# The larger the absolute t-value, the more significant the difference
puts "Significant difference!" if t_stat.abs > 2.0 # Rule of thumb threshold
#degrees_of_freedom(other)
Calculates the degrees of freedom for statistical testing using Welch's formula. This accounts for different sample sizes and variances between groups and is used alongside the t-statistic for hypothesis testing.
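For reference, the Welch-Satterthwaite approximation looks like this (illustrative sketch, not the gem's source):

```ruby
# Welch-Satterthwaite degrees of freedom (illustrative sketch).
def welch_df(a, b)
  va = a.variance / a.count # variance of each sample mean
  vb = b.variance / b.count
  (va + vb)**2 / (va**2 / (a.count - 1) + vb**2 / (b.count - 1))
end
```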
# Calculate degrees of freedom for the same samples
control = [0.12, 0.11, 0.13, 0.12, 0.14, 0.11, 0.12]
test = [0.15, 0.16, 0.14, 0.17, 0.15, 0.18, 0.16]
df = control.degrees_of_freedom(test)
puts df # => ~11.4 (used to look up critical t-values in statistical tables)
# With equal variances, approaches n1 + n2 - 2
equal_var_a = [10, 11, 12, 13, 14] # variance = 2.5
equal_var_b = [15, 16, 17, 18, 19] # variance = 2.5
df_equal = equal_var_a.degrees_of_freedom(equal_var_b)
puts df_equal # => ~8.0 (close to 5 + 5 - 2 = 8)
# With unequal variances, will be less than pooled degrees of freedom
unequal_a = [10, 10, 10, 10, 10] # very low variance
unequal_b = [5, 15, 8, 20, 12, 25, 18] # high variance
df_unequal = unequal_a.degrees_of_freedom(unequal_b)
puts df_unequal # => ~6.0 (much less than 5 + 7 - 2 = 10)
#greater_than?(other, alpha: 0.05)
Tests whether this collection's mean is significantly greater than another collection's mean using a one-tailed Welch's t-test. Returns `true` if the difference is statistically significant at the specified alpha level.
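Conceptually, the test computes the Welch t statistic and compares it against the one-tailed critical value for the requested alpha at the Welch degrees of freedom. A rough sketch of that decision rule, where critical_t is a hypothetical t-table lookup helper (the gem's actual implementation may differ):

```ruby
# Conceptual decision rule (not the gem's source).
# critical_t is a hypothetical helper returning the one-tailed
# critical t value for a given alpha and degrees of freedom.
def significantly_greater?(a, b, alpha: 0.05)
  t  = a.t_value(b)            # positive when a's mean exceeds b's
  df = a.degrees_of_freedom(b) # Welch-Satterthwaite approximation
  t > critical_t(alpha, df)    # reject "equal means" only in the upper tail
end
```

less_than? mirrors this rule in the lower tail.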
# A/B testing: is the new feature performing better?
control_conversion = [0.118, 0.124, 0.116, 0.121, 0.119, 0.122, 0.117] # ~12.0% avg
variant_conversion = [0.135, 0.142, 0.138, 0.144, 0.140, 0.136, 0.139] # ~13.9% avg
# Is the variant significantly better than control?
puts variant_conversion.greater_than?(control_conversion) # => true (significant improvement)
puts control_conversion.greater_than?(variant_conversion) # => false
# Performance testing: is new optimization significantly faster?
old_response_times = [145, 152, 148, 159, 143, 156, 147, 151, 149, 154] # ~150ms avg
new_response_times = [125, 128, 122, 131, 124, 129, 126, 130, 123, 127] # ~126ms avg
# Are old times significantly greater (worse) than new times?
puts old_response_times.greater_than?(new_response_times) # => true (significant improvement)
# Custom significance level for more conservative testing
puts variant_conversion.greater_than?(control_conversion, alpha: 0.01) # 99% confidence
puts variant_conversion.greater_than?(control_conversion, alpha: 0.10) # 90% confidence
# Check with similar groups (should be false)
similar_a = [10, 11, 12, 13, 14]
similar_b = [10.5, 11.5, 12.5, 13.5, 14.5]
puts similar_b.greater_than?(similar_a) # => false (difference not significant)
#less_than?(other, alpha: 0.05)
Tests whether this collection's mean is significantly less than another collection's mean using a one-tailed Welch's t-test. Returns `true` if the difference is statistically significant at the specified alpha level.
# Response time improvement: are new times significantly lower?
baseline_times = [150, 165, 155, 170, 160, 145, 175, 152, 158, 163] # ~159ms avg
optimized_times = [120, 125, 115, 130, 118, 122, 128, 124, 119, 126] # ~123ms avg
# Are optimized times significantly less (better) than baseline?
puts optimized_times.less_than?(baseline_times) # => true (significant improvement)
puts baseline_times.less_than?(optimized_times) # => false
# Error rate reduction: is new implementation significantly better?
old_error_rates = [0.025, 0.028, 0.024, 0.030, 0.026, 0.027, 0.029] # ~2.7% avg
new_error_rates = [0.012, 0.015, 0.013, 0.016, 0.014, 0.011, 0.013] # ~1.3% avg
puts new_error_rates.less_than?(old_error_rates) # => true (significantly fewer errors)
# Memory usage optimization
before_optimization = [245, 250, 242, 255, 248, 253, 247] # ~248MB avg
after_optimization = [198, 205, 195, 210, 200, 202, 197] # ~201MB avg
puts after_optimization.less_than?(before_optimization) # => true (significant reduction)
# Custom alpha levels
puts optimized_times.less_than?(baseline_times, alpha: 0.01) # More stringent test
puts optimized_times.less_than?(baseline_times, alpha: 0.10) # More lenient test
Comparison Operators
The gem provides convenient operator shortcuts for statistical comparisons:
#>(other, alpha: 0.05) and #<(other, alpha: 0.05)
Shorthand operators for greater_than? and less_than?, respectively.
# Performance comparison using operators
baseline_times = [150, 165, 155, 170, 160, 145, 175]
optimized_times = [120, 125, 115, 130, 118, 122, 128]
# These are equivalent:
puts baseline_times.greater_than?(optimized_times) # => true
puts baseline_times > optimized_times # => true
puts optimized_times.less_than?(baseline_times) # => true
puts optimized_times < baseline_times # => true
# With custom alpha levels (use explicit method syntax for parameters)
puts baseline_times.>(optimized_times, alpha: 0.01) # More stringent
puts optimized_times.<(baseline_times, alpha: 0.10) # More lenient
#<=>(other, alpha: 0.05) - The Spaceship Operator
The spaceship operator provides three-way statistical comparison, returning:
- 1 if this collection's mean is significantly greater than the other's
- -1 if this collection's mean is significantly less than the other's
- 0 if there is no statistically significant difference
This is particularly useful for sorting collections by statistical significance or implementing custom comparison logic.
# Three-way statistical comparison
high_performance = [200, 210, 205, 215, 220] # mean = 210
medium_performance = [150, 160, 155, 165, 170] # mean = 160
low_performance = [50, 60, 55, 65, 70] # mean = 60
puts high_performance <=> medium_performance # => 1 (significantly greater)
puts medium_performance <=> high_performance # => -1 (significantly less)
puts high_performance <=> high_performance # => 0 (no significant difference)
# Sorting datasets by statistical significance
datasets = [
[10, 15, 12, 18, 11], # mean = 13.2
[30, 35, 32, 38, 31], # mean = 33.2
[5, 8, 6, 9, 7], # mean = 7.0
[20, 25, 22, 28, 21] # mean = 23.2
]
# Sort datasets by statistically significant differences in their means (ascending)
sorted_datasets = datasets.sort { |a, b| a <=> b }
puts sorted_datasets.map(&:mean) # => [7.0, 13.2, 23.2, 33.2] (ascending by mean)
# A/B testing - sort variants by conversion performance
control = [0.12, 0.11, 0.13, 0.12, 0.10] # 11.6% conversion
variant_a = [0.14, 0.13, 0.15, 0.14, 0.12] # 13.6% conversion
variant_b = [0.16, 0.15, 0.17, 0.16, 0.14] # 15.6% conversion
variants = [control, variant_a, variant_b]
best_to_worst = variants.sort { |a, b| b <=> a } # Descending order
puts "Performance ranking:"
best_to_worst.each_with_index do |variant, index|
puts "#{index + 1}. Mean conversion: #{variant.mean.round(3)}"
end
# Custom alpha levels (default is 0.05)
borderline_a = [100, 102, 104, 106, 108] # mean = 104
borderline_b = [95, 97, 99, 101, 103] # mean = 99
# Standard significance test (95% confidence)
result_standard = borderline_a <=> borderline_b
puts "Standard test (α=0.05): #{result_standard}"
# More lenient test (90% confidence)
# Note: Use method call syntax for custom parameters
result_lenient = borderline_a.public_send(:<=>, borderline_b, alpha: 0.10)
puts "Lenient test (α=0.10): #{result_lenient}"
Comparison Methods
#percentage_difference(other)
Calculates the absolute percentage difference between this collection's mean and another value or collection's mean using the symmetric percentage difference formula.
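The symmetric formula divides the absolute difference by the average of the two means, so the result does not depend on which side is treated as the baseline. A minimal sketch (illustrative only, not the gem's source):

```ruby
# Symmetric percentage difference (illustrative sketch).
# signed_percentage_difference uses the same formula without the .abs.
def pct_diff(a, b)
  (a - b).abs / ((a + b) / 2.0) * 100
end

pct_diff(88, 96) # => ~8.7, matching the example below
```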
# Comparing two datasets
control_group = [85, 90, 88, 92, 85] # mean = 88
test_group = [95, 98, 94, 96, 97] # mean = 96
diff = control_group.percentage_difference(test_group)
puts diff # => 8.7% (always positive)
# Comparing collection to single value
response_times = [120, 135, 125, 130, 140] # mean = 130
target = 120
diff = response_times.percentage_difference(target)
puts diff # => 8.0%
# Same result regardless of order
puts control_group.percentage_difference(test_group) # => 8.7%
puts test_group.percentage_difference(control_group) # => 8.7%
#signed_percentage_difference(other)
Calculates the signed percentage difference, preserving the direction of change. Positive values indicate this collection's mean is higher than the comparison; negative values indicate it's lower.
# Performance monitoring - lower is better
baseline = [100, 110, 105, 115, 95] # mean = 105ms
optimized = [85, 95, 90, 100, 80] # mean = 90ms
improvement = optimized.signed_percentage_difference(baseline)
puts improvement # => -15.38% (negative = improvement for response times)
regression = baseline.signed_percentage_difference(optimized)
puts regression # => 15.38% (positive = regression)
# A/B testing - higher is better
control_conversions = [0.12, 0.11, 0.13, 0.12] # mean = 0.12 (12%)
variant_conversions = [0.14, 0.13, 0.15, 0.14] # mean = 0.14 (14%)
lift = variant_conversions.signed_percentage_difference(control_conversions)
puts lift # => 15.38% (positive = improvement for conversion rates)
Outlier Detection
#remove_outliers(multiplier: 1.5)
Removes outliers using the IQR (Interquartile Range) method. This is particularly effective for performance data which often has extreme values due to network issues, CPU scheduling, GC pauses, etc.
# Basic usage
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100] # 100 is an outlier
clean_data = data.remove_outliers
# => [1, 2, 3, 4, 5, 6, 7, 8, 9] (outlier removed)
# Custom multiplier (more conservative = fewer outliers removed)
data.remove_outliers(multiplier: 2.0) # Less aggressive
data.remove_outliers(multiplier: 1.0) # More aggressive
# Performance data example
response_times = [45, 52, 48, 51, 49, 47, 53, 46, 2000, 48] # 2000ms is an outlier
clean_times = response_times.remove_outliers
# => [45, 52, 48, 51, 49, 47, 53, 46, 48]
Note: Collections with fewer than 4 elements are returned unchanged since quartile calculation requires at least 4 data points.
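The filter keeps only values between the lower and upper "fences" built from the quartiles. A minimal sketch of the idea, reusing the gem's percentile (illustrative only, not the gem's source):

```ruby
# IQR outlier filter (illustrative sketch; the gem provides #remove_outliers).
def iqr_filter(values, multiplier: 1.5)
  q1 = values.percentile(25)
  q3 = values.percentile(75)
  iqr = q3 - q1
  values.select { |v| v.between?(q1 - multiplier * iqr, q3 + multiplier * iqr) }
end

iqr_filter([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]) # => [1, 2, 3, 4, 5, 6, 7, 8, 9]
```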
#outlier_stats(multiplier: 1.5)
Returns detailed statistics about outlier removal for debugging and logging purposes.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
stats = data.outlier_stats
puts stats
# => {
# original_count: 10,
# filtered_count: 9,
# outliers_removed: 1,
# outlier_percentage: 10.0
# }
Working with Different Collection Types
The gem works with any Enumerable object:
# Arrays
[1, 2, 3, 4, 5].mean # => 3.0
# Ranges
(1..10).median # => 5.5
# Sets
require 'set'
Set.new([1, 2, 3, 3, 4]).variance # => 1.67 (duplicates ignored)
# Custom Enumerable objects
class DataSet
include Enumerable
def initialize(data)
@data = data
end
def each(&block)
@data.each(&block)
end
end
dataset = DataSet.new([10, 20, 30, 40, 50])
dataset.standard_deviation # => 15.81
Real-World Examples
Performance Monitoring
# Analyzing API response times
response_times = [120, 145, 133, 128, 142, 136, 5000, 125, 139, 131]
puts "Original mean: #{response_times.mean.round(2)}ms"
# => "Original mean: 619.9ms" (skewed by the 5000ms outlier)
clean_times = response_times.remove_outliers
puts "Clean mean: #{clean_times.mean.round(2)}ms"
# => "Clean mean: 133.22ms" (more representative)
# Use percentiles for SLA monitoring (industry standard approach)
p50 = clean_times.percentile(50) # Median response time
p95 = clean_times.percentile(95) # 95% of requests complete within this time
p99 = clean_times.percentile(99) # 99% of requests complete within this time
puts "Response Time SLAs:"
puts " p50 (median): #{p50.round(1)}ms"
puts " p95: #{p95.round(1)}ms"
puts " p99: #{p99.round(1)}ms"
# Set alerting thresholds based on percentiles
sla_p95_threshold = 200 # ms
if p95 > sla_p95_threshold
puts "🚨 SLA BREACH: 95th percentile (#{p95.round(1)}ms) exceeds #{sla_p95_threshold}ms"
else
puts "✅ SLA OK: 95th percentile within acceptable limits"
end
# Get outlier statistics for monitoring
stats = response_times.outlier_stats
puts "Removed #{stats[:outliers_removed]} outliers (#{stats[:outlier_percentage]}%)"
# => "Removed 1 outliers (10.0%)"
Data Quality Analysis
# Analyzing sensor readings
temperatures = [22.1, 22.3, 22.0, 22.2, 89.5, 22.1, 22.4] # 89.5 is likely an error
puts "Raw data statistics:"
puts " Mean: #{temperatures.mean.round(2)}°C"
puts " Std Dev: #{temperatures.standard_deviation.round(2)}°C"
clean_temps = temperatures.remove_outliers
puts "\nCleaned data statistics:"
puts " Mean: #{clean_temps.mean.round(2)}°C"
puts " Std Dev: #{clean_temps.standard_deviation.round(2)}°C"
puts " Sample size: #{clean_temps.size}/#{temperatures.size}"
A/B Test Analysis
# Conversion rates for multiple variants
control = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.15] # 13.5% avg conversion
variant_a = [0.18, 0.19, 0.17, 0.20, 0.18, 0.21, 0.19, 0.18] # 18.75% avg conversion
variant_b = [0.16, 0.17, 0.15, 0.18, 0.16, 0.19, 0.17, 0.16] # 16.75% avg conversion
variant_c = [0.22, 0.24, 0.21, 0.25, 0.23, 0.26, 0.22, 0.24] # 23.4% avg conversion
variants = [
{ name: "Control", data: control },
{ name: "Variant A", data: variant_a },
{ name: "Variant B", data: variant_b },
{ name: "Variant C", data: variant_c }
]
# Display individual performance
variants.each do |variant|
mean_pct = (variant[:data].mean * 100).round(1)
std_pct = (variant[:data].standard_deviation * 100).round(1)
puts "#{variant[:name]}: #{mean_pct}% ± #{std_pct}%"
end
# Sort variants by statistical performance using spaceship operator
sorted_variants = variants.sort { |a, b| b[:data] <=> a[:data] } # Descending order
puts "\nPerformance Ranking (statistically significant):"
sorted_variants.each_with_index do |variant, index|
conversion_rate = (variant[:data].mean * 100).round(1)
puts "#{index + 1}. #{variant[:name]}: #{conversion_rate}%"
# Compare to control using statistical significance
if variant[:name] != "Control"
is_significantly_better = variant[:data] > control
puts " #{is_significantly_better ? '✅ Significantly better' : '❌ Not significantly different'} than control"
end
end
# Check for outliers that might skew results
puts "\nOutlier Analysis:"
variants.each do |variant|
outlier_count = variant[:data].outlier_stats[:outliers_removed]
puts "#{variant[:name]} outliers: #{outlier_count}"
end
Performance Comparison
# Before and after optimization comparison
before_optimization = [150, 165, 155, 170, 160, 145, 175] # API response times (ms)
after_optimization = [120, 125, 115, 130, 118, 122, 128]
puts "Before: #{before_optimization.mean.round(1)}ms ± #{before_optimization.standard_deviation.round(1)}ms"
puts "After: #{after_optimization.mean.round(1)}ms ± #{after_optimization.standard_deviation.round(1)}ms"
# Calculate improvement (negative is good for response times)
improvement = after_optimization.signed_percentage_difference(before_optimization)
puts "Performance improvement: #{improvement.round(1)}%" # => "Performance improvement: -23.2%"
# Or use absolute difference for reporting
abs_diff = after_optimization.percentage_difference(before_optimization)
puts "Total performance change: #{abs_diff.round(1)}%" # => "Total performance change: 23.2%"
Statistical Significance Testing
# Comparing two datasets for meaningful differences
dataset_a = [45, 50, 48, 52, 49, 47, 51]
dataset_b = [48, 53, 50, 55, 52, 49, 54]
# Basic comparison
difference = dataset_b.signed_percentage_difference(dataset_a)
puts "Dataset B is #{difference.round(1)}% different from Dataset A"
# Check if difference is large enough to be meaningful
abs_difference = dataset_b.percentage_difference(dataset_a)
if abs_difference > 5.0 # 5% threshold
puts "Difference of #{abs_difference.round(1)}% may be statistically significant"
else
puts "Difference of #{abs_difference.round(1)}% is likely not significant"
end
# Consider variability
a_cv = (dataset_a.standard_deviation / dataset_a.mean) * 100 # Coefficient of variation
b_cv = (dataset_b.standard_deviation / dataset_b.mean) * 100
puts "Dataset A variability: #{a_cv.round(1)}%"
puts "Dataset B variability: #{b_cv.round(1)}%"
Statistical Hypothesis Testing
# Complete example: A/B testing with proper statistical analysis
# Testing whether a new checkout flow improves conversion rates
# Conversion rate data (percentages converted to decimals)
control_conversions = [0.118, 0.124, 0.116, 0.121, 0.119, 0.122, 0.117, 0.120, 0.115, 0.123]
variant_conversions = [0.135, 0.142, 0.138, 0.144, 0.140, 0.136, 0.139, 0.141, 0.137, 0.143]
puts "=== A/B Test Statistical Analysis ==="
puts "Control group (n=#{control_conversions.count}):"
puts " Mean: #{(control_conversions.mean * 100).round(2)}%"
puts " Std Dev: #{(control_conversions.standard_deviation * 100).round(3)}%"
puts "Variant group (n=#{variant_conversions.count}):"
puts " Mean: #{(variant_conversions.mean * 100).round(2)}%"
puts " Std Dev: #{(variant_conversions.standard_deviation * 100).round(3)}%"
# Calculate effect size
lift = variant_conversions.signed_percentage_difference(control_conversions)
puts "\nEffect size: #{lift.round(2)}% lift"
# Perform statistical test
t_statistic = control_conversions.t_value(variant_conversions)
degrees_freedom = control_conversions.degrees_of_freedom(variant_conversions)
puts "\nStatistical test results:"
puts " t-statistic: #{t_statistic.round(3)}"
puts " Degrees of freedom: #{degrees_freedom.round(1)}"
puts " |t| = #{t_statistic.abs.round(3)}"
# Interpret results (simplified - in real analysis, use proper p-value lookup)
if t_statistic.abs > 2.0 # Rough threshold for significance
significance = t_statistic.abs > 3.0 ? "highly significant" : "significant"
direction = t_statistic < 0 ? "Variant is better" : "Control is better"
puts " Result: #{significance} difference detected"
puts " Conclusion: #{direction}"
else
puts " Result: No significant difference detected"
puts " Conclusion: Insufficient evidence for a difference"
end
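# For a firmer cutoff than the |t| > 2.0 rule of thumb, compare |t| against
# tabulated two-tailed critical values for your degrees of freedom
# (illustrative values from standard t tables at alpha = 0.05):
CRITICAL_T_05 = { 5 => 2.571, 10 => 2.228, 15 => 2.131, 20 => 2.086, 30 => 2.042 }
# With df near 18 in this example, the cutoff is roughly 2.10.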
# Data quality checks
control_outliers = control_conversions.outlier_stats
variant_outliers = variant_conversions.outlier_stats
puts "\nData quality:"
puts " Control outliers: #{control_outliers[:outliers_removed]}/#{control_outliers[:original_count]}"
puts " Variant outliers: #{variant_outliers[:outliers_removed]}/#{variant_outliers[:original_count]}"
if control_outliers[:outliers_removed] > 0 || variant_outliers[:outliers_removed] > 0
puts " ⚠️ Consider investigating outliers before concluding"
end
Production Monitoring with Statistical Analysis
# Monitor API performance changes after deployment
# Compare response times before and after optimization
before_deploy = [145, 152, 148, 159, 143, 156, 147, 151, 149, 154,
146, 158, 150, 153, 144, 157, 148, 152, 147, 155]
after_deploy = [132, 128, 135, 130, 133, 129, 131, 134, 127, 136,
133, 130, 128, 135, 132, 129, 134, 131, 130, 133]
puts "=== Performance Monitoring Analysis ==="
# Remove outliers for more accurate comparison
before_clean = before_deploy.remove_outliers
after_clean = after_deploy.remove_outliers
puts "Before deployment (cleaned): #{before_clean.mean.round(1)}ms ± #{before_clean.standard_deviation.round(1)}ms"
puts "After deployment (cleaned): #{after_clean.mean.round(1)}ms ± #{after_clean.standard_deviation.round(1)}ms"
# Calculate improvement
improvement_pct = after_clean.signed_percentage_difference(before_clean)
improvement_abs = before_clean.mean - after_clean.mean
puts "Improvement: #{improvement_pct.round(1)}% (#{improvement_abs.round(1)}ms faster)"
# Statistical significance test
t_stat = before_clean.t_value(after_clean)
df = before_clean.degrees_of_freedom(after_clean)
puts "Statistical test: t(#{df.round(1)}) = #{t_stat.round(3)}"
if t_stat.abs > 2.5 # Conservative threshold for production changes
puts "✅ Statistically significant improvement confirmed"
puts " Safe to keep the optimization"
else
puts "⚠️ Improvement not statistically significant"
puts " Consider longer observation period"
end
# Monitor for performance regression alerts
alert_threshold = 2.0 # t-statistic threshold for alerts
if t_stat < -alert_threshold # Negative means after > before (regression)
puts "🚨 PERFORMANCE REGRESSION DETECTED!"
puts " Immediate investigation recommended"
end
Method Reference
Method | Description | Returns | Notes |
---|---|---|---|
`mean` | Arithmetic mean | Float | Works with any numeric collection |
`median` | Middle value | Numeric or nil | Returns nil for empty collections |
`percentile(percentile)` | Value at the specified percentile (0-100) | Numeric or nil | Linear interpolation (R-7/Excel method) |
`variance` | Sample variance | Float | Uses n-1 denominator (sample variance) |
`standard_deviation` | Sample standard deviation | Float | Square root of variance |
`t_value(other)` | t-statistic for hypothesis testing | Float | Welch's t-test; handles unequal variances |
`degrees_of_freedom(other)` | Degrees of freedom for t-test | Float | Welch's formula; accounts for unequal variances |
`greater_than?(other, alpha: 0.05)` | Test if mean is significantly greater | Boolean | One-tailed t-test; customizable alpha level |
`less_than?(other, alpha: 0.05)` | Test if mean is significantly less | Boolean | One-tailed t-test; customizable alpha level |
`>(other, alpha: 0.05)` | Alias for `greater_than?` | Boolean | Shorthand operator for statistical comparison |
`<(other, alpha: 0.05)` | Alias for `less_than?` | Boolean | Shorthand operator for statistical comparison |
`<=>(other, alpha: 0.05)` | Three-way statistical comparison | Integer (-1, 0, 1) | 1 if greater, -1 if less, 0 if no significant difference |
`percentage_difference(other)` | Absolute percentage difference | Float | Always positive; symmetric comparison |
`signed_percentage_difference(other)` | Signed percentage difference | Float | Preserves direction; useful for A/B tests |
`remove_outliers(multiplier: 1.5)` | Remove outliers using IQR method | Array | Returns a new array; original unchanged |
`outlier_stats(multiplier: 1.5)` | Outlier-removal statistics | Hash | Useful for monitoring and debugging |
Requirements
- Ruby >= 3.1.0
- No external dependencies
Development
After checking out the repo, run:
bundle install
bundle exec rspec # Run the tests
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/binarycleric/enumerable-stats.
Releasing
To tag a new version and push it to GitHub, run:
bundle exec rake release
License
The gem is available as open source under the terms of the MIT License.