BenchGecko for Ruby

The CoinGecko for AI. Ruby client for accessing AI model benchmarks, comparing language models, estimating inference costs, and discovering AI agents.

BenchGecko tracks 300+ AI models across 50+ providers with real benchmark scores, latency metrics, and transparent pricing. This gem gives you structured access to that data directly in your Ruby applications -- no API key required for the built-in catalog.

Installation

Add to your Gemfile:

gem "benchgecko"

Or install directly:

gem install benchgecko

Quick Start

require "benchgecko"

# Look up any model
model = BenchGecko.get_model("claude-3.5-sonnet")
puts model.name       #=> "Claude 3.5 Sonnet"
puts model.provider   #=> "Anthropic"
puts model.score("MMLU")  #=> 88.7

# List all tracked models
BenchGecko.list_models.each { |id| puts id }

Comparing Models

The comparison engine surfaces benchmark differences and pricing ratios, making it straightforward to evaluate tradeoffs between models:

result = BenchGecko.compare_models("gpt-4o", "claude-3.5-sonnet")

puts result[:cheaper]           #=> "gpt-4o"
puts result[:cost_ratio]        #=> 0.69
puts result[:benchmark_diff]    #=> {"MMLU" => 0.0, "HumanEval" => -1.8, ...}

# Positive diff means model_a scores higher
result[:benchmark_diff].each do |bench, diff|
  next unless diff
  winner = diff >= 0 ? "GPT-4o" : "Claude 3.5 Sonnet"
  puts "#{bench}: #{winner} wins by #{diff.abs} points"
end
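
The shape of the :benchmark_diff hash can be sketched in plain Ruby over two per-benchmark score hashes. The numbers and the GPQA entry below are illustrative only, not the gem's data:

```ruby
# Hypothetical per-benchmark scores for two models (illustrative only).
scores_a = { "MMLU" => 88.7, "HumanEval" => 90.2, "GPQA" => nil }
scores_b = { "MMLU" => 88.7, "HumanEval" => 92.0, "GPQA" => 53.6 }

# diff > 0 means model_a scores higher; nil marks a missing score,
# matching the `next unless diff` guard above.
benchmark_diff = scores_a.keys.to_h do |bench|
  a = scores_a[bench]
  b = scores_b[bench]
  [bench, a && b && (a - b).round(1)]
end

p benchmark_diff  # MMLU diff is 0.0, HumanEval is -1.8, GPQA is nil
```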

Cost Estimation

Estimate inference costs before committing to a provider. Prices are per million tokens:

cost = BenchGecko.estimate_cost("gpt-4o",
  input_tokens: 2_000_000,
  output_tokens: 500_000
)

puts cost[:input_cost]   #=> 5.0
puts cost[:output_cost]  #=> 5.0
puts cost[:total]        #=> 10.0
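
The arithmetic is plain per-million scaling. Here is a minimal sketch, assuming hypothetical rates of $2.50/M input and $10.00/M output tokens (chosen to match the example output above, not pulled from the gem's catalog):

```ruby
# Hypothetical per-million-token rates (assumed for illustration).
INPUT_RATE  = 2.50
OUTPUT_RATE = 10.00

# Scale each token count to millions, multiply by the rate, and sum.
def estimate(input_tokens, output_tokens)
  input_cost  = input_tokens / 1_000_000.0 * INPUT_RATE
  output_cost = output_tokens / 1_000_000.0 * OUTPUT_RATE
  { input_cost: input_cost, output_cost: output_cost, total: input_cost + output_cost }
end

cost = estimate(2_000_000, 500_000)
puts cost[:total]  #=> 10.0
```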

Finding the Right Model

Filter models by benchmark performance to find the best fit for your workload:

# All models scoring 87+ on MMLU
strong_reasoners = BenchGecko.top_models("MMLU", min_score: 87.0)
strong_reasoners.each { |m| puts "#{m.name}: #{m.score('MMLU')}" }

# Cheapest model above a quality threshold
budget_pick = BenchGecko.cheapest_above("MMLU", 85.0)
puts "#{budget_pick.name} at $#{budget_pick.cost_per_million}/M tokens"
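
The selection behind a call like cheapest_above amounts to filter-then-minimize. A sketch over plain hashes, with made-up catalog entries rather than the gem's real data:

```ruby
# Illustrative catalog entries (not the gem's real data).
catalog = [
  { name: "Model A", mmlu: 88.0, cost_per_million: 5.0 },
  { name: "Model B", mmlu: 86.1, cost_per_million: 0.7 },
  { name: "Model C", mmlu: 83.9, cost_per_million: 0.3 }
]

# Keep models at or above the quality bar, then take the cheapest.
budget_pick = catalog
  .select { |m| m[:mmlu] >= 85.0 }
  .min_by { |m| m[:cost_per_million] }

puts budget_pick[:name]  #=> "Model B"
```

Model C is cheaper but falls below the threshold, so it never reaches the min_by step.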

Benchmark Categories

BenchGecko organizes benchmarks into categories covering reasoning, coding, math, instruction following, safety, multimodal, multilingual, and long-context evaluation:

BenchGecko.benchmark_categories.each do |key, info|
  puts "#{info[:name]}: #{info[:benchmarks].join(', ')}"
  puts "  #{info[:description]}"
end
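
The loop above expects a hash keyed by category, where each value carries :name, :benchmarks, and :description entries. A minimal illustration of that shape, with sample data rather than the gem's full taxonomy:

```ruby
# Sample category data matching the shape the loop expects (illustrative only).
categories = {
  reasoning: {
    name: "Reasoning",
    benchmarks: ["MMLU", "GPQA"],
    description: "General knowledge and multi-step reasoning"
  },
  coding: {
    name: "Coding",
    benchmarks: ["HumanEval"],
    description: "Code generation and repair"
  }
}

categories.each do |_key, info|
  puts "#{info[:name]}: #{info[:benchmarks].join(', ')}"
  puts "  #{info[:description]}"
end
```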

Built-in Model Catalog

The gem ships with a curated catalog of major models from OpenAI, Anthropic, Google, Meta, Mistral, and DeepSeek. Each entry includes benchmark scores, parameter counts, context window sizes, and per-token pricing.

model = BenchGecko.get_model("deepseek-v3")
puts model.parameters       #=> 671 (billions)
puts model.context_window   #=> 128000
puts model.cost_per_million #=> 0.685

Use Cases

  • Model selection pipelines -- programmatically pick the cheapest model that meets your quality bar
  • Cost monitoring -- estimate monthly spend across different model configurations
  • Benchmark dashboards -- pull structured scores into internal reporting tools
  • Agent evaluation -- compare AI agents across capability dimensions

License

MIT License. See LICENSE.txt for details.