# BenchGecko for Ruby
The CoinGecko for AI. Ruby client for accessing AI model benchmarks, comparing language models, estimating inference costs, and discovering AI agents.
BenchGecko tracks 300+ AI models across 50+ providers with real benchmark scores, latency metrics, and transparent pricing. This gem gives you structured access to that data directly in your Ruby applications -- no API key required for the built-in catalog.
## Installation
Add to your Gemfile:
```ruby
gem "benchgecko"
```

Or install directly:

```shell
gem install benchgecko
```

## Quick Start
```ruby
require "benchgecko"

# Look up any model
model = BenchGecko.get_model("claude-3.5-sonnet")
puts model.name           #=> "Claude 3.5 Sonnet"
puts model.provider       #=> "Anthropic"
puts model.score("MMLU")  #=> 88.7

# List all tracked models
BenchGecko.list_models.each { |id| puts id }
```

## Comparing Models
The comparison engine surfaces benchmark differences and pricing ratios, making it straightforward to evaluate tradeoffs between models:
```ruby
result = BenchGecko.compare_models("gpt-4o", "claude-3.5-sonnet")
puts result[:cheaper]         #=> "gpt-4o"
puts result[:cost_ratio]      #=> 0.69
puts result[:benchmark_diff]  #=> {"MMLU" => 0.0, "HumanEval" => -1.8, ...}

# Positive diff means model_a scores higher
result[:benchmark_diff].each do |bench, diff|
  next unless diff
  winner = diff >= 0 ? "GPT-4o" : "Claude 3.5 Sonnet"
  puts "#{bench}: #{winner} wins by #{diff.abs} points"
end
```

## Cost Estimation
Estimate inference costs before committing to a provider. Prices are per million tokens:
```ruby
cost = BenchGecko.estimate_cost("gpt-4o",
  input_tokens: 2_000_000,
  output_tokens: 500_000
)
puts cost[:input_cost]   #=> 5.0  ($2.50/M x 2M input tokens)
puts cost[:output_cost]  #=> 5.0  ($10.00/M x 0.5M output tokens)
puts cost[:total]        #=> 10.0
```

## Finding the Right Model
Filter models by benchmark performance to find the best fit for your workload:
```ruby
# All models scoring 87+ on MMLU
strong_reasoners = BenchGecko.top_models("MMLU", min_score: 87.0)
strong_reasoners.each { |m| puts "#{m.name}: #{m.score('MMLU')}" }

# Cheapest model above a quality threshold
budget_pick = BenchGecko.cheapest_above("MMLU", 85.0)
puts "#{budget_pick.name} at $#{budget_pick.cost_per_million}/M tokens"
```

## Benchmark Categories
BenchGecko organizes benchmarks into categories covering reasoning, coding, math, instruction following, safety, multimodal, multilingual, and long context evaluation:
```ruby
BenchGecko.benchmark_categories.each do |key, info|
  puts "#{info[:name]}: #{info[:benchmarks].join(', ')}"
  puts "  #{info[:description]}"
end
```

## Built-in Model Catalog
The gem ships with a curated catalog of major models from OpenAI, Anthropic, Google, Meta, Mistral, and DeepSeek. Each entry includes benchmark scores, parameter counts, context window sizes, and per-token pricing.
```ruby
model = BenchGecko.get_model("deepseek-v3")
puts model.parameters       #=> 671  (billions of parameters)
puts model.context_window   #=> 128000
puts model.cost_per_million #=> 0.685
```

## Use Cases
- Model selection pipelines -- programmatically pick the cheapest model that meets your quality bar
- Cost monitoring -- estimate monthly spend across different model configurations
- Benchmark dashboards -- pull structured scores into internal reporting tools
- Agent evaluation -- compare AI agents across capability dimensions
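The first two use cases above can be sketched in plain Ruby. The snippet below is a minimal, self-contained illustration of the selection logic behind `cheapest_above` and the per-million-token arithmetic behind `estimate_cost`; the catalog entries, scores, and prices in it are hypothetical placeholders, not the gem's real data:

```ruby
# A hypothetical placeholder catalog -- in a real pipeline you would pull
# this from BenchGecko's built-in model data instead.
CATALOG = [
  { name: "model-a", mmlu: 88.0, input_per_m: 2.50, output_per_m: 10.00 },
  { name: "model-b", mmlu: 86.0, input_per_m: 0.50, output_per_m: 1.50 },
  { name: "model-c", mmlu: 82.0, input_per_m: 0.10, output_per_m: 0.30 }
]

# Cheapest model that clears a quality bar, ranked by blended price
def cheapest_above(catalog, min_score)
  catalog
    .select { |m| m[:mmlu] >= min_score }
    .min_by { |m| m[:input_per_m] + m[:output_per_m] }
end

# Cost in dollars: (tokens / 1M) * price-per-million, input plus output
def estimate_cost(model, input_tokens:, output_tokens:)
  (input_tokens / 1_000_000.0) * model[:input_per_m] +
    (output_tokens / 1_000_000.0) * model[:output_per_m]
end

pick = cheapest_above(CATALOG, 85.0)
monthly = estimate_cost(pick, input_tokens: 50_000_000, output_tokens: 10_000_000)
puts "#{pick[:name]}: $#{monthly.round(2)}/month"
#=> model-b: $40.0/month
```

Swapping the placeholder catalog for the gem's real data turns this into a working selection pipeline with the same shape.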
## Resources
- BenchGecko -- Full platform with interactive comparisons
- Source Code -- Contributions welcome
## License
MIT License. See LICENSE.txt for details.