BenchGecko for Ruby

The CoinGecko for AI. Ruby client for accessing AI model benchmarks, comparing language models, estimating inference costs, and discovering AI agents.

BenchGecko tracks 300+ AI models across 50+ providers with real benchmark scores, latency metrics, and transparent pricing. This gem gives you structured access to that data directly in your Ruby applications -- no API key required for the built-in catalog.

Installation

Add to your Gemfile:

gem "benchgecko"

Or install directly:

gem install benchgecko

Quick Start

require "benchgecko"

# Look up any model
model = BenchGecko.get_model("claude-3.5-sonnet")
puts model.name       #=> "Claude 3.5 Sonnet"
puts model.provider   #=> "Anthropic"
puts model.score("MMLU")  #=> 88.7

# List all tracked models
BenchGecko.list_models.each { |id| puts id }

Comparing Models

The comparison engine surfaces benchmark differences and pricing ratios, making it straightforward to evaluate tradeoffs between models:

result = BenchGecko.compare_models("gpt-4o", "claude-3.5-sonnet")

puts result[:cheaper]           #=> "gpt-4o"
puts result[:cost_ratio]        #=> 0.69
puts result[:benchmark_diff]    #=> {"MMLU" => 0.0, "HumanEval" => -1.8, ...}

# Positive diff means model_a scores higher
result[:benchmark_diff].each do |bench, diff|
  next unless diff
  winner = diff >= 0 ? "GPT-4o" : "Claude 3.5 Sonnet"
  puts "#{bench}: #{winner} wins by #{diff.abs} points"
end
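
The shape of the :benchmark_diff hash can be sketched in plain Ruby over two per-benchmark score hashes. The numbers and the GPQA entry below are illustrative only, not the gem's data:

```ruby
# Hypothetical per-benchmark scores for two models (illustrative only).
scores_a = { "MMLU" => 88.7, "HumanEval" => 90.2, "GPQA" => nil }
scores_b = { "MMLU" => 88.7, "HumanEval" => 92.0, "GPQA" => 53.6 }

# diff > 0 means model_a scores higher; nil marks a missing score,
# matching the `next unless diff` guard above.
benchmark_diff = scores_a.keys.to_h do |bench|
  a = scores_a[bench]
  b = scores_b[bench]
  [bench, a && b && (a - b).round(1)]
end

p benchmark_diff  # MMLU diff is 0.0, HumanEval is -1.8, GPQA is nil
```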

Cost Estimation

Estimate inference costs before committing to a provider. Prices are per million tokens:

cost = BenchGecko.estimate_cost("gpt-4o",
  input_tokens: 2_000_000,
  output_tokens: 500_000
)

puts cost[:input_cost]   #=> 5.0
puts cost[:output_cost]  #=> 5.0
puts cost[:total]        #=> 10.0
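
The arithmetic is plain per-million scaling. Here is a minimal sketch, assuming hypothetical rates of $2.50/M input and $10.00/M output tokens (chosen to match the example output above, not pulled from the gem's catalog):

```ruby
# Hypothetical per-million-token rates (assumed for illustration).
INPUT_RATE  = 2.50
OUTPUT_RATE = 10.00

# Scale each token count to millions, multiply by the rate, and sum.
def estimate(input_tokens, output_tokens)
  input_cost  = input_tokens / 1_000_000.0 * INPUT_RATE
  output_cost = output_tokens / 1_000_000.0 * OUTPUT_RATE
  { input_cost: input_cost, output_cost: output_cost, total: input_cost + output_cost }
end

cost = estimate(2_000_000, 500_000)
puts cost[:total]  #=> 10.0
```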

Finding the Right Model

Filter models by benchmark performance to find the best fit for your workload:

# All models scoring 87+ on MMLU
strong_reasoners = BenchGecko.top_models("MMLU", min_score: 87.0)
strong_reasoners.each { |m| puts "#{m.name}: #{m.score('MMLU')}" }

# Cheapest model above a quality threshold
budget_pick = BenchGecko.cheapest_above("MMLU", 85.0)
puts "#{budget_pick.name} at $#{budget_pick.cost_per_million}/M tokens"
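
The selection behind a call like cheapest_above amounts to filter-then-minimize. A sketch over plain hashes, with made-up catalog entries rather than the gem's real data:

```ruby
# Illustrative catalog entries (not the gem's real data).
catalog = [
  { name: "Model A", mmlu: 88.0, cost_per_million: 5.0 },
  { name: "Model B", mmlu: 86.1, cost_per_million: 0.7 },
  { name: "Model C", mmlu: 83.9, cost_per_million: 0.3 }
]

# Keep models at or above the quality bar, then take the cheapest.
budget_pick = catalog
  .select { |m| m[:mmlu] >= 85.0 }
  .min_by { |m| m[:cost_per_million] }

puts budget_pick[:name]  #=> "Model B"
```

Model C is cheaper but falls below the threshold, so it never reaches the min_by step.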

Benchmark Categories

BenchGecko organizes benchmarks into categories covering reasoning, coding, math, instruction following, safety, multimodal, multilingual, and long-context evaluation:

BenchGecko.benchmark_categories.each do |key, info|
  puts "#{info[:name]}: #{info[:benchmarks].join(', ')}"
  puts "  #{info[:description]}"
end
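
The loop above expects a hash keyed by category, where each value carries :name, :benchmarks, and :description entries. A minimal illustration of that shape, with sample data rather than the gem's full taxonomy:

```ruby
# Sample category data matching the shape the loop expects (illustrative only).
categories = {
  reasoning: {
    name: "Reasoning",
    benchmarks: ["MMLU", "GPQA"],
    description: "General knowledge and multi-step reasoning"
  },
  coding: {
    name: "Coding",
    benchmarks: ["HumanEval"],
    description: "Code generation and repair"
  }
}

categories.each do |_key, info|
  puts "#{info[:name]}: #{info[:benchmarks].join(', ')}"
  puts "  #{info[:description]}"
end
```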

Built-in Model Catalog

The gem ships with a curated catalog of major models from OpenAI, Anthropic, Google, Meta, Mistral, and DeepSeek. Each entry includes benchmark scores, parameter counts, context window sizes, and per-token pricing.

model = BenchGecko.get_model("deepseek-v3")
puts model.parameters       #=> 671 (billions)
puts model.context_window   #=> 128000
puts model.cost_per_million #=> 0.685

Use Cases

  • Model selection pipelines -- programmatically pick the cheapest model that meets your quality bar
  • Cost monitoring -- estimate monthly spend across different model configurations
  • Benchmark dashboards -- pull structured scores into internal reporting tools
  • Agent evaluation -- compare AI agents across capability dimensions

License

MIT License. See LICENSE.txt for details.