🌈 RainbowLLM
Sponsored by Coney.app and Rubycon.it
RainbowLLM is the AI smart routing gem that keeps your applications running smoothly.
When one provider fails or times out, it instantly tries the next. Maximize free-tier usage across multiple providers and avoid vendor lock-in, all through the familiar RubyLLM interface.
# If OpenAI fails, automatically tries Anthropic, then Ollama
response = RainbowLLM.chat(
models: ["openai/gpt-5", "anthropic/claude-3.5-sonnet", "ollama/llama3.3"]
).ask(user_question)
Why
I created this gem initially for Coney.app to re-route requests to another self-hosted model when one failed, or to pick the right model based on task and cost. I built it on top of RubyLLM for its excellent interface and because it was the library I was already using.
I noticed how dependent we'd become on certain providers: many websites simply stop working when OpenAI or Claude experience downtime. So I wanted to share this solution and give everyone a simple fallback mechanism for AI models.
🚀 Quick Start
bundle add rainbow_llm
📖 Table of Contents
- Features
- Installation
- Configuration
- Usage Examples
- Advanced Patterns
- What's Next
- Development
- Contributing
- License
Features
- Automatic failover - Tries providers in your specified order, with instant fallback when one fails and configurable retry logic and timeouts
- Cost optimization - Route to the most cost-effective provider, maximize free tier usage across providers, and avoid rate limit surprises
- Flexible configuration - Support for OpenAI-compatible endpoints, Basic Auth and API key authentication, custom endpoints and model mappings
- Production ready - Built on the reliable ruby_llm foundation with comprehensive error handling, detailed logging, and monitoring. Already running in production in every app I've built so far.
📦 Installation
Option 1: With Bundler (Recommended)
bundle add rainbow_llm
Option 2: Direct Install
gem install rainbow_llm
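If you install the gem directly (outside Bundler or a Rails app), require it before use; this assumes the standard require path derived from the gem name:
require "rainbow_llm"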
Requirements
- Ruby 3.2+
⚙️ Configuration
Configure your providers once, use them everywhere:
# config/initializers/rainbow_llm.rb
RainbowLLM.configure do |config|
# Local Ollama instance
config.provider :ollama, {
provider: "openai_basic",
uri_base: "http://localhost:11434/v1",
access_token: ENV['OLLAMA_API_KEY'],
assume_model_exists: true
}
# Cerebras cloud service
config.provider :cerebras, {
provider: "openai",
uri_base: "https://api.cerebras.ai/v1",
access_token: ENV['CEREBRAS_API_KEY'],
assume_model_exists: true
}
end
💡 Usage Examples
Basic Chat Completion
response = RainbowLLM.chat(
models: ["ollama/llama3.3", "cerebras/llama-3.3-70b"]
).ask("Explain quantum computing to a 5-year-old")
puts response.content
# => "Imagine you have a magical toy that can be in two places at once..."Fluent API Options
Chain options together for fine-grained control:
# Set temperature and timeout
response = RainbowLLM.chat(models: ["ollama/llama3.3"])
.with_temperature(0.8)
.with_timeout(30)
.ask("Write a creative story")
# Use JSON schema for structured output
response = RainbowLLM.chat(models: ["cerebras/llama-3.3-70b"])
.with_schema(MySchema)
.ask("Extract data from this text")
# Chain multiple options
response = RainbowLLM.chat(models: ["openai/gpt-5", "cerebras/llama-3.3-70b"])
.with_temperature(0.7)
.with_timeout(45)
.with_schema(ResponseSchema)
.ask("Analyze this document")Response Object:
Response Object:
# Check which model succeeded
response.model
# => "cerebras/llama-3.3-70b" (or nil if all failed)
# Get the content
response.content
# => "The analysis results..." (or nil if all failed)
# Inspect detailed status for each model
response.details
# => {
# "ollama/llama3.3" => { status: :failed, error: "Connection refused" },
# "cerebras/llama-3.3-70b" => { status: :success, duration: 1.23 }
# }
Error Handling
RainbowLLM doesn't raise exceptions; instead, it returns a Response with details:
response = RainbowLLM.chat(
models: ["primary-model", "backup-model-1", "backup-model-2"]
).ask("Important business question")
if response.content
puts "Success: #{response.content}"
puts "Provided by: #{response.model}"
else
# All providers failed - inspect details to understand why
puts "All providers failed!"
response.details.each do |model, info|
puts "#{model}: #{info[:status]} - #{info[:error]}"
end
# => primary-model: failed - Rate limit exceeded
# => backup-model-1: failed - Connection timeout
# => backup-model-2: failed - Invalid API key
end
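Because every call returns per-model details (status, error, duration), you can also feed them straight into your own logging or monitoring. A minimal sketch using Ruby's standard Logger and only the response fields shown above (model names are illustrative):
require "logger"
logger = Logger.new($stdout)
response = RainbowLLM.chat(
  models: ["ollama/llama3.3", "cerebras/llama-3.3-70b"]
).ask("Summarize today's report")
response.details.each do |model, info|
  if info[:status] == :success
    logger.info("#{model} answered in #{info[:duration]}s")
  else
    logger.warn("#{model} failed: #{info[:error]}")
  end
end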
🎯 Advanced Patterns
Cost-Based Routing
# Route to cheapest available provider first
cheap_models = [
"ollama/llama3.2-3b", # Free (local)
"cerebras/llama-3.3-70b", # Free tier
"openai/gpt-5" # Paid fallback
]
response = RainbowLLM.chat(models: cheap_models)
.with_temperature(0.5)
.ask(user_input)
Performance-Based Routing
# Route to fastest providers for time-sensitive requests
fast_models = [
"cerebras/llama-3.3-70b", # Fast cloud
"openai/gpt-5", # Fast but more expensive
"ollama/llama3.2" # Local but slower
]
response = RainbowLLM.chat(models: fast_models)
.with_timeout(1) # timeout applied to each request
.ask(time_sensitive_question)
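Since the models list is just a Ruby array, you can also build it per request and mix the two strategies above. A minimal sketch (the request object, its methods, and the selection logic are hypothetical):
def models_for(request)
  if request.latency_sensitive?
    ["cerebras/llama-3.3-70b", "openai/gpt-5", "ollama/llama3.2"]    # fastest first
  else
    ["ollama/llama3.2-3b", "cerebras/llama-3.3-70b", "openai/gpt-5"] # cheapest first
  end
end
response = RainbowLLM.chat(models: models_for(request)).ask(request.question)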
🔮 What's Next
RainbowLLM isn't finished yet! Here's what's coming:
- Model registry expansion: Supporting most of the models listed on models.dev out of the box, making it simple to use any model with just a provider/model identifier.
- Broader API support: Extending the failover functionality beyond chat completions to other RubyLLM APIs, giving you the same resilience layer across embeddings, vision, and other AI capabilities.
Got ideas or want to help shape the future? Open an issue or submit a PR!
🔧 Development
Want to contribute or run tests locally?
# Clone the repo
git clone https://github.com/a-chris/rainbow_llm.git
cd rainbow_llm
# Install dependencies
bin/setup
# Run tests
rake test
# Launch interactive console
bin/console
🤝 Contributing
We welcome contributions! Here's how you can help:
- Report bugs: Open an issue with detailed reproduction steps
- Suggest features: What would make RainbowLLM even better?
- Submit pull requests: Fix bugs, add features, improve docs
- Spread the word: Star the repo, share with friends!
📜 License
RainbowLLM is open source software licensed under the MIT License.
Need help? Open an issue or contact @a-chris