reranker-ruby
Cross-encoder reranking for Ruby RAG pipelines.
After vector search retrieves candidate documents, a reranker scores each candidate against the query using a cross-encoder model. Because the model reads the query and the document together, its relevance rankings are typically more accurate than embedding similarity alone. Reranking is often one of the most effective quality improvements you can add to a RAG pipeline.
Bi-encoder (embedding search): score = cosine(embed(query), embed(doc)) — fast, approximate
Cross-encoder (reranking): score = model(query + doc) — slow, precise
The pattern: use bi-encoder for top-100 retrieval, then cross-encoder to rerank to top-10.
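The two-stage pattern can be sketched in plain Ruby. Everything below is a toy stand-in for illustration and not part of this gem: the two-dimensional vectors, the `retrieve` and `rerank` helpers, and the `WORD_OVERLAP` lambda, which plays the role of a cross-encoder. A real pipeline would use a vector store for stage one and a reranker for stage two.

```ruby
# Toy two-stage retrieval: cheap vector similarity first, slower scorer second.
# All data and helper names here are hypothetical.

def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Stage 1: keep the n candidates whose embedding is closest to the query.
def retrieve(query_vec, corpus, n:)
  corpus.max_by(n) { |doc| cosine(query_vec, doc[:vec]) }
end

# Stage 2: rescore only the survivors with the more precise scorer.
def rerank(query, candidates, scorer, top_k:)
  candidates
    .map { |doc| [doc, scorer.call(query, doc[:text])] }
    .sort_by { |_, score| -score }
    .first(top_k)
end

CORPUS = [
  { text: "Paris is the capital of France.", vec: [0.9, 0.1] },
  { text: "France is in Western Europe.", vec: [0.95, 0.05] },
  { text: "Berlin is the capital of Germany.", vec: [0.1, 0.9] }
].freeze

# Stand-in for a cross-encoder: fraction of query words found in the document.
WORD_OVERLAP = lambda do |query, text|
  words = query.downcase.scan(/\w+/)
  words.count { |w| text.downcase.include?(w) }.fdiv(words.size)
end

candidates = retrieve([1.0, 0.0], CORPUS, n: 2)
top = rerank("capital of France", candidates, WORD_OVERLAP, top_k: 1)
puts top.first.first[:text] # prints "Paris is the capital of France."
```

Stage one ranks "France is in Western Europe." highest by cosine similarity, while stage two promotes the Paris sentence because it scores the query and the document text together.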
Installation
Add to your Gemfile:
gem "reranker-ruby"
For local ONNX inference, also install:
gem "onnxruntime"
gem "tokenizers"
Quick Start
require "reranker_ruby"
reranker = RerankerRuby::Cohere.new(api_key: ENV["COHERE_API_KEY"])
query = "What is the capital of France?"
documents = [
  "Paris is the capital and largest city of France.",
  "France is a country in Western Europe.",
  "The Eiffel Tower is located in Paris.",
  "Berlin is the capital of Germany.",
  "Lyon is the second-largest city in France."
]
results = reranker.rerank(query, documents, top_k: 3)
results.each do |r|
  puts "#{r.score.round(4)} | #{r.text[0..60]}"
end
# 0.9987 | Paris is the capital and largest city of France.
# 0.8234 | The Eiffel Tower is located in Paris.
# 0.6123 | Lyon is the second-largest city in France.
Providers
Cohere Rerank
reranker = RerankerRuby::Cohere.new(api_key: ENV["COHERE_API_KEY"])
results = reranker.rerank(query, documents, top_k: 3)
Uses Cohere Rerank API v2 with the rerank-v3.5 model by default.
Jina Rerank
reranker = RerankerRuby::Jina.new(api_key: ENV["JINA_API_KEY"])
results = reranker.rerank(query, documents, top_k: 3)
Uses jina-reranker-v2-base-multilingual by default.
Local ONNX Inference
Run cross-encoder models locally without API calls. Models are auto-downloaded from HuggingFace Hub.
reranker = RerankerRuby::Onnx.new(
  model: "cross-encoder/ms-marco-MiniLM-L-6-v2"
)
results = reranker.rerank(query, documents, top_k: 3)
Or use a local model file:
reranker = RerankerRuby::Onnx.new(
  model_path: "/path/to/reranker.onnx",
  tokenizer: "cross-encoder/ms-marco-MiniLM-L-6-v2"
)
Supported models:
cross-encoder/ms-marco-MiniLM-L-6-v2
cross-encoder/ms-marco-MiniLM-L-12-v2
BAAI/bge-reranker-base
BAAI/bge-reranker-large
BAAI/bge-reranker-v2-m3
Requires the onnxruntime and tokenizers gems.
Result Object
Every reranker returns an array of Result objects, sorted by relevance (highest first):
result.text # => "Paris is the capital..."
result.score # => 0.9987
result.index # => 0 (position in the original document array)
result.metadata # => {} (preserved from input)
result.to_h # => { text: "...", score: 0.9987, index: 0, metadata: {} }
Structured Documents with Metadata
Pass hashes instead of strings. Metadata is preserved through reranking:
documents = [
  { text: "Paris is the capital...", source: "wiki", id: "doc1" },
  { text: "France is a country...", source: "wiki", id: "doc2" },
]
results = reranker.rerank(query, documents, top_k: 3)
results.first.metadata # => { source: "wiki", id: "doc1" }
Reciprocal Rank Fusion
Combine results from multiple retrieval strategies before reranking:
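For intuition, reciprocal rank fusion as it is usually defined gives every item a score of 1 / (k + rank) from each list it appears in, with rank starting at 1, and sums those scores across lists. The `rrf_fuse` helper below is a hypothetical plain-Ruby sketch of that formula. It is not the gem's implementation, which may differ in details such as tie-breaking.

```ruby
# Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
# where rank is the item's 1-based position in that list.
# Hypothetical helper for illustration only.
def rrf_fuse(*rankings, k: 60)
  scores = Hash.new(0.0)
  rankings.each do |ranking|
    ranking.each_with_index do |id, index|
      scores[id] += 1.0 / (k + index + 1)
    end
  end
  # Highest fused score first; tied items come back in no guaranteed order.
  scores.sort_by { |_, score| -score }.map(&:first)
end

vector_ids  = [:a, :b, :c]
keyword_ids = [:b, :d, :a]
p rrf_fuse(vector_ids, keyword_ids) # => [:b, :a, :d, :c]
```

Items that appear in both lists (`:a` and `:b`) outrank items that appear in only one, which is what makes the fused list a good input for reranking. With the gem, fusion followed by a final rerank looks like this: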
vector_results = collection.search(embedding, top_k: 50)
keyword_results = Article.where("content LIKE ?", "%#{Article.sanitize_sql_like(query)}%").limit(50) # escape % and _ in the query
fused = RerankerRuby::RRF.fuse(
  vector_results.map(&:id),
  keyword_results.map(&:id),
  k: 60
)
# => ranked array of IDs by combined relevance
# Then rerank the fused results for final precision
top_docs = fused.first(20).map { |id| Document.find(id) }
final = reranker.rerank(query, top_docs.map(&:content), top_k: 5)
Ensemble Reranking
Combine multiple rerankers with weighted score aggregation:
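One plausible way to read "weighted score aggregation" is: bring each reranker's scores onto a common scale, then take a weighted sum per document. The `ensemble_scores` helper and the score values below are hypothetical and only illustrate that idea with min-max scaling. The gem's `Ensemble` class may aggregate differently.

```ruby
# Weighted aggregation of scores from several rerankers.
# score_lists[i][j] is reranker i's raw score for document j.
# Hypothetical helper for illustration only.
def ensemble_scores(score_lists, weights)
  normalized = score_lists.map do |scores|
    lo, hi = scores.minmax
    # Min-max scale to [0, 1]; a constant list maps to all zeros.
    scores.map { |s| hi == lo ? 0.0 : (s - lo).fdiv(hi - lo) }
  end
  # Weighted sum per document across all rerankers.
  normalized.transpose.map do |per_doc|
    per_doc.zip(weights).sum { |score, weight| score * weight }
  end
end

first_scores  = [0.9, 0.2, 0.5]  # made-up scores on a 0..1 scale
second_scores = [4.0, -1.0, 3.0] # made-up scores on a logit-like scale
combined = ensemble_scores([first_scores, second_scores], [0.6, 0.4])
p combined.map { |s| s.round(3) } # => [1.0, 0.0, 0.577]
```

Without the scaling step, the second list's larger raw values would dominate the sum regardless of the weights. With the gem, the rerankers and weights are passed to the ensemble directly: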
cohere = RerankerRuby::Cohere.new(api_key: ENV["COHERE_API_KEY"])
jina = RerankerRuby::Jina.new(api_key: ENV["JINA_API_KEY"])
ensemble = RerankerRuby::Ensemble.new(
  rerankers: [cohere, jina],
  weights: [0.6, 0.4],
  normalize: :min_max # :min_max, :softmax, :sigmoid, or :none
)
results = ensemble.rerank(query, documents, top_k: 5)
Score Normalization
Different models produce scores on different scales. Normalize them for comparison:
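For reference, the three transforms named in this section have standard definitions, sketched here on plain arrays of numbers. The helper names are hypothetical and this is not the gem's `ScoreNormalizer`. Mapping a constant list to zeros in `min_max` is a choice made for this sketch.

```ruby
# Standard score transforms on arrays of numbers.
# Hypothetical helpers for illustration only.

# Rescale linearly so the smallest score is 0.0 and the largest is 1.0.
def min_max(scores)
  lo, hi = scores.minmax
  return scores.map { 0.0 } if hi == lo # avoid dividing by zero
  scores.map { |s| (s - lo).fdiv(hi - lo) }
end

# Exponentiate and divide by the total, so the scores sum to 1.0.
def softmax(scores)
  max = scores.max
  exps = scores.map { |s| Math.exp(s - max) } # subtract max for numerical stability
  total = exps.sum
  exps.map { |e| e / total }
end

# Squash each score independently into the open interval (0, 1).
def sigmoid(scores)
  scores.map { |s| 1.0 / (1.0 + Math.exp(-s)) }
end

raw = [2.0, 0.0, -2.0]
p min_max(raw)                        # => [1.0, 0.5, 0.0]
p softmax(raw).sum.round(6)           # => 1.0
p sigmoid(raw).map { |s| s.round(3) } # => [0.881, 0.5, 0.119]
```

Min-max and softmax depend on the other scores in the list, while sigmoid does not, so only sigmoid output is comparable across separate rerank calls. The gem's helpers take the result array directly: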
results = reranker.rerank(query, documents)
# Min-max to [0, 1]
normalized = RerankerRuby::ScoreNormalizer.min_max(results)
# Softmax (scores sum to 1.0)
normalized = RerankerRuby::ScoreNormalizer.softmax(results)
# Sigmoid (each score independently mapped to [0, 1])
normalized = RerankerRuby::ScoreNormalizer.sigmoid(results)
Batch Reranking
Rerank multiple queries concurrently:
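A minimal sketch of how concurrent batch reranking can be structured, assuming a fixed pool of worker threads that pull queries from a shared queue. The `batch_rerank` helper and the `matches_first` lambda are hypothetical stand-ins and do not show how the gem's `Batch` module is implemented.

```ruby
# Concurrent batch processing with a fixed pool of worker threads.
# `scorer` is any callable taking (query, documents); it stands in for a reranker.
# Hypothetical helper for illustration only.
def batch_rerank(scorer, queries, documents, threads: 4)
  jobs = Queue.new
  queries.each_with_index { |query, i| jobs << [query, i] }
  jobs.close # once drained, pop returns nil and the workers stop

  workers = Array.new([threads, queries.size].min) do
    Thread.new do
      done = []
      while (job = jobs.pop)
        query, i = job
        done << [i, scorer.call(query, documents)]
      end
      done # each thread returns its own [index, result] pairs
    end
  end

  # Thread#value joins the thread; sorting by index restores query order.
  workers.flat_map(&:value).sort_by(&:first).map(&:last)
end

# Toy scorer: documents containing the query string come first.
matches_first = ->(query, docs) { docs.partition { |d| d.include?(query) }.flatten }
docs = ["Paris, France", "Berlin, Germany"]
results = batch_rerank(matches_first, %w[Berlin Paris], docs, threads: 2)
p results[0].first # => "Berlin, Germany"
p results[1].first # => "Paris, France"
```

Each worker collects its results locally and the caller merges them afterwards, so no shared array is mutated from several threads. With the gem, the equivalent call is: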
queries = ["capital of France?", "tallest building?", "largest ocean?"]
results = RerankerRuby::Batch.rerank(
  reranker, queries, documents,
  top_k: 5,
  threads: 4
)
results[0] # => results for queries[0]
results[1] # => results for queries[1]
Caching
Avoid duplicate API calls for the same query+documents:
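The idea behind such a cache can be sketched as follows, assuming the cache key is a digest of the full rerank input and entries expire after a fixed time-to-live. The `TinyRerankCache` class is hypothetical and says nothing about how the gem's cache classes derive keys or store entries.

```ruby
require "digest"
require "json"

# In-memory TTL cache keyed on the full rerank input.
# Hypothetical class for illustration only.
class TinyRerankCache
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock # injectable so expiry can be tested without sleeping
    @store = {}
    @mutex = Mutex.new
  end

  # The same query, documents and top_k always produce the same key.
  def self.key_for(query, documents, top_k)
    Digest::SHA256.hexdigest(JSON.generate([query, documents, top_k]))
  end

  def fetch(query, documents, top_k)
    key = self.class.key_for(query, documents, top_k)
    @mutex.synchronize do
      value, expires_at = @store[key]
      return value if expires_at && @clock.call < expires_at
    end
    value = yield # cache miss: run the real rerank call outside the lock
    @mutex.synchronize { @store[key] = [value, @clock.call + @ttl] }
    value
  end
end

cache = TinyRerankCache.new(ttl: 3600)
calls = 0
2.times { cache.fetch("q", ["a", "b"], 5) { calls += 1; :ranked } }
puts calls # prints 1
```

Including `top_k` in the key keeps a cached top-3 result from being returned for a top-10 request. With the gem, a cache is passed to the reranker's constructor: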
# In-memory cache
reranker = RerankerRuby::Cohere.new(
  api_key: ENV["COHERE_API_KEY"],
  cache: RerankerRuby::Cache::Memory.new(ttl: 3600)
)
# Redis cache
require "redis"
reranker = RerankerRuby::Cohere.new(
  api_key: ENV["COHERE_API_KEY"],
  cache: RerankerRuby::Cache::Redis.new(redis: Redis.new, ttl: 3600)
)
reranker.rerank(query, docs, top_k: 5) # API call
reranker.rerank(query, docs, top_k: 5) # cache hit
Logging & Metrics
Every rerank call is automatically instrumented:
# Set log level
RerankerRuby::Logging.logger = Logger.new($stdout)
RerankerRuby::Logging.logger.level = Logger::INFO
# Subscribe to rerank events
RerankerRuby::Logging.on_rerank do |event|
  puts "#{event[:reranker]} reranked #{event[:document_count]} docs in #{event[:duration_ms]}ms"
  # event keys: :reranker, :query, :document_count, :top_k,
  # :result_count, :duration_ms, :top_score
end
Rails Integration
Configuration
Run the install generator:
rails generate reranker_ruby:install
This creates config/initializers/reranker_ruby.rb:
RerankerRuby.configure do |config|
  config.default_provider = :cohere # :cohere, :jina, or :onnx
  config.cohere_api_key = ENV["COHERE_API_KEY"]
  config.default_top_k = 10
  config.cache_store = :memory # :memory, :redis, or nil
  config.cache_ttl = 3600
end
Then use the global convenience method anywhere:
results = RerankerRuby.rerank("What is Ruby?", documents, top_k: 5)
ActiveJob for Async Reranking
For large result sets, run reranking in the background:
RerankerRuby::RerankJob.perform_later(
  query: "What is Ruby?",
  documents: ["doc1", "doc2"], # ...
  top_k: 5,
  callback: "MyRerankCallback"
)
# Callback class
class MyRerankCallback
  def self.on_rerank_complete(query, results)
    # results is an array of hashes: [{ text:, score:, index:, metadata: }, ...]
  end
end
Pipeline Middleware
Plug into any RAG pipeline as a reranking step:
middleware = RerankerRuby::Middleware.new(
  reranker: RerankerRuby::Cohere.new(api_key: "..."),
  top_k: 5,
  text_key: :content
)
# Works with hashes, strings, or objects
candidates = [
  { content: "Paris is the capital...", source: "wiki" },
  { content: "Berlin is the capital...", source: "wiki" },
]
results = middleware.call(query: "capital of France?", candidates: candidates)
Dependencies
Runtime: net/http (stdlib), json (stdlib), logger
Optional: onnxruntime and tokenizers (for local ONNX inference), redis (for Redis caching)
Development: minitest, rake, webmock
License
MIT