rag-ruby

A batteries-included RAG (Retrieval-Augmented Generation) pipeline framework for Ruby and Rails.

Orchestrates the full workflow: document loading → chunking → embedding → storage → retrieval → generation. Think LangChain for Ruby — simpler, more opinionated, and Rails-native.

Installation

Add to your Gemfile:

gem "rag-ruby"

Then run:

bundle install

Quick Start

require "rag_ruby"

pipeline = RagRuby::Pipeline.new do |config|
  config.loader :file
  config.chunker :recursive_character, chunk_size: 1000, chunk_overlap: 200
  config.embedder :openai, model: "text-embedding-3-small"
  config.store :memory, dimension: 1536
  config.generator :openai, model: "gpt-4o"
end

# Ingest documents
pipeline.ingest("docs/manual.md")
pipeline.ingest_directory("docs/", glob: "**/*.{md,txt}")

# Query with RAG
answer = pipeline.query("How do I reset my password?")
answer.text       # => "To reset your password, go to Settings > Security..."
answer.sources    # => [#<Source chunk="..." score=0.92>, ...]
answer.tokens_used # => { prompt: 1200, completion: 150 }
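
The chunk_size and chunk_overlap options control how much text lands in each chunk and how much neighbouring chunks share. As a rough illustration only, here is a fixed-window splitter in plain Ruby. It is not the recursive_character strategy (which the gem delegates to chunker-ruby), and the sliding_chunks helper is a hypothetical name used just for this sketch:

```ruby
# Hypothetical helper, not part of rag-ruby: splits text into fixed-size
# character windows. Each window starts (chunk_size - chunk_overlap)
# characters after the previous one, so neighbours share chunk_overlap
# characters.
def sliding_chunks(text, chunk_size:, chunk_overlap:)
  if chunk_overlap >= chunk_size
    raise ArgumentError, "chunk_overlap must be smaller than chunk_size"
  end

  step = chunk_size - chunk_overlap
  chunks = []
  start = 0
  while start < text.length
    chunks << text[start, chunk_size]
    # Stop once the current window already reaches the end of the text.
    break if start + chunk_size >= text.length

    start += step
  end
  chunks
end

chunks = sliding_chunks("abcdefghij", chunk_size: 4, chunk_overlap: 2)
# chunks => ["abcd", "cdef", "efgh", "ghij"]
```

With the Quick Start settings (chunk_size: 1000, chunk_overlap: 200), this simplified splitter would turn a 2,500-character document into three chunks starting at offsets 0, 800, and 1600. A recursive splitter will usually pick different boundaries because it prefers to break on paragraphs and sentences.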

Components

Every stage of the pipeline is pluggable. Mix and match providers to fit your stack.

Document Loaders

| Loader | Description | Requires |
| --- | --- | --- |
| :file | Local files (.txt, .md) | Built-in |
| :directory | Bulk load from directory | Built-in |
| :url | Fetch from URLs | Built-in |
| :active_record | Load from ActiveRecord models | Built-in |

# Load a single file
pipeline.ingest("path/to/document.md")

# Load a directory
pipeline.ingest_directory("documents/", glob: "**/*.{md,txt}")

# Custom loader
class SlackLoader < RagRuby::Loaders::Base
  def load(channel_id)
    messages = SlackAPI.history(channel_id)
    messages.map do |msg|
      RagRuby::Document.new(
        content: msg.text,
        metadata: { author: msg.user, channel: channel_id }
      )
    end
  end
end

pipeline.ingest(channel_id, loader: SlackLoader.new)

Embedders

| Provider | Description | Requires |
| --- | --- | --- |
| :openai | OpenAI text-embedding-3-small/large | OPENAI_API_KEY env var |
| :cohere | Cohere embed-english-v3.0 | COHERE_API_KEY env var |
| :onnx | Local ONNX models (all-MiniLM-L6-v2) | gem "onnx-ruby" |

# API-based
config.embedder :openai, model: "text-embedding-3-small"
config.embedder :cohere, model: "embed-english-v3.0"

# Local (no API calls)
config.embedder :onnx, model: "all-MiniLM-L6-v2"

Vector Stores

| Store | Description | Requires |
| --- | --- | --- |
| :memory | In-memory store (great for dev/test) | Built-in |
| :zvec | Persistent file-based vector store | gem "zvec-ruby" |

config.store :memory, dimension: 1536
config.store :zvec, path: "./vectors", dimension: 1536
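
Vector stores generally expect every vector to have the configured dimension, so the dimension: option should agree with the embedder's output size (assuming rag-ruby follows that convention). The 1536 used throughout this README matches text-embedding-3-small. A small lookup like the following can keep the two in sync when switching embedders. The constant and method names are hypothetical, and the values are each model's default output size:

```ruby
# Default output dimensions for the embedding models named in this README.
# EMBEDDING_DIMENSIONS and store_dimension_for are hypothetical names,
# not part of rag-ruby.
EMBEDDING_DIMENSIONS = {
  "text-embedding-3-small" => 1536,
  "text-embedding-3-large" => 3072,
  "embed-english-v3.0"     => 1024,
  "all-MiniLM-L6-v2"       => 384
}.freeze

# Returns the store dimension to configure for a given embedding model,
# raising for models this table does not know about.
def store_dimension_for(model)
  EMBEDDING_DIMENSIONS.fetch(model) do
    raise ArgumentError, "unknown embedding model: #{model}"
  end
end

dimension = store_dimension_for("all-MiniLM-L6-v2")
# dimension => 384
```

The result could then be passed along as config.store :memory, dimension: dimension.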

To add a custom store, subclass RagRuby::Stores::Base and implement add, search, delete, and count:

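class PineconeStore < RagRuby::Stores::Base
  # Replace each placeholder body with calls to your vector database client.
  def add(id, embedding:, metadata: {}, chunk: nil) = raise(NotImplementedError)
  def search(embedding, top_k:, filter: nil) = raise(NotImplementedError)
  def delete(id) = raise(NotImplementedError)
  def count = raise(NotImplementedError)
end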
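
To make the four-method contract concrete, here is a standalone sketch of a brute-force in-memory store using cosine similarity. It deliberately does not subclass RagRuby::Stores::Base so it runs without the gem. ToyMemoryStore is a hypothetical name, and the shape of the search results (hashes with :id, :score, :chunk, :metadata) is an assumption rather than the gem's documented return type:

```ruby
# Standalone sketch; a real custom store would subclass RagRuby::Stores::Base.
# The result shape returned by #search is an assumption.
class ToyMemoryStore
  def initialize(dimension:)
    @dimension = dimension
    @entries = {}
  end

  def add(id, embedding:, metadata: {}, chunk: nil)
    unless embedding.size == @dimension
      raise ArgumentError, "expected #{@dimension} dimensions, got #{embedding.size}"
    end

    @entries[id] = { embedding: embedding, metadata: metadata, chunk: chunk }
  end

  # Brute-force search: apply the metadata filter, score every remaining
  # entry, and keep the top_k highest-scoring hits (best first).
  def search(embedding, top_k:, filter: nil)
    candidates = @entries.select do |_id, entry|
      filter.nil? || filter.all? { |key, value| entry[:metadata][key] == value }
    end

    scored = candidates.map do |id, entry|
      { id: id, score: cosine(embedding, entry[:embedding]),
        chunk: entry[:chunk], metadata: entry[:metadata] }
    end
    scored.max_by(top_k) { |hit| hit[:score] }
  end

  def delete(id)
    @entries.delete(id)
  end

  def count
    @entries.size
  end

  private

  # Cosine similarity; returns 0.0 when either vector has zero length.
  def cosine(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
    norm.zero? ? 0.0 : dot / norm
  end
end

store = ToyMemoryStore.new(dimension: 2)
store.add("a", embedding: [1.0, 0.0], metadata: { category: "guide" }, chunk: "Getting started")
store.add("b", embedding: [0.0, 1.0], metadata: { category: "changelog" }, chunk: "v2.0 notes")
hits = store.search([0.9, 0.1], top_k: 1)
```

A linear scan like this is fine for tests and small corpora. A store backed by a hosted vector database would send the same arguments to that service instead.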

Generators

| Provider | Description | Requires |
| --- | --- | --- |
| :openai | OpenAI chat completions | OPENAI_API_KEY env var |
| :ruby_llm | Any model via ruby_llm | gem "ruby_llm" |

config.generator :openai, model: "gpt-4o"
config.generator :ruby_llm, model: "claude-sonnet-4-20250514"

Query Options

answer = pipeline.query("What changed in v2.0?",
  top_k: 10,                    # number of chunks to retrieve
  filter: { category: "changelog" }, # metadata filter
  temperature: 0.0,             # generation temperature
  system_prompt: "You are a technical docs assistant."
)

answer.text        # generated answer
answer.sources     # retrieved chunks with scores
answer.tokens_used # { prompt: ..., completion: ... }
answer.duration    # query time in seconds
answer.query       # original question

Callbacks & Observability

Hook into every stage of the pipeline for logging, metrics, or debugging:

pipeline = RagRuby::Pipeline.new do |config|
  # ... providers ...

  config.on(:before_load)  { |src| puts "Loading: #{src}" }
  config.on(:after_load)   { |docs| puts "Loaded #{docs.size} documents" }
  config.on(:before_chunk) { |doc| puts "Chunking: #{doc.source}" }
  config.on(:after_chunk)  { |chunks| puts "Created #{chunks.size} chunks" }
  config.on(:before_embed) { |chunks| puts "Embedding #{chunks.size} chunks" }
  config.on(:after_embed)  { |chunks| puts "Embedded #{chunks.size} chunks" }
  config.on(:before_store) { |chunks| puts "Storing #{chunks.size} chunks" }
  config.on(:after_store)  { |chunks| puts "Stored #{chunks.size} chunks" }
  config.on(:before_query) { |q| Metrics.increment("rag.queries") }
  config.on(:after_query)  { |q, answer| Metrics.timing("rag.latency", answer.duration) }
end
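
For readers curious how hooks of this kind are typically wired up, the following is a minimal sketch of an event registry with on and emit. HookRegistry is a hypothetical stand-in written for illustration; rag-ruby's internals may differ:

```ruby
# Hypothetical stand-in for a pipeline's hook registry; not rag-ruby code.
class HookRegistry
  def initialize
    # Each event name maps to the list of blocks registered for it.
    @hooks = Hash.new { |hash, event| hash[event] = [] }
  end

  # Register a block to run whenever `event` fires.
  def on(event, &block)
    @hooks[event] << block
    self
  end

  # Run every block registered for `event`, in registration order.
  def emit(event, *args)
    @hooks[event].each { |hook| hook.call(*args) }
  end
end

log = []
hooks = HookRegistry.new
hooks.on(:before_load) { |src| log << "Loading: #{src}" }
hooks.on(:after_load)  { |docs| log << "Loaded #{docs.size} documents" }

hooks.emit(:before_load, "docs/manual.md")
hooks.emit(:after_load, ["doc one", "doc two"])
# log => ["Loading: docs/manual.md", "Loaded 2 documents"]
```

Collecting messages in an array instead of calling puts, as done here, is also a convenient way to assert on callbacks in tests.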

Rails Integration

Setup

rails generate rag:install

This creates:

  • config/rag.yml — environment-specific configuration
  • config/initializers/rag_ruby.rb — optional programmatic config

Configuration

# config/rag.yml
default: &default
  chunker:
    strategy: recursive_character
    chunk_size: 1000
    chunk_overlap: 200
  embedder:
    provider: openai
    model: text-embedding-3-small
  store:
    provider: memory
    dimension: 1536
  generator:
    provider: openai
    model: gpt-4o

development:
  <<: *default

production:
  <<: *default
  store:
    provider: zvec
    path: db/vectors
    dimension: 1536
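
One detail of this file worth spelling out: the YAML merge key (<<) is shallow, so the production store mapping replaces the default store mapping wholesale instead of being merged into it. That is why dimension is repeated under production. The sketch below demonstrates the behaviour with Ruby's standard YAML library on a trimmed copy of the file. How rag-ruby itself loads config/rag.yml is not shown in this README, so treat the loading code as an assumption:

```ruby
require "yaml"

# Trimmed copy of config/rag.yml, inlined so the example is self-contained.
yaml = <<~YAML
  default: &default
    embedder:
      provider: openai
      model: text-embedding-3-small
    store:
      provider: memory
      dimension: 1536

  production:
    <<: *default
    store:
      provider: zvec
      path: db/vectors
      dimension: 1536
YAML

# aliases: true is required because the file uses an anchor (&default)
# and an alias (*default).
config = YAML.safe_load(yaml, aliases: true)
production = config.fetch("production")

production["embedder"] # inherited unchanged from default
production["store"]    # replaced entirely by the production mapping
```

If dimension were omitted from the production store block, it would not be inherited from default, because the whole store key is overridden.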

Auto-Index Models

class Article < ApplicationRecord
  include RagRuby::Indexable

  rag_index :content,
    metadata: ->(article) { { category: article.category, author: article.author } },
    on: [:create, :update]
end

# Articles are automatically indexed when saved
Article.create!(title: "Guide", content: "# Getting Started\n...")

Global API

# Search for relevant chunks
results = RagRuby.search("How to get started?", top_k: 5)

# Full RAG: retrieve + generate
answer = RagRuby.ask("How to get started?")

Controller Usage

class ChatController < ApplicationController
  def ask
    answer = RagRuby.ask(params[:question])
    render json: {
      answer: answer.text,
      sources: answer.sources.map(&:to_h)
    }
  end
end

Architecture

Ingestion Flow

Source → Loader → [Document] → Chunker → [Chunk] → Embedder → [Chunk+Embedding] → Store

Query Flow

Question → Embedder → Vector → Store.search → [Chunk] → build_context → Generator → Answer

Each stage is independent and swappable. The Pipeline class orchestrates the flow.
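
The query flow can be sketched end to end with toy stand-ins for each stage. Nothing below is rag-ruby code: the embedder just counts words from a tiny vocabulary, the search is a dot-product ranking, and the generator returns the prompt an LLM would receive instead of calling one. All names are hypothetical:

```ruby
# Toy stand-ins for each query-flow stage; none of these are rag-ruby APIs.

# Embedder stand-in: one dimension per vocabulary term, valued by word count.
vocab = %w[password reset invoice billing]
embed = lambda do |text|
  words = text.downcase.scan(/\w+/)
  vocab.map { |term| words.count(term).to_f }
end

# Store.search stand-in: rank chunks by dot product with the query vector.
search = lambda do |chunks, vector, top_k|
  chunks.max_by(top_k) do |chunk|
    chunk[:embedding].zip(vector).sum { |a, b| a * b }
  end
end

# build_context stand-in: number the retrieved chunks so an answer can cite them.
build_context = lambda do |chunks|
  chunks.each_with_index.map { |chunk, i| "[#{i + 1}] #{chunk[:text]}" }.join("\n")
end

# Generator stand-in: returns the assembled prompt rather than calling an LLM.
generate = ->(question, context) { "Context:\n#{context}\n\nQuestion: #{question}" }

texts = [
  "Reset your password under Settings > Security.",
  "Billing invoices are emailed monthly."
]
chunks = texts.map { |text| { text: text, embedding: embed.call(text) } }

question = "How do I reset my password?"
retrieved = search.call(chunks, embed.call(question), 1)
prompt = generate.call(question, build_context.call(retrieved))
```

Because each stage only depends on the previous stage's output, any one of the lambdas can be replaced without touching the others, which is the property the pipeline relies on.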

Dependencies

| Gem | Purpose | Required? |
| --- | --- | --- |
| chunker-ruby | Text chunking | Yes |
| zvec-ruby | Persistent vector storage | Optional |
| onnx-ruby | Local ONNX embeddings | Optional |
| ruby_llm | Multi-provider LLM generation | Optional |

Development

git clone https://github.com/johannesdwicahyo/rag-ruby.git
cd rag-ruby
bundle install
bundle exec rake test

License

MIT License. See LICENSE for details.