ragdoll

0.0

Under development. Contributors welcome.
Dependencies

Runtime

~> 7.1

Project Readme

🎯 Ragdoll

Retrieval Augmented Generation for Rails Applications

Raggedy Ann playing with blocks

CAUTION: Ragdoll is still under development and may not be suitable for production use.

Ragdoll is a powerful Rails engine that adds Retrieval Augmented Generation (RAG) capabilities to any Rails application. It provides semantic search, document ingestion, and context-enhanced AI prompts using vector embeddings and PostgreSQL with pgvector. With support for multiple LLM providers through ruby_llm, you can use OpenAI, Anthropic, Google, Azure, Ollama, and more.

Demo Rails App

✨ Features

  • ๐Ÿ” Semantic Search - Vector similarity search with flexible embedding models and pgvector
  • ๐Ÿค– Multi-Provider Support - OpenAI, Anthropic, Google, Azure, Ollama, HuggingFace via ruby_llm
  • ๐Ÿ“„ Multi-format Support - PDF, DOCX, text, HTML, JSON, XML, CSV document parsing
  • ๐Ÿง  Context Enhancement - Automatically enhance AI prompts with relevant context
  • โšก Background Processing - Asynchronous document processing with Sidekiq
  • ๐ŸŽ›๏ธ Simple API - Clean, intuitive interface for Rails integration
  • ๐Ÿ“Š Analytics - Search analytics and document management insights
  • ๐Ÿ”ง Configurable - Flexible chunking, embedding, and search parameters
  • ๐Ÿ”„ Flexible Vectors - Variable-length embeddings for different models

🚀 Quick Start

Installation

Add Ragdoll to your Rails application's Gemfile:

# Gemfile
gem 'ragdoll'

Then install it:

bundle install

Database Setup

Ragdoll requires PostgreSQL with the pgvector extension:

# Run migrations
rails ragdoll:install:migrations
rails db:migrate
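
Before the Ragdoll migrations can run, the pgvector extension must be enabled in your database. Assuming pgvector is installed on the PostgreSQL server, a plain migration can enable it (a sketch, not part of the gem's generated migrations; filename and version bracket are illustrative):

```ruby
# db/migrate/20240101000000_enable_pgvector.rb (timestamp illustrative)
class EnablePgvector < ActiveRecord::Migration[7.1]  # match your Rails version
  def change
    # Requires the pgvector extension to be installed on the PostgreSQL server
    enable_extension 'vector'
  end
end
```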

Configuration

# config/initializers/ragdoll.rb
Ragdoll.configure do |config|
  # LLM Provider Configuration
  config.llm_provider = :openai  # or :anthropic, :google, :azure, :ollama, :huggingface
  config.embedding_provider = :openai  # optional, defaults to llm_provider

  # Provider-specific API keys
  config.llm_config = {
    openai: { api_key: ENV['OPENAI_API_KEY'] },
    anthropic: { api_key: ENV['ANTHROPIC_API_KEY'] },
    google: { api_key: ENV['GOOGLE_API_KEY'], project_id: ENV['GOOGLE_PROJECT_ID'] }
  }

  # Embedding and processing settings
  config.embedding_model = 'text-embedding-3-small'
  config.chunk_size = 1000
  config.search_similarity_threshold = 0.7
  config.max_embedding_dimensions = 3072  # supports variable-length vectors
end
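
To make the chunk_size and chunk_overlap settings concrete: documents are split into overlapping windows before embedding. Ragdoll's internal chunker may be smarter (e.g. sentence-aware); this pure-Ruby sliding window is only a sketch of what the two numbers mean:

```ruby
# Sliding-window chunking sketch: each chunk is at most chunk_size
# characters, and consecutive chunks share chunk_overlap characters.
def chunk_text(text, chunk_size: 1000, chunk_overlap: 200)
  step = chunk_size - chunk_overlap
  chunks = []
  index = 0
  loop do
    chunks << text[index, chunk_size]
    break if index + chunk_size >= text.length
    index += step
  end
  chunks
end

chunks = chunk_text('a' * 2500, chunk_size: 1000, chunk_overlap: 200)
chunks.length  # => 3 (two full 1000-char windows, one 900-char tail)
```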

Basic Usage

# Add documents
Ragdoll.add_file('/path/to/manual.pdf')
Ragdoll.add_text('Your content here', title: 'Knowledge Base')

# Enhance AI prompts with context
enhanced = Ragdoll.enhance_prompt(
  'How do I configure the database?',
  context_limit: 5
)

# Use enhanced prompt with your AI service
ai_response = YourAI.complete(enhanced[:enhanced_prompt])

📖 API Reference

Context Enhancement for AI

enhance_prompt is the primary method for RAG applications: it automatically finds relevant context and injects it into the prompt:

enhanced = Ragdoll.enhance_prompt(
  "How do I deploy to production?",
  context_limit: 3,
  threshold: 0.8
)

# Returns:
{
  enhanced_prompt: "...",    # Prompt with context injected
  original_prompt: "...",    # Original user prompt
  context_sources: [...],    # Source documents
  context_count: 2           # Number of context chunks
}

Semantic Search

# Search for similar content
results = Ragdoll.search(
  "database configuration",
  limit: 10,
  threshold: 0.6,
  filters: { document_type: 'pdf' }
)

# Get raw context without prompt enhancement
context = Ragdoll.client.get_context(
  "API authentication",
  limit: 5
)
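
The threshold values above refer to vector similarity between the query embedding and stored chunk embeddings. pgvector computes this in SQL; the pure-Ruby cosine similarity below is only to illustrate the metric a threshold like 0.6 is compared against:

```ruby
# Cosine similarity of two equal-length vectors: 1.0 means identical
# direction, 0.0 means orthogonal (unrelated) embeddings.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

cosine_similarity([1.0, 0.0], [1.0, 0.0])  # => 1.0
cosine_similarity([1.0, 0.0], [0.0, 1.0])  # => 0.0
```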

Document Management

# Add documents
Ragdoll.add_file('/docs/manual.pdf')
Ragdoll.add_text('Content', title: 'Guide')
Ragdoll.add_directory('/knowledge-base', recursive: true)

# Manage documents
client = Ragdoll::Client.new
client.update_document(123, title: 'New Title')
client.delete_document(123)
client.list_documents(limit: 50)

# Bulk operations
client.reprocess_failed
client.add_directory('/docs', recursive: true)

๐Ÿ—๏ธ Rails Integration Examples

Chat Controller

class ChatController < ApplicationController
  def ask
    enhanced = Ragdoll.enhance_prompt(
      params[:question],
      context_limit: 5
    )

    ai_response = OpenAI.complete(enhanced[:enhanced_prompt])

    render json: {
      answer: ai_response,
      sources: enhanced[:context_sources],
      context_used: enhanced[:context_count] > 0
    }
  end
end

Support Bot Service

class SupportBot
  def initialize
    @ragdoll = Ragdoll::Client.new
  end

  def answer_question(question, category: nil)
    filters = { document_type: 'pdf' } if category == 'manual'

    context = @ragdoll.get_context(
      question,
      limit: 3,
      threshold: 0.8,
      filters: filters
    )

    if context[:total_chunks] > 0
      prompt = build_prompt(question, context[:combined_context])
      ai_response = call_ai_service(prompt)

      {
        answer: ai_response,
        confidence: :high,
        sources: context[:context_chunks]
      }
    else
      fallback_response(question)
    end
  end
end

Background Processing

class ProcessDocumentsJob < ApplicationJob
  def perform(file_paths)
    ragdoll = Ragdoll::Client.new

    file_paths.each do |path|
      ragdoll.add_file(path, process_immediately: true)
    end
  end
end

๐Ÿ› ๏ธ Command Line Tools

Thor Commands

# Document management
thor ragdoll:document:add /path/to/file.pdf --process_now
thor ragdoll:document:list --status completed --limit 20
thor ragdoll:document:show 123
thor ragdoll:document:delete 123 --confirm

# Import operations
thor ragdoll:import:import /docs --recursive --jobs 4

Rake Tasks

# Add documents
rake ragdoll:document:add[/path/to/file.pdf] PROCESS_NOW=true
TITLE="Manual" rake ragdoll:document:add[content.txt]

# Bulk operations
rake ragdoll:document:bulk:reprocess_failed
rake ragdoll:document:bulk:cleanup_orphaned
STATUS=failed rake ragdoll:document:bulk:delete_by_status[failed]

# List and search
LIMIT=50 rake ragdoll:document:list
rake ragdoll:document:show[123]

📋 Supported Document Types

Format   Extension           Features
PDF      .pdf                Text extraction, metadata, page info
DOCX     .docx               Paragraphs, tables, document properties
Text     .txt, .md           Plain text, markdown
HTML     .html, .htm         Tag stripping, content extraction
Data     .json, .xml, .csv   Structured data parsing
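
The table above maps file extensions to parser categories. A dispatch along these lines can be sketched in a few lines (hypothetical helper for illustration; Ragdoll's real parser selection may differ):

```ruby
# Map a file path to the parser category from the supported-types table.
def document_format(path)
  case File.extname(path).downcase
  when '.pdf'                  then :pdf
  when '.docx'                 then :docx
  when '.txt', '.md'           then :text
  when '.html', '.htm'         then :html
  when '.json', '.xml', '.csv' then :data
  else :unknown
  end
end

document_format('manual.PDF')  # => :pdf
document_format('notes.md')    # => :text
```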

โš™๏ธ Configuration Options

Multi-Provider Configuration

Ragdoll.configure do |config|
  # Primary LLM provider for chat/completion
  config.llm_provider = :anthropic

  # Separate provider for embeddings (optional)
  config.embedding_provider = :openai

  # Provider-specific configurations
  config.llm_config = {
    openai: {
      api_key: ENV['OPENAI_API_KEY'],
      organization: ENV['OPENAI_ORGANIZATION'],  # optional
      project: ENV['OPENAI_PROJECT']              # optional
    },
    anthropic: {
      api_key: ENV['ANTHROPIC_API_KEY']
    },
    google: {
      api_key: ENV['GOOGLE_API_KEY'],
      project_id: ENV['GOOGLE_PROJECT_ID']
    },
    azure: {
      api_key: ENV['AZURE_API_KEY'],
      endpoint: ENV['AZURE_ENDPOINT'],
      api_version: ENV['AZURE_API_VERSION']
    },
    ollama: {
      endpoint: ENV['OLLAMA_ENDPOINT'] || 'http://localhost:11434'
    },
    huggingface: {
      api_key: ENV['HUGGINGFACE_API_KEY']
    }
  }
end

Model and Processing Settings

Ragdoll.configure do |config|
  # Embedding configuration
  config.embedding_model = 'text-embedding-3-small'
  config.max_embedding_dimensions = 3072  # supports variable dimensions
  config.default_model = 'gpt-4'  # for chat/completion

  # Text chunking settings
  config.chunk_size = 1000
  config.chunk_overlap = 200

  # Search and similarity settings
  config.search_similarity_threshold = 0.7
  config.max_search_results = 10

  # Analytics and performance
  config.enable_search_analytics = true
  config.cache_embeddings = true

  # Custom prompt template
  config.prompt_template = <<~TEMPLATE
    Context: {{context}}
    Question: {{prompt}}
    Answer:
  TEMPLATE
end
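
To show what the {{context}} and {{prompt}} placeholders in the template do, here is a minimal rendering sketch (hypothetical helper, shown only to illustrate the placeholder format, not the gem's internals):

```ruby
# Substitute the template's placeholders with the retrieved context
# and the user's question.
template = <<~TEMPLATE
  Context: {{context}}
  Question: {{prompt}}
  Answer:
TEMPLATE

def render_template(template, context:, prompt:)
  template.gsub('{{context}}', context).gsub('{{prompt}}', prompt)
end

puts render_template(template,
                     context: 'Rails is a web framework.',
                     prompt: 'What is Rails?')
```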

Provider Examples

# OpenAI Configuration
Ragdoll.configure do |config|
  config.llm_provider = :openai
  config.llm_config = {
    openai: { api_key: ENV['OPENAI_API_KEY'] }
  }
  config.embedding_model = 'text-embedding-3-small'
end

# Anthropic + OpenAI Embeddings
Ragdoll.configure do |config|
  config.llm_provider = :anthropic
  config.embedding_provider = :openai
  config.llm_config = {
    anthropic: { api_key: ENV['ANTHROPIC_API_KEY'] },
    openai: { api_key: ENV['OPENAI_API_KEY'] }
  }
end

# Local Ollama Setup
Ragdoll.configure do |config|
  config.llm_provider = :ollama
  config.llm_config = {
    ollama: { endpoint: 'http://localhost:11434' }
  }
  config.embedding_model = 'nomic-embed-text'
end

๐Ÿ—๏ธ Database Schema

Ragdoll creates three main tables:

  • ragdoll_documents - Document metadata and content
  • ragdoll_embeddings - Vector embeddings with pgvector (variable dimensions)
  • ragdoll_searches - Search analytics and performance tracking

Key Features

  • Variable Vector Dimensions: Supports different embedding models with different dimensions
  • Model Tracking: Tracks which embedding model was used for each vector
  • Performance Indexes: Optimized for similarity search and filtering
  • Search Analytics: Comprehensive search performance and usage tracking

📊 Analytics and Monitoring

# Document statistics
stats = Ragdoll.client.stats
# => { total_documents: 150, total_embeddings: 1250, ... }

# Search analytics
analytics = Ragdoll::Search.analytics(days: 30)
# => {
#   total_searches: 500,
#   unique_queries: 350,
#   average_results: 8.5,
#   average_search_time: 0.15,
#   success_rate: 85.2,
#   most_common_queries: [...],
#   search_types: { semantic: 450, keyword: 50 },
#   models_used: { "text-embedding-3-small": 400, "text-embedding-3-large": 100 },
#   performance_stats: { fastest: 0.05, slowest: 2.3, median: 0.12 }
# }

# Performance monitoring
slow_searches = Ragdoll::Search.slow_searches(2.0)  # > 2 seconds
failed_searches = Ragdoll::Search.failed

# Health check
healthy = Ragdoll.client.healthy?
# => true/false

🧪 Testing

# spec/support/ragdoll_helpers.rb
module RagdollHelpers
  def setup_test_documents
    @ragdoll = Ragdoll::Client.new
    @doc = @ragdoll.add_text(
      "Rails is a web framework",
      title: "Rails Guide",
      process_immediately: true
    )
  end
end

# In your specs
RSpec.describe ChatController do
  include RagdollHelpers

  before { setup_test_documents }

  it "enhances prompts with context" do
    enhanced = Ragdoll.enhance_prompt("What is Rails?")
    expect(enhanced[:context_count]).to be > 0
  end
end

📦 Dependencies

  • Rails 8.0+
  • PostgreSQL with pgvector extension
  • Sidekiq for background processing
  • ruby_llm for multi-provider LLM support
  • LLM Provider APIs (OpenAI, Anthropic, Google, etc.)

Supported LLM Providers

Provider       Chat/Completion   Embeddings   Notes
OpenAI         ✅                ✅           GPT models, text-embedding-3-*
Anthropic      ✅                ❌           Claude models
Google         ✅                ✅           Gemini models
Azure OpenAI   ✅                ✅           Azure-hosted OpenAI
Ollama         ✅                ✅           Local models
HuggingFace    ✅                ✅           Various open-source models

๐Ÿค Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This gem is available as open source under the terms of the MIT License.

🆘 Support


Made with โค๏ธ for the Rails community

โญ Star this repo if you find it useful!