CAUTION: Ragdoll is still under development and may not be suitable for production use.
Ragdoll is a powerful Rails engine that adds Retrieval Augmented Generation (RAG) capabilities to any Rails application. It provides semantic search, document ingestion, and context-enhanced AI prompts using vector embeddings and PostgreSQL with pgvector. With support for multiple LLM providers through ruby_llm, you can use OpenAI, Anthropic, Google, Azure, Ollama, and more.
## Features

- **Semantic Search** - Vector similarity search with flexible embedding models and pgvector
- **Multi-Provider Support** - OpenAI, Anthropic, Google, Azure, Ollama, HuggingFace via ruby_llm
- **Multi-Format Support** - PDF, DOCX, text, HTML, JSON, XML, CSV document parsing
- **Context Enhancement** - Automatically enhance AI prompts with relevant context
- **Background Processing** - Asynchronous document processing with Sidekiq
- **Simple API** - Clean, intuitive interface for Rails integration
- **Analytics** - Search analytics and document management insights
- **Configurable** - Flexible chunking, embedding, and search parameters
- **Flexible Vectors** - Variable-length embeddings for different models
## Quick Start

### Installation

Add Ragdoll to your Rails application:

```ruby
# Gemfile
gem 'ragdoll'
```

```bash
bundle install
```
### Database Setup

Ragdoll requires PostgreSQL with the pgvector extension:

```bash
# Run migrations
rails ragdoll:install:migrations
rails db:migrate
```
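If the pgvector extension has not yet been enabled in your database, you can enable it from a migration of your own. This is a sketch using the standard Rails `enable_extension` API; the generated Ragdoll migrations may already handle this for you:

```ruby
# db/migrate/20240101000000_enable_pgvector.rb
# Sketch: enables the pgvector extension. Requires pgvector to be
# installed on the PostgreSQL server and superuser/appropriate privileges.
class EnablePgvector < ActiveRecord::Migration[8.0]
  def change
    enable_extension 'vector'
  end
end
```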
### Configuration

```ruby
# config/initializers/ragdoll.rb
Ragdoll.configure do |config|
  # LLM Provider Configuration
  config.llm_provider = :openai       # or :anthropic, :google, :azure, :ollama, :huggingface
  config.embedding_provider = :openai # optional, defaults to llm_provider

  # Provider-specific API keys
  config.llm_config = {
    openai: { api_key: ENV['OPENAI_API_KEY'] },
    anthropic: { api_key: ENV['ANTHROPIC_API_KEY'] },
    google: { api_key: ENV['GOOGLE_API_KEY'], project_id: ENV['GOOGLE_PROJECT_ID'] }
  }

  # Embedding and processing settings
  config.embedding_model = 'text-embedding-3-small'
  config.chunk_size = 1000
  config.search_similarity_threshold = 0.7
  config.max_embedding_dimensions = 3072 # supports variable-length vectors
end
```
### Basic Usage

```ruby
# Add documents
Ragdoll.add_file('/path/to/manual.pdf')
Ragdoll.add_text('Your content here', title: 'Knowledge Base')

# Enhance AI prompts with context
enhanced = Ragdoll.enhance_prompt(
  'How do I configure the database?',
  context_limit: 5
)

# Use the enhanced prompt with your AI service
ai_response = YourAI.complete(enhanced[:enhanced_prompt])
```
## API Reference

### Context Enhancement for AI

This is the primary method for RAG applications: it automatically finds relevant context and injects it into the prompt:

```ruby
enhanced = Ragdoll.enhance_prompt(
  "How do I deploy to production?",
  context_limit: 3,
  threshold: 0.8
)

# Returns:
# {
#   enhanced_prompt: "...", # Prompt with context injected
#   original_prompt: "...", # Original user prompt
#   context_sources: [...], # Source documents
#   context_count: 2        # Number of context chunks
# }
```
### Semantic Search

```ruby
# Search for similar content
results = Ragdoll.search(
  "database configuration",
  limit: 10,
  threshold: 0.6,
  filters: { document_type: 'pdf' }
)

# Get raw context without prompt enhancement
context = Ragdoll.client.get_context(
  "API authentication",
  limit: 5
)
```
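The `threshold` option discards weak matches. The scoring behind it is cosine similarity, which pgvector computes natively in SQL; the pure-Ruby sketch below illustrates the math and the thresholding, not Ragdoll's internals:

```ruby
# Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
def cosine_similarity(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Keep only candidates scoring at or above the configured threshold.
def above_threshold(candidates, query_vec, threshold)
  candidates.select { |c| cosine_similarity(c[:embedding], query_vec) >= threshold }
end

query = [1.0, 0.0]
docs = [
  { id: 1, embedding: [0.9, 0.1] }, # nearly parallel to the query
  { id: 2, embedding: [0.0, 1.0] }  # orthogonal: similarity 0
]
matches = above_threshold(docs, query, 0.6)
matches.map { |c| c[:id] } # => [1]
```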
### Document Management

```ruby
# Add documents
Ragdoll.add_file('/docs/manual.pdf')
Ragdoll.add_text('Content', title: 'Guide')
Ragdoll.add_directory('/knowledge-base', recursive: true)

# Manage documents
client = Ragdoll::Client.new
client.update_document(123, title: 'New Title')
client.delete_document(123)
client.list_documents(limit: 50)

# Bulk operations
client.reprocess_failed
client.add_directory('/docs', recursive: true)
```
## Rails Integration Examples

### Chat Controller

```ruby
class ChatController < ApplicationController
  def ask
    enhanced = Ragdoll.enhance_prompt(
      params[:question],
      context_limit: 5
    )

    ai_response = OpenAI.complete(enhanced[:enhanced_prompt])

    render json: {
      answer: ai_response,
      sources: enhanced[:context_sources],
      context_used: enhanced[:context_count] > 0
    }
  end
end
```
### Support Bot Service

```ruby
class SupportBot
  def initialize
    @ragdoll = Ragdoll::Client.new
  end

  def answer_question(question, category: nil)
    filters = { document_type: 'pdf' } if category == 'manual'

    context = @ragdoll.get_context(
      question,
      limit: 3,
      threshold: 0.8,
      filters: filters
    )

    if context[:total_chunks] > 0
      prompt = build_prompt(question, context[:combined_context])
      ai_response = call_ai_service(prompt)

      {
        answer: ai_response,
        confidence: :high,
        sources: context[:context_chunks]
      }
    else
      fallback_response(question)
    end
  end
end
```
### Background Processing

```ruby
class ProcessDocumentsJob < ApplicationJob
  def perform(file_paths)
    ragdoll = Ragdoll::Client.new

    file_paths.each do |path|
      ragdoll.add_file(path, process_immediately: true)
    end
  end
end
```
## Command Line Tools

### Thor Commands

```bash
# Document management
thor ragdoll:document:add /path/to/file.pdf --process_now
thor ragdoll:document:list --status completed --limit 20
thor ragdoll:document:show 123
thor ragdoll:document:delete 123 --confirm

# Import operations
thor ragdoll:import:import /docs --recursive --jobs 4
```
### Rake Tasks

```bash
# Add documents
rake ragdoll:document:add[/path/to/file.pdf] PROCESS_NOW=true
TITLE="Manual" rake ragdoll:document:add[content.txt]

# Bulk operations
rake ragdoll:document:bulk:reprocess_failed
rake ragdoll:document:bulk:cleanup_orphaned
STATUS=failed rake ragdoll:document:bulk:delete_by_status[failed]

# List and search
LIMIT=50 rake ragdoll:document:list
rake ragdoll:document:show[123]
```
## Supported Document Types

| Format | Extensions | Features |
|---|---|---|
| PDF | `.pdf` | Text extraction, metadata, page info |
| DOCX | `.docx` | Paragraphs, tables, document properties |
| Text | `.txt`, `.md` | Plain text, markdown |
| HTML | `.html`, `.htm` | Tag stripping, content extraction |
| Data | `.json`, `.xml`, `.csv` | Structured data parsing |
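Format detection of this kind is typically keyed off the file extension. The sketch below is a hypothetical pure-Ruby dispatch table for illustration, not Ragdoll's actual parser registry:

```ruby
require 'pathname'

# Hypothetical extension-to-parser dispatch table (illustration only).
PARSERS = {
  '.pdf'  => :pdf,  '.docx' => :docx,
  '.txt'  => :text, '.md'   => :text,
  '.html' => :html, '.htm'  => :html,
  '.json' => :data, '.xml'  => :data, '.csv' => :data
}.freeze

def parser_for(path)
  # Case-insensitive lookup; fall back to plain-text handling.
  PARSERS.fetch(Pathname(path).extname.downcase, :text)
end

parser_for('/docs/manual.PDF') # => :pdf
parser_for('/notes/readme.md') # => :text
```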
## Configuration Options

### Multi-Provider Configuration

```ruby
Ragdoll.configure do |config|
  # Primary LLM provider for chat/completion
  config.llm_provider = :anthropic

  # Separate provider for embeddings (optional)
  config.embedding_provider = :openai

  # Provider-specific configurations
  config.llm_config = {
    openai: {
      api_key: ENV['OPENAI_API_KEY'],
      organization: ENV['OPENAI_ORGANIZATION'], # optional
      project: ENV['OPENAI_PROJECT']            # optional
    },
    anthropic: {
      api_key: ENV['ANTHROPIC_API_KEY']
    },
    google: {
      api_key: ENV['GOOGLE_API_KEY'],
      project_id: ENV['GOOGLE_PROJECT_ID']
    },
    azure: {
      api_key: ENV['AZURE_API_KEY'],
      endpoint: ENV['AZURE_ENDPOINT'],
      api_version: ENV['AZURE_API_VERSION']
    },
    ollama: {
      endpoint: ENV['OLLAMA_ENDPOINT'] || 'http://localhost:11434'
    },
    huggingface: {
      api_key: ENV['HUGGINGFACE_API_KEY']
    }
  }
end
```
### Model and Processing Settings

```ruby
Ragdoll.configure do |config|
  # Embedding configuration
  config.embedding_model = 'text-embedding-3-small'
  config.max_embedding_dimensions = 3072 # supports variable dimensions
  config.default_model = 'gpt-4'         # for chat/completion

  # Text chunking settings
  config.chunk_size = 1000
  config.chunk_overlap = 200

  # Search and similarity settings
  config.search_similarity_threshold = 0.7
  config.max_search_results = 10

  # Analytics and performance
  config.enable_search_analytics = true
  config.cache_embeddings = true

  # Custom prompt template
  config.prompt_template = <<~TEMPLATE
    Context: {{context}}
    Question: {{prompt}}
    Answer:
  TEMPLATE
end
```
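The `chunk_size`/`chunk_overlap` pair describes a sliding window over the document text: each chunk is `chunk_size` characters long and shares `chunk_overlap` characters with its neighbor, so no sentence is stranded at a boundary. A minimal character-based sketch of that strategy (for illustration; Ragdoll's actual chunker may split on other units):

```ruby
# Sliding-window chunking: the next chunk starts `size - overlap`
# characters after the previous one, so consecutive chunks share
# `overlap` characters of context.
def chunk_text(text, size: 1000, overlap: 200)
  step = size - overlap
  chunks = []
  (0...text.length).step(step) do |start|
    chunks << text[start, size]
    break if start + size >= text.length
  end
  chunks
end

chunks = chunk_text(('a'..'z').to_a.join * 100) # 2,600 characters
chunks.length                              # => 3
chunks[0][-200, 200] == chunks[1][0, 200]  # => true (shared overlap)
```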
### Provider Examples

```ruby
# OpenAI Configuration
Ragdoll.configure do |config|
  config.llm_provider = :openai
  config.llm_config = {
    openai: { api_key: ENV['OPENAI_API_KEY'] }
  }
  config.embedding_model = 'text-embedding-3-small'
end

# Anthropic + OpenAI Embeddings
Ragdoll.configure do |config|
  config.llm_provider = :anthropic
  config.embedding_provider = :openai
  config.llm_config = {
    anthropic: { api_key: ENV['ANTHROPIC_API_KEY'] },
    openai: { api_key: ENV['OPENAI_API_KEY'] }
  }
end

# Local Ollama Setup
Ragdoll.configure do |config|
  config.llm_provider = :ollama
  config.llm_config = {
    ollama: { endpoint: 'http://localhost:11434' }
  }
  config.embedding_model = 'nomic-embed-text'
end
```
## Database Schema

Ragdoll creates three main tables:

- `ragdoll_documents` - Document metadata and content
- `ragdoll_embeddings` - Vector embeddings with pgvector (variable dimensions)
- `ragdoll_searches` - Search analytics and performance tracking
### Key Features

- **Variable Vector Dimensions**: Supports embedding models with different dimensions
- **Model Tracking**: Records which embedding model produced each vector
- **Performance Indexes**: Optimized for similarity search and filtering
- **Search Analytics**: Comprehensive search performance and usage tracking
## Analytics and Monitoring

```ruby
# Document statistics
stats = Ragdoll.client.stats
# => { total_documents: 150, total_embeddings: 1250, ... }

# Search analytics
analytics = Ragdoll::Search.analytics(days: 30)
# => {
#   total_searches: 500,
#   unique_queries: 350,
#   average_results: 8.5,
#   average_search_time: 0.15,
#   success_rate: 85.2,
#   most_common_queries: [...],
#   search_types: { semantic: 450, keyword: 50 },
#   models_used: { "text-embedding-3-small" => 400, "text-embedding-3-large" => 100 },
#   performance_stats: { fastest: 0.05, slowest: 2.3, median: 0.12 }
# }

# Performance monitoring
slow_searches = Ragdoll::Search.slow_searches(2.0) # searches slower than 2 seconds
failed_searches = Ragdoll::Search.failed

# Health check
healthy = Ragdoll.client.healthy?
# => true / false
```
## Testing

```ruby
# spec/support/ragdoll_helpers.rb
module RagdollHelpers
  def setup_test_documents
    @ragdoll = Ragdoll::Client.new
    @doc = @ragdoll.add_text(
      "Rails is a web framework",
      title: "Rails Guide",
      process_immediately: true
    )
  end
end

# In your specs
RSpec.describe ChatController do
  include RagdollHelpers

  before { setup_test_documents }

  it "enhances prompts with context" do
    enhanced = Ragdoll.enhance_prompt("What is Rails?")
    expect(enhanced[:context_count]).to be > 0
  end
end
```
## Dependencies

- Rails 8.0+
- PostgreSQL with the pgvector extension
- Sidekiq for background processing
- ruby_llm for multi-provider LLM support
- LLM provider APIs (OpenAI, Anthropic, Google, etc.)
### Supported LLM Providers

| Provider | Chat/Completion | Embeddings | Notes |
|---|---|---|---|
| OpenAI | ✅ | ✅ | GPT models, text-embedding-3-* |
| Anthropic | ✅ | ❌ | Claude models |
| Google | ✅ | ✅ | Gemini models |
| Azure OpenAI | ✅ | ✅ | Azure-hosted OpenAI |
| Ollama | ✅ | ✅ | Local models |
| HuggingFace | ✅ | ✅ | Various open-source models |
## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This gem is available as open source under the terms of the MIT License.
## Support

- Documentation
- Issues
- Discussions

Made with ❤️ for the Rails community

⭐ Star this repo if you find it useful!