Caution
Software Under Development by a Crazy Man
Ragdoll has evolved from a multi-modal to a unified text-based RAG architecture.
Ragdoll::CLI
Standalone command-line interface for the Ragdoll unified text-based RAG system. Converts all media types to searchable text and provides powerful cross-modal search capabilities through a simple CLI.
Installation
gem install ragdoll-cli
This will install the ragdoll command-line tool.
Quick Start
- Initialize configuration:
ragdoll init
- Set your API key:
export OPENAI_API_KEY=your_api_key_here
- Add documents:
ragdoll add docs/*.pdf --recursive
- Search for content:
ragdoll search "What is machine learning?"
Commands
Configuration
# Initialize configuration
ragdoll init
# Show current configuration
ragdoll config show
# Set configuration values
ragdoll config set llm_provider openai
ragdoll config set chunk_size 1000
# Get configuration values
ragdoll config get embedding_model
# Show config file path
ragdoll config path
# Show database configuration and status
ragdoll config database
Document Management
# Add a single document
ragdoll add document.pdf
# Add multiple documents and directories
ragdoll add file1.pdf file2.txt ../docs
# Add files matching a pattern
ragdoll add "documents/*.pdf"
# Add recursively from directory (default: true)
ragdoll add "docs/" --recursive
# Filter by document type
ragdoll add "files/*" --type pdf
# Available types: pdf, docx, txt, md, html
# Skip confirmation prompts
ragdoll add docs/ --skip-confirmation
# Force addition of duplicate documents
ragdoll add document.pdf --force-duplicate
Duplicate Detection
Ragdoll automatically detects and prevents duplicate documents from being processed:
# Normal behavior - duplicates are detected and skipped
ragdoll add document.pdf
ragdoll add document.pdf # Skipped (duplicate detected)
# Force addition of duplicates when needed
ragdoll add document.pdf --force-duplicate # Creates new document despite duplicate
# Batch processing safely handles mixed new/duplicate files
ragdoll add docs/*.pdf # Only processes new files, skips duplicates
Duplicate Detection Features:
- File-based detection: Compares file location, modification time, and SHA256 hash
- Content-based detection: Compares extracted text content and metadata
- Smart similarity: Detects duplicates even with minor differences (5% tolerance)
- Performance optimized: Uses database indexes for fast duplicate lookups
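The hash comparison mentioned in the file-based detection bullet can be illustrated with a minimal sketch. This is not Ragdoll's implementation: the function names `sha256_of_file` and `split_new_and_duplicate` are hypothetical, and the sketch covers only exact content matches, not the location, modification-time, or similarity checks listed above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, block_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def split_new_and_duplicate(paths):
    """Partition paths into (new, duplicates) by content hash.

    Hypothetical helper: the first file with a given hash counts as new,
    any later file with the same hash counts as a duplicate.
    """
    seen = set()
    new, duplicates = [], []
    for path in paths:
        file_hash = sha256_of_file(path)
        if file_hash in seen:
            duplicates.append(path)
        else:
            seen.add(file_hash)
            new.append(path)
    return new, duplicates


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "first.txt"
        copy = Path(tmp) / "copy.txt"
        first.write_bytes(b"same content")
        copy.write_bytes(b"same content")
        new, duplicates = split_new_and_duplicate([first, copy])
        print("new:", [p.name for p in new])
        print("duplicates:", [p.name for p in duplicates])
```

A real system would persist the hashes (for example in an indexed database column) rather than keep them in an in-memory set.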
Search
# Basic semantic search (default)
ragdoll search "machine learning concepts"
# Full-text search for exact keywords
ragdoll search "neural networks" --search-type fulltext
# Hybrid search combining semantic and full-text
ragdoll search "AI algorithms" --search-type hybrid
# Customize hybrid search weights
ragdoll search "deep learning" --search-type hybrid --semantic-weight 0.6 --text-weight 0.4
# Limit number of results
ragdoll search "AI algorithms" --limit 5
# Set similarity threshold
ragdoll search "machine learning" --threshold 0.8
# Different output formats
ragdoll search "deep learning" --format json
ragdoll search "AI" --format plain
ragdoll search "ML" --format table # default
Search Types
- Semantic Search (default): Uses AI embeddings to find conceptually similar content
- Full-text Search: Uses PostgreSQL text search for exact keyword matching
- Hybrid Search: Combines both semantic and full-text search with configurable weights
# Semantic search - best for concepts and meaning
ragdoll search "How do neural networks learn?" --search-type semantic
# Full-text search - best for exact terms
ragdoll search "backpropagation algorithm" --search-type fulltext
# Hybrid search - best comprehensive results
ragdoll search "transformer architecture" --search-type hybrid --semantic-weight 0.7 --text-weight 0.3
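To make the effect of --semantic-weight and --text-weight concrete, here is a minimal sketch of one common way to combine two scores: a normalized weighted sum. How Ragdoll actually combines them is not stated in this README, so the formula, the function names `hybrid_score` and `rank_hybrid`, and the result dictionary keys are all assumptions for illustration.

```python
def hybrid_score(semantic: float, text: float,
                 semantic_weight: float = 0.7,
                 text_weight: float = 0.3) -> float:
    """Combine two scores in [0, 1] as a normalized weighted sum (assumed formula)."""
    total = semantic_weight + text_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (semantic_weight * semantic + text_weight * text) / total


def rank_hybrid(results, semantic_weight=0.7, text_weight=0.3, limit=10):
    """Return result ids ordered by descending hybrid score.

    `results` is a list of dicts with hypothetical keys
    "id", "semantic", and "text".
    """
    scored = [
        (hybrid_score(r["semantic"], r["text"], semantic_weight, text_weight), r["id"])
        for r in results
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:limit]]


if __name__ == "__main__":
    sample = [
        {"id": "concept-match", "semantic": 0.9, "text": 0.1},
        {"id": "keyword-match", "semantic": 0.2, "text": 1.0},
    ]
    print(rank_hybrid(sample, semantic_weight=0.7, text_weight=0.3))
    print(rank_hybrid(sample, semantic_weight=0.1, text_weight=0.9))
```

With a high semantic weight the conceptually similar result ranks first; shifting the weight toward text favors the exact keyword match.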
Document Operations
# List all documents
ragdoll list
# Limit number of documents shown
ragdoll list --limit 10
# Different output formats
ragdoll list --format json
ragdoll list --format plain
# Check document status
ragdoll status <id>
# Show detailed document information
ragdoll show <id>
ragdoll show <id> --format json
# Update document metadata
ragdoll update <id> --title "New Title"
# Delete a document
ragdoll delete <id>
ragdoll delete <id> --force # Bypass confirmation
# Show system statistics
ragdoll stats
Retrieval Utilities
# Get context for RAG applications
ragdoll context "<query>" --limit 5
# Enhance a prompt with context
ragdoll enhance "<prompt>" --context_limit 5
Utilities
# Show version information
ragdoll version
# Show help
ragdoll help
ragdoll help add # Help for specific command
# Check system health
ragdoll health
Configuration
The CLI uses a YAML configuration file located at ~/.ragdoll/config.yml. You can customize various settings:
llm_provider: openai
embedding_provider: openai
embedding_model: text-embedding-3-small
chunk_size: 1000
chunk_overlap: 200
search_similarity_threshold: 0.7
max_search_results: 10
storage_backend: file
storage_config:
  directory: "~/.ragdoll"
api_keys:
  openai: your_key_here
  anthropic: your_key_here
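The chunk_size and chunk_overlap settings interact: each chunk starts chunk_size minus chunk_overlap units after the previous one. The sketch below shows this with a character-based sliding window. Whether Ragdoll counts characters or tokens is not stated here, so treating the values as character counts is an assumption, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Split text into overlapping chunks (character-based, an assumption)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text, so the
        # tail is not repeated as a separate, fully redundant chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    document = "".join(str(i % 10) for i in range(2500))
    pieces = chunk_text(document)
    print([len(piece) for piece in pieces])
```

With the defaults above, consecutive chunks share 200 characters, which helps keep sentences that straddle a boundary retrievable from either side.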
Environment Variables
API keys can be set via environment variables (recommended):
export OPENAI_API_KEY=your_key_here
export ANTHROPIC_API_KEY=your_key_here
export GOOGLE_API_KEY=your_key_here
export AZURE_OPENAI_API_KEY=your_key_here
export HUGGINGFACE_API_KEY=your_key_here
export OLLAMA_ENDPOINT=http://localhost:11434
Custom Configuration Location
export RAGDOLL_CONFIG=/path/to/custom/config.yml
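A tool that honors RAGDOLL_CONFIG might resolve the configuration path roughly as follows. This is a sketch under the assumption that the environment variable, when set, takes precedence over the default ~/.ragdoll/config.yml; `resolve_config_path` is a hypothetical helper, not part of the CLI.

```python
import os
from pathlib import Path

# Default location documented in this README.
DEFAULT_CONFIG = Path("~/.ragdoll/config.yml")


def resolve_config_path(env=None) -> Path:
    """Return the config path, preferring RAGDOLL_CONFIG when it is set.

    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    custom = env.get("RAGDOLL_CONFIG")
    if custom:
        return Path(custom).expanduser()
    return DEFAULT_CONFIG.expanduser()


if __name__ == "__main__":
    print(resolve_config_path())
```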
Storage
Documents and embeddings are stored in a PostgreSQL database managed by the ragdoll-core gem for production performance. Configuration and log files are stored locally in ~/.ragdoll/:
- ~/.ragdoll/config.yml - Configuration settings
- ~/.ragdoll/ragdoll.log - Log file (if configured)
Supported Document Types
- PDF files (.pdf) - Extracts text and metadata
- Microsoft Word (.docx) - Extracts text, tables, and metadata
- Text files (.txt) - Plain text import
- Markdown (.md, .markdown) - Markdown document import
- HTML (.html, .htm) - Strips HTML tags and imports text
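When preparing a batch of files, it can help to check which ones fall under the supported extensions before running the CLI. The sketch below maps the extensions listed above to the type names accepted by --type. The mapping table and the `document_type` helper are illustrative assumptions, not Ragdoll's own detection logic.

```python
from pathlib import Path

# Extensions from the list above, mapped to the --type names
# documented earlier (pdf, docx, txt, md, html).
EXTENSION_TYPES = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "txt",
    ".md": "md",
    ".markdown": "md",
    ".html": "html",
    ".htm": "html",
}


def document_type(path):
    """Return the document type for a path, or None if the extension is not listed."""
    return EXTENSION_TYPES.get(Path(path).suffix.lower())


if __name__ == "__main__":
    for name in ["report.PDF", "notes.markdown", "index.htm", "photo.png"]:
        print(name, "->", document_type(name))
```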
Examples
Import a directory of documentation
# Import all markdown files from a docs directory
ragdoll add "docs/**/*.md" --recursive
# Import mixed document types
ragdoll add "knowledge-base/*" --recursive
Search and get enhanced prompts
# Semantic search for concepts
ragdoll search "How to configure SSL certificates?"
# Full-text search for specific terms
ragdoll search "SSL certificate configuration" --search-type fulltext
# Hybrid search for comprehensive results
ragdoll search "database optimization techniques" --search-type hybrid
# Get detailed results with custom formatting
ragdoll search "performance tuning" --format plain --limit 3
# Search with custom similarity threshold
ragdoll search "security best practices" --threshold 0.75 --search-type semantic
Manage your knowledge base
# See what's in your knowledge base
ragdoll stats
ragdoll list --limit 20
# Check status of a specific document
ragdoll status 123
# Update document title
ragdoll update 123 --title "Updated Document Title"
# Delete a document
ragdoll delete 123
Integration with Other Tools
The CLI is designed to work well with other command-line tools:
# Search and pipe to jq for JSON processing
ragdoll search "API documentation" --format json | jq '.results[0].content'
# Import files found by find command
find ./docs -name "*.pdf" -exec ragdoll add {} \;
# Use with xargs for batch processing
ls *.md | xargs -I {} ragdoll add {}
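The JSON output can also be consumed from a script instead of jq. The sketch below extracts the same field as the jq filter `.results[0].content`. The payload shape (a top-level "results" array of objects with a "content" key) is inferred from that filter and is otherwise an assumption; `first_result_content` is a hypothetical helper.

```python
import json


def first_result_content(raw_json: str):
    """Return the content of the first search result, or None if there is none.

    Assumes the shape implied by the jq filter '.results[0].content'.
    """
    payload = json.loads(raw_json)
    results = payload.get("results") or []
    if not results:
        return None
    return results[0].get("content")


if __name__ == "__main__":
    # Hand-written sample payload, not real CLI output.
    sample = '{"results": [{"content": "Example passage"}, {"content": "Another"}]}'
    print(first_result_content(sample))
```

In practice the raw JSON could be captured with `subprocess.run(["ragdoll", "search", query, "--format", "json"], capture_output=True, text=True)` and its `stdout` passed to the helper.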
Troubleshooting
Common Issues
- No API key configured:
Error: Missing API key
Solution: Set OPENAI_API_KEY environment variable or add to config
- No documents found:
ragdoll stats # Check if documents are imported
ragdoll list # See what documents exist
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/MadBomber/ragdoll-cli.
License
The gem is available as open source under the terms of the MIT License.