Qualspec
LLM-judged qualitative testing for Ruby. Evaluate AI agents, compare models, and test subjective qualities that traditional assertions can't capture.
Installation
gem "qualspec"Configuration
Set your API key (required):
export QUALSPEC_API_KEY=your_openrouter_keyEnvironment Variables
| Variable | Description | Default |
|---|---|---|
QUALSPEC_API_KEY |
API key (required) | - |
QUALSPEC_API_URL |
API endpoint | https://openrouter.ai/api/v1 |
QUALSPEC_MODEL |
Default model for candidates | google/gemini-3-flash-preview |
QUALSPEC_JUDGE_MODEL |
Model used as judge | Same as QUALSPEC_MODEL
|
Quick Start
Compare Models (CLI)
# eval/comparison.rb
Qualspec.evaluation "Model Comparison" do
candidates do
candidate "gpt4", model: "openai/gpt-4"
candidate "claude", model: "anthropic/claude-3-sonnet"
end
scenario "helpfulness" do
prompt "How do I center a div in CSS?"
eval "provides a working solution"
eval "explains the approach"
end
end# Run comparison
qualspec eval/comparison.rb
# Generate HTML report
qualspec --html report.html eval/comparison.rbTest Your Agent (RSpec)
require "qualspec/rspec"
RSpec.describe MyAgent do
include Qualspec::RSpec::Helpers
it "responds helpfully" do
response = MyAgent.call("Hello")
result = qualspec_evaluate(response, "responds in a friendly manner")
expect(result).to be_passing
end
endDocumentation
- Getting Started
- Evaluation Suites - CLI for model comparison
- RSpec Integration - Testing your agents
- Rubrics - Builtin and custom evaluation criteria
- Configuration - All options
- Recording - VCR integration
License
MIT