Qualspec

LLM-judged qualitative testing for Ruby. Evaluate AI agents, compare models, and test subjective qualities that traditional assertions can't capture.

Installation

Add the gem to your Gemfile:

gem "qualspec"

Configuration

Set your API key (required):

export QUALSPEC_API_KEY=your_openrouter_key

Environment Variables

Variable               Description                    Default
QUALSPEC_API_KEY       API key (required)             -
QUALSPEC_API_URL       API endpoint                   https://openrouter.ai/api/v1
QUALSPEC_MODEL         Default model for candidates   google/gemini-3-flash-preview
QUALSPEC_JUDGE_MODEL   Model used as judge            Same as QUALSPEC_MODEL
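
For example, to change the default candidate model (reusing a model name from the examples below):

export QUALSPEC_MODEL=anthropic/claude-3-sonnet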

Quick Start

Compare Models (CLI)

# eval/comparison.rb
Qualspec.evaluation "Model Comparison" do
  candidates do
    candidate "gpt4", model: "openai/gpt-4"
    candidate "claude", model: "anthropic/claude-3-sonnet"
  end

  scenario "helpfulness" do
    prompt "How do I center a div in CSS?"
    eval "provides a working solution"
    eval "explains the approach"
  end
end

# Run comparison
qualspec eval/comparison.rb

# Generate HTML report
qualspec --html report.html eval/comparison.rb
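
Because configuration is read from the environment, you can also override the judge model for a single run by prefixing the command (standard shell semantics; the model name is taken from the example above):

QUALSPEC_JUDGE_MODEL=openai/gpt-4 qualspec eval/comparison.rb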

Test Your Agent (RSpec)

require "qualspec/rspec"

RSpec.describe MyAgent do
  include Qualspec::RSpec::Helpers

  it "responds helpfully" do
    response = MyAgent.call("Hello")

    result = qualspec_evaluate(response, "responds in a friendly manner")
    expect(result).to be_passing
  end
end
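
Since qualspec_evaluate takes a single criterion string, one way to check several qualities against the same response is to loop over criteria inside the describe block above (the prompt and criteria here are illustrative):

it "meets several qualitative criteria" do
  response = MyAgent.call("Summarize this bug report")

  # Each criterion string is judged independently by the LLM judge.
  ["stays on topic", "uses a professional tone"].each do |criterion|
    expect(qualspec_evaluate(response, criterion)).to be_passing
  end
end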

Documentation

License

MIT