lrama-fuzz

Fuzz testing for lrama grammars

Grammar-based fuzzer for lrama grammars. Generates random strings from any lrama grammar, with built-in profiles for fuzzing Ruby (via Prism or RubyVM) and JSON.

Installation

In your Gemfile:

gem "lrama-fuzz"

Or install directly:

gem install lrama-fuzz

Quick start

Ruby fuzzing (Prism)

require "lrama/fuzz"

session = Lrama::Fuzz.prism(ruby_src_dir: "/path/to/ruby", seed: 42)

# Generate a raw grammar derivation (may or may not be valid Ruby)
puts session.generate

# Generate a composed, valid Ruby program
puts session.compose

# Evolve programs over 10 generations, optimizing for complexity
best = session.evolve(10, population_size: 50)
best.each { |code, fitness| puts "#{fitness.round(2)}: #{code}" }

# Check grammar rule coverage
session.generate_full_coverage(max_attempts: 500)
cov = session.coverage
puts "#{cov.covered_count}/#{cov.total_count} rules (#{(cov.ratio * 100).round(1)}%)"

Ruby fuzzing (RubyVM)

Uses RubyVM::InstructionSequence.compile_parsey for validation instead of Prism. This tests the lrama-generated parser directly.

session = Lrama::Fuzz.rubyvm(ruby_src_dir: "/path/to/ruby", seed: 42)
puts session.compose

JSON fuzzing

Uses the JSON grammar from examples/json.y -- no external files needed.

session = Lrama::Fuzz.json(seed: 42)

puts session.generate                          # raw derivation
puts session.generate_valid(max_retries: 50)   # valid JSON document
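
The retry behaviour behind generate_valid can be pictured as a bounded loop around a producer and a validator. The following is only a sketch under assumptions: generate_valid_sketch, the stand-in producer, and the choice to return nil when the retry budget runs out are all hypothetical and are not taken from lrama-fuzz itself. The validator mirrors the JSON profile's stated approach of checking with JSON.parse.

```ruby
require "json"

# Hypothetical helper, not part of lrama-fuzz: call `producer` until
# `validator` accepts its output, or give up after `max_retries` attempts.
def generate_valid_sketch(producer, validator, max_retries: 50)
  max_retries.times do
    candidate = producer.call
    return candidate if validator.call(candidate)
  end
  nil # assumption: signal an exhausted retry budget with nil
end

# Valid means JSON.parse succeeds.
json_valid = lambda do |text|
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end

# Stand-in for a grammar derivation: a seeded pick from fixed strings,
# two of which are malformed on purpose.
rng = Random.new(42)
candidates = ['{"a": 1}', '[1, 2,', '{"a" 1}', '[true, null]']
producer = -> { candidates.sample(random: rng) }

doc = generate_valid_sketch(producer, json_valid, max_retries: 50)
puts doc.inspect
```

Whether the real method returns nil, raises, or does something else when every attempt fails is not stated here, so check the gem before relying on that detail.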

Coverage-guided generation

Generate programs with validity feedback. The generator tracks which grammar rules have appeared in valid programs and biases future generation toward rules that haven't been tested in valid contexts yet.

session = Lrama::Fuzz.json(seed: 42)

# Generate 200 programs with automatic feedback
valid_programs = session.generate_guided(count: 200)
puts "#{valid_programs.size} valid out of 200"

# Check valid coverage (rules seen in valid programs)
cov = session.coverage
puts "Raw coverage:   #{(cov.ratio * 100).round(1)}%"
puts "Valid coverage: #{(cov.valid_ratio * 100).round(1)}%"

# Use the block form for per-program handling
# (File.write does not create directories, so corpus/ must already exist)
session.generate_guided(count: 100) do |code, valid|
  File.write("corpus/#{Time.now.to_f}.json", code) if valid
end

The generator also uses valid coverage in its rule selection: after all rules have been expanded at least once, it prefers rules that haven't yet appeared in any valid program. This drives generation toward under-tested parts of the grammar.
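
As a rough illustration of that selection order, here is a minimal sketch. It assumes a strict three-tier preference (never expanded, then never seen in valid output, then anything), whereas the gem may apply a softer bias. pick_rule and the rule names are hypothetical, and every rule is treated as reachable when computing the two ratios.

```ruby
require "set"

# Hypothetical sketch, not lrama-fuzz's implementation. Preference order:
#   1. rules that have never been expanded
#   2. rules never seen in a valid program
#   3. any rule
def pick_rule(rules, expanded, valid_seen, rng)
  never_expanded = rules.reject { |r| expanded.include?(r) }
  return never_expanded.sample(random: rng) unless never_expanded.empty?

  never_valid = rules.reject { |r| valid_seen.include?(r) }
  return never_valid.sample(random: rng) unless never_valid.empty?

  rules.sample(random: rng)
end

rules = %i[object array string number]
expanded = Set.new
valid_seen = Set.new
rng = Random.new(42)

# Phase 1: every rule is expanded once before any rule repeats.
first_picks = Array.new(rules.size) do
  rule = pick_rule(rules, expanded, valid_seen, rng)
  expanded << rule
  rule
end

# Pretend only :object and :array have shown up in valid programs so far.
valid_seen.merge(%i[object array])

# Phase 2: selection is now confined to :string and :number.
later_picks = Array.new(20) { pick_rule(rules, expanded, valid_seen, rng) }

raw_ratio = expanded.size.to_f / rules.size
valid_ratio = valid_seen.size.to_f / rules.size
puts "first picks: #{first_picks.sort.inspect}"
puts "later picks: #{later_picks.uniq.sort.inspect}"
puts "raw coverage #{raw_ratio}, valid coverage #{valid_ratio}"
```

The gap between the two ratios (1.0 raw against 0.5 valid in this toy run) is the signal that steers later picks toward the rules still missing from valid output.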

Shrinking

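Reduce a failing input to a smaller one that still triggers the bug. Uses delta debugging, first at line level and then at character level. Delta debugging generally yields a locally minimal input, not necessarily the smallest possible one.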

# Standalone (crashes? is your own predicate: true while the input still triggers the bug)
small = Lrama::Fuzz::Shrinker.shrink(big_program) { |code| crashes?(code) }

# Via session
small = session.shrink(big_program) { |code| crashes?(code) }
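
To make the two passes concrete, here is a simplified delta-debugging-style reducer. It is a sketch only: reduce_units and shrink_sketch are hypothetical names, and the chunk-halving strategy is one common variant, not necessarily what Lrama::Fuzz::Shrinker does internally.

```ruby
# Hypothetical sketch, not the gem's Shrinker. Tries to delete chunks of
# `units`; when a full sweep deletes nothing, the chunk size is halved.
# Stops once a sweep at chunk size 1 removes nothing.
def reduce_units(units)
  chunk = [units.size / 2, 1].max
  loop do
    removed_any = false
    i = 0
    while i < units.size
      candidate = units[0...i] + (units[(i + chunk)..] || [])
      if yield(candidate)
        units = candidate    # deletion kept the failure: keep the smaller input
        removed_any = true
      else
        i += chunk           # this chunk is needed: move past it
      end
    end
    break if chunk == 1 && !removed_any
    chunk = [chunk / 2, 1].max unless removed_any
  end
  units
end

def shrink_sketch(input, &still_fails)
  # Pass 1: drop whole lines.
  lines = reduce_units(input.lines) { |ls| still_fails.call(ls.join) }
  # Pass 2: drop individual characters from what remains.
  chars = reduce_units(lines.join.chars) { |cs| still_fails.call(cs.join) }
  chars.join
end

big_program = <<~RUBY
  a = 1
  b = [1, 2, 3]
  puts a
  c = b.map { |x| x * 2 }
RUBY

# Stand-in for a real crash check: the "bug" is any input containing "map {".
small = shrink_sketch(big_program) { |code| code.include?("map {") }
puts small.inspect # prints "map {"
```

Running the line pass first keeps the number of predicate calls low on large inputs, since the character pass only has to work on the few lines that survive.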

CLI

$ lrama-fuzz --help
Usage: lrama-fuzz [options]

Generates programs from lrama grammars.

    --profile PROFILE            Profile: prism, rubyvm, json (default: prism)
    -d, --ruby-src-dir DIR       Path to Ruby source (default: $RUBY_SRC_DIR)
        --grammar-path PATH      Path to grammar file (json only)
    -n, --count N                Number of programs to generate (default: 10)
    -m, --mode MODE              Mode: generate, compose, evolve, coverage (default: generate)
    -g, --generations N          Generations for evolve mode (default: 10)
    -p, --population N           Population size for evolve mode (default: 50)
    -s, --seed N                 Random seed for reproducibility
    -h, --help                   Show this help

Examples:

# Generate 5 composed Ruby programs
RUBY_SRC_DIR=/path/to/ruby lrama-fuzz -m compose -n 5

# Generate 10 JSON strings from the built-in JSON grammar
lrama-fuzz --profile json -m generate -n 10

# Evolve Ruby programs for 20 generations
RUBY_SRC_DIR=/path/to/ruby lrama-fuzz -m evolve -g 20 -p 30

# Measure grammar rule coverage
RUBY_SRC_DIR=/path/to/ruby lrama-fuzz -m coverage -n 500

Custom grammars

You can fuzz any lrama grammar by using the core API directly.

require "lrama/fuzz"

# Parse a grammar
grammar = Lrama::Fuzz.parse("path/to/grammar.y")

# Define terminal generators -- each token name maps to a string or proc
terminals = {
  "NUMBER" => -> { rand(1..100).to_s },
  "STRING" => -> { %w[foo bar baz].sample }
}

# Create a generator
generator = Lrama::Fuzz::Generator.new(
  grammar,
  terminals: terminals,
  max_depth: 10,       # depth limit for derivation (default: 10)
  random: Random.new(42)
)

# Generate strings
10.times { puts generator.generate }

# Generate strings that pass a validator
valid = generator.generate_valid(max_retries: 100) { |s| valid?(s) }

# Track coverage
generator.generate_full_coverage(max_attempts: 500)
puts generator.coverage.ratio
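
For intuition about what max_depth and the terminals map do, the following toy derivation engine may help. It is a sketch under assumptions: GRAMMAR, derive, and generate_sketch are hypothetical, the grammar is a plain Hash instead of a parsed lrama grammar, tokens are joined with single spaces, and the fallback used past the depth limit (take the alternative with the fewest nonterminals) is a simple heuristic that terminates for this grammar but not for every grammar.

```ruby
# Toy grammar: nonterminal => alternatives, each a list of symbols.
GRAMMAR = {
  "expr" => [%w[expr + term], %w[term]],
  "term" => [%w[NUMBER], %w[( expr )]]
}.freeze

# Hypothetical sketch, not the gem's Generator.
def derive(symbol, grammar, terminals, rng, depth, max_depth)
  unless grammar.key?(symbol)
    # Terminal: use its generator (string or proc), or emit the token literally.
    gen = terminals.fetch(symbol, symbol)
    return gen.respond_to?(:call) ? gen.call : gen
  end

  alternatives = grammar[symbol]
  alternative =
    if depth >= max_depth
      # Past the limit, favour the alternative with the fewest nonterminals.
      alternatives.min_by { |alt| alt.count { |s| grammar.key?(s) } }
    else
      alternatives.sample(random: rng)
    end

  alternative
    .map { |s| derive(s, grammar, terminals, rng, depth + 1, max_depth) }
    .join(" ")
end

def generate_sketch(seed, max_depth: 10)
  rng = Random.new(seed)
  # Zero-argument proc that closes over the seeded generator.
  terminals = { "NUMBER" => -> { rng.rand(1..100).to_s } }
  derive("expr", GRAMMAR, terminals, rng, 0, max_depth)
end

5.times { |seed| puts generate_sketch(seed) }
```

Drawing terminal values from the same seeded Random as rule choices is what makes a run reproducible from its seed alone.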

Wrapping in a Session

For access to composition, evolution, and shrinking, wrap a generator in a Session:

session = Lrama::Fuzz::Session.new(
  generator,
  fitness: ->(code) { code.length > 10 ? 1.5 : 0.3 },
  validator: ->(code) { code.length > 0 },
  random: Random.new(42)
)

session.generate                                    # raw derivation
session.generate_valid                              # passes validator
session.evolve(10, population_size: 20)             # evolutionary optimization
session.shrink(code) { |c| some_predicate?(c) }     # delta debugging
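
The role of the fitness lambda in evolve can be sketched with a small generational loop. Everything below is an assumption made for illustration: evolve_sketch, its keyword arguments, the fixed genome length of three statements, one-point crossover, and the 30% mutation rate are hypothetical and do not describe the gem's Evolver. Only the shape of the result, pairs of code and fitness sorted best first, follows the usage shown earlier in this README.

```ruby
# Hypothetical sketch, not the gem's Evolver.
def evolve_sketch(generations, population_size:, genes:, fitness:, rng:)
  raise ArgumentError, "population_size must be at least 4" if population_size < 4

  # Each individual is a genome: an array of 3 statements drawn from `genes`.
  population = Array.new(population_size) do
    Array.new(3) { genes.sample(random: rng) }
  end

  generations.times do
    # Keep the fitter half unchanged (elitism).
    survivors = population.max_by(population_size / 2) { |g| fitness.call(g.join("\n")) }
    # Refill the population with children of two distinct survivors.
    children = Array.new(population_size - survivors.size) do
      a, b = survivors.sample(2, random: rng)
      cut = rng.rand(1...a.size)
      child = a[0...cut] + b[cut..] # one-point crossover
      # Mutation: occasionally replace one statement.
      child[rng.rand(child.size)] = genes.sample(random: rng) if rng.rand < 0.3
      child
    end
    population = survivors + children
  end

  scored = population.map do |genome|
    code = genome.join("\n")
    [code, fitness.call(code)]
  end
  scored.sort_by { |_, score| -score }
end

genes = ["a = 1", "b = [1, 2]", "puts a", "c = b.sum", "d = :sym"]
# Toy fitness: number of distinct statements in the program.
fitness = ->(code) { code.split("\n").uniq.size }

best = evolve_sketch(10, population_size: 20, genes: genes, fitness: fitness, rng: Random.new(42))
best.first(3).each { |code, score| puts "#{score}: #{code.inspect}" }
```

Because survivors are carried over unchanged, the best score in the population never decreases from one generation to the next.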

Architecture

Lrama::Fuzz
  .prism(ruby_src_dir:, seed:)       # -> Session (Prism profile)
  .rubyvm(ruby_src_dir:, seed:)      # -> Session (RubyVM profile)
  .json(seed:)                       # -> Session (JSON profile)
  .parse(path)                       # -> Grammar
  .new(path, terminals:, **opts)     # -> Generator

Session                # unified interface
  #generate            # raw grammar derivation
  #generate_valid      # derivation that passes validator
  #generate_guided     # generate with validity feedback loop
  #compose             # template-composed valid program (Ruby only)
  #evolve(n)           # evolutionary optimization
  #shrink(code, &pred) # delta debugging minimizer
  #coverage            # grammar rule coverage tracker

Generator              # core derivation engine
  #generate            # random derivation from start symbol
  #generate_valid      # retry until validator passes
  #record_result       # feed back validity for coverage guidance
  #generate_full_coverage  # target uncovered rules

Coverage               # grammar rule coverage tracking
  #ratio               # raw coverage (rules expanded / reachable)
  #valid_ratio         # valid coverage (rules in valid programs / reachable)
  #uncovered_valid_rules  # rules not yet seen in valid programs

Profiles (provide fitness, validator, session factory):
  Prism                # validates with ::Prism.parse
  RubyVM               # validates with ::RubyVM::InstructionSequence.compile_parsey
  Json                 # validates with JSON.parse

Ruby                   # shared Ruby grammar infrastructure (classifier, composer)
Shrinker               # delta debugging minimizer
Joiner                 # token spacing/joining
Evolver                # genome-based evolutionary optimization
ComposedEvolver        # evolutionary optimization with composer