Serialbench: Ruby serialization library performance benchmarker

Overview

Serialbench is a comprehensive benchmarking suite that evaluates the performance of popular Ruby serialization libraries across multiple formats. It provides detailed performance comparisons and analysis to help developers make informed decisions when choosing serialization libraries for their Ruby applications.

Supported Formats: XML, JSON, YAML, TOML, and more

Key Metrics: Parsing speed, generation speed, memory usage, streaming capabilities, and feature completeness

Multi-Environment Support: Docker and ASDF-based multi-Ruby version benchmarking with automated result aggregation and HTML site generation

Supported serialization libraries

Format	Name	Version	Description
XML	Ox	v2.14.23	C extension XML parser
XML	LibXML	v4.1.2	Ruby bindings for libxml2
XML	Nokogiri	v1.18.8	XML/HTML parser with XPath and CSS selectors
XML	Oga	v3.4	Pure Ruby XML parser with XPath support
XML	REXML	v3.4.1	Ruby’s standard library XML parser
JSON	Oj	v3.16.11	JSON parser with multiple parsing modes
JSON	YAJL	v1.4.3	JSON library with streaming capabilities
JSON	JSON	v2.12.2	Ruby’s standard library JSON parser
YAML	Psych	v5.1.2	Ruby’s standard library YAML parser
YAML	Syck	v1.5.1.1	Legacy YAML parser
TOML	Tomlib	v0.7.3	TOML parser implemented in C
TOML	TOML-RB	v2.2.0	Pure Ruby TOML parser
TOML	tomlrb	v2.0.3	A Racc based TOML Ruby parser (Only supports parsing, no support for dumping/writing.)

Data formats and schema

Serialbench generates structured YAML output for benchmark results, with different formats for single-environment and multi-environment runs.

The data formats include:

Single benchmark results: Individual benchmark run output
Result set data structure: Multi-platform benchmark aggregation
JSON schema specification: Complete schema validation rules
Configuration file formats: Docker and ASDF configuration examples

Prerequisites

System requirements

Ruby: 3.0 or later (3.3+ recommended for best performance)
Operating system: Linux, macOS, or Windows
Architecture: x86_64 or ARM64

Library dependencies

System dependencies (required for some native extensions):

# macOS with Homebrew
$ brew install libxml2 libxslt

# Ubuntu/Debian
$ sudo apt-get install libxml2-dev libxslt1-dev build-essential

# CentOS/RHEL/Fedora
$ sudo yum install libxml2-devel libxslt-devel gcc gcc-c++

GitHub API token (optional but recommended)

Serialbench fetches Ruby build definitions from the GitHub API. While this works without authentication, GitHub imposes strict rate limits on unauthenticated requests (60 requests per hour). For better reliability, especially in CI/CD environments or when running multiple benchmarks, set the GITHUB_TOKEN environment variable.

Rate limits:

Without token: 60 requests per hour
With token: 1,000 requests per hour

Setting the token:

# For local usage
export GITHUB_TOKEN=ghp_your_token_here

# For GitHub Actions (automatic)
# GitHub Actions automatically provides ${{ secrets.GITHUB_TOKEN }}
# No manual configuration needed in workflows

Creating a token:

Go to https://github.com/settings/tokens
Click "Generate new token (classic)"
Select scope: public_repo (for accessing public repositories)
Copy the generated token
Store it securely in your environment

Note	The token is optional. Serialbench will work without it but may hit rate limits when updating Ruby build definitions frequently.

Installation

Add this line to your application’s Gemfile:

gem 'serialbench'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install serialbench

Command line interface

Serialbench provides a comprehensive Thor-based CLI with four main subcommands for managing environments, benchmarks, result sets, and Ruby builds.

Main Commands Overview

$ serialbench
Serialbench - Benchmarking Framework for Ruby Serialization Libraries

USAGE:
  serialbench COMMAND [SUBCOMMAND] [OPTIONS]

COMMANDS:
  environment   Manage benchmark environments (Docker, ASDF, Local)
  benchmark     Manage individual benchmark runs
  resultset     Manage benchmark resultsets (collections of runs)
  ruby-build    Manage Ruby-Build definitions for validation
  version       Show version information
  help          Show this help message

EXAMPLES:
  # Create a Docker environment
  serialbench environment new docker-test docker

  # Run multi-environment benchmarks
  serialbench environment multi-execute asdf --config=serialbench-asdf.yml
  serialbench environment multi-execute docker --config=serialbench-docker.yml

  # Create and execute a benchmark
  serialbench benchmark create my-benchmark
  serialbench benchmark execute my-benchmark.yml

  # Create a result set for comparison
  serialbench resultset create comparison-set
  serialbench resultset add-result comparison-set results/my-benchmark

  # Generate static sites
  serialbench benchmark build-site results/my-benchmark
  serialbench resultset build-site resultsets/comparison-set

Environment management

The environment subcommand manages environment configurations and executes benchmarks across different Ruby environments.

$ serialbench environment help
Commands:
  serialbench environment execute ENVIRONMENT_CONFIG BENCHMARK_CONFIG RESULT_PATH  # Execute benchmark in environment
  serialbench environment help [COMMAND]                                           # Describe subcommands or one specific subcommand
  serialbench environment new NAME KIND RUBY_BUILD_TAG                             # Create a new environment configuration
  serialbench environment prepare ENVIRONMENT_CONFIG                               # Prepare environment for benchmarking

Benchmark management

The benchmark subcommand handles individual benchmark runs and site generation.

$ serialbench benchmark help
Commands:
  serialbench benchmark _docker_execute ENVIRONMENT_CONFIG_PATH BENCHMARK_CONFIG_PATH  # (Private) Execute a benchmark run
  serialbench benchmark build-site RUN_PATH [OUTPUT_DIR]                               # Generate HTML site for a run
  serialbench benchmark create [NAME]                                                  # Generate a run configuration file
  serialbench benchmark execute ENVIRONMENT_CONFIG_PATH BENCHMARK_CONFIG_PATH          # Execute a benchmark run
  serialbench benchmark help [COMMAND]                                                 # Describe subcommands or one specific subcommand
  serialbench benchmark list                                                           # List all available runs

The _docker_execute command is a private command used internally by the execute command to run benchmarks in Docker environments.

Result set management

The resultset subcommand manages collections of benchmark runs for comparison analysis.

$ serialbench resultset help
Commands:
  serialbench resultset add-result RESULT_PATH RESULTSET_PATH     # Add a run to a resultset
  serialbench resultset build-site RESULTSET_PATH [OUTPUT_DIR]    # Generate HTML site for a resultset
  serialbench resultset create NAME PATH                          # Create a new resultset
  serialbench resultset help [COMMAND]                            # Describe subcommands or one specific subcommand
  serialbench resultset list                                      # List all available resultsets
  serialbench resultset remove-result RESULTSET_PATH RESULT_PATH  # Remove a run from a resultset

ruby-build management

The ruby-build subcommand manages Ruby build definitions and version information.

Serialbench uses ruby-build definitions of Ruby interpreter types and versions for identification.

$ serialbench ruby-build help
Commands:
  serialbench ruby_build cache-info      # Show information about the Ruby-Build definitions cache
  serialbench ruby_build help [COMMAND]  # Describe subcommands or one specific subcommand
  serialbench ruby_build list [FILTER]   # List available Ruby-Build definitions
  serialbench ruby_build show TAG        # Show details for a specific Ruby-Build definition
  serialbench ruby_build suggest         # Suggest Ruby-Build tag for current Ruby version
  serialbench ruby_build update          # Update Ruby-Build definitions from GitHub
  serialbench ruby_build validate TAG    # Validate a Ruby-Build tag

Workflow examples

Docker-based testing

Note	This works.

# 1. Prepare Docker environment
$ bundle exec serialbench environment prepare config/environments/docker-ruby-3.1.yml

# 2. Run benchmark
$ bundle exec serialbench environment execute config/environments/docker-ruby-3.1.yml config/benchmarks/short.yml results/runs/docker-ruby-3.1-results

# 3. Create a resultset
$ bundle exec serialbench resultset create docker-comparison results/sets/docker-comparison

# 3a. (Optional) Build the site from the result if you want to visualize results
$ bundle exec serialbench benchmark build-site results/runs/docker-ruby-3.1-results/ --output_dir=_site_result

# 4. Add the result to the resultset
$ bundle exec serialbench resultset add-result results/sets/docker-comparison/ results/runs/docker-ruby-3.1-results/

# 5. Build the site from the resultset
$ bundle exec serialbench resultset build-site results/sets/docker-comparison/

# 6. Open the generated site
$ open _site/index.html

ASDF-based testing

Warning

THIS IS NOT YET WORKING.

# 1. Validate configuration
$ bundle exec serialbench benchmark validate serialbench-asdf.yml

# 2. Prepare Ruby environments
$ bundle exec serialbench benchmark prepare asdf --config=serialbench-asdf.yml

# 3. Run benchmarks across all Ruby versions
$ bundle exec serialbench benchmark execute asdf --config=serialbench-asdf.yml

# 4. Results are automatically merged and dashboard generated
$ open asdf-results/_site/index.html

Configuration Files

Environment configuration

Environment configuration files define how benchmarks are executed in different runtime environments.

Environment configuration for Docker (config/environments/docker-ruby-3.4.yml)

---
name: docker-ruby-3.4
kind: docker
created_at: '2025-06-13T15:18:43+08:00'
ruby_build_tag: "3.4.1"
description: Docker environment for Ruby 3.4 benchmarks
docker:
  image: 'ruby:3.4-slim'
  dockerfile: '../../docker/Dockerfile.ubuntu'

Environment configuration for ASDF (config/environments/asdf-ruby-3.3.yml)

---
name: ruby-332-asdf
kind: asdf
created_at: '2025-06-12T22:53:24+08:00'
ruby_build_tag: 3.3.2
description: ASDF environment
asdf:
  auto_install: true

Benchmark configuration

Benchmark configuration files control what tests to run and how to run them.

Short configuration (CI-friendly) (config/benchmarks/short.yml)

name: short-benchmark

data_sizes:
- small

formats:
- xml
- json
- yaml
- toml

iterations:
  small: 5
  medium: 2
  large: 1

operations:
- parse
- generate
- streaming

warmup: 2

Full configuration (Comprehensive) (config/benchmarks/full.yml)

name: full-benchmark

data_sizes:
- small
- medium
- large

formats:
- xml
- json
- yaml
- toml

iterations:
  small: 20
  medium: 5
  large: 2

operations:
- parse
- generate
- streaming
- memory

warmup: 3

Results structure

Individual run results

Results are stored in a structured directory format, with each run containing raw benchmark data and execution logs.

The directory is located at results/runs/{name}/, where {name} is the name of the environment used for the benchmark.

results/runs/docker-ruby-33-results/
├── results.yaml                    # Raw benchmark data
└── benchmark.log                   # Execution log

ResultSet structure

ResultSets aggregate multiple benchmark runs for comparison. They are stored in a structured directory format at results/sets/{name}/, where {name} is the name of the result set.

results/sets/ruby-version-comparison/
└── resultset.yml                  # Result set configuration

Benchmark categories

Parsing performance

Measures the time required to parse serialized data into Ruby objects.

Small files: ~1KB configuration-style documents
Medium files: ~1MB API responses with 1,000 records
Large files: ~10MB data exports with 10,000 records

Generation performance

Tests how quickly libraries can convert Ruby objects into serialized strings.

Streaming performance

Evaluates streaming event-based parsing performance for libraries that support it, which processes data sequentially and is memory-efficient for large files.

Memory usage analysis

Profiles memory allocation and retention during serialization operations using the memory_profiler gem.

Interactive Dashboard Features

The generated HTML sites provide comprehensive interactive dashboards with:

Format tabs: Dedicated views for XML, JSON, YAML, and TOML
Operation sections: Parsing, generation, streaming, and memory usage
Dynamic filtering: Platform, Ruby version, and environment selection
Real-time updates: Charts update instantly based on filter selections

Visualization Capabilities

Chart.js integration: Interactive performance charts with hover details
Multi-scale handling: Automatic Y-axis scaling for different performance ranges
Color-coded data: Consistent color schemes across serializers and environments
Responsive design: Optimized for desktop and mobile viewing

User Experience

Theme toggle: Light and dark mode with persistent preferences
Keyboard navigation: Full accessibility support
Fast loading: Optimized JavaScript for quick dashboard initialization
Export capabilities: JSON data export for further analysis

Development

Running Tests

$ bundle exec rake
$ bundle exec rspec

Adding a new serializers

To add support for additional serialization libraries:

Create a new serializer class in lib/serialbench/serializers/{format}/
Inherit from the appropriate base class (BaseXmlSerializer, BaseJsonSerializer, etc.)
Implement the required methods: parse, generate, name, version
Add the serializer to the registry in lib/serialbench/serializers.rb
Update documentation and tests

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin feature/my-new-feature)
Create a new Pull Request

Known issues

Syck YAML serializer segmentation faults

The Syck YAML serializer at version 1.5+ is known to cause segmentation faults on Ruby 3.1 and later versions. Serialbench automatically detects this problematic configuration and:

Displays a warning message when Syck is detected on Ruby 3.1+
Skips Syck benchmarks to prevent crashes
Continues with other YAML serializers (Psych)

Syck overrides YAML constant

On occasion after Syck is loaded, the constant YAML may be redefined to Syck, which can cause issues in other parts of the codebase. This can cause YAML output to fail when using libraries that expect YAML to have the Psych API.

In benchmark_cli.rb there is therefore such code to ensure that YAML is defined as Psych when writing to file is needed:

# Restore YAML to use Psych for output, otherwise lutaml-model's to_yaml
# will have no output
Object.const_set(:YAML, Psych)

License and copyright

This gem is developed, maintained and funded by Ribose

The gem is available as open source under the terms of the 2-Clause BSD License.

serialbench

Runtime