Project

rzstd

Ruby bindings (via Rust/magnus) for the Zstandard compressor with persistent ZSTD_CCtx / ZSTD_DCtx contexts that are reused across calls. Provides Zstd frame compress/decompress at module level and a stateful Dictionary class for dict-bound compression. Designed to be safe to call from multiple Ractors and competitive with rlz4 on small messages, where per-call context allocation in zstd-ruby dominates the cost.
 Dependencies

Development

~> 5.0
~> 13.0

Runtime

~> 0.9
 Project Readme

rzstd

License: MIT

Ractor-safe Zstandard bindings for Ruby with persistent contexts.

rzstd provides Zstd frame compress/decompress at module level and a stateful Dictionary class for dict-bound compression. Internally it keeps ZSTD_CCtx / ZSTD_DCtx state alive across calls instead of allocating fresh ~256 KB contexts each time. That makes it viable for small-message workloads, where the upstream zstd-ruby gem loses to LZ4 purely on context-allocation overhead.

The API mirrors rlz4 0.2.x:

require "rzstd"

# Module-level frame compression
ct = RZstd.compress("the quick brown fox", level: 3)  # level: kwarg, default 3
RZstd.decompress(ct)                                  # => "the quick brown fox"

# Negative levels enable Zstd's fast strategy (trades ratio for speed).
# Supported range: -131072..22. Typical useful range: -7..19.
RZstd.compress(payload, level: -3)                    # fast strategy, low ratio
RZstd.compress(payload, level: 19)                    # high ratio, slow

# Dict-bound compression
dict = RZstd::Dictionary.new(File.binread("schema.dict"), level: -3)
dict.id                                               # => u32 Dict_ID
dict.size                                             # => byte length
ct = dict.compress("payload that shares the schema")
dict.decompress(ct)                                   # => "payload that shares the schema"

# Dictionary training from sample payloads (wraps ZDICT_trainFromBuffer).
# Gather representative messages, then train a dictionary once and reuse
# it on both peers. Small-message workloads benefit the most.
samples = 1000.times.map { generate_sample_message }
dict_bytes = RZstd::Dictionary.train(samples, capacity: 64 * 1024)
dict = RZstd::Dictionary.new(dict_bytes)

Dictionary IDs

Dictionary#id returns a u32 following the Zstandard spec's Dictionary_ID semantics:

  • ZDICT-format dicts (the output of Dictionary.train, or any bytes starting with the ZDICT magic 0xEC30A437 LE): the id is read straight out of header bytes [4..7]. This is the same id zstd writes into every compressed frame header via ZSTD_c_dictIDFlag (on by default), so Dictionary#id and the on-wire frame Dictionary_ID always agree. Receivers can therefore route incoming frames to the right dictionary purely by parsing the frame header — no side channel required.
  • Raw-content dicts (opaque bytes with no ZDICT header): the spec requires the on-wire frame Dictionary_ID to be 0, so rzstd synthesises a local id from sha256(bytes) mapped into the public range 32_768..(2**31 - 1) — avoiding both reserved ranges (0..32_767, reserved for a future registrar, and >= 2**31). This id is useful as an in-process handle; it is not on the wire, so peers that need to agree on raw-content dicts must share them out-of-band.
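Routing by frame header, as described in the first bullet above, can be sketched in plain Ruby without touching rzstd at all. The helper below is a hypothetical illustration (the name frame_dictionary_id is not part of rzstd); it follows the Zstandard frame format: a 4-byte little-endian magic 0xFD2FB528, a Frame_Header_Descriptor byte, an optional Window_Descriptor byte, then a Dictionary_ID field of 0, 1, 2, or 4 bytes.

```ruby
# Sketch only: extracts the Dictionary_ID from a Zstandard frame header.
# The method name is hypothetical and not part of the rzstd API.
ZSTD_FRAME_MAGIC = 0xFD2FB528

# Returns the frame's Dictionary_ID (0 when the field is absent), or nil
# when the bytes do not start with a complete Zstandard frame header.
def frame_dictionary_id(frame)
  bytes = frame.b
  return nil if bytes.bytesize < 5
  return nil unless bytes.unpack1("V") == ZSTD_FRAME_MAGIC

  descriptor     = bytes.getbyte(4)
  single_segment = descriptor[5] == 1               # bit 5: Single_Segment_flag
  field_size     = [0, 1, 2, 4][descriptor & 0b11]  # bits 0-1: Dictionary_ID_flag
  return 0 if field_size.zero?

  # Window_Descriptor (1 byte) is present only when Single_Segment_flag is 0.
  offset = 5 + (single_segment ? 0 : 1)
  field  = bytes.byteslice(offset, field_size)
  return nil if field.nil? || field.bytesize < field_size

  # Little-endian decode of the 1-, 2-, or 4-byte field.
  field.bytes.each_with_index.reduce(0) { |acc, (byte, i)| acc | (byte << (8 * i)) }
end
```

A receiver could keep a Hash from id to Dictionary and look up frame_dictionary_id(frame) before decompressing. A result of 0 means the frame carries no Dictionary_ID, which is also what frames compressed with a raw-content dictionary look like.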

Public constants RZstd::Dictionary::USER_DICT_ID_MIN / USER_DICT_ID_MAX / USER_DICT_ID_SIZE expose this user range (32_768..2**31 - 1) for callers that generate their own ids.
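The two id rules above can be illustrated with a small standalone sketch. It is not rzstd's implementation: the constant names, the method name, and especially the way the SHA-256 digest is folded into the range (first four digest bytes, big-endian, modulo the range size) are assumptions made for illustration, so the ids it yields for raw-content dicts will not necessarily match Dictionary#id.

```ruby
require "digest"

# Sketch only: all names here are hypothetical, not rzstd's own.
SKETCH_ZDICT_MAGIC = 0xEC30A437
SKETCH_ID_MIN      = 32_768
SKETCH_ID_MAX      = 2**31 - 1
SKETCH_ID_COUNT    = SKETCH_ID_MAX - SKETCH_ID_MIN + 1

def sketch_dictionary_id(dict_bytes)
  bytes = dict_bytes.b

  # ZDICT-format dict: magic in bytes [0..3], Dictionary_ID in bytes [4..7],
  # both little-endian.
  if bytes.bytesize >= 8 && bytes.unpack1("V") == SKETCH_ZDICT_MAGIC
    return bytes.byteslice(4, 4).unpack1("V")
  end

  # Raw-content dict: derive a stable local id from the content hash.
  # ASSUMPTION: rzstd's actual digest-to-range mapping may differ.
  digest_prefix = Digest::SHA256.digest(bytes).unpack1("N")
  SKETCH_ID_MIN + (digest_prefix % SKETCH_ID_COUNT)
end
```

Whatever mapping is used, the derived id stays inside 32_768..(2**31 - 1) and is deterministic for a given byte string, which is what makes it usable as an in-process handle.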

Wrong-dict decoding is caught by the content checksum the encoder enables — a peer using the wrong dictionary raises RZstd::DecompressError instead of returning corrupt bytes.

Ractor safety

The extension is marked Ractor-safe. Dictionary instances are shareable. Module-level RZstd.compress / RZstd.decompress use a single global CCtx / DCtx behind a Mutex, which serializes calls across Ractors — if you need parallel throughput, give each Ractor its own Dictionary (each one owns its own per-instance contexts).