Project

rlz4

Ruby bindings (via Rust/magnus) for the lz4_flex LZ4 implementation. Provides LZ4 frame-format compress/decompress at module level and a stateful Dictionary class for block-format compression with a shared dictionary. Designed to be safe to call from multiple Ractors, unlike existing Ruby LZ4 gems.
 Dependencies

Development

~> 5.0
~> 13.0

Runtime

~> 0.9
 Project Readme

rlz4


Ractor-safe LZ4 bindings for Ruby, built as a Rust extension on top of lz4_flex via magnus.

Why?

The existing Ruby LZ4 gems are broken under Ractor.

rlz4 marks the extension Ractor-safe at load time and uses only owned, thread-safe state, so it can be called from any Ractor.

Install

# Gemfile
gem "rlz4"

Building requires a Rust toolchain (stable).

Usage

Frame format (module functions)

require "rlz4"

compressed   = RLZ4.compress("hello world" * 100)
decompressed = RLZ4.decompress(compressed)

# Wire format is standard LZ4 frame (magic number 04 22 4D 18),
# interoperable with any other LZ4 frame implementation.
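
Since the frame format starts with a fixed magic number, a cheap pre-check before calling RLZ4.decompress is possible. The sketch below is plain Ruby and does not call rlz4; the names LZ4_FRAME_MAGIC and lz4_frame? are hypothetical helpers, not part of the gem's API.

```ruby
# Hypothetical helper (not part of rlz4): checks only the 4-byte magic number.
# A matching magic does not guarantee that the rest of the frame is valid.
LZ4_FRAME_MAGIC = "\x04\x22\x4D\x18".b.freeze

def lz4_frame?(data)
  # Compare as binary so the source string's encoding does not matter.
  data.b.byteslice(0, 4) == LZ4_FRAME_MAGIC
end
```

A false result means the input is certainly not an LZ4 frame; a true result still needs RLZ4.decompress to validate the payload.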

Invalid input raises RLZ4::DecompressError (a StandardError subclass):

begin
  RLZ4.decompress("not a valid lz4 frame")
rescue RLZ4::DecompressError => e
  warn e.message
end

Dictionary compression

For workloads where many small messages share a common prefix (e.g. ZMQ messages with a fixed header), a shared dictionary can greatly improve the compression ratio, because each message can reference the bytes it has in common with the dictionary instead of encoding them again. RLZ4::Dictionary#compress emits a real LZ4 frame (magic 04 22 4D 18) with the FLG.DictID bit set and the dictionary's Dict_ID written into the FrameDescriptor. The output is interoperable with the reference lz4 CLI given the same dictionary file (lz4 -d -D dict.bin).

dict = RLZ4::Dictionary.new("schema=v1 type=message field1=")

compressed   = dict.compress("schema=v1 type=message field1=payload")
decompressed = dict.decompress(compressed)

dict.size  # => 30
dict.id    # => u32 Dict_ID

RLZ4::Dictionary is immutable after construction and can be shared across Ractors.

Dictionary IDs

Dictionary#id is a u32 derived from the first four bytes of sha256(dict_bytes), interpreted as little-endian (the [0..4] slice is half-open, unlike Ruby's inclusive 0..4 range). The LZ4 frame spec defines Dict_ID as an application-defined field with no reserved ranges and no central registrar, so the full u32 space is usable.
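
The derivation described above can be reproduced in plain Ruby, which may be handy for computing an id on a system that does not have rlz4 installed. This is a sketch that assumes the slice covers the first four bytes of the digest (a u32 is four bytes wide); derived_dict_id is a hypothetical name, not part of the gem.

```ruby
require "digest/sha2"

# Hypothetical helper (not part of rlz4): first four bytes of SHA-256 over
# the raw dictionary bytes, read as an unsigned little-endian 32-bit integer.
def derived_dict_id(dict_bytes)
  Digest::SHA256.digest(dict_bytes.b).byteslice(0, 4).unpack1("V")
end

derived_dict_id("schema=v1 type=message field1=") # Integer in 0..0xFFFF_FFFF
```

If the assumption holds, the result should equal Dictionary#id for a dictionary built from the same bytes.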

The id is on the wire: Dictionary#compress sets FLG.DictID = 1 and writes the id into the FrameDescriptor. On decode, rlz4 parses the incoming frame's Dict_ID and asserts it matches Dictionary#id before touching the payload. Receivers that maintain multiple dictionaries can therefore route incoming frames to the right one purely by parsing the frame header — no out-of-band id channel needed.
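
Routing by header can be sketched without touching the payload. The code below is plain Ruby and follows the LZ4 frame descriptor layout (magic, FLG, BD, optional 8-byte Content Size, optional 4-byte Dict_ID, header checksum); FRAME_MAGIC and frame_dict_id are hypothetical names, and the sketch does not verify the header checksum.

```ruby
# Hypothetical helper (not part of rlz4): extracts Dict_ID from an LZ4 frame
# header, or returns nil when the frame carries none or is too short.
FRAME_MAGIC = "\x04\x22\x4D\x18".b.freeze
FLG_DICT_ID      = 0x01 # FLG bit 0: Dict_ID field present
FLG_CONTENT_SIZE = 0x08 # FLG bit 3: 8-byte Content Size field present

def frame_dict_id(frame)
  bytes = frame.b
  return nil unless bytes.byteslice(0, 4) == FRAME_MAGIC

  flg = bytes.getbyte(4)
  return nil if flg.nil? || (flg & FLG_DICT_ID).zero?

  offset = 6                                        # magic (4) + FLG (1) + BD (1)
  offset += 8 unless (flg & FLG_CONTENT_SIZE).zero? # Content Size precedes Dict_ID
  field = bytes.byteslice(offset, 4)
  return nil if field.nil? || field.bytesize < 4

  field.unpack1("V") # little-endian u32
end

# Hand-built header for illustration; the final byte is a placeholder,
# not a real header checksum.
header = FRAME_MAGIC + [0x61, 0x40].pack("C2") + [0x1234ABCD].pack("V") + "\x00".b
frame_dict_id(header) # returns 0x1234ABCD
```

A receiver could keep its RLZ4::Dictionary objects in a Hash keyed by Dictionary#id and use the parsed value to pick the one to call decompress on.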

LZ4 dictionaries are always raw bytes (unlike Zstd, there is no dict-file header format), so there is no header to parse an id out of. If you need sender and receiver to agree on an id without shipping it out-of-band, deriving it deterministically from the dict bytes — which is what Dictionary.new does — is the simplest option.

Dictionary training from a sample corpus is not supported: LZ4 has no equivalent of Zstd's ZDICT_trainFromBuffer. Dictionaries are supplied by the caller as raw bytes (typically a hand-picked prefix or a representative message).
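
When the dictionary is a hand-picked prefix, one simple way to find a candidate is to take the longest byte prefix shared by a few sample messages. The sketch below is plain Ruby and is not dictionary training; common_prefix is a hypothetical helper, not part of rlz4.

```ruby
# Hypothetical helper (not part of rlz4): longest common byte prefix of the
# given sample messages, returned as a binary string.
def common_prefix(samples)
  return "".b if samples.empty?

  first, *rest = samples.map(&:b)
  length = rest.reduce(first.bytesize) do |len, sample|
    i = 0
    # Advance while both strings still have bytes and those bytes agree.
    i += 1 while i < len && i < sample.bytesize && first.getbyte(i) == sample.getbyte(i)
    i
  end
  first.byteslice(0, length)
end

samples = [
  "schema=v1 type=message field1=payload",
  "schema=v1 type=message field1=other"
]
common_prefix(samples) # => "schema=v1 type=message field1="
```

The result can be passed to RLZ4::Dictionary.new like any other raw byte string.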

Ractors

Both the module functions and RLZ4::Dictionary can be used from any Ractor. Example from the test suite:

ractors = 4.times.map do |i|
  Ractor.new(i) do |idx|
    pt = "ractor #{idx} payload " * 1000
    1000.times do
      ct = RLZ4.compress(pt)
      raise "mismatch" unless RLZ4.decompress(ct) == pt
    end
    :ok
  end
end
ractors.map(&:value) # => [:ok, :ok, :ok, :ok]

Non-goals

  • High-compression mode (LZ4_HC).
  • Streaming / chunked compression.
  • Preservation of string encoding on decompress (output is always binary).
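
Because decompressed output is always binary, callers that need text have to re-tag the encoding themselves. The sketch below shows one way to do that in plain Ruby; restore_encoding is a hypothetical helper, and the binary string stands in for the result of RLZ4.decompress.

```ruby
# Hypothetical helper (not part of rlz4): re-tags binary output with the
# encoding the sender is known to have used, and rejects invalid byte sequences.
def restore_encoding(binary, encoding = Encoding::UTF_8)
  text = binary.dup.force_encoding(encoding) # dup leaves the input untouched
  raise ArgumentError, "bytes are not valid #{encoding}" unless text.valid_encoding?

  text
end

binary = "h\u00E9llo".b # stand-in for decompressed output (ASCII-8BIT)
text   = restore_encoding(binary)
text.encoding # => #<Encoding:UTF-8>
```

The encoding itself has to be agreed on by sender and receiver, since it is not carried in the LZ4 frame.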

License

MIT