Project

unibuf

The project is in a healthy, maintained state
A pure Ruby gem for parsing Protocol Buffers text format (textproto/txtpb) and FlatBuffers schema definitions with rich domain models
 Dependencies

Runtime

~> 2.5
~> 2.0
~> 1.4
 Project Readme

Unibuf: Universal Buffer Format Parser


Purpose

Unibuf is a pure Ruby gem for parsing and manipulating multiple serialization formats including Protocol Buffers, FlatBuffers, and Cap’n Proto.

It provides fully object-oriented, specification-compliant parsers with rich domain models, comprehensive schema validation, binary format encoding/decoding, and complete round-trip serialization support.

Key features:

  • Protocol Buffers

    • Parse text format (.txtpb, .textproto)

    • Parse binary format (.binpb) with schema

    • Serialize to binary format (.binpb)

    • Parse Proto3 schemas (.proto)

    • Wire format encoding/decoding (varint, zigzag, all wire types)

  • FlatBuffers

    • Parse schemas (.fbs)

    • Parse binary format (.fb)

    • Serialize to binary format (.fb)

  • Cap’n Proto

    • Parse schemas (.capnp)

    • Parse binary format with segment management

    • Serialize to binary format with pointer encoding

    • Support for structs, enums, interfaces (RPC)

    • Generic types (List<T>)

    • Unions and annotations

  • Serialization and validation

    • Complete round-trip serialization for all formats

    • Schema-driven validation and deserialization

  • Developer usage

    • Rich domain models with 60+ behavioral classes

    • Complete CLI toolkit for all formats

    • Pure Ruby - no C/C++ dependencies

Installation

Add this line to your application’s Gemfile:

gem "unibuf"

And then execute:

bundle install

Or install it yourself as:

gem install unibuf

Features

  • Protocol Buffers support

  • FlatBuffers support

  • Cap’n Proto support

  • Schema-required design

  • Parsing text format

  • Parsing binary format

  • Schema-based validation

  • Wire format support

  • Round-trip serialization

  • Rich domain models

  • Command-line tools

Protocol Buffers

General

Full support for Protocol Buffers (protobuf) including text format parsing, binary format parsing/serialization, and Proto3 schema parsing.

See PROTOBUF.adoc for detailed documentation.

Parsing Protocol Buffers text format

require "unibuf"

# Load schema (recommended for validation)
schema = Unibuf.parse_schema("schema.proto")  # (1)

# Parse text format file
message = Unibuf.parse_textproto_file("data.txtpb")  # (2)

# Validate against schema
validator = Unibuf::Validators::SchemaValidator.new(schema)  # (3)
validator.validate!(message, "MessageType")  # (4)
  1. Load Proto3 schema from .proto file

  2. Parse Protocol Buffers text format

  3. Create validator with schema

  4. Validate message against schema

Parsing Protocol Buffers binary format

require "unibuf"

# 1. Load schema (REQUIRED for binary)
schema = Unibuf.parse_schema("schema.proto")  # (1)

# 2. Parse binary Protocol Buffer file
message = Unibuf.parse_binary_file("data.binpb", schema: schema)  # (2)

# 3. Access fields normally
puts message.find_field("name").value  # (3)
  1. Schema is mandatory for binary parsing

  2. Parse binary file with schema

  3. Access fields like text format

FlatBuffers

General

Complete support for Google FlatBuffers including schema parsing (.fbs files) and binary format parsing/serialization.

See FLATBUFFERS.adoc for detailed documentation.

Parsing FlatBuffers schema

require "unibuf"

# Parse FlatBuffers schema
schema = Unibuf.parse_flatbuffers_schema("schema.fbs")  # (1)

# Access schema structure
table = schema.find_table("Monster")  # (2)
table.fields.each { |f| puts "#{f.name}: #{f.type}" }  # (3)
  1. Parse .fbs schema file

  2. Find table definition

  3. Iterate through fields

Parsing FlatBuffers binary format

# Parse binary FlatBuffer
data = Unibuf.parse_flatbuffers_binary(binary_data, schema: schema)  # (1)

# Access data
puts data["name"]  # (2)
puts data["hp"]  # (3)
  1. Parse binary with schema

  2. Access string field

  3. Access numeric field
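
To illustrate why the schema is needed for the lookup above, here is a minimal sketch of how a scalar field is located in a FlatBuffer through its vtable. It uses only Ruby's standard library and does not reflect Unibuf's internals; the helper name `read_int16_field` and the hand-built 20-byte buffer are assumptions made for illustration.

```ruby
# Hypothetical helper (not part of Unibuf): read an int16 field from the
# root table of a FlatBuffer, given the field's slot index in the vtable.
def read_int16_field(buffer, slot, default: 0)
  # The first 4 bytes hold the offset of the root table.
  table_pos = buffer.byteslice(0, 4).unpack1("L<")
  # The table starts with a signed offset that is subtracted to find its vtable.
  vtable_pos = table_pos - buffer.byteslice(table_pos, 4).unpack1("l<")
  vtable_size = buffer.byteslice(vtable_pos, 2).unpack1("S<")

  # Field entries follow the two uint16 size fields of the vtable.
  entry_pos = vtable_pos + 4 + (2 * slot)
  return default if entry_pos + 2 > vtable_pos + vtable_size

  field_offset = buffer.byteslice(entry_pos, 2).unpack1("S<")
  return default if field_offset.zero? # field absent, use the default

  buffer.byteslice(table_pos + field_offset, 2).unpack1("s<")
end

# Hand-built buffer: root offset, vtable, padding, table with hp = 150.
buffer = [12].pack("L<") +      # root table lives at byte 12
         [6, 8, 4].pack("S<3") + # vtable: size 6, table size 8, slot 0 at +4
         "\x00\x00".b +          # padding
         [8].pack("l<") +        # table: vtable is 8 bytes before the table
         [150].pack("s<") +      # slot 0 value
         "\x00\x00".b            # padding

puts read_int16_field(buffer, 0)
```

The binary carries only offsets, so the slot index and the int16 type have to come from the schema; that is the role the `schema:` argument plays in the call above.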

Cap’n Proto

General

Complete support for Cap’n Proto including schema parsing (.capnp files) and binary format parsing/serialization with segment management and pointer encoding.

See CAPNPROTO.adoc for detailed documentation.

Parsing Cap’n Proto schema

require "unibuf"

# Parse Cap'n Proto schema
schema = Unibuf.parse_capnproto_schema("addressbook.capnp")  # (1)

# Access schema structure
person = schema.find_struct("Person")  # (2)
person.fields.each { |f| puts "#{f.name} @#{f.ordinal} :#{f.type}" }  # (3)

# Access interfaces (RPC)
calc = schema.find_interface("Calculator")  # (4)
calc.methods.each { |m| puts "#{m.name} @#{m.ordinal}" }  # (5)
  1. Parse .capnp schema file

  2. Find struct definition

  3. Iterate through fields with ordinals

  4. Find interface definition (RPC)

  5. List RPC methods

Parsing Cap’n Proto binary format

# Parse binary Cap'n Proto data
parser = Unibuf::Parsers::Capnproto::BinaryParser.new(schema)  # (1)
data = parser.parse(binary_data, root_type: "Person")  # (2)

# Access data
puts data[:name]  # (3)
puts data[:email]  # (4)
  1. Create parser with schema

  2. Parse binary with root type

  3. Access text field

  4. Access another field
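
As a rough illustration of what segment management involves, the sketch below splits an unpacked Cap'n Proto message into its segments by reading the segment table at the start of the stream. It is plain Ruby under the standard stream framing, not Unibuf's SegmentReader; the method name `split_segments` is hypothetical, and packed encoding is not handled.

```ruby
# Hypothetical helper (not Unibuf's API): split a framed, unpacked
# Cap'n Proto message into its raw segments.
def split_segments(message)
  # The first uint32 stores the segment count minus one.
  count = message.byteslice(0, 4).unpack1("L<") + 1
  # One uint32 per segment follows, giving its size in 8-byte words.
  sizes = message.byteslice(4, 4 * count).unpack("L<*")

  # The header is padded so that the first segment starts on a word boundary.
  header_size = 4 + (4 * count)
  header_size += 4 unless (header_size % 8).zero?

  pos = header_size
  sizes.map do |words|
    segment = message.byteslice(pos, words * 8)
    pos += words * 8
    segment
  end
end

# Two segments of one word each: count - 1 = 1, sizes 1 and 1, 4 bytes padding.
message = [1, 1, 1, 0].pack("L<4") + ("A" * 8) + ("B" * 8)
p split_segments(message)
```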

Serializing Cap’n Proto binary format

# Serialize to binary
serializer = Unibuf::Serializers::Capnproto::BinarySerializer.new(schema)  # (1)
binary = serializer.serialize(
  { id: 1, name: "Alice", email: "alice@example.com" },  # (2)
  root_type: "Person"  # (3)
)

# Write to file
File.binwrite("output.capnp.bin", binary)  # (4)
  1. Create serializer with schema

  2. Provide data as hash

  3. Specify root struct type

  4. Write binary output
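
Pointer encoding can be sketched for the simplest case, a struct pointer, which packs a signed word offset and the sizes of the data and pointer sections into one 64-bit little-endian word. The two methods below are illustrative only and are not Unibuf's PointerEncoder or PointerDecoder; their names and keyword arguments are assumptions.

```ruby
# Hypothetical helpers (not Unibuf's API) for Cap'n Proto struct pointers.

# Build the 8-byte pointer word. Bits 0-1 are the pointer kind (0 = struct),
# bits 2-31 the signed offset in words, bits 32-47 the data section size and
# bits 48-63 the pointer section size, both in words.
def encode_struct_pointer(offset:, data_words:, pointer_words:)
  low = (offset & 0x3FFFFFFF) << 2
  [low | (data_words << 32) | (pointer_words << 48)].pack("Q<")
end

# Reverse of the above; raises if the kind bits do not mark a struct pointer.
def decode_struct_pointer(bytes)
  word = bytes.unpack1("Q<")
  raise ArgumentError, "not a struct pointer" unless (word & 0x3).zero?

  raw_offset = (word >> 2) & 0x3FFFFFFF
  # Sign-extend the 30-bit offset.
  offset = raw_offset >= (1 << 29) ? raw_offset - (1 << 30) : raw_offset
  {
    offset: offset,
    data_words: (word >> 32) & 0xFFFF,
    pointer_words: (word >> 48) & 0xFFFF,
  }
end

pointer = encode_struct_pointer(offset: 0, data_words: 1, pointer_words: 2)
p decode_struct_pointer(pointer)
```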

Protocol Buffers text format

General

Parse human-readable Protocol Buffer text format files following the official specification.

See TXTPROTO.adoc for detailed documentation.

Parsing text format

require "unibuf"

# Load schema (recommended for validation)
schema = Unibuf.parse_schema("schema.proto")  # (1)

# Parse text format file
message = Unibuf.parse_textproto_file("data.txtpb")  # (2)

# Validate against schema
validator = Unibuf::Validators::SchemaValidator.new(schema)  # (3)
validator.validate!(message, "MessageType")  # (4)
  1. Load Proto3 schema from .proto file

  2. Parse Protocol Buffers text format

  3. Create validator with schema

  4. Validate message against schema

Parsing Protocol Buffers binary format

General

Parse binary Protocol Buffer data using wire format decoding with schema-driven deserialization.

The schema is REQUIRED for binary parsing because binary format only stores field numbers, not names or types.

Parsing binary format

require "unibuf"

# 1. Load schema (REQUIRED for binary)
schema = Unibuf.parse_schema("schema.proto")  # (1)

# 2. Parse binary Protocol Buffer file
message = Unibuf.parse_binary_file("data.binpb", schema: schema)  # (2)

# 3. Access fields normally
puts message.find_field("name").value  # (3)
  1. Schema is mandatory for binary parsing

  2. Parse binary file with schema

  3. Access fields like text format

Binary format from string

# Read binary data
binary_data = File.binread("data.binpb")

# Parse with schema
schema = Unibuf.parse_schema("schema.proto")
message = Unibuf.parse_binary(binary_data, schema: schema)

Supported wire types

The binary parser supports all Protocol Buffer wire types:

Varint (Type 0)

Variable-length integers: int32, int64, uint32, uint64, sint32, sint64, bool, enum

64-bit (Type 1)

Fixed 8-byte values: fixed64, sfixed64, double

Length-delimited (Type 2)

Variable-length data: string, bytes, embedded messages, packed repeated fields

32-bit (Type 5)

Fixed 4-byte values: fixed32, sfixed32, float
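
Each field in the binary stream starts with a key that combines the field number with one of the wire types listed above. The following sketch shows how such a key splits into its two parts; it is a standalone illustration, and the method names are hypothetical rather than part of Unibuf.

```ruby
# Hypothetical helpers (not Unibuf's API).

# Map a wire type number to a readable name.
def wire_type_name(wire_type)
  case wire_type
  when 0 then :varint
  when 1 then :fixed64
  when 2 then :length_delimited
  when 5 then :fixed32
  else :unknown
  end
end

# The low three bits of a key are the wire type, the rest is the field number.
def split_field_key(key)
  { field_number: key >> 3, wire_type: wire_type_name(key & 0x07) }
end

p split_field_key(0x0A) # field 1, length-delimited
p split_field_key(0x10) # field 2, varint
```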

Protocol Buffers wire format

General

Unibuf implements complete Protocol Buffers wire format decoding according to the official specification.

Wire format features

Varint decoding

Efficiently decode variable-length integers used for most numeric types

ZigZag encoding

Proper handling of signed integers (sint32, sint64) with zigzag decoding

Fixed-width types

Decode 32-bit and 64-bit fixed-width values (fixed32, fixed64, float, double)

Length-delimited

Parse strings, bytes, and embedded messages with length prefixes

Schema-driven

Use schema to determine field types and deserialize correctly
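
The decoding steps listed above can be sketched in a few lines of plain Ruby. This is a minimal illustration of the wire format rules, not Unibuf's WireFormatParser; all method names are assumptions.

```ruby
# Hypothetical helpers (not Unibuf's API).

# Decode one base-128 varint starting at offset.
# Returns [value, offset_of_next_byte].
def decode_varint(bytes, offset = 0)
  value = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    raise ArgumentError, "truncated varint" if byte.nil?

    offset += 1
    value |= (byte & 0x7F) << shift        # low 7 bits carry payload
    return [value, offset] if (byte & 0x80).zero? # high bit clear: last byte

    shift += 7
  end
end

# Undo zigzag encoding used by sint32 and sint64.
def zigzag_decode(number)
  (number >> 1) ^ -(number & 1)
end

# Fixed-width values are little-endian.
def decode_fixed32(bytes, offset = 0)
  bytes.byteslice(offset, 4).unpack1("V")
end

def decode_double(bytes, offset = 0)
  bytes.byteslice(offset, 8).unpack1("E")
end

value, next_offset = decode_varint("\xAC\x02".b)
puts value               # 300
puts next_offset         # 2
puts zigzag_decode(3)    # -2
```

Which of these decoders applies to a given field, for example plain varint versus zigzag, cannot be read from the bytes alone; that choice comes from the field type in the schema.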

Example wire format parsing

# Schema defines the structure
schema = Unibuf.parse_schema("schema.proto")

# Binary data uses wire format encoding
binary_data = File.binread("data.binpb")

# Parser uses schema to decode wire format
message = Unibuf.parse_binary(binary_data, schema: schema)

# Access decoded fields
message.field_names  # => ["name", "id", "enabled"]
message.find_field("id").value  # => Properly decoded integer

Schema-required design

General

Unibuf follows the schema-driven architecture of Protocol Buffers, FlatBuffers, and Cap’n Proto. The schema (.proto, .fbs, or .capnp file) defines the message structure and is REQUIRED for binary parsing and serialization.

This design ensures type safety and enables proper deserialization of binary formats.

Why schema is required

The schema defines:

  • Message/struct types and their fields

  • Field types, numbers, and ordinals

  • Field wire types for binary encoding

  • Repeated and optional fields

  • Nested message/struct structures

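Binary Protocol Buffers, FlatBuffers, and Cap’n Proto data cannot be parsed into named, typed fields without a schema. Protocol Buffers binary stores only field numbers and wire types, while FlatBuffers and Cap’n Proto store values at schema-defined offsets, so none of the binary formats carries field names or complete type information.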
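
A small experiment makes this concrete for Protocol Buffers. The sketch below decodes a payload without any schema and obtains only field numbers and raw values; names and types appear once a schema is applied. It is plain Ruby, handles only the varint and length-delimited wire types, and uses a hypothetical hash as a stand-in for a parsed schema.

```ruby
# Hypothetical helpers (not Unibuf's API).

# Read one varint at pos; returns [value, next_pos].
def read_varint(bytes, pos)
  value = 0
  shift = 0
  loop do
    byte = bytes.getbyte(pos)
    raise ArgumentError, "truncated varint" if byte.nil?

    pos += 1
    value |= (byte & 0x7F) << shift
    return [value, pos] if (byte & 0x80).zero?

    shift += 7
  end
end

# Decode top-level fields with no schema: numbers and raw values only.
def decode_raw_fields(bytes)
  fields = []
  pos = 0
  while pos < bytes.bytesize
    key, pos = read_varint(bytes, pos)
    case key & 0x07
    when 0 # varint
      value, pos = read_varint(bytes, pos)
    when 2 # length-delimited
      length, pos = read_varint(bytes, pos)
      value = bytes.byteslice(pos, length)
      pos += length
    else
      raise NotImplementedError, "wire type #{key & 0x07} omitted from this sketch"
    end
    fields << [key >> 3, value]
  end
  fields
end

# Field 1 is the two bytes "Al", field 2 is the varint 150.
payload = "\x0A\x02Al\x10\x96\x01".b
raw = decode_raw_fields(payload)
p raw # field numbers and raw values, no names

# Stand-in for a parsed schema (assumed structure, for illustration only).
schema = { 1 => { name: "name", type: :string }, 2 => { name: "id", type: :int32 } }
named = raw.to_h do |number, value|
  field = schema.fetch(number)
  value = value.dup.force_encoding("UTF-8") if field[:type] == :string
  [field[:name], value]
end
p named
```

Without the schema, the two bytes of field 1 could equally be a string, raw bytes, or an embedded message, and the varint in field 2 could be an int32, a bool, or an enum value.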

Schema-based validation

General

Validate Protocol Buffer messages (text or binary) against their Proto3 schemas.

Validating with schema

# Load schema
schema = Unibuf.parse_schema("schema.proto")  # (1)

# Parse message (text or binary)
message = Unibuf.parse_binary_file("data.binpb", schema: schema)  # (2)

# Validate
validator = Unibuf::Validators::SchemaValidator.new(schema)  # (3)
errors = validator.validate(message, "MessageType")  # (4)

if errors.empty?
  puts "✓ Valid!"  # (5)
else
  errors.each { |e| puts "  - #{e}" }  # (6)
end
  1. Parse the Proto3 schema

  2. Parse binary Protocol Buffer

  3. Create validator with schema

  4. Validate message

  5. Validation passed

  6. Show errors if any
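
The examples in this document use two validation styles: `validate` returns a list of errors, and `validate!` raises. The difference can be sketched with a toy validator over plain hashes. This is not Unibuf's SchemaValidator; the hash-based schema and both method names are assumptions for illustration.

```ruby
# Toy validator (not Unibuf's API): the schema maps field names to Ruby classes.

# Collecting style: return every problem found, or an empty array.
def collect_errors(message, schema)
  errors = []
  message.each do |name, value|
    expected = schema[name]
    if expected.nil?
      errors << "unknown field '#{name}'"
    elsif !value.is_a?(expected)
      errors << "field '#{name}' expected #{expected}, got #{value.class}"
    end
  end
  errors
end

# Raising style: fail fast with all problems in the exception message.
def check!(message, schema)
  errors = collect_errors(message, schema)
  raise ArgumentError, errors.join("; ") unless errors.empty?

  true
end

schema = { "name" => String, "id" => Integer }
p collect_errors({ "name" => "Alice", "id" => 1 }, schema)
p collect_errors({ "id" => "oops", "nickname" => "Al" }, schema)
```

The collecting style suits reports and CLI output where every problem should be listed; the raising style suits pipelines that must stop at the first invalid message.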

Round-trip serialization

General

Unibuf supports complete round-trip serialization for text format, allowing you to parse, modify, and serialize back while preserving semantic equivalence.

Serializing to textproto format

# Parse text format
message = Unibuf.parse_textproto_file("input.txtpb")  # (1)

# Serialize to text format
textproto = message.to_textproto  # (2)

File.write("output.txtpb", textproto)  # (3)

# Verify round-trip
reparsed = Unibuf.parse_textproto(textproto)  # (4)
puts message == reparsed  # => true (5)
  1. Parse the original file

  2. Serialize to text format

  3. Write to file

  4. Parse the serialized output

  5. Verify semantic equivalence

Rich domain models

General

Unibuf provides rich domain models with comprehensive behavior.

Over 60 classes provide extensive functionality following object-oriented principles.

Message model

# Parse message (text or binary)
schema = Unibuf.parse_schema("schema.proto")
message = Unibuf.parse_binary_file("data.binpb", schema: schema)

# Classification (MECE)
message.nested?            # Has nested messages?
message.scalar_only?       # Only scalar fields?
message.maps?              # Contains maps?
message.repeated_fields?   # Has repeated fields?

# Queries
message.find_field("name")         # Find by name
message.find_fields("tags")        # Find all with name
message.field_names                # All field names
message.repeated_field_names       # Repeated field names

# Traversal
message.traverse_depth_first { |field| p field }    # Visit nested fields before siblings
message.traverse_breadth_first { |field| p field }  # Visit each level before going deeper
message.depth  # Maximum nesting depth

# Validation
message.valid?             # Check validity
message.validate!          # Raise if invalid
message.validation_errors  # Get error list
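
The difference between the two traversal orders can be seen on a small nested structure. The sketch below models fields as plain hashes with an optional `:children` array; this representation, the method names, and the way depth is counted are assumptions and may differ from Unibuf's own models.

```ruby
# Illustrative helpers over nested hashes (not Unibuf's API).

# Depth-first: visit a field, then its children, before moving to siblings.
def each_depth_first(fields, &block)
  fields.each do |field|
    block.call(field)
    each_depth_first(field[:children], &block) if field[:children]
  end
end

# Breadth-first: visit every field on one level before going deeper.
def each_breadth_first(fields)
  queue = fields.dup
  until queue.empty?
    field = queue.shift
    yield field
    queue.concat(field[:children]) if field[:children]
  end
end

# Number of nesting levels, counting the top level as 1.
def max_depth(fields)
  return 0 if fields.empty?

  1 + fields.map { |field| max_depth(field[:children] || []) }.max
end

tree = [
  { name: "person", children: [
    { name: "name" },
    { name: "address", children: [{ name: "city" }] },
  ] },
  { name: "id" },
]

depth_first = []
each_depth_first(tree) { |field| depth_first << field[:name] }
breadth_first = []
each_breadth_first(tree) { |field| breadth_first << field[:name] }

p depth_first   # ["person", "name", "address", "city", "id"]
p breadth_first # ["person", "id", "name", "address", "city"]
p max_depth(tree) # 3
```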

Command-line tools

General

Complete CLI toolkit supporting both text and binary Protocol Buffer formats.

Schema is REQUIRED for proper message type identification.

Parse command

# Parse text format
unibuf parse data.txtpb --schema schema.proto --format json

# Parse binary format
unibuf parse data.binpb --schema schema.proto --format json

# Auto-detect format
unibuf parse data.pb --schema schema.proto --format yaml

# Specify message type
unibuf parse data.binpb --schema schema.proto --message-type FamilyProto

Validate command

# Validate text format
unibuf validate data.txtpb --schema schema.proto

# Validate binary format
unibuf validate data.binpb --schema schema.proto

# Specify message type
unibuf validate data.pb --schema schema.proto --message-type MessageType

Convert command

# Binary to JSON
unibuf convert data.binpb --schema schema.proto --to json

# Binary to text
unibuf convert data.binpb --schema schema.proto --to txtpb

# Text to JSON
unibuf convert data.txtpb --schema schema.proto --to json

Schema command

# Inspect schema
unibuf schema schema.proto

# Output as JSON
unibuf schema schema.proto --format json

Architecture

Component hierarchy

Unibuf
├── Parsers
│   ├── Textproto          Text format parser
│   │   ├── Grammar        Parslet grammar
│   │   ├── Processor      AST transformation
│   │   └── Parser         High-level API
│   ├── Proto3             Schema parser
│   │   ├── Grammar        Proto3 grammar
│   │   ├── Processor      Schema builder
│   │   └── Parser         Schema API
│   ├── Binary             Binary Protocol Buffers
│   │   └── WireFormatParser   Wire format decoder
│   ├── Flatbuffers        FlatBuffers parser
│   │   ├── Grammar        FBS grammar
│   │   ├── Processor      Schema builder
│   │   └── BinaryParser   Binary format
│   └── Capnproto          Cap'n Proto parser
│       ├── Grammar        Cap'n Proto grammar
│       ├── Processor      Schema builder
│       ├── SegmentReader  Segment management
│       ├── PointerDecoder Pointer decoding
│       ├── StructReader   Struct reading
│       ├── ListReader     List reading
│       └── BinaryParser   Binary format
├── Serializers
│   ├── BinarySerializer   Protocol Buffers binary
│   ├── Flatbuffers        FlatBuffers binary
│   │   └── BinarySerializer
│   └── Capnproto          Cap'n Proto binary
│       ├── SegmentBuilder Segment allocation
│       ├── PointerEncoder Pointer encoding
│       ├── StructWriter   Struct writing
│       ├── ListWriter     List writing
│       └── BinarySerializer
├── Models
│   ├── Message            Protocol Buffer message
│   ├── Field              Message field
│   ├── Schema             Proto3 schema
│   ├── MessageDefinition  Message type definition
│   ├── FieldDefinition    Field specification
│   ├── EnumDefinition     Enum type definition
│   ├── Flatbuffers        FlatBuffers models (6 classes)
│   ├── Capnproto          Cap'n Proto models (7 classes)
│   └── Values             Value type hierarchy (5 classes)
├── Validators
│   ├── TypeValidator      Type and range validation
│   └── SchemaValidator    Schema-based validation
└── CLI
    └── Commands           parse, validate, convert, schema

Development

Running tests

bundle exec rspec

Code style

bundle exec rubocop -A

Roadmap

Future work

Additional features

  • gRPC support (Protocol Buffers RPC)

  • Cap’n Proto RPC implementation

  • Performance optimizations

  • Additional Protocol Buffer features

  • Schema evolution tools

Contributing

Bug reports and pull requests are welcome at https://github.com/lutaml/unibuf.

Copyright Ribose Inc.

Licensed under the 3-clause BSD License.