Unibuf: Universal Buffer Format Parser
Purpose
Unibuf is a pure Ruby gem for parsing and manipulating multiple serialization formats including Protocol Buffers, FlatBuffers, and Cap’n Proto.
It provides fully object-oriented, specification-compliant parsers with rich domain models, comprehensive schema validation, binary format encoding/decoding, and complete round-trip serialization support.
Key features:
- Protocol Buffers
  - Parse text format (.txtpb, .textproto)
  - Parse binary format (.binpb) with schema
  - Serialize to binary format (.binpb)
  - Parse Proto3 schemas (.proto)
  - Wire format encoding/decoding (varint, zigzag, all wire types)
- FlatBuffers
  - Parse schemas (.fbs)
  - Parse binary format (.fb)
  - Serialize to binary format (.fb)
- Cap’n Proto
  - Parse schemas (.capnp)
  - Parse binary format with segment management
  - Serialize to binary format with pointer encoding
  - Support for structs, enums, interfaces (RPC)
  - Generic types (List<T>)
  - Unions and annotations
- Serialization and validation
  - Complete round-trip serialization for all formats
  - Schema-driven validation and deserialization
- Developer usage
  - Rich domain models with 60+ behavioral classes
  - Complete CLI toolkit for all formats
  - Pure Ruby - no C/C++ dependencies
Installation
Add this line to your application’s Gemfile:
gem "unibuf"
And then execute:
bundle install
Or install it yourself as:
gem install unibuf
Features
- Protocol Buffers support
- FlatBuffers support
- Cap’n Proto support
- Schema-required design
- Parsing text format
- Parsing binary format
- Schema-based validation
- Wire format support
- Round-trip serialization
- Rich domain models
- Command-line tools
Protocol Buffers
General
Full support for Protocol Buffers (protobuf) including text format parsing, binary format parsing/serialization, and Proto3 schema parsing.
See PROTOBUF.adoc for detailed documentation.
Parsing Protocol Buffers text format
require "unibuf"
# Load schema (recommended for validation)
schema = Unibuf.parse_schema("schema.proto") # (1)
# Parse text format file
message = Unibuf.parse_textproto_file("data.txtpb") # (2)
# Validate against schema
validator = Unibuf::Validators::SchemaValidator.new(schema) # (3)
validator.validate!(message, "MessageType") # (4)
(1) Load Proto3 schema from .proto file
(2) Parse Protocol Buffers text format
(3) Create validator with schema
(4) Validate message against schema
Parsing Protocol Buffers binary format
require "unibuf"
# 1. Load schema (REQUIRED for binary)
schema = Unibuf.parse_schema("schema.proto") # (1)
# 2. Parse binary Protocol Buffer file
message = Unibuf.parse_binary_file("data.binpb", schema: schema) # (2)
# 3. Access fields normally
puts message.find_field("name").value # (3)
(1) Schema is mandatory for binary parsing
(2) Parse binary file with schema
(3) Access fields like text format
FlatBuffers
General
Complete support for Google FlatBuffers including schema parsing (.fbs files)
and binary format parsing/serialization.
See FLATBUFFERS.adoc for detailed documentation.
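To make the binary layout concrete, here is a minimal pure-Ruby sketch of how a FlatBuffers reader locates the root table and looks up a field through the vtable. It follows the publicly documented FlatBuffers layout (a little-endian uoffset to the root table, then a signed offset from the table back to its vtable). It does not use Unibuf; the helper names `read_root_table` and `read_int32_field` and the hand-built buffer are illustrative assumptions, not part of the gem's API.

```ruby
# Locate the root table and its vtable slots in a FlatBuffers binary.
# Hypothetical helper for illustration; not Unibuf's implementation.
def read_root_table(binary)
  # The buffer starts with a 32-bit little-endian offset to the root table.
  root_pos = binary.unpack1("L<")
  # The table starts with a signed 32-bit offset that is SUBTRACTED
  # from the table position to find the vtable.
  vtable_pos = root_pos - binary.byteslice(root_pos, 4).unpack1("l<")
  # The vtable header is two uint16 values: vtable size and table size.
  vtable_size, _table_size = binary.byteslice(vtable_pos, 4).unpack("S<2")
  # The remaining uint16 entries are per-field offsets from the table start.
  slots = binary.byteslice(vtable_pos + 4, vtable_size - 4).unpack("S<*")
  { root_pos: root_pos, slots: slots }
end

# Read an int32 field by slot index. A missing slot or a zero offset
# means the field is absent and the schema default applies (nil here).
def read_int32_field(binary, slot_index)
  table = read_root_table(binary)
  field_offset = table[:slots][slot_index]
  return nil if field_offset.nil? || field_offset.zero?

  binary.byteslice(table[:root_pos] + field_offset, 4).unpack1("l<")
end

# Hand-built 20-byte buffer: root offset, vtable, padding, table.
buffer = [12].pack("L<") +       # root table starts at byte 12
         [6, 8, 4].pack("S<3") + # vtable: size 6, table size 8, field 0 at +4
         "\x00\x00".b +          # padding to 4-byte alignment
         [8].pack("l<") +        # table: vtable is 8 bytes before the table
         [300].pack("l<")        # field 0 value

p read_int32_field(buffer, 0) # => 300
p read_int32_field(buffer, 1) # => nil
```

The vtable indirection is why a schema is needed: the binary only says "slot 0 lives at offset 4", and the schema supplies the field's name and type.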
Parsing FlatBuffers schema
require "unibuf"
# Parse FlatBuffers schema
schema = Unibuf.parse_flatbuffers_schema("schema.fbs") # (1)
# Access schema structure
table = schema.find_table("Monster") # (2)
table.fields.each { |f| puts "#{f.name}: #{f.type}" } # (3)
(1) Parse .fbs schema file
(2) Find table definition
(3) Iterate through fields
Parsing FlatBuffers binary format
# Parse binary FlatBuffer
data = Unibuf.parse_flatbuffers_binary(binary_data, schema: schema) # (1)
# Access data
puts data["name"] # (2)
puts data["hp"] # (3)
(1) Parse binary with schema
(2) Access string field
(3) Access numeric field
Cap’n Proto
General
Complete support for Cap’n Proto including schema parsing (.capnp files) and
binary format parsing/serialization with segment management and pointer
encoding.
See CAPNPROTO.adoc for detailed documentation.
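As an illustration of what segment management involves, the sketch below splits a Cap’n Proto message into its segments using the standard stream framing from the Cap’n Proto encoding specification: a 32-bit segment count minus one, one 32-bit size in 8-byte words per segment, and padding to a word boundary. It is plain Ruby and does not call Unibuf; `split_segments` is a hypothetical helper, and this README does not confirm that the gem's SegmentReader is implemented this way.

```ruby
# Split a framed Cap'n Proto message into its segments.
# Hypothetical helper for illustration; not Unibuf's implementation.
def split_segments(binary)
  word_size = 8 # Cap'n Proto measures sizes in 8-byte words

  # First 4 bytes: number of segments minus one (little-endian uint32).
  segment_count = binary.unpack1("L<") + 1
  # Next: one little-endian uint32 per segment, giving its size in words.
  sizes_in_words = binary.byteslice(4, 4 * segment_count).unpack("L<*")

  # The header is padded with 4 bytes when it does not end on a word boundary.
  header_size = 4 + 4 * segment_count
  header_size += 4 unless (header_size % word_size).zero?

  offset = header_size
  sizes_in_words.map do |words|
    segment = binary.byteslice(offset, words * word_size)
    offset += words * word_size
    segment
  end
end

# One segment of 2 words: 8-byte header, no padding.
one = [0, 2].pack("L<2") + ("A" * 16)
p split_segments(one).map(&:bytesize) # => [16]

# Two segments (1 word and 2 words): 12-byte header plus 4 bytes of padding.
two = [1, 1, 2].pack("L<3") + ("\x00" * 4) + ("a" * 8) + ("b" * 16)
p split_segments(two).map(&:bytesize) # => [8, 16]
```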
Parsing Cap’n Proto schema
require "unibuf"
# Parse Cap'n Proto schema
schema = Unibuf.parse_capnproto_schema("addressbook.capnp") # (1)
# Access schema structure
person = schema.find_struct("Person") # (2)
person.fields.each { |f| puts "#{f.name} @#{f.ordinal} :#{f.type}" } # (3)
# Access interfaces (RPC)
calc = schema.find_interface("Calculator") # (4)
calc.methods.each { |m| puts "#{m.name} @#{m.ordinal}" } # (5)
(1) Parse .capnp schema file
(2) Find struct definition
(3) Iterate through fields with ordinals
(4) Find interface definition (RPC)
(5) List RPC methods
Parsing Cap’n Proto binary format
# Parse binary Cap'n Proto data
parser = Unibuf::Parsers::Capnproto::BinaryParser.new(schema) # (1)
data = parser.parse(binary_data, root_type: "Person") # (2)
# Access data
puts data[:name] # (3)
puts data[:email] # (4)
(1) Create parser with schema
(2) Parse binary with root type
(3) Access text field
(4) Access another field
Serializing Cap’n Proto binary format
# Serialize to binary
serializer = Unibuf::Serializers::Capnproto::BinarySerializer.new(schema) # (1)
binary = serializer.serialize(
  { id: 1, name: "Alice", email: "alice@example.com" }, # (2)
  root_type: "Person" # (3)
)
# Write to file
File.binwrite("output.capnp.bin", binary) # (4)
(1) Create serializer with schema
(2) Provide data as hash
(3) Specify root struct type
(4) Write binary output
Protocol Buffers text format
General
Parse human-readable Protocol Buffer text format files following the official specification.
See TXTPROTO.adoc for detailed documentation.
Parsing text format
require "unibuf"
# Load schema (recommended for validation)
schema = Unibuf.parse_schema("schema.proto") # (1)
# Parse text format file
message = Unibuf.parse_textproto_file("data.txtpb") # (2)
# Validate against schema
validator = Unibuf::Validators::SchemaValidator.new(schema) # (3)
validator.validate!(message, "MessageType") # (4)
(1) Load Proto3 schema from .proto file
(2) Parse Protocol Buffers text format
(3) Create validator with schema
(4) Validate message against schema
Parsing Protocol Buffers binary format
General
Parse binary Protocol Buffer data using wire format decoding with schema-driven deserialization.
The schema is REQUIRED for binary parsing because binary format only stores field numbers, not names or types.
Parsing binary format
require "unibuf"
# 1. Load schema (REQUIRED for binary)
schema = Unibuf.parse_schema("schema.proto") # (1)
# 2. Parse binary Protocol Buffer file
message = Unibuf.parse_binary_file("data.binpb", schema: schema) # (2)
# 3. Access fields normally
puts message.find_field("name").value # (3)
(1) Schema is mandatory for binary parsing
(2) Parse binary file with schema
(3) Access fields like text format
Binary format from string
# Read binary data
binary_data = File.binread("data.binpb")
# Parse with schema
schema = Unibuf.parse_schema("schema.proto")
message = Unibuf.parse_binary(binary_data, schema: schema)
Supported wire types
The binary parser supports all Protocol Buffer wire types:
Varint (Type 0)
  Variable-length integers: int32, int64, uint32, uint64, sint32, sint64, bool, enum
64-bit (Type 1)
  Fixed 8-byte values: fixed64, sfixed64, double
Length-delimited (Type 2)
  Variable-length data: string, bytes, embedded messages, packed repeated fields
32-bit (Type 5)
  Fixed 4-byte values: fixed32, sfixed32, float
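The wire type of each field is carried in its tag, which packs the field number and wire type as `(field_number << 3) | wire_type`. The sketch below shows that split, plus decoding of the fixed-width payloads with Ruby's built-in `String#unpack1`. It is standalone Ruby for illustration; `split_tag` and `decode_fixed` are hypothetical helpers, not Unibuf methods.

```ruby
# Split a decoded tag value into [field_number, wire_type_name].
# Hypothetical helper for illustration; not part of Unibuf's API.
def split_tag(tag)
  wire_type_names = {
    0 => :varint,
    1 => :fixed64,
    2 => :length_delimited,
    5 => :fixed32,
  }
  field_number = tag >> 3 # upper bits hold the field number
  wire_type = tag & 0x07  # low three bits hold the wire type
  [field_number, wire_type_names.fetch(wire_type)]
end

# Decode a fixed-width payload. Protocol Buffers stores these little-endian.
def decode_fixed(bytes, type)
  directive = {
    fixed32: "L<", sfixed32: "l<", float: "e",  # 4-byte values (wire type 5)
    fixed64: "Q<", sfixed64: "q<", double: "E", # 8-byte values (wire type 1)
  }.fetch(type)
  bytes.unpack1(directive)
end

p split_tag(0x08) # => [1, :varint]
p split_tag(0x12) # => [2, :length_delimited]
p split_tag(0x2D) # => [5, :fixed32]

p decode_fixed("\xFF\xFF\xFF\xFF".b, :fixed32)  # => 4294967295
p decode_fixed("\xFF\xFF\xFF\xFF".b, :sfixed32) # => -1
p decode_fixed([1.0].pack("e"), :float)         # => 1.0
```

The same four bytes decode to different values depending on the declared type, which is the information only the schema can provide.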
Protocol Buffers wire format
General
Unibuf implements complete Protocol Buffers wire format decoding according to the official specification.
Wire format features
Varint decoding
  Efficiently decode variable-length integers used for most numeric types
ZigZag encoding
  Proper handling of signed integers (sint32, sint64) with zigzag decoding
Fixed-width types
  Decode 32-bit and 64-bit fixed-width values (fixed32, fixed64, float, double)
Length-delimited
  Parse strings, bytes, and embedded messages with length prefixes
Schema-driven
  Use schema to determine field types and deserialize correctly
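For readers unfamiliar with the encoding, varint and zigzag decoding can be sketched in a few lines of plain Ruby. This follows the Protocol Buffers encoding rules (7 payload bits per byte, least-significant group first, high bit as a continuation flag). It is not Unibuf's code, and `decode_varint` and `zigzag_decode` are hypothetical names.

```ruby
# Decode one base-128 varint from an array of byte values.
# Returns [value, next_offset]. Raises IndexError on truncated input.
def decode_varint(bytes, offset = 0)
  value = 0
  shift = 0
  loop do
    byte = bytes.fetch(offset)
    offset += 1
    value |= (byte & 0x7F) << shift # low 7 bits carry the payload
    break if (byte & 0x80).zero?    # high bit clear marks the last byte

    shift += 7
  end
  [value, offset]
end

# Map a zigzag-encoded unsigned integer back to a signed one
# (0 => 0, 1 => -1, 2 => 1, 3 => -2, ...), as used by sint32/sint64.
def zigzag_decode(n)
  (n >> 1) ^ -(n & 1)
end

p decode_varint([0x96, 0x01]) # => [150, 2]
p decode_varint([0x01])       # => [1, 1]
p zigzag_decode(1)            # => -1
p zigzag_decode(2)            # => 1
p zigzag_decode(3)            # => -2
```

Zigzag matters because a plain varint encodes small negative numbers as ten bytes, while the zigzag mapping keeps values close to zero short regardless of sign.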
Example wire format parsing
# Schema defines the structure
schema = Unibuf.parse_schema("schema.proto")
# Binary data uses wire format encoding
binary_data = File.binread("data.binpb")
# Parser uses schema to decode wire format
message = Unibuf.parse_binary(binary_data, schema: schema)
# Access decoded fields
message.field_names # => ["name", "id", "enabled"]
message.find_field("id").value # => Properly decoded integer
Schema-required design
General
Unibuf follows the schema-driven architecture of Protocol Buffers, FlatBuffers, and Cap’n Proto.
The schema (.proto, .fbs, or .capnp file) defines the message structure and is
REQUIRED for binary parsing and serialization.
This design ensures type safety and enables proper deserialization of binary formats.
Why schema is required
The schema defines:
- Message/struct types and their fields
- Field types, numbers, and ordinals
- Field wire types for binary encoding
- Repeated and optional fields
- Nested message/struct structures
Binary Protocol Buffers, FlatBuffers, and Cap’n Proto cannot be parsed without a schema because the binary formats only store field identifiers, not field names or complete type information.
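The point can be demonstrated with a small schema-less scan of Protocol Buffers bytes: without a schema, all that can be recovered is a list of field numbers, wire types, and raw payloads. The sketch below is standalone Ruby, not Unibuf's parser; `read_varint`, `scan_fields`, and the `field_names` mapping are hypothetical stand-ins for what a real schema would supply.

```ruby
require "stringio"

# Read one base-128 varint from an IO-like object.
def read_varint(io)
  value = 0
  shift = 0
  loop do
    byte = io.readbyte
    value |= (byte & 0x7F) << shift
    return value if byte < 0x80 # high bit clear marks the last byte

    shift += 7
  end
end

# Scan top-level fields without a schema.
# Returns [field_number, wire_type, raw_payload] triples.
def scan_fields(binary)
  io = StringIO.new(binary)
  fields = []
  until io.eof?
    tag = read_varint(io)
    wire_type = tag & 0x07
    payload =
      case wire_type
      when 0 then read_varint(io)          # varint
      when 1 then io.read(8)               # 64-bit
      when 2 then io.read(read_varint(io)) # length-delimited
      when 5 then io.read(4)               # 32-bit
      else raise ArgumentError, "unsupported wire type #{wire_type}"
      end
    fields << [tag >> 3, wire_type, payload]
  end
  fields
end

# Field 1 = varint 150, field 2 = the two bytes "hi".
binary = "\x08\x96\x01\x12\x02hi".b
raw = scan_fields(binary)
p raw # => [[1, 0, 150], [2, 2, "hi"]]

# Only a schema can say that field 1 is "id" and field 2 is a string "name"
# (rather than bytes or an embedded message). This hash is a hypothetical
# stand-in for that schema knowledge.
field_names = { 1 => "id", 2 => "name" }
named = raw.to_h { |number, _wire_type, payload| [field_names.fetch(number), payload] }
p named # => {"id"=>150, "name"=>"hi"}
```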
Schema-based validation
General
Validate Protocol Buffer messages (text or binary) against their Proto3 schemas.
Validating with schema
# Load schema
schema = Unibuf.parse_schema("schema.proto") # (1)
# Parse message (text or binary)
message = Unibuf.parse_binary_file("data.binpb", schema: schema) # (2)
# Validate
validator = Unibuf::Validators::SchemaValidator.new(schema) # (3)
errors = validator.validate(message, "MessageType") # (4)
if errors.empty?
  puts "✓ Valid!" # (5)
else
  errors.each { |e| puts " - #{e}" } # (6)
end
(1) Parse the Proto3 schema
(2) Parse binary Protocol Buffer
(3) Create validator with schema
(4) Validate message
(5) Validation passed
(6) Show errors if any
Round-trip serialization
General
Unibuf supports complete round-trip serialization for text format, allowing you to parse, modify, and serialize back while preserving semantic equivalence.
Serializing to textproto format
# Parse (text or binary)
message = Unibuf.parse_textproto_file("input.txtpb") # (1)
# Serialize to text format
textproto = message.to_textproto # (2)
File.write("output.txtpb", textproto) # (3)
# Verify round-trip
reparsed = Unibuf.parse_textproto(textproto) # (4)
puts message == reparsed # => true (5)
(1) Parse the original file
(2) Serialize to text format
(3) Write to file
(4) Parse the serialized output
(5) Verify semantic equivalence
Rich domain models
General
Unibuf provides rich domain models with comprehensive behavior.
Over 60 classes provide extensive functionality following object-oriented principles.
Message model
# Parse message (text or binary)
schema = Unibuf.parse_schema("schema.proto")
message = Unibuf.parse_binary_file("data.binpb", schema: schema)
# Classification (MECE)
message.nested? # Has nested messages?
message.scalar_only? # Only scalar fields?
message.maps? # Contains maps?
message.repeated_fields? # Has repeated fields?
# Queries
message.find_field("name") # Find by name
message.find_fields("tags") # Find all with name
message.field_names # All field names
message.repeated_field_names # Repeated field names
# Traversal
message.traverse_depth_first { |field| ... }
message.traverse_breadth_first { |field| ... }
message.depth # Maximum nesting depth
# Validation
message.valid? # Check validity
message.validate! # Raise if invalid
message.validation_errors # Get error list
Command-line tools
General
Complete CLI toolkit supporting both text and binary Protocol Buffer formats.
Schema is REQUIRED for proper message type identification.
Parse command
# Parse text format
unibuf parse data.txtpb --schema schema.proto --format json
# Parse binary format
unibuf parse data.binpb --schema schema.proto --format json
# Auto-detect format
unibuf parse data.pb --schema schema.proto --format yaml
# Specify message type
unibuf parse data.binpb --schema schema.proto --message-type FamilyProto
Validate command
# Validate text format
unibuf validate data.txtpb --schema schema.proto
# Validate binary format
unibuf validate data.binpb --schema schema.proto
# Specify message type
unibuf validate data.pb --schema schema.proto --message-type MessageType
Convert command
# Binary to JSON
unibuf convert data.binpb --schema schema.proto --to json
# Binary to text
unibuf convert data.binpb --schema schema.proto --to txtpb
# Text to JSON
unibuf convert data.txtpb --schema schema.proto --to json
Schema command
# Inspect schema
unibuf schema schema.proto
# Output as JSON
unibuf schema schema.proto --format json
Architecture
Component hierarchy
Unibuf
├── Parsers
│   ├── Textproto               Text format parser
│   │   ├── Grammar             Parslet grammar
│   │   ├── Processor           AST transformation
│   │   └── Parser              High-level API
│   ├── Proto3                  Schema parser
│   │   ├── Grammar             Proto3 grammar
│   │   ├── Processor           Schema builder
│   │   └── Parser              Schema API
│   ├── Binary                  Binary Protocol Buffers
│   │   └── WireFormatParser    Wire format decoder
│   ├── Flatbuffers             FlatBuffers parser
│   │   ├── Grammar             FBS grammar
│   │   ├── Processor           Schema builder
│   │   └── BinaryParser        Binary format
│   └── Capnproto               Cap'n Proto parser
│       ├── Grammar             Cap'n Proto grammar
│       ├── Processor           Schema builder
│       ├── SegmentReader       Segment management
│       ├── PointerDecoder      Pointer decoding
│       ├── StructReader        Struct reading
│       ├── ListReader          List reading
│       └── BinaryParser        Binary format
├── Serializers
│   ├── BinarySerializer        Protocol Buffers binary
│   ├── Flatbuffers             FlatBuffers binary
│   │   └── BinarySerializer
│   └── Capnproto               Cap'n Proto binary
│       ├── SegmentBuilder      Segment allocation
│       ├── PointerEncoder      Pointer encoding
│       ├── StructWriter        Struct writing
│       ├── ListWriter          List writing
│       └── BinarySerializer
├── Models
│   ├── Message                 Protocol Buffer message
│   ├── Field                   Message field
│   ├── Schema                  Proto3 schema
│   ├── MessageDefinition       Message type definition
│   ├── FieldDefinition         Field specification
│   ├── EnumDefinition          Enum type definition
│   ├── Flatbuffers             FlatBuffers models (6 classes)
│   ├── Capnproto               Cap'n Proto models (7 classes)
│   └── Values                  Value type hierarchy (5 classes)
├── Validators
│   ├── TypeValidator           Type and range validation
│   └── SchemaValidator         Schema-based validation
└── CLI
    └── Commands                parse, validate, convert, schema
Development
Running tests
bundle exec rspec
Code style
bundle exec rubocop -A
Roadmap
Future work
Additional features
- gRPC support (Protocol Buffers RPC)
- Cap’n Proto RPC implementation
- Performance optimizations
- Additional Protocol Buffer features
- Schema evolution tools
Contributing
Bug reports and pull requests are welcome at https://github.com/lutaml/unibuf.
Copyright and license
Copyright Ribose Inc.
Licensed under the 3-clause BSD License.