0.0
The project is in a healthy, maintained state
A pure Ruby parser for PostgreSQL pgoutput logical replication CopyData payloads.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
 Dependencies
 Project Readme

pgoutput-parser

A high-performance, Ractor-safe PostgreSQL pgoutput logical replication protocol parser written in pure Ruby.

pgoutput-parser parses PostgreSQL logical replication CopyData payloads into immutable protocol message objects. It focuses on the pgoutput wire format: transaction boundaries, relation metadata, DML message structure, tuple payload markers, and raw tuple bytes.

It intentionally does not convert PostgreSQL values into application-specific Ruby objects. That belongs to a higher-level decoder layer, such as a future pgoutput-decoder gem.


Requirements

  • Ruby 4+
  • PostgreSQL 10+

Features

  • Pure Ruby implementation
  • Ruby 4+
  • Ractor-safe parsed messages
  • Immutable protocol message objects
  • PostgreSQL logical replication protocol support
  • Relation metadata tracking
  • Binary-safe tuple parsing
  • RBS type signatures included
  • YARD documentation included
  • No runtime dependencies

The generated documentation also includes a project glossary: docs/glossary.md.


Why Another pgoutput Library?

This gem focuses exclusively on protocol parsing.

It intentionally separates:

  • Protocol parsing (pgoutput-parser)
  • Type decoding (pgoutput-decoder)
  • Replication transport/client management

This keeps the parser small, predictable, dependency-free, and faithful to PostgreSQL's wire format.


Supported MVP Scope

Supports the core non-streaming pgoutput logical replication messages:

  • Begin (B)
  • Message (M)
  • Origin (O)
  • Relation (R)
  • Type (Y)
  • Insert (I)
  • Update (U)
  • Delete (D)
  • Truncate (T)
  • Commit (C)

The currently supported message formats are stable across PostgreSQL 10 through PostgreSQL 18.

TupleData supports all base column markers:

Tuple Value Tag Meaning
n NULL
u Unchanged TOAST value
t Text-formatted raw value
b Binary-formatted raw value

Planned Support

Future releases may add support for:

  • Stream Start (S)
  • Stream Stop (E)
  • Stream Commit (c)
  • Stream Abort (A)
  • Two-Phase Commit messages

What This Gem Does

PostgreSQL CopyData payload
           │
           ▼
    pgoutput-parser
           │
           ▼
Immutable protocol messages

The parser understands:

  • Message tags and binary field sizes
  • Transaction begin metadata
  • Transaction commit metadata
  • Relation metadata
  • Column flags
  • Column names
  • PostgreSQL type OIDs
  • PostgreSQL type modifiers
  • Insert tuples
  • Update old-key tuples
  • Update old full tuples
  • Update new tuples
  • Delete old-key tuples
  • Delete old full tuples
  • Tuple value markers (n, u, t, b)

What This Gem Does Not Do

The parser does not perform application-level type decoding.

It does not convert:

  • UUID
  • JSONB
  • Timestamp
  • Numeric
  • Array
  • Range
  • PostGIS
  • Custom PostgreSQL types

Example:

value.raw
# => "2026-05-31 12:34:56+00"

The raw value is preserved exactly as received.

A higher-level decoder may later interpret it.


Non-goals

This project intentionally does not:

  • Manage replication slots
  • Open replication connections
  • Maintain WAL positions
  • Reconnect to PostgreSQL
  • Decode PostgreSQL types
  • Integrate with ActiveRecord
  • Publish events
  • Build CDC pipelines

Its sole responsibility is parsing pgoutput protocol messages.


Installation

Add this line to your Gemfile:

gem "pgoutput-parser"

Then run:

bundle install

Require the library:

require "pgoutput"

Quick Start

require "pgoutput"

stream = Pgoutput::RelationTracker.new

stream.process(relation_payload)

insert = stream.process(insert_payload)

insert.relation_id
# => 42

insert.tuple.first.raw
# => "7"

insert.tuple.first.oid
# => 23

Binary Tuple Values

When PostgreSQL publishes tuple values using binary format (b), the parser preserves the raw bytes exactly as received.

value.raw
# => "\x00\x00\x00\x07".b

The parser does not interpret binary values.


Update Messages

PostgreSQL Update messages may contain:

  • No old tuple
  • An old key tuple (K)
  • An old full tuple (O)

They always contain a new tuple (N).

update = stream.process(update_payload)

update.old_key_tuple
# => [Pgoutput::Messages::TupleValue, ...] or nil

update.old_tuple
# => [Pgoutput::Messages::TupleValue, ...] or nil

update.new_tuple
# => [Pgoutput::Messages::TupleValue, ...]

Update Tuple Example

update = stream.process(update_payload)

update.old_key_tuple
update.old_tuple
update.new_tuple

Delete Messages

PostgreSQL Delete messages contain either:

  • An old key tuple (K)
  • An old full tuple (O)
delete = stream.process(delete_payload)

delete.old_key_tuple
# => [Pgoutput::Messages::TupleValue, ...] or nil

delete.old_tuple
# => [Pgoutput::Messages::TupleValue, ...] or nil

Relation Metadata Tracking

RelationTracker keeps a local relation cache so tuple values can be associated with PostgreSQL column OIDs defined by preceding Relation (R) messages.

The tracker accepts an optional relation_cache: argument. The default is a plain Hash, but callers can inject Ratomic::Map for a Ractor-safe cache in experimental or parallel setups.

For a deeper guide, including stream-order behavior, tuple arity validation, and Ratomic::Map usage, see docs/relation_tracker.md.

No type conversion is performed.

Only protocol metadata is attached.

stream.process(relation_payload)

message = stream.process(update_payload)

message.new_tuple.map(&:oid)
# => [23, 25, 16]

The relation tracker itself is stateful and maintains relation metadata encountered in the replication stream.

If a DML tuple's value count does not match the cached relation column count, RelationTracker raises Pgoutput::TupleArityError. This keeps malformed payloads or mismatched stream state from being silently annotated with incomplete column metadata.


Ractor Safety

message = stream.process(update_payload)

Ractor.shareable?(message)
# => true

Passing parsed messages to a Ractor:

message = stream.process(update_payload)

result = Ractor.new(message) do |update|
  update.new_tuple.map(&:raw)
end.take

Architecture

PostgreSQL
      │
      ▼
CopyData payload
      │
      ▼
Pgoutput::BinaryParser
      │
      ▼
Parsed protocol message
      │
      ▼
Pgoutput::RelationTracker
      │
      ▼
Protocol message with relation metadata
      │
      ▼
Ractor-safe protocol message

Public API

Pgoutput::BinaryParser

Parses a single pgoutput payload without stream state.

message = Pgoutput::BinaryParser.new(payload).parse

Pgoutput::RelationTracker

Parses messages in stream order and remembers relation metadata.

stream = Pgoutput::RelationTracker.new

stream.process(relation_payload)

message = stream.process(insert_payload)

Optional Usage

RelationTracker is optional.

If relation metadata tracking is not required, payloads can be parsed directly:

message =
  Pgoutput::BinaryParser
    .new(payload)
    .parse

RelationTracker Lifecycle

A RelationTracker should be created per logical replication stream.

stream = Pgoutput::RelationTracker.new

The tracker maintains relation metadata discovered during the stream and therefore should not be reused across unrelated replication sessions.


Message Classes

Pgoutput::Messages::Begin
Pgoutput::Messages::Message
Pgoutput::Messages::Origin
Pgoutput::Messages::Relation
Pgoutput::Messages::Type
Pgoutput::Messages::Column
Pgoutput::Messages::TupleValue
Pgoutput::Messages::Insert
Pgoutput::Messages::Update
Pgoutput::Messages::Delete
Pgoutput::Messages::Truncate
Pgoutput::Messages::Commit

Type Signatures

RBS signatures are included:

sig/pgoutput.rbs

Run Steep:

bundle exec steep check

Testing

Run all tests:

bundle exec rake test

Run with coverage:

COVERAGE=true bundle exec rake test

Benchmarking

Run the parser throughput benchmark:

ruby benchmark/parser_throughput.rb

The benchmark reports single-process parser throughput, relation-tracker throughput, and Ractor-parallel throughput. It is intended to show both the single-thread baseline and the Ruby 4 Ractor path this parser is designed to support. Relation-tracker scenarios can also compare the default Hash relation cache with an optional Ratomic::Map cache.

Tune the run with environment variables:

Variable Default Description
PGOUTPUT_BENCH_ITERATIONS 100000 Iterations per selected scenario.
PGOUTPUT_BENCH_WARMUP 1000 Warmup iterations before timing.
PGOUTPUT_BENCH_RACTORS 2 or CPU count, whichever is lower Number of Ractor workers for Ractor scenarios.
PGOUTPUT_BENCH_SCENARIOS all Comma-separated scenarios: binary, tracker_dml, tracker_mixed, ractor_binary, ractor_tracker, or all.
PGOUTPUT_BENCH_RELATION_CACHE hash Comma-separated relation-cache backends for tracker scenarios: hash, ratomic, or all.

Examples:

PGOUTPUT_BENCH_ITERATIONS=10000 ruby benchmark/parser_throughput.rb
PGOUTPUT_BENCH_SCENARIOS=binary,tracker_mixed ruby benchmark/parser_throughput.rb
PGOUTPUT_BENCH_RACTORS=4 PGOUTPUT_BENCH_SCENARIOS=ractor_binary,ractor_tracker ruby benchmark/parser_throughput.rb
PGOUTPUT_BENCH_RELATION_CACHE=all PGOUTPUT_BENCH_SCENARIOS=tracker_mixed,ractor_tracker ruby benchmark/parser_throughput.rb

Sample Ruby 4 output:

pgoutput-parser throughput
iterations=1000 warmup=10 ractors=2 scenarios=tracker_mixed,ractor_tracker relation_cache=hash,ratomic ruby=4.0.5
RelationTracker hash               7000 messages in   0.163s        42891 msg/s
RelationTracker ratomic            7000 messages in   0.131s        53579 msg/s
Ractor RelationTracker hash       14000 messages in   0.197s        71097 msg/s
Ractor RelationTracker ratomic      14000 messages in   0.146s        96190 msg/s

Interpret the Ractor rows as aggregate throughput across workers. They are not a replacement for the single-process rows; they demonstrate the parser's shareable-message design under parallel execution.


Development

Generate YARD documentation:

bundle exec yard doc

Ecosystem Direction

This gem is the protocol layer.

pgoutput-parser
      │
      ▼
Protocol messages
      │
      ▼
pgoutput-decoder
      │
      ▼
Application objects

pgoutput-parser should remain small, dependency-free, binary-safe, and faithful to PostgreSQL's wire format.


License

MIT.