Toon Format 🖼️📦

A Ruby gem implementing TOON (Token-Oriented Object Notation) – the compact, human-readable serialization format that slashes LLM token usage by 30-60% vs JSON while staying lossless.

Perfect for API responses, database exports, and LLM prompts!

💡 Inspired by: This gem is based on the TOON format specification and provides a complete Ruby implementation.

🚀 Why TOON Format?

graph LR
  JSON[JSON: 100% tokens] -->|30-60% savings| TOON[TOON: 40-70% tokens]
  TOON -->|lossless| JSON
  subgraph LLM
    Prompt[Your LLM Prompt]
  end
  TOON -.->|Cheaper/Faster| Prompt

Key Wins:

🏆 Token Reduction: 30-60% fewer tokens for LLM contexts
🔄 Bidirectional: encode/decode with 100% round-trip fidelity
📊 Smart Tabular Arrays: Auto-optimizes uniform data (e.g., DB records)
🛡️ Secure by Design: Depth limits, circular reference detection, no eval
⚡ Fast: Up to 2-3x JSON speed on tabular data, similar on nested structures (see Benchmarks)
🎛️ CLI + Rails: Ready for production

📦 Installation

Requirements:

Ruby 3.0 or higher
Tested on Ruby 3.0, 3.1, 3.2, 3.3, 3.4

Add to your Gemfile:

gem 'toon-format'

Then install:

bundle install

Or install directly:

gem install toon-format

⚡ Quick Start

require 'toon_format'

# Encode
data = { name: 'Alice', age: 30 }
toon = ToonFormat.encode(data)
# => "name: Alice\nage: 30"

# Decode
original = ToonFormat.decode(toon)
# => {:name=>"Alice", :age=>30}

# Tabular magic ✨
users = [{id:1, name:'Alice'}, {id:2, name:'Bob'}]
ToonFormat.encode(users)
# => "[2,]{id,name}:\n1,Alice\n2,Bob"

🛠️ How It Works: Encoding Flow

flowchart TD
    Data[Ruby Data] --> Type{Check Type}
    Type -->|Primitive| Prim["null/true/false/num/str"]
    Type -->|Hash| Obj["key: value\n..."]
    Type -->|Array| Tab{Uniform?<br/>All Hashes +<br/>Primitive Values?}
    Tab -->|Yes| Table["[N,]{id,name,...}:\nrow1\nrow2"]
    Tab -->|No| List["[N]:\n  item1\n  item2"]
    Prim --> Output[TOON String]
    Obj --> Output
    Table --> Output
    List --> Output
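
The flow above can be approximated in a few lines of plain Ruby. This is a minimal sketch, not the gem's actual encoder: the `ToonSketch` module and all of its method names are hypothetical, it handles only flat hashes and arrays of primitives or flat hashes, and it does no quoting or escaping. The output shapes follow the Quick Start examples.

```ruby
# Illustrative sketch only: mirrors the flowchart, not the gem's real encoder.
# ToonSketch and its methods are hypothetical names.
module ToonSketch
  PRIMITIVES = [NilClass, TrueClass, FalseClass, Numeric, String, Symbol].freeze

  module_function

  def primitive?(value)
    PRIMITIVES.any? { |klass| value.is_a?(klass) }
  end

  # "Uniform" means: every element is a Hash, all hashes share the same
  # keys in the same order, and every value is a primitive.
  def uniform?(array)
    return false if array.empty? || !array.all?(Hash)

    keys = array.first.keys
    array.all? { |row| row.keys == keys && row.values.all? { |v| primitive?(v) } }
  end

  def encode_primitive(value)
    value.nil? ? 'null' : value.to_s
  end

  # Tabular form: one header with the row count and field names, then one
  # delimiter-separated line per row.
  def encode_table(rows)
    header = "[#{rows.size},]{#{rows.first.keys.join(',')}}:"
    body = rows.map { |row| row.values.map { |v| encode_primitive(v) }.join(',') }
    [header, *body].join("\n")
  end

  # List form: a length header followed by one indented item per line.
  def encode_list(items)
    ["[#{items.size}]:", *items.map { |item| "  #{encode_primitive(item)}" }].join("\n")
  end

  def encode(value)
    case value
    when Hash  then value.map { |k, v| "#{k}: #{encode_primitive(v)}" }.join("\n")
    when Array then uniform?(value) ? encode_table(value) : encode_list(value)
    else encode_primitive(value)
    end
  end
end

puts ToonSketch.encode([{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }])
```

The uniformity check is what decides between the compact table and the more verbose list, so a single row with a nested value or a different key set falls back to the list form.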

🏗️ Architecture

graph TB
    subgraph "Public API"
        Main[lib/toon_format.rb<br/>encode/decode/estimate_savings]
    end
    subgraph "Core"
        Enc[encoder.rb]
        Dec[decoder.rb]
        Pars[parser.rb]
        Val[validator.rb]
        Err[errors.rb]
    end
    subgraph "Integrations"
        Rails[rails/extensions.rb<br/>ActiveRecord#to_toon]
        CLI[exe/toon-format]
    end
    Main --> Enc
    Main --> Dec
    Dec --> Pars
    Dec --> Val
    Main -.-> Rails
    Main -.-> CLI

✨ Advanced Usage

Token Savings Estimator

stats = ToonFormat.estimate_savings(data)
# => {json_tokens: 1234, toon_tokens: 789, savings_percent: 36.1}
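
To make the `savings_percent` figure concrete, here is a rough sketch of the arithmetic behind it. Both helpers are hypothetical and not part of the gem: `rough_token_count` assumes about four characters per token, which is only a crude heuristic, and the gem's own estimator may count tokens differently.

```ruby
# Hypothetical helpers for illustration; not the gem's estimator.

# Crude heuristic: assume roughly 4 characters per token.
# Real tokenizers vary by model and by content.
def rough_token_count(text)
  (text.length / 4.0).ceil
end

# Percentage of tokens saved relative to the JSON baseline,
# rounded to one decimal place.
def savings_percent(json_tokens, toon_tokens)
  return 0.0 if json_tokens.zero?

  ((json_tokens - toon_tokens) * 100.0 / json_tokens).round(1)
end

puts savings_percent(1234, 789) # => 36.1
```

With the counts from the example output, (1234 - 789) / 1234 is about 0.361, which matches the reported 36.1%.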

Custom Options

ToonFormat.encode(data, delimiter: '|', indent: 4, length_marker: false)

Strict Decoding

ToonFormat.decode(toon, strict: false)  # Decoding is strict by default; strict: false skips validation
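
As an illustration of what a strict check could catch, the sketch below parses the tabular form from Quick Start and compares the declared row count with the rows actually present. It is an assumption about the kind of validation involved, not the gem's decoder: `decode_table_sketch`, `coerce_cell`, and `TABLE_HEADER` are hypothetical names, and quoting, escaping, and custom delimiters are ignored.

```ruby
# Illustrative only: a tiny tabular decoder with an optional strict check.
# All names here are hypothetical; the gem's validator may check other things.
TABLE_HEADER = /\A\[(\d+),?\]\{([^}]*)\}:\z/

# Turn integer-looking cells into Integers, leave everything else as a String.
def coerce_cell(cell)
  Integer(cell, exception: false) || cell
end

def decode_table_sketch(toon, strict: true)
  header, *rows = toon.lines(chomp: true)
  match = TABLE_HEADER.match(header)
  raise ArgumentError, "bad header: #{header.inspect}" unless match

  declared = match[1].to_i
  keys = match[2].split(',').map(&:to_sym)

  # Strict mode: the declared length must match the number of rows.
  if strict && rows.size != declared
    raise ArgumentError, "expected #{declared} rows, got #{rows.size}"
  end

  rows.map { |row| keys.zip(row.split(',').map { |cell| coerce_cell(cell) }).to_h }
end

p decode_table_sketch("[2,]{id,name}:\n1,Alice\n2,Bob")
```

In this sketch a truncated payload such as `"[3,]{id,name}:\n1,Alice\n2,Bob"` raises in strict mode and returns the two available rows when `strict: false` is passed, which is the trade-off to weigh before disabling validation on untrusted input.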

🚂 Rails Integration

Auto-extends ActiveRecord:

user.to_toon(only: [:id, :name])

🔧 CLI Tool

# Encode JSON → TOON
toon-format encode data.json > data.toon

# Decode
toon-format decode data.toon > data.json

# Stats
toon-format stats data.json
# JSON: 1,234 tokens | TOON: 789 | Savings: 36.1%

# Pipe it!
cat api.json | toon-format encode

Options: --output FILE --no-strict --delimiter '|' --indent 4 --no-length-marker

📈 Benchmarks

Quick Results

Scenario	Speed vs JSON	Token Savings
Tabular Data (100 records)	2-3x faster	~52% 🎯
Simple Objects	1-2x faster	~14%
Nested Structures	Similar	~22%
Large Datasets (1000+)	1.5-2x faster	40-70% 🚀

Comprehensive Benchmark Suite

We have 11 specialized benchmarks covering:

⚡ Performance: Encode/decode speed, scalability (1-10k records)
📊 Comparisons: vs JSON, YAML, MessagePack, CSV
🌍 Real-World: API responses, DB exports, LLM contexts
🔍 Advanced: Memory usage, validation overhead, deep nesting
🔄 Fidelity: Round-trip tests, data integrity

Run all benchmarks:

ruby benchmark/run_all_benchmarks.rb

Run individual benchmarks:

ruby benchmark/token_reduction_benchmark.rb  # Token savings
ruby benchmark/scalability_benchmark.rb      # 1-10k records
ruby benchmark/real_world_benchmark.rb       # Practical scenarios
ruby benchmark/format_comparison_benchmark.rb # vs other formats

See benchmark/README.md for details.

🛡️ Security

MAX_DEPTH=100
MAX_ARRAY_SIZE=100_000
Circular reference detection
UTF-8 validation
No eval
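
The depth limit and circular reference detection can be pictured as a single recursive walk over the data. The sketch below is one possible approach under stated assumptions, not the gem's code: `check_structure!`, `SKETCH_MAX_DEPTH`, and the two error classes are hypothetical names, and only the limit of 100 is taken from the list above.

```ruby
# Illustrative guard, not the gem's implementation. Names are hypothetical.
SKETCH_MAX_DEPTH = 100

class SketchDepthError < StandardError; end
class SketchCircularError < StandardError; end

# Walks nested Hashes and Arrays. Raises if nesting exceeds the limit or if
# a container appears inside itself, directly or indirectly.
# `seen` tracks containers on the current path by object identity.
def check_structure!(value, depth = 0, seen = {}.compare_by_identity)
  return unless value.is_a?(Hash) || value.is_a?(Array)
  raise SketchDepthError, "nesting deeper than #{SKETCH_MAX_DEPTH}" if depth > SKETCH_MAX_DEPTH
  raise SketchCircularError, 'circular reference detected' if seen.key?(value)

  seen[value] = true
  children = value.is_a?(Hash) ? value.values : value
  children.each { |child| check_structure!(child, depth + 1, seen) }
  seen.delete(value) # the same object may legitimately appear in sibling branches
  nil
end

loop_array = [1]
loop_array << loop_array
begin
  check_structure!(loop_array)
rescue SketchCircularError => e
  puts e.message
end
```

Tracking only the containers on the current path, rather than every container ever visited, keeps shared but non-circular references (the same array used twice) from being rejected.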

📊 Status

✅ v0.1.0: Core features + 83% coverage (42+ specs)
🔄 Next: Complex nesting, 95% coverage

🤝 Contributing

Fork & clone
bin/setup
bundle exec rspec
bundle exec rubocop -a
PR away! 🎉

See CONTRIBUTING.md for guidelines.

🌐 Resources & Links

TOON Format

📖 TOON Format Repository - Original TOON format
📋 TOON Specification - Format specification
💎 This Ruby Implementation

This Gem

📝 Changelog
🤝 Contributing
📊 Benchmarks
🏗️ Architecture

🙏 Acknowledgments

This gem is inspired by and implements the TOON format specification, created to optimize token usage for LLM contexts. Special thanks to the TOON format community for developing this innovative serialization approach.

📄 License

MIT

⭐ Star on GitHub & try it in your LLM pipelines! 🚀