Toon Format 🖼️📦
A Ruby gem implementing TOON (Token-Oriented Object Notation) – the compact, human-readable serialization format that slashes LLM token usage by 30-60% vs JSON while staying lossless.
Perfect for API responses, database exports, and LLM prompts!
💡 Inspired by: This gem is based on the TOON format specification and provides a Ruby implementation of it.
🚀 Why TOON Format?
graph LR
JSON[JSON: 100% tokens] -->|30-60% savings| TOON[TOON: 40-70% tokens]
TOON -->|lossless| JSON
subgraph LLM
Prompt[Your LLM Prompt]
end
TOON -.->|Cheaper/Faster| Prompt
Key Wins:
- 🏆 Token Reduction: 30-60% fewer tokens for LLM contexts
- 🔄 Bidirectional: encode/decode with 100% round-trip fidelity
- 📊 Smart Tabular Arrays: Auto-optimizes uniform data (e.g., DB records)
- 🛡️ Secure by Design: Depth limits, circular-reference detection, no eval
- ⚡ Fast: on par with JSON for nested data, up to 2-3x faster for tabular data (see Benchmarks)
- 🎛️ CLI + Rails: Ready for production
📦 Installation
Requirements:
- Ruby 3.0 or higher
- Tested on Ruby 3.0, 3.1, 3.2, 3.3, 3.4
Add to your Gemfile:
gem 'toon-format'
Then install:
bundle install
Or install directly:
gem install toon-format
⚡ Quick Start
require 'toon_format'
# Encode
data = { name: 'Alice', age: 30 }
toon = ToonFormat.encode(data)
# => "name: Alice\nage: 30"
# Decode
original = ToonFormat.decode(toon)
# => {:name=>"Alice", :age=>30}
# Tabular magic ✨
users = [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }]
ToonFormat.encode(users)
# => "[2,]{id,name}:\n1,Alice\n2,Bob"
🛠️ How It Works: Encoding Flow
flowchart TD
Data[Ruby Data] --> Type{Check Type}
Type -->|Primitive| Prim["null/true/false/num/str"]
Type -->|Hash| Obj["key: value\n..."]
Type -->|Array| Tab{Uniform?<br/>All Hashes +<br/>Primitive Values?}
Tab -->|Yes| Table["[N,]{id,name,...}:\nrow1\nrow2"]
Tab -->|No| List["[N]:\n item1\n item2"]
Prim --> Output[TOON String]
Obj --> Output
Table --> Output
List --> Output
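The flowchart's decisions can be sketched in plain Ruby. This is a hypothetical, self-contained illustration (ToonSketch and its methods are made-up names), not the gem's encoder: it reproduces only the output shapes shown above, skips string quoting and escaping, and assumes two-space indentation for nested values. It also assumes that a "uniform" array needs the same keys, in the same order, in every row, and that the delimiter option from Advanced Usage appears inside the [N,] header.

```ruby
# Hypothetical sketch of the encoding flow; NOT the gem's encoder.
# Assumptions: no quoting/escaping, nested values indented two spaces,
# list items indented one space (mirroring "[N]:\n item1" above).
module ToonSketch
  module_function

  # "Primitive" branch: nil becomes null, everything else uses to_s.
  def primitive(value)
    value.nil? ? 'null' : value.to_s
  end

  def primitive?(value)
    value.nil? || value == true || value == false || value.is_a?(Numeric) ||
      value.is_a?(String) || value.is_a?(Symbol)
  end

  # "Uniform?" branch: all Hashes, identical key order, primitive values only.
  def uniform_table?(array)
    return false if array.empty? || !array.all? { |row| row.is_a?(Hash) }

    keys = array.first.keys
    array.all? { |row| row.keys == keys && row.values.all? { |v| primitive?(v) } }
  end

  def encode(data, delimiter: ',')
    case data
    when Hash
      data.map do |key, value|
        if primitive?(value)
          "#{key}: #{primitive(value)}"
        else
          # Assumed layout for nested values: following lines, two-space indent.
          "#{key}:\n#{encode(value, delimiter: delimiter).gsub(/^/, '  ')}"
        end
      end.join("\n")
    when Array
      if uniform_table?(data)
        keys = data.first.keys
        header = "[#{data.size}#{delimiter}]{#{keys.join(delimiter)}}:"
        rows = data.map { |row| row.values.map { |v| primitive(v) }.join(delimiter) }
        ([header] + rows).join("\n")
      else
        items = data.map { |item| encode(item, delimiter: delimiter).gsub(/^/, ' ') }
        (["[#{data.size}]:"] + items).join("\n")
      end
    else
      primitive(data)
    end
  end
end

users = [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }]
puts ToonSketch.encode(users)
```

The table branch is where the savings come from: keys are written once in the header instead of once per record.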
🏗️ Architecture
graph TB
subgraph 'Public API'
Main[lib/toon_format.rb<br/>encode/decode/estimate_savings]
end
subgraph 'Core'
Enc[encoder.rb]
Dec[decoder.rb]
Pars[parser.rb]
Val[validator.rb]
Err[errors.rb]
end
subgraph 'Integrations'
Rails[rails/extensions.rb<br/>ActiveRecord#to_toon]
CLI[exe/toon-format]
end
Main --> Enc
Main --> Dec
Dec --> Pars
Dec --> Val
Main -.-> Rails
Main -.-> CLI
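The diagram shows decoder.rb delegating to parser.rb and validator.rb. A minimal, hypothetical sketch of that split for flat "key: value" input follows; DecodeSketch, parse, validate! and coerce are illustrative names, not the gem's classes. The strict: flag only imitates the strict: false (skip validation) option from Advanced Usage; what the gem's validator actually checks is not described in this README.

```ruby
# Hypothetical Decoder -> Parser / Validator pipeline for flat
# "key: value" lines only; NOT the gem's decoder.
module DecodeSketch
  class ParseError < StandardError; end

  module_function

  # Parser: turn each non-blank line into a [key, raw_value] pair.
  def parse(text)
    text.each_line.map(&:chomp).reject { |l| l.strip.empty? }.map do |line|
      key, sep, raw = line.partition(': ')
      raise ParseError, "expected 'key: value', got #{line.inspect}" if sep.empty?

      [key, raw]
    end
  end

  # Validator (illustrative rules): no empty keys, no duplicate keys.
  def validate!(pairs)
    keys = pairs.map(&:first)
    raise ParseError, 'empty key' if keys.any?(&:empty?)

    dupes = keys.select { |k| keys.count(k) > 1 }.uniq
    raise ParseError, "duplicate keys: #{dupes.join(', ')}" unless dupes.empty?
  end

  # Coerce a raw scalar back into a Ruby primitive.
  def coerce(raw)
    case raw
    when 'null' then nil
    when 'true' then true
    when 'false' then false
    when /\A-?\d+\z/ then Integer(raw, 10) # base 10, so "010" is not octal
    when /\A-?\d+\.\d+\z/ then Float(raw)
    else raw
    end
  end

  # Decoder: parse, validate unless strict: false, build a symbol-keyed Hash.
  def decode(text, strict: true)
    pairs = parse(text)
    validate!(pairs) if strict
    pairs.to_h { |key, raw| [key.to_sym, coerce(raw)] }
  end
end

p DecodeSketch.decode("name: Alice\nage: 30")
```

Symbol keys are used here only to match the Quick Start output shown above.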
✨ Advanced Usage
Token Savings Estimator
stats = ToonFormat.estimate_savings(data)
# => {json_tokens: 1234, toon_tokens: 789, savings_percent: 36.1}
Custom Options
ToonFormat.encode(data, delimiter: '|', indent: 4, length_marker: false)
Strict Decoding
ToonFormat.decode(toon, strict: false) # Skip validation
🚂 Rails Integration
Automatically adds to_toon to ActiveRecord models:
user.to_toon(only: [:id, :name])
🔧 CLI Tool
# Encode JSON → TOON
toon-format encode data.json > data.toon
# Decode
toon-format decode data.toon > data.json
# Stats
toon-format stats data.json
# JSON: 1,234 tokens | TOON: 789 | Savings: 36.1%
# Pipe it!
cat api.json | toon-format encode
Options: --output FILE --no-strict --delimiter '|' --indent 4 --no-length-marker
📈 Benchmarks
Quick Results
| Scenario | Speed vs JSON | Token Savings |
|---|---|---|
| Tabular Data (100 records) | 2-3x faster | ~52% 🎯 |
| Simple Objects | 1-2x faster | ~14% |
| Nested Structures | Similar | ~22% |
| Large Datasets (1000+) | 1.5-2x faster | 40-70% 🚀 |
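The token-savings figures here, like savings_percent from estimate_savings and the CLI's stats line, appear to be measured relative to the JSON token count: the README's own example of 1,234 JSON tokens vs 789 TOON tokens gives (1234 - 789) / 1234 ≈ 36.1%. The helper below is a hypothetical sketch of that arithmetic under this assumption, not a function the gem exposes; it does not count tokens itself.

```ruby
# Hypothetical helper: percent of tokens saved relative to JSON.
# Assumption: savings are measured against the JSON token count.
def savings_percent(json_tokens, toon_tokens)
  raise ArgumentError, 'json_tokens must be positive' unless json_tokens.positive?

  ((json_tokens - toon_tokens) * 100.0 / json_tokens).round(1)
end

puts savings_percent(1234, 789) # => 36.1
```

Under the same reading, "30-60% savings" and "TOON: 40-70% tokens" in the diagram at the top describe the same range: the TOON share is 100 minus the savings percentage.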
Comprehensive Benchmark Suite
We have 11 specialized benchmarks covering:
- ⚡ Performance: Encode/decode speed, scalability (1-10k records)
- 📊 Comparisons: vs JSON, YAML, MessagePack, CSV
- 🌍 Real-World: API responses, DB exports, LLM contexts
- 🔍 Advanced: Memory usage, validation overhead, deep nesting
- 🔄 Fidelity: Round-trip tests, data integrity
Run all benchmarks:
ruby benchmark/run_all_benchmarks.rb
Run individual benchmarks:
ruby benchmark/token_reduction_benchmark.rb # Token savings
ruby benchmark/scalability_benchmark.rb # 1-10k records
ruby benchmark/real_world_benchmark.rb # Practical scenarios
ruby benchmark/format_comparison_benchmark.rb # vs other formats
See benchmark/README.md for details.
🛡️ Security
- MAX_DEPTH = 100 (nesting depth limit)
- MAX_ARRAY_SIZE = 100_000 (array length limit)
- Circular reference detection
- UTF-8 validation
- No eval
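The limits above can be illustrated with a small pre-encode check. This is a hypothetical sketch: MAX_DEPTH and MAX_ARRAY_SIZE use the values listed above, but check_structure! and GuardError are made-up names, and how the gem actually enforces these limits is not documented here. Cycles are detected by tracking the containers on the current path by identity, so shared (non-circular) references are still accepted.

```ruby
# Hypothetical guard illustrating the listed limits; NOT the gem's code.
MAX_DEPTH = 100
MAX_ARRAY_SIZE = 100_000

class GuardError < StandardError; end

# Recursively check value; path holds the containers currently being visited,
# compared by object identity rather than by content.
def check_structure!(value, depth = 0, path = {}.compare_by_identity)
  raise GuardError, "nesting deeper than #{MAX_DEPTH}" if depth > MAX_DEPTH

  case value
  when Hash, Array
    raise GuardError, 'circular reference detected' if path.key?(value)
    if value.is_a?(Array) && value.size > MAX_ARRAY_SIZE
      raise GuardError, "array larger than #{MAX_ARRAY_SIZE}"
    end

    path[value] = true
    children = value.is_a?(Hash) ? value.values : value
    children.each { |child| check_structure!(child, depth + 1, path) }
    path.delete(value) # leaving this container: siblings may share it safely
  when String
    # Reinterpret the bytes as UTF-8 and reject invalid sequences.
    raise GuardError, 'invalid UTF-8 string' unless value.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  end
  value
end

looped = []
looped << looped
begin
  check_structure!(looped)
rescue GuardError => e
  puts e.message # => circular reference detected
end
```

Removing each container from path after its children are checked is what separates a true cycle from the same object appearing twice side by side.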
📊 Status
- ✅ v0.1.0: Core features + 83% coverage (42+ specs)
- 🔄 Next: Complex nesting, 95% coverage
🤝 Contributing
- Fork & clone
- Set up, run the specs, and lint:
bin/setup
bundle exec rspec
bundle exec rubocop -a
- PR away! 🎉
See CONTRIBUTING.md for guidelines.
🌐 Resources & Links
TOON Format
- 📖 TOON Format Repository - Original TOON format
- 📋 TOON Specification - Format specification
- 💎 This Ruby Implementation
This Gem
- 📝 Changelog
- 🤝 Contributing
- 📊 Benchmarks
- 🏗️ Architecture
🙏 Acknowledgments
This gem is inspired by and implements the TOON format specification, created to optimize token usage for LLM contexts. Special thanks to the TOON format community for developing this innovative serialization approach.
📄 License
⭐ Star on GitHub & try it in your LLM pipelines! 🚀