Project

xml-c14n

Library for XML canonicalization

Canon: Semantic comparison for serialization formats

Table of Contents
  • Purpose
  • Installation
  • Quick start
    • Format documents
    • Compare documents
    • Use in tests
    • Command-line interface
  • Documentation
    • Using Canon
    • Understanding Canon
    • Features
    • Advanced topics
  • Features
    • Canonicalization
    • Semantic comparison
    • Smart diff output
    • Enhanced diff features
    • Input validation
  • Examples
    • Ruby API example
    • CLI example
    • RSpec example
  • Architecture
  • Development
  • Contributing
  • Copyright and license

Purpose

Canon provides canonicalization, pretty-printing, and semantic comparison for serialization formats (XML, HTML, JSON, YAML). It produces standardized forms suitable for comparison, testing, digital signatures, and human-readable output.

Key features:

  • Format support: XML, HTML, JSON, YAML

  • Canonicalization: W3C XML C14N 1.1, sorted JSON/YAML keys

  • Semantic comparison: Compare meaning, not formatting

  • Multiple interfaces: Ruby API, CLI, RSpec matchers

  • Smart diff output: By-line or by-object modes with syntax highlighting

Installation

Add to your application’s Gemfile:

gem 'canon'

Then execute:

$ bundle install

Or install directly:

$ gem install canon

Quick start

Format documents

require 'canon'

# Canonical form (compact)
Canon.format('<root><b>2</b><a>1</a></root>', :xml)
# => "<root><a>1</a><b>2</b></root>"

# Pretty-print (human-readable)
require 'canon/pretty_printer/xml'
xml_input = '<root><b>2</b><a>1</a></root>'
Canon::Xml::PrettyPrinter.new(indent: 2).format(xml_input)

Compare documents

require 'canon/comparison'

xml1 = '<root><a>1</a><b>2</b></root>'
xml2 = '<root>  <b>2</b>  <a>1</a>  </root>'

Canon::Comparison.equivalent?(xml1, xml2)
# => true (semantically equivalent despite formatting differences)

Use in tests

require 'canon/rspec_matchers'

RSpec.describe 'XML generation' do
  it 'generates correct XML' do
    expect(actual_xml).to be_xml_equivalent_to(expected_xml)
  end
end

Command-line interface

# Format a file
$ canon format input.xml --mode pretty

# Compare files
$ canon diff file1.xml file2.xml --verbose

# Get help
$ canon help

Documentation

  • Using Canon

  • Understanding Canon

  • Features

  • Advanced topics

Features

Canonicalization

XML: Implements the W3C Canonical XML Version 1.1 specification, including namespace declaration ordering, attribute ordering, character encoding normalization, and proper handling of the xml:base, xml:lang, xml:space, and xml:id attributes.
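To make the attribute-ordering rule concrete, here is a minimal sketch in plain Ruby. It is not Canon's implementation: `canonical_start_tag` is a hypothetical helper, and it covers only attributes without namespaces, which Canonical XML sorts by name, writes with double quotes, and escapes as shown.

```ruby
# Hypothetical helper, not part of Canon's API.
# Handles only attributes without namespaces; namespace declarations and
# namespaced attributes have additional ordering rules that are omitted here.
def canonical_start_tag(name, attributes)
  rendered = attributes.sort_by { |attr_name, _value| attr_name }.map do |attr_name, value|
    # Escape '&' first so later replacements are not escaped twice.
    escaped = value.gsub('&', '&amp;').gsub('<', '&lt;').gsub('"', '&quot;')
                   .gsub("\t", '&#x9;').gsub("\n", '&#xA;').gsub("\r", '&#xD;')
    %( #{attr_name}="#{escaped}")
  end
  "<#{name}#{rendered.join}>"
end

puts canonical_start_tag('item', { 'id' => 'a&b', 'class' => 'x' })
# prints <item class="x" id="a&amp;b">
```

Because the output depends only on the attribute names and values, two start tags that differ only in attribute order or quoting style serialize to the same string.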

HTML: Consistent formatting for HTML 4/5 and XHTML with automatic detection and appropriate formatting rules.

JSON/YAML: Alphabetically sorted keys at all levels with consistent formatting.
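The key-sorting idea can be sketched with Ruby's standard library alone. This is an illustration of the technique, not Canon's code; `canonicalize_keys` is a hypothetical helper name.

```ruby
require 'json'

# Hypothetical helper, not part of Canon's API.
# Rebuilds every hash with its keys in alphabetical order, at all levels.
def canonicalize_keys(value)
  case value
  when Hash
    value.keys.sort.each_with_object({}) do |key, sorted|
      sorted[key] = canonicalize_keys(value[key])
    end
  when Array
    # Array order is significant, so only the elements are canonicalized.
    value.map { |item| canonicalize_keys(item) }
  else
    value
  end
end

input = '{"b": 2, "a": {"d": 4, "c": 3}}'
canonical_json = JSON.generate(canonicalize_keys(JSON.parse(input)))
puts canonical_json
# prints {"a":{"c":3,"d":4},"b":2}
```

The same helper works on data loaded from YAML, since both parsers return nested hashes and arrays.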

Semantic comparison

Compare documents based on meaning, not formatting:

  • Whitespace normalization options

  • Attribute/key order handling

  • Comment handling

  • Multiple match dimensions, each with a configurable behavior (for example, text_content: :normalize)

  • Predefined match profiles (strict, rendered, spec_friendly, content_only)

See Match options for details.
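As a rough illustration of a match dimension with a behavior, the sketch below compares two JSON documents using only the standard library. The method names and the `:strict` default are assumptions made for this example; they are not Canon's API.

```ruby
require 'json'

# Hypothetical helper: collapse whitespace runs in every string value.
def normalize_text(value)
  case value
  when Hash   then value.transform_values { |v| normalize_text(v) }
  when Array  then value.map { |v| normalize_text(v) }
  when String then value.strip.gsub(/\s+/, ' ')
  else value
  end
end

# Hypothetical comparison. Key order never matters because Ruby Hash
# equality is order-independent; string whitespace matters unless
# text_content is :normalize.
def json_equivalent_sketch?(json1, json2, text_content: :strict)
  doc1 = JSON.parse(json1)
  doc2 = JSON.parse(json2)
  if text_content == :normalize
    doc1 = normalize_text(doc1)
    doc2 = normalize_text(doc2)
  end
  doc1 == doc2
end

left  = '{"title": "Hello  world", "id": 1}'
right = '{ "id": 1, "title": "Hello world" }'

strict_result     = json_equivalent_sketch?(left, right)
normalized_result = json_equivalent_sketch?(left, right, text_content: :normalize)
puts strict_result      # prints false
puts normalized_result  # prints true
```

Each dimension answers one question (here, "does whitespace inside text count?"), so behaviors can be combined without interfering with each other.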

Smart diff output

By-line mode: Traditional line-by-line diff with:

  • DOM-guided semantic matching for XML

  • Syntax-aware token highlighting

  • Context lines around changes

  • Whitespace visualization

By-object mode: Tree-based semantic diff with:

  • Visual tree structure using box-drawing characters

  • Shows only what changed (additions, removals, modifications)

  • Color-coded output

See Diff modes for details.
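The by-object idea of reporting only what changed can be sketched as a recursive walk over two nested hashes. This is a simplified illustration with a hypothetical `object_diff` helper; it prints plain `+`, `-`, and `~` markers rather than the box-drawing tree and colors described above.

```ruby
# Hypothetical helper, not part of Canon's API.
# Walks two nested hashes and collects only the differences as
# [kind, dotted_path, detail] triples.
def object_diff(old_doc, new_doc, path = [])
  changes = []
  (old_doc.keys | new_doc.keys).sort.each do |key|
    here = path + [key]
    if !old_doc.key?(key)
      changes << [:added, here.join('.'), new_doc[key]]
    elsif !new_doc.key?(key)
      changes << [:removed, here.join('.'), old_doc[key]]
    elsif old_doc[key].is_a?(Hash) && new_doc[key].is_a?(Hash)
      # Descend into nested objects instead of reporting the whole subtree.
      changes.concat(object_diff(old_doc[key], new_doc[key], here))
    elsif old_doc[key] != new_doc[key]
      changes << [:modified, here.join('.'), [old_doc[key], new_doc[key]]]
    end
  end
  changes
end

old_doc = { 'name' => 'canon', 'meta' => { 'version' => 1, 'draft' => true } }
new_doc = { 'name' => 'canon', 'meta' => { 'version' => 2 }, 'tags' => ['xml'] }

diff = object_diff(old_doc, new_doc)
markers = { added: '+', removed: '-', modified: '~' }
diff.each { |kind, where, detail| puts "#{markers[kind]} #{where}: #{detail.inspect}" }
```

Unchanged keys such as `name` produce no output, which is what keeps this kind of diff short for large documents.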

Enhanced diff features

  • Color-coded output: Red (normative deletions), green (normative additions), yellow (normative structure), cyan (informative diffs)

  • Whitespace visualization: Make invisible characters visible with CJK-safe Unicode symbols

  • Non-ASCII detection: Warnings for unexpected Unicode characters

  • Customizable: Character maps, context lines, grouping options
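Whitespace visualization and non-ASCII detection can be approximated in a few lines. The character map below is an assumption chosen for illustration, and both helper names are hypothetical; Canon's actual symbols and options may differ.

```ruby
# Hypothetical character map; the symbols Canon uses may differ.
VISIBLE_WHITESPACE = {
  ' '  => '·',   # space   -> middle dot
  "\t" => '→',   # tab     -> rightwards arrow
  "\n" => "↵\n"  # newline -> return symbol, keeping the line break
}.freeze

# Replace each whitespace character with its visible stand-in.
def visualize_whitespace(text, char_map = VISIBLE_WHITESPACE)
  text.gsub(/[ \t\n]/) { |char| char_map.fetch(char, char) }
end

# List the distinct non-ASCII characters found in the text.
def non_ascii_characters(text)
  text.each_char.reject(&:ascii_only?).uniq
end

puts visualize_whitespace("a \tb\n")
# prints a·→b↵
p non_ascii_characters('café')
```

Passing a different hash as `char_map` is one way to model the customizable character maps mentioned above.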

Input validation

Comprehensive validation with clear error messages showing exact line and column numbers for syntax errors in XML, HTML, JSON, and YAML.

See Input validation for details.
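For a sense of how line and column information can reach an error message, here is a standard-library sketch. `validation_error` is a hypothetical helper and the message wording is invented for this example; only `Psych::SyntaxError#line`, `#column`, and `#problem` are existing Ruby APIs.

```ruby
require 'json'
require 'yaml'

# Hypothetical helper, not part of Canon's API.
# Returns nil when the input parses, otherwise a human-readable message.
def validation_error(input, format)
  case format
  when :json then JSON.parse(input)
  when :yaml then YAML.safe_load(input)
  else raise ArgumentError, "unsupported format: #{format}"
  end
  nil
rescue Psych::SyntaxError => e
  # Psych exposes the position of the problem directly.
  "YAML syntax error at line #{e.line}, column #{e.column}: #{e.problem}"
rescue JSON::ParserError => e
  # Position accessors on JSON::ParserError vary by json version,
  # so this sketch passes the parser's message through unchanged.
  "JSON syntax error: #{e.message}"
end

puts validation_error("name: canon\nitems: [1, 2\n", :yaml)
puts validation_error('{"a": }', :json)
```

Validating before comparing means a malformed document is reported as a syntax problem rather than as a confusing diff.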

Examples

Ruby API example

require 'canon/comparison'

# Compare with custom options
Canon::Comparison.equivalent?(doc1, doc2,
  match: {
    text_content: :normalize,
    structural_whitespace: :ignore,
    comments: :ignore
  },
  verbose: true
)

CLI example

# Compare with semantic diff
$ canon diff file1.xml file2.xml \
  --verbose \
  --text-content normalize \
  --structural-whitespace ignore

RSpec example

# Configure globally
Canon::RSpecMatchers.configure do |config|
  config.xml.match.profile = :spec_friendly
  config.xml.diff.use_color = true
end

# Use in tests
RSpec.describe 'XML generation' do
  it 'generates correct structure' do
    expect(actual_xml).to be_xml_equivalent_to(expected_xml)
  end
end

Architecture

Canon follows an orchestrator pattern with MECE (Mutually Exclusive, Collectively Exhaustive) principles:

Comparison module (Canon::Comparison): Format detection, validation, and delegation to format-specific comparators (XML, HTML, JSON, YAML).

DiffFormatter module (Canon::DiffFormatter): Diff mode detection and delegation to mode-specific formatters (by-line, by-object).
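The orchestrator idea (detect the format, then delegate to exactly one comparator) can be sketched as follows. Every name here is hypothetical, the detection rule is deliberately naive, and only JSON and YAML are covered so that the example runs on the standard library.

```ruby
require 'json'
require 'yaml'

# Hypothetical sketch; none of these names are Canon's real classes.
module ComparisonSketch
  # Each comparator handles exactly one format (mutually exclusive), and
  # the table lists every format this sketch supports (collectively exhaustive).
  COMPARATORS = {
    json: ->(a, b) { JSON.parse(a) == JSON.parse(b) },
    yaml: ->(a, b) { YAML.safe_load(a) == YAML.safe_load(b) }
  }.freeze

  # Naive detection based on the first non-blank character.
  def self.detect_format(text)
    text.lstrip.start_with?('{', '[') ? :json : :yaml
  end

  # The orchestrator itself compares nothing; it only routes.
  def self.equivalent?(doc1, doc2)
    format1 = detect_format(doc1)
    format2 = detect_format(doc2)
    raise ArgumentError, 'format mismatch' unless format1 == format2

    COMPARATORS.fetch(format1).call(doc1, doc2)
  end
end

puts ComparisonSketch.equivalent?('{"a": 1, "b": 2}', '{"b": 2, "a": 1}')  # prints true
puts ComparisonSketch.equivalent?("a: 1\nb: 2\n", "b: 2\na: 1\n")          # prints true
```

Adding a format in this design means adding one entry to the table and one detection rule, without touching the existing comparators.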

Three-phase comparison:

  1. Preprocessing: Optional document normalization (c14n, normalize, format)

  2. Semantic matching: Configurable match dimensions with behaviors

  3. Diff rendering: Formatted output with visualization

See Match architecture for details.
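The three phases can be pictured as three small functions chained together. This is a simplified sketch for flat JSON objects; the function names and the meaning given to `:normalize` here (trimming string values) are assumptions, not Canon's behavior.

```ruby
require 'json'

# Phase 1 (preprocessing): parse, then optionally normalize string values.
def preprocess(json_text, preprocessing: :none)
  doc = JSON.parse(json_text)
  return doc unless preprocessing == :normalize

  doc.transform_values { |v| v.is_a?(String) ? v.strip : v }
end

# Phase 2 (semantic matching): collect the top-level keys whose values differ.
def match(doc1, doc2)
  (doc1.keys | doc2.keys).reject { |key| doc1[key] == doc2[key] }
end

# Phase 3 (diff rendering): turn the match result into readable lines.
def render(doc1, doc2, differing_keys)
  differing_keys.map { |key| "#{key}: #{doc1[key].inspect} -> #{doc2[key].inspect}" }
end

# Hypothetical entry point that runs the phases in order.
def compare_sketch(json1, json2, preprocessing: :none)
  doc1 = preprocess(json1, preprocessing: preprocessing)
  doc2 = preprocess(json2, preprocessing: preprocessing)
  render(doc1, doc2, match(doc1, doc2))
end

report = compare_sketch('{"a": " x ", "b": 1}', '{"a": "x", "b": 2}', preprocessing: :normalize)
puts report
# prints b: 1 -> 2
```

Keeping the phases separate means a change to rendering cannot alter which differences are detected, and preprocessing can be switched off without touching the matcher.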

Development

After checking out the repo, run bin/setup to install dependencies. Then run rake spec to run the tests. You can also run bin/console for an interactive prompt.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/lutaml/canon.

Copyright and license

Copyright Ribose. BSD-2-Clause License.