Moxml is a unified XML manipulation library that provides a common API.

Moxml: Modern XML processing for Ruby

Contents
  • Introduction and purpose
  • Supported XML libraries
    • Feature table
  • Adapter comparison
    • Feature compatibility matrix
    • Adapter selection guide
  • Getting started
    • Installation
    • Basic document creation
  • Real-world examples
  • Working with documents
    • Using the builder pattern
    • Direct document manipulation
    • Fluent interface API
      • Chainable element methods
      • Convenience query methods
      • Quick element creation
      • Practical fluent example
  • XML objects and their methods
    • Document object
    • Element object
    • Text object
    • CDATA object
    • Comment object
    • Processing instruction object
    • Attribute object
    • Namespace object
    • Node traversal and inspection
  • Advanced features
    • XPath querying and node mapping
      • Nokogiri, Oga, REXML, LibXML
      • Ox
      • HeadedOx
    • Namespace handling
    • Accessing native implementation
  • Error handling
    • Error class hierarchy
    • Enhanced error context
    • Error types and usage
      • ParseError
      • XPathError
      • ValidationError
      • NamespaceError
      • AdapterError
      • SerializationError
      • DocumentStructureError
      • AttributeError
      • NotImplementedError
    • Best practices for error handling
  • Configuration
    • General
    • Default adapter selection
  • Thread safety
  • Performance considerations
    • Memory management
    • Efficient querying
  • Best practices
    • Document creation
    • Node manipulation
  • Specific adapter limitations
    • Ox adapter
      • XPath limitations
    • HeadedOx adapter
      • General
      • Features
      • Architecture
      • Known limitations
      • When to Use HeadedOx
      • XPath capabilities
      • What XPath queries work in HeadedOx
    • LibXML adapter
    • Other adapters
  • Development and testing
    • Skipping benchmarks
    • Running tests with coverage
    • Generating performance benchmark reports
      • Running the benchmark report
      • Benchmark categories
      • Report contents
      • Viewing the report
      • Example output
  • Contributing
  • License

Introduction and purpose

Moxml provides a unified, modern XML processing interface for Ruby applications. It offers a consistent API that abstracts away the underlying XML implementation details while maintaining high performance through efficient node mapping and native XPath querying.

Key features:

  • Intuitive, Ruby-idiomatic API for XML manipulation

  • Consistent interface across different XML libraries

  • Efficient node mapping for XPath queries

  • Support for all XML node types and features

  • Easy switching between XML processing engines

  • Clean separation between interface and implementation

Supported XML libraries

Moxml supports the following XML libraries:

REXML

REXML, a pure Ruby XML parser distributed with standard Ruby. Not the fastest, but always available.

Nokogiri

(default) Nokogiri, a widely used implementation which wraps around the performant libxml2 C library.

Oga

Oga, a pure Ruby XML parser. Recommended when you need a pure Ruby solution, for example for Opal.

Ox

Ox, a fast XML parser.

LibXML

libxml-ruby, Ruby bindings for the performant libxml2 C library. Alternative to Nokogiri with similar performance characteristics.

Feature table

Moxml makes a best effort to provide a consistent interface across basic XML features. However, the underlying XML libraries differ in features and capabilities.

The following table summarizes the features supported by each library.

Note
The checkmarks indicate support for the feature, while the footnotes provide additional context for specific features.
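| Feature | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx |
|---|---|---|---|---|---|---|
| Parsing, serializing | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Node manipulation | ✅ | ✅ | ✅ | ✅ | ✅ See NOTE 1. | ✅ See NOTE 1. |
| Basic XPath | ✅ | ✅ | ✅ | ✅ | Uses Ox-specific API locate. See NOTE 2. | ✅ Full XPath 1.0. See NOTE 3. |
| XPath with namespaces | ✅ | ✅ | ✅ | ✅ | ❌ | ⚠️ Basic |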

Note 1
Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure.
Note 2
Ox: Limited XPath support via the locate() method. See the adapter limitations section.
Note 3
HeadedOx provides full XPath 1.0 support via a pure Ruby XPath engine layered on top of Ox’s C parser. See HeadedOx documentation for details.

Adapter comparison

Feature compatibility matrix

| Feature/Operation | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx |
|---|---|---|---|---|---|---|
| Core Operations | | | | | | |
| Parse XML string | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Parse XML file/IO | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Serialize to XML | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Element Operations | | | | | | |
| Create elements | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Get/set attributes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Add/remove children | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Replace nodes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited [1] | ⚠️ Limited [1] |
| Namespace Operations | | | | | | |
| Add namespaces | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Default namespaces | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Basic | ⚠️ Basic |
| Namespace inheritance | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ❌ None [5] |
| Namespaced attributes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited | ⚠️ Limited [5] |
| XPath Queries | | | | | | |
| Basic paths (//element) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Attribute predicates ([@id]) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Existence only [2] | ✅ Full |
| Attribute values ([@id='123']) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None [3] | ✅ Full |
| Logical operators ([@a and @b]) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Position predicates ([1], [last()]) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Text predicates ([text()='x']) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Namespace-aware queries | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ⚠️ Basic [5] |
| Parent axis (..) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Sibling axes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ❌ None [5] |
| XPath functions (count(), etc.) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ All 27 |
| Special Content | | | | | | |
| CDATA sections | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Comments | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Processing instructions | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| DOCTYPE declarations | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited [4] | ✅ Full | ✅ Full |
| Performance | | | | | | |
| Parse speed | Fast | Fast | Medium | Fast | Very Fast | Very Fast |
| Serialize speed | Fast | Fast | Medium | Medium | Very Fast | Very Fast |
| Memory usage | Good | Medium | Medium | Good | Excellent | Excellent |
| Thread safety | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |

1 Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure.
2 Ox: //book[@id] works (returns all book elements), but doesn’t filter by attribute existence.
3 Ox: Use .find { |el| el["id"] == "123" } instead of XPath attribute value predicates.
4 LibXML: DOCTYPE parsing works, but serialization is limited (no round-trip preservation).
5 HeadedOx limitations: Namespace introspection and 7 axes are not implemented. See docs/HEADED_OX_LIMITATIONS.md.
HeadedOx XPath engine: Full XPath 1.0 with all 27 functions and 6 axes. Pure Ruby XPath engine on Ox’s C parser. 99.20% pass rate. See docs/headed-ox.adoc.

Adapter selection guide

Choose Nokogiri when:

  • You need industry-standard compatibility

  • Large community support is important

  • C extension performance is acceptable

  • Cross-platform deployment is required

Choose Oga when:

  • Pure Ruby environment is required (JRuby, TruffleRuby)

  • Best test coverage is needed (98%)

  • No C extensions are allowed

  • Memory usage is not the primary concern

Choose REXML when:

  • Standard library only (no external gems)

  • Maximum portability is required

  • Small to medium documents

  • Deployment simplicity is critical

Choose LibXML when:

  • Alternative to Nokogiri is desired

  • Full namespace support is required

  • Good performance with correctness

  • Native C extension is acceptable

Choose Ox when:

  • Maximum parsing speed is critical

  • Simple document structures (limited nesting)

  • XPath usage is minimal or absent

  • Memory efficiency is paramount

Choose HeadedOx when:

  • Need Ox’s fast parsing with full XPath support

  • Want comprehensive XPath 1.0 features (functions, predicates)

  • Prefer pure Ruby XPath implementation for debugging

  • Need more XPath capabilities than standard Ox provides

  • Memory efficiency is important but XPath features are required

Caution
Ox’s custom XPath engine supports common patterns but may not handle complex XPath expressions. Test thoroughly if your use case requires advanced XPath.
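
The guide above can be condensed into a small decision helper. The sketch below is only an illustration: `pick_adapter` and its keyword flags are hypothetical names, not part of Moxml, and the order of the checks is one possible reading of the guide.

```ruby
# Hypothetical helper, not part of Moxml: maps coarse requirements to one of
# the adapter symbols listed in this README.
def pick_adapter(stdlib_only: false, pure_ruby: false, max_parse_speed: false, needs_xpath: true)
  return :rexml if stdlib_only           # no external gems allowed
  return :oga if pure_ruby               # no C extensions allowed
  if max_parse_speed
    # Ox parses fastest; HeadedOx adds the pure Ruby XPath engine on top.
    return needs_xpath ? :headed_ox : :ox
  end
  :nokogiri                              # the default adapter
end

puts pick_adapter                                             # nokogiri
puts pick_adapter(pure_ruby: true)                            # oga
puts pick_adapter(max_parse_speed: true, needs_xpath: false)  # ox
```

The returned symbol could then be assigned to the adapter setting described under Configuration.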

Getting started

Installation

Install the gem and at least one supported XML library:

# In your Gemfile
gem 'moxml'
gem 'nokogiri'  # Or 'oga', 'rexml', 'ox', or 'libxml-ruby'

Basic document creation

doc = Moxml.new.create_document

# Add XML declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))

# Create root element with namespace
root = doc.create_element('book')
root.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
doc.add_child(root)

# Add content
title = doc.create_element('dc:title')
title.text = 'XML Processing with Ruby'
root.add_child(title)

# Output formatted XML
puts doc.to_xml(indent: 2)

Real-world examples

Practical, runnable examples demonstrating Moxml usage in common scenarios are available in the examples directory.

These examples include:

RSS Parser

Parse RSS/Atom feeds with XPath queries and namespace handling

Web Scraper

Extract data from HTML/XML using DOM navigation and table parsing

API Client

Build and parse XML API requests/responses with SOAP

Each example is:

  • Fully documented with detailed README

  • Self-contained and runnable

  • Demonstrates best practices

  • Includes sample data files

  • Shows comprehensive error handling

Run any example directly:

ruby examples/rss_parser/rss_parser.rb
ruby examples/web_scraper/web_scraper.rb
ruby examples/api_client/api_client.rb

See the examples README for complete documentation and learning paths.

Working with documents

Using the builder pattern

The builder pattern provides a clean DSL for creating XML documents:

doc = Moxml::Builder.new(Moxml.new).build do
  declaration version: "1.0", encoding: "UTF-8"

  element 'library', xmlns: 'http://example.org/library' do
    element 'book' do
      element 'title' do
        text 'Ruby Programming'
      end

      element 'author' do
        text 'Jane Smith'
      end

      comment 'Publication details'
      element 'published', year: '2024'

      cdata '<custom>metadata</custom>'
    end
  end
end

Direct document manipulation

doc = Moxml.new.create_document

# Add declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))

# Create root with namespace
root = doc.create_element('library')
root.add_namespace(nil, 'http://example.org/library')
root.add_namespace("dc", "http://purl.org/dc/elements/1.1/")
doc.add_child(root)

# Add elements with attributes
book = doc.create_element('book')
book['id'] = 'b1'
book['type'] = 'technical'
root.add_child(book)

# Add mixed content
book.add_child(doc.create_comment('Book details'))
title = doc.create_element('title')
title.text = 'Ruby Programming'
book.add_child(title)

Fluent interface API

Moxml provides a fluent, chainable API for creating and manipulating XML documents with improved developer experience:

# Old way - verbose and less readable
element = doc.create_element('book')
element.add_namespace("dc", "http://purl.org/dc/elements/1.1/")
element["id"] = "123"
element["type"] = "article"
child = doc.create_element("title")
child.text = "Hello"
element.add_child(child)

# New way - fluent and chainable
element = doc.create_element('book')
  .with_namespace("dc", "http://purl.org/dc/elements/1.1/")
  .set_attributes(id: "123", type: "article")
  .with_child(doc.create_element("title").tap { |t| t.text = "Hello" })

Chainable element methods

# with_namespace - add namespace and return self
element.with_namespace("dc", "http://purl.org/dc/elements/1.1/")

# set_attributes - set multiple attributes at once
element.set_attributes(id: "123", title: "Ruby", year: "2024")

# with_child - add child and return self
element.with_child(doc.create_element("author"))

# Chain multiple operations
element
  .with_namespace("dc", "http://purl.org/dc/elements/1.1/")
  .set_attributes(id: "123", type: "technical")
  .with_child(doc.create_element("title"))
  .with_child(doc.create_element("author"))

Convenience query methods

# find_element - alias for at_xpath
first_book = doc.root.find_element("//book")

# find_all - returns array of matching elements
all_books = doc.root.find_all("//book")

# Document-level find methods
first_title = doc.find("//title")
all_titles = doc.find_all("//title")

Quick element creation

# add_element - create, configure, and add element in one call
book = doc.add_element("book", id: "123", title: "Ruby") do |elem|
  elem.text = "Ruby Programming Guide"
end

Practical fluent example

doc = Moxml.new.create_document

# Build a complete book entry with fluent API
doc.add_element("library") do |library|
  library
    .with_namespace("dc", "http://purl.org/dc/elements/1.1/")
    .with_child(
      doc.create_element("book")
        .set_attributes(id: "b1", isbn: "978-0-123456-78-9")
        .with_child(doc.create_element("dc:title").tap { |t| t.text = "Ruby Programming" })
        .with_child(doc.create_element("dc:creator").tap { |c| c.text = "Jane Smith" })
        .with_child(doc.create_element("dc:date").tap { |d| d.text = "2024" })
    )
end

puts doc.to_xml(indent: 2)

XML objects and their methods

Document object

The Document object represents an XML document and serves as the root container for all XML nodes.

# Creating a document
doc = Moxml.new.create_document
doc = Moxml.new.parse(xml_string)

# Document properties and methods
doc.encoding               # Get document encoding
doc.encoding = "UTF-8"     # Set document encoding
doc.version                # Get XML version
doc.version = "1.1"        # Set XML version
doc.standalone             # Get standalone declaration
doc.standalone = "yes"     # Set standalone declaration

# Document structure
doc.root                  # Get root element
doc.children              # Get all top-level nodes
doc.add_child(node)       # Add a child node
doc.remove_child(node)    # Remove a child node

# Node creation methods
doc.create_element(name)    # Create new element
doc.create_text(content)    # Create text node
doc.create_cdata(content)   # Create CDATA section
doc.create_comment(content) # Create comment
doc.create_processing_instruction(target, content) # Create PI

# Document querying
doc.xpath(expression)      # Find nodes by XPath
doc.at_xpath(expression)   # Find first node by XPath

# Serialization
doc.to_xml(options)        # Convert to XML string

# Convenience methods
doc.add_element(name, attributes = {}, &block) # Create, configure, and add element
doc.find(xpath)                                # Alias for at_xpath
doc.find_all(xpath)                            # Returns array of matching elements

Element object

Elements are the primary structural components of an XML document, representing tags with attributes and content.

# Element properties
element.name               # Get element name
element.name = "new_name"  # Set element name
element.text              # Get text content
element.text = "content"   # Set text content
element.inner_text        # Get text content for current node only
element.inner_xml         # Get inner XML content
element.inner_xml = xml   # Set inner XML content

# Attributes
element[name]             # Get attribute value
element[name] = value     # Set attribute value
element.attributes        # Get all attributes
element.remove_attribute(name) # Remove attribute

# Namespace handling
element.namespace         # Get element's namespace
element.namespace = ns     # Set element's namespace
element.add_namespace(prefix, uri) # Add new namespace
element.namespaces        # Get all namespace definitions

# Node structure
element.parent            # Get parent node
element.children          # Get child nodes
element.add_child(node)   # Add child node
element.remove_child(node) # Remove child node
element.add_previous_sibling(node) # Add sibling before
element.add_next_sibling(node)    # Add sibling after
element.replace(node)     # Replace with another node
element.remove           # Remove from document

# Node type checking
element.element?         # Returns true
element.text?           # Returns false
element.cdata?          # Returns false
element.comment?        # Returns false
element.processing_instruction? # Returns false

# Node querying
element.xpath(expression)  # Find nodes by XPath
element.at_xpath(expression) # Find first node by XPath

# Convenience methods
element.with_namespace(prefix, uri) # Add namespace and return self
element.set_attributes(hash)        # Set multiple attributes, returns self
element.with_child(node)            # Add child and return self
element.find_element(xpath)         # Alias for at_xpath
element.find_all(xpath)             # Returns array of matching elements

Text object

Text nodes represent character data in the XML document.

# Creating text nodes
text = doc.create_text("content")

# Text properties
text.content             # Get text content
text.content = "new"     # Set text content

# Node type checking
text.text?              # Returns true

# Structure
text.parent             # Get parent node
text.remove             # Remove from document
text.replace(node)      # Replace with another node

CDATA object

CDATA sections contain text that should not be parsed as markup.

# Creating CDATA sections
cdata = doc.create_cdata("<raw>content</raw>")

# CDATA properties
cdata.content           # Get CDATA content
cdata.content = "new"   # Set CDATA content

# Node type checking
cdata.cdata?           # Returns true

# Structure
cdata.parent           # Get parent node
cdata.remove           # Remove from document
cdata.replace(node)    # Replace with another node

Comment object

Comments contain human-readable notes in the XML document.

# Creating comments
comment = doc.create_comment("Note")

# Comment properties
comment.content         # Get comment content
comment.content = "new" # Set comment content

# Node type checking
comment.comment?        # Returns true

# Structure
comment.parent          # Get parent node
comment.remove          # Remove from document
comment.replace(node)   # Replace with another node

Processing instruction object

Processing instructions provide instructions to applications processing the XML.

# Creating processing instructions
pi = doc.create_processing_instruction("xml-stylesheet",
  'type="text/xsl" href="style.xsl"')

# PI properties
pi.target              # Get PI target
pi.target = "new"      # Set PI target
pi.content             # Get PI content
pi.content = "new"     # Set PI content

# Node type checking
pi.processing_instruction? # Returns true

# Structure
pi.parent             # Get parent node
pi.remove             # Remove from document
pi.replace(node)      # Replace with another node

Attribute object

Attributes represent name-value pairs on elements.

# Attribute properties
attr.name              # Get attribute name
attr.name = "new"      # Set attribute name
attr.value            # Get attribute value
attr.value = "new"     # Set attribute value

# Namespace handling
attr.namespace         # Get attribute's namespace
attr.namespace = ns    # Set attribute's namespace

# Node type checking
attr.attribute?        # Returns true

Namespace object

Namespaces define XML namespaces used in the document.

# Namespace properties
ns.prefix             # Get namespace prefix
ns.uri               # Get namespace URI

# Formatting
ns.to_s              # Format as xmlns declaration

# Node type checking
ns.namespace?        # Returns true

Node traversal and inspection

Each node type provides methods for traversing the document structure:

node.parent              # Get parent node
node.children            # Get child nodes
node.next_sibling        # Get next sibling
node.previous_sibling    # Get previous sibling

# Convenience accessors
node.first_child         # Get first child
node.last_child          # Get last child
node.has_children?       # Check if node has children

# Node manipulation
node.clone              # Deep copy of node
node.dup                # Alias for clone

# Query methods
node.find(xpath)        # Alias for at_xpath
node.find_all(xpath)    # Returns array of matching elements

# Type checking
node.element?          # Is it an element?
node.text?             # Is it a text node?
node.cdata?            # Is it a CDATA section?
node.comment?          # Is it a comment?
node.processing_instruction? # Is it a PI?
node.attribute?        # Is it an attribute?
node.namespace?        # Is it a namespace?

# Node information
node.document          # Get owning document

Advanced features

XPath querying and node mapping

Nokogiri, Oga, REXML, LibXML

Moxml provides efficient XPath querying by leveraging the native XML library’s implementation while maintaining consistent node mapping:

# Find all book elements
books = doc.xpath('//book')
# Returns Moxml::Element objects mapped to native nodes

# Find with namespaces
titles = doc.xpath('//dc:title',
  'dc' => 'http://purl.org/dc/elements/1.1/')

# Find first matching node
first_book = doc.at_xpath('//book')

# Chain queries
doc.xpath('//book').each do |book|
  # Each book is a mapped Moxml::Element
  title = book.at_xpath('.//title')
  puts "#{book['id']}: #{title.text}"
end

Ox

Ox’s native query method, locate, resembles XPath but uses a different syntax.

HeadedOx

HeadedOx provides comprehensive, though not complete, XPath 1.0 support via a pure Ruby XPath engine layered on top of Ox.

Namespace handling

# Add namespace to element
element.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')

# Create element in namespace
title = doc.create_element('dc:title')
title.text = 'Document Title'

# Query with namespaces
doc.xpath('//dc:title',
  'dc' => 'http://purl.org/dc/elements/1.1/')

Accessing native implementation

While not typically needed, you can access the underlying XML library’s nodes:

# Get native node
native_node = element.native

# Get adapter being used
adapter = element.context.config.adapter

# Create from native node
element = Moxml::Element.new(native_node, context)

Error handling

Moxml provides comprehensive error classes with enhanced context and helpful hints for debugging. Each error class includes specific attributes relevant to the error type and provides detailed error messages with suggestions.

Error class hierarchy

All Moxml errors inherit from Moxml::Error (lib/moxml/error.rb:4), which itself inherits from StandardError.

Moxml::Error (< StandardError)
├── ParseError           # XML parsing failures
├── XPathError          # XPath expression errors
├── ValidationError     # XML validation failures
├── NamespaceError      # Namespace-related errors
├── AdapterError        # Adapter loading/operation errors
├── SerializationError  # XML serialization failures
├── DocumentStructureError # Invalid document structure
├── AttributeError      # Attribute operation errors
└── NotImplementedError # Unimplemented adapter features
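
To illustrate how such a hierarchy behaves, the sketch below defines stand-in classes under a hypothetical `Sketch` module. These are not Moxml's actual definitions; they only show the pattern of a shared base class, context attributes, and a to_s that appends a hint.

```ruby
# Illustrative stand-ins only; these are NOT Moxml's real class definitions.
module Sketch
  class Error < StandardError; end

  class ParseError < Error
    attr_reader :line, :column

    def initialize(message, line: nil, column: nil)
      super(message)
      @line = line
      @column = column
    end

    # Appends the location (when known) and a hint to the base message.
    def to_s
      parts = [super]
      parts << "at line #{line}, column #{column}" if line
      parts << "Hint: Check XML syntax and ensure all tags are properly closed"
      parts.join("\n")
    end
  end
end

begin
  raise Sketch::ParseError.new("Unexpected end of input", line: 3, column: 7)
rescue Sketch::Error => e   # rescuing the base class also catches subclasses
  puts e.class
  puts e.to_s
end
```

Rescuing the base class last, after the specific subclasses, gives a catch-all without swallowing unrelated StandardError exceptions.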

Enhanced error context

Each error class provides contextual information to aid debugging:

begin
  doc = context.parse(invalid_xml, strict: true)
rescue Moxml::ParseError => e
  # Enhanced parse errors include:
  puts e.line      # Line number where error occurred
  puts e.column    # Column number where error occurred
  puts e.source    # Excerpt of problematic XML
  puts e.to_s      # Full message with hints
  # Output includes helpful hint:
  # "Hint: Check XML syntax and ensure all tags are properly closed"
end

Error types and usage

ParseError

Raised when XML parsing fails. Includes line/column information when available.

begin
  doc = Moxml.new.parse("<invalid>", strict: true)
rescue Moxml::ParseError => e
  puts "Parse failed at line #{e.line}, column #{e.column}"
  puts e.to_s  # Includes hint for resolution
end

XPathError

Raised when XPath expression evaluation fails.

begin
  results = doc.xpath("//invalid[[[")
rescue Moxml::XPathError => e
  puts "Expression: #{e.expression}"
  puts "Adapter: #{e.adapter}"
  puts e.to_s  # Includes syntax verification hint
end

ValidationError

Raised when XML content violates XML specifications.

begin
  # Invalid XML version
  doc.version = "2.0"
rescue Moxml::ValidationError => e
  puts "Constraint: #{e.constraint}"  # "version"
  puts "Value: #{e.value}"            # "2.0"
  puts e.to_s  # Includes allowed values
end

NamespaceError

Raised when namespace operations fail.

begin
  element.add_namespace("ns", "invalid-uri")
rescue Moxml::NamespaceError => e
  puts "Prefix: #{e.prefix}"   # "ns"
  puts "URI: #{e.uri}"          # "invalid-uri"
  puts "Element: #{e.element}"  # Element reference
  puts e.to_s  # Includes registration hint
end

AdapterError

Raised when adapter loading or operations fail.

begin
  Moxml::Config.new.adapter = :nonexistent
rescue Moxml::AdapterError => e
  puts "Adapter: #{e.adapter_name}"      # :nonexistent
  puts "Operation: #{e.operation}"       # "set_adapter"
  puts "Native Error: #{e.native_error}" # Original error
  puts e.to_s  # Includes installation hint
end

SerializationError

Raised when XML serialization fails.

begin
  xml_output = node.to_xml
rescue Moxml::SerializationError => e
  puts "Node: #{e.node}"
  puts "Adapter: #{e.adapter}"
  puts e.to_s  # Includes structure validation hint
end

DocumentStructureError

Raised when attempting invalid document structure operations.

begin
  doc.root.add_child(invalid_node)
rescue Moxml::DocumentStructureError => e
  puts "Operation: #{e.attempted_operation}"
  puts "State: #{e.current_state}"
  puts e.to_s  # Includes XML spec reference hint
end

AttributeError

Raised when attribute operations fail.

begin
  element["123invalid"] = "value"  # Invalid attribute name
rescue Moxml::AttributeError => e
  puts "Attribute: #{e.attribute_name}"
  puts "Element: #{e.element}"
  puts "Value: #{e.value}"
  puts e.to_s  # Includes naming rules hint
end

NotImplementedError

Raised when an adapter doesn’t support a requested feature.

begin
  # Some operation not supported by current adapter
  result = adapter.unsupported_method
rescue Moxml::NotImplementedError => e
  puts "Feature: #{e.feature}"
  puts "Adapter: #{e.adapter}"
  puts e.to_s  # Includes adapter capability hint
end

Best practices for error handling

# Catch specific errors for targeted handling
begin
  doc = Moxml.new.parse(xml_string, strict: true)
  results = doc.xpath("//book[@id='123']")
rescue Moxml::ParseError => e
  # Handle parsing errors
  logger.error("XML parsing failed: #{e.to_s}")
  # e.to_s includes hints for fixing the issue
rescue Moxml::XPathError => e
  # Handle XPath errors
  logger.error("XPath query failed: #{e.expression}")
rescue Moxml::NamespaceError => e
  # Handle namespace errors
  logger.error("Namespace error: #{e.prefix}:#{e.uri}")
rescue Moxml::Error => e
  # Catch-all for other Moxml errors
  logger.error("XML processing error: #{e.message}")
end

All error messages include helpful hints for resolving common issues. Use the to_s method (lib/moxml/error.rb:16) to get the full error message with context and hints.

Configuration

General

Moxml can be configured globally or per instance.

# Global configuration
Moxml.configure do |config|
  config.default_adapter = :nokogiri
  config.strict = true
  config.encoding = 'UTF-8'
end

# Instance configuration
moxml = Moxml.new do |config|
  config.adapter = :oga
  config.strict = false
end

Default adapter selection

To select a non-default adapter, set it before processing any input using the following syntax.

Moxml::Config.default_adapter = <adapter-symbol>

where <adapter-symbol> is one of the following:

:rexml

REXML

:nokogiri

Nokogiri (default)

:oga

Oga

:ox

Ox

:libxml

LibXML

:headed_ox

HeadedOx (Ox parser + full XPath engine)

Thread safety

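Moxml is thread-safe as long as a single context is not used by several threads at once. Each instance maintains its own state, so either give each thread its own instance or guard a shared instance with a lock, as in this example: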

class XmlProcessor
  def initialize
    @mutex = Mutex.new
    @context = Moxml.new
  end

  def process(xml)
    @mutex.synchronize do
      doc = @context.parse(xml)
      # Modify document
      doc.to_xml
    end
  end
end
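
An alternative to locking is to give every thread its own context. The sketch below assumes that contexts are cheap enough to create once per thread; `PerThreadContext` is a hypothetical helper, and `Object.new` stands in for `Moxml.new` so the snippet runs without the gem.

```ruby
# Hypothetical helper: lazily builds one context per thread using the given
# factory block, so no lock is needed around parsing.
class PerThreadContext
  def initialize(&factory)
    @factory = factory
    @key = :"per_thread_context_#{object_id}"
  end

  # Returns the calling thread's own context, creating it on first use.
  def get
    current = Thread.current
    context = current.thread_variable_get(@key)
    unless context
      context = @factory.call
      current.thread_variable_set(@key, context)
    end
    context
  end
end

# Object.new stands in for Moxml.new here.
contexts = PerThreadContext.new { Object.new }

per_thread = Array.new(3) { Thread.new { contexts.get } }.map(&:value)
puts per_thread.uniq.size                  # 3: one context per thread
puts contexts.get.equal?(contexts.get)     # true: reused within a thread
```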

Performance considerations

Memory management

Moxml maintains a node registry to ensure consistent object mapping:

doc = context.parse(large_xml)
# Process document
doc = nil  # Allow garbage collection of document and registry
GC.start   # Force garbage collection if needed
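
One way a registry of this kind could be built is on top of Ruby's ObjectSpace::WeakMap. The sketch below is an assumption about the general technique, not a description of Moxml's internals; `NodeRegistry` and `Wrapper` are hypothetical names.

```ruby
# Hypothetical sketch, not Moxml's implementation: returns the same wrapper
# for the same native node. The WeakMap itself holds neither natives nor
# wrappers strongly, so dropping all outside references lets both be collected.
class NodeRegistry
  Wrapper = Struct.new(:native)

  def initialize
    @wrappers = ObjectSpace::WeakMap.new
  end

  # Looks up the wrapper for a native node, creating it on first request.
  def wrap(native)
    @wrappers[native] ||= Wrapper.new(native)
  end
end

registry = NodeRegistry.new
native = Object.new                 # stand-in for a native XML node
first = registry.wrap(native)
second = registry.wrap(native)
puts first.equal?(second)           # true: consistent object mapping
```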

Efficient querying

Use specific XPath expressions for better performance:

# More efficient - specific path
doc.xpath('//book/title')

# Less efficient - requires full document scan
doc.xpath('//title')

# Most efficient - direct child access
root.xpath('./*/title')

Best practices

Document creation

# Preferred - using builder pattern
doc = Moxml::Builder.new(Moxml.new).build do
  declaration version: "1.0", encoding: "UTF-8"
  element 'root' do
    element 'child' do
      text 'content'
    end
  end
end

# Alternative - direct manipulation
doc = Moxml.new.create_document
doc.add_child(doc.create_declaration("1.0", "UTF-8"))
root = doc.create_element('root')
doc.add_child(root)

Node manipulation

# Preferred - chainable operations
element
  .add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
  .add_child(doc.create_text('content'))

# Preferred - clear node type checking
if node.element?
  node.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
  node.add_child(doc.create_text('content'))
end

Specific adapter limitations

Ox adapter

XPath limitations

The Ox adapter uses a custom "XPath-to-locate" translation engine.

The following XPath features are NOT supported:

  • Attribute value predicates: //book[@id='123']

  • Logical operators: //book[@id and @title]

  • Position predicates: //book[1], //book[last()]

  • Text predicates: //book[text()='Title']

  • Namespace queries: //ns:element

  • Parent axis: //child/..

  • Sibling axes: following-sibling::*

  • XPath functions: count(), concat(), etc.

Workaround: Use Ruby enumerable methods after basic queries:

# Instead of: doc.xpath("//book[@id='123']")
# Use:
doc.xpath("//book").find { |book| book["id"] == "123" }
Important
For complete XPath 1.0 support without these limitations, use the Nokogiri or Oga adapter.
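
The same workaround extends to the other unsupported predicates. The sketch below uses a stand-in class, `FakeElement` (hypothetical), in place of the elements returned by a basic //book query, so it runs without Ox. It assumes the query result behaves like an Array; if it does not, calling to_a on it first is one option.

```ruby
# Stand-in for elements returned by doc.xpath("//book"); it responds to []
# and text like the elements shown earlier in this README.
class FakeElement
  attr_reader :text

  def initialize(text, attrs)
    @text = text
    @attrs = attrs
  end

  def [](name)
    @attrs[name]
  end
end

books = [
  FakeElement.new("Ruby Programming", "id" => "123", "title" => "Ruby"),
  FakeElement.new("XML Basics", "id" => "456"),
  FakeElement.new("Untitled", {}),
]

# Instead of //book[@id and @title]
with_both = books.select { |book| book["id"] && book["title"] }

# Instead of (//book)[1] and (//book)[last()]
first_book = books.first
last_book = books.last

# Instead of //book[text()='XML Basics']
by_text = books.select { |book| book.text == "XML Basics" }

# Instead of count(//book)
total = books.size

puts with_both.size, first_book.text, last_book.text, by_text.first["id"], total
```

Note that //book[1] in XPath selects the first book child of each parent, whereas `books.first` picks the first match overall, which corresponds to (//book)[1].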

HeadedOx adapter

General

The HeadedOx adapter combines Ox’s fast C-based XML parsing with Moxml’s comprehensive pure Ruby XPath 1.0 engine.

Because the XPath engine runs in Ruby on top of the parsed Ox tree, queries are not restricted to what the locate() method of the standard Ox adapter can express.

Note
Trivia: the "Headed Ox" implementation allows the Ox to head in the right direction to find the desired nodes through its comprehensive XPath layer.
Note
The HeadedOx adapter was added in v0.2.0.

For complete architectural details and implementation guide, see HeadedOx Documentation.

# Use HeadedOx adapter
context = Moxml.new(:headed_ox)
doc = context.parse(xml_string)

# Full XPath 1.0 support - All 27 functions work
books = doc.xpath('//book[@price < 20]')
count = doc.xpath('count(//book)')
titles = doc.xpath('//book/title[contains(., "Ruby")]')
cheap = doc.xpath('//book[@price <= sum(//book/@price) div count(//book)]')
Important
HeadedOx implements 6 of the 13 XPath axes. For complete XPath 1.0 support, use the Nokogiri or Oga adapter.

Features

  • Fast XML parsing (Ox C extension) - Same speed as standard Ox

  • 6 of 13 XPath axes (46% - covers 80% of common usage patterns)

  • Complex XPath predicates with numeric/string/boolean expressions

  • Basic namespace-aware XPath queries (Ox namespace limitations apply)

  • Expression compilation and caching (1000-entry LRU cache)

  • Document construction and serialization through Ox

Architecture

HeadedOx is a hybrid adapter that layers Moxml’s pure Ruby XPath engine on top of Ox’s fast C parser:

Architecture layers of HeadedOx
┌─────────────────────────────────────────┐
│     Moxml Unified API                   │
│  (Document, Element, Node, Builder)     │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│     HeadedOx Adapter Layer              │
│  (Delegates to Ox + XPath Engine)       │
└──────────────┬──────────────────────────┘
               │
      ┌────────┴─────────┐
      ├───────────┐      │
┌─────▼────┐ ┌────▼──────▼─────────────┐
│  Ox Gem  │ │ Moxml XPath Engine      │
│ (C Parse)│ │ (Pure Ruby)             │
└──────────┘ │  • Lexer (Tokenize)     │
             │  • Parser (AST Build)   │
             │  • Compiler (Ruby Gen)  │
             │  • Cache (1000 entries) │
             └─────────────────────────┘
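
To illustrate the cache layer in the diagram, here is a minimal sketch of a bounded LRU cache for compiled expressions. It is not Moxml's actual implementation: the class name XPathCache and the compile lambda are hypothetical, and the sketch relies only on Ruby's insertion-ordered Hash.

```ruby
# Hypothetical LRU cache sketch; Moxml's real cache may differ.
class XPathCache
  def initialize(max_size = 1000)
    @max_size = max_size
    @store = {} # Ruby hashes preserve insertion order
  end

  # Returns the cached value for expression, or computes it with the block.
  def fetch(expression)
    if @store.key?(expression)
      # Re-insert to mark this entry as most recently used.
      value = @store.delete(expression)
      @store[expression] = value
    else
      value = yield(expression)
      @store[expression] = value
      # Evict the least recently used entry (the oldest key).
      @store.shift if @store.size > @max_size
    end
    value
  end

  def key?(expression)
    @store.key?(expression)
  end

  def size
    @store.size
  end
end

compiled = []
compile = lambda do |expr|
  compiled << expr # record each real compilation
  "compiled:#{expr}"
end

cache = XPathCache.new(2)
cache.fetch("//a", &compile)
cache.fetch("//b", &compile)
cache.fetch("//a", &compile) # cache hit, no recompilation
cache.fetch("//c", &compile) # evicts "//b", the least recently used
```

Repeated queries hit the cache and skip the lexer, parser, and compiler stages, which is where the benefit for repeated queries comes from.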

Known limitations

The 16 known test failures fall into the categories below. They reflect architectural boundaries in the Ox gem, not bugs in HeadedOx:

  • ✗ Attribute wildcard syntax (@*) - Ox API limitation

  • ✗ Namespace introspection methods - Ox doesn’t expose namespace data

  • ✗ Parent node setter - Ox C struct immutability

  • ✗ CDATA end marker escaping - Complex nested ]]> sequences

  • ✗ Complex namespace inheritance - Ox parses but doesn’t track

  • ✗ Namespaced attribute access - element["ns:attr"] pattern

Important
These are Ox limitations, not HeadedOx bugs.

See the HeadedOx documentation for:

  • Detailed analysis of each limitation with examples

  • Workarounds and alternative approaches

  • Exact Ox API enhancements required for full compatibility

  • When to use HeadedOx vs other adapters decision guide

  • Future roadmap if Ox adds namespace introspection API

When to use HeadedOx

HeadedOx can replace Ox for most XML parsing needs. The exceptions are cases that require the advanced XPath and namespace features listed under "When not to use HeadedOx". Use HeadedOx when:

  • Need fast parsing + comprehensive XPath beyond Ox’s locate()

  • XPath functions are critical (count, sum, contains, substring, etc.)

  • Complex predicates required ([@price < average], [position() = last()])

  • Prefer pure Ruby XPath for debugging and customization

  • Basic namespace queries are sufficient

  • Document structure is mostly read-only

  • Performance matters but XPath features are non-negotiable

When not to use HeadedOx:

  • Need all 13 XPath axes (especially ancestor, sibling, following/preceding)

  • Advanced namespace operations required (introspection, complex inheritance)

  • Complex DOM modifications needed (parent node mutation)

  • CDATA escaping for nested markers is critical

  • Full Nokogiri feature parity required
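
The two checklists above can be condensed into a small decision helper. This is only a sketch: suggest_adapter and the requirement symbols are hypothetical names invented for illustration and are not part of Moxml.

```ruby
# Requirements that the lists above say HeadedOx cannot satisfy (hypothetical symbols).
BEYOND_HEADED_OX = %i[
  all_axes namespace_introspection parent_mutation nested_cdata_escaping
].freeze

# Requirements that plain Ox cannot satisfy but HeadedOx can (hypothetical symbols).
BEYOND_OX = %i[xpath_functions complex_predicates].freeze

# Returns an adapter name suitable for Moxml.new, based on stated requirements.
def suggest_adapter(requirements)
  needs = Array(requirements)
  return :nokogiri  if (needs & BEYOND_HEADED_OX).any? # Oga is the other full-XPath option
  return :headed_ox if (needs & BEYOND_OX).any?

  :ox
end

suggest_adapter(%i[xpath_functions])           # => :headed_ox
suggest_adapter(%i[xpath_functions all_axes])  # => :nokogiri
suggest_adapter([])                            # => :ox
```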

XPath capabilities

| Category | XPath 1.0 support | Details |
|----------|-------------------|---------|
| Functions | | All XPath 1.0 standard functions fully implemented and tested: String (10), Numeric (6), Boolean (4), Node (4), Position (2), Special (1) |
| Axes | 6/13 axes (46%) | ✓ Implemented: child, self, parent, descendant, descendant-or-self (//), attribute (@). ✗ Missing: ancestor, sibling families, following/preceding families, namespace. Coverage: 80% of real-world XPath usage patterns |
| Operators | | All comparison (=, !=, <, >, <=, >=), arithmetic (+, -, *, div, mod), logical (and, or), and union (\|) operators |
| Predicates | ✅ Core | Position predicates [1], [last()], boolean predicates, operator predicates, complex nested expressions |
| Parsing | ✅ Complete | Uses Ox’s C parser for maximum speed - fastest of all adapters |
| Caching | ✅ LRU Cache | 1000-entry cache for compiled XPath expressions - significant performance boost for repeated queries |
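
Since the parent axis is implemented but ancestor is not, an ancestor lookup can be emulated in Ruby by walking parent links. The sketch below uses a stand-in Struct (TreeNode, hypothetical) and assumes that real nodes respond to parent and return nil at the top of the tree.

```ruby
# Stand-in node type; TreeNode is hypothetical and only keeps the sketch runnable.
TreeNode = Struct.new(:name, :parent)

# Collects ancestors by following parent links, nearest ancestor first.
def ancestors_of(node)
  result = []
  current = node.parent
  while current
    result << current
    current = current.parent
  end
  result
end

library = TreeNode.new("library", nil)
shelf   = TreeNode.new("shelf", library)
book    = TreeNode.new("book", shelf)

ancestor_names = ancestors_of(book).map(&:name)
# ancestor_names is ["shelf", "library"]

# Emulates ancestor::library by filtering the collected nodes.
nearest_library = ancestors_of(book).find { |n| n.name == "library" }
```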

What XPath queries work in HeadedOx

Note
This list reflects v0.2.0.

The following XPath patterns are fully functional:

# Descendant searches
doc.xpath('//book')                        # ✅ Works
doc.xpath('//book/title')                  # ✅ Works

# Attribute selection
doc.xpath('//book/@price')                 # ✅ Works
doc.xpath('//@price')                      # ✅ Works

# Predicates with operators
doc.xpath('//book[@price < 20]')           # ✅ Works
doc.xpath('//book[1]')                     # ✅ Works (position)
doc.xpath('//book[last()]')                # ✅ Works (last position)
doc.xpath('//book[@price=10 or @price=30]')  # ✅ Works (logical)

# All 27 XPath 1.0 functions
doc.xpath('count(//book)')                           # ✅ Returns Float
doc.xpath('sum(//book/@price)')                      # ✅ Returns Float
doc.xpath('string(//title[1])')                      # ✅ Returns String
doc.xpath('concat("Price: ", //book/@price)')        # ✅ String concatenation
doc.xpath('contains(//title, "Ruby")')               # ✅ Boolean search
doc.xpath('substring(//title, 1, 5)')                # ✅ String extraction
doc.xpath('normalize-space(//title)')                # ✅ Whitespace handling
doc.xpath('boolean(//book[@price])')                 # ✅ Boolean conversion
doc.xpath('floor(//book/@price)')                    # ✅ Numeric rounding
doc.xpath('starts-with(//title, "Ruby")')            # ✅ Prefix checking

# Complex queries with function composition
doc.xpath('//book[@price < 25]/title')                # ✅ Chained paths
doc.xpath('//book[contains(title, "Ruby")]')          # ✅ Functions in predicates
doc.xpath('//book[position() = last()]')              # ✅ Position functions
doc.xpath('//book[string-length(title) > 10]')        # ✅ String functions
doc.xpath('//book[@price < sum(//book/@price) div count(//book)]') # ✅ Complex arithmetic
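
The last query selects books priced below the average: sum(//book/@price) div count(//book) is the mean price, and the predicate compares each book against it. The plain Ruby sketch below walks through the same arithmetic on made-up sample data (the titles and prices are illustrative assumptions).

```ruby
# Illustrative sample data, not taken from any real document.
books = [
  { title: "Ruby Basics",    price: 10.0 },
  { title: "XML in Depth",   price: 20.0 },
  { title: "Advanced XPath", price: 30.0 }
]

total   = books.sum { |b| b[:price] } # sum(//book/@price)   => 60.0
count   = books.size                  # count(//book)        => 3
average = total / count               # 60.0 div 3           => 20.0

# //book[@price < sum(//book/@price) div count(//book)]
below_average = books.select { |b| b[:price] < average }
below_titles  = below_average.map { |b| b[:title] }
# below_titles is ["Ruby Basics"]; 20.0 is not strictly below the average
```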

LibXML adapter

DOCTYPE Limitations:

  • DOCTYPE parsing works

  • DOCTYPE round-trip preservation is limited

  • DOCTYPE cannot be reliably re-serialized after parsing

Performance:

  • Serialization speed: ~120 iterations per second (ips), slower than target

  • Parsing speed: Good

  • For high-throughput serialization, consider Ox or Nokogiri

Other adapters

Nokogiri, Oga, REXML:

All three adapters have near-complete feature support with only minor edge case limitations. Use these adapters when you need full XPath and namespace support.

Development and testing

Skipping benchmarks

Benchmark tests can be slow and are not needed for regular test runs. To speed up local development, you can skip benchmark tests using the SKIP_BENCHMARKS environment variable.

Syntax:

SKIP_BENCHMARKS=1 bundle exec rspec

This will skip all benchmark tests while running the regular test suite.
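
One way such a switch could be wired up in a spec helper is sketched below. This is an assumption about the mechanism, not a copy of Moxml's configuration: the skip_benchmarks? helper, the set of values treated as "off", and the benchmark: true metadata tag are all hypothetical. The RSpec call is guarded so the snippet also runs outside a test session.

```ruby
# Hypothetical helper: true when SKIP_BENCHMARKS is set to a non-empty,
# non-negative value such as "1" or "true".
def skip_benchmarks?(env = ENV)
  value = env.fetch("SKIP_BENCHMARKS", "").strip.downcase
  !value.empty? && !%w[0 false no].include?(value)
end

# Only configure RSpec when it is actually loaded.
if defined?(RSpec) && RSpec.respond_to?(:configure)
  RSpec.configure do |config|
    # Excludes examples tagged with benchmark: true (assumed tag name).
    config.filter_run_excluding(benchmark: true) if skip_benchmarks?
  end
end
```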

To run benchmarks explicitly:

bundle exec rspec spec/moxml/examples/xpath_benchmark_spec.rb

Or use the rake task:

rake benchmark:xpath
Note
The rake benchmark:xpath task always runs benchmarks regardless of the SKIP_BENCHMARKS environment variable setting.

Running tests with coverage

To run the test suite with code coverage tracking:

COVERAGE=true bundle exec rspec

After running, view the coverage report:

open coverage/index.html

The coverage configuration includes:

  • Minimum overall coverage: 90%

  • Minimum per-file coverage: 80%

  • Organized groups for Core, Adapters, and Utilities

  • Filters for spec and vendor directories

Generating performance benchmark reports

Moxml provides a comprehensive benchmark reporting system that measures and compares all adapters across multiple dimensions.

Running the benchmark report

To generate a complete performance report for all adapters:

rake benchmark:report

Or run the script directly:

bundle exec ruby benchmarks/generate_report.rb

This will benchmark all available adapters and generate a detailed report at benchmarks/PERFORMANCE_REPORT.md.

Benchmark categories

The report includes the following benchmark categories:

Parsing benchmarks:

  • Simple XML (< 1KB)

  • Medium XML (10KB, 50 elements with namespaces)

  • Large XML (145KB, 500 elements)

  • Complex nested structures

Serialization benchmarks:

  • Simple documents

  • Documents with namespaces

  • Documents with mixed content

XPath benchmarks:

  • Simple queries (//element)

  • Complex queries with predicates (//element[@attribute])

  • Namespace-aware queries (//ns:element)

Memory benchmarks:

  • Memory usage per document parse

  • Memory usage for large documents

Report contents

The generated report includes:

  • Summary table comparing all adapters with grades

  • Detailed performance metrics for each benchmark

  • ASCII performance visualization charts

  • Adapter selection recommendations based on results

  • Complete test environment details (Ruby version, platform, gem versions)

  • Best performers for each category

Viewing the report

After generation, view the report with:

cat benchmarks/PERFORMANCE_REPORT.md

Or open it in your preferred Markdown viewer.

Note
The generated report is machine-specific and excluded from git via .gitignore. Results will vary based on your hardware, OS, and Ruby version.

Example output

The summary table shows comparative performance:

| Adapter | Parse (ips) | Serialize (ips) | XPath (ips) | Memory (MB) | Grade |
|---------|-------------|-----------------|-------------|-------------|-------|
| Nokogiri | 76 | 13900 | 64958 | -0.1 ⭐ | A |
| Ox | 289 | 39203 | 9640 | 0.0 ⭐⭐⭐⭐⭐ | A+ |
...

Performance visualizations help quickly identify the best adapter for specific needs:

Parsing (Medium XML):
  Nokogiri   █████████████ 76 ips
  Ox         ██████████████████████████████████████████████████ 289 ips
...
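
A chart like the one above can be produced by scaling each bar to the best result. The following sketch assumes linear scaling to a maximum width of 50 characters. The bar_chart method is a hypothetical illustration, not the report generator's actual code.

```ruby
# Renders one line per adapter, with bars scaled linearly to the best result.
def bar_chart(results, width: 50)
  max = results.values.max.to_f
  label_width = results.keys.map(&:length).max
  results.map do |name, ips|
    bar = "█" * (ips / max * width).round
    format("  %-#{label_width}s %s %d ips", name, bar, ips)
  end
end

# Parse results taken from the example summary table above.
lines = bar_chart({ "Nokogiri" => 76, "Ox" => 289 })
puts "Parsing (Medium XML):"
puts lines
```

With these inputs Ox gets the full 50-character bar and Nokogiri gets 76 / 289 × 50 ≈ 13 characters.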

Contributing

  1. Fork the repository

  2. Create your feature branch (git checkout -b feature/my-new-feature)

  3. Commit your changes (git commit -am 'Add some feature')

  4. Push to the branch (git push origin feature/my-new-feature)

  5. Create a new Pull Request

License

Copyright Ribose.

This project is licensed under the BSD-2-Clause License. See the LICENSE.md file for details.