Moxml: Modern XML processing for Ruby

Contents

Introduction and purpose
Supported XML libraries
- General
- Feature table
Adapter comparison
- Feature compatibility matrix
- Adapter selection guide
Getting started
- Installation
- Basic document creation
Real-world examples
Working with documents
- Using the builder pattern
- Direct document manipulation
- Fluent interface API
- SAX (Event-Driven) Parsing
XML objects and their methods
- Node identity
Advanced features
- XPath querying
- Namespace handling
Error handling
Configuration
Thread safety
Performance considerations
Best practices
Specific adapter limitations
- Ox adapter
- HeadedOx adapter
- LibXML adapter
- Other adapters
Development and testing
Contributing
License

Introduction and purpose

Moxml provides a unified, modern XML processing interface for Ruby applications. It offers a consistent API that abstracts away the underlying XML implementation details while maintaining high performance through efficient node mapping and native XPath querying.

Key features:

Intuitive, Ruby-idiomatic API for XML manipulation
Consistent interface across different XML libraries
Efficient node mapping for XPath queries
Support for all XML node types and features
Easy switching between XML processing engines
Clean separation between interface and implementation

Supported XML libraries

General

Moxml supports the following XML libraries:

REXML: REXML, a pure Ruby XML parser distributed with standard Ruby. Not the fastest, but always available.
Nokogiri: (default) Nokogiri, a widely used implementation which wraps around the performant libxml2 C library.
Oga: Oga, a pure Ruby XML parser. Recommended when you need a pure Ruby solution, for example for Opal.
Ox: Ox, a fast XML parser.
LibXML: libxml-ruby, Ruby bindings for the performant libxml2 C library. Alternative to Nokogiri with similar performance characteristics.

Feature table

Moxml makes a best effort to provide a consistent interface across basic XML features. However, the underlying XML libraries differ in their features and capabilities.

The following table summarizes the features supported by each library.

Note	The checkmarks indicate support for the feature, while the footnotes provide additional context for specific features.

Feature	Nokogiri	Oga	REXML	LibXML	Ox	HeadedOx
Parsing, serializing	✅	✅	✅	✅	✅	✅
SAX parsing	✅ Full (10/10 events)	✅ Full (10/10 events)	✅ Full (10/10 events)	✅ Full (10/10 events)	⚠️ Core (4/10 events) See NOTE 7.	⚠️ Core (4/10 events) See NOTE 7.
Node manipulation	✅	✅	✅	✅	✅ See NOTE 1.	✅ See NOTE 1.
Basic XPath	✅	✅	✅	✅	Uses Ox-specific API `locate`. See NOTE 2.	✅ Full XPath 1.0. See NOTE 3.
XPath with namespaces	✅	✅	❌	✅	Uses Ox-specific API `locate`. See NOTE 2.	⚠️ Basic. See NOTE 3.

Note 1	Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure.

Note 2	Ox: Limited XPath support via the `locate()` method. See the adapter limitations section.

Note 3	HeadedOx provides full XPath 1.0 support via a pure Ruby XPath engine layered on top of Ox’s C parser. See the HeadedOx documentation for details.

Note 7	Ox/HeadedOx SAX: Only core events are supported (start_element, end_element, characters, errors). There are no separate CDATA, comment, or processing instruction events.

Adapter comparison

Feature compatibility matrix

Feature/Operation	Nokogiri	Oga	REXML	LibXML	Ox	HeadedOx
Core Operations
Parse XML string	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Parse XML file/IO	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Serialize to XML	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Element Operations
Create elements	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Get/set attributes	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Add/remove children	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Replace nodes	✅ Full	✅ Full	✅ Full	✅ Full	⚠️ Limited¹	⚠️ Limited¹
Namespace Operations
Add namespaces	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Default namespaces	✅ Full	✅ Full	✅ Full	✅ Full	⚠️ Basic	⚠️ Basic
Namespace inheritance	✅ Full	✅ Full	✅ Full	✅ Full	❌ None	❌ None⁵
Namespaced attributes	✅ Full	✅ Full	✅ Full	✅ Full	⚠️ Limited	⚠️ Limited⁵
XPath Queries
Basic paths (`//element`)	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Attribute predicates (`[@id]`)	✅ Full	✅ Full	✅ Full	✅ Full	⚠️ Existence only²	✅ Full
Attribute values (`[@id='123']`)	✅ Full	✅ Full	✅ Full	✅ Full	❌ None⁴	✅ Full
Logical operators (`[@a and @b]`)	✅ Full	✅ Full	✅ Full	✅ Full	❌ None	✅ Full
Position predicates (`[1]`, `[last()]`)	✅ Full	✅ Full	✅ Full	✅ Full	❌ None	✅ Full
Text predicates (`[text()='x']`)	✅ Full	✅ Full	✅ Full	✅ Full	❌ None	✅ Full
Namespace-aware queries	✅ Full	✅ Full	✅ Full	✅ Full	❌ None	⚠️ Basic⁵
Parent axis (`..`)	✅ Full	✅ Full	✅ Full	✅ Full	❌ None	✅ Full
Sibling axes	✅ Full	✅ Full	✅ Full	✅ Full	❌ None	❌ None⁵
XPath functions (`count()`, etc.)	✅ Full	✅ Full	✅ Full	✅ Full	❌ None	✅ All 27
Special Content
CDATA sections	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Comments	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Processing instructions	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
DOCTYPE declarations	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full	✅ Full
Performance
Parse speed	Fast	Fast	Medium	Fast	Very Fast	Very Fast
Serialize speed	Fast	Fast	Medium	Medium	Very Fast	Very Fast
Memory usage	Good	Medium	Medium	Good	Excellent	Excellent
Thread safety	✅ Yes	✅ Yes	✅ Yes	✅ Yes	✅ Yes	✅ Yes

¹ Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure
² Ox: //book[@id] works (returns all book elements), but doesn’t filter by attribute existence
³ HeadedOx: Full XPath 1.0 with all 27 functions and 6 axes. Pure Ruby XPath engine on Ox’s C parser. 99.20% pass rate. See docs/headed-ox.adoc
⁴ Ox: Use .find { |el| el["id"] == "123" } instead of XPath attribute value predicates
⁵ HeadedOx limitations: Namespace introspection and 7 axes not implemented. See docs/HEADED_OX_LIMITATIONS.md

Adapter selection guide

Choose Nokogiri when:

You need industry-standard compatibility
Large community support is important
C extension performance is acceptable
Cross-platform deployment is required

Choose Oga when:

Pure Ruby environment is required (JRuby, TruffleRuby)
Best test coverage is needed (98%)
No C extensions are allowed
Memory usage is not the primary concern

Choose REXML when:

Standard library only (no external gems)
Maximum portability is required
Small to medium documents
Deployment simplicity is critical

Choose LibXML when:

Alternative to Nokogiri is desired
Full namespace support is required
Good performance with correctness
Native C extension is acceptable

Choose Ox when:

Maximum parsing speed is critical
Simple document structures (limited nesting)
XPath usage is minimal or absent
Memory efficiency is paramount

Choose HeadedOx when:

Need Ox’s fast parsing with full XPath support
Want comprehensive XPath 1.0 features (functions, predicates)
Prefer pure Ruby XPath implementation for debugging
Need more XPath capabilities than standard Ox provides
Memory efficiency is important but XPath features are required

Caution

Ox’s custom XPath engine supports common patterns but cannot handle complex XPath expressions. Test thoroughly if your use case requires advanced XPath.

TODO: We should raise errors when unsupported XPath features are used with Ox or HeadedOx to prevent silent failures.
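
The selection guide above can be condensed into a small decision helper. This is only a sketch: `pick_adapter` and its keyword flags are hypothetical names introduced here for illustration, not part of Moxml, and the symbols `:rexml` and `:ox` are assumed to follow the same naming as `:nokogiri`, `:oga`, and `:headed_ox` used elsewhere in this document.

```ruby
# Hypothetical helper (not part of Moxml) that encodes the guide above.
# Returns an adapter symbol based on a few coarse requirements.
def pick_adapter(stdlib_only: false, pure_ruby: false, max_speed: false, full_xpath: true)
  return :rexml if stdlib_only                  # no external gems at all
  return :oga if pure_ruby                      # no C extensions allowed
  return :headed_ox if max_speed && full_xpath  # Ox parsing plus XPath 1.0 engine
  return :ox if max_speed                       # fastest parsing, minimal XPath

  :nokogiri                                     # default, widely used choice
end
```

The result could then be passed to `Moxml.new`, in the same way `:headed_ox` is passed in the HeadedOx adapter section. Real projects will likely weigh more factors than these four flags.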

Getting started

Installation

Install the gem and at least one supported XML library:

# In your Gemfile
gem 'moxml'
gem 'nokogiri'  # Or 'oga', 'rexml', 'ox', or 'libxml-ruby'

Basic document creation

doc = Moxml.new.create_document

# Add XML declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))

# Create root element with namespace
root = doc.create_element('book')
root.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
doc.add_child(root)

# Add content
title = doc.create_element('dc:title')
title.text = 'XML Processing with Ruby'
root.add_child(title)

# Output formatted XML
puts doc.to_xml(indent: 2)

Real-world examples

Practical, runnable examples demonstrating Moxml usage in common scenarios are available in the examples directory.

These examples include:

RSS Parser: Parse RSS/Atom feeds with XPath queries and namespace handling
Web Scraper: Extract data from HTML/XML using DOM navigation and table parsing
API Client: Build and parse XML API requests/responses with SOAP

Each example is:

Fully documented with detailed README
Self-contained and runnable
Demonstrates best practices
Includes sample data files
Shows comprehensive error handling

Run any example directly:

ruby examples/rss_parser/rss_parser.rb
ruby examples/web_scraper/web_scraper.rb
ruby examples/api_client/api_client.rb

See the examples README for complete documentation and learning paths.

Working with documents

Using the builder pattern

The builder pattern provides a clean DSL for creating XML documents:

doc = Moxml::Builder.new(Moxml.new).build do
  declaration version: "1.0", encoding: "UTF-8"

  element 'library', xmlns: 'http://example.org/library' do
    element 'book' do
      element 'title' do
        text 'Ruby Programming'
      end

      element 'author' do
        text 'Jane Smith'
      end

      comment 'Publication details'
      element 'published', year: '2024'

      cdata '<custom>metadata</custom>'
    end
  end
end

Direct document manipulation

doc = Moxml.new.create_document

# Add declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))

# Create root with namespace
root = doc.create_element('library')
root.add_namespace(nil, 'http://example.org/library')
root.add_namespace("dc", "http://purl.org/dc/elements/1.1/")
doc.add_child(root)

# Add elements with attributes
book = doc.create_element('book')
book['id'] = 'b1'
book['type'] = 'technical'
root.add_child(book)

# Add mixed content
book.add_child(doc.create_comment('Book details'))
title = doc.create_element('title')
title.text = 'Ruby Programming'
book.add_child(title)

Fluent interface API

Moxml provides a fluent, chainable API for improved developer experience:

element = doc.create_element('book')
  .set_attributes(id: "123", type: "technical")
  .with_namespace("dc", "http://purl.org/dc/elements/1.1/")
  .with_child(doc.create_element("title"))

For complete fluent API documentation including all chainable methods, convenience methods, and practical examples, see the Working with Documents Guide.

SAX (Event-Driven) Parsing

SAX (Simple API for XML) provides memory-efficient, event-driven XML parsing for large documents.

When to use SAX:

Processing very large XML files (>100MB)
Memory-constrained environments
Streaming data extraction
Need to process data as it arrives

Quick example:

class BookExtractor < Moxml::SAX::ElementHandler
  attr_reader :books

  def initialize
    super
    @books = []
  end

  def on_start_element(name, attributes = {}, namespaces = {})
    super
    @books << { id: attributes["id"] } if name == "book"
  end
end

handler = BookExtractor.new
Moxml.new.sax_parse(xml_string, handler)
puts handler.books.inspect

For complete SAX documentation including all handler types, event methods, adapter support, and best practices, see the SAX Parsing Guide.

XML objects and their methods

For the complete node API reference including traversal methods, manipulation, queries, type checking, and node information, see the Node API Reference.

Node identity

Moxml provides a consistent #identifier method across all node types to safely identify nodes:

element = doc.at_xpath("//book")
puts element.identifier  # => "book"

attr = element.attribute("id")
puts attr.identifier     # => "id"

The #identifier method returns the primary identifier for each node type (tag name for elements, attribute name for attributes, target for processing instructions, or nil for content nodes).

Important

Always use type-safe patterns when working with mixed node types. See the Node API Consistency Guide for complete documentation on safe coding patterns, API surface by node type, and migration guidelines.
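
As one possible illustration of a type-safe pattern: because `#identifier` returns nil for content nodes, a mixed list of nodes can be summarized without checking node classes. `FakeNode` and `named_identifiers` are hypothetical names; the Struct is a stand-in so the sketch runs without Moxml installed, and real nodes are only assumed to respond to `#identifier` as described above.

```ruby
# Stand-in for Moxml nodes so the sketch runs without the gem installed.
# Real nodes are only assumed to respond to #identifier.
FakeNode = Struct.new(:identifier)

# Collect the identifiers of named nodes, skipping content nodes
# (text, comments, CDATA), whose identifier is nil.
def named_identifiers(nodes)
  nodes.map(&:identifier).compact
end

mixed = [FakeNode.new("book"), FakeNode.new(nil), FakeNode.new("id")]
puts named_identifiers(mixed).inspect
```

The Node API Consistency Guide remains the authoritative source for the recommended patterns.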

Advanced features

XPath querying

Moxml provides efficient XPath querying with consistent node mapping:

# Find all book elements
books = doc.xpath('//book')

# Find with namespaces
titles = doc.xpath('//dc:title', 'dc' => 'http://purl.org/dc/elements/1.1/')

# Find first matching node
first_book = doc.at_xpath('//book')

Namespace handling

# Add namespace to element
element.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')

# Create element in namespace
title = doc.create_element('dc:title')

For complete documentation on XPath querying, namespace handling, and accessing native implementations, see the Advanced Features Guide.

Error handling

Moxml provides comprehensive error classes with enhanced context for debugging:

begin
  doc = Moxml.new.parse(xml_string, strict: true)
  results = doc.xpath("//book[@id='123']")
rescue Moxml::ParseError => e
  puts "Parse failed at line #{e.line}: #{e.message}"
rescue Moxml::XPathError => e
  puts "XPath error: #{e.expression}"
rescue Moxml::Error => e
  puts "XML processing error: #{e.message}"
end

For the complete error class hierarchy, error types, best practices, and debugging techniques, see the Error Handling Guide.

Configuration

Moxml can be configured globally or per instance:

# Global configuration
Moxml.configure do |config|
  config.default_adapter = :nokogiri
  config.strict = true
  config.encoding = 'UTF-8'
end

# Instance configuration
context = Moxml.new do |config|
  config.adapter = :oga
  config.strict = false
end

For all configuration options, adapter selection, serialization options, and environment-based configuration, see the Configuration Guide.
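
Environment-based configuration might look roughly like the following sketch. The helper `adapter_from_env`, the constant `KNOWN_ADAPTERS`, and the variable name `MOXML_ADAPTER` are all illustrative names chosen here, not documented Moxml features; the symbols `:rexml`, `:ox`, and `:libxml` are assumed by analogy with `:nokogiri`, `:oga`, and `:headed_ox`.

```ruby
# Adapter names as used in this document; :rexml, :ox and :libxml are assumed.
KNOWN_ADAPTERS = %i[nokogiri oga rexml ox headed_ox libxml].freeze

# Hypothetical helper: read the adapter name from an environment variable.
# MOXML_ADAPTER is an illustrative variable name chosen for this sketch.
def adapter_from_env(env = ENV, default: :nokogiri)
  name = env.fetch("MOXML_ADAPTER", default.to_s).strip.downcase.to_sym
  return name if KNOWN_ADAPTERS.include?(name)

  raise ArgumentError, "unknown adapter #{name.inspect}"
end
```

The returned symbol could then be assigned to `config.default_adapter` inside the `Moxml.configure` block shown above. Failing fast on an unknown name avoids discovering a typo only when the first document is parsed.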

Thread safety

For complete information on thread-safe patterns, context management, and concurrent processing, see the Thread Safety Guide.

Performance considerations

For detailed performance optimization strategies, memory management best practices, and efficient querying patterns, see the Performance Considerations Guide.

Best practices

For comprehensive best practices covering XPath queries, adapter selection, error handling, namespace handling, memory management, thread safety, performance optimization, and testing strategies, see the Best Practices Guide.

Specific adapter limitations

Ox adapter

The Ox adapter provides maximum parsing speed but has XPath limitations.

XPath limitations:

No attribute value predicates: //book[@id='123'] ❌
No logical operators, position predicates, text predicates ❌
No namespace queries, parent axis, sibling axes ❌
No XPath functions ❌

Workaround: Use Ruby enumerable methods:

# Instead of: doc.xpath("//book[@id='123']")
doc.xpath("//book").find { |book| book["id"] == "123" }

For complete Ox adapter documentation including all limitations and workarounds, see the Ox Adapter Guide.
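
The Enumerable workaround shown above can be extended to some of the other unsupported predicate types. The sketch below uses a stand-in `Book` Struct so it runs without Moxml or Ox; with Moxml, the list would come from `doc.xpath("//book")`. The helper names are hypothetical, and the `text` reader on elements is an assumption (this document only shows the `text=` writer). The XPath expressions in the comments are approximate equivalents, not exact ones.

```ruby
# Stand-in for element nodes; mimics attribute access via [] and a text reader.
Book = Struct.new(:attrs, :text) do
  def [](name)
    attrs[name]
  end
end

# Roughly //book[@a and @b]: keep nodes that carry every listed attribute.
def with_attributes(nodes, *names)
  nodes.select { |node| names.all? { |name| node[name] } }
end

# Roughly //book[text()='value']: keep nodes whose text equals the value.
def with_text(nodes, value)
  nodes.select { |node| node.text == value }
end

# Roughly (//book)[1] and (//book)[last()]. Enumerable has no #last,
# so the collection is converted to an Array first.
def first_and_last(nodes)
  list = nodes.to_a
  [list.first, list.last]
end
```

Filtering in Ruby loads every candidate node first, so on very large documents it may cost more than a native XPath predicate would.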

HeadedOx adapter

The HeadedOx adapter combines Ox’s fast C-based XML parsing with Moxml’s comprehensive pure Ruby XPath 1.0 engine.

Status: Production-ready v1.2 (99.20% pass rate, 1,992/2,008 tests)

Key features:

Fast XML parsing (Ox C extension)
All 27 XPath 1.0 functions
6 XPath axes (child, descendant, parent, attribute, self, descendant-or-self)
Expression caching for performance
Pure Ruby XPath engine (debuggable)

When to use:

Need Ox’s fast parsing with comprehensive XPath
Want XPath functions (count, sum, contains, etc.)
Prefer pure Ruby XPath for debugging
Basic namespace queries are sufficient

# Use HeadedOx adapter
context = Moxml.new(:headed_ox)
doc = context.parse(xml_string)

# Full XPath 1.0 support
books = doc.xpath('//book[@price < 20]')
count = doc.xpath('count(//book)')
titles = doc.xpath('//book/title[contains(., "Ruby")]')

For complete HeadedOx documentation including architecture, XPath capabilities, known limitations, and usage examples, see the HeadedOx Adapter Guide and the Limitations Documentation.

LibXML adapter

Performance:

Serialization speed: ~120 ips (iterations per second; slower than target)
Parsing speed: Good
For high-throughput serialization, consider Ox or Nokogiri
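
To check whether serialization speed matters for your own documents, a rough iterations-per-second measurement may be enough. The helper below is a hypothetical sketch using only Ruby's monotonic clock; it is not the benchmark harness behind the figure above, and results will vary with document size and hardware.

```ruby
# Hypothetical helper: run the given block repeatedly and report a rough
# iterations-per-second figure based on the monotonic clock.
def iterations_per_second(iterations: 100)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  iterations / elapsed
end
```

With a parsed document in hand, a call such as `iterations_per_second { doc.to_xml }` would give a number to compare across adapters. For more careful measurements, a dedicated benchmarking tool that handles warm-up is preferable.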

Other adapters

Nokogiri, Oga, REXML:

All three adapters have near-complete feature support with only minor edge case limitations. Use these adapters when you need full XPath and namespace support.

Development and testing

For complete information on development setup, testing strategies, benchmarking, and coverage reporting, see the Development and Testing Guide.

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin feature/my-new-feature)
Create a new Pull Request

License

This project is licensed under the Ribose 3-Clause BSD License. See the LICENSE.md file for details.