Moxml: Modern XML processing for Ruby
- Introduction and purpose
- Supported XML libraries
- Feature table
- Adapter comparison
- Feature compatibility matrix
- Adapter selection guide
- Getting started
- Installation
- Basic document creation
- Real-world examples
- Working with documents
- Using the builder pattern
- Direct document manipulation
- Fluent interface API
- Chainable element methods
- Convenience query methods
- Quick element creation
- Practical fluent example
- XML objects and their methods
- Document object
- Element object
- Text object
- CDATA object
- Comment object
- Processing instruction object
- Attribute object
- Namespace object
- Node traversal and inspection
- Advanced features
- XPath querying and node mapping
- Nokogiri, Oga, REXML, LibXML
- Ox
- HeadedOx
- Namespace handling
- Accessing native implementation
- XPath querying and node mapping
- Error handling
- Error class hierarchy
- Enhanced error context
- Error types and usage
- ParseError
- XPathError
- ValidationError
- NamespaceError
- AdapterError
- SerializationError
- DocumentStructureError
- AttributeError
- NotImplementedError
- Best practices for error handling
- Configuration
- General
- Default adapter selection
- Thread safety
- Performance considerations
- Memory management
- Efficient querying
- Best practices
- Document creation
- Node manipulation
- Specific adapter limitations
- Ox adapter
- XPath limitations
- HeadedOx adapter
- General
- Features
- Architecture
- Known limitations
- When to Use HeadedOx
- XPath capabilities
- What XPath queries work in HeadedOx
- LibXML adapter
- Other adapters
- Ox adapter
- Development and testing
- Skipping benchmarks
- Running tests with coverage
- Generating performance benchmark reports
- Running the benchmark report
- Benchmark categories
- Report contents
- Viewing the report
- Example output
- Contributing
- License
Introduction and purpose
Moxml provides a unified, modern XML processing interface for Ruby applications. It offers a consistent API that abstracts away the underlying XML implementation details while maintaining high performance through efficient node mapping and native XPath querying.
Key features:
-
Intuitive, Ruby-idiomatic API for XML manipulation
-
Consistent interface across different XML libraries
-
Efficient node mapping for XPath queries
-
Support for all XML node types and features
-
Easy switching between XML processing engines
-
Clean separation between interface and implementation
Supported XML libraries
Moxml supports the following XML libraries:
- REXML
-
REXML, a pure Ruby XML parser distributed with standard Ruby. Not the fastest, but always available.
- Nokogiri
-
(default) Nokogiri, a widely used implementation which wraps around the performant libxml2 C library.
- Oga
-
Oga, a pure Ruby XML parser. Recommended when you need a pure Ruby solution, for example for Opal.
- Ox
-
Ox, a fast XML parser.
- LibXML
-
libxml-ruby, Ruby bindings for the performant libxml2 C library. Alternative to Nokogiri with similar performance characteristics.
Feature table
Moxml makes a best effort to provide a consistent interface across basic XML features, but the underlying XML libraries differ in their features and capabilities.
The following table summarizes the features supported by each library.
|
Note
|
The checkmarks indicate support for the feature, while the footnotes provide additional context for specific features. |
| Feature | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx |
|---|---|---|---|---|---|---|
| Parsing, serializing | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Node manipulation | ✅ | ✅ | ✅ | ✅ | ✅ See NOTE 1. | ✅ See NOTE 1. |
| Basic XPath | ✅ | ✅ | ✅ | ✅ | Uses Ox-specific API | ✅ Full XPath 1.0. See NOTE 3. |
| XPath with namespaces | ✅ |
|
Note
|
Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure. |
|
Note
|
Limited XPath support via locate() method. See adapter limitations
section.
|
|
Note
|
HeadedOx provides full XPath 1.0 support via a pure Ruby XPath engine layered on top of Ox’s C parser. See HeadedOx documentation for details. |
Adapter comparison
Feature compatibility matrix
| Feature/Operation | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx |
|---|---|---|---|---|---|---|
| Core Operations | | | | | | |
| Parse XML string | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Parse XML file/IO | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Serialize to XML | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Element Operations | | | | | | |
| Create elements | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Get/set attributes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Add/remove children | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Replace nodes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited¹ | ⚠️ Limited¹ |
| Namespace Operations | | | | | | |
| Add namespaces | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Default namespaces | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Basic | ⚠️ Basic |
| Namespace inheritance | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ❌ None⁶ |
| Namespaced attributes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited | ⚠️ Limited⁶ |
| XPath Queries | | | | | | |
| Basic paths | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Attribute predicates | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Existence only² | ✅ Full |
| Attribute values | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None⁴ | ✅ Full |
| Logical operators | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Position predicates | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Text predicates | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Namespace-aware queries | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ⚠️ Basic⁶ |
| Parent axis | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Sibling axes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ❌ None⁶ |
| XPath functions | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ All 27 |
| Special Content | | | | | | |
| CDATA sections | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Comments | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Processing instructions | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| DOCTYPE declarations | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited⁵ | ✅ Full | ✅ Full |
| Performance | | | | | | |
| Parse speed | Fast | Fast | Medium | Fast | Very Fast | Very Fast |
| Serialize speed | Fast | Fast | Medium | Medium | Very Fast | Very Fast |
| Memory usage | Good | Medium | Medium | Good | Excellent | Excellent |
| Thread safety | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
1 Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure
2 Ox: //book[@id] works (returns all book elements), but doesn’t filter by attribute existence
3 HeadedOx: Full XPath 1.0 with all 27 functions and 6 axes. Pure Ruby XPath engine on Ox’s C parser. 99.20% pass rate. See docs/headed-ox.adoc
4 Ox: Use .find { |el| el["id"] == "123" } instead of XPath attribute value predicates
5 LibXML: DOCTYPE parsing works, serialization is limited (no round-trip preservation)
6 HeadedOx limitations: Namespace introspection and 7 axes not implemented. See docs/HEADED_OX_LIMITATIONS.md
Adapter selection guide
Choose Nokogiri when:
-
You need industry-standard compatibility
-
Large community support is important
-
C extension performance is acceptable
-
Cross-platform deployment is required
Choose Oga when:
-
Pure Ruby environment is required (JRuby, TruffleRuby)
-
Best test coverage is needed (98%)
-
No C extensions are allowed
-
Memory usage is not the primary concern
Choose REXML when:
-
Standard library only (no external gems)
-
Maximum portability is required
-
Small to medium documents
-
Deployment simplicity is critical
Choose LibXML when:
-
Alternative to Nokogiri is desired
-
Full namespace support is required
-
Good performance with correctness
-
Native C extension is acceptable
Choose Ox when:
-
Maximum parsing speed is critical
-
Simple document structures (limited nesting)
-
XPath usage is minimal or absent
-
Memory efficiency is paramount
Choose HeadedOx when:
-
Need Ox’s fast parsing with full XPath support
-
Want comprehensive XPath 1.0 features (functions, predicates)
-
Prefer pure Ruby XPath implementation for debugging
-
Need more XPath capabilities than standard Ox provides
-
Memory efficiency is important but XPath features are required
|
Caution
|
Ox’s custom XPath engine supports common patterns but may not handle complex XPath expressions. Test thoroughly if your use case requires advanced XPath. |
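The selection guide above can be reduced to a small decision helper. The sketch below is illustrative only: pick_adapter and its keyword arguments are hypothetical names, not part of Moxml, and the rules are a simplified reading of the guide. It returns one of the adapter symbols named in this README.

```ruby
# Hypothetical helper (not part of Moxml): maps a few requirements from the
# selection guide to one of the adapter symbols named in this README.
def pick_adapter(pure_ruby: false, stdlib_only: false, needs_xpath: true, max_speed: false)
  return :rexml if stdlib_only # no external gems allowed
  return :oga if pure_ruby     # no C extensions allowed

  if max_speed
    # Ox parses fastest; HeadedOx adds the pure Ruby XPath engine on top.
    return needs_xpath ? :headed_ox : :ox
  end

  :nokogiri # the default adapter
end

puts pick_adapter(stdlib_only: true)
puts pick_adapter(max_speed: true)
puts pick_adapter(max_speed: true, needs_xpath: false)
```

The returned symbol could then be handed to Moxml.new, in the same way this README later calls Moxml.new(:headed_ox).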
Getting started
Installation
Install the gem and at least one supported XML library:
# In your Gemfile
gem 'moxml'
gem 'nokogiri' # Or 'oga', 'rexml', 'ox', or 'libxml-ruby'
Basic document creation
doc = Moxml.new.create_document
# Add XML declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))
# Create root element with namespace
root = doc.create_element('book')
root.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
doc.add_child(root)
# Add content
title = doc.create_element('dc:title')
title.text = 'XML Processing with Ruby'
root.add_child(title)
# Output formatted XML
puts doc.to_xml(indent: 2)
Real-world examples
Practical, runnable examples demonstrating Moxml usage in common scenarios are available in the examples directory.
These examples include:
- RSS Parser
-
Parse RSS/Atom feeds with XPath queries and namespace handling
- Web Scraper
-
Extract data from HTML/XML using DOM navigation and table parsing
- API Client
-
Build and parse XML API requests/responses with SOAP
Each example is:
-
Fully documented with detailed README
-
Self-contained and runnable
-
Demonstrates best practices
-
Includes sample data files
-
Shows comprehensive error handling
Run any example directly:
ruby examples/rss_parser/rss_parser.rb
ruby examples/web_scraper/web_scraper.rb
ruby examples/api_client/api_client.rb
See the examples README for complete documentation and learning paths.
Working with documents
Using the builder pattern
The builder pattern provides a clean DSL for creating XML documents:
doc = Moxml::Builder.new(Moxml.new).build do
  declaration version: "1.0", encoding: "UTF-8"
  element 'library', xmlns: 'http://example.org/library' do
    element 'book' do
      element 'title' do
        text 'Ruby Programming'
      end
      element 'author' do
        text 'Jane Smith'
      end
      comment 'Publication details'
      element 'published', year: '2024'
      cdata '<custom>metadata</custom>'
    end
  end
end
Direct document manipulation
doc = Moxml.new.create_document
# Add declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))
# Create root with namespace
root = doc.create_element('library')
root.add_namespace(nil, 'http://example.org/library')
root.add_namespace("dc", "http://purl.org/dc/elements/1.1/")
doc.add_child(root)
# Add elements with attributes
book = doc.create_element('book')
book['id'] = 'b1'
book['type'] = 'technical'
root.add_child(book)
# Add mixed content
book.add_child(doc.create_comment('Book details'))
title = doc.create_element('title')
title.text = 'Ruby Programming'
book.add_child(title)
Fluent interface API
Moxml provides a fluent, chainable API for creating and manipulating XML documents with improved developer experience:
# Old way - verbose and less readable
element = doc.create_element('book')
element.add_namespace("dc", "http://purl.org/dc/elements/1.1/")
element["id"] = "123"
element["type"] = "article"
child = doc.create_element("title")
child.text = "Hello"
element.add_child(child)
# New way - fluent and chainable
element = doc.create_element('book')
  .with_namespace("dc", "http://purl.org/dc/elements/1.1/")
  .set_attributes(id: "123", type: "article")
  .with_child(doc.create_element("title").tap { |t| t.text = "Hello" })
Chainable element methods
# with_namespace - add namespace and return self
element.with_namespace("dc", "http://purl.org/dc/elements/1.1/")
# set_attributes - set multiple attributes at once
element.set_attributes(id: "123", title: "Ruby", year: "2024")
# with_child - add child and return self
element.with_child(doc.create_element("author"))
# Chain multiple operations
element
  .with_namespace("dc", "http://purl.org/dc/elements/1.1/")
  .set_attributes(id: "123", type: "technical")
  .with_child(doc.create_element("title"))
  .with_child(doc.create_element("author"))
Convenience query methods
# find_element - alias for at_xpath
first_book = doc.root.find_element("//book")
# find_all - returns array of matching elements
all_books = doc.root.find_all("//book")
# Document-level find methods
first_title = doc.find("//title")
all_titles = doc.find_all("//title")
Quick element creation
# add_element - create, configure, and add element in one call
book = doc.add_element("book", id: "123", title: "Ruby") do |elem|
  elem.text = "Ruby Programming Guide"
end
Practical fluent example
doc = Moxml.new.create_document
# Build a complete book entry with fluent API
doc.add_element("library") do |library|
  library
    .with_namespace("dc", "http://purl.org/dc/elements/1.1/")
    .with_child(
      doc.create_element("book")
        .set_attributes(id: "b1", isbn: "978-0-123456-78-9")
        .with_child(doc.create_element("dc:title").tap { |t| t.text = "Ruby Programming" })
        .with_child(doc.create_element("dc:creator").tap { |c| c.text = "Jane Smith" })
        .with_child(doc.create_element("dc:date").tap { |d| d.text = "2024" })
    )
end
puts doc.to_xml(indent: 2)
XML objects and their methods
Document object
The Document object represents an XML document and serves as the root container for all XML nodes.
# Creating a document
doc = Moxml.new.create_document
doc = Moxml.new.parse(xml_string)
# Document properties and methods
doc.encoding # Get document encoding
doc.encoding = "UTF-8" # Set document encoding
doc.version # Get XML version
doc.version = "1.1" # Set XML version
doc.standalone # Get standalone declaration
doc.standalone = "yes" # Set standalone declaration
# Document structure
doc.root # Get root element
doc.children # Get all top-level nodes
doc.add_child(node) # Add a child node
doc.remove_child(node) # Remove a child node
# Node creation methods
doc.create_element(name) # Create new element
doc.create_text(content) # Create text node
doc.create_cdata(content) # Create CDATA section
doc.create_comment(content) # Create comment
doc.create_processing_instruction(target, content) # Create PI
# Document querying
doc.xpath(expression) # Find nodes by XPath
doc.at_xpath(expression) # Find first node by XPath
# Serialization
doc.to_xml(options) # Convert to XML string
# Convenience methods
doc.add_element(name, attributes = {}, &block) # Create, configure, and add element
doc.find(xpath) # Alias for at_xpath
doc.find_all(xpath) # Returns array of matching elements
Element object
Elements are the primary structural components of an XML document, representing tags with attributes and content.
# Element properties
element.name # Get element name
element.name = "new_name" # Set element name
element.text # Get text content
element.text = "content" # Set text content
element.inner_text # Get text content for current node only
element.inner_xml # Get inner XML content
element.inner_xml = xml # Set inner XML content
# Attributes
element[name] # Get attribute value
element[name] = value # Set attribute value
element.attributes # Get all attributes
element.remove_attribute(name) # Remove attribute
# Namespace handling
element.namespace # Get element's namespace
element.namespace = ns # Set element's namespace
element.add_namespace(prefix, uri) # Add new namespace
element.namespaces # Get all namespace definitions
# Node structure
element.parent # Get parent node
element.children # Get child nodes
element.add_child(node) # Add child node
element.remove_child(node) # Remove child node
element.add_previous_sibling(node) # Add sibling before
element.add_next_sibling(node) # Add sibling after
element.replace(node) # Replace with another node
element.remove # Remove from document
# Node type checking
element.element? # Returns true
element.text? # Returns false
element.cdata? # Returns false
element.comment? # Returns false
element.processing_instruction? # Returns false
# Node querying
element.xpath(expression) # Find nodes by XPath
element.at_xpath(expression) # Find first node by XPath
# Convenience methods
element.with_namespace(prefix, uri) # Add namespace and return self
element.set_attributes(hash) # Set multiple attributes, returns self
element.with_child(node) # Add child and return self
element.find_element(xpath) # Alias for at_xpath
element.find_all(xpath) # Returns array of matching elements
Text object
Text nodes represent character data in the XML document.
# Creating text nodes
text = doc.create_text("content")
# Text properties
text.content # Get text content
text.content = "new" # Set text content
# Node type checking
text.text? # Returns true
# Structure
text.parent # Get parent node
text.remove # Remove from document
text.replace(node) # Replace with another node
CDATA object
CDATA sections contain text that should not be parsed as markup.
# Creating CDATA sections
cdata = doc.create_cdata("<raw>content</raw>")
# CDATA properties
cdata.content # Get CDATA content
cdata.content = "new" # Set CDATA content
# Node type checking
cdata.cdata? # Returns true
# Structure
cdata.parent # Get parent node
cdata.remove # Remove from document
cdata.replace(node) # Replace with another node
Comment object
Comments contain human-readable notes in the XML document.
# Creating comments
comment = doc.create_comment("Note")
# Comment properties
comment.content # Get comment content
comment.content = "new" # Set comment content
# Node type checking
comment.comment? # Returns true
# Structure
comment.parent # Get parent node
comment.remove # Remove from document
comment.replace(node) # Replace with another node
Processing instruction object
Processing instructions provide instructions to applications processing the XML.
# Creating processing instructions
pi = doc.create_processing_instruction("xml-stylesheet",
                                       'type="text/xsl" href="style.xsl"')
# PI properties
pi.target # Get PI target
pi.target = "new" # Set PI target
pi.content # Get PI content
pi.content = "new" # Set PI content
# Node type checking
pi.processing_instruction? # Returns true
# Structure
pi.parent # Get parent node
pi.remove # Remove from document
pi.replace(node) # Replace with another node
Attribute object
Attributes represent name-value pairs on elements.
# Attribute properties
attr.name # Get attribute name
attr.name = "new" # Set attribute name
attr.value # Get attribute value
attr.value = "new" # Set attribute value
# Namespace handling
attr.namespace # Get attribute's namespace
attr.namespace = ns # Set attribute's namespace
# Node type checking
attr.attribute? # Returns true
Namespace object
Namespaces define XML namespaces used in the document.
# Namespace properties
ns.prefix # Get namespace prefix
ns.uri # Get namespace URI
# Formatting
ns.to_s # Format as xmlns declaration
# Node type checking
ns.namespace? # Returns true
Node traversal and inspection
Each node type provides methods for traversing the document structure:
node.parent # Get parent node
node.children # Get child nodes
node.next_sibling # Get next sibling
node.previous_sibling # Get previous sibling
# Convenience accessors
node.first_child # Get first child
node.last_child # Get last child
node.has_children? # Check if node has children
# Node manipulation
node.clone # Deep copy of node
node.dup # Alias for clone
# Query methods
node.find(xpath) # Alias for at_xpath
node.find_all(xpath) # Returns array of matching elements
# Type checking
node.element? # Is it an element?
node.text? # Is it a text node?
node.cdata? # Is it a CDATA section?
node.comment? # Is it a comment?
node.processing_instruction? # Is it a PI?
node.attribute? # Is it an attribute?
node.namespace? # Is it a namespace?
# Node information
node.document # Get owning document
Advanced features
XPath querying and node mapping
Nokogiri, Oga, REXML, LibXML
Moxml provides efficient XPath querying by leveraging the native XML library’s implementation while maintaining consistent node mapping:
# Find all book elements
books = doc.xpath('//book')
# Returns Moxml::Element objects mapped to native nodes
# Find with namespaces
titles = doc.xpath('//dc:title',
                   'dc' => 'http://purl.org/dc/elements/1.1/')
# Find first matching node
first_book = doc.at_xpath('//book')
# Chain queries
doc.xpath('//book').each do |book|
  # Each book is a mapped Moxml::Element
  title = book.at_xpath('.//title')
  puts "#{book['id']}: #{title.text}"
end
Ox
Ox’s native query method, locate, resembles XPath but uses a different syntax.
HeadedOx
HeadedOx provides comprehensive (though not complete) XPath 1.0 support via a pure Ruby XPath engine layered on top of Ox.
Namespace handling
# Add namespace to element
element.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
# Create element in namespace
title = doc.create_element('dc:title')
title.text = 'Document Title'
# Query with namespaces
doc.xpath('//dc:title',
          'dc' => 'http://purl.org/dc/elements/1.1/')
Accessing native implementation
While not typically needed, you can access the underlying XML library’s nodes:
# Get native node
native_node = element.native
# Get adapter being used
adapter = element.context.config.adapter
# Create from native node
element = Moxml::Element.new(native_node, context)
Error handling
Moxml provides comprehensive error classes with enhanced context and helpful hints for debugging. Each error class includes specific attributes relevant to the error type and provides detailed error messages with suggestions.
Error class hierarchy
All Moxml errors inherit from Moxml::Error (lib/moxml/error.rb:4), which itself inherits from StandardError.
Moxml::Error (< StandardError)
├── ParseError # XML parsing failures
├── XPathError # XPath expression errors
├── ValidationError # XML validation failures
├── NamespaceError # Namespace-related errors
├── AdapterError # Adapter loading/operation errors
├── SerializationError # XML serialization failures
├── DocumentStructureError # Invalid document structure
├── AttributeError # Attribute operation errors
└── NotImplementedError # Unimplemented adapter features
Enhanced error context
Each error class provides contextual information to aid debugging:
begin
  doc = context.parse(invalid_xml, strict: true)
rescue Moxml::ParseError => e
  # Enhanced parse errors include:
  puts e.line # Line number where error occurred
  puts e.column # Column number where error occurred
  puts e.source # Excerpt of problematic XML
  puts e.to_s # Full message with hints
  # Output includes helpful hint:
  # "Hint: Check XML syntax and ensure all tags are properly closed"
end
Error types and usage
ParseError
Raised when XML parsing fails. Includes line/column information when available.
begin
  doc = Moxml.new.parse("<invalid>", strict: true)
rescue Moxml::ParseError => e
  puts "Parse failed at line #{e.line}, column #{e.column}"
  puts e.to_s # Includes hint for resolution
end
XPathError
Raised when XPath expression evaluation fails.
begin
  results = doc.xpath("//invalid[[[")
rescue Moxml::XPathError => e
  puts "Expression: #{e.expression}"
  puts "Adapter: #{e.adapter}"
  puts e.to_s # Includes syntax verification hint
end
ValidationError
Raised when XML content violates XML specifications.
begin
  # Invalid XML version
  doc.version = "2.0"
rescue Moxml::ValidationError => e
  puts "Constraint: #{e.constraint}" # "version"
  puts "Value: #{e.value}" # "2.0"
  puts e.to_s # Includes allowed values
end
NamespaceError
Raised when namespace operations fail.
begin
  element.add_namespace("ns", "invalid-uri")
rescue Moxml::NamespaceError => e
  puts "Prefix: #{e.prefix}" # "ns"
  puts "URI: #{e.uri}" # "invalid-uri"
  puts "Element: #{e.element}" # Element reference
  puts e.to_s # Includes registration hint
end
AdapterError
Raised when adapter loading or operations fail.
begin
  Moxml::Config.new.adapter = :nonexistent
rescue Moxml::AdapterError => e
  puts "Adapter: #{e.adapter_name}" # :nonexistent
  puts "Operation: #{e.operation}" # "set_adapter"
  puts "Native Error: #{e.native_error}" # Original error
  puts e.to_s # Includes installation hint
end
SerializationError
Raised when XML serialization fails.
begin
  xml_output = node.to_xml
rescue Moxml::SerializationError => e
  puts "Node: #{e.node}"
  puts "Adapter: #{e.adapter}"
  puts e.to_s # Includes structure validation hint
end
DocumentStructureError
Raised when attempting invalid document structure operations.
begin
  doc.root.add_child(invalid_node)
rescue Moxml::DocumentStructureError => e
  puts "Operation: #{e.attempted_operation}"
  puts "State: #{e.current_state}"
  puts e.to_s # Includes XML spec reference hint
end
AttributeError
Raised when attribute operations fail.
begin
  element["123invalid"] = "value" # Invalid attribute name
rescue Moxml::AttributeError => e
  puts "Attribute: #{e.attribute_name}"
  puts "Element: #{e.element}"
  puts "Value: #{e.value}"
  puts e.to_s # Includes naming rules hint
end
NotImplementedError
Raised when an adapter doesn’t support a requested feature.
begin
  # Some operation not supported by current adapter
  result = adapter.unsupported_method
rescue Moxml::NotImplementedError => e
  puts "Feature: #{e.feature}"
  puts "Adapter: #{e.adapter}"
  puts e.to_s # Includes adapter capability hint
end
Best practices for error handling
# Catch specific errors for targeted handling
begin
  doc = Moxml.new.parse(xml_string, strict: true)
  results = doc.xpath("//book[@id='123']")
rescue Moxml::ParseError => e
  # Handle parsing errors
  logger.error("XML parsing failed: #{e.to_s}")
  # e.to_s includes hints for fixing the issue
rescue Moxml::XPathError => e
  # Handle XPath errors
  logger.error("XPath query failed: #{e.expression}")
rescue Moxml::NamespaceError => e
  # Handle namespace errors
  logger.error("Namespace error: #{e.prefix}:#{e.uri}")
rescue Moxml::Error => e
  # Catch-all for other Moxml errors
  logger.error("XML processing error: #{e.message}")
end
All error messages include helpful hints for resolving common issues. Use the to_s method (lib/moxml/error.rb:16) to get the full error message with context and hints.
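The rescue ordering used above (specific classes first, the base class last) can be tried without any XML library installed. The sketch below uses stand-in classes under a hypothetical SketchXml module; they only mirror the shape of the documented hierarchy and are not Moxml’s own classes, and the constructor signature is an assumption.

```ruby
# Stand-in classes that mirror the shape of the hierarchy described above.
# They are NOT Moxml's classes; the constructor signature is an assumption.
module SketchXml
  class Error < StandardError; end

  class ParseError < Error
    attr_reader :line, :column

    def initialize(message, line: nil, column: nil)
      super(message)
      @line = line
      @column = column
    end

    # Appends location details to the base message when they are known.
    def to_s
      location = line ? " (line #{line}, column #{column})" : ""
      "#{super}#{location}"
    end
  end

  class XPathError < Error; end
end

# Runs the block and reports which handler, if any, caught an error.
def describe_failure
  yield
  "ok"
rescue SketchXml::ParseError => e
  "parse error: #{e}" # most specific handler first
rescue SketchXml::Error => e
  "other XML error: #{e.message}" # catch-all for the rest of the hierarchy
end

puts describe_failure { raise SketchXml::ParseError.new("unclosed tag", line: 3, column: 7) }
puts describe_failure { raise SketchXml::XPathError, "bad expression" }
```

Placing the base-class handler last matters: Ruby picks the first matching rescue clause, so a leading rescue for the base class would shadow the more specific handlers.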
Configuration
General
Moxml can be configured globally or per instance.
# Global configuration
Moxml.configure do |config|
  config.default_adapter = :nokogiri
  config.strict = true
  config.encoding = 'UTF-8'
end
# Instance configuration
moxml = Moxml.new do |config|
  config.adapter = :oga
  config.strict = false
end
Default adapter selection
To select a non-default adapter, set it before processing any input using the following syntax.
Moxml::Config.default_adapter = <adapter-symbol>
where <adapter-symbol> is one of the following:
- :rexml
-
REXML
- :nokogiri
-
Nokogiri (default)
- :oga
-
Oga
- :ox
-
Ox
- :libxml
-
LibXML
- :headed_ox
-
HeadedOx (Ox parser + full XPath engine)
Thread safety
Moxml is thread-safe when used properly. Each instance maintains its own state and can be used safely in concurrent operations:
class XmlProcessor
  def initialize
    @mutex = Mutex.new
    @context = Moxml.new
  end
  def process(xml)
    @mutex.synchronize do
      doc = @context.parse(xml)
      # Modify document
      doc.to_xml
    end
  end
end
Performance considerations
Memory management
Moxml maintains a node registry to ensure consistent object mapping:
doc = context.parse(large_xml)
# Process document
doc = nil # Allow garbage collection of document and registry
GC.start # Force garbage collection if needed
Efficient querying
Use specific XPath expressions for better performance:
# More efficient - specific path
doc.xpath('//book/title')
# Less efficient - requires full document scan
doc.xpath('//title')
# Most efficient - direct child access
root.xpath('./*/title')
Best practices
Document creation
# Preferred - using builder pattern
doc = Moxml::Builder.new(Moxml.new).build do
  declaration version: "1.0", encoding: "UTF-8"
  element 'root' do
    element 'child' do
      text 'content'
    end
  end
end
# Alternative - direct manipulation
doc = Moxml.new.create_document
doc.add_child(doc.create_declaration("1.0", "UTF-8"))
root = doc.create_element('root')
doc.add_child(root)
Node manipulation
# Preferred - chainable operations
element
  .add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
  .add_child(doc.create_text('content'))
# Preferred - clear node type checking
if node.element?
  node.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
  node.add_child(doc.create_text('content'))
end
Specific adapter limitations
Ox adapter
XPath limitations
The Ox adapter uses a custom "XPath-to-locate" translation engine.
The following XPath features are NOT supported:
-
Attribute value predicates: //book[@id='123'] ❌
-
Logical operators: //book[@id and @title] ❌
-
Position predicates: //book[1], //book[last()] ❌
-
Text predicates: //book[text()='Title'] ❌
-
Namespace queries: //ns:element ❌
-
Parent axis: //child/.. ❌
-
Sibling axes: following-sibling::* ❌
-
XPath functions: count(), concat(), etc. ❌
Workaround: Use Ruby enumerable methods after basic queries:
# Instead of: doc.xpath("//book[@id='123']")
# Use:
doc.xpath("//book").find { |book| book["id"] == "123" }
|
Important
|
For complete XPath 1.0 support without these limitations, use the Nokogiri or Oga adapter. |
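The Enumerable workaround extends to several of the other unsupported predicates. The sketch below is a minimal illustration, not part of Moxml: the helper names are hypothetical, and plain hashes stand in for the elements a basic query such as doc.xpath("//book") would return, so the snippet runs without any XML library. It assumes only that query results are Enumerable (as the .find workaround implies) and that each item answers [] with an attribute name.

```ruby
# Enumerable-based stand-ins for XPath predicates that the Ox adapter lacks.
# `nodes` is any Enumerable whose items respond to [] with an attribute name.

# //book[@id='123'] : attribute value predicate
def with_attribute_value(nodes, name, value)
  nodes.select { |node| node[name] == value }
end

# //book[@id and @title] : logical "and" over attribute existence
def with_all_attributes(nodes, *names)
  nodes.select { |node| names.all? { |name| node[name] } }
end

# //book[1] and //book[last()] : position predicates
def first_and_last(nodes)
  list = nodes.to_a
  [list.first, list.last]
end

# Plain hashes stand in for elements so the sketch runs without an XML library.
books = [
  { "id" => "123", "title" => "Ruby Programming" },
  { "id" => "456" },
  { "title" => "Untitled draft" }
]

puts with_attribute_value(books, "id", "123").length
puts with_all_attributes(books, "id", "title").length
puts first_and_last(books).inspect
puts books.count # stands in for count(//book)
```

Filtering in Ruby means every candidate node is materialized first, so this suits small result sets better than large documents.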
HeadedOx adapter
General
The HeadedOx adapter combines Ox’s fast C-based XML parsing with Moxml’s pure Ruby XPath 1.0 engine.
Because queries run through this engine rather than through the locate() method of the default Ox implementation, HeadedOx supports far more of XPath 1.0 than the standard Ox adapter, including all 27 functions and 6 of the 13 axes.
|
Note
|
Trivia: the "Headed Ox" implementation allows the Ox to head in the right direction to find the desired nodes through its comprehensive XPath layer. |
|
Note
|
The HeadedOx adapter was added in v0.2.0. |
For complete architectural details and implementation guide, see HeadedOx Documentation.
# Use HeadedOx adapter
context = Moxml.new(:headed_ox)
doc = context.parse(xml_string)
# Full XPath 1.0 support - All 27 functions work
books = doc.xpath('//book[@price < 20]')
count = doc.xpath('count(//book)')
titles = doc.xpath('//book/title[contains(., "Ruby")]')
cheap = doc.xpath('//book[@price <= sum(//book/@price) div count(//book)]')
|
Important
|
For complete XPath 1.0 support with no limitations, use the Nokogiri or Oga adapter. |
Features
-
Fast XML parsing (Ox C extension) - Same speed as standard Ox
-
6 of 13 XPath axes (46% - covers 80% of common usage patterns)
-
Complex XPath predicates with numeric/string/boolean expressions
-
Basic namespace-aware XPath queries (Ox namespace limitations apply)
-
Expression compilation and caching (1000-entry LRU cache)
-
Document construction and serialization through Ox
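To illustrate the expression caching listed above, here is a minimal LRU cache sketch. ExpressionCache and its fetch method are hypothetical names, and the compile step is a stand-in block; Moxml’s actual cache implementation may differ. The sketch relies on Ruby hashes preserving insertion order.

```ruby
# Hypothetical sketch of an LRU cache for compiled XPath expressions.
# Not Moxml's implementation; names and structure are assumptions.
class ExpressionCache
  def initialize(max_size = 1000)
    @max_size = max_size
    @store = {} # insertion-ordered: the first key is the least recently used
  end

  # Returns the cached value for +expression+, or computes it with the block.
  def fetch(expression)
    if @store.key?(expression)
      # Re-insert the entry to mark it as most recently used.
      value = @store.delete(expression)
      @store[expression] = value
      return value
    end

    value = yield(expression)
    @store[expression] = value
    # Evict the least recently used entry once the limit is exceeded.
    @store.delete(@store.each_key.first) if @store.size > @max_size
    value
  end

  def size
    @store.size
  end
end

cache = ExpressionCache.new(2)
compile_count = 0
compile = lambda do |expr|
  compile_count += 1
  "compiled(#{expr})" # stand-in for a real compilation step
end

cache.fetch("//book", &compile)
cache.fetch("//title", &compile)
cache.fetch("//book", &compile)   # cache hit, no recompilation
cache.fetch("//author", &compile) # evicts "//title", the least recently used
puts compile_count
```

Keying the cache on the expression string means repeated queries skip the lexing, parsing, and compilation stages entirely, which is where the benefit for repeated queries comes from.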
Architecture
HeadedOx is a hybrid adapter that layers Moxml’s pure Ruby XPath engine on top of Ox’s fast C parser:
┌─────────────────────────────────────────┐
│            Moxml Unified API            │
│   (Document, Element, Node, Builder)    │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│         HeadedOx Adapter Layer          │
│    (Delegates to Ox + XPath Engine)     │
└──────────────┬──────────────────────────┘
               │
      ┌────────┴─────────┐
      ├───────────┐      │
┌─────▼────┐ ┌────▼──────▼─────────────┐
│  Ox Gem  │ │   Moxml XPath Engine    │
│ (C Parse)│ │       (Pure Ruby)       │
└──────────┘ │  • Lexer (Tokenize)     │
             │  • Parser (AST Build)   │
             │  • Compiler (Ruby Gen)  │
             │  • Cache (1000 entries) │
             └─────────────────────────┘
Known limitations
The following 16 test failures represent architectural boundaries in the Ox gem, not bugs in HeadedOx:
-
✗ Attribute wildcard syntax (@*) - Ox API limitation
-
✗ Namespace introspection methods - Ox doesn’t expose namespace data
-
✗ Parent node setter - Ox C struct immutability
-
✗ CDATA end marker escaping - Complex nested ]]> sequences
-
✗ Complex namespace inheritance - Ox parses but doesn’t track
-
✗ Namespaced attribute access - element["ns:attr"] pattern
|
Important
|
These are Ox limitations, not HeadedOx bugs. |
See HEADED_OX_LIMITATIONS.md for:
- Detailed analysis of each limitation with examples
- Workarounds and alternative approaches
- Exact Ox API enhancements required for full compatibility
- A decision guide for choosing between HeadedOx and other adapters
- Future roadmap if Ox adds a namespace introspection API
When to Use HeadedOx
You can use HeadedOx instead of Ox for all XML parsing needs, except when certain advanced XPath features are required.
Use HeadedOx when:
- You need fast parsing plus comprehensive XPath beyond Ox’s locate()
- XPath functions are critical (count, sum, contains, substring, etc.)
- Complex predicates are required ([@price < average], [position() = last()])
- You prefer a pure Ruby XPath engine for debugging and customization
- Basic namespace queries are sufficient
- Document structure is mostly read-only
- Performance matters but XPath features are non-negotiable
When not to use HeadedOx:
- You need all 13 XPath axes (especially ancestor, sibling, following/preceding)
- Advanced namespace operations are required (introspection, complex inheritance)
- Complex DOM modifications are needed (parent node mutation)
- CDATA escaping for nested markers is critical
- Full Nokogiri feature parity is required
For complete details, see HeadedOx Implementation Guide and HeadedOx Limitations Documentation.
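The two lists above amount to a simple decision rule, sketched here in plain Ruby. Everything in it is an assumption made for illustration: choose_adapter and the requirement symbols are hypothetical, and :headed_ox and :nokogiri are assumed to be the adapter identifiers.

```ruby
# Requirements that, per the lists above, rule out HeadedOx.
# The symbol names are hypothetical labels, not Moxml configuration keys.
HEADED_OX_BLOCKERS = %i[
  all_xpath_axes
  namespace_introspection
  parent_node_mutation
  nested_cdata_escaping
  full_nokogiri_parity
].freeze

# Hypothetical helper: returns an assumed adapter identifier.
def choose_adapter(requirements)
  blockers = requirements & HEADED_OX_BLOCKERS
  blockers.empty? ? :headed_ox : :nokogiri
end

p choose_adapter(%i[fast_parsing xpath_functions]) # :headed_ox
p choose_adapter(%i[fast_parsing all_xpath_axes])  # :nokogiri
```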
XPath capabilities
| Category | XPath 1.0 Support | Details |
|---|---|---|
| Functions | ✅ | All XPath 1.0 standard functions fully implemented and tested: String (10), Numeric (6), Boolean (4), Node (4), Position (2), Special (1) |
| Axes | 6/13 axes (46%) | ✓ Implemented: child, self, parent, descendant, descendant-or-self (//), attribute (@). ✗ Missing: ancestor, sibling families, following/preceding families, namespace. Coverage: 80% of real-world XPath usage patterns |
| Operators | ✅ | All comparison (=, !=, <, >, <=, >=), arithmetic (+, -, *, div, mod), logical (and, or), and union (\|) operators |
| Predicates | ✅ Core | Position predicates |
| Parsing | ✅ Complete | Uses Ox’s C parser for maximum speed; fastest of all adapters |
| Caching | ✅ LRU Cache | 1000-entry cache for compiled XPath expressions; significant performance boost for repeated queries |
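As a worked example of the arithmetic operators, the predicate [@price < sum(//book/@price) div count(//book)] keeps books priced below the average. The sketch below reproduces that arithmetic on plain Ruby hashes standing in for book elements. The data and the below_average helper are made up for illustration and do not call Moxml.

```ruby
# Mirrors //book[@price < sum(//book/@price) div count(//book)] on plain data.
# Prices are Floats because XPath 1.0 numbers are double-precision.
def below_average(books)
  total = books.sum { |book| book[:price] } # sum(//book/@price)
  average = total / books.size              # ... div count(//book)
  books.select { |book| book[:price] < average }
end

books = [
  { title: "A", price: 10.0 },
  { title: "B", price: 20.0 },
  { title: "C", price: 30.0 }
]

# Average is 60.0 / 3 = 20.0, so only the 10.0 book qualifies.
p below_average(books).map { |book| book[:title] } # prints ["A"]
```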
What XPath queries work in HeadedOx
Note: This table reflects v0.2.0.
The following XPath patterns are fully functional:
# Descendant searches
doc.xpath('//book') # ✅ Works
doc.xpath('//book/title') # ✅ Works
# Attribute selection
doc.xpath('//book/@price') # ✅ Works
doc.xpath('//@price') # ✅ Works
# Predicates with operators
doc.xpath('//book[@price < 20]') # ✅ Works
doc.xpath('//book[1]') # ✅ Works (position)
doc.xpath('//book[last()]') # ✅ Works (last position)
doc.xpath('//book[@price=10 or @price=30]') # ✅ Works (logical)
# All 27 XPath 1.0 functions
doc.xpath('count(//book)') # ✅ Returns Float
doc.xpath('sum(//book/@price)') # ✅ Returns Float
doc.xpath('string(//title[1])') # ✅ Returns String
doc.xpath('concat("Price: ", //book/@price)') # ✅ String concatenation
doc.xpath('contains(//title, "Ruby")') # ✅ Boolean search
doc.xpath('substring(//title, 1, 5)') # ✅ String extraction
doc.xpath('normalize-space(//title)') # ✅ Whitespace handling
doc.xpath('boolean(//book[@price])') # ✅ Boolean conversion
doc.xpath('floor(//book/@price)') # ✅ Numeric rounding
doc.xpath('starts-with(//title, "Ruby")') # ✅ Prefix checking
# Complex queries with function composition
doc.xpath('//book[@price < 25]/title') # ✅ Chained paths
doc.xpath('//book[contains(title, "Ruby")]') # ✅ Functions in predicates
doc.xpath('//book[position() = last()]') # ✅ Position functions
doc.xpath('//book[string-length(title) > 10]') # ✅ String functions
doc.xpath('//book[@price < sum(//book/@price) div count(//book)]') # ✅ Complex arithmetic
LibXML adapter
DOCTYPE limitations:
- DOCTYPE parsing works
- DOCTYPE round-trip preservation is limited
- DOCTYPE cannot be reliably re-serialized after parsing
Performance:
- Serialization speed: ~120 ips (slower than target)
- Parsing speed: Good
- For high-throughput serialization, consider Ox or Nokogiri
Other adapters
Nokogiri, Oga, REXML:
All three adapters have near-complete feature support with only minor edge case limitations. Use these adapters when you need full XPath and namespace support.
Development and testing
Skipping benchmarks
Benchmark tests can be slow and are not needed for regular test runs. To
speed up local development, you can skip benchmark tests using the
SKIP_BENCHMARKS environment variable.
Syntax:
SKIP_BENCHMARKS=1 bundle exec rspec
This skips all benchmark tests while running the regular test suite.
To run benchmarks explicitly:
bundle exec rspec spec/moxml/examples/xpath_benchmark_spec.rb
Or use the rake task:
rake benchmark:xpath
Note: The rake benchmark:xpath task always runs benchmarks, regardless of the SKIP_BENCHMARKS environment variable setting.
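One way a test suite could honor such a variable is sketched below. This is not taken from Moxml's spec helper: skip_benchmarks? is a hypothetical helper, and treating an empty value, "0", and "false" as "do not skip" is an assumption.

```ruby
# Hypothetical helper: decides whether benchmark specs should be skipped.
# Assumption: any value other than "", "0", or "false" enables skipping.
def skip_benchmarks?(env = ENV)
  value = env.fetch("SKIP_BENCHMARKS", "").strip.downcase
  !["", "0", "false"].include?(value)
end

# In an RSpec setup this could be wired roughly as follows
# (assumes benchmark examples are tagged with `benchmark: true`):
#
#   RSpec.configure do |config|
#     config.filter_run_excluding benchmark: true if skip_benchmarks?
#   end

puts skip_benchmarks?({ "SKIP_BENCHMARKS" => "1" }) # true
puts skip_benchmarks?({})                           # false
```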
Running tests with coverage
To run the test suite with code coverage tracking:
COVERAGE=true bundle exec rspec
After running, view the coverage report:
open coverage/index.html
The coverage configuration includes:
- Minimum overall coverage: 90%
- Minimum per-file coverage: 80%
- Organized groups for Core, Adapters, and Utilities
- Filters for spec and vendor directories
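A SimpleCov setup matching the settings listed above might look like the following sketch. It assumes SimpleCov is the coverage tool, and the group paths are guesses for illustration; the project's actual spec helper may differ.

```ruby
# spec/spec_helper.rb (sketch): enable coverage only when COVERAGE=true.
if ENV["COVERAGE"] == "true"
  require "simplecov"

  SimpleCov.start do
    minimum_coverage 90         # overall threshold
    minimum_coverage_by_file 80 # per-file threshold

    # Group paths below are assumed for illustration.
    add_group "Core", "lib/moxml/core"
    add_group "Adapters", "lib/moxml/adapter"
    add_group "Utilities", "lib/moxml/utils"

    # Exclude test and vendored code from the report.
    add_filter "/spec/"
    add_filter "/vendor/"
  end
end
```

Guarding on the environment variable keeps ordinary test runs free of coverage overhead.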
Generating performance benchmark reports
Moxml provides a comprehensive benchmark reporting system that measures and compares all adapters across multiple dimensions.
Running the benchmark report
To generate a complete performance report for all adapters:
rake benchmark:report
Or run the script directly:
bundle exec ruby benchmarks/generate_report.rb
This benchmarks all available adapters and generates a detailed report at benchmarks/PERFORMANCE_REPORT.md.
Benchmark categories
The report includes the following benchmark categories:
Parsing benchmarks:
- Simple XML (< 1KB)
- Medium XML (10KB, 50 elements with namespaces)
- Large XML (145KB, 500 elements)
- Complex nested structures
Serialization benchmarks:
- Simple documents
- Documents with namespaces
- Documents with mixed content
XPath benchmarks:
- Simple queries (//element)
- Complex queries with predicates (//element[@attribute])
- Namespace-aware queries (//ns:element)
Memory benchmarks:
- Memory usage per document parse
- Memory usage for large documents
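The speed benchmarks are reported in iterations per second (ips). As a rough picture of what that unit measures, the sketch below counts how often a block runs in a fixed time window using only the standard library. iterations_per_second is a hypothetical helper, and how Moxml's report script actually measures ips is not shown here.

```ruby
# Runs the block repeatedly for about +duration+ seconds and returns the
# observed rate in iterations per second.
# Hypothetical helper for illustration; not Moxml's benchmark code.
def iterations_per_second(duration = 0.2)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  started = clock.call
  iterations = 0
  loop do
    yield
    iterations += 1
    elapsed = clock.call - started
    return iterations / elapsed if elapsed >= duration
  end
end

# Stand-in workload; a real benchmark would parse or query a document here.
xml = "<root>#{'<item/>' * 50}</root>"
ips = iterations_per_second { xml.scan(/<\w+/) }
puts format("%.0f ips", ips)
```

The figure depends on the machine and on the workload, so values are only comparable within one run on one system.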
Report contents
The generated report includes:
- Summary table comparing all adapters with grades
- Detailed performance metrics for each benchmark
- ASCII performance visualization charts
- Adapter selection recommendations based on results
- Complete test environment details (Ruby version, platform, gem versions)
- Best performers for each category
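The ASCII charts mentioned in this list can be produced by scaling each result against the best one. The sketch below is one possible way to do that, not the report generator's code: ascii_bars and the 50-character bar width are assumptions. It uses the medium-XML parsing figures from the example output in this section (76 and 289 ips).

```ruby
# Renders one line per adapter, scaled so the best result fills +width+.
# Hypothetical helper for illustration; not the report generator's code.
def ascii_bars(results, width: 50)
  max = results.values.max.to_f
  label_width = results.keys.map(&:length).max
  results.map do |name, ips|
    bar = "█" * (ips / max * width).round
    "#{name.ljust(label_width)} #{bar} #{ips} ips"
  end
end

puts ascii_bars({ "Nokogiri" => 76, "Ox" => 289 })
```

With these inputs, Nokogiri gets 13 blocks (76 / 289 × 50 ≈ 13.1) and Ox gets the full 50.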
Viewing the report
After generation, the report is available at:
cat benchmarks/PERFORMANCE_REPORT.md
Or open it in your preferred Markdown viewer.
Note: The generated report is machine-specific and excluded from git via .gitignore. Results will vary based on your hardware, OS, and Ruby version.
Example output
The summary table shows comparative performance:
| Adapter | Parse (ips) | Serialize (ips) | XPath (ips) | Memory (MB) | Grade |
|---------|-------------|-----------------|-------------|-------------|-------|
| Nokogiri | 76 | 13900 | 64958 | -0.1 ⭐ | A |
| Ox | 289 | 39203 | 9640 | 0.0 ⭐⭐⭐⭐⭐ | A+ |
...
Performance visualizations help quickly identify the best adapter for specific needs:
Parsing (Medium XML):
Nokogiri █████████████ 76 ips
Ox ██████████████████████████████████████████████████ 289 ips
...
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feature/my-new-feature)
- Create a new Pull Request
License
Copyright Ribose.
This project is licensed under the BSD-2-Clause License. See the LICENSE.md file for details.