Moxml: Modern XML processing for Ruby

Contents
  • Introduction and purpose
  • Supported XML libraries
    • Feature table
  • Getting started
    • Installation
    • Basic document creation
  • Working with documents
    • Using the builder pattern
    • Direct document manipulation
  • XML objects and their methods
    • Document object
    • Element object
    • Text object
    • CDATA object
    • Comment object
    • Processing instruction object
    • Attribute object
    • Namespace object
    • Node traversal and inspection
  • Advanced features
    • XPath querying and node mapping
      • Nokogiri, Oga, REXML
      • Ox
    • Namespace handling
    • Accessing native implementation
  • Error handling
  • Configuration
    • General
    • Default adapter selection
  • Thread safety
  • Performance considerations
    • Memory management
    • Efficient querying
  • Best practices
    • Document creation
    • Node manipulation
  • Contributing
  • License

Introduction and purpose

Moxml provides a unified, modern XML processing interface for Ruby applications. It offers a consistent API that abstracts away the underlying XML implementation details while maintaining high performance through efficient node mapping and native XPath querying.

Key features:

  • Intuitive, Ruby-idiomatic API for XML manipulation

  • Consistent interface across different XML libraries

  • Efficient node mapping for XPath queries

  • Support for all XML node types and features

  • Easy switching between XML processing engines

  • Clean separation between interface and implementation

Supported XML libraries

Moxml supports the following XML libraries:

REXML

REXML, a pure Ruby XML parser distributed with Ruby (as a bundled gem since Ruby 3.0). Not the fastest, but available in most Ruby installations.

Nokogiri

(default) Nokogiri, a widely used implementation which wraps around the performant libxml2 C library.

Oga

Oga, a pure Ruby XML parser. Recommended when you need a pure Ruby solution, for example when targeting Opal.

Ox

Ox, a fast XML parser.

Feature table

Moxml makes a best effort to provide a consistent interface for basic XML features, but the underlying XML libraries differ in their features and capabilities.

The following table summarizes the features supported by each library.

NOTE: Checkmarks indicate support for the feature; the notes below the table give additional context for specific features.

Feature                | Nokogiri | Oga | REXML | Ox
Parsing, serializing   | ✅       | ✅  | ✅    | ✅
Node manipulation      | ✅       | ✅  | ✅    | ✅ See NOTE 1.
Basic XPath            | ✅       | ✅  | ✅    | Uses locate. See NOTE 2.
XPath with namespaces  | ✅       | ✅  | ✅    | Uses locate. See NOTE 2.

NOTE 1: Ox’s node manipulation may have issues, especially when manipulating Text nodes.

NOTE 2: The native Ox method locate is similar to XPath but has a different syntax.

Getting started

Installation

Install the gem and at least one supported XML library:

# In your Gemfile
gem 'moxml'
gem 'nokogiri'  # Or 'oga', 'rexml', or 'ox'

Basic document creation

require 'moxml'

# Create a new XML document
doc = Moxml.new.create_document

# Add XML declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))

# Create root element with namespace
root = doc.create_element('book')
root.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
doc.add_child(root)

# Add content
title = doc.create_element('dc:title')
title.text = 'XML Processing with Ruby'
root.add_child(title)

# Output formatted XML
puts doc.to_xml(indent: 2)

Working with documents

Using the builder pattern

The builder pattern provides a clean DSL for creating XML documents:

doc = Moxml::Builder.new(Moxml.new).build do
  declaration version: "1.0", encoding: "UTF-8"

  element 'library', xmlns: 'http://example.org/library' do
    element 'book' do
      element 'title' do
        text 'Ruby Programming'
      end

      element 'author' do
        text 'Jane Smith'
      end

      comment 'Publication details'
      element 'published', year: '2024'

      cdata '<custom>metadata</custom>'
    end
  end
end

Direct document manipulation

doc = Moxml.new.create_document

# Add declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))

# Create root with namespace
root = doc.create_element('library')
root.add_namespace(nil, 'http://example.org/library')
root.add_namespace("dc", "http://purl.org/dc/elements/1.1/")
doc.add_child(root)

# Add elements with attributes
book = doc.create_element('book')
book['id'] = 'b1'
root.add_child(book)

# Add mixed content
book.add_child(doc.create_comment('Book details'))
title = doc.create_element('title')
title.text = 'Ruby Programming'
book.add_child(title)

XML objects and their methods

Document object

The Document object represents an XML document and serves as the root container for all XML nodes.

# Creating a document
doc = Moxml.new.create_document
doc = Moxml.new.parse(xml_string)

# Document properties and methods
doc.encoding               # Get document encoding
doc.encoding = "UTF-8"     # Set document encoding
doc.version                # Get XML version
doc.version = "1.1"        # Set XML version
doc.standalone             # Get standalone declaration
doc.standalone = "yes"     # Set standalone declaration

# Document structure
doc.root                  # Get root element
doc.children              # Get all top-level nodes
doc.add_child(node)       # Add a child node
doc.remove_child(node)    # Remove a child node

# Node creation methods
doc.create_element(name)    # Create new element
doc.create_text(content)    # Create text node
doc.create_cdata(content)   # Create CDATA section
doc.create_comment(content) # Create comment
doc.create_processing_instruction(target, content) # Create PI

# Document querying
doc.xpath(expression)      # Find nodes by XPath
doc.at_xpath(expression)   # Find first node by XPath

# Serialization
doc.to_xml(options)        # Convert to XML string

Element object

Elements are the primary structural components of an XML document, representing tags with attributes and content.

# Element properties
element.name               # Get element name
element.name = "new_name"  # Set element name
element.text              # Get text content
element.text = "content"   # Set text content
element.inner_text        # Get text content for current node only
element.inner_xml         # Get inner XML content
element.inner_xml = xml   # Set inner XML content

# Attributes
element[name]             # Get attribute value
element[name] = value     # Set attribute value
element.attributes        # Get all attributes
element.remove_attribute(name) # Remove attribute

# Namespace handling
element.namespace         # Get element's namespace
element.namespace = ns     # Set element's namespace
element.add_namespace(prefix, uri) # Add new namespace
element.namespaces        # Get all namespace definitions

# Node structure
element.parent            # Get parent node
element.children          # Get child nodes
element.add_child(node)   # Add child node
element.remove_child(node) # Remove child node
element.add_previous_sibling(node) # Add sibling before
element.add_next_sibling(node)    # Add sibling after
element.replace(node)     # Replace with another node
element.remove           # Remove from document

# Node type checking
element.element?         # Returns true
element.text?           # Returns false
element.cdata?          # Returns false
element.comment?        # Returns false
element.processing_instruction? # Returns false

# Node querying
element.xpath(expression)  # Find nodes by XPath
element.at_xpath(expression) # Find first node by XPath

Text object

Text nodes represent character data in the XML document.

# Creating text nodes
text = doc.create_text("content")

# Text properties
text.content             # Get text content
text.content = "new"     # Set text content

# Node type checking
text.text?              # Returns true

# Structure
text.parent             # Get parent node
text.remove             # Remove from document
text.replace(node)      # Replace with another node

CDATA object

CDATA sections contain text that should not be parsed as markup.

# Creating CDATA sections
cdata = doc.create_cdata("<raw>content</raw>")

# CDATA properties
cdata.content           # Get CDATA content
cdata.content = "new"   # Set CDATA content

# Node type checking
cdata.cdata?           # Returns true

# Structure
cdata.parent           # Get parent node
cdata.remove           # Remove from document
cdata.replace(node)    # Replace with another node

Comment object

Comments contain human-readable notes in the XML document.

# Creating comments
comment = doc.create_comment("Note")

# Comment properties
comment.content         # Get comment content
comment.content = "new" # Set comment content

# Node type checking
comment.comment?        # Returns true

# Structure
comment.parent          # Get parent node
comment.remove         # Remove from document
comment.replace(node)   # Replace with another node

Processing instruction object

Processing instructions provide instructions to applications processing the XML.

# Creating processing instructions
pi = doc.create_processing_instruction("xml-stylesheet",
  'type="text/xsl" href="style.xsl"')

# PI properties
pi.target              # Get PI target
pi.target = "new"      # Set PI target
pi.content             # Get PI content
pi.content = "new"     # Set PI content

# Node type checking
pi.processing_instruction? # Returns true

# Structure
pi.parent             # Get parent node
pi.remove             # Remove from document
pi.replace(node)      # Replace with another node

Attribute object

Attributes represent name-value pairs on elements.

# Attribute properties
attr.name              # Get attribute name
attr.name = "new"      # Set attribute name
attr.value            # Get attribute value
attr.value = "new"     # Set attribute value

# Namespace handling
attr.namespace         # Get attribute's namespace
attr.namespace = ns    # Set attribute's namespace

# Node type checking
attr.attribute?        # Returns true

Namespace object

Namespaces define XML namespaces used in the document.

# Namespace properties
ns.prefix             # Get namespace prefix
ns.uri               # Get namespace URI

# Formatting
ns.to_s              # Format as xmlns declaration

# Node type checking
ns.namespace?        # Returns true

Node traversal and inspection

Each node type provides methods for traversing the document structure:

node.parent              # Get parent node
node.children            # Get child nodes
node.next_sibling        # Get next sibling
node.previous_sibling    # Get previous sibling

# Type checking
node.element?          # Is it an element?
node.text?             # Is it a text node?
node.cdata?            # Is it a CDATA section?
node.comment?          # Is it a comment?
node.processing_instruction? # Is it a PI?
node.attribute?        # Is it an attribute?
node.namespace?        # Is it a namespace?

# Node information
node.document          # Get owning document

Advanced features

XPath querying and node mapping

Nokogiri, Oga, REXML

Moxml provides efficient XPath querying by leveraging the native XML library’s implementation while maintaining consistent node mapping:

# Find all book elements
books = doc.xpath('//book')
# Returns Moxml::Element objects mapped to native nodes

# Find with namespaces
titles = doc.xpath('//dc:title',
  'dc' => 'http://purl.org/dc/elements/1.1/')

# Find first matching node
first_book = doc.at_xpath('//book')

# Chain queries
doc.xpath('//book').each do |book|
  # Each book is a mapped Moxml::Element
  title = book.at_xpath('.//title')
  puts "#{book['id']}: #{title.text}"
end

Ox

Ox's native query method, locate, resembles XPath but uses a different syntax.

Namespace handling

# Add namespace to element
element.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')

# Create element in namespace
title = doc.create_element('dc:title')
title.text = 'Document Title'

# Query with namespaces
doc.xpath('//dc:title',
  'dc' => 'http://purl.org/dc/elements/1.1/')
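
To make the prefix-to-URI binding concrete, here is a minimal sketch that runs the same kind of query with REXML directly. REXML is used only because it is commonly available; the sample document and the "x" prefix are assumptions for illustration. The prefix in the query is resolved through the hash, not through the prefixes declared in the document, so a document that binds the Dublin Core URI to a different prefix still matches:

```ruby
require 'rexml/document'

DC_URI = 'http://purl.org/dc/elements/1.1/'

# Sample document (illustrative): the Dublin Core namespace is bound to "x", not "dc".
xml = <<~XML
  <book xmlns:x="#{DC_URI}">
    <x:title>Document Title</x:title>
    <title>Unqualified title</title>
  </book>
XML

doc = REXML::Document.new(xml)

# The query prefix "dc" is bound to the URI by the hash passed as third argument.
titles = REXML::XPath.match(doc, '//dc:title', 'dc' => DC_URI)
title_texts = titles.map(&:text)

puts title_texts.inspect
```

The query selects only the namespaced element; the unqualified title is not matched because it belongs to no namespace.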

Accessing native implementation

While not typically needed, you can access the underlying XML library’s nodes:

# Get native node
native_node = element.native

# Get adapter being used
adapter = element.context.config.adapter

# Create from native node
element = Moxml::Element.new(native_node, context)

Error handling

Moxml provides specific error classes for different types of errors that may occur during XML processing:

context = Moxml.new

begin
  doc = context.parse(xml_string)
rescue Moxml::ParseError => e
  # Handles XML parsing errors
  puts "Parse error at line #{e.line}, column #{e.column}"
  puts "Message: #{e.message}"
rescue Moxml::ValidationError => e
  # Handles XML validation errors
  puts "Validation error: #{e.message}"
rescue Moxml::XPathError => e
  # Handles XPath expression errors
  puts "XPath error: #{e.message}"
rescue Moxml::NamespaceError => e
  # Handles namespace errors
  puts "Namespace error: #{e.message}"
rescue Moxml::Error => e
  # Handles other Moxml-specific errors
  puts "Error: #{e.message}"
end
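
Ruby evaluates rescue clauses from top to bottom and runs the first one that matches, which is why the specific classes above are listed before Moxml::Error. The sketch below shows the effect with stand-in classes. The Demo module and its inheritance hierarchy (every specific error subclassing one common base) are assumptions for illustration, not Moxml's actual definitions:

```ruby
# Stand-in hierarchy (assumed): specific errors inherit from one base class.
module Demo
  class Error < StandardError; end
  class ParseError < Error; end
  class XPathError < Error; end
end

# Raises the given error class and reports which rescue clause handled it.
def handled_by(error_class)
  raise error_class, 'boom'
rescue Demo::ParseError
  :parse
rescue Demo::XPathError
  :xpath
rescue Demo::Error
  :generic
end

puts handled_by(Demo::ParseError)  # handled by the specific clause
puts handled_by(Demo::Error)       # falls through to the base clause
```

If the base class were rescued first, it would also catch every subclass and the specific handlers would never run.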

Configuration

General

Moxml can be configured globally or per instance.

# Global configuration
Moxml.configure do |config|
  config.default_adapter = :nokogiri
  config.strict = true
  config.encoding = 'UTF-8'
end

# Instance configuration
moxml = Moxml.new do |config|
  config.adapter = :oga
  config.strict = false
end

Default adapter selection

To select a non-default adapter, set it before processing any input using the following syntax.

Moxml::Config.default_adapter = <adapter-symbol>

where <adapter-symbol> is one of the following:

:rexml

REXML

:nokogiri

Nokogiri (default)

:oga

Oga

:ox

Ox
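
Because the selected adapter must match a library that is actually installed, it can help to probe for one at startup. The following is a minimal sketch and not part of Moxml: the helper name first_available_adapter and the preference order are assumptions, and the require paths are the usual entry points of each gem.

```ruby
# Adapter symbols mapped to the file each library is normally required by.
# The order is an arbitrary preference chosen for this sketch.
ADAPTER_LIBRARIES = {
  nokogiri: 'nokogiri',
  oga: 'oga',
  ox: 'ox',
  rexml: 'rexml/document'
}.freeze

# Returns the first adapter symbol whose library loads, or nil if none does.
def first_available_adapter(candidates = ADAPTER_LIBRARIES)
  candidates.each do |adapter, library|
    begin
      require library
      return adapter
    rescue LoadError
      next
    end
  end
  nil
end

adapter = first_available_adapter
puts "Selected adapter: #{adapter.inspect}"
# With Moxml loaded, the result could then be assigned as shown above:
#   Moxml::Config.default_adapter = adapter if adapter
```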

Thread safety

Moxml is thread-safe when used properly: each instance maintains its own state, so separate instances can be used concurrently. If a single instance is shared between threads, guard access to it, for example with a mutex:

class XmlProcessor
  def initialize
    @mutex = Mutex.new
    @context = Moxml.new
  end

  def process(xml)
    @mutex.synchronize do
      doc = @context.parse(xml)
      # Modify document
      doc.to_xml
    end
  end
end
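
A short usage sketch of the same locking pattern under concurrency. To keep it runnable without any XML library, the parsing step is injected and replaced by a stand-in lambda that only extracts the root tag name. The LockedProcessor class, its parser argument, and the lambda are assumptions for illustration; in real code the parser would call Moxml.new.parse as above.

```ruby
# Same shape as XmlProcessor above, but with the parsing step injected so
# the sketch runs without an XML library installed.
class LockedProcessor
  def initialize(parser)
    @mutex = Mutex.new
    @parser = parser
  end

  # Only one thread at a time runs the parser.
  def process(xml)
    @mutex.synchronize { @parser.call(xml) }
  end
end

# Stand-in for parsing: returns the name of the first tag in the string.
root_name = ->(xml) { xml[/<([A-Za-z_][\w.-]*)/, 1] }

processor = LockedProcessor.new(root_name)
inputs = ['<book/>', '<library></library>', '<title>Ruby</title>']

# All threads share the processor; Thread#value joins and returns the result.
results = inputs.map { |xml| Thread.new { processor.process(xml) } }.map(&:value)

puts results.inspect
```

Collecting results through Thread#value keeps them in input order regardless of which thread acquires the lock first.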

Performance considerations

Memory management

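Moxml maintains a node registry to ensure consistent object mapping. The registry is released together with the document, so drop references to large documents once processing is finished:

context = Moxml.new
doc = context.parse(large_xml)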
# Process document
doc = nil  # Allow garbage collection of document and registry
GC.start   # Force garbage collection if needed
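
To illustrate what a node registry does, here is a minimal sketch of the general technique; it is an assumption for illustration, not Moxml's actual implementation. It caches one wrapper per native node so that repeated lookups return the same object, and it uses ObjectSpace::WeakMap so that the cache by itself does not keep nodes alive. NodeRegistry, Wrapper, and the Object.new stand-in for a native node are hypothetical names.

```ruby
# Hypothetical wrapper around a native node.
Wrapper = Struct.new(:native)

# Hypothetical registry: at most one live wrapper per native node.
class NodeRegistry
  def initialize
    # WeakMap compares keys by identity and references keys and values weakly,
    # so the registry alone never keeps a node or its wrapper alive.
    @wrappers = ObjectSpace::WeakMap.new
  end

  # Returns the existing wrapper for native_node, creating it on first use.
  def wrap(native_node)
    @wrappers[native_node] ||= Wrapper.new(native_node)
  end
end

registry = NodeRegistry.new
native = Object.new            # stand-in for a native XML node

first = registry.wrap(native)
second = registry.wrap(native)

puts first.equal?(second)      # same wrapper object both times
```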

Efficient querying

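Use specific XPath expressions and avoid a leading // where the structure is known, since it makes the engine visit every descendant:

# More specific - matches only titles directly under a book
doc.xpath('//book/title')

# Less specific - matches every title anywhere in the document
doc.xpath('//title')

# Most efficient - walks child steps from a known node, no descendant scan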
root.xpath('./*/title')
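
The three expressions also differ in what they select, not only in cost. A minimal sketch using REXML directly (chosen because it is commonly available; the sample document is an assumption for illustration) shows that the broad //title picks up a nested title that the narrower paths skip:

```ruby
require 'rexml/document'

# Sample document (illustrative): one title is nested below an appendix.
xml = <<~XML
  <library>
    <book><title>Ruby</title></book>
    <book>
      <title>XML</title>
      <appendix><title>Notes</title></appendix>
    </book>
  </library>
XML

doc = REXML::Document.new(xml)
root = doc.root

# Titles that are direct children of a book, anywhere in the document.
book_titles = REXML::XPath.match(doc, '//book/title').map(&:text)

# Every title element, including the one inside the appendix.
all_titles = REXML::XPath.match(doc, '//title').map(&:text)

# Titles exactly two levels below the root, found without a descendant scan.
child_titles = REXML::XPath.match(root, './*/title').map(&:text)

puts "//book/title: #{book_titles.size}"
puts "//title:      #{all_titles.size}"
puts "./*/title:    #{child_titles.size}"
```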

Best practices

Document creation

# Preferred - using builder pattern
doc = Moxml::Builder.new(Moxml.new).build do
  declaration version: "1.0", encoding: "UTF-8"
  element 'root' do
    element 'child' do
      text 'content'
    end
  end
end

# Alternative - direct manipulation
doc = Moxml.new.create_document
doc.add_child(doc.create_declaration("1.0", "UTF-8"))
root = doc.create_element('root')
doc.add_child(root)

Node manipulation

# Preferred - chainable operations
element
  .add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
  .add_child(doc.create_text('content'))

# Preferred - clear node type checking
if node.element?
  node.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
  node.add_child(doc.create_text('content'))
end
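
Chaining as in the first example works when each mutating method returns its receiver. The sketch below shows that convention with a stand-in class. SketchElement is hypothetical and is not Moxml's implementation, and the assumption that Moxml's methods return the element is inferred only from the chained call above.

```ruby
# Hypothetical stand-in for an element; not Moxml's implementation.
class SketchElement
  attr_reader :namespaces, :children

  def initialize(name)
    @name = name
    @namespaces = {}
    @children = []
  end

  # Returning self is what makes the call chainable.
  def add_namespace(prefix, uri)
    @namespaces[prefix] = uri
    self
  end

  # Also returns self so further calls can follow.
  def add_child(node)
    @children << node
    self
  end
end

element = SketchElement.new('title')
  .add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
  .add_child('content')

puts element.namespaces.inspect
puts element.children.inspect
```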

Contributing

  1. Fork the repository

  2. Create your feature branch (git checkout -b feature/my-new-feature)

  3. Commit your changes (git commit -am 'Add some feature')

  4. Push to the branch (git push origin feature/my-new-feature)

  5. Create a new Pull Request

License

Copyright Ribose.

This project is licensed under the BSD-2-Clause License. See the LICENSE.md file for details.