RXerces
A Ruby XML library with a Nokogiri-compatible API, powered by Apache Xerces-C instead of libxml2.
Overview
RXerces provides a familiar Nokogiri-like interface for XML parsing and manipulation, but uses the robust Apache Xerces-C XML parser under the hood. This allows Ruby developers to leverage Xerces-C's performance and standards compliance while maintaining compatibility with existing Nokogiri-based code.
Features
- ✅ Nokogiri-compatible API
- ✅ Powered by Apache Xerces-C
- ✅ Parse XML documents
- ✅ Navigate and manipulate DOM trees
- ✅ Read and write node attributes
- ✅ Query nodes with XPath (basic support)
- ✅ Serialize documents back to XML strings
Installation
Prerequisites
You need to have Xerces-C installed on your system:
macOS (Homebrew):
brew install xerces-c
Ubuntu/Debian:
sudo apt-get install libxerces-c-dev
Fedora/RHEL:
sudo yum install xerces-c-devel
Xalan
For XPath 1.0 compliance, install the Xalan library. This is optional: without it, RXerces falls back to the XPath support built into Xerces-C, which is more limited.
Ubuntu/Debian:
sudo apt-get install libxalan-c-dev
Fedora/RHEL:
sudo yum install xalan-c-devel
Note that macOS, contrary to what the documentation currently says, does not have a Homebrew package for Xalan. You will need to either use MacPorts or clone and build the code manually. I found that it required some tweaking to work:
Install the Gem
Add this line to your application's Gemfile:
gem 'rxerces'
And then execute:
bundle install
Or install it yourself as:
gem install rxerces
Usage
Basic Parsing
require 'rxerces'
# Parse XML string
xml = '<root><person name="Alice">Hello</person></root>'
doc = RXerces.XML(xml)
# Access root element
root = doc.root
puts root.name # => "root"
Nokogiri Compatibility
RXerces provides optional Nokogiri compatibility. Require rxerces/nokogiri to enable drop-in replacement:
require 'rxerces/nokogiri'
# Parse XML with Nokogiri syntax
doc = Nokogiri.XML('<root><child>text</child></root>')
puts doc.root.name # => "root"
# Parse HTML with Nokogiri syntax
html_doc = Nokogiri.HTML('<html><body><h1>Hello</h1></body></html>')
puts html_doc.root.name # => "html"
# Alternative syntax
xml_doc = Nokogiri::XML.parse('<root>text</root>')
html_doc = Nokogiri::HTML.parse('<html>...</html>')
# Classes are aliased for both XML and HTML
Nokogiri::XML::Document == RXerces::XML::Document # => true
Nokogiri::HTML::Document == RXerces::XML::Document # => true
Note: If you don't need Nokogiri compatibility, just require 'rxerces' and use the RXerces module directly.
HTML Parsing Note: Since RXerces uses Xerces-C (an XML parser), Nokogiri::HTML parses HTML as XML. This means it won't perform HTML-specific error correction or tag fixing like Nokogiri does with libxml2's HTML parser. For well-formed HTML/XHTML documents, this works fine.
Working with Nodes
# Parse XML
xml = <<-XML
<library>
<book id="1" title="1984">
<author>George Orwell</author>
<year>1949</year>
</book>
<book id="2" title="Brave New World">
<author>Aldous Huxley</author>
<year>1932</year>
</book>
</library>
XML
doc = RXerces.XML(xml)
root = doc.root
# Get attributes
book = root.children.find { |n| n.is_a?(RXerces::XML::Element) }
puts book['id'] # => "1"
puts book['title'] # => "1984"
# Set attributes
book['isbn'] = '978-0451524935'
puts book['isbn'] # => "978-0451524935"
# Get text content
author = book.children.find { |n| n.name == 'author' }
puts author.text # => "George Orwell"
# Set text content
author.text = "Eric Arthur Blair"
puts author.text # => "Eric Arthur Blair"
Navigating the DOM
# Get all children
root.children.each do |child|
puts "#{child.name}: #{child.class}"
end
# Find specific elements
books = root.children.select { |n| n.is_a?(RXerces::XML::Element) && n.name == 'book' }
books.each do |book|
puts "Book ID: #{book['id']}"
end
Serialization
# Convert document back to XML string
xml_string = doc.to_xml
puts xml_string
# or use to_s
puts doc.to_s
XPath Queries
RXerces supports XPath queries using Xerces-C's XPath implementation by default:
xml = <<-XML
<library>
<book>
<title>1984</title>
<author>George Orwell</author>
</book>
<book>
<title>Brave New World</title>
<author>Aldous Huxley</author>
</book>
</library>
XML
doc = RXerces.XML(xml)
# Find all book elements
books = doc.xpath('//book')
puts books.length # => 2
# Find all titles
titles = doc.xpath('//title')
titles.each do |title|
puts title.text.strip
end
# Use path expressions
authors = doc.xpath('/library/book/author')
puts authors.length # => 2
# Query from a specific node
first_book = books[0]
title = first_book.xpath('.//title').first
puts title.text # => "1984"
Note on XPath Support: Xerces-C implements the XML Schema XPath subset, not full XPath 1.0. Supported features include:
- Basic path expressions (/, //, ., ..)
- Element selection by name
- Descendant and child axes
Not supported:
- Attribute predicates ([@attribute="value"])
- XPath functions (last(), position(), text())
- Comparison operators in predicates
For more complex queries, you can combine basic XPath with Ruby's select and find methods.
For full XPath 1.0 support, install the Xalan library.
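The attribute-predicate limitation can be worked around along the following lines. This is a minimal sketch, not part of RXerces: `find_by_attribute` is a hypothetical helper, and `FakeNode` and `FakeDoc` are stand-ins that mimic only the calls used here so the snippet runs without the gem. With a real document you would presumably pass the object returned by `RXerces.XML` and use `RXerces.xalan_enabled?` for the `xalan` flag.

```ruby
# Hypothetical helper (not part of RXerces): find elements by attribute value.
# With Xalan, the predicate goes into the XPath expression itself; without it,
# a basic path is queried and the filtering is done in Ruby.
def find_by_attribute(doc, element, attribute, value, xalan: false)
  if xalan
    # Naive interpolation: assumes value contains no double quotes.
    doc.xpath(%(//#{element}[@#{attribute}="#{value}"])).to_a
  else
    doc.xpath("//#{element}").select { |node| node[attribute] == value }
  end
end

# Stand-in node for illustration only: exposes #name and #[] like the README's nodes.
class FakeNode
  attr_reader :name

  def initialize(name, attributes)
    @name = name
    @attributes = attributes
  end

  def [](key)
    @attributes[key]
  end
end

# Stand-in document: understands only the basic "//name" form.
class FakeDoc
  def initialize(nodes)
    @nodes = nodes
  end

  def xpath(path)
    name = path.delete_prefix('//')
    @nodes.select { |node| node.name == name }
  end
end

doc = FakeDoc.new([
  FakeNode.new('book', { 'id' => '1' }),
  FakeNode.new('book', { 'id' => '2' }),
  FakeNode.new('author', { 'id' => '2' })
])

matches = find_by_attribute(doc, 'book', 'id', '2')
puts matches.length       # => 1
puts matches.first['id']  # => "2"
```

Doing the filtering in Ruby keeps the XPath expression inside the subset that Xerces-C supports, at the cost of materializing every candidate node first.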
API Reference
RXerces Module
- RXerces.XML(string) - Parse XML string and return Document
- RXerces.parse(string) - Alias for XML
- RXerces.xalan_enabled? - Check if Xalan XPath 1.0 support is available
XPath Validation Cache Configuration
RXerces validates XPath expressions for security (preventing injection attacks). For high-volume applications, validated expressions are cached to avoid redundant validation overhead.
# Check if caching is enabled (default: true)
RXerces.cache_xpath_validation? # => true
# Disable caching (re-validates every query)
RXerces.cache_xpath_validation = false
# Re-enable caching
RXerces.cache_xpath_validation = true
# Get current cache size
RXerces.xpath_validation_cache_size # => 42
# Get/set maximum cache size (default: 10,000)
RXerces.xpath_validation_cache_max_size # => 10000
RXerces.xpath_validation_cache_max_size = 5000
# Clear the cache
RXerces.clear_xpath_validation_cache
Performance note: Caching provides ~7-9% speedup for repeated XPath queries by avoiding redundant validation. The cache is thread-safe.
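To make the caching idea concrete, here is a minimal sketch of a bounded, thread-safe validation cache in plain Ruby. It is not RXerces's actual implementation, which is not shown in this document; the `ValidationCache` class, its oldest-first eviction policy, and the toy validator are all assumptions made for illustration.

```ruby
# Illustrative only: a bounded, thread-safe cache of validation verdicts.
class ValidationCache
  def initialize(max_size: 10_000, &validator)
    @max_size = max_size
    @validator = validator
    @entries = {} # expression => true/false
    @mutex = Mutex.new
  end

  # Returns the cached verdict, or runs the validator once and stores the result.
  def valid?(expression)
    @mutex.synchronize do
      return @entries[expression] if @entries.key?(expression)

      # Assumed policy: drop the oldest entry when full
      # (Ruby hashes preserve insertion order, so shift removes it).
      @entries.shift if @entries.size >= @max_size
      @entries[expression] = @validator.call(expression)
    end
  end

  def size
    @mutex.synchronize { @entries.size }
  end

  def clear
    @mutex.synchronize { @entries.clear }
  end
end

calls = 0
cache = ValidationCache.new(max_size: 2) do |expr|
  calls += 1
  !expr.include?(';') # toy rule, not the real validation logic
end

cache.valid?('//book')   # validated
cache.valid?('//book')   # served from the cache
puts calls               # => 1
cache.valid?('//title')
cache.valid?('//author') # cache is full, so '//book' is evicted
puts cache.size          # => 2
```

A single mutex around lookup and insertion is the simplest way to get thread safety; whether the library uses the same approach is not stated here.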
RXerces::XML::Document
- .parse(string) - Parse XML string (class method)
- #root - Get root element
- #to_s / #to_xml - Serialize to XML string
- #xpath(path) - Query with XPath (returns NodeSet)
RXerces::XML::Node
- #name - Get node name
- #text / #content - Get text content
- #text= / #content= - Set text content
- #[attribute] - Get attribute value
- #[attribute]= - Set attribute value
- #children - Get array of child nodes
- #xpath(path) - Query descendants with XPath
RXerces::XML::Element
Inherits all methods from Node. Represents element nodes.
RXerces::XML::Text
Inherits all methods from Node. Represents text nodes.
RXerces::XML::NodeSet
- #length / #size - Get number of nodes
- #[] - Access node by index
- #each - Iterate over nodes (Enumerable)
- #to_a - Convert to array
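Because #each is listed as Enumerable, the usual Ruby collection methods (map, select, find, and so on) should be available on a NodeSet. The sketch below shows the general pattern with a stand-in class. `SketchNodeSet` is hypothetical, holds plain strings instead of nodes, and says nothing about how RXerces builds its NodeSet internally.

```ruby
# Stand-in collection: defining #each and including Enumerable
# is enough to get map, select, find, to_a, and friends.
class SketchNodeSet
  include Enumerable

  def initialize(nodes)
    @nodes = nodes
  end

  def each(&block)
    return enum_for(:each) unless block_given?

    @nodes.each(&block)
    self
  end

  def length
    @nodes.length
  end
  alias size length

  def [](index)
    @nodes[index]
  end
end

titles = SketchNodeSet.new(['1984', 'Brave New World'])
puts titles.length                                     # => 2
puts titles[0]                                         # => 1984
puts titles.map(&:upcase).inspect                      # => ["1984", "BRAVE NEW WORLD"]
puts titles.select { |t| t.start_with?('B') }.inspect  # => ["Brave New World"]
```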
Development
Building the Extension
bundle install
bundle exec rake compile
Running Tests
bundle exec rspec
Running Tests with Compilation
bundle exec rake
Implementation Notes
- Uses Apache Xerces-C 3.x for XML parsing
- C++ extension compiled with Ruby's native extension API
- XPath support is basic by default (full XPath requires Xalan)
- Memory management handled by Ruby's GC and Xerces-C's DOM
Differences from Nokogiri
While RXerces aims for API compatibility with Nokogiri, there are some differences:
- Parser Backend: Uses Xerces-C instead of libxml2
- XPath: Basic XPath support by default (the XML Schema XPath subset); full XPath 1.0 requires Xalan
- Features: Subset of Nokogiri's full feature set
- Performance: Different performance characteristics due to Xerces-C
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
License
MIT License - see LICENSE file for details
Credits
- Built with Apache Xerces-C
- API inspired by Nokogiri
Misc
This library was almost entirely written using AI (Claude Sonnet 4.5). It was mainly a reaction to the lack of maintainers for libxml2 and the generally sorry state of that library. Since Nokogiri uses it under the hood, I thought it best to create an alternative.
Copyright
(C) 2025-2026, Daniel J. Berger All Rights Reserved
Author
- Daniel J. Berger