metanorma-document
Ruby library for working with Metanorma document XML.
Installation
gem install metanorma-document
Usage
require 'metanorma/document'
doc = IO.read('spec/fixtures/rice-amd-en.final.xml')
standard = Metanorma::IsoDocument::Root.from_xml(doc)
puts standard.to_xml(pretty: true)
HTML Generation
The Metanorma::Html::Generator produces a complete, self-contained HTML document from a presentation XML document model. It handles body content rendering, table of contents, CSS theming, JavaScript interactivity, and responsive layout.
Quick Start
require "metanorma/document"
require "metanorma/html/generator"

# Parse presentation XML into a document model
xml = File.read("spec/fixtures/iso/is/document-en.presentation.xml")
doc = Metanorma::IsoDocument::Root.from_xml(xml)

# Generate a complete HTML document
html = Metanorma::Html::Generator.generate(doc)
File.write("output.html", html)
Renderer Selection
The Generator automatically selects the correct renderer based on the document model class:
Metanorma::Document::Root
|
|
Metanorma::StandardDocument::Root
|
|
Metanorma::IsoDocument::Root
|
|
For publisher-based dispatch within the same model (e.g., ICC documents published through ISO), use taste registration:
# ICC documents are IsoDocument::Root but use IccRenderer
Metanorma::Html::Generator.register_taste(
  Metanorma::IsoDocument::Root, "ICC", Metanorma::Html::IccRenderer
)
The lookup order is: taste match (publisher) → model class (most specific last) → BaseRenderer.
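The lookup order can be illustrated with a minimal, self-contained sketch. The classes below are stand-ins rather than the gem's own, and the `publisher:` keyword is an assumption; how the Generator actually determines a document's publisher is not described here.

```ruby
# Stand-in model and renderer classes (the real ones live in the gem).
class DocumentRoot; end
class StandardRoot < DocumentRoot; end
class IsoRoot < StandardRoot; end

class BaseRenderer; end
class StandardRenderer < BaseRenderer; end
class IsoRenderer < StandardRenderer; end
class IccRenderer < IsoRenderer; end

# Hypothetical registry illustrating the documented lookup order.
class RendererRegistry
  def initialize
    @by_class = {} # model class => renderer class
    @by_taste = {} # [model class, publisher] => renderer class
  end

  def register(model_class, renderer_class)
    @by_class[model_class] = renderer_class
  end

  def register_taste(model_class, publisher, renderer_class)
    @by_taste[[model_class, publisher]] = renderer_class
  end

  # 1. taste match, 2. nearest registered model class, 3. BaseRenderer.
  def renderer_for(document, publisher: nil)
    taste = @by_taste[[document.class, publisher]]
    return taste if taste

    document.class.ancestors.each do |klass|
      return @by_class[klass] if @by_class.key?(klass)
    end
    BaseRenderer
  end
end

registry = RendererRegistry.new
registry.register(DocumentRoot, BaseRenderer)
registry.register(StandardRoot, StandardRenderer)
registry.register(IsoRoot, IsoRenderer)
registry.register_taste(IsoRoot, "ICC", IccRenderer)
```

Walking `ancestors` from the document's own class upward means an unregistered subclass still resolves to the renderer of its nearest registered parent.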
Renderer Architecture
The renderer hierarchy follows the open/closed principle:
BaseRenderer # Core HTML rendering, document assembly, CSS/JS pipeline
└── StandardRenderer # Terms, definitions, annexes, bibliography
├── IsoRenderer # ISO cover page, copyright, title formatting
│ ├── IccRenderer # ICC publisher styling
│ └── PdfaRenderer # PDF Association publisher styling
├── IecRenderer # IEC-specific formatting
├── IeeeRenderer # IEEE-specific formatting
├── IetfRenderer # IETF-specific formatting
├── IhoRenderer # IHO-specific formatting
├── ItuRenderer # ITU-specific formatting
├── OgcRenderer # OGC-specific formatting
├── OimlRenderer # OIML-specific formatting
├── BipmRenderer # BIPM-specific formatting
├── CcRenderer # CC-specific formatting
└── RiboseRenderer # Ribose-specific formatting
Drop Pattern
Block elements (notes, examples, sourcecode, formulas, figures, admonitions) use the Drop pattern:
# A Drop captures rendered content via RendererContext, then passes
# data to a Liquid template. The renderer never emits HTML directly.
def render_note(note, **_opts)
  drop = Drops::NoteDrop.from_model(note, renderer: renderer_context)
  @output << render_liquid("_note.html.liquid", { "block" => drop })
end
Each Drop class inherits from BlockElementDrop and implements .from_model(model, renderer:) which:
- Captures child content via renderer.capture_output { … }
- Returns a new Drop instance with pre-rendered HTML strings
The Liquid template then reads the Drop's attributes and outputs the final HTML.
RendererContext is a facade that exposes only the rendering methods Drops need, maintaining encapsulation.
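The capture step can be sketched without Liquid. Everything below is a hypothetical stand-in (SketchRenderer, the Struct-based NoteDrop, and the hash used as a note model), intended only to show how swapping the output buffer yields a pre-rendered string.

```ruby
# Hypothetical stand-in; the gem's renderer and RendererContext are richer.
class SketchRenderer
  attr_reader :output

  def initialize
    @output = +""
  end

  # Temporarily swap the output buffer so nested rendering is captured
  # as a string instead of being appended to the main output.
  def capture_output
    saved = @output
    @output = +""
    yield
    @output
  ensure
    @output = saved
  end

  def render_text(text)
    @output << text
  end
end

# A Drop holds only pre-rendered strings, ready for a template to read.
NoteDrop = Struct.new(:id, :body_html) do
  def self.from_model(note, renderer:)
    body = renderer.capture_output { renderer.render_text(note[:text]) }
    new(note[:id], body)
  end
end

renderer = SketchRenderer.new
drop = NoteDrop.from_model({ id: "n1", text: "Keep dry." }, renderer: renderer)
```

After `from_model` returns, the renderer's main buffer is untouched and the Drop carries the captured body.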
Class Name Ownership
The HTML renderer owns its class names entirely. No XML-originated class names appear in the HTML output.
- Block-level classes (note-block, formula, figure, etc.) are assigned by the renderer based on what it's rendering
- Inline span classes use SPAN_ROLE_CLASSES to map XML span roles to HTML-specific names (e.g., boldtitle → title-text, citesec → xref-section)
- The XML's class_attr is read as input only to determine semantic role, never emitted directly
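A minimal sketch of the role mapping follows. Only the two pairs shown come from the text above; the rest of the constant's contents, the helper name `html_class_for`, and the nil fallback for unknown roles are assumptions.

```ruby
# Only these two pairs are taken from the README; the real constant in
# the gem presumably holds more entries.
SPAN_ROLE_CLASSES = {
  "boldtitle" => "title-text",
  "citesec"   => "xref-section",
}.freeze

# Hypothetical helper: returns the renderer-owned class for an XML span
# role, or nil for an unknown role, so the XML class name is never
# passed through to the HTML output.
def html_class_for(xml_role)
  SPAN_ROLE_CLASSES[xml_role]
end
```

Because the lookup returns nil rather than echoing its input, an unmapped XML class cannot leak into the output by accident.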
Presentation XML
The HTML renderer expects presentation XML (not source XML). Presentation XML contains fmt- display elements (fmt-title, fmt-xref, fmt-link, fmt-concept, fmt-definition, fmt-preferred, etc.) alongside semantic elements. The renderer prioritizes fmt- elements for rendering, falling back to semantic elements when display elements are absent.
Theming
Each renderer has a Theme object controlling colors, typography, and layout. Override theme properties in subclasses:
class MyRenderer < Metanorma::Html::StandardRenderer
  def theme
    @theme ||= begin
      t = Theme.new
      t.primary = "#1a5276"
      t.accent = "#2e86c1"
      t.font_body = '"Charter", serif'
      t
    end
  end
end
Theme properties are emitted as CSS custom properties (--mn-primary, --font-body, etc.) for runtime customization.
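As a rough sketch of that emission step, the snippet below turns a theme object into a `:root` rule. SketchTheme and the property-name table are assumptions modeled on the two names mentioned above; the gem's actual naming and output format may differ.

```ruby
# Hypothetical Theme stand-in with the three properties used above.
SketchTheme = Struct.new(:primary, :accent, :font_body)

# Assumed mapping from theme attribute to CSS custom property name.
CSS_VARIABLES = {
  primary:   "--mn-primary",
  accent:    "--mn-accent",
  font_body: "--font-body",
}.freeze

# Builds a :root rule with one declaration per theme attribute.
def css_custom_properties(theme)
  declarations = CSS_VARIABLES.map do |attr, css_name|
    "  #{css_name}: #{theme[attr]};"
  end
  ":root {\n#{declarations.join("\n")}\n}"
end

theme = SketchTheme.new("#1a5276", "#2e86c1", '"Charter", serif')
css = css_custom_properties(theme)
```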
Liquid Templates
HTML structure is defined in .liquid templates under lib/metanorma/html/templates/:
document.html.liquid | Full document shell (
_header.html.liquid | Sticky header with publisher logos
_footer.html.liquid | Footer with copyright
_cover.html.liquid | Cover page layout
_iso_cover.html.liquid | ISO-specific cover page
_footnotes.html.liquid | Footnotes section
_doc_title.html.liquid | Document title rendering
_note.html.liquid | Note block
_example.html.liquid | Example block
_sourcecode.html.liquid | Source code block
_formula.html.liquid | Formula block
_figure.html.liquid | Figure block
_admonition.html.liquid | Admonition block
Templates use Liquid::LocalFileSystem for partials (prefixed with _). The render_liquid method handles template caching via TEMPLATE_CACHE.
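The caching idea can be sketched independently of Liquid. TemplateCache and its loader block are hypothetical; in the gem the cached value would presumably be a parsed Liquid template rather than the placeholder string used here.

```ruby
# Hypothetical cache keyed by template name. The loader block stands in
# for reading and parsing a .liquid file from disk.
class TemplateCache
  def initialize(&loader)
    @loader = loader
    @cache = {}
    @mutex = Mutex.new
  end

  # Loads the template on first request and reuses it afterwards.
  def fetch(name)
    @mutex.synchronize { @cache[name] ||= @loader.call(name) }
  end
end

loads = Hash.new(0) # counts how often each template is actually loaded
cache = TemplateCache.new do |name|
  loads[name] += 1
  "parsed:#{name}"
end

cache.fetch("_note.html.liquid")
cache.fetch("_note.html.liquid")
```

The second `fetch` is served from the hash, so the loader runs only once per template name.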
Asset Pipeline
The AssetPipeline compiles CSS and JavaScript from modular source files:
| CSS |
|
| JS |
|
All assets are compiled into inline <style> and <script> blocks for self-contained output.
CLI Usage
From the command line:
# Generate HTML from presentation XML
bundle exec ruby -e '
  require "metanorma/document"
  require "metanorma/html/generator"
  doc = Metanorma::IsoDocument::Root.from_xml(ARGF.read)
  puts Metanorma::Html::Generator.generate(doc)
' < document.presentation.xml > output.html
API Reference
Generator.generate(document, **options) | Returns a complete HTML string. Auto-selects the renderer.
Generator.register(model_class, renderer_class) | Register a model-to-renderer mapping.
Generator.register_taste(model_class, publisher_abbrev, renderer_class) | Register a publisher-based override.
Generator.renderer_for(document) | Returns the renderer class that would be used.
Renderer instance methods:
generate_full_document(document) | Full HTML document (body + assembly).
to_html | Returns the rendered body content after
theme | Returns the
render_liquid(template_name, assigns) | Renders a Liquid template with caching.
Document Flavors
The gem provides a hierarchy of document flavors:
- BasicDocument - Basic document model
- StandardDocument - Standard document model (extends BasicDocument)
- IsoDocument - ISO standard document model (extends StandardDocument)
- IecDocument - IEC standard document model
- IeeeDocument - IEEE standard document model
- IetfDocument - IETF standard document model
- IhoDocument - IHO standard document model
- OimlDocument - OIML standard document model
- BipmDocument - BIPM standard document model
- ItuDocument - ITU standard document model
- OgcDocument - OGC standard document model
- CcDocument - CC standard document model
- RiboseDocument - Ribose standard document model
Supported XML Formats
This library targets the modern Metanorma XML format which uses <metanorma> as the root element.
Legacy XML formats that use flavor-specific root elements (e.g. <iso-standard>, <m3d-standard>, <csa-standard>, <un-standard>) are not supported.
Mirror Format
The Metanorma::Mirror module is a ProseMirror-style document model that captures the structural and inline content of a Metanorma document. It serves as an intermediate representation suitable for serialization, round-tripping, alternate rendering pipelines, and editor integrations.
The mirror format is fully model-driven: every node, mark, and document is an instance of a Model class. There are no hashes at the API boundary.
Model Layer
Model::Container | A node with children (
Model::Leaf | A node without children (e.g.
Model::Text | Inline text carrying a string and a list of Marks
Model::SoftBreak | A soft line break inside a paragraph
Model::Mark | An inline decoration (
Model::Guide | Output wrapper with
Model::Factory.from_h(hash) | Rebuilds a model graph from a serialized hash
Each model class encapsulates its attributes via bounded accessors. Mark exposes [], []=, set_attr, and fetch for safe attribute mutation. #to_h is the canonical serialization method.
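The accessor surface described for Mark might look roughly like the sketch below. SketchMark is a stand-in, and both the string-keyed storage and the shape returned by `to_h` are assumptions rather than the gem's documented behavior.

```ruby
# Hypothetical Mark stand-in exposing [], []=, set_attr, fetch and to_h.
class SketchMark
  attr_reader :type

  def initialize(type:, attrs: {})
    @type = type
    @attrs = attrs.transform_keys(&:to_s) # normalise keys to strings
  end

  def [](key)
    @attrs[key.to_s]
  end

  def []=(key, value)
    @attrs[key.to_s] = value
  end
  alias_method :set_attr, :[]=

  # Like Hash#fetch: raises KeyError unless a default or block is given.
  def fetch(key, *default, &block)
    @attrs.fetch(key.to_s, *default, &block)
  end

  # Returns a copy so callers cannot mutate internal state through it.
  def to_h
    { "type" => @type, "attrs" => @attrs.dup }
  end
end

mark = SketchMark.new(type: "link", attrs: { href: "https://example.org" })
mark["title"] = "Example"
```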
Forward Conversion (Metanorma → Mirror)
require "metanorma/document"
require "metanorma/mirror"

xml = File.read("spec/fixtures/iso/is/document-en.presentation.xml")
doc = Metanorma::IsoDocument::Root.from_xml(xml)

transformer = Metanorma::Mirror::Transformer.new
document = transformer.from_metanorma(doc)
document.class # => Metanorma::Mirror::Model::Container
Reverse Conversion (Mirror → Mirror, with Rewriting)
The Rewriter walks a Model graph and produces a new Model graph, applying skip rules and optional per-type customization. (The historical name MirrorToMetanorma is no longer accurate: the output is a Model graph, not Metanorma XML. The class was renamed to Rewriter to reflect this.)
rewriter = Metanorma::Mirror::Rewriter.new
rebuilt = rewriter.call(document)
Each Rewriter instance seeds its skip set and builder map from class-level defaults at new time, so two instances never share state. Instance-level mutation does not leak back to the class, and class-level mutation does not retroactively affect existing instances.
# Instance API: only affects this instance
rewriter.skip("my_type")
rewriter.register("my_type") do |node, _rewriter|
  Metanorma::Mirror::Model::Leaf.new(type: "customized")
end

# Class API: seeds defaults copied into future instances
Metanorma::Mirror::Rewriter.skip("always_skip")
Metanorma::Mirror::Rewriter.register("always_customize") { |n, _r| … }
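The isolation between class-level defaults and instance state can be sketched as follows. SketchRewriter is a hypothetical stand-in; the only idea taken from the text is that defaults are copied when an instance is created.

```ruby
require "set"

# Hypothetical stand-in showing defaults copied at construction time.
class SketchRewriter
  @default_skips = Set.new
  @default_builders = {}

  class << self
    attr_reader :default_skips, :default_builders

    def skip(type)
      default_skips << type
    end

    def register(type, &block)
      default_builders[type] = block
    end
  end

  attr_reader :skips, :builders

  def initialize
    # Copy, do not share: later class-level changes stay invisible here,
    # and instance-level changes never reach the class.
    @skips = self.class.default_skips.dup
    @builders = self.class.default_builders.dup
  end

  def skip(type)
    @skips << type
  end

  def register(type, &block)
    @builders[type] = block
  end
end

SketchRewriter.skip("always_skip")
first = SketchRewriter.new
first.skip("my_type")
SketchRewriter.skip("added_later")
second = SketchRewriter.new
```

Here `first` was created before "added_later" was registered, so it never sees that skip rule, while `second` does.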
Model dispatch is polymorphic: each Model::* class implements accept_rewriter(rewriter), which calls the matching rewriter#rewrite_container / #rewrite_leaf / #rewrite_text / #rewrite_soft_break. There is no case/when on node classes in the dispatch path.
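A minimal sketch of this double dispatch, using stand-in classes: only the method names `accept_rewriter`, `rewrite_container`, and `rewrite_text` come from the text, and the upcasing rewriter is purely illustrative.

```ruby
# Hypothetical node classes; each one knows which rewriter hook to call.
class SketchContainer
  attr_reader :type, :children

  def initialize(type, children = [])
    @type = type
    @children = children
  end

  def accept_rewriter(rewriter)
    rewriter.rewrite_container(self)
  end
end

class SketchText
  attr_reader :text

  def initialize(text)
    @text = text
  end

  def accept_rewriter(rewriter)
    rewriter.rewrite_text(self)
  end
end

# Illustrative rewriter: rebuilds the graph, upcasing every text node.
class UpcaseRewriter
  def call(node)
    node.accept_rewriter(self) # no case/when on the node's class
  end

  def rewrite_container(node)
    SketchContainer.new(node.type, node.children.map { |child| call(child) })
  end

  def rewrite_text(node)
    SketchText.new(node.text.upcase)
  end
end

tree = SketchContainer.new("paragraph", [SketchText.new("hello")])
rebuilt = UpcaseRewriter.new.call(tree)
```

Adding a new node class in this style only requires that class to implement `accept_rewriter`; the rewriter's `call` method stays unchanged.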
Default Handler Registry
Metanorma::Mirror::DefaultRegistry.build returns a fresh HandlerRegistry pre-populated with the standard Metanorma → Mirror mappings (paragraphs, blocks, lists, sections, terms, structural containers). It is the single source of truth for the default handler set — adding a new model-to-handler mapping is a single register call here, with no edits to HandlerRegistry itself.
Metanorma::Mirror.build_default_registry
# => #<Metanorma::Mirror::HandlerRegistry> (fresh instance each call)
Metadata Service
Metanorma::Mirror::Metadata is a standalone service that extracts document metadata (title, etc.) from a parsed bibdata object. It is consumed by both the forward pipeline (Output::Pipeline::AttachMetadata) and the forward transformer (MetanormaToMirror#extract_root_title), keeping title-extraction logic in one place rather than duplicated across the Handlers layer.
Metanorma::Mirror::Metadata.title_from_bibdata(doc.bibdata)
# => "First title string"
Serialization
Mirror documents round-trip cleanly through JSON and YAML:
json = Metanorma::Mirror::Serialization::JsonSerializer.serialize(document)
restored = Metanorma::Mirror::Serialization::JsonSerializer.deserialize(json)
ID Strategies
ID assignment is pluggable via Metanorma::Mirror::IdStrategy:
IdStrategy::Preserve | Default. Keeps element IDs as-is.
IdStrategy::Positional | Rewrites UUID-style IDs to positional IDs (
The Positional strategy exposes register_category(model_class, category) so new flavors can declare their model classes without editing dispatch code.
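The registration idea can be sketched with a stand-in strategy. SketchPositional, the `next_id` method, and the "category-N" ID format are all assumptions made for illustration; only `register_category(model_class, category)` is named in the text.

```ruby
# Hypothetical stand-in for a positional ID strategy.
class SketchPositional
  def initialize
    @categories = {}        # model class => category name
    @counters = Hash.new(0) # category name => last number issued
  end

  def register_category(model_class, category)
    @categories[model_class] = category
  end

  # Returns the next positional ID for the node, or nil when the node's
  # class has no registered category. The ID format is an assumption.
  def next_id(node)
    category = @categories[node.class]
    return nil unless category

    @counters[category] += 1
    "#{category}-#{@counters[category]}"
  end
end

# A new flavor declares its model class without touching dispatch code.
class MyFlavorClause; end

strategy = SketchPositional.new
strategy.register_category(MyFlavorClause, "clause")
ids = Array.new(2) { strategy.next_id(MyFlavorClause.new) }
```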
HTML Rendering from Mirror
Metanorma::Mirror::Output::HtmlRenderer produces SSR-ready HTML directly from a mirror Model graph. Renderer behavior is split into pluggable modules registered by node type (clause, paragraph, bullet_list, table, …) and by mark type (emphasis, link, xref, …). All dispatch is registry-driven (no case/when on types), so new node and mark types can be added without modifying core rendering code.
All HTML construction goes through Nokogiri::HTML4::Builder (or Nokogiri::HTML5::Builder for full documents). String concatenation of HTML is not permitted anywhere in the renderer layer. The HtmlRenderers module exposes shared helpers (build, build_fragment, embed, wrap, escape_text) so each renderer stays declarative.
Node handlers are stored as UnboundMethod objects that are bound to the renderer instance at dispatch time. Register a new node handler by passing an instance method reference:
module MyRenderers
  def self.register(registry)
    registry.register_node_handler("callout",
      instance_method(:render_callout))
  end

  def render_callout(node, depth: 0)
    HtmlRenderers.build { |doc| doc.span(class: "callout") { doc.text node.attrs["text"] } }
  end
end
MyRenderers.register(Metanorma::Mirror::Output::HtmlRenderer)
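The bind-at-dispatch mechanism relies on plain Ruby: `Module#instance_method` returns an UnboundMethod, and `UnboundMethod#bind` attaches it to a receiver. The sketch below uses hypothetical stand-ins (CalloutRenderers, SketchHtmlRenderer) and returns a plain string instead of building HTML.

```ruby
module CalloutRenderers
  # Runs with the renderer instance as self once bound.
  def render_callout(text)
    "#{self.class.name} renders callout: #{text}"
  end
end

# Hypothetical stand-in for a registry-driven renderer.
class SketchHtmlRenderer
  @node_handlers = {}

  class << self
    attr_reader :node_handlers

    def register_node_handler(type, unbound_method)
      node_handlers[type] = unbound_method
    end
  end

  def render(type, *args)
    handler = self.class.node_handlers.fetch(type)
    # Methods taken from a module can be bound to any object (Ruby 2.0+).
    handler.bind(self).call(*args)
  end
end

SketchHtmlRenderer.register_node_handler(
  "callout", CalloutRenderers.instance_method(:render_callout)
)
result = SketchHtmlRenderer.new.render("callout", "note this")
```

Storing the method unbound keeps the registry free of per-instance state; each renderer instance binds the handler to itself only when a node of that type is rendered.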
Mark handlers are stored as Procs that receive the already-rendered inner HTML and the Mark instance:
Metanorma::Mirror::Output::HtmlRenderer.register_mark_handler("kbd",
  ->(inner_html, _mark) { HtmlRenderers.wrap(:kbd, inner_html) })
Output Pipeline
Metanorma::Mirror::Output::Pipeline runs the full conversion: parse XML → forward transform → attach metadata. Metanorma::Mirror::Output::Builder orchestrates the pipeline and writes through a pluggable Formats registry (default: InlineFormat).
builder = Metanorma::Mirror::Output::Builder.new(
  xml_path: "document.presentation.xml",
  output_path: "output.html",
  format: :inline,
  flavor: "iso",
  id_strategy: Metanorma::Mirror::IdStrategy::Positional.new,
)
builder.build
New formats register themselves without editing lookup code:
Metanorma::Mirror::Output::Formats.register(:pdf, MyPdfFormat)
CLI
# Convert presentation XML to mirror JSON
metanorma-document to-mirror document.presentation.xml \
  -o document.mirror.json \
  --id-strategy positional \
  --title "My Document"
Documentation
Detailed architecture documentation is in the docs/ directory:
- HTML Renderer Architecture: renderer pipeline, drop pattern, class name ownership, template system