prosereflect gem: Library for ProseMirror documents

Purpose

prosereflect is a Ruby gem for working with the document structure used by the ProseMirror rich text editor.

It provides a set of models and utilities for parsing, manipulating, and accessing the hierarchical document tree structure represented in ProseMirror’s JSON/YAML format. This allows for convenient traversal and extraction of content from rich text documents.

Installation

Add this line to your application’s Gemfile:

gem 'prosereflect'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install prosereflect

Usage

Parsing ProseMirror documents

From YAML

require 'prosereflect'

# Parse from YAML string or file
yaml_content = File.read('document.yaml')
document = Prosereflect::Parser.parse_document(yaml_content)

# Access the document structure
document.content.each do |node|
  # Work with nodes
end

From JSON

require 'prosereflect'

# Parse from JSON string or file
json_content = File.read('document.json')
document = Prosereflect::Parser.parse_document(json_content)

Navigating the document

# Get all tables in the document
tables = document.tables

# Get all paragraphs
paragraphs = document.paragraphs

# Access the first table
first_table = document.find_first('table')

# Access header row and data rows in a table
header = first_table.header_row
data_rows = first_table.data_rows

# Access cells in a table
cell = first_table.cell_at(0, 0)  # First data row, first column

Accessing content

# Get text content from a paragraph
paragraph = document.paragraphs.first
text = paragraph.text_content

# Get text content from a table cell
cell = document.tables.first.cell_at(0, 0)
cell_text = cell.text_content

# Get cell content as separate lines
lines = cell.lines

Finding nodes

# Find the first node of a specific type
table = document.find_first('table')
paragraph = document.find_first('paragraph')

# Find all nodes of a specific type
tables = document.find_all('table')
text_nodes = document.find_all('text')

# Find child nodes of a specific type
table_cells = table.find_children(TableCell)

HTML Conversion

The gem provides functionality to convert between HTML and ProseMirror document models.

From HTML

require 'prosereflect'

# Parse from HTML string
html_content = '<p>This is a <strong>bold</strong> text in a paragraph.</p>'
document = Prosereflect::Input::Html.parse(html_content)

# Access the document structure
paragraph = document.paragraphs.first
text_content = paragraph.text_content # "This is a bold text in a paragraph."

User Mentions

The gem supports user mentions in documents, which can be useful for social features or collaborative editing.

# Create a document with user mentions
document = Prosereflect::Document.create
paragraph = document.add_paragraph('Hello ')

# Add a user mention
user = Prosereflect::User.new
user.id = '123'
paragraph.add_child(user)

paragraph.add_text('!')

# Convert to HTML
html = Prosereflect::Output::Html.convert(document)
# => "<p>Hello <user-mention data-id=\"123\"></user-mention>!</p>"

# Parse HTML with user mentions
html_content = '<p>Hello <user-mention data-id="123"></user-mention>!</p>'
document = Prosereflect::Input::Html.parse(html_content)

# Access user mentions
user_mentions = document.find_all('user')
first_user = user_mentions.first
user_id = first_user.id # => "123"

User mentions are represented as <user-mention> elements in HTML with a data-id attribute containing the user’s identifier. When parsing HTML, these elements are converted to User nodes in the document model.

Common use cases: - Mentioning users in comments or messages - Tagging users in collaborative documents - Tracking user references in content

To HTML

require 'prosereflect'

# Create a document
document = Prosereflect::Document.create
paragraph = document.add_paragraph('Plain text')
paragraph.add_text(' with bold', [Prosereflect::Mark::Bold.new])

# Convert to HTML
html = Prosereflect::Output::Html.convert(document)
# => "<html><body><p>Plain text<strong> with bold</strong></p></body></html>"

Round-trip Conversion

# Start with HTML
original_html = '<p>This is <em>styled</em> text.</p>'

# Convert to document model
document = Prosereflect::Input::Html.parse(original_html)

# Modify the document if needed
document.paragraphs.first.add_text(' with additions')

# Convert back to HTML
modified_html = Prosereflect::Output::Html.convert(document)

Data model

The prosereflect gem represents the document structure as a hierarchy of node objects.

+-------------------+
|      Document     |
|                   |
| +content          |
+--------+----------+
         |
         | 1..*
+--------v----------+
|        Node       |
|                   |
| -type             |
| -attrs            |
| -marks            |
| +content          |
+-------------------+
         |
    +----+----+---------------------+-------------+
    |         |                     |             |
+---v---+ +---v----------+  +-------v--------+  +-v-----+
|Table  | |  Paragraph   |  |     Text       |  | User  |
|       | |              |  |                |  |       |
+---+---+ +--------------+  +----------------+  +-------+
    |
    |
+---v-----------+
|   TableRow    |
|               |
+---+-----------+
    |
+---v-----------+
|   TableCell   |
|               |
+---------------+

Classes

Node

Base class for all node types.

type: The node type (e.g., "doc", "paragraph", "text", "table")
content: A collection of child nodes
attrs: Attributes specific to the node type
marks: Formatting marks applied to the node

Document

Top-level container representing a ProseMirror document.

content: A collection of top-level nodes in the document

Paragraph

Represents a paragraph of text.

text_content: Returns the combined text content of all child text nodes

Text

Represents a text node.

text: The text content of the node

User

Represents a user mention in the document.

id: The unique identifier of the referenced user
type: Always set to "user"
content: Always empty (user mentions cannot have child nodes)

Table

Represents a table structure.

rows: Collection of table rows
header_row: First row if it contains header cells
data_rows: All non-header rows

Heading

Represents a heading element (h1-h6).

level: The heading level (1-6)
text_content: Returns the combined text content of all child text nodes
content: Collection of child nodes (text, styled text, etc.)

Image

Represents an image element.

src: The image source URL
alt: Alternative text description
title: Image tooltip text
width: Image width in pixels
height: Image height in pixels

HorizontalRule

Represents a horizontal rule (hr) element.

style: Border style (solid, dashed, dotted)
width: Rule width (px or %)
thickness: Border thickness in pixels

BulletList

Represents an unordered list.

bullet_style: List style type (disc, circle, square)
items: Collection of list items

OrderedList

Represents an ordered list.

start: Starting number for the list
items: Collection of list items

ListItem

Represents a list item within ordered or unordered lists.

content: Collection of child nodes (can contain paragraphs, nested lists, etc.)
text_content: Returns the combined text content

Blockquote

Represents a blockquote element.

citation: Optional citation URL
blocks: Collection of content blocks within the quote

CodeBlockWrapper

Container for code blocks with additional attributes.

line_numbers: Whether to display line numbers
highlight_lines: Array of line numbers to highlight
code_blocks: Collection of code blocks

CodeBlock

Represents a code block with syntax highlighting.

content: The code content
language: Programming language for syntax highlighting

Mark

Base class for text formatting marks.

Available Mark Types

Bold: Bold text formatting
Italic: Italic text formatting
Code: Inline code formatting
Link: Hyperlink with href attribute
Strike: Strikethrough text
Subscript: Subscript text
Superscript: Superscript text
Underline: Underlined text

TableRow

Represents a row in a table.

cells: All cells in the row

TableCell

Represents a cell in a table.

paragraphs: All paragraphs in the cell
text_content: All text content combined
lines: Text content split into separate lines

Development

Adding test fixtures

The repository includes a utility script bin/extract-ituob-amendments.rb to extract ProseMirror content from the ITU Operational Bulletin for test fixtures.

Syntax:

$ bin/extract-ituob-amendments.rb {filename} {issue_number}

Where,

{filename}: The amendments YAML file to extract from. The script expects the {filename} file in the format used by the ITU Operational Bulletin data repository: https://github.com/ituob/itu-ob-data/
{issue_number}: The issue number to use in the generated file names.

This command:

Extract ProseMirror content from the specified amendments file
Generate both YAML and JSON files in the current directory
Name files according to the pattern ituob-<issue_number>-<publication>.<format>

These generated files can be moved to spec/fixtures/ituob-<issue_number>/ to use in tests.

$ bin/extract-ituob-amendments.rb amendments.yaml 1000

Copyright

This gem is developed, maintained and funded by Ribose Inc.

License

The gem is available as open source under the terms of the 2-Clause BSD License.

prosereflect

Runtime