prosereflect gem: Library for ProseMirror documents
Purpose
prosereflect is a Ruby gem for working with the document structure used by the ProseMirror rich text editor.
It provides a set of models and utilities for parsing, manipulating, and accessing the hierarchical document tree structure represented in ProseMirror’s JSON/YAML format. This allows for convenient traversal and extraction of content from rich text documents.
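To make the tree structure concrete, here is a minimal sketch in plain Ruby (standard library only, without the gem) of a small ProseMirror-style JSON document and a recursive walk over it. The sample document, the "bold" mark name, and the `collect_text` helper are illustrative assumptions, not part of the prosereflect API.

```ruby
require 'json'

# A small ProseMirror-style document: every node has a "type" and may carry
# "content" (child nodes), "attrs", "marks", or "text" (text nodes only).
SAMPLE_DOCUMENT = <<~DOC
  {
    "type": "doc",
    "content": [
      {
        "type": "paragraph",
        "content": [
          { "type": "text", "text": "Hello " },
          { "type": "text", "text": "world", "marks": [{ "type": "bold" }] }
        ]
      }
    ]
  }
DOC

# Hypothetical helper: depth-first walk that concatenates all text nodes.
def collect_text(node)
  return node['text'].to_s if node['type'] == 'text'

  Array(node['content']).map { |child| collect_text(child) }.join
end

doc = JSON.parse(SAMPLE_DOCUMENT)
puts collect_text(doc) # prints "Hello world"
```

The gem wraps this kind of nested hash in node objects, so the traversal does not have to be written by hand.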
Installation
Add this line to your application’s Gemfile:
gem 'prosereflect'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install prosereflect
Usage
Parsing ProseMirror documents
From YAML
require 'prosereflect'
# Parse from YAML string or file
yaml_content = File.read('document.yaml')
document = Prosereflect::Parser.parse_document(yaml_content)
# Access the document structure
document.content.each do |node|
  # Work with nodes
end
From JSON
require 'prosereflect'
# Parse from JSON string or file
json_content = File.read('document.json')
document = Prosereflect::Parser.parse_document(json_content)
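YAML and JSON are two serializations of the same tree. The sketch below uses only the Ruby standard library to show that both forms load into the same nested structure. It says nothing about how the gem's parser treats its input, and the sample content is made up for illustration.

```ruby
require 'json'
require 'yaml'

yaml_source = <<~YAML_DOC
  type: doc
  content:
    - type: paragraph
      content:
        - type: text
          text: Same tree, different syntax
YAML_DOC

json_source = '{"type":"doc","content":[{"type":"paragraph","content":' \
              '[{"type":"text","text":"Same tree, different syntax"}]}]}'

from_yaml = YAML.safe_load(yaml_source)
from_json = JSON.parse(json_source)

# Both loaders produce plain hashes and arrays with string keys.
puts from_yaml == from_json # prints "true"
```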
Navigating the document
# Get all tables in the document
tables = document.tables
# Get all paragraphs
paragraphs = document.paragraphs
# Access the first table
first_table = document.find_first('table')
# Access header row and data rows in a table
header = first_table.header_row
data_rows = first_table.data_rows
# Access cells in a table
cell = first_table.cell_at(0, 0) # First data row, first column
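The split between header row and data rows can be sketched on plain hashes. This example assumes the node names of the prosemirror-tables schema ("table", "table_row", "table_header", "table_cell") and assumes that a row counts as a header row when it contains a header cell. The helper names are hypothetical, and the gem's own rules may differ.

```ruby
# Build a cell whose content is a single paragraph of text.
def build_cell(type, text)
  {
    'type' => type,
    'content' => [
      { 'type' => 'paragraph', 'content' => [{ 'type' => 'text', 'text' => text }] }
    ]
  }
end

# Build a row in which every cell has the same cell type.
def build_row(cell_type, *texts)
  { 'type' => 'table_row', 'content' => texts.map { |text| build_cell(cell_type, text) } }
end

table = {
  'type' => 'table',
  'content' => [
    build_row('table_header', 'Name', 'Code'),
    build_row('table_cell', 'Alpha', 'A1'),
    build_row('table_cell', 'Beta', 'B2')
  ]
}

# Assumed rule: a row is a header row if any of its cells is a table_header.
def header_row?(row)
  row['content'].any? { |cell| cell['type'] == 'table_header' }
end

# Concatenate the text of every text node below the given node.
def text_of(node)
  return node['text'].to_s if node['type'] == 'text'

  Array(node['content']).map { |child| text_of(child) }.join
end

rows = table['content']
header = header_row?(rows.first) ? rows.first : nil
data_rows = header ? rows.drop(1) : rows

# Indexes count data rows only, so (0, 0) is the first cell below the header.
first_cell = data_rows[0]['content'][0]

puts header['content'].map { |cell| text_of(cell) }.inspect # prints ["Name", "Code"]
puts text_of(first_cell)                                    # prints "Alpha"
```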
Accessing content
# Get text content from a paragraph
paragraph = document.paragraphs.first
text = paragraph.text_content
# Get text content from a table cell
cell = document.tables.first.cell_at(0, 0)
cell_text = cell.text_content
# Get cell content as separate lines
lines = cell.lines
Finding nodes
# Find the first node of a specific type
table = document.find_first('table')
paragraph = document.find_first('paragraph')
# Find all nodes of a specific type
tables = document.find_all('table')
text_nodes = document.find_all('text')
# Find child nodes of a specific type
table_cells = table.find_children(Prosereflect::TableCell)
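Lookups by node type amount to a depth-first search of the tree. The following sketch implements that idea on plain hashes. The two functions reuse the names `find_first` and `find_all` for readability only; they are stand-alone illustrations, not the gem's implementation.

```ruby
# Collect every node of the given type, in document (pre-order) order.
def find_all(node, type, found = [])
  found << node if node['type'] == type
  Array(node['content']).each { |child| find_all(child, type, found) }
  found
end

# Return the first node of the given type, or nil when there is none.
def find_first(node, type)
  return node if node['type'] == type

  Array(node['content']).each do |child|
    match = find_first(child, type)
    return match if match
  end
  nil
end

doc = {
  'type' => 'doc',
  'content' => [
    { 'type' => 'paragraph', 'content' => [{ 'type' => 'text', 'text' => 'Intro' }] },
    { 'type' => 'table', 'content' => [
      { 'type' => 'table_row', 'content' => [
        { 'type' => 'table_cell', 'content' => [
          { 'type' => 'paragraph', 'content' => [{ 'type' => 'text', 'text' => 'Cell' }] }
        ] }
      ] }
    ] }
  ]
}

puts find_all(doc, 'paragraph').size                          # prints 2
puts find_all(doc, 'text').map { |n| n['text'] }.inspect      # prints ["Intro", "Cell"]
puts find_first(doc, 'heading').inspect                       # prints nil
```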
HTML Conversion
The gem provides functionality to convert between HTML and ProseMirror document models.
From HTML
require 'prosereflect'
# Parse from HTML string
html_content = '<p>This is a <strong>bold</strong> text in a paragraph.</p>'
document = Prosereflect::Input::Html.parse(html_content)
# Access the document structure
paragraph = document.paragraphs.first
text_content = paragraph.text_content # "This is a bold text in a paragraph."
User Mentions
The gem supports user mentions in documents, which can be useful for social features or collaborative editing.
# Create a document with user mentions
document = Prosereflect::Document.create
paragraph = document.add_paragraph('Hello ')
# Add a user mention
user = Prosereflect::User.new
user.id = '123'
paragraph.add_child(user)
paragraph.add_text('!')
# Convert to HTML
html = Prosereflect::Output::Html.convert(document)
# => "<p>Hello <user-mention data-id=\"123\"></user-mention>!</p>"
# Parse HTML with user mentions
html_content = '<p>Hello <user-mention data-id="123"></user-mention>!</p>'
document = Prosereflect::Input::Html.parse(html_content)
# Access user mentions
user_mentions = document.find_all('user')
first_user = user_mentions.first
user_id = first_user.id # => "123"
User mentions are represented as <user-mention> elements in HTML with a data-id attribute containing the user’s identifier. When parsing HTML, these elements are converted to User nodes in the document model.
Common use cases:
- Mentioning users in comments or messages
- Tagging users in collaborative documents
- Tracking user references in content
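The mapping between a user node and its <user-mention> element can be sketched without the gem. The example below assumes that the identifier is stored under attrs.id in the serialized node, which is an assumption about the JSON shape. The `render_inline` and `escape_html` helpers are hypothetical.

```ruby
HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape the characters that are unsafe in HTML text and attribute values.
def escape_html(value)
  value.to_s.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Render text and user nodes; any other node renders its children in order.
def render_inline(node)
  case node['type']
  when 'text'
    escape_html(node['text'])
  when 'user'
    %(<user-mention data-id="#{escape_html(node.dig('attrs', 'id'))}"></user-mention>)
  else
    Array(node['content']).map { |child| render_inline(child) }.join
  end
end

paragraph = {
  'type' => 'paragraph',
  'content' => [
    { 'type' => 'text', 'text' => 'Hello ' },
    { 'type' => 'user', 'attrs' => { 'id' => '123' } }, # assumed node shape
    { 'type' => 'text', 'text' => '!' }
  ]
}

html = "<p>#{render_inline(paragraph)}</p>"
puts html # prints <p>Hello <user-mention data-id="123"></user-mention>!</p>

# Going the other way: pull the identifiers back out of the HTML.
ids = html.scan(/<user-mention data-id="([^"]*)">/).flatten
puts ids.inspect # prints ["123"]
```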
To HTML
require 'prosereflect'
# Create a document
document = Prosereflect::Document.create
paragraph = document.add_paragraph('Plain text')
paragraph.add_text(' with bold', [Prosereflect::Mark::Bold.new])
# Convert to HTML
html = Prosereflect::Output::Html.convert(document)
# => "<html><body><p>Plain text<strong> with bold</strong></p></body></html>"
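Conceptually, rendering marks to HTML means wrapping a text node in one element per mark. The sketch below shows this for two marks on plain hashes. The mark names "bold" and "italic" and the tag table are assumptions based on the examples in this document, and the sketch omits the surrounding <html><body> wrapper shown above.

```ruby
# Assumed mapping from mark type to HTML tag; other marks are left out.
MARK_TAGS = { 'bold' => 'strong', 'italic' => 'em' }.freeze
ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Render one text node, wrapping it so that the first mark ends up outermost.
def render_text(node)
  escaped = node['text'].to_s.gsub(/[&<>"]/, ESCAPES)
  Array(node['marks']).reverse.inject(escaped) do |html, mark|
    tag = MARK_TAGS[mark['type']]
    tag ? "<#{tag}>#{html}</#{tag}>" : html # unknown marks are ignored
  end
end

# Render a paragraph as a <p> element containing its text nodes.
def render_paragraph(paragraph)
  "<p>#{Array(paragraph['content']).map { |node| render_text(node) }.join}</p>"
end

paragraph = {
  'type' => 'paragraph',
  'content' => [
    { 'type' => 'text', 'text' => 'Plain text' },
    { 'type' => 'text', 'text' => ' with bold', 'marks' => [{ 'type' => 'bold' }] }
  ]
}

puts render_paragraph(paragraph)
# prints <p>Plain text<strong> with bold</strong></p>
```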
Round-trip Conversion
# Start with HTML
original_html = '<p>This is <em>styled</em> text.</p>'
# Convert to document model
document = Prosereflect::Input::Html.parse(original_html)
# Modify the document if needed
document.paragraphs.first.add_text(' with additions')
# Convert back to HTML
modified_html = Prosereflect::Output::Html.convert(document)
Data model
The prosereflect gem represents the document structure as a hierarchy of node objects.
+-------------------+
|     Document      |
|                   |
| +content          |
+--------+----------+
         |
         | 1..*
+--------v----------+
|       Node        |
|                   |
| -type             |
| -attrs            |
| -marks            |
| +content          |
+-------------------+
         |
    +----+----+---------------------+-------------+
    |         |                     |             |
+---v---+ +---v----------+  +-------v--------+  +-v-----+
|Table  | |  Paragraph   |  |      Text      |  | User  |
|       | |              |  |                |  |       |
+---+---+ +--------------+  +----------------+  +-------+
    |
    |
+---v-----------+
|   TableRow    |
|               |
+---+-----------+
    |
+---v-----------+
|   TableCell   |
|               |
+---------------+
Classes
Node
Base class for all node types.
type - The node type (e.g., "doc", "paragraph", "text", "table")
content - A collection of child nodes
attrs - Attributes specific to the node type
marks - Formatting marks applied to the node
Document
Top-level container representing a ProseMirror document.
content - A collection of top-level nodes in the document
Paragraph
Represents a paragraph of text.
text_content - Returns the combined text content of all child text nodes
Text
Represents a text node.
text - The text content of the node
User
Represents a user mention in the document.
id - The unique identifier of the referenced user
type - Always set to "user"
content - Always empty (user mentions cannot have child nodes)
Table
Represents a table structure.
rows - Collection of table rows
header_row - First row if it contains header cells
data_rows - All non-header rows
Heading
Represents a heading element (h1-h6).
level - The heading level (1-6)
text_content - Returns the combined text content of all child text nodes
content - Collection of child nodes (text, styled text, etc.)
Image
Represents an image element.
src - The image source URL
alt - Alternative text description
title - Image tooltip text
width - Image width in pixels
height - Image height in pixels
HorizontalRule
Represents a horizontal rule (hr) element.
style - Border style (solid, dashed, dotted)
width - Rule width (px or %)
thickness - Border thickness in pixels
BulletList
Represents an unordered list.
bullet_style - List style type (disc, circle, square)
items - Collection of list items
OrderedList
Represents an ordered list.
start - Starting number for the list
items - Collection of list items
ListItem
Represents a list item within ordered or unordered lists.
content - Collection of child nodes (can contain paragraphs, nested lists, etc.)
text_content - Returns the combined text content
Blockquote
Represents a blockquote element.
citation - Optional citation URL
blocks - Collection of content blocks within the quote
CodeBlockWrapper
Container for code blocks with additional attributes.
line_numbers - Whether to display line numbers
highlight_lines - Array of line numbers to highlight
code_blocks - Collection of code blocks
CodeBlock
Represents a code block with syntax highlighting.
content - The code content
language - Programming language for syntax highlighting
Mark
Base class for text formatting marks.
Available Mark Types
Bold - Bold text formatting
Italic - Italic text formatting
Code - Inline code formatting
Link - Hyperlink with href attribute
Strike - Strikethrough text
Subscript - Subscript text
Superscript - Superscript text
Underline - Underlined text
TableRow
Represents a row in a table.
cells - All cells in the row
TableCell
Represents a cell in a table.
paragraphs - All paragraphs in the cell
text_content - All text content combined
lines - Text content split into separate lines
Development
Adding test fixtures
The repository includes a utility script bin/extract-ituob-amendments.rb to extract ProseMirror content from the ITU Operational Bulletin for test fixtures.
Syntax:
$ bin/extract-ituob-amendments.rb {filename} {issue_number}
Where:
{filename} - The amendments YAML file to extract from. The script expects the {filename} file in the format used by the ITU Operational Bulletin data repository: https://github.com/ituob/itu-ob-data/
{issue_number} - The issue number to use in the generated file names.
This command:
- Extracts ProseMirror content from the specified amendments file
- Generates both YAML and JSON files in the current directory
- Names files according to the pattern ituob-<issue_number>-<publication>.<format>
These generated files can be moved to spec/fixtures/ituob-<issue_number>/ for use in tests.
$ bin/extract-ituob-amendments.rb amendments.yaml 1000
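As a rough illustration of the naming pattern, the sketch below builds the expected fixture paths for issue 1000. The helper names are hypothetical and "example-publication" is a placeholder; the real publication names come from the amendments file.

```ruby
# Hypothetical helpers that mirror the naming pattern described above.
def fixture_file_name(issue_number, publication, format)
  "ituob-#{issue_number}-#{publication}.#{format}"
end

def fixture_directory(issue_number)
  File.join('spec', 'fixtures', "ituob-#{issue_number}")
end

# The script generates one YAML and one JSON file per publication.
paths = %w[yaml json].map do |format|
  File.join(fixture_directory(1000), fixture_file_name(1000, 'example-publication', format))
end

puts paths
# prints:
# spec/fixtures/ituob-1000/ituob-1000-example-publication.yaml
# spec/fixtures/ituob-1000/ituob-1000-example-publication.json
```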
Copyright
This gem is developed, maintained and funded by Ribose Inc.
License
The gem is available as open source under the terms of the 2-Clause BSD License.