prosereflect gem: Library for ProseMirror documents

Purpose

prosereflect is a Ruby gem for working with the document structure used by the ProseMirror rich text editor.

It provides a set of models and utilities for parsing, manipulating, and accessing the hierarchical document tree structure represented in ProseMirror’s JSON/YAML format. This allows for convenient traversal and extraction of content from rich text documents.
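
As a rough illustration of the tree structure described above, the sketch below parses a small ProseMirror-style JSON document with Ruby's standard json library and lists its node types as an outline. It does not use prosereflect: the sample document and the outline helper are made up for this example, and the "bold" mark name is an assumption, since mark names depend on the editor schema.

```ruby
require 'json'

# A hand-written sample in ProseMirror's JSON shape: every node has a "type"
# and may carry "content" (child nodes), "attrs", "marks" or "text".
SAMPLE_JSON = <<~JSON
  {
    "type": "doc",
    "content": [
      {
        "type": "paragraph",
        "content": [
          { "type": "text", "text": "Hello " },
          { "type": "text", "text": "world", "marks": [{ "type": "bold" }] }
        ]
      }
    ]
  }
JSON

# Hypothetical helper: returns one indented line per node, depth-first.
def outline(node, depth = 0)
  line = ('  ' * depth) + node['type']
  children = node.fetch('content', [])
  [line] + children.flat_map { |child| outline(child, depth + 1) }
end

doc = JSON.parse(SAMPLE_JSON)
puts outline(doc)
```

The outline has one entry per node: the doc root, its paragraph, and the two text nodes inside it.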

Installation

Add this line to your application’s Gemfile:

gem 'prosereflect'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install prosereflect

Usage

Parsing ProseMirror documents

From YAML

require 'prosereflect'

# Parse from YAML string or file
yaml_content = File.read('document.yaml')
document = Prosereflect::Parser.parse_document(yaml_content)

# Access the document structure
document.content.each do |node|
  # Work with nodes
end

From JSON

require 'prosereflect'

# Parse from JSON string or file
json_content = File.read('document.json')
document = Prosereflect::Parser.parse_document(json_content)
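
YAML and JSON are two serializations of the same tree. The sketch below, which uses only Ruby's standard yaml and json libraries and not prosereflect itself, loads a hand-written sample in both formats and checks that they yield the same Hash. The sample content is invented for illustration.

```ruby
require 'json'
require 'yaml'

yaml_source = <<~YAML
  type: doc
  content:
    - type: paragraph
      content:
        - type: text
          text: Hello
YAML

json_source = '{"type":"doc","content":[{"type":"paragraph",' \
              '"content":[{"type":"text","text":"Hello"}]}]}'

# Both loaders return plain Hashes and Arrays with String keys.
from_yaml = YAML.safe_load(yaml_source)
from_json = JSON.parse(json_source)

puts from_yaml == from_json
```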

Navigating the document

# Get all tables in the document
tables = document.tables

# Get all paragraphs
paragraphs = document.paragraphs

# Access the first table
first_table = document.find_first('table')

# Access header row and data rows in a table
header = first_table.header_row
data_rows = first_table.data_rows

# Access cells in a table
cell = first_table.cell_at(0, 0)  # First data row, first column
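
To make the header/data split concrete, here is a minimal sketch over a raw table Hash, without the gem. It assumes the node names "table", "table_row", "table_header" and "table_cell" (as used by the prosemirror-tables schema) and assumes a row is a header row when all of its cells are header cells. The gem's own rules may differ.

```ruby
# Builds a cell node of the given type holding one paragraph of text.
def cell(type, text)
  { 'type' => type,
    'content' => [{ 'type' => 'paragraph',
                    'content' => [{ 'type' => 'text', 'text' => text }] }] }
end

table = {
  'type' => 'table',
  'content' => [
    { 'type' => 'table_row',
      'content' => [cell('table_header', 'Name'), cell('table_header', 'Code')] },
    { 'type' => 'table_row',
      'content' => [cell('table_cell', 'Alpha'), cell('table_cell', 'A1')] },
    { 'type' => 'table_row',
      'content' => [cell('table_cell', 'Beta'), cell('table_cell', 'B2')] }
  ]
}

# Assumed rule: a header row consists only of header cells.
def header_row?(row)
  row['content'].all? { |c| c['type'] == 'table_header' }
end

# Concatenates the text of every text node below the given node.
def text_of(node)
  return node['text'] if node['type'] == 'text'

  node.fetch('content', []).map { |child| text_of(child) }.join
end

rows = table['content']
header = header_row?(rows.first) ? rows.first : nil
data_rows = header ? rows.drop(1) : rows

puts header['content'].map { |c| text_of(c) }.inspect
puts text_of(data_rows[0]['content'][0]) # first data row, first column
```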

Accessing content

# Get text content from a paragraph
paragraph = document.paragraphs.first
text = paragraph.text_content

# Get text content from a table cell
cell = document.tables.first.cell_at(0, 0)
cell_text = cell.text_content

# Get cell content as separate lines
lines = cell.lines
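
The following sketch shows one way the combined text and the separate lines of a cell could be derived from the raw node Hash. It is independent of the gem and rests on an assumption: each paragraph in the cell counts as one line. The paragraph_text helper is hypothetical.

```ruby
# Joins the text of all text nodes directly inside one paragraph,
# ignoring any marks applied to them.
def paragraph_text(paragraph)
  paragraph.fetch('content', [])
           .select { |node| node['type'] == 'text' }
           .map { |node| node['text'] }
           .join
end

cell = {
  'type' => 'table_cell',
  'content' => [
    { 'type' => 'paragraph', 'content' => [
      { 'type' => 'text', 'text' => 'First ' },
      { 'type' => 'text', 'text' => 'line', 'marks' => [{ 'type' => 'bold' }] }
    ] },
    { 'type' => 'paragraph', 'content' => [
      { 'type' => 'text', 'text' => 'Second line' }
    ] }
  ]
}

# Assumption: one line per paragraph in the cell.
lines = cell['content'].map { |para| paragraph_text(para) }
text = lines.join("\n")

puts lines.inspect
puts text
```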

Finding nodes

# Find the first node of a specific type
table = document.find_first('table')
paragraph = document.find_first('paragraph')

# Find all nodes of a specific type
tables = document.find_all('table')
text_nodes = document.find_all('text')

# Find child nodes of a specific type
table_cells = table.find_children(Prosereflect::TableCell)
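
Lookups by node type come down to a depth-first walk of the tree. The sketch below implements such a walk over plain Hashes. The find_all and find_first functions here are stand-alone illustrations written for this example, not the gem's methods, and treating the search as recursive is an assumption.

```ruby
# Depth-first search: collects every node whose "type" matches.
def find_all(node, type)
  matches = node['type'] == type ? [node] : []
  matches + node.fetch('content', []).flat_map { |child| find_all(child, type) }
end

# First match in document order, or nil when nothing matches.
def find_first(node, type)
  return node if node['type'] == type

  node.fetch('content', []).each do |child|
    found = find_first(child, type)
    return found if found
  end
  nil
end

doc = {
  'type' => 'doc',
  'content' => [
    { 'type' => 'paragraph',
      'content' => [{ 'type' => 'text', 'text' => 'Intro' }] },
    { 'type' => 'blockquote', 'content' => [
      { 'type' => 'paragraph',
        'content' => [{ 'type' => 'text', 'text' => 'Quoted' }] }
    ] }
  ]
}

puts find_all(doc, 'paragraph').length
puts find_first(doc, 'text')['text']
puts find_first(doc, 'table').inspect
```

Because the walk descends into every node, the paragraph nested inside the blockquote is found as well as the top-level one.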

HTML Conversion

The gem provides functionality to convert between HTML and ProseMirror document models.

From HTML

require 'prosereflect'

# Parse from HTML string
html_content = '<p>This is a <strong>bold</strong> text in a paragraph.</p>'
document = Prosereflect::Input::Html.parse(html_content)

# Access the document structure
paragraph = document.paragraphs.first
text_content = paragraph.text_content # => "This is a bold text in a paragraph."

User Mentions

The gem supports user mentions in documents, which can be useful for social features or collaborative editing.

# Create a document with user mentions
document = Prosereflect::Document.create
paragraph = document.add_paragraph('Hello ')

# Add a user mention
user = Prosereflect::User.new
user.id = '123'
paragraph.add_child(user)

paragraph.add_text('!')

# Convert to HTML
html = Prosereflect::Output::Html.convert(document)
# => "<p>Hello <user-mention data-id=\"123\"></user-mention>!</p>"

# Parse HTML with user mentions
html_content = '<p>Hello <user-mention data-id="123"></user-mention>!</p>'
document = Prosereflect::Input::Html.parse(html_content)

# Access user mentions
user_mentions = document.find_all('user')
first_user = user_mentions.first
user_id = first_user.id # => "123"

User mentions are represented as <user-mention> elements in HTML with a data-id attribute containing the user’s identifier. When parsing HTML, these elements are converted to User nodes in the document model.

Common use cases:

  - Mentioning users in comments or messages

  - Tagging users in collaborative documents

  - Tracking user references in content
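
The mapping between a user node and its <user-mention> element can be sketched in a few lines of plain Ruby. This is an illustration only, not the gem's implementation. The node shape (the id stored under "attrs") and the helper names are assumptions, and a regular expression stands in for a real HTML parser.

```ruby
ATTR_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escapes the characters that are unsafe inside a double-quoted attribute.
def escape_attr(value)
  value.to_s.gsub(/[&<>"]/, ATTR_ESCAPES)
end

# Renders a user node (assumed shape: type "user", id under "attrs").
def render_user(node)
  "<user-mention data-id=\"#{escape_attr(node['attrs']['id'])}\"></user-mention>"
end

# Pulls the ids back out of an HTML string. A regular expression is enough
# for this fixed markup; production code should use an HTML parser.
def mentioned_ids(html)
  html.scan(/<user-mention data-id="([^"]*)">/).flatten
end

user = { 'type' => 'user', 'attrs' => { 'id' => '123' } }
html = "<p>Hello #{render_user(user)}!</p>"

puts html
puts mentioned_ids(html).inspect
```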

To HTML

require 'prosereflect'

# Create a document
document = Prosereflect::Document.create
paragraph = document.add_paragraph('Plain text')
paragraph.add_text(' with bold', [Prosereflect::Mark::Bold.new])

# Convert to HTML
html = Prosereflect::Output::Html.convert(document)
# => "<html><body><p>Plain text<strong> with bold</strong></p></body></html>"
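
Conceptually, converting marked text to HTML means wrapping each text node in one tag per mark. The sketch below does this for a single paragraph over plain Hashes. The mark-to-tag table covers only two marks, the mark names are assumed, and no HTML escaping of the text is attempted, so treat it as an illustration and not as the gem's converter.

```ruby
# Assumed mapping from mark type to HTML tag; only two marks are covered.
MARK_TAGS = { 'bold' => 'strong', 'italic' => 'em' }.freeze

# Wraps the text of one text node in a tag for each of its marks.
# The text is assumed to contain no HTML special characters.
def render_text(node)
  node.fetch('marks', []).reduce(node['text']) do |html, mark|
    tag = MARK_TAGS.fetch(mark['type'])
    "<#{tag}>#{html}</#{tag}>"
  end
end

# Renders a paragraph by concatenating its rendered text nodes.
def render_paragraph(paragraph)
  "<p>#{paragraph['content'].map { |node| render_text(node) }.join}</p>"
end

paragraph = {
  'type' => 'paragraph',
  'content' => [
    { 'type' => 'text', 'text' => 'Plain text' },
    { 'type' => 'text', 'text' => ' with bold', 'marks' => [{ 'type' => 'bold' }] }
  ]
}

puts render_paragraph(paragraph)
```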

Round-trip Conversion

# Start with HTML
original_html = '<p>This is <em>styled</em> text.</p>'

# Convert to document model
document = Prosereflect::Input::Html.parse(original_html)

# Modify the document if needed
document.paragraphs.first.add_text(' with additions')

# Convert back to HTML
modified_html = Prosereflect::Output::Html.convert(document)

Data model

The prosereflect gem represents the document structure as a hierarchy of node objects.

+-------------------+
|      Document     |
|                   |
| +content          |
+--------+----------+
         |
         | 1..*
+--------v----------+
|        Node       |
|                   |
| -type             |
| -attrs            |
| -marks            |
| +content          |
+-------------------+
         |
    +----+----+---------------------+-------------+
    |         |                     |             |
+---v---+ +---v----------+  +-------v--------+  +-v-----+
|Table  | |  Paragraph   |  |     Text       |  | User  |
|       | |              |  |                |  |       |
+---+---+ +--------------+  +----------------+  +-------+
    |
    |
+---v-----------+
|   TableRow    |
|               |
+---+-----------+
    |
+---v-----------+
|   TableCell   |
|               |
+---------------+
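
The hierarchy in the diagram can be approximated with a small class tree. The sketch below lives in a made-up Sketch module to keep it clearly separate from the gem's own classes. It models only Document, Paragraph and Text, and builds them from a parsed Hash.

```ruby
module Sketch
  # Generic node with the four fields shown in the diagram.
  class Node
    attr_reader :type, :attrs, :marks, :content

    def initialize(hash)
      @type = hash['type']
      @attrs = hash.fetch('attrs', {})
      @marks = hash.fetch('marks', [])
      @content = hash.fetch('content', []).map { |child| Sketch.build(child) }
    end

    # Combined text of all descendant text nodes.
    def text_content
      content.map(&:text_content).join
    end
  end

  # Leaf node carrying the actual characters.
  class Text < Node
    attr_reader :text

    def initialize(hash)
      super
      @text = hash.fetch('text', '')
    end

    def text_content
      text
    end
  end

  class Paragraph < Node; end
  class Document < Node; end

  TYPES = { 'doc' => Document, 'paragraph' => Paragraph, 'text' => Text }.freeze

  # Picks a class by node type, falling back to the generic Node.
  def self.build(hash)
    TYPES.fetch(hash['type'], Node).new(hash)
  end
end

doc = Sketch.build({
  'type' => 'doc',
  'content' => [
    { 'type' => 'paragraph',
      'content' => [{ 'type' => 'text', 'text' => 'Hi' }] }
  ]
})

puts doc.class
puts doc.content.first.class
puts doc.text_content
```

Unknown node types fall back to the generic Node class, so the tree can still be built and traversed when a document uses types the sketch does not model.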

Classes

Node

Base class for all node types.

type

The node type (e.g., "doc", "paragraph", "text", "table")

content

A collection of child nodes

attrs

Attributes specific to the node type

marks

Formatting marks applied to the node

Document

Top-level container representing a ProseMirror document.

content

A collection of top-level nodes in the document

Paragraph

Represents a paragraph of text.

text_content

Returns the combined text content of all child text nodes

Text

Represents a text node.

text

The text content of the node

User

Represents a user mention in the document.

id

The unique identifier of the referenced user

type

Always set to "user"

content

Always empty (user mentions cannot have child nodes)

Table

Represents a table structure.

rows

Collection of table rows

header_row

First row if it contains header cells

data_rows

All non-header rows

Heading

Represents a heading element (h1-h6).

level

The heading level (1-6)

text_content

Returns the combined text content of all child text nodes

content

Collection of child nodes (text, styled text, etc.)

Image

Represents an image element.

src

The image source URL

alt

Alternative text description

title

Image tooltip text

width

Image width in pixels

height

Image height in pixels

HorizontalRule

Represents a horizontal rule (hr) element.

style

Border style (solid, dashed, dotted)

width

Rule width (px or %)

thickness

Border thickness in pixels

BulletList

Represents an unordered list.

bullet_style

List style type (disc, circle, square)

items

Collection of list items

OrderedList

Represents an ordered list.

start

Starting number for the list

items

Collection of list items

ListItem

Represents a list item within ordered or unordered lists.

content

Collection of child nodes (can contain paragraphs, nested lists, etc.)

text_content

Returns the combined text content

Blockquote

Represents a blockquote element.

citation

Optional citation URL

blocks

Collection of content blocks within the quote

CodeBlockWrapper

Container for code blocks with additional attributes.

line_numbers

Whether to display line numbers

highlight_lines

Array of line numbers to highlight

code_blocks

Collection of code blocks

CodeBlock

Represents a code block with syntax highlighting.

content

The code content

language

Programming language for syntax highlighting

Mark

Base class for text formatting marks.

Available Mark Types

Bold

Bold text formatting

Italic

Italic text formatting

Code

Inline code formatting

Link

Hyperlink with href attribute

Strike

Strikethrough text

Subscript

Subscript text

Superscript

Superscript text

Underline

Underlined text

TableRow

Represents a row in a table.

cells

All cells in the row

TableCell

Represents a cell in a table.

paragraphs

All paragraphs in the cell

text_content

All text content combined

lines

Text content split into separate lines

Development

Adding test fixtures

The repository includes a utility script bin/extract-ituob-amendments.rb to extract ProseMirror content from the ITU Operational Bulletin for test fixtures.

Syntax:

$ bin/extract-ituob-amendments.rb {filename} {issue_number}

Where,

{filename}

The amendments YAML file to extract from. The file must be in the format used by the ITU Operational Bulletin data repository: https://github.com/ituob/itu-ob-data/

{issue_number}

The issue number to use in the generated file names.

This command will:

  1. Extract ProseMirror content from the specified amendments file

  2. Generate both YAML and JSON files in the current directory

  3. Name files according to the pattern ituob-<issue_number>-<publication>.<format>

These generated files can be moved to spec/fixtures/ituob-<issue_number>/ to use in tests.

Example:

$ bin/extract-ituob-amendments.rb amendments.yaml 1000

This gem is developed, maintained and funded by Ribose Inc.

License

The gem is available as open source under the terms of the 2-Clause BSD License.