philiprehberger-sanitize_html

HTML sanitizer with configurable allow lists, security profiles, and URL/CSS sanitization for safe user content rendering

Requirements

  • Ruby >= 3.1

Installation

Add to your Gemfile:

gem "philiprehberger-sanitize_html"

Or install directly:

gem install philiprehberger-sanitize_html

Usage

require "philiprehberger/sanitize_html"

# Clean HTML with default allowed tags
safe = Philiprehberger::SanitizeHtml.clean('<p>Hello <script>alert("xss")</script></p>')
# => "<p>Hello </p>"

Custom Allow Lists

Philiprehberger::SanitizeHtml.clean(
  '<div class="box"><span>text</span></div>',
  tags: %w[div span],
  attributes: { 'div' => %w[class] }
)
# => '<div class="box"><span>text</span></div>'

Security Profiles

# :strict - removes all tags
Philiprehberger::SanitizeHtml.clean('<p>Hello <b>world</b></p>', profile: :strict)
# => "Hello world"

# :moderate - basic formatting (p, br, strong, em, b, i, u, lists, blockquote)
Philiprehberger::SanitizeHtml.clean('<p>Hello <b>world</b></p>', profile: :moderate)
# => "<p>Hello <b>world</b></p>"

# :permissive - most safe tags (formatting, links, images, tables, divs, spans)
Philiprehberger::SanitizeHtml.clean('<div><table><tr><td>cell</td></tr></table></div>', profile: :permissive)
# => "<div><table><tr><td>cell</td></tr></table></div>"

# :markdown - code, links, formatting, headings, tables
Philiprehberger::SanitizeHtml.clean('<pre><code>puts "hi"</code></pre>', profile: :markdown)
# => '<pre><code>puts "hi"</code></pre>'

URL Protocol Sanitization

# Default: allows http, https, mailto
Philiprehberger::SanitizeHtml.clean('<a href="javascript:alert(1)">click</a>')
# => "<a>click</a>"

# Custom allowed protocols
Philiprehberger::SanitizeHtml.clean(
  '<a href="ftp://files.example.com/doc.pdf">download</a>',
  allowed_protocols: %w[http https ftp]
)
# => '<a href="ftp://files.example.com/doc.pdf">download</a>'
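
The idea behind a protocol allow list can be pictured with a few lines of plain Ruby. This is only a minimal sketch, not the gem's actual implementation: the helper name `allowed_url?` is hypothetical, and treating scheme-less (relative) URLs as acceptable is an assumption made for illustration.

```ruby
# Hypothetical helper (not part of the gem): true when the URL's scheme is
# in the allow list. Assumption: URLs without a scheme are accepted.
def allowed_url?(url, protocols = %w[http https mailto])
  # Whitespace and control characters can be used to disguise a scheme
  # (e.g. "java\tscript:"), so remove them before inspecting it.
  cleaned = url.to_s.gsub(/[[:space:][:cntrl:]]/, "")
  scheme = cleaned[/\A([a-z][a-z0-9+.\-]*):/i, 1]
  return true if scheme.nil?

  protocols.include?(scheme.downcase)
end

allowed_url?("javascript:alert(1)")                                  # => false
allowed_url?("ftp://files.example.com/doc.pdf", %w[http https ftp])  # => true
```

The sketch does not decode HTML entities, so an entity-encoded scheme would need to be decoded before a check like this is meaningful.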

Data URI Filtering

# Allow specific MIME types for data: URIs
Philiprehberger::SanitizeHtml.clean(
  '<a href="data:image/png;base64,abc123">image</a>',
  allowed_data_mimes: ['image/png', 'image/jpeg']
)
# => '<a href="data:image/png;base64,abc123">image</a>'
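
As a rough illustration of MIME-based filtering, the sketch below extracts the media type from a data: URI and compares it against an allow list. The helper name `allowed_data_uri?` is hypothetical and the gem may parse data URIs differently; the empty default mirrors the documented behaviour that no data MIME types are allowed unless listed.

```ruby
# Hypothetical helper (not part of the gem): checks the media type of a
# data: URI against an allow list. Anything that is not a data: URI fails.
def allowed_data_uri?(uri, allowed_mimes = [])
  match = uri.to_s.strip.match(/\Adata:([a-z0-9.+\-]+\/[a-z0-9.+\-]+)?[;,]/i)
  return false unless match

  # RFC 2397: an omitted media type defaults to text/plain.
  mime = (match[1] || "text/plain").downcase
  allowed_mimes.map(&:downcase).include?(mime)
end

allowed_data_uri?("data:image/png;base64,abc123", ["image/png", "image/jpeg"]) # => true
allowed_data_uri?("data:text/html;base64,PHNjcmlwdD4=", ["image/png"])         # => false
```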

CSS Sanitization

# Safe CSS properties are preserved, dangerous ones are stripped
Philiprehberger::SanitizeHtml.clean(
  '<p style="color: red; expression(alert(1))">text</p>',
  tags: %w[p],
  attributes: { 'p' => %w[style] }
)
# => '<p style="color: red">text</p>'
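
One way to think about this filtering is declaration by declaration: keep a `property: value` pair only when the property is on an allow list and the value contains nothing suspicious. The sketch below is an assumption about the general approach, not the gem's code; `SAFE_PROPERTIES`, `UNSAFE_VALUE`, and `filter_style` are hypothetical names, and the real `SAFE_CSS_PROPERTIES` list is likely longer.

```ruby
# Hypothetical allow list; the gem's SAFE_CSS_PROPERTIES may differ.
SAFE_PROPERTIES = %w[color background-color font-size font-weight text-align].freeze

# Value patterns treated as dangerous in this sketch (an assumption).
UNSAFE_VALUE = /expression\s*\(|javascript:|url\s*\(/i

# Keeps only well-formed declarations with an allowed property and a safe value.
def filter_style(style)
  style.to_s.split(";").filter_map do |declaration|
    property, value = declaration.split(":", 2).map(&:strip)
    next if value.nil? || value.empty?          # not a "property: value" pair
    next unless SAFE_PROPERTIES.include?(property.downcase)
    next if value.match?(UNSAFE_VALUE)

    "#{property.downcase}: #{value}"
  end.join("; ")
end

filter_style("color: red; expression(alert(1))") # => "color: red"
```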

Callback Hooks

# Custom tag processing with on_tag callback
result = Philiprehberger::SanitizeHtml.clean(
  '<a href="http://example.com">link</a>',
  on_tag: ->(tag, attrs) {
    attrs['rel'] = 'nofollow' if tag == 'a'
    attrs
  }
)

# Return nil from callback to remove a tag
result = Philiprehberger::SanitizeHtml.clean(
  '<p>Keep</p><strong>Remove</strong>',
  on_tag: ->(tag, _attrs) { tag == 'strong' ? nil : {} }
)
# => "<p>Keep</p>"
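
Since `on_tag` takes a single callable, several rules can be combined into one before being passed in. The helper below is a hypothetical sketch (`compose_callbacks` is not part of the gem); it relies only on the contract shown above, where a callback receives `(tag, attrs)` and returns an attributes hash to keep the tag or nil to remove it.

```ruby
# Hypothetical helper (not part of the gem): chains callbacks in order and
# stops as soon as one of them returns nil (meaning "remove this tag").
def compose_callbacks(*callbacks)
  lambda do |tag, attrs|
    callbacks.reduce(attrs) do |current, callback|
      break nil if current.nil?

      callback.call(tag, current)
    end
  end
end

add_nofollow = ->(tag, attrs) { tag == "a" ? attrs.merge("rel" => "nofollow") : attrs }
drop_strong  = ->(tag, attrs) { tag == "strong" ? nil : attrs }

on_tag = compose_callbacks(add_nofollow, drop_strong)
on_tag.call("a", { "href" => "http://example.com" })
# => {"href"=>"http://example.com", "rel"=>"nofollow"}
```

The resulting lambda could then be passed as `on_tag: on_tag` to `.clean`.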

Strip All Tags

Philiprehberger::SanitizeHtml.strip('<p>Hello <strong>world</strong></p>')
# => "Hello world"

Escape HTML

Philiprehberger::SanitizeHtml.escape('<p>Hello</p>')
# => "&lt;p&gt;Hello&lt;/p&gt;"

API

| Method / Constant | Description |
| --- | --- |
| .clean(html, tags:, attributes:, profile:, allowed_protocols:, allowed_data_mimes:, on_tag:) | Sanitize HTML keeping only allowed tags and attributes with optional security profile, URL sanitization, data URI filtering, and callback hooks |
| .strip(html) | Remove all HTML tags, returning plain text (with entity normalization) |
| .escape(html) | Entity-encode all HTML special characters |
| DEFAULT_ALLOWED_TAGS | Frozen array of tag names allowed by default (p, br, strong, em, b, i, u, a, ul, ol, li, blockquote, code, pre, h1-h6) |
| DEFAULT_ALLOWED_ATTRIBUTES | Frozen hash of attributes allowed per tag (a => href, title; img => src, alt) |
| DEFAULT_ALLOWED_PROTOCOLS | Frozen array of allowed URL protocols (http, https, mailto) |
| DEFAULT_ALLOWED_DATA_MIMES | Frozen empty array of allowed data URI MIME types (none by default) |
| SAFE_CSS_PROPERTIES | Frozen array of CSS property names considered safe for style attributes |
| PROFILES | Frozen hash of predefined security profiles (:strict, :moderate, :permissive, :markdown) |
| DANGEROUS_TAGS | Frozen array of tags always removed with their content (script, style, iframe) |
| EVENT_ATTRIBUTE_PATTERN | Regex matching event-handler attributes (e.g. onclick, onload) that are always stripped |
| Error | Base error class for the module (Philiprehberger::SanitizeHtml::Error) |
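
To make the event-attribute rule concrete, here is a minimal sketch of how a pattern like `EVENT_ATTRIBUTE_PATTERN` might be applied to a tag's attributes. The regex and the helper name `strip_event_attributes` are assumptions for illustration; the gem's actual pattern may be defined differently.

```ruby
# Assumed pattern: any attribute name starting with "on" followed by letters.
EVENT_ATTRIBUTE = /\Aon[a-z]+\z/i

# Hypothetical helper: returns a copy of the attributes without event handlers.
def strip_event_attributes(attrs)
  attrs.reject { |name, _value| name.to_s.match?(EVENT_ATTRIBUTE) }
end

strip_event_attributes("href" => "/docs", "onclick" => "alert(1)")
# => {"href"=>"/docs"}
```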

Development

bundle install
bundle exec rspec
bundle exec rubocop

Support

If you find this project useful:

Star the repo

🐛 Report issues

💡 Suggest features

❤️ Sponsor development

🌐 All Open Source Projects

💻 GitHub Profile

🔗 LinkedIn Profile

License

MIT