No commit activity in last 3 years
No release in over 3 years
A simple library for converting HTML into an approximation in plain text.
 Dependencies

Development

>= 0
>= 0
> 2.6.0

Runtime

>= 1.4.0
 Project Readme

HTML To Plain Text

gem install html_to_plain_text

A simple gem that provides code to convert HTML into a plain text alternative. Line breaks from HTML block-level elements are maintained. Lists and tables also keep a little bit of formatting.

  • Line breaks will be approximated using the generally established default margins for HTML tags (e.g. a <p> tag generates two line breaks, a <div> generates one)

  • List items will be numbered or bulleted with an asterisk


  • <br> tags will add line breaks

  • <hr> tags will add a string of hyphens to serve as a horizontal rule

  • <table> elements will be enclosed in “|” delimiters

  • <a> tags will have the href URL appended to the text in parentheses

  • Formatting tags like <strong> or <b> will be stripped

  • Formatting inside <pre> or <plaintext> elements will be honored

  • Code-like tags like <script> or <style> will be stripped
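
To make a few of the rules above concrete, here is a minimal sketch that approximates them with regular expressions using only the Ruby standard library. It is not the gem's implementation: the method name sketch_plain_text is hypothetical, the length of the hyphen rule (20) is an arbitrary assumption, and lists, tables, <pre> handling, and HTML entities are deliberately left out.

```ruby
# Illustrative only: a regex-based approximation of some of the rules above.
# This is NOT how the gem works internally; `sketch_plain_text` is a
# hypothetical name and the 20-hyphen rule length is an arbitrary choice.
def sketch_plain_text(html)
  text = html.dup
  # Drop <script> and <style> elements together with their contents.
  text.gsub!(%r{<(script|style)\b[^>]*>.*?</\1>}mi, "")
  # Append the href of each link in parentheses after the link text.
  text.gsub!(%r{<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>}mi) { "#{$2} (#{$1})" }
  # <br> becomes a single line break.
  text.gsub!(%r{<br\s*/?>}i, "\n")
  # <hr> becomes a row of hyphens on its own line.
  text.gsub!(%r{<hr\s*/?>}i, "\n#{'-' * 20}\n")
  # Closing </p> and headings yield two line breaks, closing </div> one.
  text.gsub!(%r{</(p|h[1-6])>}i, "\n\n")
  text.gsub!(%r{</div>}i, "\n")
  # Strip every remaining tag (e.g. <strong>, <b>, opening <p>).
  text.gsub!(/<[^>]+>/, "")
  # Collapse runs of three or more newlines and trim both ends.
  text.gsub(/\n{3,}/, "\n\n").strip
end

puts sketch_plain_text('<p>See <a href="http://example.com">the <b>docs</b></a></p>')
```

Regular expressions are used here only to keep the sketch short; they break on nested or malformed markup, which is why a real converter would normally walk a parsed document tree instead.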

Usage

require 'html_to_plain_text'
html = "<h1>Hello</h1><p>world!</p>"
HtmlToPlainText.plain_text(html)
=> "Hello\n\nworld!"