Readability

Ruby port of Mozilla Readability.js. Extracts readable article content from HTML pages, like Firefox Reader View.

Passes all 130 Mozilla test fixtures.

Installation

Add this line to your application's Gemfile:

gem "readability-rb"

Quick Start

result = Readability.parse(html, url: "https://example.com/article")

result.title          # article title
result.byline         # author name
result.content        # cleaned HTML content
result.text_content   # plain text content
result.excerpt        # short summary
result.length         # text content length
result.site_name      # site name
result.published_time # publication date
result.dir            # text direction
result.lang           # language
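
The accessors above can be gathered into a plain Hash, for example to log or serialize article metadata. This is a minimal sketch: `article_metadata` and `ARTICLE_FIELDS` are hypothetical names that are not part of the gem, and the helper only assumes the result object responds to the accessors listed above.

```ruby
require "json"

# Metadata accessors from the Quick Start (content and text_content are
# left out to keep the output small).
ARTICLE_FIELDS = %i[title byline excerpt length site_name published_time dir lang].freeze

# Hypothetical helper: collects the metadata accessors into a Hash.
def article_metadata(result)
  ARTICLE_FIELDS.to_h { |field| [field, result.public_send(field)] }
end

# With the gem loaded, usage might look like:
#   result = Readability.parse(html, url: "https://example.com/article")
#   puts JSON.pretty_generate(article_metadata(result)) if result
```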

Usage

Parse an article

require "net/http"

html = Net::HTTP.get(URI("https://example.com/article"))
result = Readability.parse(html, url: "https://example.com/article")

puts result.title
puts result.content

Returns a Readability::Result or nil if parsing fails.
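
Because the return value may be nil, callers should check it before reading any fields. A small sketch of that check follows. `describe_article` is a hypothetical helper, not part of the gem, and it assumes only the `title` and `length` accessors shown in the Quick Start.

```ruby
# Hypothetical helper: turns a parse result (or nil) into a one-line summary.
def describe_article(result)
  # Readability.parse returns nil when parsing fails, so handle that first.
  return "No readable content found" if result.nil?

  "#{result.title} (#{result.length} characters)"
end

# With the gem loaded, usage might look like:
#   puts describe_article(Readability.parse(html, url: "https://example.com/article"))
```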

Check if a page is readable

if Readability.readerable?(html)
  result = Readability.parse(html)
end

Accepts min_score and min_content_length options.

Readability.readerable?(html, min_score: 30, min_content_length: 200)
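
The check and the parse can be combined so that a full parse runs only when the page looks readable. The sketch below is one way to do that. `parse_if_readerable` is a hypothetical helper, and its `reader` parameter exists only so the sketch can be exercised with a stand-in object; it defaults to the `Readability` module. The thresholds are the values from the example above and are not necessarily the gem's defaults.

```ruby
# Hypothetical helper: parse only when the readability check passes.
# Extra keyword arguments (such as url:) are forwarded to parse unchanged.
def parse_if_readerable(html, reader: Readability, min_score: 30, min_content_length: 200, **parse_options)
  readable = reader.readerable?(html, min_score: min_score, min_content_length: min_content_length)
  return nil unless readable

  reader.parse(html, **parse_options)
end

# With the gem loaded, usage might look like:
#   result = parse_if_readerable(html, url: "https://example.com/article")
```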

Use the lower-level API

Pass a Nokogiri document directly.

doc = Nokogiri::HTML5(html)
result = Readability::Document.new(doc, url: "https://example.com").parse

Custom serializer

Replace the default HTML serializer.

result = Readability.parse(html, serializer: ->(el) { el.to_html })
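
A serializer is any callable that receives the content element and returns the value to use as the content. As an illustration, the lambda below wraps the extracted markup in an `<article>` element. `ARTICLE_SERIALIZER` is a hypothetical name, and the sketch assumes the element responds to `inner_html`, as Nokogiri nodes do.

```ruby
# Illustrative serializer: wraps the element's inner HTML in an <article> tag.
ARTICLE_SERIALIZER = ->(el) { "<article>#{el.inner_html}</article>" }

# With the gem loaded, usage might look like:
#   result = Readability.parse(html, serializer: ARTICLE_SERIALIZER)
#   result.content # would then be the string built by the lambda
```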

Options

| Option | Description | Default |
|---|---|---|
| url | Base URL for resolving relative links | nil |
| max_elems_to_parse | Max elements before aborting (0 = no limit) | 0 |
| nb_top_candidates | Number of top candidates to consider | 5 |
| char_threshold | Min characters for a successful parse | 500 |
| classes_to_preserve | CSS classes to keep on elements | [] |
| keep_classes | Preserve all CSS classes | false |
| disable_json_ld | Skip JSON-LD metadata extraction | false |
| allowed_video_regex | Regex for allowed video embed URLs | built-in |
| link_density_modifier | Adjust link density calculation | 0 |
| serializer | Lambda to serialize the content element | inner_html |
| max_attributes | Max attributes per element at parse time | 1000 |
| max_tree_depth | Max document tree depth at parse time (-1 to disable) | 1000 |
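
Several options can be collected in one Hash and splatted into the call. The values below are illustrative rather than recommended settings. `READABILITY_OPTIONS` is a hypothetical constant, and the video regex is a made-up example that presumably takes the place of the built-in pattern.

```ruby
# Illustrative option set; every value here is an assumption for the example.
READABILITY_OPTIONS = {
  url: "https://example.com/article",     # base for resolving relative links
  char_threshold: 250,                    # accept shorter articles than the default 500
  classes_to_preserve: ["caption"],       # keep this CSS class on elements
  max_elems_to_parse: 10_000,             # abort on very large documents
  allowed_video_regex: %r{//(www\.)?(youtube|player\.vimeo)\.com}i
}.freeze

# With the gem loaded, usage might look like:
#   result = Readability.parse(html, **READABILITY_OPTIONS)
```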

Contributing

Everyone is encouraged to help improve this project.

License

Apache 2.0