ReadabilityJs is a Ruby wrapper gem for the Mozilla Readability library, which extracts the main content from web pages. It uses the Nodo gem to run the JavaScript Readability library in a Node.js environment, so content can be extracted from within Ruby applications.
 Dependencies

Development

>= 1.14
= 0.8.1
= 0.14.1
>= 10.0
>= 3.0

Runtime

~> 1.8
~> 1.18
~> 0.6.3
 Project Readme

ReadabilityJS for Ruby

License: MIT

Clean up web pages and extract the main content, powered by Mozilla Readability.

This is a Ruby wrapper gem for Readability. It runs the library in a Node.js process via nodo.

Contents

  • Installation
  • Usage examples
  • Documentation
  • Contributing

Installation

Prerequisites

Node.js >= 22.x must be installed and available on the command line (in PATH).
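The prerequisite above can be verified from Ruby before the gem is used. The following is a minimal sketch, not part of the gem: `node_version_ok?` and `detect_node_version` are hypothetical helper names, and the only assumption about Node.js is that `node --version` prints a string such as `v22.3.0`.

```ruby
# Hypothetical helpers (not part of readability_js) for checking the
# Node.js prerequisite stated above.
MIN_NODE_MAJOR = 22

# Returns true if a version string such as "v22.3.0" has a major
# version of at least min_major; returns false for nil or garbage input.
def node_version_ok?(version_string, min_major = MIN_NODE_MAJOR)
  match = version_string.to_s.strip.match(/\Av?(\d+)\./)
  return false unless match

  match[1].to_i >= min_major
end

# Returns the output of `node --version`, or nil if node is not in PATH.
def detect_node_version
  `node --version`.strip
rescue Errno::ENOENT
  nil
end
```

Calling `node_version_ok?(detect_node_version)` at application start gives an early, readable failure instead of an error from the Node.js process later on.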

Gem

Add this line to your application's Gemfile:

    gem 'readability_js'

And then execute:

    $ bundle install

Or install it yourself as:

    $ gem install readability_js

Usage examples

Original parse

This calls only the parse method of Mozilla Readability.

    require 'readability_js'
    html = File.read("my_article.html")
    result = ReadabilityJs.parse(html)
    p result

Extended parse

This calls the extended parse method, which returns more thoroughly cleaned-up output and includes a beautified Markdown version of the content.

    require 'readability_js'
    html = File.read("my_article.html")
    result = ReadabilityJs.parse_extended(html)
    p result

Query parameters

You can pass all parameters supported by Readability; check out the RubyDoc for more details.

Here is an example. Readability's camelCase parameter names are written in snake_case in Ruby:

    require 'readability_js'
    data = ReadabilityJs.parse(
      # TODO: add parameters here
    )
    # => Hash
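To illustrate the naming convention only, here is a minimal sketch of how snake_case option names could map onto Readability's camelCase names. `camelize_key` and `camelize_options` are hypothetical helpers, not the gem's implementation, and the sketch makes no claim about how the gem accepts its parameters. `charThreshold` and `keepClasses` are existing Mozilla Readability options.

```ruby
# Hypothetical helpers (not part of readability_js) showing the
# snake_case -> camelCase naming convention for option names.

# "char_threshold" becomes "charThreshold"
def camelize_key(key)
  head, *tail = key.to_s.split("_")
  ([head] + tail.map(&:capitalize)).join
end

# Converts every key of an options hash to a camelCase string key.
def camelize_options(options)
  options.each_with_object({}) do |(key, value), out|
    out[camelize_key(key)] = value
  end
end

options = camelize_options(char_threshold: 500, keep_classes: true)
p options.keys
```

A simple conversion like this does not round-trip names containing acronyms, such as Readability's `disableJSONLD`, so a real mapping would need to handle those explicitly.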

Query response

The response is a Hash. It contains the data returned by Readability, with the hash keys converted to snake_case.

    {
      "title" => "Article Title",
      "content" => "<div>...</div>",
      "text_content" => "Plain text content",
      "markdown_content" => "## Markdown content", # only for extended parse
      "length" => 1234,
      "excerpt" => "This is an excerpt of the article...",
      "byline" => "Author Name",
      "dir" => "ltr",
      "site_name" => "example.com",
      "lang" => "en",
      "published_time" => "2024-01-01T12:00:00Z"
    }
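The key conversion described above can be pictured with a small sketch. It assumes Readability's own camelCase result keys (`textContent`, `siteName`, `publishedTime`) and shows one way such keys could be turned into the snake_case keys of the Hash. `snake_case_key` and `snake_case_keys` are hypothetical helpers, not the gem's actual code.

```ruby
# Hypothetical helpers (not part of readability_js) showing how camelCase
# result keys could be converted to snake_case.

# "textContent" becomes "text_content"
def snake_case_key(key)
  key.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Returns a new hash with every key converted to a snake_case string.
def snake_case_keys(hash)
  hash.each_with_object({}) do |(key, value), out|
    out[snake_case_key(key)] = value
  end
end

# Sample data in the shape Readability returns it (values are illustrative).
raw = {
  "title" => "Article Title",
  "textContent" => "Plain text content",
  "siteName" => "example.com",
  "publishedTime" => "2024-01-01T12:00:00Z"
}

result = snake_case_keys(raw)
p result.keys
```

Because the result is a plain Hash with string keys, fields are read with `result["text_content"]` rather than with method calls or symbol keys.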

Documentation

Check out the doc at RubyDoc:
https://www.rubydoc.info/gems/readability_js

As this library is only a wrapper, also check out the original Readability documentation:
https://github.com/mozilla/readability

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/magynhard/ruby-readability_js.

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.