Compare Projects

Project comparisons allow you to view any selection of projects side by side just like they're shown on regular categories or in search results. You can try out an example or start yourself by adding a library to the comparison via the input below. You can also easily share your current comparison with others by sending the URL of the current page.

scrapetor

Full

Compact

Table

scrapetor

0.0

The project is in a healthy, maintained state

scrapetor alaa-abdulridha/scrapetor Homepage Documentation Source Code Bug Tracker Wiki

Scrapetor is a Ruby HTML parsing + scraping toolkit. The parser is a native C arena DOM with structural indexes built at parse time and NEON SIMD scanners in the SAX hot loop. A streaming extraction engine compiles the schema DSL into a single forward pass — no DOM materialised, one Ruby boundary crossing per document. On builds where libcurl is available, Scrapetor::Fetcher adds an HTTP/2-capable fetch layer with per-thread connection cache, shared DNS + TLS session pool, in-process gzip / deflate / brotli / zstd decoding, iconv charset transcoding, retry + exponential backoff, ETag / Last-Modified disk cache with bulk revalidation, per-host throttle, cookie jar, basic + bearer auth, proxy, and three bulk concurrency models (parallel_fetch / multi_fetch / streaming multi_each). Scrapetor::Session ties the cookie / auth / throttle / retry policies together. Also ships robots.txt + sitemap.xml parsers, a bounded-memory streaming HTML parser, and structured-data extractors (JSON-LD, OpenGraph, Schema.org, Microdata, RDFa, Twitter Cards). The Net::HTTP-based Scrapetor.fetch is preserved as the no-libcurl fallback.

2005

2006

2007

2008

2009

2010

2011

2012

2013

2014

2015

2016