Search results for 'crawler' - The Ruby Toolbox

Cobweb is a web crawler that can use Resque to distribute crawls across a cluster, so that extremely large sites can be crawled quickly; the project describes this as much more performant than multi-threaded crawlers. It can also run as a standalone crawler, and it has a statistics monitoring interface for tracking crawl progress.

323,609

228

1.2.1

2010-11-10

2021-01-09

voight_kampff

0.26

Low commit activity in last 3 years

No release in over a year

voight_kampff biola/voight-kampff

Voight-Kampff detects bots, spiders, crawlers and replicants

6,440,324

177

2.0.0

2011-05-11

2023-03-12

polipus

0.07

No commit activity in last 3 years

No release in over 3 years

There are a lot of open issues

polipus taganaka/polipus

An easy-to-use distributed web-crawler framework based on Redis

51,416

0.5.1

2014-01-05

2015-07-17

rubyretriever

0.08

No release in over 3 years

Low commit activity in last 3 years

There are a lot of open issues

rubyretriever joenorton/rubyretriever

Asynchronous web crawler, scraper and file harvester

67,085

141

1.4.6

2014-05-25

2016-04-11

instagram-crawler

0.08

No release in over 3 years

Low commit activity in last 3 years

There are a lot of open issues

instagram-crawler mgleon08/instagram-crawler

Crawl Instagram photos, posts and videos for download.

7,333

197

0.3.0

2018-11-23

2019-04-14

arachnid

0.03

No commit activity in last 3 years

No release in over 3 years

arachnid dchuk/arachnid

Arachnid is a web crawler that relies on Bloom filters to efficiently store visited URLs and on Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.

48,702

0.4.1

2011-11-11

2014-01-16

crawler_detect

0.07

User Agent Detection

Low commit activity in last 3 years

A long-lived project that still receives updates

crawler_detect loadkpi/crawler_detect

CrawlerDetect is a library to detect bots/crawlers via the user agent

1,016,280

110

1.2.4

2018-08-05

2024-03-20

google_ajax_crawler

0.03

No commit activity in last 3 years

No release in over 3 years

google_ajax_crawler benkitzelman/google-ajax-crawler

Rack middleware that adheres to the Google Ajax Crawling Scheme, using a headless browser to render JS-heavy pages and serve a DOM snapshot of the rendered state to a requesting search engine.

15,812

0.2.0

2013-03-16

2013-07-13

wayback_archiver

0.03

No release in over 3 years

Low commit activity in last 3 years

wayback_archiver buren/wayback_archiver

Post URLs to the Wayback Machine (Internet Archive), using a crawler, from Sitemap(s) or a list of URLs.

46,011

1.4.0

2014-07-17

2021-04-23

validate-website

0.03

Low commit activity in last 3 years

No release in over a year

validate-website spk/validate-website

validate-website is a web crawler that checks markup validity against XML Schema / DTD and reports not-found URLs.

125,277

1.12.0

2009-10-24

2022-11-15

grell

0.02

No commit activity in last 3 years

No release in over 3 years

grell mdsol/grell

Ruby web crawler using PhantomJS

86,120

2.1.2

2015-05-07

2021-02-17

render_static

0.01

No commit activity in last 3 years

No release in over 3 years

render_static herval/render_static

render_static makes single-page apps (Backbone, Angular, etc.) built on Rails SEO-friendly. It works by injecting a small Rack middleware that renders pages as plain HTML when the requester is one of the most common crawlers/bots (Google, Yahoo, Baidu and Bing).

4,133

0.0.0

2013-05-08

2013-05-08

rdig

0.01

No commit activity in last 3 years

No release in over 3 years

rdig jkraemer/rdig

Website crawler and fulltext indexer.

48,101

0.3.12

2006-03-25

2009-04-25

is_crawler

0.02

No commit activity in last 3 years

No release in over 3 years

is_crawler ccashwell/is_crawler

is_crawler does exactly what you might think it does: it determines whether the supplied string matches a known crawler or bot.

160,690

0.1.5

2013-02-27

2013-05-23

daimon_skycrawlers

0.01

Repository is archived

No commit activity in last 3 years

No release in over 3 years

daimon_skycrawlers bm-sms/daimon_skycrawlers

This is a crawler framework.

40,986

1.0.0

2016-01-27

2017-02-15

medusa-crawler

0.01

No commit activity in last 3 years

No release in over 3 years

medusa-crawler brutuscat/medusa-crawler

Medusa: a Ruby crawler framework. Medusa is a framework for the Ruby language to crawl and collect usefu...

4,193

1.0.0

2020-08-06

2020-08-17

driller

0.01

No commit activity in last 3 years

No release in over 3 years

driller shashikant86/driller

Driller is a command-line, Ruby-based web crawler built on Anemone. It crawls a website, reports error pages and slow pages, and generates HTML reports.

33,685

0.1.4