Search results for 'crawler' - The Ruby Toolbox

Cobweb is a web crawler that can use Resque to cluster crawls, which lets it crawl extremely large sites quickly and is much more performant than multi-threaded crawlers. It can also run as a standalone crawler with a sophisticated statistics interface for monitoring crawl progress.

323,704

228

1.2.1

2010-11-10

2021-01-09

instagram-crawler

0.08

No release in over 3 years

Low commit activity in last 3 years

There are a lot of open issues

instagram-crawler mgleon08/instagram-crawler Homepage

Crawl instagram photos, posts and videos for download.

7,333

197

0.3.0

2018-11-23

2019-04-14

voight_kampff

0.26

Low commit activity in last 3 years

No release in over a year

voight_kampff biola/voight-kampff Homepage

Voight-Kampff detects bots, spiders, crawlers and replicants

6,442,350

177

2.0.0

2011-05-11

2023-03-12

rubyretriever

0.08

No release in over 3 years

Low commit activity in last 3 years

There are a lot of open issues

rubyretriever joenorton/rubyretriever Homepage

Asynchronous web crawler, scraper and file harvester

67,085

141

1.4.6

2014-05-25

2016-04-11

crawler_detect

0.07

User Agent Detection

Low commit activity in last 3 years

A long-lived project that still receives updates

crawler_detect loadkpi/crawler_detect Homepage

CrawlerDetect is a library to detect bots/crawlers via the user agent

1,016,608

110

1.2.4

2018-08-05

2024-03-20

polipus

0.07

No commit activity in last 3 years

No release in over 3 years

There are a lot of open issues

polipus taganaka/polipus Homepage

An easy to use distributed web-crawler framework based on Redis

51,416

0.5.1

2014-01-05

2015-07-17

google_ajax_crawler

0.03

No commit activity in last 3 years

No release in over 3 years

google_ajax_crawler benkitzelman/google-ajax-crawler Homepage

Rack middleware adhering to the Google Ajax Crawling Scheme. It uses a headless browser to render JS-heavy pages and serves a DOM snapshot of the rendered state to a requesting search engine.

15,812

0.2.0

2013-03-16

2013-07-13

wayback_archiver

0.03

No release in over 3 years

Low commit activity in last 3 years

wayback_archiver buren/wayback_archiver Homepage

Post URLs to Wayback Machine (Internet Archive), using a crawler, from Sitemap(s) or a list of URLs.

46,017

1.4.0

2014-07-17

2021-04-23

grell

0.02

No commit activity in last 3 years

No release in over 3 years

grell mdsol/grell Homepage

Ruby web crawler using PhantomJS

86,120

2.1.2

2015-05-07

2021-02-17

cosmicrawler

0.02

Repository is archived

No commit activity in last 3 years

No release in over 3 years

cosmicrawler bash0c7/cosmicrawler Homepage

Cosmicrawler is a crawler library for Ruby. It provides scalable asynchronous crawling over (http|file|etc) using EventMachine.

5,066

0.0.1

2013-03-11

2013-03-11

validate-website

0.03

Low commit activity in last 3 years

No release in over a year

validate-website spk/validate-website Homepage

validate-website is a web crawler that checks markup validity against XML Schema / DTD and reports not-found URLs.

125,279

1.12.0

2009-10-24

2022-11-15

arachnid

0.03

No commit activity in last 3 years

No release in over 3 years

arachnid dchuk/arachnid Homepage

Arachnid is a web crawler that relies on Bloom filters to efficiently store visited URLs and on Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.

48,714

0.4.1

2011-11-11

2014-01-16

is_crawler

0.02

No commit activity in last 3 years

No release in over 3 years

is_crawler ccashwell/is_crawler Homepage

is_crawler does exactly what you might think it does: determine if the supplied string matches a known crawler or bot.

160,690

0.1.5

2013-02-27

2013-05-23

wriggle

0.01

No commit activity in last 3 years

No release in over 3 years

wriggle tsigo/wriggle Homepage

A simple directory crawler DSL.

13,996

1.3.0

2010-10-09

2011-03-09

render_static

0.01

No commit activity in last 3 years

No release in over 3 years

render_static herval/render_static Homepage

render_static makes single-page apps (Backbone, Angular, etc.) built on Rails SEO-friendly. It works by injecting a small Rack middleware that renders pages as plain HTML when the requester is one of the most common crawlers/bots (Google, Yahoo, Baidu, and Bing).

4,133

0.0.0