Categories
Voight-Kampff detects bots, spiders, crawlers and replicants. (score: 0.26)
CrawlerDetect is a library to detect bots/crawlers via the user agent. (score: 0.07)
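Both Voight-Kampff and CrawlerDetect work by matching the request's User-Agent string against known crawler signatures. A minimal sketch of that idea in plain Ruby; the pattern list below is a made-up sample for illustration, not either gem's actual signature database:

```ruby
# Minimal user-agent bot check: match the UA string against crawler
# signatures. These patterns are an illustrative sample only.
BOT_PATTERNS = [
  /googlebot/i,
  /bingbot/i,
  /crawler/i,
  /spider/i,
  /curl/i
].freeze

def crawler?(user_agent)
  return false if user_agent.nil? || user_agent.empty?
  BOT_PATTERNS.any? { |re| re.match?(user_agent) }
end

puts crawler?("Mozilla/5.0 (compatible; Googlebot/2.1)")    # true
puts crawler?("Mozilla/5.0 (Windows NT 10.0) Firefox/120.0") # false
```

The real gems ship much larger, regularly updated signature lists and also inspect other request headers, but the core check is this same pattern match.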
0.13
Cobweb is a web crawler that can use resque to cluster crawls to quickly crawl extremely large sites which is much more performant than multi-threaded crawlers. It is also a standalone crawler that has a sophisticated statistics monitoring interface to monitor the progress of the crawls.
2019
2020
2021
2022
2023
2024
Gem for crawling data from external sources. (score: 0.01)
Generic web crawler with a DSL that parses structured data from web pages. (score: 0.55)
0.02
is_crawler does exactly what you might think it does: determine if the supplied string matches a known crawler or bot.
2019
2020
2021
2022
2023
2024
0.03
validate-website is a web crawler for checking the markup validity with XML Schema / DTD and not found urls.
2019
2020
2021
2022
2023
2024
0.0
初级开发工程师,基于 http 写的爬虫扩展包。请不要随意下载里面有很多坑。
2019
2020
2021
2022
2023
2024
0.0
This gem helps Crawler Writers to interact with the PromoQui REST API
2019
2020
2021
2022
2023
2024
Ruby web crawler using PhantomJS. (score: 0.02)
Asynchronous web crawler, scraper and file harvester. (score: 0.08)
Cangrejo lets you consume crabfarm crawlers using a simple DSL. (score: 0.0)
Dead simple yet powerful Ruby crawler for easy parallel crawling, with support for anonymity. (score: 0.0)
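The parallel crawling that several of these gems advertise is usually a pool of worker threads draining a shared URL queue. A sketch of that structure in plain Ruby, with the fetcher injected as a block so the pattern is visible without real HTTP; `crawl_parallel` is a hypothetical helper for illustration, not any gem's API (a real crawler would pass something like `->(url) { Net::HTTP.get(URI(url)) }`):

```ruby
# Worker-pool crawl: N threads pop URLs from a shared queue until empty.
def crawl_parallel(urls, workers: 4, &fetch)
  queue = Queue.new
  urls.each { |u| queue << u }
  results = Queue.new # thread-safe collector

  threads = Array.new(workers) do
    Thread.new do
      loop do
        url = queue.pop(true) rescue break # non-blocking pop; stop when empty
        results << [url, fetch.call(url)]
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }.to_h
end

pages = crawl_parallel(%w[http://a.example http://b.example]) do |url|
  "body of #{url}" # stand-in for a real HTTP fetch
end
p pages.keys.sort
```

`Queue#pop(true)` raises `ThreadError` when the queue is empty, which each worker uses as its exit signal; anonymity support in the gems typically means routing that fetch step through rotating proxies.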
Simple little website crawler. (score: 0.0)
0.0
Show DMM and DMM.R18's crawled data. e.g. ranking
2019
2020
2021
2022
2023
2024
0.07
An easy to use distributed web-crawler framework based on Redis
2019
2020
2021
2022
2023
2024
0.0
Ruby web crawler to access omelete informations
2019
2020
2021
2022
2023
2024
0.03
Arachnid is a web crawler that relies on Bloom Filters to efficiently store visited urls and Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.
2019
2020
2021
2022
2023
2024
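The Bloom-filter trick Arachnid's description mentions keeps the visited-URL set in a fixed-size bit array: membership tests can give false positives (a URL skipped that was never visited) but never false negatives, which is an acceptable trade for a crawler. A hand-rolled sketch on the Ruby stdlib, shown to illustrate the technique rather than Arachnid's own implementation:

```ruby
require "digest"

# Toy Bloom filter for visited URLs. k index values are derived from
# 8-hex-char slices of a SHA-256 digest of the URL.
class BloomFilter
  def initialize(bits: 10_000, hashes: 4)
    @bits = Array.new(bits, false)
    @size = bits
    @hashes = hashes
  end

  def indexes(url)
    digest = Digest::SHA256.hexdigest(url)
    (0...@hashes).map { |i| digest[i * 8, 8].to_i(16) % @size }
  end

  def add(url)
    indexes(url).each { |i| @bits[i] = true }
  end

  # May return a false positive, never a false negative.
  def include?(url)
    indexes(url).all? { |i| @bits[i] }
  end
end

visited = BloomFilter.new
visited.add("https://example.com/page1")
puts visited.include?("https://example.com/page1") # true
puts visited.include?("https://example.com/page2") # false with overwhelming probability
```

Memory use is constant regardless of how many URLs are added, which is why the approach suits crawling every page on a large domain.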
Website crawler and full-text indexer. (score: 0.01)
Iudex is a general-purpose web crawler and feed processor in Ruby/Java. The iudex-da gem provides a PostgreSQL-based content metadata store and work priority queue. (score: 0.0)