Popularity: 0.01
No commit activity in last 3 years
No release in over 3 years
A simple directory crawler DSL.
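
The gem's actual API is not documented here. As a purely hypothetical sketch, a directory crawler DSL built on Ruby's standard Find module might look like this (all class and method names below are invented for illustration):

    require 'find'

    # Hypothetical DSL: walk a directory tree and run callbacks on matching paths.
    class DirCrawler
      def initialize(root, &config)
        @root = root
        @handlers = []
        instance_eval(&config) # let the block call `on` directly
        run
      end

      # Register a callback for paths matching a pattern.
      def on(pattern, &handler)
        @handlers << [pattern, handler]
      end

      private

      def run
        Find.find(@root) do |path|
          @handlers.each { |pattern, h| h.call(path) if pattern.match?(path) }
        end
      end
    end

    DirCrawler.new('.') do
      on(/\.rb\z/)      { |path| puts "ruby:    #{path}" }
      on(/\.gemspec\z/) { |path| puts "gemspec: #{path}" }
    end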

Popularity: 0.01
No commit activity in last 3 years
No release in over 3 years
render_static makes single-page apps (Backbone, Angular, etc.) built on Rails SEO-friendly. It works by injecting a small Rack middleware that renders pages as plain HTML when the requester is one of the most common crawlers/bots (Google, Yahoo, Baidu, and Bing).
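
render_static's own middleware is not reproduced here; the sketch below only illustrates the general technique of a Rack middleware serving pre-rendered HTML to known bot user agents (the class name, the bot pattern, and the rendering helper are assumptions, not this gem's API):

    # Illustrative only: answer known crawlers with pre-rendered HTML while
    # regular visitors get the JavaScript app shell.
    # Enable with `use BotPrerender` in config.ru or the Rails middleware stack.
    class BotPrerender
      BOT_PATTERN = /googlebot|bingbot|baiduspider|slurp/i # Slurp is Yahoo's bot

      def initialize(app)
        @app = app
      end

      def call(env)
        if BOT_PATTERN.match?(env['HTTP_USER_AGENT'].to_s)
          body = render_plain_html(env['PATH_INFO'])
          [200, { 'Content-Type' => 'text/html' }, [body]]
        else
          @app.call(env)
        end
      end

      private

      # Hypothetical stand-in for a real server-side render of the page.
      def render_plain_html(path)
        "<html><body>Pre-rendered content for #{path}</body></html>"
      end
    end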

Popularity: 0.01
No commit activity in last 3 years
No release in over 3 years
Driller is a command-line, Ruby-based web crawler built on Anemone. It can crawl a website, report error pages and slow pages, and generate HTML reports.
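
Driller's CLI is not documented here, but since it builds on Anemone, the underlying idea of collecting error pages and slow pages during a crawl can be sketched directly against Anemone's API (the 2000 ms cutoff is an assumption):

    require 'anemone'

    errors, slow = [], []

    Anemone.crawl('https://www.example.com', depth_limit: 2) do |anemone|
      anemone.on_every_page do |page|
        # page.code is the HTTP status; 4xx/5xx pages are collected as errors.
        errors << page.url if page.code && page.code >= 400
        # response_time is reported in milliseconds; 2000 ms is an assumed cutoff.
        slow << page.url if page.response_time && page.response_time > 2000
      end
    end

    puts "#{errors.size} error pages, #{slow.size} slow pages"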

Popularity: 0.01
No commit activity in last 3 years
No release in over 3 years
== Medusa: a ruby crawler framework

{rdoc-image:https://badge.fury.io/rb/medusa-crawler.svg}[https://rubygems.org/gems/medusa-crawler]
rdoc-image:https://github.com/brutuscat/medusa-crawler/workflows/Ruby/badge.svg?event=push

Medusa is a framework for the ruby language to crawl and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized tasks quickly and easily.

=== Features

* Choose the links to follow on each page with +focus_crawl+
* Multi-threaded design for high performance
* Tracks +301+ HTTP redirects
* Allows exclusion of URLs based on regular expressions
* Records response time for each page
* Obeys _robots.txt_ directives (optional, but recommended)
* In-memory or persistent storage of pages during crawl, provided by Moneta[https://github.com/moneta-rb/moneta]
* Inherits OpenURI behavior (redirects, automatic charset and encoding detection, proxy configuration options)

<b>Do you have an idea or a suggestion? {Open an issue and talk about it}[https://github.com/brutuscat/medusa-crawler/issues/new]</b>

=== Examples

Medusa is versatile and meant to be used programmatically. You can start with one or multiple URIs:

  require 'medusa'

  Medusa.crawl('https://www.example.com', depth_limit: 2)

Or you can pass a block, and it will yield the crawler back so you can manage configuration or drive its crawling focus:

  require 'medusa'

  Medusa.crawl('https://www.example.com', depth_limit: 2) do |crawler|
    crawler.discard_page_bodies = some_flag

    # Persist all the pages' state across crawl runs.
    crawler.clear_on_startup = false
    crawler.storage = Medusa::Storage.Moneta(:Redis, 'redis://redis.host.name:6379/0')

    crawler.skip_links_like(/private/)

    crawler.on_pages_like(/public/) do |page|
      logger.debug "[public page] #{page.url} took #{page.response_time} found #{page.links.count}"
    end

    # Use arbitrary logic, page by page, to customize the crawling focus.
    crawler.focus_crawl(/public/) do |page|
      page.links.first
    end
  end

Popularity: 0.0
No commit activity in last 3 years
No release in over 3 years
A simple crawler using Redis as a backend.
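
No usage is shown for the gem itself; for context, the common pattern of backing a crawl frontier with Redis pairs a list used as the work queue with a set of already-seen URLs (the key names below are assumptions):

    require 'redis'

    redis = Redis.new # defaults to redis://127.0.0.1:6379

    # Enqueue a URL only if it has never been seen before.
    def enqueue(redis, url)
      return if redis.sismember('crawler:seen', url)
      redis.sadd('crawler:seen', url)
      redis.rpush('crawler:frontier', url)
    end

    enqueue(redis, 'https://www.example.com/')

    # Any number of worker processes can pop URLs off the shared queue.
    while (url = redis.lpop('crawler:frontier'))
      puts "crawling #{url}"
      # ...fetch the page, extract links, then enqueue(redis, link) for each...
    end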

Popularity: 0.0
No commit activity in last 3 years
No release in over 3 years
A wrapper around Capybara for writing crawlers.
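
The wrapper's own API is not shown here; for reference, driving Capybara directly outside a test suite looks roughly like this (the :selenium driver choice is an assumption and requires the selenium-webdriver gem):

    require 'capybara'

    # A standalone Capybara session with no Rack app attached.
    session = Capybara::Session.new(:selenium)
    session.visit('https://www.example.com/')

    # Collect every link on the page, the raw material of a crawl.
    links = session.all('a', visible: :all).map { |a| a[:href] }.compact
    puts links.first(10)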

Popularity: 0.0
No release in over 3 years
Iudex is a general-purpose web crawler and feed processor in Ruby/Java. The iudex-filter gem contains a fundamental filtering/chain-of-responsibility subsystem.
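
iudex-filter's Java-backed classes are not reproduced here; the chain-of-responsibility idea it names can be sketched in plain Ruby (all names below are hypothetical):

    # Each filter receives a page and either returns a (possibly transformed)
    # page or nil to reject it, which stops the chain.
    class FilterChain
      def initialize(filters)
        @filters = filters
      end

      def call(page)
        @filters.each do |filter|
          page = filter.call(page)
          return nil if page.nil? # rejected: later filters never run
        end
        page
      end
    end

    drop_errors    = ->(page) { page[:status] < 400 ? page : nil }
    strip_tracking = ->(page) { page.merge(url: page[:url].sub(/\?utm_.*\z/, '')) }

    chain = FilterChain.new([drop_errors, strip_tracking])
    p chain.call(url: 'https://example.com/a?utm_source=x', status: 200)
    # => {:url=>"https://example.com/a", :status=>200}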

Popularity: 0.0
No release in over 3 years
A crawler for niconico.

Popularity: 0.0
No commit activity in last 3 years
No release in over 3 years
A DSL to build crawlers easily.

Popularity: 0.0
No release in over 3 years
A crawler for downloading comics from komiks.gildia.pl.

Popularity: 0.0
No release in over 3 years
A crawler for niconico-douga.

Popularity: 0.0
No commit activity in last 3 years
No release in over 3 years
Cangrejo lets you consume crabfarm crawlers using a simple DSL.

Popularity: 0.0
Repository is archived
No release in over a year
Retrieves a list of URLs to seed the crawler by publishing them to a RabbitMQ exchange.
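
The gem's own configuration is not documented here; publishing seed URLs to an exchange with the bunny gem follows the usual pattern (the exchange name and routing key below are assumptions, not the gem's defaults):

    require 'bunny'

    connection = Bunny.new # defaults to amqp://guest:guest@localhost:5672
    connection.start

    channel  = connection.create_channel
    exchange = channel.topic('crawler.seeds', durable: true)

    # Downstream crawler workers would bind a queue to this exchange
    # and consume the URLs as they arrive.
    %w[https://example.com/ https://example.org/].each do |url|
      exchange.publish(url, routing_key: 'seed.url', persistent: true)
    end

    connection.close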

Popularity: 0.0
Repository is gone
No release in over 3 years
A web comic crawler for Akihabara otaku.