Search results for 'crawler' - The Ruby Toolbox

64%

2012-07-30

crawler (tylercunnion/crawler)

0.01

No commit activity in last 3 years

No release in over 3 years

BFS webcrawler that implements Observable
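
The description above names two ideas, breadth-first traversal and the Observable pattern. The following is a minimal sketch of how they can fit together, not the gem's actual code. The class names `BfsCrawler` and `PageLogger` are hypothetical, and an in-memory hash of links stands in for real HTTP fetching so the example runs offline. It assumes Ruby's `observer` library is available (on recent Ruby versions it ships as a bundled gem).

```ruby
require 'observer'
require 'set'

# Hypothetical BFS crawler over an in-memory link graph (no network access).
class BfsCrawler
  include Observable

  # link_graph maps each URL to the list of URLs it links to.
  def initialize(link_graph)
    @link_graph = link_graph
  end

  # Visits pages breadth-first from +start+ and notifies observers per page.
  # Returns the visit order.
  def crawl(start)
    visited = Set[start]
    queue = [start]
    order = []
    until queue.empty?
      url = queue.shift            # FIFO queue gives breadth-first order
      order << url
      changed                      # mark state as changed so observers fire
      notify_observers(url)
      @link_graph.fetch(url, []).each do |link|
        queue << link if visited.add?(link) # add? returns nil for known URLs
      end
    end
    order
  end
end

# Hypothetical observer that records every URL it is told about.
class PageLogger
  attr_reader :seen

  def initialize
    @seen = []
  end

  # Called by Observable#notify_observers.
  def update(url)
    @seen << url
  end
end

graph = {
  '/'  => ['/a', '/b'],
  '/a' => ['/c', '/'],
  '/b' => ['/c'],
  '/c' => []
}

crawler = BfsCrawler.new(graph)
logger = PageLogger.new
crawler.add_observer(logger)
order = crawler.crawl('/')
puts order.inspect
```

Keeping the crawler unaware of what observers do with each page is the main appeal of this design: logging, indexing, or link checking can be attached without touching the traversal loop.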

Popularity

13,876

Releases

0.2.1

2010-01-25

2010-01-25

Activity

100%

2010-02-03

kabutops (reneklacan/kabutops)

0.0

No commit activity in last 3 years

No release in over 3 years

Dead simple yet powerful Ruby crawler for easy parallel crawling with support for anonymity.

Popularity

61,107

Releases

0.3.0

2014-06-16

2015-11-23

Activity

2014-11-28

tia-crawler (alaphao/tia-crawler)

0.0

No commit activity in last 3 years

No release in over 3 years

Gem for accessing data from the Mackenzie TIA

Popularity

3,797

Releases

0.0.3

2017-08-14

2017-08-14

Activity

2017-10-13

catflap (nyk/catflap)

0.0

No commit activity in last 3 years

No release in over 3 years

A simple solution to provide on-demand service access (e.g. port 80 on a web server) where a more robust and secure VPN solution is not available. Essentially, it is a more user-friendly form of "port knocking". The original proof-of-concept implementation was run for almost three years by Demotix to protect development and staging servers from search engine crawlers and other unwanted traffic.

Popularity

10,603

Releases

1.0.1

2013-12-01

2016-03-14

Activity

100%

2016-01-18

rack_staging (glenngillen/rack_staging)

0.0

No commit activity in last 3 years

No release in over 3 years

Automatically protects your staging app from web crawlers and casual visitors.
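
One common way to keep crawlers and casual visitors off a staging app is a small piece of Rack middleware in front of it. The sketch below illustrates that general idea only; it is not rack_staging's implementation. The `StagingGuard` class, the `X-Staging-Token` request header, and the token value are all assumptions made for the example. It follows the plain Rack calling convention (`call(env)` returning status, headers, body), so it runs without the `rack` gem.

```ruby
# Hypothetical Rack-style middleware; not rack_staging's actual implementation.
class StagingGuard
  # app:   the downstream Rack application
  # token: shared secret that visitors must send (assumed header, see below)
  def initialize(app, token:)
    @app = app
    @token = token
  end

  def call(env)
    # Rack exposes the X-Staging-Token request header under this env key.
    if env['HTTP_X_STAGING_TOKEN'] == @token
      status, headers, body = @app.call(env)
      # Also ask well-behaved crawlers not to index anything they reach.
      [status, headers.merge('x-robots-tag' => 'noindex, nofollow'), body]
    else
      [403, { 'content-type' => 'text/plain' }, ['Staging access denied']]
    end
  end
end

# A trivial downstream app standing in for the real staging application.
app = ->(_env) { [200, { 'content-type' => 'text/plain' }, ['hello']] }
guard = StagingGuard.new(app, token: 'secret')

allowed = guard.call({ 'HTTP_X_STAGING_TOKEN' => 'secret' })
denied  = guard.call({})

puts allowed.first
puts denied.first
```

In a real application the token would come from configuration rather than a literal, and the middleware would be enabled only in the staging environment.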

Popularity

20,519

Releases

0.2.0

2011-08-14

2011-10-08

Activity

50%

50%

2011-11-08

marmiton_crawler (madeindjs/marmiton_crawler)

0.01

Repository is archived

No commit activity in last 3 years

No release in over 3 years

A web crawler to fetch a Marmiton recipe

Popularity

4,683

Releases

1.0.3

2016-10-09

2016-11-28

Activity

75%

100%

2017-09-23

driller (shashikant86/driller)

0.01

No commit activity in last 3 years

No release in over 3 years

Driller is a command-line, Ruby-based web crawler built on Anemone. It crawls a website, reports error pages and slow pages, and generates HTML reports.

Popularity

33,873

Releases

0.1.4

2015-05-10

2015-05-18

Activity

2015-05-14

webget (rubycoco/webclient)

0.0

No release in over 3 years

Low commit activity in last 3 years

webget gem - a web (go get) crawler incl. web cache

Popularity

13,020

Releases

0.2.5

2020-10-04

2021-02-21

Activity

100%

2018-02-10

cangrejo (platanus/cangrejo-gem)

0.0

No commit activity in last 3 years

No release in over 3 years

Cangrejo lets you consume crabfarm crawlers using a simple DSL

Popularity

64,265

Releases

0.2.5

2015-01-02

2016-03-30

Activity

33%

2015-06-27

medusa-crawler (brutuscat/medusa-crawler)

0.01

No commit activity in last 3 years

No release in over 3 years

== Medusa: a ruby crawler framework

Medusa is a framework for the ruby language to crawl and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized tasks quickly and easily.

=== Features

* Choose the links to follow on each page with +focus_crawl+
* Multi-threaded design for high performance
* Tracks +301+ HTTP redirects
* Allows exclusion of URLs based on regular expressions
* Records response time for each page
* Obeys _robots.txt_ directives (optional, but recommended)
* In-memory or persistent storage of pages during crawl, provided by Moneta[https://github.com/moneta-rb/moneta]
* Inherits OpenURI behavior (redirects, automatic charset and encoding detection, proxy configuration options)

<b>Do you have an idea or a suggestion? {Open an issue and talk about it}[https://github.com/brutuscat/medusa-crawler/issues/new]</b>

=== Examples

Medusa is versatile and meant to be used programmatically. You can start with one or multiple URIs:

  require 'medusa'

  Medusa.crawl('https://www.example.com', depth_limit: 2)

Or you can pass a block and it will yield the crawler back, to manage configuration or drive its crawling focus:

  require 'medusa'

  Medusa.crawl('https://www.example.com', depth_limit: 2) do |crawler|
    crawler.discard_page_bodies = some_flag

    # Persist all the pages state across crawl-runs.
    crawler.clear_on_startup = false
    crawler.storage = Medusa::Storage.Moneta(:Redis, 'redis://redis.host.name:6379/0')

    crawler.skip_links_like(/private/)

    crawler.on_pages_like(/public/) do |page|
      logger.debug "[public page] #{page.url} took #{page.response_time} found #{page.links.count}"
    end

    # Use arbitrary logic, page by page, to further customize the crawl.
    crawler.focus_crawl(/public/) do |page|
      page.links.first
    end
  end

Popularity

4,255

Releases

1.0.0

2020-08-06

2020-08-17

Activity

80%

2020-05-23

creepy-crawler (udryan10/creepy-crawler)

0.0

No commit activity in last 3 years

No release in over 3 years

Web crawler that writes a sitemap to a neo4j database. It also stores broken_links and the total number of pages on the site.

Popularity

6,595

Releases

1.0.2

2014-05-10

2014-05-10

Activity

2014-05-07

fua (behdadahmadi/fua)

0.0

No commit activity in last 3 years

No release in over 3 years

Fake User-Agents covering about 80% of real devices, for use in the headers of web crawlers. It keeps your script away from being nested by many UA strings.

Popularity

2,964

Releases

1.0.3

2016-12-04

2016-12-04

Activity

2016-12-04

bank-crawlers-hapoalim (joaomilho/bank-crawlers-hapoalim)

0.0

No commit activity in last 3 years

No release in over 3 years

A crappy crawler for a crappy bank interface

Popularity

6,149

Releases

0.0.7

2015-04-03

2015-04-03

Activity

2014-04-11

botch (namusyaka/botch)

0.0

No commit activity in last 3 years

No release in over 3 years

Botch is a DSL for quickly creating web crawlers. Inspired by Sinatra.

Popularity

20,630

Releases

0.1.5

2013-07-14

2013-08-08

Activity

100%

100%

2013-07-22

bankcrawlers-hapoalim (joaomilho/bank-crawlers-hapoalim)

0.0

No commit activity in last 3 years

No release in over 3 years

A crappy crawler for a crappy bank interface

Popularity

6,649

Releases

0.0.8

2014-04-07

2015-04-04

Activity

2014-04-11

middleman-crawler (welaika/middleman-crawler)

0.0

No commit activity in last 3 years

No release in over 3 years

Starts a crawler for Middleman sites

Popularity

5,335

Releases

0.0.2

2016-07-05

2016-07-05

Activity

2016-11-12

sledgehammer (growthrepublic/sledgehammer)

0.0

No commit activity in last 3 years

No release in over 3 years

Website crawler harvesting e-mails. Uses Sidekiq and Typhoeus.

Popularity

3,298

Releases

0.1.0

2014-07-10

2014-07-10

Activity

25%

2014-07-09

web_crawler (webgago/web_crawler)

0.0

No commit activity in last 3 years

No release in over 3 years

Web crawler that helps you parse and collect data from the web

Popularity

19,340

Releases

0.5.4

2011-05-30

2011-06-24

Activity

100%

100%

2011-09-08

file_crawler (hirohisa/file_crawler)

0.0

No commit activity in last 3 years

No release in over 3 years

FileCrawler searches and controls files in a local directory

Popularity

29,101

Releases

0.6.0

2016-05-05

2018-06-29

Activity

100%

2017-06-15