Search results for 'crawler' - The Ruby Toolbox

80%

2020-05-23

cobweb (stewartmckee/cobweb)

Web Content Scrapers

0.13

No release in over 3 years

Low commit activity in last 3 years

There are a lot of open issues

Cobweb is a web crawler that can use Resque to cluster crawls, so it can crawl extremely large sites quickly; its description claims this is much more performant than multi-threaded crawlers. It can also run as a standalone crawler, with a statistics interface for monitoring the progress of crawls.

Popularity

323,609

228

Releases

1.2.1

2010-11-10

2021-01-09

Activity

50%

57%

2016-04-07

wombat (felipecsl/wombat)

Web Content Scrapers

0.55

Low commit activity in last 3 years

There are a lot of open issues

No release in over a year

Generic Web crawler with a DSL that parses structured data from web pages

Popularity

204,807

1,303

129

Releases

3.0.0

2011-12-27

2022-08-23

Activity

59%

80%

2019-09-27

crawler_detect (loadkpi/crawler_detect)

User Agent Detection

0.07

Low commit activity in last 3 years

A long-lived project that still receives updates

CrawlerDetect is a library to detect bots/crawlers via the user agent
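
As a rough illustration of what user-agent-based detection involves, here is a minimal Ruby sketch. It does not use the crawler_detect gem or reproduce its API; the `BOT_PATTERN` list and the `crawler_user_agent?` helper are hypothetical names chosen for this example, and the pattern list is a small hand-picked assumption rather than a maintained database.

```ruby
# Hypothetical, hand-picked pattern list (an assumption for this sketch);
# real detection libraries ship much larger, regularly updated lists.
BOT_PATTERN = /bot|crawler|spider|slurp/i

# Returns true when the User-Agent string matches one of the patterns above.
# A missing or empty User-Agent is treated as "not a crawler" here.
def crawler_user_agent?(user_agent)
  return false if user_agent.nil? || user_agent.empty?

  BOT_PATTERN.match?(user_agent)
end
```

A caller would pass the raw `User-Agent` header value, for example `crawler_user_agent?(request_headers["User-Agent"])`, where `request_headers` stands for whatever hash of headers the application already has. Simple substring patterns like these can misclassify unusual browsers, which is one reason to prefer a maintained gem in practice.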

Popularity

1,016,280

110

Releases

1.2.4

2018-08-05

2024-03-20

Activity

87%

73%

2021-01-07

crawler-engine

0.0

No release in over 3 years

Crawler Engine crawls all news from a customized website

Popularity

4,765

Releases

0.1.0

2011-11-22

2011-11-22

Activity

bank-crawlers-hapoalim (joaomilho/bank-crawlers-hapoalim)

0.0

No commit activity in last 3 years

No release in over 3 years

A crappy crawler for a crappy bank interface

Popularity

6,118

Releases

0.0.7

2015-04-03

2015-04-03

Activity

2014-04-11

embulk-filter-crawler (toyama0919/embulk-filter-crawler)

0.0

No commit activity in last 3 years

No release in over 3 years

Crawler4J filter plugin for Embulk

Popularity

9,329

Releases

0.1.3

2016-03-25

2016-04-06

Activity

2016-03-28

voight_kampff (biola/voight-kampff)

0.26

Low commit activity in last 3 years

No release in over a year

Voight-Kampff detects bots, spiders, crawlers and replicants

Popularity

6,440,324

177

Releases

2.0.0

2011-05-11

2023-03-12

Activity

94%

60%

2018-09-03

is_crawler (ccashwell/is_crawler)

0.02

No commit activity in last 3 years

No release in over 3 years

is_crawler does exactly what you might think it does: determine if the supplied string matches a known crawler or bot.

Popularity

160,690

Releases

0.1.5

2013-02-27

2013-05-23

Activity

60%

2013-12-05

zy_crawler (uuensky/zycrawler)

0.0

No release in over a year

A simple demo crawler

Popularity

1,174

Releases

0.0.1

2022-03-08

2022-03-08

Activity

2022-03-08

murmuring_spider

0.0

Repository is gone

No release in over 3 years

MurmuringSpider is a concise Twitter crawler. When we write a data-mining or text-mining application based on Twitter timelines, we have to collect and store tweets first. I got irritated with writing such crawlers repeatedly, so I wrote this. All you have to do is add queries and run them periodically. Thanks to the consistent Twitter API and the twitter gem (http://twitter.rubyforge.org/), it is quite easy to track various types of timelines (such as user_timeline, home_timeline, search...)

Popularity

4,169

Releases

0.0.2

2012-04-13

2012-04-13

Activity

baidu_crawler (debbbbie/baidu_crawler)

0.0

No commit activity in last 3 years

No release in over 3 years

The Baidu Crawler crawls data according to your demands

Popularity

5,953

Releases

0.0.1

2012-09-01

2012-09-01

Activity

2012-08-21

resay_crawler

0.0

No release in over 3 years

A simple web crawler gem

Popularity

3,013

Releases

0.0.1

2015-05-23

2015-05-23

Activity

arb-bs (arybin-cn/arb-bs)

0.0

No commit activity in last 3 years

No release in over 3 years

A demo of a web crawler using arb-crawler

Popularity

19,629

Releases

1.1.4

2017-02-13

2018-04-12

Activity

2017-09-11

recipe_crawler (madeindjs/recipe_crawler)

0.0

Repository is archived

No commit activity in last 3 years

No release in over 3 years

This crawler uses my personal scraper, 'RecipeScraper', to download recipe data from Marmiton, 750g or cuisineaz

Popularity

12,280

Releases

4.0.0

2016-11-29

2018-12-08

Activity

33%

2017-02-01

arb-crawler (arybin-cn/arb-crawler)

0.0

No commit activity in last 3 years

No release in over 3 years

Web page crawler.

Popularity

8,834

Releases

1.0.3

2017-02-12

2017-08-06

Activity

2017-03-24

ruby-crawler (abams/crawler)

0.0

No commit activity in last 3 years

No release in over 3 years

Simple Ruby web crawler

Popularity

3,369

Releases

0.0.1

2014-05-20

2014-05-20

Activity

2014-05-17

rubygems-crawler

0.0

No release in over 3 years

A very simple crawler for RubyGems.org used to demo the power of ElasticSearch at RubyConf 2013

Popularity

6,015

Releases

0.1.0

2013-10-28

2013-10-28

Activity

crawler_guru

0.0

Repository is gone

No release in over a year

Crawler Guru provides the basic functionality needed to extract data from web pages

Popularity

1,832

Releases

0.1.0

2021-09-03

2021-09-03

Activity

simple_crawler (anupom/crawler)

0.0

No commit activity in last 3 years

No release in over 3 years

Simple web crawler that crawls a domain and generates a sitemap

Popularity

3,385

Releases

0.0.1

2014-02-18

2014-02-18

Activity

2014-02-17