Search results for 'crawler' - The Ruby Toolbox

50%

57%

2016-04-07

rubyretriever (joenorton/rubyretriever)

0.08

No release in over 3 years

Low commit activity in last 3 years

There's a lot of open issues

Asynchronous web crawler, scraper and file harvester

Popularity

67,263

141

Releases

1.4.6

2014-05-25

2016-04-11

Activity

69%

76%

2015-02-18

instagram-crawler (mgleon08/instagram-crawler)

0.08

No release in over 3 years

Low commit activity in last 3 years

There's a lot of open issues

Crawl Instagram photos, posts and videos for download.

Popularity

7,362

197

Releases

0.3.0

2018-11-23

2019-04-14

Activity

16%

50%

2018-12-12

polipus (taganaka/polipus)

0.07

No commit activity in last 3 years

No release in over 3 years

There's a lot of open issues

An easy-to-use distributed web-crawler framework based on Redis

Popularity

51,543

Releases

0.5.1

2014-01-05

2015-07-17

Activity

65%

84%

2015-03-02

crawler_detect (loadkpi/crawler_detect)

User Agent Detection

0.07

Low commit activity in last 3 years

A long-lived project that still receives updates

CrawlerDetect is a library to detect bots/crawlers via the user agent

Popularity

1,027,490

111

Releases

1.2.4

2018-08-05

2024-03-20

Activity

87%

73%

2021-01-07

validate-website (spk/validate-website)

0.03

Low commit activity in last 3 years

No release in over a year

validate-website is a web crawler that checks markup validity against XML Schema / DTD and reports not-found URLs.

Popularity

125,542

Releases

1.12.0

2009-10-24

2022-11-15

Activity

100%

83%

2021-01-02

wayback_archiver (buren/wayback_archiver)

0.03

No release in over 3 years

Low commit activity in last 3 years

Post URLs to Wayback Machine (Internet Archive), using a crawler, from Sitemap(s) or a list of URLs.

Popularity

46,170

Releases

1.4.0

2014-07-17

2021-04-23

Activity

77%

68%

2019-08-15

arachnid (dchuk/arachnid)

0.03

No commit activity in last 3 years

No release in over 3 years

Arachnid is a web crawler that relies on Bloom filters to efficiently store visited URLs and on Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.

Popularity

48,791

Releases

0.4.1

2011-11-11

2014-01-16

Activity

66%

2012-04-10

google_ajax_crawler (benkitzelman/google-ajax-crawler)

0.03

No commit activity in last 3 years

No release in over 3 years

Rack middleware adhering to the Google Ajax Crawling Scheme, using a headless browser to render JS-heavy pages and serve a DOM snapshot of the rendered state to a requesting search engine.

Popularity

15,855

Releases

0.2.0

2013-03-16

2013-07-13

Activity

50%

100%

2013-07-10

is_crawler (ccashwell/is_crawler)

0.02

No commit activity in last 3 years

No release in over 3 years

is_crawler does exactly what you might think it does: it determines whether the supplied string matches a known crawler or bot.

Popularity

160,932

Releases

0.1.5

2013-02-27

2013-05-23

Activity

60%

2013-12-05

grell (mdsol/grell)

0.02

No commit activity in last 3 years

No release in over 3 years

Ruby web crawler using PhantomJS

Popularity

86,603

481

Releases

2.1.2

2015-05-07

2021-02-17

Activity

87%

2017-01-26

cosmicrawler (bash0c7/cosmicrawler)

0.02

Repository is archived

No commit activity in last 3 years

No release in over 3 years

Cosmicrawler is a crawler library for Ruby. It provides scalable asynchronous crawling by (http|file|etc) using EventMachine.

Popularity

5,076

Releases

0.0.1

2013-03-11

2013-03-11

Activity

100%

2013-03-17

render_static (herval/render_static)

0.01

No commit activity in last 3 years

No release in over 3 years

render_static makes single-page apps (Backbone, Angular, etc.) built on Rails SEO-friendly. It works by injecting a small Rack middleware that renders pages as plain HTML when the requester is one of the most common crawlers/bots (Google, Yahoo, Baidu and Bing).

Popularity

4,145

Releases

0.0.0

2013-05-08

2013-05-08

Activity

2013-05-07

semantic-crawler (obale/semantic_crawler)

0.01

No commit activity in last 3 years

No release in over 3 years

There's a lot of open issues

SemanticCrawler is a Ruby library that encapsulates data gathering from different sources. It currently supports microdata from websites, country information from Freebase, Factbook and FAO (Food and Agriculture Organization of the United Nations), crisis information from GDACS.org, and geo data from LinkedGeoData. Additionally, the GeoNames module can retrieve Factbook and FAO country information from GPS coordinates.

Popularity

38,334

Releases

0.7.1

2012-03-25

2013-04-07

Activity

64%

2012-07-30

spiderman (bkeepers/spiderman)

0.01

No release in over 3 years

Low commit activity in last 3 years

your friendly neighborhood web crawler

Popularity

4,535

Releases

2.0.0

2020-03-22

2020-03-22

Activity

100%

2020-08-22

wriggle (tsigo/wriggle)

0.01

No commit activity in last 3 years

No release in over 3 years

A simple directory crawler DSL.

Popularity

14,020

Releases

1.3.0

2010-10-09

2011-03-09

Activity

2011-08-02