Search results for 'crawler' - The Ruby Toolbox

SemanticCrawler is a Ruby library that encapsulates data gathering from different sources. Currently supported are microdata from websites, country information from Freebase, Factbook and FAO (Food and Agriculture Organization of the United Nations), crisis information from GDACS.org and geo data from LinkedGe...

38,334

0.7.1

2012-03-25

2013-04-07

email_crawler

0.0

Repository is gone

No release in over 3 years

email_crawler Homepage

Email crawler: crawls the top ten Google search results looking for email addresses and exports them to CSV.

37,706

0.1.1

2014-02-25

2015-10-01

ruby-cheerio

0.0

No commit activity in last 3 years

No release in over 3 years

ruby-cheerio dineshsprabu/ruby-cheerio Homepage

Ruby Cheerio is a jQuery-style HTML parser that takes selectors as input. It is a Ruby version of the NodeJS package 'Cheerio', which is widely used by crawlers. Please visit the home page for usage details.

35,577

0.0.5

2016-08-09

2016-08-09

preadly-bulbasaur

0.0

No commit activity in last 3 years

No release in over 3 years

preadly-bulbasaur preadly/bulbasaur Homepage

Bulbasaur is a helper for crawler operations used in Pread.ly

34,046

0.9.0

2015-07-13

2015-12-23

driller

0.01

No commit activity in last 3 years

No release in over 3 years

driller shashikant86/driller Homepage

Driller is a command-line, Ruby-based web crawler built on Anemone. It crawls a website, reports error pages and slow pages, and generates HTML reports.

33,787

0.1.4

2015-05-10

2015-05-18

rcrawl

0.0

No release in over 3 years

rcrawl Homepage

A web crawler written in Ruby

31,727

0.5.1

2006-09-20

2006-10-06

bot_detection

0.0

No commit activity in last 3 years

No release in over 3 years

bot_detection sumy/bot_detection Homepage

Checks a user agent for a web crawler

31,423

1.0.9

2014-08-01

2015-11-03

flyerhzm-regexp_crawler

0.0

No release in over 3 years

flyerhzm-regexp_crawler

RegexpCrawler is a Ruby library for crawling data from websites using regular expressions.

31,164

0.9.1

2009-08-02

2009-09-14

arachnid2

0.01

Web Content Scrapers

No release in over 3 years

Low commit activity in last 3 years

arachnid2 samnissen/arachnid2 Homepage

A simple, fast web crawler

29,097

0.4.0

2018-05-29

2020-07-15

file_crawler

0.0

No commit activity in last 3 years

No release in over 3 years

file_crawler hirohisa/file_crawler Homepage

FileCrawler searches and controls files in a local directory

29,008

0.6.0

2016-05-05

2018-06-29

iudex-worker

0.0

No release in over 3 years

iudex-worker Homepage

Iudex is a general-purpose web crawler and feed processor in Ruby/Java. The iudex-worker gem provides a worker daemon for feed/page processing.

28,701

1.5.0

2011-04-04

2015-05-04

attribute_imagifiable

0.0

Repository is archived

No commit activity in last 3 years

No release in over 3 years

attribute_imagifiable zealot128/attribute_imagifiable Homepage

Uses Paperclip to generate images from sensitive attributes such as e-mail addresses and telephone numbers, in order to reduce crawlers' success

27,687

0.0.8

2012-10-08

2013-07-31

iudex-html

0.0

No release in over 3 years

iudex-html Homepage

Iudex is a general-purpose web crawler and feed processor in Ruby/Java. The iudex-html gem contains filters for HTML parsing, filtering, and extracting text and links.

26,650

1.7.0

2011-04-04

2017-07-07

iudex-barc

0.0

No release in over 3 years

iudex-barc Homepage

Iudex is a general-purpose web crawler and feed processor in Ruby/Java. The iudex-barc gem contains support for the BARC Basic ARChive format.

26,404

1.7.0

2011-04-04

2018-10-29

kudzu

0.0

Low commit activity in last 3 years

A long-lived project that still receives updates

kudzu kanety/kudzu Homepage

A simple web crawler for Ruby

25,918

1.3.1

2017-12-20

2023-06-23

iudex-jetty-httpclient

0.0

No release in over 3 years

iudex-jetty-httpclient Homepage

Iudex is a general-purpose web crawler and feed processor in Ruby/Java. This gem is a Jetty HTTP Client based implementation of the iudex-http interfaces.

25,807

1.7.0

2011-11-13

2017-02-13
