Search results for 'crawler' - The Ruby Toolbox

Projects

Bugfix forks are hidden

expedia-crawler

0.0

No commit activity in last 3 years

No release in over 3 years

elsoul/expedia-crawler

Empower World Travel Information Technology

Popularity

3,057

0

0

2

Releases

Current version

0.1.1

2

2020-08-03

2020-08-03

Activity

Average date of last 50 commits

2020-08-03

Reverse Dependencies

0

jalan-crawler

0.0

No commit activity in last 3 years

No release in over 3 years

elsoul/jalan-crawler

Empower World Travel Information Technology

Popularity

1,678

0

0

2

Releases

Current version

0.1.0

1

2020-08-03

2020-08-03

Activity

Average date of last 50 commits

2020-08-03

Reverse Dependencies

0

booking-com-crawler

0.0

No commit activity in last 3 years

No release in over 3 years

elsoul/booking-com-crawler

Empower World Travel Information Technology

Popularity

1,823

0

1

2

Releases

Current version

0.1.0

1

2020-08-03

2020-08-03

Activity

Average date of last 50 commits

2020-08-03

Reverse Dependencies

0

medusa-crawler

0.01

No commit activity in last 3 years

No release in over 3 years

brutuscat/medusa-crawler

== Medusa: a ruby crawler framework

Medusa is a framework for the Ruby language to crawl and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized tasks quickly and easily.

=== Features

* Choose the links to follow on each page with +focus_crawl+
* Multi-threaded design for high performance
* Tracks +301+ HTTP redirects
* Allows exclusion of URLs based on regular expressions
* Records response time for each page
* Obeys _robots.txt_ directives (optional, but recommended)
* In-memory or persistent storage of pages during crawl, provided by Moneta[https://github.com/moneta-rb/moneta]
* Inherits OpenURI behavior (redirects, automatic charset and encoding detection, proxy configuration options)

<b>Do you have an idea or a suggestion? {Open an issue and talk about it}[https://github.com/brutuscat/medusa-crawler/issues/new]</b>

=== Examples

Medusa is versatile and meant to be used programmatically. You can start with one or multiple URIs:

  require 'medusa'

  Medusa.crawl('https://www.example.com', depth_limit: 2)

Or you can pass a block and it will yield the crawler back, to manage configuration or drive its crawling focus:

  require 'medusa'

  Medusa.crawl('https://www.example.com', depth_limit: 2) do |crawler|
    crawler.discard_page_bodies = some_flag

    # Persist the state of all pages across crawl runs.
    crawler.clear_on_startup = false
    crawler.storage = Medusa::Storage.Moneta(:Redis, 'redis://redis.host.name:6379/0')

    crawler.skip_links_like(/private/)

    crawler.on_pages_like(/public/) do |page|
      logger.debug "[public page] #{page.url} took #{page.response_time} found #{page.links.count}"
    end

    # Use arbitrary logic, page by page, to keep customizing the crawl.
    crawler.focus_crawl(/public/) do |page|
      page.links.first
    end
  end
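
The link filtering described in the Medusa entry above (regex-based exclusion followed by a per-page focus block) can be pictured with a small, library-independent sketch. It uses only core Ruby and does not call Medusa; `select_links`, `skip_patterns`, and the sample URLs are hypothetical names chosen for illustration, not part of the gem.

```ruby
# Hypothetical helper, not part of Medusa: given the links found on a page,
# drop those matching any skip pattern, then let an optional block narrow
# the remaining candidates (the idea behind a "focus" step).
def select_links(links, skip_patterns: [])
  kept = links.reject do |link|
    skip_patterns.any? { |pattern| pattern.match?(link) }
  end
  # The block may return one link, a list of links, or nil.
  kept = Array(yield(kept)) if block_given?
  kept
end

links = [
  'https://www.example.com/public/a',
  'https://www.example.com/private/b',
  'https://www.example.com/public/c'
]

# Exclude anything under /private/, then follow only the first remaining link.
followed = select_links(links, skip_patterns: [/private/]) do |candidates|
  candidates.first
end

puts followed
```

Keeping exclusion and focus as two separate steps means the focus block only ever sees links that already passed the skip patterns, which keeps the block's logic simple.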

Popularity

4,255

5

3

3

Releases

Current version

1.0.0

3

2020-08-06

2020-08-17

Activity

Issue Closure Rate

80%

Average date of last 50 commits

2020-05-23

Reverse Dependencies

0

reddit_junkie

0.0

No release in over 3 years

This little library makes it much easier to download images from different subreddits. It works like a crawler for the images posted on a subreddit, and it's a great tool for keeping your favorite memes locally!

Popularity

9,181

Releases

Current version

0.0.7

7

2020-09-25

2020-10-09

Activity

Reverse Dependencies

0

webget

0.0

No release in over 3 years

Low commit activity in last 3 years

rubycoco/webclient

webget gem - a web (go get) crawler incl. web cache

Popularity

13,020

6

1

2

Releases

Current version

0.2.5

9

2020-10-04

2021-02-21

Activity

Pull Request Acceptance Rate

100%

Average date of last 50 commits

2018-02-10

Reverse Dependencies

3

crawler_guru

0.0

Repository is gone

No release in over a year

Crawler Guru provides all basic functionalities to extract data from web pages

Popularity

1,851

Releases

Current version

0.1.0

1

2021-09-03

2021-09-03

Activity

Reverse Dependencies

0

vscinemas

0.0

No release in over a year

elct9620/vscinemas-rb

A crawler for Taiwan's VSCinema that gets the latest film list.

Popularity

3,432

0

0

3

Releases

Current version

0.2.1

3

2021-12-20

2021-12-21

Activity

Average date of last 50 commits

2021-12-25

Reverse Dependencies

0

zy_crawler

0.0

No release in over a year

uuensky/zycrawler

A simple crawler demo

Popularity

1,190

0

0

1

Releases

Current version

0.0.1

1

2022-03-08

2022-03-08

Activity

Average date of last 50 commits

2022-03-08

Reverse Dependencies

0

coolCrawler

0.0

No release in over a year

willwright1213/coolcrawler

Simple Web Crawler

Popularity

3,851

2

0

1

Releases

Current version

0.4.4

8

2022-09-29

2022-11-01

Activity

Average date of last 50 commits

2022-10-01

Reverse Dependencies

0

WebTitle

0.0

The project is in a healthy, maintained state

With just a few lines of code, developers can effortlessly integrate this gem into their projects, enabling seamless retrieval of page titles from HTML documents. Whether you're building web scrapers, crawlers, or any application that requires fetching webpage titles, WebTitle streamlines the process, providing a clean and efficient solution.
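
To give a sense of what retrieving a page title from an HTML document involves, here is a minimal sketch using only core Ruby. It is not WebTitle's API, which this listing does not show; `extract_title` and the sample HTML are hypothetical, and the regular expression is a deliberate simplification that a real HTML parser would handle more robustly.

```ruby
# Hypothetical helper, not WebTitle's API: return the text of the first
# <title> element in an HTML string, or nil when there is none.
def extract_title(html)
  # i = case-insensitive tag name, m = let "." span line breaks.
  match = html.match(%r{<title[^>]*>(.*?)</title>}im)
  return nil unless match

  # Collapse internal whitespace so multi-line titles come back on one line.
  match[1].gsub(/\s+/, ' ').strip
end

html = <<~HTML
  <html>
    <head>
      <title>
        Example Domain
      </title>
    </head>
    <body><h1>Hello</h1></body>
  </html>
HTML

puts extract_title(html).inspect
puts extract_title('<p>no title here</p>').inspect
```

Returning `nil` for a missing title lets a caller tell "no title" apart from an empty one.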

Popularity

297

Releases

Current version

1.0.0

1

2023-12-13

2023-12-13

Activity

Reverse Dependencies

0
