0.01
== Medusa: a ruby crawler framework {rdoc-image:https://badge.fury.io/rb/medusa-crawler.svg}[https://rubygems.org/gems/medusa-crawler] rdoc-image:https://github.com/brutuscat/medusa-crawler/workflows/Ruby/badge.svg?event=push
Medusa is a framework for the ruby language to crawl and collect useful information about the pages
it visits. It is versatile, allowing you to write your own specialized tasks quickly and easily.
=== Features
* Choose the links to follow on each page with +focus_crawl+
* Multi-threaded design for high performance
* Tracks +301+ HTTP redirects
* Allows exclusion of URLs based on regular expressions
* Records response time for each page
* Obey _robots.txt_ directives (optional, but recommended)
* In-memory or persistent storage of pages during crawl, provided by Moneta[https://github.com/moneta-rb/moneta]
* Inherits OpenURI behavior (redirects, automatic charset and encoding detection, proxy configuration options).
<b>Do you have an idea or a suggestion? {Open an issue and talk about it}[https://github.com/brutuscat/medusa-crawler/issues/new]</b>
=== Examples
Medusa is versatile and to be used programatically, you can start with one or multiple URIs:
require 'medusa'
Medusa.crawl('https://www.example.com', depth_limit: 2)
Or you can pass a block and it will yield the crawler back, to manage configuration or drive its crawling focus:
require 'medusa'
Medusa.crawl('https://www.example.com', depth_limit: 2) do |crawler|
crawler.discard_page_bodies = some_flag
# Persist all the pages state across crawl-runs.
crawler.clear_on_startup = false
crawler.storage = Medusa::Storage.Moneta(:Redis, 'redis://redis.host.name:6379/0')
crawler.skip_links_like(/private/)
crawler.on_pages_like(/public/) do |page|
logger.debug "[public page] #{page.url} took #{page.response_time} found #{page.links.count}"
end
# Use an arbitrary logic, page by page, to continue customize the crawling.
crawler.focus_crawl(/public/) do |page|
page.links.first
end
end
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
Empower World Travel Information Technology
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
Empower World Travel Information Technology
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
Empower World Travel Information Technology
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
Empower World Travel Information Technology
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
Empower World Travel Information Technology
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
A simple, fast web crawler
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
Activity
0.0
a demo for study ruby
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
Simple site crawler using Capybara
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
Activity
0.01
your friendly neighborhood web crawler
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
Activity
0.0
This file crawler helps to decect if there are new files in a directory.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
Crawls the mails in a IMAP Folder
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
A simple crawler that gets posts and pages from wordpress websites that have an exposed api
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
Activity
0.0
Easy way to enable AdSense crawler to login and see private or custom pages in your rails application. Basically one custom login filter. Gem enables you to easily slightly increase revenues from Google AdSense/AdWords. It makes it easy to enable crawling on private pages and so get better targeted ads even in pages behind login screen.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
Activity
0.0
Show DMM and DMM.R18's crawled data. e.g. ranking
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
Activity
0.0
This gem crawls the latest CircleCI artifact file you specified. For Example, you can get the result JSON of simplecov.gem etc.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
Simple Gem Using Watir For Phantom Crawler
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.08
Crawl instagram photos, posts and videos for download.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
Activity
0.0
Crawl the senegalese web, looking for jobs using the excellent wombat gem
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
Activity
0.0
A simple web crawler gem
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
Activity