0.0
Empower World Travel Information Technology
0.01
== Medusa: a Ruby crawler framework {rdoc-image:https://badge.fury.io/rb/medusa-crawler.svg}[https://rubygems.org/gems/medusa-crawler] rdoc-image:https://github.com/brutuscat/medusa-crawler/workflows/Ruby/badge.svg?event=push
Medusa is a framework for the Ruby language that crawls websites and collects useful information about the pages
it visits. It is versatile, allowing you to write your own specialized tasks quickly and easily.
=== Features
* Choose the links to follow on each page with +focus_crawl+
* Multi-threaded design for high performance
* Tracks +301+ HTTP redirects
* Allows exclusion of URLs based on regular expressions
* Records response time for each page
* Obeys _robots.txt_ directives (optional, but recommended)
* In-memory or persistent storage of pages during crawl, provided by Moneta[https://github.com/moneta-rb/moneta]
* Inherits OpenURI behavior (redirects, automatic charset and encoding detection, proxy configuration options).
<b>Do you have an idea or a suggestion? {Open an issue and talk about it}[https://github.com/brutuscat/medusa-crawler/issues/new]</b>
=== Examples
Medusa is versatile and designed to be used programmatically. You can start with one or multiple URIs:
  require 'medusa'
  Medusa.crawl('https://www.example.com', depth_limit: 2)
Or you can pass a block and it will yield the crawler back, to manage configuration or drive its crawling focus:
  require 'medusa'
  Medusa.crawl('https://www.example.com', depth_limit: 2) do |crawler|
    crawler.discard_page_bodies = some_flag
    # Persist the state of all pages across crawl runs.
    crawler.clear_on_startup = false
    crawler.storage = Medusa::Storage.Moneta(:Redis, 'redis://redis.host.name:6379/0')
    crawler.skip_links_like(/private/)
    crawler.on_pages_like(/public/) do |page|
      logger.debug "[public page] #{page.url} took #{page.response_time} found #{page.links.count}"
    end
    # Use arbitrary logic, page by page, to further customize the crawl.
    crawler.focus_crawl(/public/) do |page|
      page.links.first
    end
  end
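To make the idea behind excluding URLs by regular expression concrete, here is a minimal sketch that uses only the Ruby standard library. It is not Medusa's implementation: the +links_to_follow+ helper and the sample URLs are hypothetical and only illustrate how a list of links might be filtered against skip patterns such as <tt>/private/</tt>.

```ruby
require 'uri'

# Hypothetical helper, not part of Medusa: returns the links whose path
# matches none of the given skip patterns.
def links_to_follow(links, skip_patterns)
  links.reject do |link|
    path = URI(link).path
    skip_patterns.any? { |pattern| path =~ pattern }
  end
end

# Sample URLs invented for illustration.
links = [
  'https://www.example.com/public/index.html',
  'https://www.example.com/private/admin',
  'https://www.example.com/public/about'
]

followed = links_to_follow(links, [/private/])
puts followed
```

Matching against the path rather than the full URL is a design choice of this sketch; it avoids accidental matches on the host name.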
0.0
This little library makes it much easier to download images from different subreddits. It works like a crawler for the images posted on a subreddit, which makes it a handy tool for keeping your favorite memes locally!
0.0
webget gem - a web (go get) crawler incl. web cache
0.0
Crawler Guru provides all basic functionalities to extract data from web pages
0.0
A crawler for Taiwan's VSCinema that fetches the latest film list.
0.0
A simple demo crawler
0.0
Simple Web Crawler
0.0
With just a few lines of code, developers can effortlessly integrate this gem into their projects, enabling seamless retrieval of page titles from HTML documents. Whether you're building web scrapers, crawlers, or any application that requires fetching webpage titles, WebTitle streamlines the process, providing a clean and efficient solution.
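The listing does not show WebTitle's actual interface, so the following is only a rough standard-library sketch of what retrieving a page title from an HTML document can look like. The +page_title+ helper is hypothetical, and a regular expression is a simplification; a real HTML parser would be more robust.

```ruby
# Hypothetical helper, not WebTitle's API: extracts the text of the first
# <title> element, or returns nil when the document has none.
def page_title(html)
  match = html.match(%r{<title[^>]*>(.*?)</title>}im)
  # Collapse internal whitespace so multi-line titles come back on one line.
  match && match[1].strip.gsub(/\s+/, ' ')
end

html = "<html><head><title>\n  Example Domain\n</title></head><body></body></html>"
puts page_title(html)
```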