0.0
No release in over 3 years
A very simple crawler for RubyGems.org used to demo the power of ElasticSearch at RubyConf 2013
0.0
No commit activity in last 3 years
No release in over 3 years
This gem is sample code for a web crawler, so I don't recommend that you use it.
0.0
No commit activity in last 3 years
No release in over 3 years
A demo of a web crawler using arb-crawler
0.0
No commit activity in last 3 years
No release in over 3 years
Web crawler that generates a sitemap in a Neo4j database. It also stores broken_links and the total number of pages on the site.
0.0
No release in over 3 years
Low commit activity in last 3 years
Stupid crawler that looks for URLs on a given site. The result is saved as two CSV files: one with found URLs and another with failed URLs.
0.08
No release in over 3 years
Low commit activity in last 3 years
There are a lot of open issues
Crawl Instagram photos, posts and videos for download.
0.08
No release in over 3 years
Low commit activity in last 3 years
There are a lot of open issues
Asynchronous web crawler, scraper and file harvester
0.07
No commit activity in last 3 years
No release in over 3 years
There are a lot of open issues
An easy to use distributed web-crawler framework based on Redis
0.03
No release in over 3 years
Low commit activity in last 3 years
Post URLs to Wayback Machine (Internet Archive), using a crawler, from Sitemap(s) or a list of URLs.
0.03
No commit activity in last 3 years
No release in over 3 years
Arachnid is a web crawler that relies on Bloom filters to efficiently store visited URLs and on Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.
0.03
No commit activity in last 3 years
No release in over 3 years
Rack middleware adhering to the Google Ajax Crawling Scheme. It uses a headless browser to render JS-heavy pages and serves a DOM snapshot of the rendered state to a requesting search engine.
0.03
Low commit activity in last 3 years
No release in over a year
validate-website is a web crawler that checks markup validity against XML Schema / DTD and reports not-found URLs.
0.02
No commit activity in last 3 years
No release in over 3 years
Ruby web crawler using PhantomJS
0.02
Repository is archived
No commit activity in last 3 years
No release in over 3 years
Cosmicrawler is a crawler library for Ruby. It provides scalable asynchronous crawling via (http|file|etc) using EventMachine.
0.01
No commit activity in last 3 years
No release in over 3 years
Gem for crawling data from external sources