0.0
No commit activity in last 3 years
No release in over 3 years
Simple web crawler that crawls a domain and generates a sitemap
0.0
No release in over 3 years
A simple news crawler. You can specify the structure of your XML or RSS feeds.
0.0
Repository is archived
No release in over a year
Retrieves a list of URLs to seed the crawler by publishing them to a RabbitMQ exchange.
0.0
No release in over 3 years
A very simple crawler for RubyGems.org used to demo the power of ElasticSearch at RubyConf 2013
0.0
No commit activity in last 3 years
No release in over 3 years
Stupid crawler that looks for URLs on a given site. The result is saved as two CSV files: one with found URLs and another with failed URLs.
0.0
No commit activity in last 3 years
No release in over 3 years
Web crawler that helps you parse and collect data from the web
0.0
No release in over 3 years
Low commit activity in last 3 years
Easy way to enable the AdSense crawler to log in and see private or custom pages in your Rails application. Basically one custom login filter. The gem lets you easily, if slightly, increase revenue from Google AdSense/AdWords. It makes it easy to enable crawling on private pages and so get better target...
0.06
No release in over 3 years
Low commit activity in last 3 years
There's a lot of open issues
Crawl Instagram photos, posts, and videos for download.
0.06
No release in over 3 years
Low commit activity in last 3 years
There's a lot of open issues
Asynchronous web crawler, scraper and file harvester
0.05
No commit activity in last 3 years
No release in over 3 years
There's a lot of open issues
An easy-to-use distributed web-crawler framework based on Redis
0.02
Low commit activity in last 3 years
A long-lived project that still receives updates
Post URLs to Wayback Machine (Internet Archive), using a crawler, from Sitemap(s) or a list of URLs.
0.02
Low commit activity in last 3 years
No release in over a year
validate-website is a web crawler that checks markup validity against XML Schema / DTD and reports not-found URLs.
0.02
No commit activity in last 3 years
No release in over 3 years
Ruby web crawler using PhantomJS
0.02
No commit activity in last 3 years
No release in over 3 years
Arachnid is a web crawler that relies on Bloom filters to efficiently store visited URLs and on Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.
0.02
No commit activity in last 3 years
No release in over 3 years
Rack middleware adhering to the Google Ajax Crawling Scheme, using a headless browser to render JS-heavy pages and serve a DOM snapshot of the rendered state to a requesting search engine.
0.01
No commit activity in last 3 years
No release in over 3 years
Driller is a command-line, Ruby-based web crawler built on Anemone. Driller crawls a website, reports error pages and slow pages, and generates HTML reports.