0.0
Repository is gone
No release in over 3 years
Generic Web crawler with a DSL that parses event-related data from web pages
0.0
No commit activity in last 3 years
No release in over 3 years
Web crawler with JSON-based DSL and EventMachine-powered page fetching
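A JSON-based crawl DSL like the one this gem describes boils down to parsing a declarative config into structures the fetch loop can consume. The config shape below (field names, selectors) is an illustrative assumption, not this gem's actual format:

```ruby
require "json"

# Hypothetical crawl definition: each rule names a selector and, optionally,
# an attribute to extract from matched elements.
CONFIG = <<~JSON
  {
    "start_url": "https://example.com/",
    "max_pages": 10,
    "rules": [
      { "name": "title", "selector": "h1" },
      { "name": "links", "selector": "a", "attribute": "href" }
    ]
  }
JSON

# Parse the DSL into plain Ruby structures for the crawler loop.
def load_crawl_config(json)
  cfg = JSON.parse(json)
  {
    start_url: cfg.fetch("start_url"),
    max_pages: cfg.fetch("max_pages", 100),
    rules: cfg.fetch("rules", []).map do |r|
      { name: r.fetch("name"), selector: r.fetch("selector"), attribute: r["attribute"] }
    end
  }
end
```

Keeping the crawl description in data rather than code is what lets such a gem hand page fetching off to EventMachine while the rules stay editable without touching Ruby.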
0.0
No commit activity in last 3 years
No release in over 3 years
Web crawler that helps you parse and collect data from the web
0.0
No release in over 3 years
A very simple crawler for RubyGems.org used to demo the power of ElasticSearch at RubyConf 2013
0.0
Repository is archived
No commit activity in last 3 years
No release in over 3 years
This crawler uses my personal scraper, 'RecipeScraper', to download recipe data from Marmiton, 750g, or cuisineaz
0.06
No release in over 3 years
Low commit activity in last 3 years
There are a lot of open issues
Asynchronous web crawler, scraper and file harvester
0.06
No release in over 3 years
Low commit activity in last 3 years
There are a lot of open issues
Crawls Instagram photos, posts, and videos for download.
0.05
No commit activity in last 3 years
No release in over 3 years
There are a lot of open issues
An easy-to-use distributed web crawler framework based on Redis
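The core of a Redis-backed distributed crawler is a shared frontier: a set of seen URLs plus a work queue that any number of worker processes push to and pop from. This sketch is an assumption about the general technique, not this gem's API; a small in-memory stand-in exposes the three Redis commands used, so swapping in `Redis.new` from the redis gem would make it truly distributed:

```ruby
require "set"

# Minimal stand-in for the Redis commands the frontier uses (SADD, LPUSH,
# RPOP), so the sketch runs without a Redis server.
class FakeRedis
  def initialize
    @sets  = Hash.new { |h, k| h[k] = Set.new }
    @lists = Hash.new { |h, k| h[k] = [] }
  end

  def sadd?(key, member)
    @sets[key].add?(member) ? true : false
  end

  def lpush(key, value)
    @lists[key].unshift(value)
  end

  def rpop(key)
    @lists[key].pop
  end
end

# Shared crawl frontier: every worker pushes discovered URLs and pops work
# from the same keys, so the crawl scales across processes and machines.
class Frontier
  def initialize(redis, ns: "crawler")
    @redis = redis
    @seen  = "#{ns}:seen"
    @queue = "#{ns}:queue"
  end

  # Enqueue only URLs never seen before (SADD is atomic in real Redis,
  # so concurrent workers cannot enqueue the same URL twice).
  def push(url)
    @redis.lpush(@queue, url) if @redis.sadd?(@seen, url)
  end

  def pop
    @redis.rpop(@queue)
  end
end
```

LPUSH paired with RPOP gives first-in, first-out ordering, so discovered pages are crawled roughly breadth-first.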
0.03
No commit activity in last 3 years
No release in over 3 years
Rack middleware adhering to the Google AJAX Crawling Scheme: it uses a headless browser to render JS-heavy pages and serves a DOM snapshot of the rendered state to a requesting search engine.
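Under the AJAX Crawling Scheme, a crawler rewrites a `#!/route` URL into a `?_escaped_fragment_=/route` query, and the middleware intercepts that query to serve a snapshot. The detection half can be sketched as below; the headless-browser rendering is replaced by an injected callable, and the class and parameter names are illustrative, not this gem's own:

```ruby
# Sketch of the crawling-scheme detection only; the real middleware drives a
# headless browser to build the snapshot, stubbed out here as `renderer`.
class AjaxSnapshotMiddleware
  def initialize(app, renderer:)
    @app = app
    @renderer = renderer # hypothetical callable: url -> rendered HTML string
  end

  def call(env)
    query = env["QUERY_STRING"].to_s
    if query.include?("_escaped_fragment_=")
      # A crawler asked for a snapshot: map ?_escaped_fragment_=/foo back to
      # the #!/foo client-side route and render that state.
      fragment = query[/_escaped_fragment_=([^&]*)/, 1].to_s
      html = @renderer.call("#{env['PATH_INFO']}#!#{fragment}")
      [200, { "content-type" => "text/html" }, [html]]
    else
      @app.call(env) # normal browsers fall through to the app
    end
  end
end
```

Because it only depends on the Rack `env` hash and the status/headers/body triplet, the class can be exercised with a plain lambda as the downstream app.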
0.02
No commit activity in last 3 years
No release in over 3 years
Arachnid is a web crawler that relies on Bloom filters to efficiently store visited URLs and on Typhoeus to avoid Mechanize's overhead when crawling every page on a domain.
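The Bloom-filter trade-off behind that design fits in a few lines: each URL sets k positions in an m-bit array, so membership checks use constant memory regardless of how many URLs have been seen, at the cost of rare false positives (a page wrongly skipped), never false negatives (a page crawled twice). The bit count, hash count, and class below are a toy sketch, not Arachnid's implementation:

```ruby
require "digest"

# Toy Bloom filter for visited-URL tracking: k hashed positions per URL in
# an m-bit array. A lookup says "definitely new" or "probably seen".
class BloomFilter
  def initialize(bits: 10_000, hashes: 4)
    @bits = Array.new(bits, false)
    @m = bits
    @k = hashes
  end

  # Derive k positions by salting the URL with the hash index.
  def positions(url)
    (0...@k).map { |i| Digest::SHA256.hexdigest("#{i}:#{url}").to_i(16) % @m }
  end

  def add(url)
    positions(url).each { |p| @bits[p] = true }
  end

  def include?(url)
    positions(url).all? { |p| @bits[p] }
  end
end
```

For a domain-wide crawl this keeps the visited set at a fixed ~10 kbit here, where a plain `Set` of URL strings grows without bound.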
0.02
Low commit activity in last 3 years
A long-lived project that still receives updates
Post URLs to the Wayback Machine (Internet Archive), using a crawler, from sitemap(s) or a list of URLs.
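The sitemap-to-Wayback pipeline splits into two steps: pull `<loc>` entries out of the sitemap XML, then hit the archive's "Save Page Now" endpoint for each. A naive regex stands in for a real XML parser here, and the actual HTTP request is left out (e.g. `Net::HTTP.get_response` against the built URL); this is a sketch of the idea, not this gem's code:

```ruby
# Sample sitemap in the sitemaps.org format.
SITEMAP = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://example.com/</loc></url>
    <url><loc>https://example.com/about</loc></url>
  </urlset>
XML

# Extract every <loc> entry. A regex is enough for this flat format,
# though a real tool would use an XML parser.
def sitemap_urls(xml)
  xml.scan(%r{<loc>([^<]+)</loc>}).flatten
end

# Build the Wayback Machine "Save Page Now" URL for a page; requesting it
# asks the archive to capture a fresh snapshot.
def wayback_save_url(url)
  "https://web.archive.org/save/#{url}"
end
```

A crawl loop would then simply iterate `sitemap_urls`, request each `wayback_save_url`, and throttle between requests to stay polite to the archive.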
0.02
No commit activity in last 3 years
No release in over 3 years
Ruby web crawler using PhantomJS
0.02
Low commit activity in last 3 years
No release in over a year
validate-website is a web crawler that checks markup validity against XML Schema / DTD and reports not-found (404) URLs.
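The broken-link half of such a checker reduces to extracting hrefs from each crawled page and flagging any that come back 404. This sketch uses a naive regex and an injected status-lookup callable in place of real HTTP HEAD requests; the function names are illustrative, not validate-website's API:

```ruby
# Pull href values out of a page. A regex is a simplification; a real
# checker would use an HTML parser to handle quoting and entities.
def extract_hrefs(html)
  html.scan(/href="([^"]+)"/).flatten
end

# Report links whose fetch returns 404. `fetch_status` is a callable
# (url -> Integer status) so HTTP can be stubbed in tests.
def broken_links(html, fetch_status)
  extract_hrefs(html).select { |url| fetch_status.call(url) == 404 }
end
```

Injecting the fetcher keeps the crawl logic testable offline, and a production run would swap in HEAD requests with caching so repeated links are only checked once.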
0.01
No release in over 3 years
Low commit activity in last 3 years
Your friendly neighborhood web crawler
your friendly neighborhood web crawler
0.01
No commit activity in last 3 years
No release in over 3 years
render_static lets you make single-page apps (Backbone, Angular, etc.) built on Rails SEO-friendly. It injects a small Rack middleware that renders pages as plain HTML when the requester is one of the most common crawlers/bots (Google, Yahoo, Baidu, and Bing).
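The gating such middleware performs is a user-agent check: recognized bots get a pre-rendered HTML snapshot, everyone else falls through to the normal single-page app. The bot pattern and renderer below are illustrative assumptions, not render_static's own code:

```ruby
# Hypothetical list of bot user-agent fragments to serve snapshots to.
CRAWLER_UA = /Googlebot|bingbot|Baiduspider|Yahoo! Slurp/i

class StaticRenderMiddleware
  def initialize(app, renderer:)
    @app = app
    @renderer = renderer # hypothetical callable: path -> plain-HTML string
  end

  def call(env)
    if env["HTTP_USER_AGENT"].to_s =~ CRAWLER_UA
      # A known crawler: serve the pre-rendered snapshot for this path.
      [200, { "content-type" => "text/html" }, [@renderer.call(env["PATH_INFO"])]]
    else
      # A regular browser: serve the JS-driven app shell as usual.
      @app.call(env)
    end
  end
end
```

User-agent sniffing like this predates modern JS-capable crawlers and is fragile against spoofed agents, which is part of why approaches of this kind have fallen out of use since.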