0.0
No commit activity in last 3 years
No release in over 3 years
This gem is sample code for a web crawler, so I don't recommend using it.
0.0
Repository is archived
No release in over a year
Retrieves a list of URLs to seed the crawler by publishing them to a RabbitMQ exchange.
0.0
No commit activity in last 3 years
No release in over 3 years
Web crawler that generates a sitemap in a Neo4j database. It also stores broken_links and the total number of pages on the site.
0.0
Repository is gone
No release in over 3 years
MurmuringSpider is a concise Twitter crawler. When we write a data-mining / text-mining application based on the Twitter timeline, we first have to collect and store tweets. I got tired of writing such a crawler repeatedly, so I wrote this. All you have to do is add a query and run th...
0.0
Repository is archived
No commit activity in last 3 years
No release in over 3 years
This crawler uses my personal scraper, 'RecipeScraper', to download recipe data from Marmiton, 750g or cuisineaz.
0.0
No commit activity in last 3 years
No release in over 3 years
There's a lot of open issues
Simple site crawler using Capybara
0.06
No release in over 3 years
Low commit activity in last 3 years
There's a lot of open issues
Crawl Instagram photos, posts and videos for download.
0.06
No release in over 3 years
Low commit activity in last 3 years
There's a lot of open issues
Asynchronous web crawler, scraper and file harvester
0.05
No commit activity in last 3 years
No release in over 3 years
There's a lot of open issues
An easy to use distributed web-crawler framework based on Redis
0.03
No commit activity in last 3 years
No release in over 3 years
Rack middleware adhering to the Google Ajax Crawling Scheme. It uses a headless browser to render JS-heavy pages and serves a DOM snapshot of the rendered state to a requesting search engine.
0.02
No commit activity in last 3 years
No release in over 3 years
Ruby web crawler using PhantomJS
0.02
No commit activity in last 3 years
No release in over 3 years
Arachnid is a web crawler that relies on Bloom filters to efficiently store visited URLs and on Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.
0.02
Low commit activity in last 3 years
A long-lived project that still receives updates
Post URLs to Wayback Machine (Internet Archive), using a crawler, from Sitemap(s) or a list of URLs.
0.02
Low commit activity in last 3 years
No release in over a year
validate-website is a web crawler that checks markup validity against XML Schema / DTD and reports URLs that are not found.
0.01
No commit activity in last 3 years
No release in over 3 years
render_static lets you make single-page apps (Backbone, Angular, etc.) built on Rails SEO-friendly. It works by injecting a small Rack middleware that renders pages as plain HTML when the requester is one of the most common crawlers/bots (Google, Yahoo, Baidu and Bing).