0.0
No commit activity in last 3 years
No release in over 3 years
Web crawler that generates a sitemap in a Neo4j database. It also stores broken_links and the total number of pages on the site.
0.0
Repository is gone
No release in over 3 years
Email crawler: crawls the top ten Google search results looking for email addresses and exports them to CSV.
0.0
No commit activity in last 3 years
No release in over 3 years
Web crawler with JSON-based DSL and EventMachine-powered page fetching
0.0
No commit activity in last 3 years
No release in over 3 years
The Baidu Crawler crawls data according to your demand
0.0
No commit activity in last 3 years
No release in over 3 years
There are a lot of open issues
Simple site crawler using Capybara
0.0
No commit activity in last 3 years
No release in over 3 years
Crawler for http://legendas.tv to see the most downloaded subtitles
0.0
No commit activity in last 3 years
No release in over 3 years
Web crawler that helps you parse and collect data from the web
0.08
No release in over 3 years
Low commit activity in last 3 years
There are a lot of open issues
Asynchronous web crawler, scraper and file harvester
0.08
No release in over 3 years
Low commit activity in last 3 years
There are a lot of open issues
Crawl Instagram photos, posts and videos for download.
0.07
No commit activity in last 3 years
No release in over 3 years
There are a lot of open issues
An easy to use distributed web-crawler framework based on Redis
0.03
Low commit activity in last 3 years
No release in over a year
validate-website is a web crawler that checks markup validity against XML Schema / DTD and reports not-found URLs.
0.03
No commit activity in last 3 years
No release in over 3 years
Arachnid is a web crawler that relies on Bloom filters to efficiently store visited URLs and on Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.
0.03
No commit activity in last 3 years
No release in over 3 years
Rack middleware adhering to the Google Ajax Crawling Scheme, using a headless browser to render JS-heavy pages and serve a DOM snapshot of the rendered state to a requesting search engine.
0.03
No release in over 3 years
Low commit activity in last 3 years
Post URLs to Wayback Machine (Internet Archive), using a crawler, from Sitemap(s) or a list of URLs.
0.02
Repository is archived
No commit activity in last 3 years
No release in over 3 years
Cosmicrawler is a crawler library for Ruby. It provides scalable asynchronous crawling by (http|file|etc) using EventMachine.
0.02
No commit activity in last 3 years
No release in over 3 years
Ruby web crawler using PhantomJS
0.01
Repository is archived
No commit activity in last 3 years
No release in over 3 years
A web crawler to get a Marmiton recipe