0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-da gem provides a PostgreSQL-based content meta-data store and work priority queue.
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-simhash gem contains support for generating and searching over simhash fingerprints.
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-core gem contains core facilities and, notably, does not contain such facilities as database-backed state management.
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. This gem is an rjack-httpclient-3 based implementation of the iudex-http interfaces.
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-rome gem is an adaptation of rjack-rome for feed parsing in Iudex.
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-barc gem contains support for the BARC Basic ARChive format.
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-html gem contains filters for HTML parsing, filtering, and extracting text and links.
0.26
Low commit activity in last 3 years
No release in over a year
Voight-Kampff detects bots, spiders, crawlers and replicants
0.0
No commit activity in last 3 years
No release in over 3 years
Web crawler that helps you parse and collect data from the web
0.0
No commit activity in last 3 years
No release in over 3 years
Automatically protects your staging app from web crawlers and casual visitors.
0.03
No commit activity in last 3 years
No release in over 3 years
Arachnid is a web crawler that relies on Bloom filters to efficiently store visited URLs and on Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-http-test gem contains an HTTP test server for testing HTTP client implementations.
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. This gem is an rjack-async-httpclient based implementation of the iudex-http interfaces.
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-char-detector gem provides charset detection support.
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. This gem is a Jetty HTTP Client based implementation of the iudex-http interfaces.
0.0
Repository is gone
No release in over 3 years
Generic Web crawler with a DSL that parses event-related data from web pages
0.0
No release in over 3 years
Crawler Engine provides a way to crawl all news from a customized website
0.0
No commit activity in last 3 years
No release in over 3 years
Crawler for http://legendas.tv to see the most downloaded subtitles
0.55
Low commit activity in last 3 years
There are a lot of open issues
No release in over a year
Generic Web crawler with a DSL that parses structured data from web pages
0.0
No release in over 3 years
Low commit activity in last 3 years
An easy way to enable the AdSense crawler to log in and see private or custom pages in your Rails application. Basically one custom login filter. The gem lets you slightly increase revenue from Google AdSense/AdWords. It makes it easy to enable crawling on private pages and so get better target...