Categories

Category results are hidden when using a custom project result order
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-core gem contains core facilities and notably, does not contain such facilities as database-backed state management.
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-http gem contains and http client agnostic abstraction layer.
2019
2020
2021
2022
2023
2024
0.54
Low commit activity in last 3 years
There's a lot of open issues
No release in over a year
Generic Web crawler with a DSL that parses structured data from web pages
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Rails Analyzer Tools contains Bench, a simple web page benchmarker, Crawler, a tool for beating up on web sites, RailsStat, a tool for monitoring Rails web sites, and IOTail, a tail(1) method for Ruby IOs.
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-http-test gem contains a HTTP test server for testing HTTP client implementations.
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Low commit activity in last 3 years
webget gem - a web (go get) crawler incl. web cache
2019
2020
2021
2022
2023
2024
0.13
No release in over 3 years
Low commit activity in last 3 years
There's a lot of open issues
Cobweb is a web crawler that can use resque to cluster crawls to quickly crawl extremely large sites which is much more performant than multi-threaded crawlers. It is also a standalone crawler that has a sophisticated statistics monitoring interface to monitor the progress of the crawls.
2019
2020
2021
2022
2023
2024
0.26
Low commit activity in last 3 years
No release in over a year
Voight-Kampff detects bots, spiders, crawlers and replicants
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. This gem is an rjack-httpclient-3 based implementation of the iudex-http interfaces.
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-html gem contains filters for HTML parsing, filtering, exracting text and links.
2019
2020
2021
2022
2023
2024
0.07
No commit activity in last 3 years
No release in over 3 years
There's a lot of open issues
An easy to use distributed web-crawler framework based on Redis
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-simhash gem contains support for generation and searching over simhash fingerprints
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. This gem is a Jetty HTTP Client based implementation of the iudex-http interfaces.
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. This gem is an rjack-async-httpclient based implementation of the iudex-http interfaces.
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-worker gem provides a worker deamon for feed/page processing.
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-da gem provides a PostgreSQL-based content meta-data store and work priority queue.
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-brutefuzzy-protobuf gem contains the protocol buffer generated java classes for the iudex-brutefuzzy-service.
2019
2020
2021
2022
2023
2024
0.07
Low commit activity in last 3 years
A long-lived project that still receives updates
CrawlerDetect is a library to detect bots/crawlers via the user agent
2019
2020
2021
2022
2023
2024
0.0
No commit activity in last 3 years
No release in over 3 years
Show DMM and DMM.R18's crawled data. e.g. ranking
2019
2020
2021
2022
2023
2024