0.07
Low commit activity in last 3 years
A long-lived project that still receives updates
CrawlerDetect is a library to detect bots/crawlers via the user agent
0.0
A long-lived project that still receives updates
A crawler extension package built on http, written by a junior developer. Please do not download it casually; it has many pitfalls.
0.0
The project is in a healthy, maintained state
With just a few lines of code, developers can effortlessly integrate this gem into their projects, enabling seamless retrieval of page titles from HTML documents. Whether you're building web scrapers, crawlers, or any application that requires fetching webpage titles, WebTitle streamlines the pro...
0.0
The project is in a healthy, maintained state
Hushes worthless Rails exceptions & logs, such as those caused by bots and crawlers.
0.0
Low commit activity in last 3 years
A long-lived project that still receives updates
A simple web crawler for Ruby
0.0
Repository is archived
No release in over a year
Retrieves a list of URLs to seed the crawler by publishing them to a RabbitMQ exchange.
0.26
Low commit activity in last 3 years
No release in over a year
Voight-Kampff detects bots, spiders, crawlers and replicants
0.03
Low commit activity in last 3 years
No release in over a year
validate-website is a web crawler that checks markup validity against XML Schema / DTD and reports not-found URLs.
0.55
Low commit activity in last 3 years
There are a lot of open issues
No release in over a year
Generic Web crawler with a DSL that parses structured data from web pages
0.0
Repository is gone
No release in over a year
Crawler Guru provides all basic functionalities to extract data from web pages
0.03
No release in over 3 years
Low commit activity in last 3 years
Post URLs to Wayback Machine (Internet Archive), using a crawler, from Sitemap(s) or a list of URLs.
0.0
No release in over 3 years
Low commit activity in last 3 years
webget gem - a web (go get) crawler incl. web cache
0.02
No commit activity in last 3 years
No release in over 3 years
Ruby web crawler using PhantomJS
0.13
No release in over 3 years
Low commit activity in last 3 years
There are a lot of open issues
Cobweb is a web crawler that can use Resque to cluster crawls, so it can crawl extremely large sites quickly and with much better performance than multi-threaded crawlers. It also works as a standalone crawler and has a sophisticated statistics interface for monitoring crawl progress.
0.0
No commit activity in last 3 years
No release in over 3 years
Block crawlers who spam your site with fake HTTP referers
0.0
No release in over 3 years
This little library makes it much easier to download images from different subreddits. It works like a crawler for the images posted on a subreddit, which makes it a great tool for keeping your favorite memes locally!