Categories

Category results are hidden when using a custom project result order
No release in over 3 years
Low commit activity in last 3 years
There's a lot of open issues
Cobweb is a web crawler that can use resque to cluster crawls to quickly crawl extremely large sites which is much more performant than multi-threaded crawlers. It is also a standalone crawler that has a sophisticated statistics monitoring interface to monitor the progress of the crawls.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.01
No commit activity in last 3 years
No release in over 3 years
Gem for crawling data from external sources
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
Low commit activity in last 3 years
There's a lot of open issues
No release in over a year
Generic Web crawler with a DSL that parses structured data from web pages
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.02
No commit activity in last 3 years
No release in over 3 years
is_crawler does exactly what you might think it does: determine if the supplied string matches a known crawler or bot.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.03
Low commit activity in last 3 years
No release in over a year
validate-website is a web crawler for checking the markup validity with XML Schema / DTD and not found urls.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
A long-lived project that still receives updates
初级开发工程师,基于 http 写的爬虫扩展包。请不要随意下载里面有很多坑。
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
No commit activity in last 3 years
No release in over 3 years
This gem helps Crawler Writers to interact with the PromoQui REST API
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.08
No release in over 3 years
Low commit activity in last 3 years
There's a lot of open issues
Asynchronous web crawler, scraper and file harvester
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
No commit activity in last 3 years
No release in over 3 years
Cangrejo lets you consume crabfarm crawlers using a simple DSL
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
No commit activity in last 3 years
No release in over 3 years
Dead simple yet powerful Ruby crawler for easy parallel crawling with support for an anonymity.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
Repository is gone
No release in over 3 years
Simple little website crawler.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Popularity
0.07
No commit activity in last 3 years
No release in over 3 years
There's a lot of open issues
An easy to use distributed web-crawler framework based on Redis
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
Repository is gone
No release in over 3 years
Ruby web crawler to access omelete informations
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Popularity
0.03
No commit activity in last 3 years
No release in over 3 years
Arachnid is a web crawler that relies on Bloom Filters to efficiently store visited urls and Typhoeus to avoid the overhead of Mechanize when crawling every page on a domain.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.01
No commit activity in last 3 years
No release in over 3 years
Website crawler and fulltext indexer.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0.0
No release in over 3 years
Iudex is a general purpose web crawler and feed processor in ruby/java. The iudex-da gem provides a PostgreSQL-based content meta-data store and work priority queue.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Popularity