Project

clownfish

0.0
No commit activity in last 3 years
No release in over 3 years
Anemone helper making common crawls easier to repeat.
 Dependencies

Development

~> 2.12

Runtime

~> 0.7.2
 Project Readme

Clownfish

Helper for Anemone. Makes common crawls easier to repeat.

Installation

Add this line to your application's Gemfile:

gem 'clownfish'

And then execute:

$ bundle

Or install it yourself as:

$ gem install clownfish

Usage

require 'clownfish'

clownfish = MyClownfish.new

Anemone.crawl_with_clownfish(start_url, clownfish)

# query clownfish for data from crawl

Clownfish Spec

A clownfish is an object that has one or more of the following instance methods:

Reference: Anemone RDocs

anemone_options

Returns a Hash mapping Symbol keys to option values. See Anemone::Core::DEFAULT_OPTS for available options. The Hash is forwarded as the second argument to Anemone.crawl (rdoc). Invoked once before crawl.

skip_links_like

Returns a single Regexp or an Array of Regexp. URLs matching any of these will not be crawled. Invoked once before crawl.
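As a rough illustration of the two "before crawl" hooks, a clownfish that only configures the crawl might look like the sketch below. The class name PoliteCrawl and the specific option values and patterns are assumptions made for this example; the option keys are taken from Anemone::Core::DEFAULT_OPTS.

```ruby
# Hypothetical clownfish that only configures the crawl.
class PoliteCrawl
  # Forwarded to Anemone.crawl as its options Hash.
  def anemone_options
    { threads: 1, delay: 1, depth_limit: 3, obey_robots_txt: true }
  end

  # URLs matching any of these patterns are not crawled.
  def skip_links_like
    [/\.pdf\z/i, %r{/logout}]
  end
end

# Quick check of the patterns without running a crawl.
clownfish = PoliteCrawl.new
clownfish.skip_links_like.any? { |re| re.match?('/reports/annual.PDF') }
```

Returning a single Regexp instead of an Array should work as well, since the spec above allows either form.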

on_every_page

Takes one argument, an Anemone::Page (rdoc). Invoked once per page during crawl.

focus_crawl

Takes one argument, an Anemone::Page (rdoc). Returns the links (Array of URI) on that page that should be crawled. See Anemone::Page#links for a starting point. Invoked once per page during crawl.
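One possible focus_crawl, sketched under the assumption that only links below a given path prefix are of interest. The class name DocsOnly, the '/docs/' prefix, and the StubLinksPage stand-in are all hypothetical; the only thing relied on from Anemone is that the page responds to #links with an Array of URI.

```ruby
require 'uri'

# Hypothetical clownfish that only follows links under a path prefix.
class DocsOnly
  def initialize(prefix = '/docs/')
    @prefix = prefix
  end

  # page is expected to respond to #links with an Array of URI,
  # as Anemone::Page does.
  def focus_crawl(page)
    page.links.select { |uri| uri.path.start_with?(@prefix) }
  end
end

# Stand-in for Anemone::Page so the sketch runs without a crawl.
StubLinksPage = Struct.new(:links)
page = StubLinksPage.new([URI('http://example.com/docs/intro'),
                          URI('http://example.com/blog/news')])
DocsOnly.new.focus_crawl(page).map(&:to_s)
```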

after_crawl

Takes one argument, an Anemone::PageStore (rdoc). Invoked once after crawl is done.
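Putting the per-page and after-crawl hooks together, a small clownfish that tallies HTTP status codes could be sketched as follows. StatusTally and StubCodePage are hypothetical names for this example, and the code assumes only that pages respond to #code and that the page store responds to #size. The stubs let the sketch run without Anemone or network access.

```ruby
# Hypothetical clownfish that tallies HTTP status codes.
class StatusTally
  attr_reader :codes, :total_pages

  def initialize
    @codes = Hash.new(0)
    @total_pages = nil
  end

  # Called once per page; page is expected to respond to #code.
  def on_every_page(page)
    @codes[page.code] += 1
  end

  # Called once after the crawl; page_store is expected to respond to #size.
  def after_crawl(page_store)
    @total_pages = page_store.size
  end
end

# Stand-ins for Anemone::Page and Anemone::PageStore.
StubCodePage = Struct.new(:url, :code)
pages = [StubCodePage.new('http://example.com/', 200),
         StubCodePage.new('http://example.com/missing', 404)]

tally = StatusTally.new
pages.each { |page| tally.on_every_page(page) }
tally.after_crawl(pages) # an Array also responds to #size
tally.codes
```

With the gem installed, the same object would be passed to Anemone.crawl_with_clownfish(start_url, StatusTally.new) as shown under Usage, and queried for codes and total_pages afterwards.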

What's Included

See wiki for examples.

Clownfish::LinksByPage

Lists every page that has links, along with those links and the status code returned when following each one.

Clownfish::ResponseTimes

Records every URL and its response time.

Clownfish::Count

Counts pages.

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request