Fetch
Fetch makes it easy to fetch data from multiple web sources. It was extracted from Bogrobotten, where we use it to fetch prices and other data from multiple merchants for price comparison, but you can use it for anything that involves fetching data from external sources.
Fetch uses the Typhoeus gem for fast and reliable asynchronous fetches from multiple URLs.
Installation
Add this line to your application's Gemfile:
gem "fetch"
Then run:
$ bundle
Example
In app/models/user.rb:
class User < ActiveRecord::Base
  def fetcher
    @fetcher ||= UserFetcher.new(self)
  end
end
In app/fetchers/user_fetcher.rb:
class UserFetcher < Fetch::Base
  modules Facebook::UserInfoFetch,
          Github::UserInfoFetch
end
In lib/facebook/user_info_fetch.rb:
module Facebook
  class UserInfoFetch < Fetch::Module
    include Fetch::Simple
    include Fetch::JSON
    url do
      "http://graph.facebook.com/#{fetchable.login}"
    end
    process do |user_info|
      fetchable.update_attribute :facebook_id, user_info["id"]
    end
  end
end
In lib/github/user_info_fetch.rb:
module Github
  class UserInfoFetch < Fetch::Module
    include Fetch::JSON
    # Request for user ID
    request do |req|
      req.url = "https://api.github.com/users/#{fetchable.login}"
      req.process do |user|
        fetchable.update_attribute :github_id, user["id"]
      end
    end
    # Request for repos
    request do |req|
      req.url = "https://api.github.com/users/#{fetchable.login}/repos"
      req.process do |repos|
        repo_names = repos.map { |r| r["name"] }
        fetchable.update_attribute :github_repos, repo_names
      end
    end
  end
end
Then, when everything is set up, you can do:
user = User.find(123)
user.fetcher.fetch
This will run three requests, one for Facebook and two for GitHub, and update the user model with a Facebook user ID, a GitHub user ID, and a list of GitHub repos.
Good to know
Doing something before a fetch
If you need to run something before a fetch is started, you can do it using the
before_fetch callback.
class UserFetcher < Fetch::Base
  modules Facebook::UserInfoFetch,
          Github::UserInfoFetch
  before_fetch do
    # Do something before the fetch.
  end
end
user = User.find(123)
UserFetcher.new(user).fetch
# => `before_fetch` is run before fetching
Note: If you define more than one before_fetch callback, they are run in the order
in which they were defined.
Doing something after a fetch
If you need to run something after a fetch is completed, you can do it using
the after_fetch callback.
class UserFetcher < Fetch::Base
  modules Facebook::UserInfoFetch,
          Github::UserInfoFetch
  after_fetch do
    # Do something after the fetch has completed.
  end
end
user = User.find(123)
UserFetcher.new(user).fetch
# => `after_fetch` is run after fetching
Note: If you define more than one after_fetch callback, they are run in
the reverse of the order in which they were defined.
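The ordering rules for the two callbacks can be illustrated with a small plain-Ruby sketch. It does not use Fetch: MiniFetcher and OrderedFetcher are hypothetical stand-ins that only mimic the documented behavior, with before_fetch callbacks run in definition order and after_fetch callbacks run in reverse definition order.

```ruby
# Hypothetical stand-in for a fetcher; this is not Fetch's implementation.
class MiniFetcher
  def self.before_callbacks
    @before_callbacks ||= []
  end

  def self.after_callbacks
    @after_callbacks ||= []
  end

  def self.before_fetch(&block)
    before_callbacks << block
  end

  def self.after_fetch(&block)
    after_callbacks << block
  end

  attr_reader :log

  def initialize
    @log = []
  end

  def fetch
    # Run before callbacks in the order they were defined.
    self.class.before_callbacks.each { |cb| instance_exec(&cb) }
    @log << :fetch
    # Run after callbacks in the reverse of the order they were defined.
    self.class.after_callbacks.reverse_each { |cb| instance_exec(&cb) }
    @log
  end
end

class OrderedFetcher < MiniFetcher
  before_fetch { log << :before_1 }
  before_fetch { log << :before_2 }
  after_fetch  { log << :after_1 }
  after_fetch  { log << :after_2 }
end

p OrderedFetcher.new.fetch
# => [:before_1, :before_2, :fetch, :after_2, :after_1]
```

Running the after callbacks in reverse makes them unwind like nested setup and teardown pairs, which is presumably the motivation for the asymmetry.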
Adding defaults to your requests
Each fetch module has a defaults callback that you can use to set up defaults
for all requests in that module.
class UserInfoFetch < Fetch::Module
  defaults do |req|
    req.user_agent = "My Awesome Bot!"
  end
  request do |req|
    req.url = "http://test.com"
    req.process do |body|
      # Do some processing
    end
  end
end
This will add the user agent My Awesome Bot! to all requests in the
UserInfoFetch module.
The defaults callback is inherited, like all other callbacks, so if you have
a base fetch class that you subclass, the defaults callback in the superclass
will be run in all subclasses.
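To make the inheritance rule concrete, here is a minimal plain-Ruby sketch of how inherited defaults could behave. It does not use Fetch: MiniRequest, MiniModule, BaseFetch and SubFetch are hypothetical names, and the lookup shown is an assumption about the behavior described above, not the gem's actual code.

```ruby
# Hypothetical request object; only a URL and a user agent are modeled.
MiniRequest = Struct.new(:url, :user_agent)

# Hypothetical stand-in for a fetch module; this is not Fetch's implementation.
class MiniModule
  def self.defaults(&block)
    own_defaults << block
  end

  def self.own_defaults
    @own_defaults ||= []
  end

  # Collect callbacks from the superclass chain first, then from this class.
  def self.all_defaults
    inherited = superclass.respond_to?(:all_defaults) ? superclass.all_defaults : []
    inherited + own_defaults
  end

  # Build a request and apply every applicable defaults callback to it.
  def self.build_request(url)
    req = MiniRequest.new(url)
    all_defaults.each { |cb| cb.call(req) }
    req
  end
end

class BaseFetch < MiniModule
  defaults { |req| req.user_agent = "My Awesome Bot!" }
end

# Defines no defaults of its own, but still gets the superclass's.
class SubFetch < BaseFetch
end

puts SubFetch.build_request("http://test.com").user_agent
# => My Awesome Bot!
```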
Handling HTTP failures
HTTP failures can be handled using the failure callback. If you want to
handle failures for all requests generally, you can use the module-wide
failure callback:
class UserInfoFetch < Fetch::Module
  request do |req|
    req.url = "http://test.com/something-failing"
    req.process do |body|
      # Do something if successful.
    end
  end
  failure do |code, url|
    Rails.logger.info "Fetching from #{url} failed: #{code}"
  end
end
If you want to handle failures on the specific requests instead:
class UserInfoFetch < Fetch::Module
  request do |req|
    req.url = "http://test.com/something-failing"
    req.process do |body|
      # Do something if successful.
    end
    req.failure do |code, url|
      # Handle the failure
    end
  end
end
When you handle failures directly on the request, the general failure
callback isn't called.
Note: If you don't specify a failure callback at all, HTTP failures are ignored,
and processing is skipped for the failed request.
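The precedence between the two kinds of failure callback can be summarized in a few lines of plain Ruby. This is only a sketch of the rules stated above: dispatch_failure and its keyword arguments are hypothetical and not part of Fetch.

```ruby
# Hypothetical helper that mirrors the documented precedence; not part of Fetch.
def dispatch_failure(code, url, request_handler: nil, module_handler: nil)
  # A handler set on the request takes priority over the module-wide one.
  handler = request_handler || module_handler
  # With no handler at all, the failure is ignored.
  return :ignored if handler.nil?

  handler.call(code, url)
end

module_wide = ->(code, url) { "module handled #{code} for #{url}" }
per_request = ->(code, url) { "request handled #{code} for #{url}" }

puts dispatch_failure(404, "http://test.com/a", module_handler: module_wide)
puts dispatch_failure(404, "http://test.com/a",
                      request_handler: per_request, module_handler: module_wide)
p dispatch_failure(404, "http://test.com/a")
```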
Handling fetch errors
Sometimes a URL will return something that potentially makes your processing
code fail. To prevent this from breaking your whole fetch, you can handle
errors using the error callback:
class UserInfoFetch < Fetch::Module
  request do |req|
    req.url = "http://test.com/something-failing"
    req.process do |body|
      # Do something if successful.
    end
  end
  error do |exception|
    Rails.logger.info "An error occurred: #{exception.message}\n" +
                      exception.backtrace.join("\n")
    raise exception if ["development", "test"].include?(Rails.env)
  end
end
You can also do it directly on the requests:
class UserInfoFetch < Fetch::Module
  request do |req|
    req.url = "http://test.com/something-failing"
    req.process do |body|
      # Do something if successful.
    end
    req.error do |exception|
      # Handle the error
    end
  end
end
If you handle errors directly on the requests, the general error callback
isn't run.
Note: If you don't do any error handling in one of the two ways shown above, any exceptions that occur when processing will be raised, causing the whole fetch to fail. So please add error handling 😊
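As a rough illustration of these rules, the following plain-Ruby sketch wraps a processing block in the same kind of handler lookup. process_with_handlers is a hypothetical helper, not part of Fetch, and it models only the documented behavior: the request-level handler wins, the module-wide handler is the fallback, and without either the exception propagates.

```ruby
# Hypothetical helper that mirrors the documented behavior; not part of Fetch.
def process_with_handlers(body, request_handler: nil, module_handler: nil)
  yield body
rescue StandardError => e
  # A handler set on the request takes priority over the module-wide one.
  handler = request_handler || module_handler
  # Without any handler, the exception propagates and aborts the fetch.
  raise if handler.nil?

  handler.call(e)
end

log_error = ->(e) { "handled: #{e.class}" }

# The processing block fails, but the module-wide handler absorbs the error.
puts process_with_handlers("not a number", module_handler: log_error) { |body| Integer(body) }

# With no handler, the same failure is raised to the caller.
begin
  process_with_handlers("not a number") { |body| Integer(body) }
rescue ArgumentError => e
  puts "unhandled: #{e.class}"
end
```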
General error handling
If you need to ensure that something is run, even if something in the fetch
fails, you can add an error callback to your Fetch::Base subclass.
class UserFetcher < Fetch::Base
  modules Facebook::UserInfoFetch,
          Github::UserInfoFetch
  before_fetch do
    this_fails!
  end
  error do |e|
    # Do something that must be done,
    # even if the fetch fails.
  end
end
user = User.find(123)
UserFetcher.new(user).fetch
# => raises an exception, but the error callback will be run before that.
Parsing JSON
Fetch has a module for automatically parsing the response body as JSON before it is passed to the process block.
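Conceptually, this means the process block receives parsed Ruby data (a Hash or an Array) rather than a raw string. Here is a minimal sketch of that idea using only Ruby's standard json library; process_json is a hypothetical helper and not part of Fetch.

```ruby
require "json"

# Hypothetical helper: parse the raw body, then hand the result to the block.
def process_json(body)
  yield JSON.parse(body)
end

process_json('{"id": 42, "login": "octocat"}') do |json|
  # json is a Hash here, so fields can be read by key.
  puts json["id"]
end
```

With Fetch::JSON included, the module does the parsing step for you: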
class UserInfoFetch < Fetch::Module
  include Fetch::JSON
  request do |req|
    req.url = "http://api.test.com/user"
    req.process do |json|
      # Do something with the JSON.
    end
  end
end
Dynamically loading fetch modules
You can load fetch modules dynamically using the load callback. Normally, the
modules defined with modules are instantiated directly. When you define a
load callback, it receives the values passed to modules and determines which
module classes are loaded.
class UserFetcher < Fetch::Base
  modules :user_info_fetch, :status_fetch
  load do |modules|
    namespaces.product(modules).map do |path|
      path.join("/").camelize.safe_constantize
    end.compact
  end
  private
  def namespaces
    [:github, :facebook]
  end
end
This will load the modules Github::UserInfoFetch, Github::StatusFetch,
Facebook::UserInfoFetch and Facebook::StatusFetch, if they are present.
The load callback is only run once, so you can safely inherit it – only the
last one defined will be run.
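The lookup in the load block above relies on ActiveSupport's camelize and safe_constantize. To show what the product-and-resolve step does, here is a plain-Ruby approximation without ActiveSupport. The empty Github and Facebook classes, and the helpers camelize_part, resolve and load_modules, are hypothetical and exist only for this sketch.

```ruby
# Hypothetical fetch modules, defined only so the lookup below finds something.
module Github
  class UserInfoFetch; end
  class StatusFetch; end
end

module Facebook
  class UserInfoFetch; end
  # Facebook::StatusFetch is intentionally left undefined.
end

# Plain-Ruby stand-in for camelize: :user_info_fetch becomes "UserInfoFetch".
def camelize_part(name)
  name.to_s.split("_").map(&:capitalize).join
end

# Resolve [:github, :user_info_fetch] to Github::UserInfoFetch, or nil if undefined.
def resolve(path)
  path.reduce(Object) do |scope, part|
    name = camelize_part(part)
    return nil unless scope.const_defined?(name, false)

    scope.const_get(name, false)
  end
end

# Pair every namespace with every module name and keep the classes that exist.
def load_modules(namespaces, modules)
  namespaces.product(modules).map { |path| resolve(path) }.compact
end

p load_modules([:github, :facebook], [:user_info_fetch, :status_fetch])
# => [Github::UserInfoFetch, Github::StatusFetch, Facebook::UserInfoFetch]
```

Missing combinations resolve to nil and are dropped by compact, which is what makes the "if they are present" behavior work.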
Initializing fetch modules
Normally, a fetcher is initialized with an optional fetchable that is sent
along to the fetch modules when they are initialized. You can change how this
works with the init callback.
Let's say you have a Search model with a SearchFetcher that gets results
from various search engines. Normally, the Search instance would be sent to
the fetch modules as a fetchable. Let's say you just want to send the keyword
to reduce coupling.
In app/fetchers/search_fetcher.rb:
class SearchFetcher < Fetch::Base
  modules Google::KeywordFetch,
          Bing::KeywordFetch
  init do |klass|
    klass.new(fetchable.keyword)
  end
end
In lib/base/keyword_fetch.rb:
module Base
  class KeywordFetch < Fetch::Module
    attr_reader :keyword
    def initialize(keyword)
      @keyword = keyword
    end
  end
end
In lib/google/keyword_fetch.rb:
module Google
  class KeywordFetch < Base::KeywordFetch
    request do |req|
      req.url = "https://www.google.com/search?q=#{CGI::escape(keyword)}"
      req.process do |body|
        # Do something with the body.
      end
    end
  end
end
lib/bing/keyword_fetch.rb would look similar to the Google module.
Then:
search = Search.find(123)
SearchFetcher.new(search).fetch
Now the keyword will be sent to the fetch modules instead of the fetchable.
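The decoupling this buys can be seen in a small plain-Ruby sketch, in which the module class only ever sees a string. MiniSearch, MiniKeywordFetch and MiniSearchFetcher are hypothetical stand-ins and do not use Fetch; the instantiate method only imitates what an init callback is described as doing.

```ruby
# Hypothetical stand-ins; this is not Fetch's implementation.
MiniSearch = Struct.new(:keyword)

class MiniKeywordFetch
  attr_reader :keyword

  # The module knows nothing about the search record, only the keyword.
  def initialize(keyword)
    @keyword = keyword
  end
end

class MiniSearchFetcher
  attr_reader :fetchable

  def initialize(fetchable)
    @fetchable = fetchable
  end

  # Imitates an init callback: build each module from the keyword alone.
  def instantiate(module_classes)
    module_classes.map { |klass| klass.new(fetchable.keyword) }
  end
end

fetcher = MiniSearchFetcher.new(MiniSearch.new("ruby gems"))
puts fetcher.instantiate([MiniKeywordFetch]).first.keyword
# => ruby gems
```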
Changelog
See the changelog for changes in the different versions.
Contributing
Contributions are much appreciated. To contribute:
- Fork the project
- Create a feature branch (git checkout -b my-new-feature)
- Make your changes, including tests so it doesn't break in the future
- Commit your changes (git commit -am 'Add feature')
- Push to the branch (git push origin my-new-feature)
- Create new pull request
Please do not touch the version, as this will be updated by the owners when the gem is ready for a new release.