DhEasy
Description
The DhEasy gem collection enables advanced DataHen features by bundling a set of specialized gems.
Install gem:
gem install 'dh_easy'
Require gem:
require 'dh_easy'
Included gems documentation:
dh_easy-core: http://rubydoc.org/gems/dh_easy-core/frames
dh_easy-config: http://rubydoc.org/gems/dh_easy-config/frames
dh_easy-router: http://rubydoc.org/gems/dh_easy-router/frames
dh_easy-text: http://rubydoc.org/gems/dh_easy-text/frames
dh_easy-login: http://rubydoc.org/gems/dh_easy-login/frames
How to implement
Sample DataHen project
Let's take a simple project without dh_easy:
# ./config.yaml
seeder:
  file: ./seeder/seeder.rb
  disabled: false
parsers:
  - page_type: search
    file: ./parsers/search.rb
    disabled: false
  - page_type: product
    file: ./parsers/product.rb
    disabled: false

# ./seeder/seeder.rb
pages << {
  'url' => 'https://example.com/search?query=food',
  'page_type' => 'search'
}

# ./parsers/search.rb
require 'cgi'
html = Nokogiri.HTML content
html.css('.name').each do |element|
  name = element.text.strip
  pages << {
    'url' => "https://example.com/product/#{CGI::escape name}",
    'page_type' => 'product',
    'vars' => {'name' => name}
  }
end

# ./parsers/product.rb
html = Nokogiri.HTML content
description = html.css('.description').first.text.strip
outputs << {
  '_collection' => 'product',
  'name' => page['vars']['name'],
  'description' => description
}

Adding dh_easy to sample project
One of DhEasy's main features is that it lets you write classes instead of raw scripts, with the functions and objects of each datahen gem context (seeder, parsers, finishers, etc.) integrated directly into those classes.
Converting seeders, parsers and finishers to DhEasy-supported classes is straightforward: just wrap them like this:
class MySeeder
  include DhEasy::Core::Plugin::Seeder

  # Create "initialize_hook_*" methods instead of "initialize" method
  # to prevent overriding the logic behind DhEasy
  def initialize_hook_my_seeder opts = {}
    @my_param = opts[:my_param]
  end

  def seed
    # Your seeder code goes here
  end
end

class MyParser
  include DhEasy::Core::Plugin::Parser

  # Create "initialize_hook_*" methods instead of "initialize" method
  # to prevent overriding the logic behind DhEasy
  def initialize_hook_my_parser opts = {}
    @my_param = opts[:my_param]
  end

  def parse
    # Your parser code goes here
  end
end

class MyFinisher
  include DhEasy::Core::Plugin::Finisher

  # Create "initialize_hook_*" methods instead of "initialize" method
  # to prevent overriding the logic behind DhEasy
  def initialize_hook_my_finisher opts = {}
    @my_param = opts[:my_param]
  end

  def finish
    # Your finisher code goes here
  end
end

You can also add initialize_hook_* methods to extend the default initialize method provided by the DhEasy plugins.
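To make the initialize_hook_* idea concrete, here is a minimal sketch of how such hooks could be dispatched. It does not use DhEasy at all: the HookedInit module is a hypothetical stand-in written for illustration, and the real plugins may work differently.

```ruby
# Hypothetical stand-in for a plugin mixin; NOT DhEasy's actual code.
module HookedInit
  def initialize(opts = {})
    # Find every method named initialize_hook_* and call each one
    # with the same options hash, in a deterministic (sorted) order.
    hooks = methods.grep(/\Ainitialize_hook_/).sort
    hooks.each { |hook| send(hook, opts) }
  end
end

class MySeeder
  include HookedInit

  attr_reader :my_param, :retries

  # Each hook picks the options it cares about.
  def initialize_hook_my_seeder(opts = {})
    @my_param = opts[:my_param]
  end

  # :retries is an illustrative option invented for this sketch.
  def initialize_hook_retries(opts = {})
    @retries = opts.fetch(:retries, 3)
  end
end

seeder = MySeeder.new(my_param: 'food')
puts seeder.my_param
puts seeder.retries
```

Because the class never defines its own initialize, the mixin's version always runs, which is the reason for writing hooks instead of overriding initialize.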
Now let's try this on our sample project's seeders and parsers:
# ./seeder/seeder.rb
module Seeder
  class Seeder
    include DhEasy::Core::Plugin::Seeder

    def seed
      pages << {
        'url' => 'https://example.com/search?query=food',
        'page_type' => 'search'
      }
    end
  end
end

# ./parsers/search.rb
module Parsers
  class Search
    include DhEasy::Core::Plugin::Parser

    def parse
      html = Nokogiri.HTML content
      html.css('.name').each do |element|
        name = element.text.strip
        pages << {
          'url' => "https://example.com/product/#{CGI::escape name}",
          'page_type' => 'product',
          'vars' => {'name' => name}
        }
      end
    end
  end
end

# ./parsers/product.rb
module Parsers
  class Product
    include DhEasy::Core::Plugin::Parser

    def parse
      html = Nokogiri.HTML content
      description = html.css('.description').first.text.strip
      outputs << {
        '_collection' => 'product',
        'name' => page['vars']['name'],
        'description' => description
      }
    end
  end
end

The next step is to add router capabilities to consume these classes. To do this, let's create the routers and require our seeder and parser classes, like this:
# ./router/seeder.rb
require 'dh_easy/router'
require './seeder/seeder'
DhEasy::Router::Seeder.new.route context: self

# ./router/parser.rb
require 'cgi'
require 'dh_easy/router'
require './parsers/search'
require './parsers/product'
DhEasy::Router::Parser.new.route context: self

Now let's create our ./dh_easy.yaml config file to link our routers to our new seeder and parser classes:
# ./dh_easy.yaml
router:
  parser:
    routes:
      - page_type: search
        class: Parsers::Search
      - page_type: product
        class: Parsers::Product
  seeder:
    routes:
      - class: Seeder::Seeder

Finally, we need to modify our ./config.yaml to use our routers:
# ./config.yaml
seeder:
  file: ./router/seeder.rb
  disabled: false
parsers:
  - page_type: search
    file: ./router/parser.rb
    disabled: false
  - page_type: product
    file: ./router/parser.rb
    disabled: false

Hurray! You have successfully implemented DhEasy in your project.
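For intuition about what a router does with the page_type-to-class routes, here is a rough, self-contained sketch. MiniRouter and the simplified Parsers::Search below are hypothetical and written only for illustration; they are not DhEasy::Router's implementation, and the shape of the context hash is an assumption.

```ruby
require 'yaml'

# Simplified, hypothetical parser class standing in for Parsers::Search.
module Parsers
  class Search
    def initialize(opts = {})
      @context = opts[:context]
    end

    def parse
      "parsed #{@context[:page]['url']} as search"
    end
  end
end

# Hypothetical mini router; NOT DhEasy::Router's actual implementation.
class MiniRouter
  def initialize(config_yaml)
    routes = YAML.safe_load(config_yaml).dig('router', 'parser', 'routes') || []
    # Build a lookup table: page_type => class name.
    @routes = routes.to_h { |route| [route['page_type'], route['class']] }
  end

  def route(context:)
    page_type = context[:page]['page_type']
    class_name = @routes.fetch(page_type) do
      raise ArgumentError, "no route for page_type #{page_type}"
    end
    # Resolve the class from its name, instantiate it and run it.
    Object.const_get(class_name).new(context: context).parse
  end
end

CONFIG = <<~CONFIG_YAML
  router:
    parser:
      routes:
        - page_type: search
          class: Parsers::Search
CONFIG_YAML

context = {
  page: { 'url' => 'https://example.com/search?query=food', 'page_type' => 'search' }
}
result = MiniRouter.new(CONFIG).route(context: context)
puts result
```

The lookup table built from the routes is what would let a single router file serve every page_type listed in config.yaml.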