0.01
No commit activity in last 3 years
No release in over 3 years
Hadoop Ruby DSL is now changed to Hadoop Papyrus. Please use it. Hadoop papyrus - Ruby DSL for Hadoop
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
 Dependencies

Runtime

 Project Readme

hadoop-papyrus¶ ↑

Enable to run Ruby DSL script on your Hadoop.

Description¶ ↑

You can write DSL by Ruby to run Hadoop as Mapper / Reducer. This gem depends on ‘jruby-on-hadoop’ project.

Install¶ ↑

Required gems are all on GemCutter.

  1. Upgrade your rubygem to 1.3.5

  2. Install gems

$ gem install hadoop-papyrus

Usage¶ ↑

  1. Run Hadoop cluster on your machines and put your ‘hadoop’ executable to your PATH or set HADOOP_HOME env variable.

  2. put files into your hdfs. ex) wc/inputs/file1

  3. Now you can run ‘papyrus’ like below:

$ papyrus examples/word_count_test.rb

You can get Hadoop job results in your hdfs wc/outputs/part-*

Examples¶ ↑

Word Count DSL script

dsl 'WordCount'

from 'wc/inputs'
to 'wc/outputs'

count_uniq
total :bytes, :words, :lines

Log Analysis DSL script

dsl 'LogAnalysis'

data 'apache log on test2' do
  from 'apachelog/inputs'
  to 'apachelog/outputs'

  each_line do
    pattern /(.*) (.*) (.*) \[(.*)\] (".*") (\d*) (\d*) (.*) "(.*)"/
    column_name 'remote_host', 'pass', 'user', 'access_date', 'request', 'status', 'bytes', 'pass', 'ua'

    topic 'ua counts', :label => 'ua' do
      count_uniq column[:ua]
    end
  end
end

Run spec¶ ↑

Set HADOOP_HOME on your env and run ‘jruby -S rake spec’

Author¶ ↑

Koichi Fujikawa <fujibee@gmail.com>

License: Apache License