0.02
No commit activity in last 3 years
No release in over 3 years
JRuby on Hadoop
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Runtime

 Project Readme

JRuby on Hadoop¶ ↑

JRuby on Hadoop is a thin wrapper for Hadoop Mapper / Reducer by JRuby. We recommend to use this with hadoop-papyrus on the github / gemcutter.

Description¶ ↑

Install¶ ↑

Required gems are all on GemCutter.

  1. Upgrade your rubygem to 1.3.5

  2. Install gems

$ gem install jruby-on-hadoop

Usage¶ ↑

  1. Run Hadoop cluster on your machines and set HADOOP_HOME env variable.

  2. put files into your hdfs. ex) test/inputs/file1

  3. Now you can run ‘joh’ like below:

$ joh examples/wordcount.rb test/inputs test/outputs

You can get Hadoop job results in your hdfs test/outputs/part-*

Example ¶ ↑

see also examples/wordcount.rb

def setup(conf)
  # setup jobconf
end

def map(key, value, output, reporter)
  # mapper process
  # (wordcount example)
  value.split.each do |word|
    output.collect(word, 1)
  end
end

def reduce(key, values, output, reporter)
  # reducer process
  # (wordcount example)
  sum = 0
  values.each {|v| sum += v }
  output.collect(key, sum)
end

Build¶ ↑

You can build hadoop-ruby.jar by “ant”.

ant

Required to set env HADOOP_HOME for your system. Assumed Hadoop version is 0.19.2.

Author¶ ↑

Koichi Fujikawa <fujibee@gmail.com>

License: Apache License