Hybrid Data & Work Flow
Masamune

A Dataflow Programming Library.

Description

Masamune provides a dataflow programming framework on top of Thor. In this framework, dataflows are constructed as Thor tasks that transform source data into target data. Source and target data descriptions are encoded as annotations attached to the Thor command. From these annotations, Masamune constructs a data dependency tree that describes how to build a target data set automatically.
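
To make the idea of source and target annotations concrete, here is a minimal plain-Ruby sketch of how date-based path patterns could be expanded into source/target pairs. It is an illustration only and does not use or reproduce Masamune's internals; the helper name `expand_dependencies` and the `source/` and `target/` prefixes are hypothetical.

```ruby
require 'date'

# Hypothetical helper, not part of Masamune: pairs each day in a range
# with the target path and source glob produced by its strftime patterns.
def expand_dependencies(target_pattern, source_pattern, start_date, stop_date)
  (start_date..stop_date).map do |day|
    { target: day.strftime(target_pattern), source: day.strftime(source_pattern) }
  end
end

deps = expand_dependencies('target/%Y-%m-%d', 'source/%Y%m%d*.log',
                           Date.new(2014, 1, 1), Date.new(2014, 1, 3))

# Print one "source glob -> target path" line per day in the range
deps.each { |dep| puts "#{dep[:source]} -> #{dep[:target]}" }
```

Each pair says which source files a given target depends on, which is the kind of information a dependency tree needs in order to decide what to build.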

Usage

Describe your dataflow as transformations from source data to target data:

class ExampleThor < Thor
  # Mix in Masamune specific Data Flow Behavior
  include Masamune::Thor
  include Masamune::Actions::DataFlow

  # Mix in Masamune Actions for Data Processing
  include Masamune::Actions::Streaming
  include Masamune::Actions::Hive

  # Describe a Data Processing Job
  desc 'extract_logs', 'Organize log files by YYYY-MM-DD'

  target fs.path(:target_dir, '%Y-%m-%d')
  source fs.path(:source_dir, '%Y%m%d*.log')
  def extract_logs
    targets.missing.each do |target|
      target.sources.each do |source|
        # Transform source into target
        fs.copy(source.path, target.path)
      end
    end
  end
end
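
The loop over missing targets in the task above can be approximated in plain Ruby with the standard library alone. The sketch below is a stand-in, not Masamune's implementation: `copy_missing` is a hypothetical helper, and it assumes that each target is a directory named by date and that a target counts as "missing" when that directory does not exist.

```ruby
require 'date'
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for the targets.missing / target.sources loop.
# Returns the list of files copied during this run.
def copy_missing(source_dir, target_dir, days)
  copied = []
  days.each do |day|
    target = File.join(target_dir, day.strftime('%Y-%m-%d'))
    next if File.exist?(target) # target already built, skip it

    sources = Dir.glob(File.join(source_dir, day.strftime('%Y%m%d*.log')))
    next if sources.empty? # nothing to build this target from

    FileUtils.mkdir_p(target)
    sources.each do |source|
      FileUtils.cp(source, target)
      copied << File.join(target, File.basename(source))
    end
  end
  copied
end

# Run twice against a scratch directory to show that built targets are skipped
first_run, second_run = Dir.mktmpdir do |root|
  source_dir = File.join(root, 'source')
  target_dir = File.join(root, 'target')
  FileUtils.mkdir_p(source_dir)
  FileUtils.touch(File.join(source_dir, '20140101-app.log'))

  days = [Date.new(2014, 1, 1), Date.new(2014, 1, 2)]
  [copy_missing(source_dir, target_dir, days),
   copy_missing(source_dir, target_dir, days)]
end

puts "first run copied #{first_run.size}, second run copied #{second_run.size}"
```

Running the same job twice copies files only the first time, since the second run finds the target already present.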

Execute your dataflow to process all data starting from one year ago:

thor extract_logs --start '1 year ago'

Testing

rake spec             # Run Rspec unit code examples
rake spec:acceptance  # Run Rspec acceptance code examples
rake spec:all         # Run All Rspec code examples
rake spec:unit        # Run Rspec unit code examples

Contributing

  • Fork the project
  • Fix the issue
  • Add unit tests
  • Submit a pull request on GitHub