metacrunch-file
This is the official file package for the metacrunch ETL toolkit.
Note: For working examples on how to use this package check out our demo repository.
Installation
Include the gem in your Gemfile
gem "metacrunch-file", "~> 1.5.0"
and run $ bundle install to install it.
Or install it manually
$ gem install metacrunch-file
Usage
Metacrunch::File::FileSource
This class provides a metacrunch source implementation that can be used to read data from files in the file system into a metacrunch job. The class can be used to read regular files, compressed files (gzip), tar archives and compressed tar archives (gzip).
# my_job.metacrunch
# If you call this example like so
#   $ metacrunch my_job.metacrunch *.xml
# ARGV will contain all the XML files in the current directory.
source Metacrunch::File::FileSource.new(ARGV)
# ... or you can set the filenames directly
source Metacrunch::File::FileSource.new(["my-data.xml", "my-other-data.xml", "..."])
Options
NONE.
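To make the difference between regular and gzip-compressed input concrete, here is a minimal sketch that uses only Ruby's standard library. The helper read_plain_or_gzip is hypothetical and not part of metacrunch-file, and detecting compression by the .gz extension is an assumption of this sketch, not a statement about how FileSource decides. Tar archives are not covered here.

```ruby
require "zlib"
require "tmpdir"

# Hypothetical helper (not part of metacrunch-file): returns the
# uncompressed contents of a regular or gzip-compressed file.
# Assumption: compression is detected by the ".gz" file extension.
def read_plain_or_gzip(filename)
  if filename.end_with?(".gz")
    Zlib::GzipReader.open(filename) { |gz| gz.read }
  else
    File.read(filename)
  end
end

Dir.mktmpdir do |dir|
  plain   = File.join(dir, "my-data.xml")
  gzipped = File.join(dir, "my-other-data.xml.gz")

  # Create one regular and one gzip-compressed sample file.
  File.write(plain, "<record/>")
  Zlib::GzipWriter.open(gzipped) { |gz| gz.write("<record/>") }

  puts read_plain_or_gzip(plain)
  puts read_plain_or_gzip(gzipped)
end
```

Both calls return the same uncompressed text, which is the behaviour a job would want regardless of how the input file is stored.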
The source yields objects of type Metacrunch::File::Entry for every file it reads.
# my_job.metacrunch
transformation ->(file_entry) do
  puts "** Got file entry (Metacrunch::File::Entry)"
  puts "  Filename: #{file_entry.filename}"
  puts "  From archive?: #{file_entry.from_archive?}"
  puts "  Name in archive: #{file_entry.archive_filename || '-'}"
  puts "  Contents: #{file_entry.contents}"
end
Metacrunch::File::FileDestination
This class provides a metacrunch destination to write data to a file. Every data object passed to the destination is appended to the given file. If the data is an Array, each element of that array is appended to the file. Files that do not exist are created automatically.
# my_job.metacrunch
destination Metacrunch::File::FileDestination.new("/tmp/my-data.txt" [, OPTIONS])
Options
- override_existing_file: Overrides an existing file if set to true. If set to false, an error is raised if the file already exists. Defaults to false.
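The described semantics (append each data object, expand arrays element by element, refuse to touch an existing file unless told otherwise) can be sketched with plain Ruby. SketchFileDestination is a hypothetical stand-in for illustration only. Its keyword-argument signature, the ArgumentError it raises, and the choice to write elements without adding separators are all assumptions of this sketch, not the gem's actual implementation.

```ruby
require "tmpdir"

# Hypothetical stand-in, for illustration only.
class SketchFileDestination
  def initialize(filename, override_existing_file: false)
    if File.exist?(filename) && !override_existing_file
      raise ArgumentError, "File already exists: #{filename}"
    end
    # Mode "w" creates a missing file and truncates an existing one.
    @file = File.open(filename, "w")
  end

  # Appends a single object, or every element of an array.
  # Assumption: no separator is added between writes.
  def write(data)
    elements = data.is_a?(Array) ? data : [data]
    elements.each { |element| @file.write(element) }
  end

  def close
    @file.close
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "my-data.txt")

  destination = SketchFileDestination.new(path)
  destination.write("first\n")
  destination.write(["second\n", "third\n"])
  destination.close

  puts File.read(path)
end
```

Creating a second SketchFileDestination for the same path would raise unless override_existing_file: true is passed, mirroring the option described above.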
Metacrunch::File::CSVSource
This class provides a metacrunch source for reading CSV files. It is a simple wrapper around the smarter_csv gem.
# my_job.metacrunch
source Metacrunch::File::CSVSource.new(
  "source.csv" # filename
  [, OPTIONS]) # options
Options
Through the options argument you can pass any CSV reading option supported by smarter_csv under the key csv_options.
- csv_options: Hash with any option supported by smarter_csv for CSV reading. Our defaults are headers_in_file: true, col_sep: ",", row_sep: "\n", quote_char: '"', file_encoding: "utf-8"
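One plausible reading of "defaults" is that user-supplied csv_options replace individual default values while the rest stay in place. The sketch below illustrates that reading with a plain Hash merge. The constant DEFAULT_CSV_OPTIONS and the helper effective_csv_options are hypothetical names, and the merge behaviour itself is an assumption rather than a confirmed detail of CSVSource.

```ruby
# Defaults as listed in the option description above.
DEFAULT_CSV_OPTIONS = {
  headers_in_file: true,
  col_sep: ",",
  row_sep: "\n",
  quote_char: '"',
  file_encoding: "utf-8"
}.freeze

# Hypothetical helper: combines the defaults with the hash passed under
# the :csv_options key. Assumption: user values win over defaults.
def effective_csv_options(options = {})
  DEFAULT_CSV_OPTIONS.merge(options.fetch(:csv_options, {}))
end

# Only the column separator is changed; all other defaults remain.
p effective_csv_options(csv_options: { col_sep: ";" })
```

Under this assumption, a semicolon-separated file would only need col_sep set explicitly.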
Metacrunch::File::CSVDestination
This class provides a metacrunch destination for writing CSV files. Like the CSVSource, this uses smarter_csv under the hood.
# my_job.metacrunch
destination Metacrunch::File::CSVDestination.new(
  "result.csv" # filename
  [, OPTIONS]  # options
)
Options
- override_existing_file: Overrides an existing file if set to true. If set to false, an error is raised if the file already exists. Defaults to false.
- csv_options: Set options for CSV generation, such as col_sep. Full list is here.
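To show what a generation option such as col_sep changes in the output, the sketch below uses Ruby's standard csv library directly. This is an illustration of the option's effect only. It makes no claim about which library CSVDestination calls internally, and csv_line is a hypothetical helper.

```ruby
require "csv"

# Hypothetical helper: renders one row as a CSV line using the given
# generation options (for example col_sep).
def csv_line(row, csv_options = {})
  CSV.generate_line(row, **csv_options)
end

row = ["Alice", "Berlin; Germany", 42]

# Default separator: fields are joined with commas.
puts csv_line(row)

# Semicolon separator: the field containing ";" now has to be quoted.
puts csv_line(row, col_sep: ";")
```

Changing the separator also changes which fields need quoting, which is why it is usually set once for the whole file rather than per row.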
Metacrunch::File::XLSXDestination
This class provides a metacrunch destination implementation to create simple Excel (xlsx) files.
To use this destination a transformation is required to format the data in a proper array that can be passed to the destination. When defining the destination you must provide an array of column names. Each data row passed to the destination must be an array of the same size as the column array.
# my_job.metacrunch
transformation ->(data) do
  [data["foo"], data["bar"], ...]
end
destination Metacrunch::File::XLSXDestination.new(
    "/tmp/my-data.xlsx",           # filename
    ["Column 1", "Column 2", ...], # header columns
    OPTIONS
)
Options
- worksheet_title: The name of the worksheet. Defaults to My data.
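Because every row must have the same size as the column array, it can help to check that in the transformation itself. The following is a minimal sketch in plain Ruby. COLUMNS and TO_ROW are hypothetical names, the "foo" and "bar" keys are taken from the example above, and raising ArgumentError on a mismatch is this sketch's choice rather than documented behaviour of XLSXDestination.

```ruby
# Header columns, as they would be passed to the destination.
COLUMNS = ["Column 1", "Column 2"].freeze

# Hypothetical transformation: picks the wanted fields in column order
# and verifies that the row matches the header size.
TO_ROW = ->(data) do
  row = [data["foo"], data["bar"]]
  unless row.size == COLUMNS.size
    raise ArgumentError, "expected #{COLUMNS.size} values, got #{row.size}"
  end
  row
end

# Extra keys in the input are simply not selected.
p TO_ROW.call({ "foo" => 1, "bar" => "x", "ignored" => true })
```

A missing key yields nil in that position, so the row keeps its size and the corresponding cell would stay empty.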
License
metacrunch-file is available on GitHub under the MIT license.