Database package for the metacrunch ETL toolkit.

metacrunch-db

This is the official SQL database package for the metacrunch ETL toolkit. It is built on the Sequel gem, so any database that Sequel supports can be used with this package.

Installation

Include the gem in your Gemfile:

gem "metacrunch-db", "~> 1.0.0"

Then run $ bundle install to install it.

Or install it manually:

$ gem install metacrunch-db

Usage

Note: For working examples of how to use this package, check out our demo repository.

Metacrunch::DB::Source

This class provides a metacrunch source implementation that can be used to read data from SQL databases into a metacrunch job.

# my_job.metacrunch

# Create a Sequel database connection 
SOURCE_DB = Sequel.connect(...)

# Create a Sequel dataset with an unambiguous order.
my_source_dataset = SOURCE_DB[:my_table].order(:id)

# Set the source
source Metacrunch::DB::Source.new(my_source_dataset [, OPTIONS])

The implementation uses Sequel's paged_each to iterate efficiently, even over large result sets. You can provide the following options to control how paged_each works.

Options

For detailed descriptions, consult the Sequel documentation for paged_each. Note that this package changes the default for :strategy to :filter (Sequel's own default is :offset).

  • :rows_per_fetch: The number of rows to fetch per page. Defaults to 1000.
  • :strategy: Either :offset or :filter. Defaults to :filter.
  • :filter_values: Only relevant for the :filter strategy. Defaults to nil.
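
To make the difference between the two strategies concrete, here is a minimal plain-Ruby sketch that pages over an in-memory array instead of a database. It is an illustration only, not Sequel's implementation; the method names `pages_by_offset` and `pages_by_filter` and the `ROWS` sample data are made up for this example. The :offset style skips all rows of previous pages on every fetch, while the :filter style remembers the last value of the ordering column and fetches only rows after it, which is why the dataset needs an unambiguous order.

```ruby
# Sample data standing in for a table ordered by :id (hypothetical).
ROWS = (1..10).map { |i| { id: i, name: "row #{i}" } }.freeze

# :offset style: every page skips the rows already seen, then takes one page.
def pages_by_offset(rows, rows_per_fetch)
  sorted = rows.sort_by { |r| r[:id] }
  pages = []
  offset = 0
  loop do
    page = sorted.drop(offset).first(rows_per_fetch)
    break if page.empty?
    pages << page
    offset += page.size
  end
  pages
end

# :filter style: remember the last id seen and only look at rows after it.
def pages_by_filter(rows, rows_per_fetch)
  pages = []
  last_id = nil
  loop do
    candidates = last_id ? rows.select { |r| r[:id] > last_id } : rows
    page = candidates.sort_by { |r| r[:id] }.first(rows_per_fetch)
    break if page.empty?
    pages << page
    last_id = page.last[:id]
  end
  pages
end

# Both strategies visit the same rows in the same order.
puts pages_by_offset(ROWS, 4).map { |page| page.map { |r| r[:id] } }.inspect
puts pages_by_filter(ROWS, 4).map { |page| page.map { |r| r[:id] } }.inspect
```

On a real database the :filter approach typically avoids re-scanning skipped rows, but it requires that the ordering columns are selected and contain no NULL values.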

Metacrunch::DB::Destination

This class provides a metacrunch destination implementation that can be used to write data from a metacrunch job to SQL databases.

# my_job.metacrunch

# Create a Sequel database connection 
DEST_DB = Sequel.connect(...)

# Create a Sequel dataset where data should be written
my_target_dataset = DEST_DB[:my_table]

# For performance reasons it may be useful to create a batch
# of records that gets written to the database
transformation ->(row) { row }, buffer: 1000

# Set the destination
destination Metacrunch::DB::Destination.new(my_target_dataset [, OPTIONS])
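
The idea behind the buffer option above can be sketched in plain Ruby: rows are collected until a batch is full, and the whole batch is then handed on in one go. The `RowBuffer` class below is a hypothetical helper written for this illustration; it is not part of metacrunch and does not claim to mirror how metacrunch implements buffering.

```ruby
# Hypothetical helper, not part of metacrunch: groups rows into batches.
class RowBuffer
  def initialize(size)
    @size = size
    @rows = []
  end

  # Returns a full batch once `size` rows have accumulated, otherwise nil.
  def push(row)
    @rows << row
    flush if @rows.size >= @size
  end

  # Returns whatever is buffered (possibly a short final batch) and resets.
  def flush
    batch = @rows
    @rows = []
    batch
  end
end

buffer = RowBuffer.new(3)
batches = (1..7).map { |i| buffer.push({ id: i }) }.compact
batches << buffer.flush # hand over the remaining rows at the end
puts batches.map(&:size).inspect
```

Writing one batch per transaction instead of one row per transaction usually reduces round trips to the database, which is the performance benefit the comment in the job file refers to.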

Options

  • :use_upsert: When set to true, the destination performs an upsert (updating an existing record) instead of a plain insert. Defaults to false.
  • :primary_key: The primary key used to identify an existing record when performing an upsert. Defaults to :id.
  • :transaction_options: A hash of options that control how the database handles the transaction. For a complete list of available options, check out the Sequel documentation.
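
As an illustration of what :use_upsert and :primary_key mean, the following plain-Ruby sketch models a table as a hash keyed by primary key. `MemoryTable` is a hypothetical stand-in invented for this example. It shows upsert in its common sense (update the record if the key exists, insert it otherwise); whether the gem behaves exactly this way in every edge case is an assumption, so treat it as a sketch rather than a description of the implementation.

```ruby
# In-memory stand-in for a database table (hypothetical, for illustration).
class MemoryTable
  attr_reader :rows

  def initialize(primary_key: :id)
    @primary_key = primary_key
    @rows = {}
  end

  # Plain insert: refuses a duplicate key, much like a unique constraint.
  def insert(record)
    key = record.fetch(@primary_key)
    raise ArgumentError, "duplicate key #{key.inspect}" if @rows.key?(key)
    @rows[key] = record.dup
  end

  # Upsert: update the existing record if the key is known, insert otherwise.
  def upsert(record)
    key = record.fetch(@primary_key)
    if @rows.key?(key)
      @rows[key] = @rows[key].merge(record)
    else
      insert(record)
    end
  end
end

table = MemoryTable.new(primary_key: :id)
table.upsert({ id: 1, title: "first" })
table.upsert({ id: 1, title: "updated" }) # same key: updates in place
table.upsert({ id: 2, title: "second" })  # new key: inserts
puts table.rows.inspect
```

With a plain insert, the second write for id 1 would fail; with an upsert, re-running a job over the same source data leaves one record per key.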

License

metacrunch-db is available on GitHub under the MIT license.