
naginegi

No release in over 3 years
Low commit activity in last 3 years
Generate Embulk config and BigQuery schema from RDBMS schema

(Activity chart, 2005-2024)

 Project Readme

Naginegi

Generate Embulk config and BigQuery schema from MySQL and PostgreSQL schemas, and run Embulk.

Installation

Add this line to your application's Gemfile:

gem 'naginegi'

And then execute:

$ bundle

Or install it yourself as:

$ gem install naginegi

Embulk setup

Naginegi is a utility for Embulk. You need to install Embulk and the plugin gems below.

embulk gem install embulk-input-mysql --version 0.10.1
embulk gem install embulk-input-postgresql --version 0.10.1
embulk gem install embulk-output-bigquery --version 0.6.4
embulk gem install embulk-parser-jsonl --version 0.2.0
embulk gem install embulk-formatter-jsonl --version 0.1.4

Usage

Prepare database.yml and table.yml.
database.yml requires db_type (mysql or postgresql).

RDBMS        db_type
MySQL        mysql
PostgreSQL   postgresql

Below are sample config files.

database.yml

db01:
  db_type: mysql
  host: localhost
  username: root
  password: pswd
  database: production
  bq_dataset: mysql

db02:
  db_type: postgresql
  host: localhost
  username: root
  password: pswd
  database: production
  bq_dataset: pg

Caution: Embulk does not allow an empty password for MySQL.

table.yml

db01:
  tables:
    - name: users
    - name: events
    - name: hobbies

db02:
  tables:
    - name: administrators
    - name: configs

sample

Naginegi requires BigQuery parameters as shown below.

using json key file path

[sample.rb]
require 'naginegi'

config = {
  'project_id' => 'BIGQUERY_PROJECT_ID',
  'service_email' => 'SERVICE_ACCOUNT_EMAIL',
  'auth_method' => 'json_key',
  'json_keyfile' => 'JSON_KEYFILE_PATH',
  'schema_dir' => '/var/tmp/embulk/schema',
  'config_dir' => '/var/tmp/embulk/config'
}

client = Naginegi::EmbulkRunner.new
client.generate_config(config)
client.run(config)

$ ruby sample.rb

using key values

[sample.rb]
require 'naginegi'

json_key = {
  "type" => "...",
  "project_id" => "...",
  "private_key_id" => "...",
  "private_key" => "...",
  ...
}

config = {
  'project_id' => 'BIGQUERY_PROJECT_ID',
  'service_email' => 'SERVICE_ACCOUNT_EMAIL',
  'auth_method' => 'json_key',
  'json_key' => json_key,
  'schema_dir' => '/var/tmp/embulk/schema',
  'config_dir' => '/var/tmp/embulk/config'
}

client = Naginegi::EmbulkRunner.new
client.generate_config(config)
client.run(config)

$ ruby sample.rb
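
If the service account key is available as a JSON string rather than a file (for example, from an environment variable), one possible way to build the json_key hash is to parse it. The variable name below is only an example, not part of naginegi:

require 'json'

# Example only: build the json_key hash from a JSON string held in an
# environment variable.
json_key = JSON.parse(ENV.fetch('BQ_SERVICE_ACCOUNT_KEY'))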

Features

process status

Naginegi returns the process status as a boolean.
If all tables succeed, it returns true; otherwise it returns false.
This is useful for controlling system flow.

process_status = Naginegi::EmbulkClient.new.run(config)
exit 1 unless process_status

narrow tables

You can narrow down the target tables from table.yml, for example for testing or retrying.
If no target tables are given, Naginegi processes all tables.

# in this case, all tables are ['users', 'purchases', 'items']
target_tables = ['users', 'purchases']
Naginegi::EmbulkClient.new.run(config, target_tables)

retry

You can set a retry count.
If any table fails, only the failed tables are retried, up to the retry count.
If no retry count is given, Naginegi doesn't retry.

# up to 2 retries will be executed
Naginegi::EmbulkClient.new.run(config, [], 2)
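
The target tables and retry count can be combined with the process status described above. A minimal sketch (the table names come from the earlier table.yml sample, the config hash is the one from the Usage section, and the Embulk config files are assumed to be generated already):

require 'naginegi'

config = {
  'project_id' => 'BIGQUERY_PROJECT_ID',
  'service_email' => 'SERVICE_ACCOUNT_EMAIL',
  'auth_method' => 'json_key',
  'json_keyfile' => 'JSON_KEYFILE_PATH',
  'schema_dir' => '/var/tmp/embulk/schema',
  'config_dir' => '/var/tmp/embulk/config'
}

# Load only two tables, retrying failed tables up to 2 times,
# and exit non-zero if any table still fails.
success = Naginegi::EmbulkClient.new.run(config, ['users', 'events'], 2)
exit 1 unless success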

SQL condition

If you set a condition on a table in table.yml, SQL is generated as shown below.
This is useful for large tables.

[table.yml]
production:
  tables:
    - name: users
    - name: events
      condition: created_at < CURRENT_DATE()

SELECT * FROM users
SELECT * FROM events WHERE created_at < CURRENT_DATE()
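
As another example (the column name and date are hypothetical), a condition can limit a large table to a recent date range:

[table.yml]
production:
  tables:
    - name: purchases
      condition: created_at >= '2024-01-01'

SELECT * FROM purchases WHERE created_at >= '2024-01-01'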

daily snapshot

BigQuery supports table wildcard expressions over a set of daily tables, for example sales20150701.
If you need a daily snapshot of a table in BigQuery, add the daily_snapshot option to database.yml or table.yml as shown below.
In database.yml, the daily_snapshot option affects all tables of that database; in table.yml, it affects only the target table.
The date part is determined by the execution date.

[database.yml]
production:
  host: localhost
  username: root
  password: pswd
  database: production
  bq_dataset: mysql
  daily_snapshot: true

[table.yml]
production:
  tables:
    - name: users
    - name: events
      daily_snapshot: true
    - name: hobbies

Only `events` is renamed to `eventsYYYYMMDD` for BigQuery.
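
For illustration only (this is not naginegi's internal code), the date suffix is derived from the execution date roughly like this:

require 'date'

# Illustration: how a daily-snapshot table name such as eventsYYYYMMDD
# is formed from the execution date (output values are examples).
suffix = Date.today.strftime('%Y%m%d')  # e.g. "20240115"
bq_table_name = "events#{suffix}"       # e.g. "events20240115"
puts bq_table_name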

Contributing

  1. Fork it ( https://github.com/[my-github-username]/naginegi/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request