0.0
No commit activity in last 3 years
No release in over 3 years
Using the Redshift COPY command to load a single file or a collection of files specified in a manifest file
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

~> 1.11
~> 10.0
~> 3.0

Runtime

= 0.18.3
 Project Readme

RedshiftCopier

The redshift_copier gem can be used to copy a single CSV file or a collection of files in S3 into Redshift. If you would like to move data from one Redshift cluster to another, see the redshift_extractor gem.

Copying without a manifest file

Here is how to use the redshift_copier gem to copy file(s) without a manifest file.

db_config = {
  user: "some user",
  password: "password",
  host: "host",
  sslmode: 'require',
  port: 5439,
  dbname: "database name"
}
args = {
  schema: "schema name",
  table: "table name",
  create_sql: File.read("path to create.sql file"),
  aws_access_key_id: "aws_access_key_id",
  aws_secret_access_key: "aws_secret_access_key",
  s3_path: "s3://some_bucket/some_file.csv",
  db_config: db_config,
  copy_command_options: "dateformat 'auto' timeformat 'auto' blanksasnull emptyasnull escape removequotes delimiter ',' ignoreheader 1;"
}
copier = RedshiftCopier::Copy.new(args)
copier.run

The copy_command_options need to be updated depending on your data. For example, if your CSV file does not have a header row, the ignoreheader 1 option should be removed.

To copy a collection of files, set the s3_path to a folder: "s3://some_bucket/some_folder/". All the files in the folder will be copied and they all must have the same CSV structure.

Copying with a manifest file

Copying a collection of files is similar to copying a single file, but it requires a manifest file.

{
  "entries": [
    {"url":"s3://some_bucket/some_folder/0000_part_00.gz"},
    {"url":"s3://some_bucket/some_folder/0001_part_00.gz"}
  ]
}

The path to the manifest file must be set in s3_path and the copy_command_options must specify the manifest option.

args = {
  schema: "schema name",
  table: "table name",
  create_sql: File.read("path to create.sql file"),
  aws_access_key_id: "aws_access_key_id",
  aws_secret_access_key: "aws_secret_access_key",
  s3_path: "s3://some_bucket/some_folder/manifest",
  db_config: db_config,
  copy_command_options: "manifest dateformat 'auto' timeformat 'auto' blanksasnull emptyasnull escape gzip removequotes delimiter '|';"
}
copier = RedshiftCopier::Copy.new(args)
copy.run

Installation

Add this line to your application's Gemfile:

gem 'redshift_copier'

And then execute:

$ bundle

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/MrPowers/redshift_copier.

License

The gem is available as open source under the terms of the MIT License.