Stores files on Amazon S3 using aws-sdk-java-v2.

Amazon S3 output plugin for Embulk


embulk-output-s3v2 is an output plugin for Embulk, based on aws-sdk-java-v2. It stores files on Amazon S3.

Overview

  • Plugin type: output
  • Load all or nothing: no
  • Resume supported: no
  • Cleanup supported: yes (still under development)

Configuration

  • region: AWS region name. (string, required)
  • enable_profile: If true, an AWS credentials profile is used to authenticate with AWS. If false, an IAM role is used. (boolean, default: false)
    • Supported in v0.2.0 or later
  • profile: AWS credentials profile name. If enable_profile is false, this parameter will be ignored. (string, default: default)
    • Supported in v0.2.0 or later
  • bucket: S3 bucket name. (string, required)
  • object_key_prefix: Prefix of the S3 object key. (string, required)
  • enable_multi_part_upload: If true, multipart upload is enabled. (boolean, default: false)
    • If enable_temp_file_output is false, this parameter must be false or left unspecified.
  • max_concurrent_requests: Maximum number of concurrent requests used to upload an object that has been split into parts. If enable_multi_part_upload is false, this parameter will be ignored. (int, default: 10)
  • multipart_chunksize: Once the plugin has decided to use a multipart upload, the file is divided into chunks of the size specified by this parameter. If enable_multi_part_upload is false, this parameter will be ignored. (string, default: 8MB)
    • Minimum size: 5MB
    • Maximum size: 2GB
    • Supported units
      • Same as for multipart_threshold
  • multipart_threshold: The size threshold the plugin uses for multipart transfers of each divided piece of bulk data. If enable_multi_part_upload is false, this parameter will be ignored. (string, default: 8MB)
    • Supported units
      • KB
      • MB
      • GB
      • TB
  • extension: File extension. (string, required)
  • enable_temp_file_output: If true, a temp file is created in the temp_path directory. If false, bulk data is handled only in a buffer. (boolean, default: true)
  • temp_path: Directory for temp file output. (string, default: /tmp)
  • temp_file_prefix: Prefix of temp file name. (string, default: embulk-output-s3v2)
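
As a rough illustration of how the multipart parameters fit together, the fragment below sets them explicitly. It is a sketch based only on the descriptions above: the bucket name, key prefix, and size values are placeholders, chosen to stay within the documented 5MB to 2GB range for multipart_chunksize.

```yaml
out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name              # placeholder bucket name
  object_key_prefix: embulk/sample    # placeholder prefix
  extension: .csv
  enable_temp_file_output: true       # must not be false when multipart upload is on (true is the default)
  enable_multi_part_upload: true
  multipart_threshold: 64MB           # illustrative value; units are KB, MB, GB, TB
  multipart_chunksize: 16MB           # illustrative value; documented range is 5MB to 2GB
  max_concurrent_requests: 10         # the default, written out for clarity
  formatter:
    type: csv
```

If enable_multi_part_upload were left at its default of false, the three multipart settings would be ignored according to the parameter descriptions.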

Example

Basic sample with IAM role authentication

out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  temp_path: /tmp
  extension: .csv
  formatter:
    type: csv
    delimiter: ","

Basic sample with credentials profile authentication

out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  temp_path: /tmp
  enable_profile: true
  profile: default
  extension: .csv
  formatter:
    type: csv
    delimiter: ","

Multipart upload sample with gzip encoding

out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  temp_path: /tmp
  enable_multi_part_upload: true
  multipart_chunksize: 10MB
  max_concurrent_requests: 20
  extension: csv.gz
  formatter:
    type: csv
    delimiter: ","
  encoders:
  - type: gzip
    level: 1
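
Buffer-only sample without temp files

The following is a sketch, not taken from the plugin's own samples, of a configuration that skips temp files. It assumes the behavior described under Configuration: with enable_temp_file_output set to false, enable_multi_part_upload has to stay false or unspecified, so temp_path and the multipart parameters are left out.

```yaml
out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  enable_temp_file_output: false    # bulk data is handled in a buffer instead of a temp file
  extension: .csv
  formatter:
    type: csv
    delimiter: ","
```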

Usage

Build

$ ./gradlew gem