Repository is archived
Low commit activity in last 3 years
No release in over a year
Stores files on Amazon S3 using aws-sdk-java-v2.

Amazon S3 output plugin for Embulk


embulk-output-s3v2 is an output plugin for Embulk based on aws-sdk-java-v2. It stores files on Amazon S3.

Overview

  • Plugin type: output
  • Load all or nothing: no
  • Resume supported: no
  • Cleanup supported: yes (still under development)

Configuration

  • region: AWS region name. (string, required)
  • enable_profile: If true, an AWS credentials profile is used to authenticate with AWS. If false, an IAM role is used. (boolean, default: false)
    • Supported in v0.2.0 or later
  • profile: AWS credentials profile name. If enable_profile is false, this parameter will be ignored. (string, default: default)
    • Supported in v0.2.0 or later
  • bucket: S3 bucket name. (string, required)
  • object_key_prefix: Prefix of the S3 object key name. (string, required)
  • enable_multi_part_upload: If true, multipart upload is enabled. (boolean, default: false)
    • If enable_temp_file_output is false, this parameter must be false or left unspecified.
  • max_concurrent_requests: Maximum number of concurrent requests used to upload an object that has been split into parts. If enable_multi_part_upload is false, this parameter is ignored. (int, default: 10)
  • multipart_chunksize: Once the plugin has decided to use multipart upload, the file is divided into chunks of the size specified by this parameter. If enable_multi_part_upload is false, this parameter is ignored. (string, default: 8MB)
    • Minimum size: 5MB
    • Maximum size: 2GB
    • Supported size units
      • Same as those of multipart_threshold
  • multipart_threshold: Size threshold the plugin uses to decide on multipart transfer for each divided piece of bulk data. If enable_multi_part_upload is false, this parameter is ignored. (string, default: 8MB)
    • Supported size units
      • KB
      • MB
      • GB
      • TB
  • extension: File extension. (string, required)
  • enable_temp_file_output: If true, a temp file is created in the temp_path directory. If false, bulk data is held in a buffer only. (boolean, default: true)
  • temp_path: Directory for temp file output. (string, default: /tmp)
  • temp_file_prefix: Prefix of temp file name. (string, default: embulk-output-s3v2)
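
The constraint between enable_temp_file_output and enable_multi_part_upload described above can be illustrated with a configuration that keeps bulk data in the buffer only. This is a minimal sketch built solely from the parameters listed in this section; the region, bucket name, and key prefix are placeholder values, and multipart upload is deliberately left unset.

```yaml
out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name            # placeholder bucket name
  object_key_prefix: embulk/embulk-output-s3v2
  enable_temp_file_output: false    # keep bulk data in the buffer, no temp file
  # enable_multi_part_upload is left unset (default: false) because it
  # must not be true when enable_temp_file_output is false
  extension: .csv
  formatter:
    type: csv
```

With this setting, temp_path and temp_file_prefix should have no effect, since no temp file is written.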

Example

Basic sample with IAMRole authentication

out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  temp_path: /tmp
  extension: .csv
  formatter:
    type: csv
    delimiter: ","

Basic sample with Credentials-Profile authentication

out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  temp_path: /tmp
  enable_profile: true
  profile: default
  extension: .csv
  formatter:
    type: csv
    delimiter: ","

Multipart upload sample with gzip encoding

out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  temp_path: /tmp
  enable_multi_part_upload: true
  multipart_chunksize: 10MB
  max_concurrent_requests: 20
  extension: csv.gz
  formatter:
    type: csv
    delimiter: ","
  encoders:
  - type: gzip
    level: 1
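
The sample above leaves multipart_threshold at its default. A variant that also sets the threshold and the temp file options explicitly might look like the following sketch. The size values are illustrative assumptions, not recommendations, chosen only to stay within the documented 5MB to 2GB range for multipart_chunksize.

```yaml
out:
  type: s3v2
  region: ap-northeast-1
  bucket: s3-bucket-name
  object_key_prefix: embulk/embulk-output-s3v2
  enable_temp_file_output: true     # default; required for multipart upload
  temp_path: /tmp
  temp_file_prefix: embulk-output-s3v2
  enable_multi_part_upload: true
  multipart_threshold: 64MB         # illustrative value
  multipart_chunksize: 16MB         # illustrative value, within 5MB..2GB
  max_concurrent_requests: 10       # same as the documented default
  extension: csv.gz
  formatter:
    type: csv
  encoders:
  - type: gzip
    level: 1
```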

Usage

Build

$ ./gradlew gem