Orc output plugin for Embulk
Warning
Embulk has entered maintenance mode (see: "Embulk into the maintenance mode | Embulk"). Therefore, maintenance of this plugin will also end.
Overview
- Plugin type: output
- Load all or nothing: no
- Resume supported: no
- Cleanup supported: yes
Configuration
- path_prefix: A prefix of output path. (string, required)
  - Support: file, s3, s3n and s3a.
- file_ext: An extension of output file. (string, default: .orc)
- sequence_format: (string, default: .%03d)
- buffer_size: Set the ORC buffer size. (integer, default: 262144 (256KB))
- strip_size: Set the ORC stripe size. (integer, default: 67108864 (64MB))
- block_size: Set the ORC block size. (integer, default: 268435456 (256MB))
- compression_kind: Compression kind. (string, default: 'ZLIB')
  - NONE, ZLIB, SNAPPY, LZO, LZ4
- overwrite: Overwrite if output files already exist. (boolean, default: false)
  - Support: LocalFileSystem, S3 (s3, s3a, s3n)
- default_from_timezone: Time zone of timestamp columns. This can be overridden for each column using column_options. (DateTimeZone, default: UTC)
- auth_method: Name of the mechanism to authenticate requests (basic, env, instance, profile, properties, anonymous, or session; default: basic)
  - see: https://github.com/embulk/embulk-input-s3#configuration
  - env, basic, profile, default, session, anonymous, properties
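As a rough illustration of how the options above might fit together, the following sketch writes to S3 and spells out the size settings with their documented defaults. The bucket name and key prefix are hypothetical, and the s3a:// URI form of path_prefix is an assumption based on the supported schemes listed above. The behavior of auth_method: env is assumed to follow the embulk-input-s3 documentation linked above.

```yaml
out:
  type: orc
  # Hypothetical bucket and prefix; the s3a:// URI form is an assumption.
  path_prefix: "s3a://example-bucket/embulk/output"
  file_ext: .orc
  sequence_format: ".%03d"
  compression_kind: SNAPPY   # one of NONE, ZLIB, SNAPPY, LZO, LZ4
  buffer_size: 262144        # 256KB (default)
  strip_size: 67108864       # 64MB (default)
  block_size: 268435456      # 256MB (default)
  overwrite: true            # replace output files that already exist
  default_from_timezone: UTC
  # Assumed to behave as described in the embulk-input-s3 documentation.
  auth_method: env
```

Every size value shown is already the default, so those three lines can be dropped unless you want to tune them.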
Example
out:
  type: orc
  path_prefix: "/tmp/output"
  compression_kind: ZLIB
  overwrite: true
ChangeLog
ver 0.3.4
- Bump orc library to 1.5.4
- bugfix
ver 0.3.3
- bugfix
- Bump orc library to 1.4.4
ver 0.3.2
- Update orc libraries to 1.4.3
ver 0.3.0
- Change default value: (block_size, buffer_size, strip_size)
  - default value is Hive's default value. (see: https://orc.apache.org/docs/hive-config.html)
ver 0.2.0
- support: output to s3
  - s3n, s3a protocol
ver 0.1.0
- initial release
Build
$ ./gradlew gem # -t to watch for file changes and rebuild continuously