Icu4j filter plugin for Embulk

Unicode-normalizes string values with ICU4J. See http://site.icu-project.org/.

Overview

  • Plugin type: filter

Configuration

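  • key_names: names of the columns to convert. (list, required)
  • keep_input: keep the input columns. (bool, default: true)
  • settings: conversions to apply to each target column. Each entry may set suffix (appended to the column name to form the output column name), transliterators (comma-separated ICU transliterator IDs), and case (upper or lower). (list, required)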

Example: NFKC normalization

filters:
  - type: icu4j
    key_names:
      - title
    settings:
      - { transliterators: 'Any-NFKC', case: upper }
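
As a rough illustration of what Any-NFKC with case: upper does to a single value, the Java sketch below substitutes the JDK's java.text.Normalizer for ICU4J's Any-NFKC transliterator. The class and method names are hypothetical, and ICU4J's output may differ from this approximation in edge cases.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch: approximates `Any-NFKC` plus `case: upper`
// using the JDK's java.text.Normalizer instead of ICU4J.
public class NfkcUpperSketch {

    // NFKC folds compatibility characters (fullwidth ASCII, halfwidth
    // katakana, the ideographic space) into their standard forms;
    // the result is then upper-cased with locale-independent rules.
    static String nfkcUpper(String value) {
        String normalized = Normalizer.normalize(value, Normalizer.Form.NFKC);
        return normalized.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Fullwidth letters separated by an ideographic space (U+3000).
        System.out.println(nfkcUpper("ｔｉｔｌｅ\u3000ｏｎｅ")); // TITLE ONE
        // Halfwidth katakana with separate voiced sound marks.
        System.out.println(nfkcUpper("ｶﾞｲﾄﾞ")); // ガイド
    }
}
```

NFKC also composes the halfwidth voiced sound mark with the preceding kana, which is why ｶﾞ becomes the single character ガ.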

Example: multiple settings

filters:
  - type: icu4j
    keep_input: false
    key_names:
      - catchcopy
    settings:
      - { suffix: _katakana, transliterators: 'Katakana-Hiragana,Fullwidth-Halfwidth', case: upper }
      - { transliterators: 'Katakana-Hiragana', case: lower }
      - { suffix: _romaji_lower, transliterators: 'Katakana-Hiragana,Hiragana-Latin', case: lower }

Input

{
    "catchcopy" : "ホゲホゲ"
}

Output

{
    "catchcopy" : "ほげほげ",
    "catchcopy_katakana" : "ホゲホゲ",
    "catchcopy_romaji_lower" : "hogehoge"
}
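
The example suggests two rules: an entry with a suffix writes its result to a new column named key + suffix, and an entry without one overwrites the key itself. The comma-separated transliterator IDs also appear to be applied left to right (Katakana-Hiragana must run before Hiragana-Latin). The Java sketch below models those rules under stated assumptions and is not the plugin's implementation. The Setting record, the apply/convert methods, and the transliterator table are hypothetical. Only a simplified Katakana-Hiragana (a fixed code-point shift) stands in for ICU4J, every entry reads the original input value, and keep_input is not modeled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of applying a `settings` list to target columns.
// Transliterators come from a tiny stand-in table, NOT from ICU4J.
public class SettingsSketch {

    // One `settings` entry; suffix and caseMode may be null.
    record Setting(String suffix, String transliterators, String caseMode) {}

    // Simplified Katakana-Hiragana: shift U+30A1..U+30F6 down by 0x60.
    static String katakanaToHiragana(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
            sb.appendCodePoint(cp >= 0x30A1 && cp <= 0x30F6 ? cp - 0x60 : cp));
        return sb.toString();
    }

    static final Map<String, UnaryOperator<String>> TRANSLITERATORS =
        Map.of("Katakana-Hiragana", SettingsSketch::katakanaToHiragana);

    // Runs the comma-separated IDs left to right, then applies the case.
    static String apply(Setting setting, String value) {
        String result = value;
        for (String id : setting.transliterators().split(",")) {
            UnaryOperator<String> t = TRANSLITERATORS.get(id.trim());
            if (t == null) {
                throw new IllegalArgumentException("unsupported in this sketch: " + id);
            }
            result = t.apply(result);
        }
        if ("upper".equals(setting.caseMode())) {
            result = result.toUpperCase(Locale.ROOT);
        } else if ("lower".equals(setting.caseMode())) {
            result = result.toLowerCase(Locale.ROOT);
        }
        return result;
    }

    // Writes each result to key + suffix, or back to key when suffix is absent.
    static Map<String, String> convert(Map<String, String> row, List<String> keyNames,
                                       List<Setting> settings) {
        Map<String, String> out = new LinkedHashMap<>(row);
        for (String key : keyNames) {
            String value = row.get(key);
            if (value == null) {
                continue;
            }
            for (Setting s : settings) {
                String column = s.suffix() == null ? key : key + s.suffix();
                out.put(column, apply(s, value));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Setting> settings = List.of(
            new Setting(null, "Katakana-Hiragana", "lower"),
            new Setting("_hiragana", "Katakana-Hiragana", null));
        Map<String, String> row = Map.of("catchcopy", "ホゲホゲ");
        // {catchcopy=ほげほげ, catchcopy_hiragana=ほげほげ}
        System.out.println(convert(row, List.of("catchcopy"), settings));
    }
}
```

Keying each result by key plus suffix lets one input column feed several derived columns, as catchcopy_katakana and catchcopy_romaji_lower do in the example.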

Transliterator rules

See http://hondou.homedns.org/pukiwiki/pukiwiki.php?Java%20ICU4J

Build

$ ./gradlew gem  # add -t to watch for file changes and rebuild continuously