No commit activity in last 3 years
No release in over 3 years
The XML parser plugin is an Embulk plugin that fetches entries from XML-formatted input.
 Dependencies

Development

~> 1.0
>= 0.9.15
~> 10.0

Runtime

~> 1.10.1
 Project Readme

XML parser plugin for Embulk

Parser plugin for Embulk.

Reads input data as XML and emits each entry as an output record.

Overview

  • Plugin type: parser
  • Load all or nothing: yes
  • Resume supported: no

Types

  • xml: finds rows with a SAX parser.
  • xpath: finds rows by Xpath, so you can process XML with more complex conditions than the xml type allows.

Configuration

XML

parser:
  type: xml
  root: data/students/student
  schema:
    - {name: name, type: string}
    - {name: age, type: long}
  • type: set to xml to use this parser.
  • root: path to the element that represents one entry, written in path/to/node style; required.
  • schema: column names and data types of the output table; required.
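
To make the role of root and schema concrete, here is a minimal Python sketch of SAX-style row extraction. It is not the plugin's implementation: the function name parse_rows_sax, the RowHandler class, and the mapping of long to Python int are assumptions made for illustration only.

```python
import xml.sax

XML = """<data>
  <result>true</result>
  <students>
    <student><name>John</name><age>10</age></student>
    <student><name>Paul</name><age>16</age></student>
  </students>
</data>"""


class RowHandler(xml.sax.ContentHandler):
    """Collects one dict per element whose path equals the configured root."""

    def __init__(self, root, schema):
        super().__init__()
        self.root = root.split("/")   # e.g. ["data", "students", "student"]
        self.schema = dict(schema)    # column name -> converter (hypothetical mapping)
        self.path = []                # stack of currently open element names
        self.row = None               # row being built, or None outside a root element
        self.text = []                # text chunks of the element being read
        self.rows = []

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path == self.root:
            self.row = {}
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        if self.path == self.root:
            # The entry element is closing: the row is complete.
            self.rows.append(self.row)
            self.row = None
        elif (self.row is not None
              and len(self.path) == len(self.root) + 1
              and name in self.schema):
            # Direct child of the entry element that is listed in the schema.
            convert = self.schema[name]
            self.row[name] = convert("".join(self.text).strip())
        self.path.pop()


def parse_rows_sax(xml_text, root, schema):
    handler = RowHandler(root, schema)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.rows


rows = parse_rows_sax(XML, "data/students/student", [("name", str), ("age", int)])
print(rows)
```

Elements outside the root path, such as result, are ignored because they never appear under an open entry element.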

If you need to parse a column as a timestamp, the schema entry supports two additional parameters:

schema:
  - {name: timestamp_column, type: timestamp, format: "%Y-%m-%d", timezone: "+0000"}
  • format: timestamp format used for parsing; required.
  • timezone: timezone in which the timestamp is parsed; "+0900" is used by default.
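
The effect of format and timezone can be sketched in Python, whose strptime accepts the same "%Y-%m-%d" directives. This only illustrates the idea under that assumption and is not the plugin's code; parse_timestamp is a hypothetical helper, and its default mirrors the "+0900" default described above.

```python
from datetime import datetime, timezone


def parse_timestamp(value, fmt, tz="+0900"):
    """Parse a wall-clock value with fmt and interpret it in the given UTC offset."""
    naive = datetime.strptime(value, fmt)
    # Parsing the offset with %z yields a tzinfo object for that fixed offset.
    offset = datetime.strptime(tz, "%z").tzinfo
    return naive.replace(tzinfo=offset)


utc_value = parse_timestamp("2015-07-01", "%Y-%m-%d", "+0000")
default_value = parse_timestamp("2015-07-01", "%Y-%m-%d")

print(utc_value.isoformat())
print(default_value.astimezone(timezone.utc).isoformat())
```

The same input string denotes a different instant depending on the timezone, so setting timezone explicitly is worthwhile whenever the source data is not in "+0900".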

Xpath

parser:
  type: xpath
  root: //data/students/student
  schema:
    - {path: name, type: string, name: name}
    - {path: age, type: long, name: age}
    - {path: hobbies/hobby, type: json, name: hobbies}
  • type: set to xpath to use this parser.
  • root: Xpath expression selecting the elements that represent one entry; '/' is used by default.
  • schema: columns of the output table, each with a path, a name, and a data type; required.
  • namespaces: xml namespaces
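
As a rough illustration of how root and the per-column path values select data, the following Python sketch uses xml.etree.ElementTree. ElementTree supports only a subset of XPath, so the root expression is written relative to the data element here. The function name parse_rows_xpath is hypothetical, and representing a json column as a JSON array of all matching nodes (empty when none match) is an assumption, not documented plugin behavior.

```python
import json
import xml.etree.ElementTree as ET

XML = """<data>
  <students>
    <student>
      <name>John</name>
      <age>10</age>
      <hobbies><hobby>music</hobby><hobby>movie</hobby></hobbies>
    </student>
    <student>
      <name>George</name>
      <age>17</age>
    </student>
  </students>
</data>"""


def parse_rows_xpath(xml_text):
    doc = ET.fromstring(xml_text)  # doc is the <data> element
    rows = []
    # Stand-in for root: //data/students/student, relative to <data>.
    for student in doc.findall("./students/student"):
        hobbies = [h.text for h in student.findall("hobbies/hobby")]
        rows.append({
            "name": student.findtext("name"),      # path: name, type: string
            "age": int(student.findtext("age")),   # path: age, type: long
            "hobbies": json.dumps(hobbies),        # path: hobbies/hobby, type: json (assumed shape)
        })
    return rows


rows = parse_rows_xpath(XML)
print(rows)
```

Each column path is evaluated relative to the element matched by root, which is why the schema paths are short (name, age, hobbies/hobby) rather than absolute.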

If you need to parse a column as a timestamp, the schema entry supports two additional parameters:

schema:
  - {name: timestamp_column, type: timestamp, format: "%Y-%m-%d", timezone: "+0000"}
  • format: timestamp format used for parsing; required.
  • timezone: timezone in which the timestamp is parsed; "+0900" is used by default.

Here is an example XML document:

<data>
  <result>true</result>
  <students>
    <student>
      <name>John</name>
      <age>10</age>
      <hobbies>
        <hobby>music</hobby>
        <hobby>movie</hobby>
      </hobbies>
    </student>
    <student>
      <name>Paul</name>
      <age>16</age>
      <hobbies>
        <hobby>game</hobby>
      </hobbies>
    </student>
    <student>
      <name>George</name>
      <age>17</age>
    </student>
    <student>
      <name>Ringo</name>
      <age>18</age>
    </student>
  </students>
</data>