Low commit activity in last 3 years
A long-lived project that still receives updates
Takes a stream to a spreadsheet file and produces an XML or CSV representation of its contents
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

~> 5.1
~> 13.0
~> 1.25

Runtime

 Project Readme

Simple Spreadsheet Extractor¶ ↑

Authors

Stuart Owen, Finn Bacall

Version

0.17.0

Contact

stuart.owen@manchester.ac.uk

Licence

BSD (See LICENCE or www.opensource.org/licenses/bsd-license.php)

Copyright

© 2010-2015 The University of Manchester, UK

<img src=“https://codeclimate.com/github/myGrid/simple-spreadsheet-extractor-gem/badges/gpa.svg” />

<img src=“https://travis-ci.org/myGrid/simple-spreadsheet-extractor-gem.svg?branch=master” alt=“Build Status” />

Synopsis¶ ↑

This is a simple gem that provides a facility to read an XLS or XLSX Excel spreadsheet document and produce an XML representation of its content.

CSV output can also be generated for a single sheet.

Internally it uses Apache POI, using the sister github.com/myGrid/simple-spreadsheet-extractor tool.

This is a simple tool developed for use within SysMO-DB.

Installation¶ ↑

Java 8 or above (JRE) is required.

gem install simple-spreadsheet-extractor

*Note that Windows is no longer supported* (since version 0.7.2.1)

Usage¶ ↑

  • require ‘simple-spreadsheet-extractor’

  • include the module SysMODB::SpreadsheetExtractor

  • pass an IO object to the method spreedsheet_to_xml which responds with the XML for the contents of the spreadsheet. Alternatively use spreadsheet_to_csv for CSV.

    • you can now also pass in the filepath to the Excel file instead of an IO object

  • if something goes wrong with the extraction then a SysMODB::SpreadsheetExtractionException will be thrown

  • by default the JVM is allocated 512M of memory, you can override this by passing a string as the last argument. This will be passed to -Xmx in the java command.

e.g.

#examples/example.rb - takes path, i.e. ruby example.rb /tmp/spreadsheet.xls
require 'rubygems'
require 'simple-spreadsheet-extractor'

include SysMODB::SpreadsheetExtractor

path=ARGV.first

f=open path
begin
  puts spreadsheet_to_xml(f)
rescue SysMODB::SpreadsheetExtractionException=>e
  puts "Something went wrong #{e.message}"
end

Formulas are evaluated placing the result in the XML produced for that cell, however the original formula is included as an attribute.

Row and column indexes start at 1, rather than 0, to keep consistent with namings of the cells in Excel.

An XSD schema for the XML is available in doc/schema-v1.xsd

The desired spreadsheet extractor jar can be specified by defining SPREADSHEET_EXTRACTOR_JAR_PATH in a config file (e.g. environment.rb)

CSV can be generated in a similar way, and also takes an optional sheet number. If the sheet number is missing then the first sheet is used.

Note that the sheet number for the first sheet is 1, and can either be a string or integer.

e.g.

puts spreadsheet_to_csv(f,"1")

Example XML¶ ↑

<?xml version="1.0" encoding="UTF-8"?>
<workbook xmlns="http://www.sysmo-db.org/2010/xml/spreadsheet">
  <named_ranges/>
  <styles>
    <style id="style21">
      <border-top>dotted 1pt</border-top>
      <border-bottom>dotted 1pt</border-bottom>
      <border-left>dotted 1pt</border-left>
      <border-right>dotted 1pt</border-right>
      <background-color>#ffcc99</background-color>
    </style>
    <style id="style22">
      <border-top>dotted 1pt</border-top>
      <border-bottom>dotted 1pt</border-bottom>
      <border-left>dotted 1pt</border-left>
      <border-right>dotted 1pt</border-right>
      <font-weight>bold</font-weight>
      <font-style>italics</font-style>
      <text-decoration>underline</text-decoration>
    </style>
    <style id="style23">
      <color>#2323dc</color>
    </style>
  </styles>
  <sheet name="Sheet1" index="1" hidden="false" very_hidden="false">
    <data_validations/>
    <columns first_column="1" last_column="3">
      <column index="1" column_alpha="A" width="2048"/>
      <column index="2" column_alpha="B" width="2048"/>
      <column index="3" column_alpha="C" width="2048"/>
    </columns>
    <rows first_row="1" last_row="6">
      <row index="1">
        <cell column="1" column_alpha="A" row="1" type="numeric">13.0</cell>
        <cell column="2" column_alpha="B" row="1" type="numeric">654153.0</cell>
        <cell column="27" column_alpha="AA" row="1" type="string">AA</cell>
        <cell column="28" column_alpha="AB" row="1" type="string">AB</cell>
        <cell column="53" column_alpha="BA" row="1" type="string">BA</cell>
        <cell column="54" column_alpha="BB" row="1" type="string">BB</cell>
        <cell column="55" column_alpha="BC" row="1" type="string">BC</cell>
      </row>
      <row index="2">
        <cell column="1" column_alpha="A" row="2" type="numeric" style="style21">547654.0</cell>
      </row>
      <row index="3">
        <cell column="1" column_alpha="A" row="3" type="numeric">45465.0</cell>
      </row>
      <row index="4" height="14.05pt">
        <cell column="1" column_alpha="A" row="4" type="numeric" style="style22" formula="A1+1">14.0</cell>
      </row>
      <row index="6">
        <cell column="1" column_alpha="A" row="6" type="datetime" style="style23" formula="DATE(2009,6,15)">2009-06-15T0:0:0+0100</cell>
      </row>
    </rows>
  </sheet>
  <sheet name="Sheet2" index="2" hidden="false" very_hidden="false">
    <data_validations/>
    <columns first_column="1" last_column="1">
      <column index="1" column_alpha="A" width="2048"/>
    </columns>
    <rows first_row="1" last_row="1"/>
  </sheet>
  <sheet name="Sheet3" index="3" hidden="false" very_hidden="false">
    <columns first_column="1" last_column="1">
      <column index="1" column_alpha="A" width="2048"/>
    </columns>
    <rows first_row="1" last_row="1"/>
  </sheet>
</workbook>