Conformist
Bend CSVs to your will with declarative schemas. Map one or many columns, preprocess cells and lazily enumerate. Declarative schemas are easier to understand, quicker to set up and independent of I/O. Use CSV (formerly FasterCSV), Spreadsheet or any array-of-arrays-like data structure.
Quick and Dirty Examples
Open a CSV file and declare a schema. A schema comprises columns. A column takes an arbitrary name followed by its position in the input. A column may be derived from multiple positions.
require 'conformist'
require 'csv'
csv = CSV.open '~/transmitters.csv'
schema = Conformist.new do
column :callsign, 1
column :latitude, 1, 2, 3
column :longitude, 3, 4, 5
column :name, 0 do |value|
value.upcase
end
end
Insert the transmitters into a SQLite database.
require 'sqlite3'
db = SQLite3::Database.new 'transmitters.db'
schema.conform(csv).each do |transmitter|
db.execute "INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);"
end
Only insert the transmitters with the name "Mount Coot-tha" using ActiveRecord or DataMapper.
transmitters = schema.conform(csv).select do |transmitter|
transmitter.name == 'Mount Coot-tha'
end
transmitters.each do |transmitter|
Transmitter.create! transmitter.attributes
end
Source from multiple, different input files and insert transmitters together into a single database.
require 'conformist'
require 'csv'
require 'sqlite3'
au_schema = Conformist.new do
column :callsign, 8
column :latitude, 10
end
us_schema = Conformist.new do
column :callsign, 1
column :latitude, 1, 2, 3
end
au_csv = CSV.open '~/au/transmitters.csv'
us_csv = CSV.open '~/us/transmitters.csv'
db = SQLite3::Database.new 'transmitters.db'
[au_schema.conform(au_csv), us_schema.conform(us_csv)].each do |schema|
schema.each do |transmitter|
db.execute "INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);"
end
end
Open a Microsoft Excel spreadsheet and declare a schema.
require 'conformist'
require 'spreadsheet'
book = Spreadsheet.open '~/states.xls'
sheet = book.worksheet 0
schema = Conformist.new do
column :state, 0, 1 do |values|
"#{values.first}, #{values.last}"
end
column :capital, 2
end
Print each state's attributes to standard out.
schema.conform(sheet).each do |state|
$stdout.puts state.attributes
end
For more examples see test/fixtures, test/schemas and test/unit/integration_test.rb.
Installation
Conformist is available as a gem. Install it at the command line.
$ [sudo] gem install conformist
Or add it to your Gemfile and run $ bundle install.
gem 'conformist'
Usage
Anonymous Schema
Anonymous schemas are quick to declare and don't have the overhead of creating an explicit class.
citizen = Conformist.new do
column :name, 0, 1
column :email, 2
end
citizen.conform [['Tate', 'Johnson', 'tate@tatey.com']]
Class Schema
Class schemas are explicit. Class schemas were the only type available in earlier versions of Conformist.
class Citizen
extend Conformist
column :name, 0, 1
column :email, 2
end
Citizen.conform [['Tate', 'Johnson', 'tate@tatey.com']]
Implicit Indexing
Column indexes are implicitly incremented when the index argument is omitted. Implicit indexing is all or nothing.
column :account_number # => 0
column(:date) { |v| Time.new(*v.split('/').reverse) } # => 1
column :description # => 2
column :debit # => 3
column :credit # => 4
Conform
Conform is the principal method for lazily applying a schema to the given input.
enumerator = schema.conform CSV.open('~/file.csv')
enumerator.each do |row|
puts row.attributes
end
Input
#conform expects any object that responds to #each and yields array-like objects.
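To illustrate that contract, here is a minimal sketch of a custom input source. The `PipeDelimitedSource` class is hypothetical and not part of Conformist; it only shows an object whose #each yields arrays, which is the shape #conform is described as expecting.

```ruby
# Hypothetical input source (not part of Conformist): wraps pipe-delimited
# text and yields one array of cells per line.
class PipeDelimitedSource
  include Enumerable

  def initialize(text)
    @text = text
  end

  # Yields each line as an array of whitespace-stripped cells.
  def each
    return enum_for(:each) unless block_given?

    @text.each_line do |line|
      yield line.chomp.split('|').map(&:strip)
    end
  end
end

source = PipeDelimitedSource.new("Tate | Johnson | tate@tatey.com\nJane | Doe | jane@example.com\n")
source.respond_to?(:each) # => true
source.first              # => ["Tate", "Johnson", "tate@tatey.com"]
```

Under that assumption, an object like `source` could be handed to a schema in the same way as a CSV or an array of arrays.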
CSV.open('~/file.csv').respond_to? :each # => true
[[], [], []].respond_to? :each # => true
Header Row
#conform takes an option to skip the first row of input. Given a typical CSV document,
the first row is the header row and irrelevant for enumeration.
schema.conform CSV.open('~/file_with_headers.csv'), :skip_first => true
Named Columns
Strings can be used as column indexes instead of integers. These strings will be matched against the first row to determine the appropriate numerical index.
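One way to picture the matching step is the plain-Ruby sketch below. The `resolve_indexes` helper is hypothetical and is not Conformist's implementation; it simply looks up each name's position in the header row.

```ruby
# Hypothetical helper (not Conformist's implementation): resolve string
# column names to integer positions using the header row.
def resolve_indexes(header, names)
  names.map do |name|
    header.index(name) or raise ArgumentError, "unknown column: #{name}"
  end
end

rows   = [['FN', 'LN', 'EM'], ['Tate', 'Johnson', 'tate@tatey.com']]
header = rows.first

email_indexes = resolve_indexes(header, ['EM'])       # => [2]
name_indexes  = resolve_indexes(header, ['FN', 'LN']) # => [0, 1]

# Skip the header row, then pick cells by the resolved positions.
record     = rows.drop(1).first
name_cells = record.values_at(*name_indexes)          # => ["Tate", "Johnson"]
```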
citizen = Conformist.new do
column :email, 'EM'
column :name, 'FN', 'LN'
end
citizen.conform [['FN', 'LN', 'EM'], ['Tate', 'Johnson', 'tate@tatey.com']], :skip_first => true
Enumerator
#conform is lazy, returning an Enumerator. Input is not parsed until you call #each, #map or any method defined in Enumerable. That means schemas can be assigned now and evaluated later. #each has the lowest memory footprint because it does not build a collection.
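The deferred evaluation described above can be observed with Ruby's own Enumerator. This is a plain-Ruby sketch, not Conformist's internals; the `processed` array exists only to record when rows are actually touched.

```ruby
# Plain-Ruby sketch of lazy evaluation (not Conformist's internals).
processed = []
input     = [['Tate', 'Johnson'], ['Jane', 'Doe']]

names = Enumerator.new do |yielder|
  input.each do |row|
    processed << row           # side effect so we can see when work happens
    yielder << row.join(' ')
  end
end

before_each = processed.length # => 0, declaring the enumerator does no work
first_name  = names.first      # => "Tate Johnson"
after_first = processed.length # => 1, only one row was pulled through
```

Calling `names.to_a` or `names.map` would process every row and build a collection, whereas iterating with #each handles one row at a time.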
Struct
The argument passed into the block is a struct-like object. You can access columns as methods or keys. Columns were only accessible as keys in earlier versions of Conformist. Methods are now the preferred syntax.
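As a rough analogy, Ruby's built-in Struct offers the same two access styles. The `CitizenRow` name is hypothetical and Conformist's own struct-like object is a separate class; here `to_h` stands in for the #attributes method.

```ruby
# Analogy using Ruby's built-in Struct (not Conformist's own class).
CitizenRow = Struct.new(:name, :email)
row = CitizenRow.new('Tate Johnson', 'tate@tatey.com')

by_method = row.name   # => "Tate Johnson"
by_key    = row[:name] # => "Tate Johnson"
as_hash   = row.to_h   # => {:name => "Tate Johnson", :email => "tate@tatey.com"}
```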
citizen[:name] # => "Tate Johnson"
citizen.name # => "Tate Johnson"
For convenience the #attributes method returns a hash of key-value pairs suitable for creating ActiveRecord or DataMapper records.
citizen.attributes # => {:name => "Tate Johnson", :email => "tate@tatey.com"}
One Column
Maps the first column in the input file to :first_name. Column indexing starts at zero.
column :first_name, 0
Many Columns
Maps the first and second columns in the input file to :name.
column :name, 0, 1
Indexing is completely arbitrary and you can map any combination.
column :name_and_city, 0, 1, 2
Many columns are implicitly concatenated. Behaviour can be changed by passing a block. See preprocessing.
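The difference between the default and a block can be sketched in plain Ruby. The `conform_cells` helper is hypothetical and is not Conformist's implementation; joining with a single space is an assumption inferred from the "Tate Johnson" output shown in the Struct section.

```ruby
# Hypothetical helper (not Conformist's implementation): pick cells by
# position, then either join them or hand them to a block.
def conform_cells(row, indexes, &block)
  values = row.values_at(*indexes)
  # Assumption: the default joins with a single space.
  block ? block.call(values) : values.join(' ')
end

row = ['Tate', 'Johnson', 'Brisbane']

default_join = conform_cells(row, [0, 1]) # => "Tate Johnson"
any_order    = conform_cells(row, [2, 0]) # => "Brisbane Tate"
with_block   = conform_cells(row, [0, 1]) { |values| values.map(&:upcase) * ', ' }
# with_block => "TATE, JOHNSON"
```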
Preprocessing
Sometimes values need to be manipulated before they're conformed. Passing a block gets access to values. The return value of the block becomes the conformed output.
column :name, 0, 1 do |values|
values.map(&:upcase) * ' '
end
Works with one column too. Instead of getting a collection of objects, one object is passed to the block.
column :first_name, 0 do |value|
value.upcase
end
It's also possible to provide a context object that is made available during preprocessing.
citizen = Conformist.new do
column :name, 0, 1 do |values, context|
(context[:upcase?] ? values.map(&:upcase) : values) * ' '
end
end
citizen.conform [['tate', 'johnson']], context: {upcase?: true}
Virtual Columns
Virtual columns are not sourced from input. Omit the index to create a virtual column. Like real columns, virtual columns are included in the conformed output.
column :day do
1
end
Inheritance
Inheriting from a schema gives access to all of the parent schema's columns.
Anonymous Schema
Anonymous inheritance takes inspiration from Ruby's syntax for instantiating new classes.
parent = Conformist.new do
column :name, 0, 1
end
child = Conformist.new parent do
column :category do
'Child'
end
end
Class Schema
Classical inheritance works as expected.
class Parent
extend Conformist
column :name, 0, 1
end
class Child < Parent
column :category do
'Child'
end
end
Upgrading from <= 0.0.3 to >= 0.1.0
Where previously you had
class Citizen
include Conformist::Base
column :name, 0, 1
end
Citizen.load('~/file.csv').foreach do |citizen|
# ...
end
You should now do
require 'fastercsv'
class Citizen
extend Conformist
column :name, 0, 1
end
Citizen.conform(FasterCSV.open('~/file.csv')).each do |citizen|
# ...
end
See CHANGELOG.md for a full list of changes.
Compatibility
- MRI 2.4.0, 2.3.1, 2.2.0, 2.1.0, 2.0.0, 1.9.3
- JRuby
Dependencies
No explicit dependencies, although CSV and Spreadsheet are commonly used.
Contributing
- Fork
- Install dependencies by running $ bundle install
- Write tests and code
- Make sure the tests pass locally by running $ bundle exec rake
- Push to GitHub and make sure continuous integration tests pass at https://travis-ci.org/tatey/conformist/pull_requests
- Send a pull request on GitHub
Please do not increment the version number in lib/conformist/version.rb.
The version number will be incremented by the maintainer after the patch
is accepted.
Motivation
Motivation for this project came from the desire to simplify importing data from various government organisations into Antenna Mate. The data from each government was similar, but had completely different formatting. Some pieces of data needed preprocessing while others simply needed to be concatenated together. Not wanting to write a parser for each new government organisation, I created Conformist.
Copyright
Copyright © 2016 Tate Johnson. Conformist is released under the MIT license. See LICENSE for details.