Auto-detecting CSV parser
A Ruby gem to read CSVs with auto-detection of encoding and column separator. Just let people provide a CSV and don't require them to think about format details if it can be figured out automatically (while providing a way to set it in case auto-detection fails).
    gem install rchardet
    gem install acsv

or, when using Ruby on Rails, put this in your Gemfile:

    gem 'rchardet'
    gem 'acsv'
You can use this exactly like the regular CSV
module. Just make sure to load a character-detection library before you
require 'acsv'. Then use
ACSV::CSV wherever you would have used CSV:
    require 'rchardet'
    require 'acsv'

    ACSV::CSV.foreach("spec/files/test_02_semicolon_utf16.csv", headers: true) do |row|
      puts row # => '1234'
    end
When running this with Ruby's standard CSV, you'll see the error "invalid byte sequence in UTF-8".
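For comparison, here is a rough sketch of the manual work that auto-detection is meant to spare you, using only Ruby's standard library. It assumes you already know the data is UTF-16LE with a byte order mark and uses a semicolon separator. The helper read_utf16_csv and the sample data are hypothetical and not part of acsv.

```ruby
require 'csv'

# Hypothetical helper (not part of acsv): decode UTF-16LE bytes by hand,
# drop the byte order mark, and parse with an explicitly given separator.
def read_utf16_csv(bytes)
  text = bytes.dup.force_encoding('UTF-16LE').encode('UTF-8')
  text = text.delete_prefix("\uFEFF")
  CSV.parse(text, col_sep: ';', headers: true).map(&:to_h)
end

# Simulated file contents; File.binread(path) would return bytes like these.
bytes = "\uFEFFid;value\n1;1234\n".encode('UTF-16LE').b

read_utf16_csv(bytes).each { |row| puts row['value'] }
```

Every hard-coded assumption here (the encoding, the byte order mark, the separator) is something the caller would otherwise have to know in advance.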
Other methods like
open are also supported. When passing strings, as with
parse, only the separator is auto-detected.
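To illustrate what separator detection on a string can look like, here is a minimal sketch using only Ruby's standard CSV. It is not acsv's actual algorithm: the helper guess_separator, its candidate list, and the "most frequent in the first line" heuristic are assumptions made for the example.

```ruby
require 'csv'

# Hypothetical helper (not part of acsv): pick the candidate separator
# that occurs most often in the first line of the data.
def guess_separator(data, candidates: [',', ';', "\t", '|'])
  first_line = data.lines.first.to_s
  candidates.max_by { |sep| first_line.count(sep) }
end

data = "id;name;price\n1;apple;0.50\n2;pear;0.75\n"

separator = guess_separator(data)
table = CSV.parse(data, col_sep: separator, headers: true)
puts table.first['name']
```

A heuristic this simple can be fooled by quoted fields that contain other candidate characters, which is one reason to keep a way of setting the separator explicitly.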
Instead of rchardet, you can also use
Just load them before loading acsv. When multiple are loaded, the first one that
returns an encoding above the confidence level (see below) is used. You can also
specify which method to use by passing the
method option to one of the
ACSV::CSV methods. Possible values are
Available methods are also available from
Character encoding detection also returns a confidence level (between 0 and 1).
By default, each method has its own confidence level which matches its performance,
but you can override it by passing the
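The selection rule described above (try each loaded method in order, accept the first result whose confidence reaches that method's threshold, optionally forcing a method or overriding the threshold) can be sketched as follows. This is an illustration only, not acsv's code: the detectors are stand-in lambdas with invented names and thresholds, and the keyword names method: and confidence: are assumptions for this sketch.

```ruby
# Hypothetical detectors (not real libraries): each probe returns
# [encoding_name, confidence] with a confidence between 0 and 1.
def default_detectors
  [
    { name: :utf8_check, min_confidence: 0.9,
      probe: lambda { |data|
        valid = data.dup.force_encoding('UTF-8').valid_encoding?
        valid ? ['UTF-8', 1.0] : [nil, 0.0]
      } },
    { name: :latin1_guess, min_confidence: 0.1,
      probe: ->(_data) { ['ISO-8859-1', 0.2] } }
  ]
end

# Return the first encoding whose confidence reaches the threshold.
# method:     restrict detection to one named detector (assumed option name)
# confidence: override every detector's own threshold (assumed option name)
def detect_encoding(data, method: nil, confidence: nil, detectors: default_detectors)
  detectors = detectors.select { |d| d[:name] == method } if method
  detectors.each do |detector|
    encoding, score = detector[:probe].call(data)
    threshold = confidence || detector[:min_confidence]
    return encoding if encoding && score >= threshold
  end
  nil
end

puts detect_encoding("caf\u00e9")                        # valid UTF-8 input
puts detect_encoding("caf\xE9".b)                        # falls through to the second detector
puts detect_encoding("caf\xE9".b, confidence: 0.5).inspect # nothing is confident enough
```

Giving each detector its own default threshold lets a cautious detector and a guess-happy one coexist, while a single override makes the behavior predictable when you need it to be.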
This gem also provides some lower-level methods for encoding and separator detection:
    require 'rchardet'
    require 'acsv'

    data = File.read("spec/files/test_02_semicolon_iso8859.csv")
    encoding = ACSV::Detect.encoding(data)
    puts encoding # => 'ISO-8859-1'
    data.force_encoding(encoding)
    separator = ACSV::Detect.separator(data)
    puts separator # => ';'
Please see the documentation for
ACSV::Detect for more information.
Copyright © 2014 wvengen, released under GPLv3+ (see LICENSE.md for details).