roo-smarter_csv
roo-smarter_csv replaces Roo's CSV backend with SmarterCSV while keeping the Roo spreadsheet API.
What it does
- Uses SmarterCSV for parsing CSV input
- Uses SmarterCSV defaults unless overridden by Roo compatibility behavior or explicit options
SmarterCSV Benefits
- SmarterCSV is roughly 3x to 4.6x faster than Roo::CSV (see Performance below)
- SmarterCSV automatically detects `col_sep` and `row_sep`
- SmarterCSV is more robust against real-world data
- See Ruby CSV Pitfalls for examples of silent data loss and corruption cases in Ruby CSV
- See Migrating from Ruby CSV for behavior differences and migration guidance
- See SmarterCSV 1.15.2: Faster Than Raw CSV Arrays for benchmark background
Performance
Speedup vs Roo::CSV with SmarterCSV 1.17.1
| File | Speedup |
|---|---|
| PEOPLE_IMPORT_B.csv | 2.98x |
| uscities.csv | 4.22x |
| uszips.csv | 4.45x |
| worldcities.csv | 4.58x |
| embedded_newlines_60k.csv | 3.84x |
| heavy_quoting_60k.csv | 3.42x |
| many_empty_fields_60k.csv | 3.36x |
| sample_100k.csv | 3.17x |
| sensor_data_50krows_50cols.csv | 3.23x |
| tab_separated_60k.tsv | 3.14x |
| utf8_multibyte_60k.csv | 3.17x |
Roo API
- Keeps Roo's spreadsheet-style API:
  - `cell`
  - `celltype`
  - `row`
  - `column`
  - `each`
  - `parse`
  - `first_row` / `last_row`
  - `first_column` / `last_column`
- Preserves Roo's single-sheet CSV behavior
- Supports Roo's `Roo::Spreadsheet.open(...)` entry point
- Supports CSV export through Roo's existing `to_csv`
Installation
Add to your Gemfile:
```ruby
gem "roo-smarter_csv"
```
Then run:
```bash
bundle install
```
Activation
```ruby
require "roo-smarter_csv"

spreadsheet = Roo::Spreadsheet.open("data.csv")
```
`require "roo-smarter_csv"` automatically loads both `roo` and `smarter_csv` and registers `Roo::SmarterCSV` as Roo's CSV handler.
Supported behavior
roo-smarter_csv reads the full CSV input and exposes it through Roo's spreadsheet abstraction.
It supports:
- local files
- `StringIO` / stream input
- Roo's `Roo::Spreadsheet.open(...)`
- CSV files with a UTF-8 BOM
- tab-delimited input via `col_sep: "\t"`
- SmarterCSV type conversion
- warnings emitted by SmarterCSV
- Roo's `to_csv` export for the in-memory spreadsheet representation
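Supporting files with a UTF-8 BOM usually comes down to removing the invisible `\uFEFF` character (bytes `EF BB BF`) before the header row is read, so the first header name is not polluted. The plain-Ruby sketch below only illustrates that idea; `strip_utf8_bom` is a hypothetical helper, not an API of roo-smarter_csv or SmarterCSV, and it does not show how either library handles the BOM internally.

```ruby
# Hypothetical helper (not part of roo-smarter_csv or SmarterCSV):
# drops a leading UTF-8 byte order mark so the first header parses cleanly.
UTF8_BOM = "\uFEFF"

def strip_utf8_bom(text)
  text.delete_prefix(UTF8_BOM) # no-op when there is no BOM
end

raw = "\uFEFFName,Email\nJohn,john@example.com\n"
header = strip_utf8_bom(raw).lines.first.chomp.split(",")
header # => ["Name", "Email"]
```

Without the BOM removal, the first header would be `"\uFEFFName"` rather than `"Name"`, which is easy to miss because the character is invisible.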
Architecture note
SmarterCSV is used as the parser, but Roo remains the public model.
That means:
- SmarterCSV row hashes are an internal parsing representation
- Roo still stores data in its coordinate-based cell grid
- Roo's public API remains spreadsheet-like
- hash-based rows are only an intermediate step for parser-to-grid conversion
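To make the parser-to-grid step concrete, here is a minimal sketch assuming row hashes shaped like SmarterCSV's output and the header row placed at row 1. The `rows_to_grid` method and the `[row, col]`-keyed Hash are hypothetical stand-ins for Roo's internal cell storage, not roo-smarter_csv's actual implementation. Indexes are 1-based, matching Roo's `cell(row, col)` convention.

```ruby
# Hypothetical illustration of parser-to-grid conversion (not the gem's code).
# Input: ordered header keys plus row hashes, as a CSV parser might return them.
# Output: a Hash keyed by [row, col], 1-based like Roo's cell(row, col).
def rows_to_grid(headers, rows)
  grid = {}
  # Row 1 holds the header names. Simplification: the normalized keys are
  # written back as text; restoring the original header spelling is not shown.
  headers.each_with_index { |h, c| grid[[1, c + 1]] = h.to_s }
  # Data rows start at row 2; a missing key (e.g. an empty hash) yields nil cells.
  rows.each_with_index do |row, r|
    headers.each_with_index do |h, c|
      grid[[r + 2, c + 1]] = row[h]
    end
  end
  grid
end

headers = %i[name age]
rows = [{ name: "John", age: 30 }, { name: "Jane", age: 28 }]
grid = rows_to_grid(headers, rows)

last_row = grid.keys.map(&:first).max    # => 3
last_column = grid.keys.map(&:last).max  # => 2
grid[[2, 1]]                             # => "John"
```

Once the data lives in a coordinate grid, spreadsheet-style methods such as `first_row`/`last_row` reduce to simple lookups over the stored coordinates.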
Options
- SmarterCSV options are passed as nested options, e.g. `options = { smarter_csv: {} }`
- `roo-smarter_csv` defaults the SmarterCSV option `remove_empty_hashes` to `false` for compatibility with Roo.
- `roo-smarter_csv` honors some of Roo's `csv_options`, but we encourage you to pass those under the `smarter_csv` options instead.
Option precedence
roo-smarter_csv understands two option namespaces:
1. SmarterCSV options

Primary namespace:
```ruby
smarter_csv: {
  col_sep: ";",
  row_sep: "\n",
  quote_char: '"',
  encoding: "utf-8"
}
```
2. Roo compatibility options

Roo already uses:
```ruby
csv_options: {
  col_sep: ";",
  row_sep: "\n",
  quote_char: '"',
  encoding: "utf-8"
}
```
Only these four keys are copied from `csv_options` into the effective SmarterCSV options:
`col_sep`, `row_sep`, `quote_char`, `encoding`
Precedence rules
- Start with SmarterCSV defaults.
- Apply `roo-smarter_csv` compatibility overrides.
- Copy supported keys from `csv_options` into the SmarterCSV options.
- Apply `smarter_csv` on top.
- If the same key exists in both places, `smarter_csv` wins.
- Conflicts emit a warning.
Only `col_sep`, `row_sep`, `quote_char`, and `encoding` are bridged from `csv_options`. No other Roo options are treated as CSV parser settings.
Examples
Only Roo options
```ruby
Roo::Spreadsheet.open("data.tsv", extension: :csv, csv_options: { col_sep: "\t" })
```
Only SmarterCSV options
```ruby
Roo::Spreadsheet.open("data.csv", smarter_csv: { col_sep: ";" })
```
Both, with conflict
```ruby
Roo::Spreadsheet.open(
  "data.csv",
  csv_options: { col_sep: ";" },
  smarter_csv: { col_sep: "\t" }
)
```
In this case, `smarter_csv[:col_sep]` wins and a warning is emitted.
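The precedence rules can be summarized as a sequence of Hash merges. The sketch below is a hypothetical, plain-Ruby model of those documented rules, not the gem's actual code: the method name `effective_smarter_csv_options`, the warning text, and the choice to warn only when the two values differ are all assumptions for illustration.

```ruby
# Hypothetical model of the documented option precedence (not the gem's code).
BRIDGED_KEYS = %i[col_sep row_sep quote_char encoding].freeze

def effective_smarter_csv_options(defaults:, csv_options: {}, smarter_csv: {})
  # 1. SmarterCSV defaults, then 2. the roo-smarter_csv compatibility override.
  opts = defaults.merge(remove_empty_hashes: false)
  # 3. Only the four bridged keys are copied from Roo's csv_options.
  bridged = csv_options.slice(*BRIDGED_KEYS)
  opts = opts.merge(bridged)
  # Assumption: a conflict means the same key with a different value.
  bridged.each do |key, value|
    next unless smarter_csv.key?(key) && smarter_csv[key] != value
    warn "csv_options[:#{key}] is overridden by smarter_csv[:#{key}]"
  end
  # 4. smarter_csv is applied last, so it wins.
  opts.merge(smarter_csv)
end

opts = effective_smarter_csv_options(
  defaults: { col_sep: :auto, row_sep: :auto, remove_empty_hashes: true },
  csv_options: { col_sep: ";", headers: true },
  smarter_csv: { col_sep: "\t" }
)
opts[:col_sep]             # => "\t" (a warning is printed to stderr)
opts[:remove_empty_hashes] # => false
opts.key?(:headers)        # => false (not a bridged key)
```

Applying `smarter_csv` in the final merge is what makes it win every conflict; the warning loop only reports the overrides.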
SmarterCSV defaults
When you do not pass any options, roo-smarter_csv starts from SmarterCSV defaults and then applies one compatibility override for Roo:
```ruby
remove_empty_hashes: false
```
That override is intentional. Roo expects blank rows to remain addressable in the spreadsheet model, so roo-smarter_csv disables SmarterCSV's default behavior of dropping fully empty row hashes.
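Why this matters for a coordinate-based API: if fully empty rows were dropped, every later row would shift up and `cell(row, col)` would no longer match the row positions in the file. The sketch below compares the two behaviors. It uses naive comma splitting (no quoting support) purely for illustration; it is not how SmarterCSV parses.

```ruby
# Naive illustration only: splits on commas and ignores quoting.
csv_text = "name,age\nJohn,30\n\nJane,28\n"

header, *data_lines = csv_text.lines.map(&:chomp)
keys = header.split(",").map(&:to_sym)

# One hash per data line; a blank line becomes an empty hash.
row_hashes = data_lines.map do |line|
  line.empty? ? {} : keys.zip(line.split(",")).to_h
end

dropped = row_hashes.reject(&:empty?) # like remove_empty_hashes: true
kept    = row_hashes                  # like remove_empty_hashes: false

# Spreadsheet row N (1-based, header = row 1) maps to data index N - 2.
dropped[4 - 2] # => nil (Jane moved up to row 3)
kept[4 - 2]    # => { name: "Jane", age: "28" }
```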
Some important effective defaults are therefore:
- `col_sep: :auto` (auto-detects the separator)
- `row_sep: :auto` (auto-detects line endings)
- `quote_char: '"'`
- `downcase_header: true`
- `strings_as_keys: false`
- `convert_values_to_numeric: true`
- `remove_empty_hashes: false` (`roo-smarter_csv` sets this for Roo compatibility so blank rows remain addressable through the spreadsheet API)
- `headers_in_file: true`
This means common CSV files work without extra configuration, and SmarterCSV can infer separators and convert numeric values automatically while still preserving Roo-compatible blank rows.
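The effect of `convert_values_to_numeric: true` can be approximated as follows. This is a simplified, hypothetical `to_numeric` helper that recognizes only plain integers and decimals; SmarterCSV's actual conversion rules may differ in edge cases.

```ruby
# Simplified, hypothetical approximation of numeric value conversion.
INTEGER_RE = /\A[+-]?\d+\z/
FLOAT_RE   = /\A[+-]?\d*\.\d+\z/

def to_numeric(value)
  case value
  when INTEGER_RE then value.to_i
  when FLOAT_RE   then value.to_f
  else value # leave non-numeric values untouched
  end
end

row = { name: "John", age: "30", score: "1.5" }
converted = row.transform_values { |v| to_numeric(v) }
converted[:age]   # => 30
converted[:score] # => 1.5
converted[:name]  # => "John"
```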
Default behavior examples
Auto-detected separator
```ruby
spreadsheet = Roo::Spreadsheet.open("data.csv")
```
No `col_sep` is needed for normal comma-separated CSV files.
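Separator auto-detection can be pictured as a frequency heuristic over a sample of the input. The sketch below is a deliberately simplified, hypothetical version (it inspects only the first line and ignores quoted fields); it is not SmarterCSV's actual detection algorithm.

```ruby
# Hypothetical, simplified separator guess: pick the candidate character
# that occurs most often in the first line. Ignores quoting entirely.
CANDIDATE_SEPARATORS = [",", "\t", ";", "|"].freeze

def guess_col_sep(text)
  first_line = text.each_line.first.to_s
  best = CANDIDATE_SEPARATORS.max_by { |sep| first_line.count(sep) }
  first_line.count(best).zero? ? "," : best # fall back to comma
end

guess_col_sep("name,age,email\nJohn,30,john@example.com\n") # => ","
guess_col_sep("name;age;email\nJohn;30;john@example.com\n") # => ";"
guess_col_sep("name\tage\nJohn\t30\n")                      # => "\t"
```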
Automatic numeric conversion
```ruby
spreadsheet.cell(2, 2) # => 30
spreadsheet.cell(2, 4) # => 1.5
```
Headers and keys
SmarterCSV downcases headers by default and returns symbol keys:
```ruby
require "smarter_csv"
require "stringio"

SmarterCSV.process(StringIO.new("Name,Email\nJohn,john@example.com\n")).first
# => { name: "John", email: "john@example.com" }
```
If you want string keys instead, SmarterCSV supports:
```ruby
SmarterCSV.process(
  StringIO.new("Name,Email\nJohn,john@example.com\n"),
  strings_as_keys: true
).first
# => { "name" => "John", "email" => "john@example.com" }
```
In roo-smarter_csv, those row hashes are used internally to populate Roo's spreadsheet grid. The public Roo methods still behave like spreadsheet methods.
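A rough approximation of the header handling described above (downcasing, plus symbol keys unless `strings_as_keys: true`) is sketched below. `normalize_header` is a hypothetical helper, and replacing inner whitespace with underscores is an assumption about typical normalization; consult the SmarterCSV documentation for its exact header transformations.

```ruby
# Hypothetical approximation of header normalization (not SmarterCSV's code).
def normalize_header(name, strings_as_keys: false)
  key = name.strip.downcase.gsub(/\s+/, "_") # "First Name" -> "first_name"
  strings_as_keys ? key : key.to_sym
end

headers = "Name,Email,First Name".split(",")
headers.map { |h| normalize_header(h) }
# => [:name, :email, :first_name]
headers.map { |h| normalize_header(h, strings_as_keys: true) }
# => ["name", "email", "first_name"]
```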
Examples
Basic Roo usage
```ruby
require "roo"
require "roo-smarter_csv"

csv = Roo::Spreadsheet.open("people.csv")
csv.cell(2, 1)  # => "John"
csv.cell(2, 2)  # => 30
csv.row(2)      # => ["John", 30, "john@example.com", 50000]
csv.first_row   # => 1
csv.last_row    # => 4
```
TSV example
```ruby
csv = Roo::Spreadsheet.open(
  "people.tsv",
  extension: :csv,
  csv_options: { col_sep: "\t" }
)
```
Explicit SmarterCSV options
```ruby
csv = Roo::Spreadsheet.open(
  "data.csv",
  smarter_csv: {
    col_sep: ";",
    quote_char: '"'
  }
)
```
Development
```bash
bundle install
bundle exec rspec
```
Reporting Bugs / Feature Requests
Please open an Issue on GitHub if you have feedback, new feature requests, or want to report a bug. Thank you!
For reporting issues, please:
- include a small sample CSV file
- open a pull request adding a test that demonstrates the issue
- mention your versions of SmarterCSV, Ruby, and Rails
Contributing
- Fork it
- Create your feature branch (`git checkout -b my-new-feature`)
- Commit your changes (`git commit -am 'Added some feature'`)
- Push to the branch (`git push origin my-new-feature`)
- Create new Pull Request
License
MIT