Low commit activity in last 3 years
There's a lot of open issues
A long-lived project that still receives updates
A gem to import CSV files' content into a PostgreSQL or MySQL database. It is respectively based on PostgreSQL COPY and MySQL LOAD DATA INFILE which are designed to be as fast as possible.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

Runtime

 Project Readme

Gem Version Tests status Codacy Badge

CSV Fast Importer

A gem to import CSV files' content into a PostgreSQL or MySQL database. It is respectively based on PostgreSQL COPY and MySQL LOAD DATA INFILE which are designed to be as fast as possible.

Why?

CSV importation is a common task which can be done by more than 6 different gems, but none of them is able to import 1 million of lines in a few seconds (see benchmark below), hence the creation of this gem.

Here is an indicative benchmark to compare available solutions. It represents the duration (ms) to import a 10 000 lines csv file into a local PostgreSQL instance on a laptop running OSX (lower is better):

Benchmark

Like all benchmarks, some tuning can produce different results, yet this chart gives a big picture. See benchmark details.

Requirements

  • Rails (ActiveRecord in fact)
  • PostgreSQL or MySQL

Limitations

  • Usual ActiveRecord process (validations, callbacks, computed fields like created_at...) is bypassed. This is the price for performance
  • Custom enclosing field (ex: ") is not supported yet
  • Custom line separator (ex: \r\n for windows file) is not supported yet
  • MySQL: encoding is not supported yet
  • MySQL: transaction is not supported yet
  • MySQL: row_index is not supported yet
  • MySQL: database must have access to file to import

Note about custom line separator: it might work by opening the file with the universal_newline argument (e.g. file = File.new(path, universal_newline: true)). Unfortunately, we weren't able to reproduce and test it so we don't support it "officialy". You can find more information in this ticket (in French).

Installation

Add the dependency to your Gemfile:

gem 'csv_fast_importer'

Run bundle install.

You can install the gem by yourself too:

$ gem install csv_fast_importer

For MySQL ⚠️ enable local_infile for both client and server. In Rails application, juste add local_infile: true to your database config file databse.yml to configure the database client. See Security Issues with LOAD DATA LOCAL for more details.

Usage

Actually, CSV Fast Importer needs active_record to work. Setup your database configuration as in a usual Rails project. Then, use the CsvFastImporter class:

require 'csv_fast_importer'

file = File.new '/path/to/knights.csv'
imported_lines_count = CsvFastImporter.import(file)

puts imported_lines_count

Under the hood, CSV Fast Importer deletes data from the knights table and imports those from knights.csv by mapping columns' names to table's fields. Note: mapping is case insensitive so database fields' names must be lowercase. For instance, a FIRSTNAME CSV column will be mapped to the firstname field.

Options

Option key  Purpose  Default value
encoding File encoding. PostgreSQL only (see FAQ for more details) 'UTF-8'
col_sep Column separator in file ';'
destination Destination table given base filename (without extension)
mapping Column mapping {}
row_index_column Column name where inserting file row index (not used when nil). PostgreSQL only nil
transaction Execute DELETE and INSERT in same transaction. PostgreSQL only :enabled
deletion Row deletion method (:delete for SQL DELETE, :truncate for SQL TRUNCATE or :none for no deletion before import) :delete

If your CSV file is not encoded with same table than your database, you can specify encoding at the file opening (see FAQ for more details):

file = File.new '/path/to/knights.csv', encoding: 'ISO-8859-1'

You can specify a different separator column with the col_sep option (; by default):

CsvFastImporter.import file, col_sep: '|'

By default, CSV Fast Importer computes the database table's name by taking the basename of the imported file. For instance, considering the imported file /path/to/knights.csv, the table's name will be knights. To bypass this default behaviour, specify the destination option:

file = File.new '/path/to/clients.csv'
CsvFastImporter.import file, destination: 'knights'

Finally, you can precise a custom mapping between CSV file's columns and database fields with the mapping option.

Considering the following knights.csv file:

NAME;KNIGHT_EMAIL
Perceval;perceval@logre.cel
Lancelot;lancelot@logre.cel

To map the KNIGHT_EMAIL column to the email database field:

CsvFastImporter.import file, mapping: { knight_email: :email }

Need help?

See FAQ.

How to contribute?

You can fork and submit new pull request (with tests and explanations). First of all, you need to initialize your environment :

$ brew install postgresql # in macOS
$ apt-get install libpq-dev # in Linux
$ bundle install

Then, start your PostgreSQL database (ex: Postgres.app for the Mac) and setup database environment:

$ bundle exec rake test:db:create

This will connect to localhost PostgreSQL database without user (see config/database.postgres.yml) and create a new database dedicated to tests.

Warning: database instance have to allow database creation with UTF-8 encoding.

Finally, you can run all tests with RSpec like this:

$ bundle exec rspec

By default, PostgreSQL is used. You can set another database with environment variables like this for MySQL:

$ DB_TYPE=mysql DB_ROOT_PASSWORD=password DB_USERNAME=username bundle exec rake test:db:create
$ DB_TYPE=mysql DB_USERNAME=username bundle exec rspec

This will connect to mysql with root user (with password as password) and create database for user username. Use DB_TYPE=mysql DB_USERNAME= (with empty username) for anonymous account.

Warning: Mysql tests require your local database permits LOCAL works. Check your Mysql instance with following command: SHOW GLOBAL VARIABLES LIKE 'local_infile' (should be ON).

Versioning

master is the development branch and releases are published as tags.

We follow the Semantic Versioning 2.0.0 for our gem releases.

In few words:

Given a version number MAJOR.MINOR.PATCH, increment the:

  1. MAJOR version when you make incompatible API changes,
  2. MINOR version when you add functionality in a backwards-compatible manner, and
  3. PATCH version when you make backwards-compatible bug fixes.

Backlog (unordered)

  • Support any column and table case
  • Support custom enclosing field (ex: ")
  • Support custom line serparator (ex: \r\n for windows file)
  • Support custom type convertion
  • MySQL: support encoding parameter. See https://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html
  • MySQL: support transaction parameter
  • MySQL: support row_index_column parameter
  • MySQL: run multiple SQL queries in single statement
  • Refactor tests (with should-> must / should -> expect / subject...)
  • Reduce technical debt on db connection (test & benchmark)
  • SQLite support
  • Add link to activerecord-copy
  • Ease local tests on multiple databases with testcontainers
  • Accept csv header which contains column separator

How to release new version?

Setup rubygems.org account:

curl -u {your_gem_account_name} https://rubygems.org/api/v1/api_key.yaml > ~/.gem/credentials
chmod 0600 ~/.gem/credentials

Make sure you are in master branch and run:

bundle exec rake "release:make[major|minor|patch|x.y.z]"

Example: bundle exec rake "release:make[minor]"

Then, follow instructions.