Project

vfcsv

0.0
No release in over 3 years
SIMD-accelerated CSV parser - drop-in replacement for Ruby's CSV library. Uses NEON on ARM64 and AVX2 on x86_64 for 2-6x faster parsing. Full API compatibility with CSV::Row, CSV::Table, converters, and all options.
 Dependencies

Development

~> 5.0
~> 13.0
~> 0.9

Runtime

~> 0.9
 Project Readme

VFCSV - Very Fast CSV

The only SIMD-accelerated CSV parser for Ruby with full stdlib API compatibility.

VFCSV is a drop-in replacement for Ruby's CSV library that delivers 2-6x faster parsing through SIMD acceleration (NEON on ARM64, AVX2 on x86_64), while maintaining 100% API compatibility with Ruby's CSV—including Row, Table, converters, and all standard options.

Why VFCSV?

Library Speed Drop-in? Row/Table Converters SIMD Dependencies
VFCSV 2-6x Pure Rust
zsv-ruby 5-6x Partial C (zsv)
OSV 8x Rust
FastCSV 1.5x C (Ragel)
FastestCSV 3x C
CSV (stdlib) 1x N/A None

VFCSV is the only library that combines SIMD acceleration with full API compatibility.

Installation

Add to your Gemfile:

gem 'vfcsv'

Or install directly:

gem install vfcsv

Requires a Rust toolchain for compilation. Works on Ruby 3.0+ (optimized for Ruby 4.0).

Quick Start

# Just replace your require
require 'vfcsv'  # instead of require 'csv'

# Use exactly like Ruby's CSV
data = VFCSV.parse("name,age\nAlice,30\nBob,25", headers: true)
data[0]["name"]  # => "Alice"
data["age"]      # => ["30", "25"]

# All the same methods work
VFCSV.read("data.csv")
VFCSV.foreach("data.csv") { |row| puts row }
VFCSV.generate { |csv| csv << [1, 2, 3] }
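Because the gem needs a Rust toolchain to build, some environments may not have it installed. The sketch below shows one way to prefer VFCSV and fall back to the standard library when it is missing. The constant name `CsvImpl` is a hypothetical choice for this example and is not part of either library.

```ruby
# Prefer VFCSV when the gem and its native extension can be loaded,
# otherwise fall back to Ruby's standard CSV library.
begin
  require 'vfcsv'
  CsvImpl = VFCSV   # hypothetical alias used only in this example
rescue LoadError
  require 'csv'
  CsvImpl = CSV
end

# The calling code stays the same whichever implementation was loaded.
table = CsvImpl.parse("name,age\nAlice,30\nBob,25", headers: true)
names = table["name"]        # column access
first_age = table[0]["age"]  # row access, then field by header
```

Routing all calls through a single alias keeps the choice of parser in one place, which makes it easy to compare both implementations on the same input.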

Performance

Benchmarks on Apple M1 (Ruby 4.0, no YJIT):

Data Type      CSV stdlib   VFCSV      Speedup
Simple CSV     40 MB/s      90 MB/s    2.2x
Quoted CSV     21 MB/s      120 MB/s   5.6x
With Headers   10.7 i/s     27.0 i/s   2.5x

SIMD excels at quote detection, which is why quoted CSV shows the largest speedup in the table above.

# Check your SIMD capabilities
VFCSV.simd_info
# => {neon: true, arch: "aarch64", backend: "vfcsv-simd"}
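Throughput depends heavily on the data and the machine, so it is worth measuring on your own input. The following is a minimal sketch, not the project's benchmark suite: `sample_csv` and `throughput_mb_s` are hypothetical helpers, the payload is synthetic, and a single timed run is only a rough indicator. VFCSV is measured only if the gem can be loaded.

```ruby
require 'csv'

# Build a synthetic CSV payload with quoted fields (made-up data).
def sample_csv(rows)
  header = "id,name,comment\n"
  body = (1..rows).map { |i| %(#{i},"User #{i}","said ""hi"", then left"\n) }.join
  header + body
end

# Time one full parse of `data` and return throughput in MB/s.
def throughput_mb_s(parser, data)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  parser.parse(data)
  seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  (data.bytesize / 1_000_000.0) / seconds
end

data = sample_csv(5_000)
parsers = { "CSV (stdlib)" => CSV }
begin
  require 'vfcsv'
  parsers["VFCSV"] = VFCSV
rescue LoadError
  # VFCSV is not installed, so only the stdlib baseline is measured.
end

results = parsers.transform_values { |parser| throughput_mb_s(parser, data) }
results.each { |name, mb_s| puts format("%-14s %8.1f MB/s", name, mb_s) }
```

For numbers you intend to rely on, repeat the measurement several times and use a larger payload so that timer resolution and warm-up effects matter less.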

Full API Compatibility

Parsing

# Basic parsing
VFCSV.parse("a,b,c\n1,2,3")
# => [["a", "b", "c"], ["1", "2", "3"]]

# With headers (returns Table with Row objects)
table = VFCSV.parse("name,age\nAlice,30", headers: true)
table.class        # => VFCSV::Table
table[0].class     # => VFCSV::Row
table[0]["name"]   # => "Alice"
table[0][0]        # => "Alice"
table["name"]      # => ["Alice"]

# Parse single line
VFCSV.parse_line("a,b,c")  # => ["a", "b", "c"]

# File operations
VFCSV.read("file.csv")
VFCSV.foreach("file.csv") { |row| process(row) }
VFCSV.table("file.csv")  # Shortcut for read with headers

Converters

# Built-in converters
VFCSV.parse("a,b\n1,2.5", headers: true, converters: :numeric)
# => a: 1 (Integer), b: 2.5 (Float)

# Available: :integer, :float, :numeric, :date, :date_time, :all

# Custom converters
upcase = ->(val) { val.upcase rescue val }
VFCSV.parse("a\nhello", headers: true, converters: [upcase])
# => a: "HELLO"
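In Ruby's standard CSV library, a converter that accepts two arguments also receives a field-info object, which makes per-column conversion possible. The sketch below is written against the stdlib `CSV` as the reference behavior. Whether VFCSV passes the second argument in the same way is an assumption based on its stated compatibility, so check it before relying on it. The `to_bool` lambda is a hypothetical example.

```ruby
require 'csv'

# Convert only the "active" column to booleans and leave other columns alone.
# The second block argument exposes the header of the field being converted.
to_bool = lambda do |value, info|
  next value unless info.header == "active"

  case value&.downcase
  when "true", "yes", "1" then true
  when "false", "no", "0" then false
  else value
  end
end

table = CSV.parse("name,active\nAlice,yes\nBob,0", headers: true, converters: [to_bool])
alice_active = table[0]["active"]
bob_active   = table[1]["active"]
```

Converters run in order and the stdlib stops the chain once a field is no longer a String, so a column-specific converter like this one should come before general ones such as `:integer`.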

Header Converters

# Downcase headers
VFCSV.parse("NAME,AGE\na,1", headers: true, header_converters: :downcase)
# headers: ["name", "age"]

# Symbol headers
VFCSV.parse("Name,Age\na,1", headers: true, header_converters: :symbol)
# headers: [:name, :age]
# Access: row[:name]

Row Class

Full CSV::Row compatibility:

row = table[0]
row.headers        # => ["name", "age"]
row.fields         # => ["Alice", "30"]
row["name"]        # => "Alice"
row[0]             # => "Alice"
row.to_h           # => {"name" => "Alice", "age" => "30"}
row.to_csv         # => "Alice,30\n"
row.header?("name") # => true
row.field?("Alice") # => true

# Mutation
row["city"] = "NYC"
row << ["country", "USA"]
row.delete("country")

Table Class

Full CSV::Table compatibility:

table.headers      # => ["name", "age"]
table.size         # => 2
table[0]           # => Row
table["name"]      # => ["Alice", "Bob"] (column)

# Access modes
table.by_col["name"]  # Column access
table.by_row[0]       # Row access

# Mutation
table << ["Carol", "35"]
table.delete(0)

# Output
table.to_csv       # Full CSV string with headers
table.to_a         # Array of arrays

Generation

# Generate CSV string
csv = VFCSV.generate do |out|
  out << ["name", "age"]
  out << ["Alice", 30]
end
# => "name,age\nAlice,30\n"

# Generate single line
VFCSV.generate_line([1, 2, 3])          # => "1,2,3\n"
VFCSV.generate_line([1, 2], col_sep: "|") # => "1|2\n"

# Force quotes
VFCSV.generate_line([1, 2], force_quotes: true)  # => "\"1\",\"2\"\n"

# Write to file
VFCSV.open("out.csv", "w") do |csv|
  csv << [1, 2, 3]
end

Options

All standard CSV options are supported:

VFCSV.parse(data,
  col_sep: ",",           # Column separator
  row_sep: :auto,         # Row separator (:auto, "\n", "\r\n")
  quote_char: '"',        # Quote character
  headers: false,         # First row as headers
  converters: nil,        # Value converters
  header_converters: nil, # Header converters
  skip_blanks: false,     # Skip empty rows
  skip_lines: nil,        # Regexp to skip lines
  force_quotes: false,    # Quote all fields on output
  liberal_parsing: false  # Lenient parsing
)
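Several of these options are typically combined when reading exported files. The example below uses the standard `CSV` library as the reference and made-up sample data. Since VFCSV states that it accepts the same options, replacing `CSV` with `VFCSV` is expected to behave the same way, but that has not been verified here.

```ruby
require 'csv'

# Semicolon-separated export with a leading comment line and a blank line.
data = "# exported by some tool\nid;name\n\n1;Alice\n2;Bob\n"

table = CSV.parse(
  data,
  col_sep: ";",        # fields are separated by semicolons
  skip_lines: /\A#/,   # ignore comment lines starting with '#'
  skip_blanks: true,   # ignore empty lines
  headers: true        # first remaining line holds the headers
)

records = table.map(&:to_h)
```

Each entry in `records` is a plain Hash keyed by header, which is often the most convenient shape for further processing.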

Architecture

VFCSV uses a two-stage SIMD-accelerated parsing approach inspired by simdjson:

  1. Stage 1: Structural Detection - SIMD instructions process 16 bytes at a time to identify commas, quotes, and newlines
  2. Stage 2: Field Extraction - Extract fields based on structural indices with optimized quote handling

The Rust core is wrapped with Magnus for zero-copy Ruby string handling.

┌───────────────────────────────────────────────┐
│                   Ruby API                    │
│   VFCSV.parse / Row / Table / Generator       │
├───────────────────────────────────────────────┤
│                  Magnus FFI                   │
├───────────────────────────────────────────────┤
│               Rust SIMD Parser                │
│  ┌─────────────┐    ┌───────────────────┐     │
│  │ NEON (ARM64)│    │ Portable Fallback │     │
│  └─────────────┘    └───────────────────┘     │
└───────────────────────────────────────────────┘
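To make the two stages concrete, here is a scalar sketch of the idea in plain Ruby. It is not VFCSV's implementation: the real parser is written in Rust and uses SIMD for stage 1, while this version checks one byte at a time. The method names are hypothetical, and only `\n` row endings are handled.

```ruby
# Stage 1: record the byte offset of every comma (0x2C), double quote (0x22),
# and newline (0x0A). A SIMD version would compare a whole block of bytes
# at once; this loop is only a stand-in for that step.
def structural_indices(data)
  indices = []
  data.each_byte.with_index do |byte, i|
    indices << i if byte == 0x2C || byte == 0x22 || byte == 0x0A
  end
  indices
end

# Remove surrounding quotes and collapse doubled quotes inside a field.
def unquote(raw)
  return raw unless raw.bytesize >= 2 && raw.start_with?('"') && raw.end_with?('"')
  raw[1..-2].gsub('""', '"')
end

# Stage 2: walk the structural offsets, toggling quote state, and cut a
# field at every comma or newline that lies outside quotes.
def extract_rows(data, indices)
  rows = []
  row = []
  field_start = 0
  in_quotes = false

  indices.each do |i|
    byte = data.getbyte(i)
    if byte == 0x22
      in_quotes = !in_quotes
    elsif !in_quotes
      row << unquote(data.byteslice(field_start, i - field_start))
      field_start = i + 1
      if byte == 0x0A
        rows << row
        row = []
      end
    end
  end

  # Flush a final row that is not terminated by a newline.
  if field_start < data.bytesize || !row.empty?
    row << unquote(data.byteslice(field_start, data.bytesize - field_start))
    rows << row
  end
  rows
end

data = %(name,note\nAlice,"hi, ""there"""\nBob,"line1\nline2"\n)
rows = extract_rows(data, structural_indices(data))
```

Separating the scan from the extraction is what makes the first stage a good fit for SIMD: it is a uniform comparison over every byte, while the branching logic for quotes runs only on the much smaller list of structural offsets.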

When to Use VFCSV

Use VFCSV when:

  • You need faster CSV parsing without changing your code
  • You're processing large CSV files
  • You need full CSV API compatibility (Row, Table, converters)
  • You want SIMD acceleration with zero C dependencies

Consider alternatives when:

  • You only need hash output (OSV might be faster)
  • You don't need Row/Table classes (zsv-ruby offers comparable speed)
  • You need streaming for files larger than memory

Running Tests

bundle exec rake test    # Run all tests (136 tests)
bundle exec rake bench   # Run benchmarks

Contributing

Bug reports and pull requests are welcome at https://github.com/khasinski/vfcsv.

License

MIT License. See LICENSE for details.

Acknowledgments