Project

zsv

0.0
No release in over 3 years
A drop-in replacement for Ruby's CSV stdlib that uses zsv (SIMD-accelerated C library) for 5-6x faster CSV parsing
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Development

~> 13.0
~> 3.0
 Project Readme

ZSV - SIMD-Accelerated CSV Parser for Ruby ⚑

A drop-in replacement for Ruby's CSV stdlib that uses the zsv C library for 5-6x performance improvements via SIMD optimizations.

πŸ€– Built with Claude Code

πŸ“š Documentation

✨ Features

  • Blazing Fast: 5-6x faster than Ruby's CSV stdlib thanks to SIMD optimizations
  • Memory Efficient: Streaming parser that doesn't load entire files into memory
  • API Compatible: Familiar interface matching Ruby's CSV class
  • Native Extension: Direct C integration for minimal overhead
  • Ruby 3.3+: Modern Ruby support with proper encoding handling

πŸ“¦ Installation

Add to your Gemfile:

gem 'zsv'

Or install directly:

gem install zsv

The gem will automatically download and compile zsv 1.3.0 during installation.

πŸš€ Usage

Basic Parsing

require 'zsv'

# Parse entire file
rows = ZSV.read("data.csv")
# => [["a", "b", "c"], ["1", "2", "3"]]

# Stream rows (memory efficient)
ZSV.foreach("large_file.csv") do |row|
  puts row.inspect
end

# Parse string
rows = ZSV.parse("a,b,c\n1,2,3\n")

Headers Mode

# Use first row as headers
ZSV.foreach("data.csv", headers: true) do |row|
  puts row["name"]  # Hash access
end

# Provide custom headers
ZSV.foreach("data.csv", headers: ["id", "name", "email"]) do |row|
  puts row["name"]
end

Parser Instance

# Create parser
parser = ZSV.open("data.csv", headers: true)

# Read rows one at a time
row = parser.shift
row = parser.shift

# Iterate all rows
parser.each do |row|
  puts row
end

# Rewind to beginning
parser.rewind

# Clean up
parser.close

# Or use block form (auto-closes)
ZSV.open("data.csv") do |parser|
  parser.each { |row| puts row }
end

Enumerable Methods

The parser includes Enumerable, so you can use map, select, find, etc.:

# Transform rows
names = ZSV.open("users.csv", headers: true) do |parser|
  parser.map { |row| row["name"].upcase }
end

# Filter rows
adults = ZSV.open("users.csv", headers: true) do |parser|
  parser.select { |row| row["age"].to_i >= 18 }
end

# Find first match
admin = ZSV.open("users.csv", headers: true) do |parser|
  parser.find { |row| row["role"] == "admin" }
end

Options

All parsing methods accept these options:

Option Type Default Description
headers Boolean/Array false Use first row as headers or provide custom headers
col_sep String "," Column delimiter (single character)
quote_char String "\"" Quote character (single character)
skip_lines Integer 0 Number of lines to skip at start
encoding Encoding UTF-8 Source encoding
liberal_parsing Boolean false Handle malformed CSV gracefully
buffer_size Integer 262144 Buffer size in bytes (256KB default)
# Tab-separated values
ZSV.foreach("data.tsv", col_sep: "\t") { |row| puts row }

# Pipe-separated values
ZSV.parse("a|b|c\n1|2|3", col_sep: "|")

# Skip header comment lines
ZSV.foreach("data.csv", skip_lines: 2) { |row| puts row }

⚑ Performance

Benchmarks comparing ZSV vs Ruby CSV stdlib (Ruby 3.4.7):

=== Small file (1K rows, 5 cols) ===
CSV (stdlib):   163.4 i/s
ZSV:          1,013.7 i/s - 6.20x faster

=== Medium file (10K rows, 10 cols) ===
CSV (stdlib):    10.3 i/s
ZSV:             54.5 i/s - 5.27x faster

=== Large file (100K rows, 10 cols) ===
CSV (stdlib):     1.1 i/s
ZSV:              5.3 i/s - 5.00x faster

=== With headers (10K rows) ===
CSV (stdlib):     7.8 i/s
ZSV:             33.8 i/s - 4.33x faster

Memory Usage

ZSV uses significantly less memory than Ruby's CSV stdlib:

=== Memory Usage (100K rows) ===
CSV stdlib: 56.8 MB
ZSV:         9.9 MB - 82.6% less memory

=== String Allocations (10K rows) ===
CSV stdlib: 116,144 strings
ZSV:         50,005 strings - 56.9% fewer allocations

ZSV achieves ~6x lower memory usage through frozen strings and efficient C-level memory management.

Run benchmarks yourself:

bundle exec rake bench
bundle exec ruby benchmark/memory_bench.rb

API Reference

Module Methods

ZSV.foreach(path, **options) { |row| }

Stream rows from a CSV file. Returns an Enumerator if no block given.

ZSV.parse(string, **options) -> Array

Parse CSV string and return all rows as an array.

ZSV.read(path, **options) -> Array

Read entire CSV file into an array.

ZSV.open(path, mode="r", **options) -> Parser

Open a CSV file and return a Parser instance. If a block is given, the parser is automatically closed after the block completes.

ZSV.new(io, **options) -> Parser

Create a Parser from any IO-like object.

Parser Instance Methods

#shift -> Array|Hash|nil

Read and return the next row. Returns nil at EOF.

#each { |row| } -> self

Iterate over all rows. Returns Enumerator without block.

#rewind -> nil

Reset parser to the beginning (file-based parsers only).

#close -> nil

Close parser and release resources.

#headers -> Array|nil

Return headers if header mode is enabled.

#closed? -> Boolean

Check if parser is closed.

#read -> Array

Read all remaining rows into an array.

Exception Classes

  • ZSV::Error - Base exception class
  • ZSV::MalformedCSVError - Raised on CSV parsing errors
  • ZSV::InvalidEncodingError - Raised on encoding issues

Architecture

The gem follows SOLID principles with clear separation of concerns:

ext/zsv/
β”œβ”€β”€ zsv_ext.c     # Main extension entry point, Ruby API
β”œβ”€β”€ parser.c/h    # Parser state management and zsv wrapper
β”œβ”€β”€ row.c/h       # Row building and conversion (arrays/hashes)
β”œβ”€β”€ options.c/h   # Option parsing and validation
└── common.h      # Shared types and macros

Design Principles

  1. Single Responsibility: Each C module handles one concern
  2. Streaming First: Never load entire files into memory
  3. Zero-Copy Where Possible: Minimize data copying
  4. Proper Resource Management: RAII-style cleanup with Ruby GC

πŸ› οΈ Development

# Clone and setup
git clone https://github.com/sebyx07/zsv-ruby.git
cd zsv-ruby
bundle install

# Compile extension
bundle exec rake compile

# Run tests
bundle exec rake spec

# Run benchmarks
bundle exec rake bench

# Clean build artifacts
bundle exec rake clean

Running Tests

bundle exec rspec

The test suite includes:

  • Basic parsing tests
  • Header mode tests
  • Custom delimiter tests
  • Error handling tests
  • Memory leak detection
  • API compatibility tests

Compatibility

  • Ruby: 3.3+ required
  • Platforms: Linux, macOS (ARM and x86)
  • ZSV: Compiles against zsv 1.3.0

🀝 Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Write tests for your changes
  4. Ensure tests pass (bundle exec rake spec)
  5. Commit your changes (git commit -am 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

License

MIT License - see LICENSE file for details.

πŸ™ Credits

  • Built on zsv by liquidaty
  • Inspired by Ruby's CSV stdlib
  • SIMD optimizations courtesy of zsv's excellent engineering
  • Developed with Claude Code

πŸ—ΊοΈ Roadmap

Phase 1: Core Parser (Current)

  • Basic parsing (foreach, parse, read)
  • Header mode
  • Custom delimiters
  • File and string input

Phase 2: CSV Stdlib Compatibility

  • Type converters (:numeric, :date, :date_time)
  • Header converters (:downcase, :symbol)
  • unconverted_fields option

πŸ’¬ Support