Zvec

Note

This gem is an experimental collaboration with Claude, exploring what it takes to wrap a C++ vector database for Ruby. It won't replace SQLite3 + sqlite-vec for lightweight use cases, and it certainly won't replace PostgreSQL + pgvector for production workloads. But for a narrow niche — embedding and searching a small number of short documents (2-3 paragraphs each) without standing up a database server — it's an interesting option to have in the toolbox.

Embedded vector search for Ruby

Key Features

Vector Indexes - HNSW, IVF, and Flat with cosine, L2, and inner-product metrics
Dense & Sparse Vectors - Dense FP16, FP32, FP64, INT4, INT8, and INT16; sparse FP16 and FP32
Scalar Fields - STRING, BOOL, INT32, INT64, UINT32, UINT64, FLOAT, DOUBLE, and array variants
Full CRUD - Insert, upsert, update, delete, and fetch by primary key
Filtered Queries - Combine vector similarity with scalar predicates
On-Disk Persistence - Flush and mmap support
Error Hierarchy - Structured exceptions mapping zvec status codes
Native Performance - C++ engine via Rice bindings
Full Documentation - Available on the project's documentation website

Ruby bindings for Alibaba's zvec C++ vector database library. Store, index, and query high-dimensional vectors alongside scalar metadata — all from Ruby. The native extension uses Rice (v4.11) for C++/Ruby interop and CMake for building. The zvec C++ source is included as a git submodule.

Caution

Without the Homebrew formula, gem install zvec compiles the entire zvec C++ dependency tree from source (~10 minutes). Pre-building with Homebrew reduces this to ~10 seconds. The source build may not work on all platforms.

Prerequisites

Ruby >= 3.2.0
CMake >= 3.26
A C++17 compiler (Clang or GCC)
ICU4C: brew install icu4c@78 (macOS)

Installation

Recommended: Pre-build with Homebrew (fast)

Pre-building the zvec C++ library avoids a lengthy source compilation during gem install:

brew tap madbomber/zvec https://github.com/MadBomber/zvec-ruby.git
brew install madbomber/zvec/zvec
gem install zvec          # ~10 seconds

Alternative: Build from source (slow)

Without Homebrew, the gem fetches and compiles the full C++ dependency tree (~10 minutes):

gem install zvec

Development (from source checkout)

git clone --recurse-submodules https://github.com/MadBomber/zvec-ruby.git
cd zvec-ruby
bundle install
cd ext && cmake --preset macos-release && cmake --build build/macos-release && cd ..

For debug builds, use the macos-debug preset instead.

Quick Start

require "zvec"
require "tmpdir"

# Define a schema with a primary key, metadata, and a vector field
pk        = Zvec::FieldSchema.create("pk", Zvec::DataType::STRING)
title     = Zvec::FieldSchema.create("title", Zvec::DataType::STRING)
embedding = Zvec::FieldSchema.create("embedding", Zvec::DataType::VECTOR_FP32,
              dimension:    4,
              index_params: Zvec::HnswIndexParams.new(Zvec::MetricType::COSINE))

schema = Zvec::CollectionSchema.create("demo", [pk, title, embedding])

Dir.mktmpdir("zvec") do |dir|
  col = Zvec::Collection.create_and_open(File.join(dir, "demo"), schema)

  # Build and insert a document
  doc = Zvec::Doc.new
  doc.pk = "doc1"
  doc.set_field("pk",        Zvec::DataType::STRING,      "doc1")
  doc.set_field("title",     Zvec::DataType::STRING,      "Hello Zvec")
  doc.set_field("embedding", Zvec::DataType::VECTOR_FP32, [0.1, 0.2, 0.3, 0.4])
  col.insert([doc])
  col.flush

  # Query by vector similarity
  results = col.query_vector("embedding", [0.1, 0.2, 0.3, 0.4], top_k: 1)
  results.each do |d|
    h = d.to_h(col.schema)
    puts "#{h['title']}  (score: #{h['score']})"
  end
end

Usage

Schema Definition

Every collection requires a primary key field (STRING) and at least one vector field. Add scalar fields to store metadata alongside vectors.

pk    = Zvec::FieldSchema.create("pk", Zvec::DataType::STRING)
name  = Zvec::FieldSchema.create("name", Zvec::DataType::STRING)
year  = Zvec::FieldSchema.create("year", Zvec::DataType::INT32)
vec   = Zvec::FieldSchema.create("embedding", Zvec::DataType::VECTOR_FP32,
          dimension:    384,
          index_params: Zvec::HnswIndexParams.new(Zvec::MetricType::COSINE))

schema = Zvec::CollectionSchema.create("my_collection", [pk, name, year, vec])

Data Types

Constant	Description
`STRING`	UTF-8 string
`BOOL`	Boolean
`INT32`, `INT64`	Signed integers
`UINT32`, `UINT64`	Unsigned integers
`FLOAT`, `DOUBLE`	Floating point
`VECTOR_FP32`	Dense float32 vector
`VECTOR_FP16`, `VECTOR_FP64`	Dense float16/float64 vectors
`VECTOR_INT4`, `VECTOR_INT8`, `VECTOR_INT16`	Dense integer vectors
`SPARSE_VECTOR_FP16`, `SPARSE_VECTOR_FP32`	Sparse vectors
`ARRAY_*`	Array variants of scalar types
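
When choosing between the dense vector types, it can help to estimate the raw payload size. The sketch below is plain Ruby and does not call the gem. The constant BYTES_PER_ELEMENT and the method raw_vector_bytes are hypothetical names, the widths are the standard ones for each numeric format, and the two-values-per-byte packing for INT4 is an assumption rather than something this README states. Index overhead and scalar fields are not counted.

```ruby
# Standard element widths in bytes. The INT4 figure assumes two values are
# packed per byte, which is an assumption about storage, not a documented fact.
BYTES_PER_ELEMENT = {
  VECTOR_FP16:  2,
  VECTOR_FP32:  4,
  VECTOR_FP64:  8,
  VECTOR_INT4:  0.5,
  VECTOR_INT8:  1,
  VECTOR_INT16: 2
}.freeze

# Hypothetical helper: raw bytes needed for `count` dense vectors of the
# given type and dimension, excluding any index or metadata overhead.
def raw_vector_bytes(type, dimension, count = 1)
  (BYTES_PER_ELEMENT.fetch(type) * dimension * count).ceil
end

puts raw_vector_bytes(:VECTOR_FP32, 384)          # one 384-dim float32 vector
puts raw_vector_bytes(:VECTOR_FP16, 384, 10_000)  # ten thousand float16 vectors
```

A 384-dimension VECTOR_FP32 embedding works out to 1536 bytes of raw data, so halving the element width halves that figure.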

Index Types

# HNSW (default, best for most use cases)
Zvec::HnswIndexParams.new(Zvec::MetricType::COSINE, m: 50, ef_construction: 500)

# Flat (exact search, no approximation)
Zvec::FlatIndexParams.new(Zvec::MetricType::L2)

# IVF (inverted file index, good for large datasets)
Zvec::IVFIndexParams.new(Zvec::MetricType::IP, n_list: 1024)

# Invert (scalar field index for filtered queries)
Zvec::InvertIndexParams.new
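
To make the difference between Flat and IVF concrete, here is a conceptual sketch in plain Ruby. It is not zvec's implementation and does not call the gem. All method names, the precomputed centroids, and the n_probe argument are hypothetical. In IVF indexes generally, a parameter such as n_list sets how many lists the vectors are partitioned into; this sketch uses two.

```ruby
# Conceptual sketch only: illustrates the idea, not zvec internals.

# Squared Euclidean distance between two equal-length arrays.
def sq_l2(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Flat search: compare the query against every stored vector (exact).
# `vectors` is a Hash of id => vector.
def flat_search(vectors, query, top_k)
  vectors.min_by(top_k) { |_id, v| sq_l2(v, query) }.map(&:first)
end

# IVF build: place each vector in the list of its nearest centroid.
def ivf_build(vectors, centroids)
  lists = Array.new(centroids.size) { {} }
  vectors.each do |id, v|
    nearest = centroids.each_index.min_by { |i| sq_l2(centroids[i], v) }
    lists[nearest][id] = v
  end
  lists
end

# IVF search: scan only the n_probe lists whose centroids are closest
# to the query, then rank the candidates found there.
def ivf_search(lists, centroids, query, top_k, n_probe: 1)
  probe = centroids.each_index.min_by(n_probe) { |i| sq_l2(centroids[i], query) }
  candidates = probe.flat_map { |i| lists[i].to_a }
  candidates.min_by(top_k) { |_id, v| sq_l2(v, query) }.map(&:first)
end

vectors   = { "a" => [0.0, 0.0], "b" => [0.1, 0.0],
              "c" => [5.0, 5.0], "d" => [5.1, 5.0] }
centroids = [[0.0, 0.0], [5.0, 5.0]]
lists     = ivf_build(vectors, centroids)

p flat_search(vectors, [0.2, 0.0], 2)
p ivf_search(lists, centroids, [0.2, 0.0], 2)
```

Flat always returns the exact nearest neighbors at the cost of scanning everything. IVF scans fewer vectors, so it can miss a neighbor that lives in a list it did not probe.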

Metric Types

Constant	Description
`COSINE`	Cosine distance (0 = identical)
`L2`	Euclidean distance
`IP`	Inner product
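
For reference, the three metrics can be written out in plain Ruby. These are the textbook definitions and the method names are illustrative. This README does not say exactly how zvec reports scores (for example, whether L2 is squared), so treat the sketch as a guide to the math rather than to the gem's output.

```ruby
# Inner product (dot product) of two equal-length vectors.
def inner_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Euclidean (L2) distance.
def l2_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Cosine distance: 1 minus cosine similarity, so 0 means the vectors
# point in the same direction. Zero-length vectors are not handled here.
def cosine_distance(a, b)
  norm_a = Math.sqrt(inner_product(a, a))
  norm_b = Math.sqrt(inner_product(b, b))
  1.0 - inner_product(a, b) / (norm_a * norm_b)
end

puts inner_product([1, 2, 3], [4, 5, 6])
puts l2_distance([0.0, 0.0], [3.0, 4.0])
puts cosine_distance([1.0, 0.0], [2.0, 0.0])
```

Cosine distance ignores vector length, which is why [1.0, 0.0] and [2.0, 0.0] are treated as identical, while L2 and inner product are both sensitive to magnitude.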

Collection Lifecycle

# Create a new collection on disk
col = Zvec::Collection.create_and_open("/path/to/collection", schema)

# Open an existing collection
col = Zvec::Collection.open("/path/to/collection")

# Block form — auto-flushes on exit
Zvec.open_collection("/path/to/collection") do |col|
  # work with col...
end

# Persist buffered writes to disk
col.flush

# Permanently delete from disk
col.destroy!

Collection Options

opts = Zvec::CollectionOptions.new
opts.read_only = true
opts.enable_mmap = true
opts.max_buffer_size = 4096

col = Zvec::Collection.open("/path/to/collection", options: opts)

Inserting Documents

doc = Zvec::Doc.new
doc.pk = "item1"
doc.set_field("pk",        Zvec::DataType::STRING,      "item1")
doc.set_field("title",     Zvec::DataType::STRING,      "Example")
doc.set_field("year",      Zvec::DataType::INT32,       2024)
doc.set_field("embedding", Zvec::DataType::VECTOR_FP32, [0.1, 0.2, 0.3, 0.4])

statuses = col.insert([doc])
statuses.each { |s| puts s }  # prints OK for each successful insert

col.flush

The upsert and update methods work the same way:

col.upsert([doc])   # insert or replace
col.update([doc])   # update existing
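
Setting the primary key and each field by hand gets repetitive when loading many documents. Below is a small sketch of a helper that wraps the pattern. fill_doc is a hypothetical name and is not part of the gem. It relies only on the two calls shown above, pk= and set_field(name, type, value), and it assumes the primary key field is named "pk" as in these examples. RecordingDoc is a stand-in so the sketch runs without the native extension.

```ruby
# Hypothetical helper, not part of the zvec gem. `fields` maps a field
# name to a [type, value] pair.
def fill_doc(doc, pk, pk_type, fields)
  doc.pk = pk
  doc.set_field("pk", pk_type, pk) # mirrors the examples above
  fields.each do |name, (type, value)|
    doc.set_field(name.to_s, type, value)
  end
  doc
end

# Stand-in for a document object; records what was set on it.
class RecordingDoc
  attr_accessor :pk
  attr_reader :fields

  def initialize
    @fields = {}
  end

  def set_field(name, type, value)
    @fields[name] = [type, value]
  end
end

doc = fill_doc(RecordingDoc.new, "item1", :STRING,
               { title: [:STRING, "Example"], year: [:INT32, 2024] })
p doc.fields
```

With the gem loaded, the same helper would presumably be called with Zvec::Doc.new and constants such as Zvec::DataType::STRING in place of the stand-in object and symbols.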

Querying Vectors

Use query_vector for a convenient interface:

results = col.query_vector("embedding", [0.1, 0.2, 0.3, 0.4], top_k: 5)

results.each do |doc|
  h = doc.to_h(col.schema)
  puts "#{h['pk']}: #{h['title']} (score: #{h['score']})"
end

For more control, build a VectorQuery directly:

vq = Zvec::VectorQuery.new
vq.topk = 10
vq.field_name = "embedding"
vq.filter = "year > 2020"
vq.include_vector = false
vq.query_params = Zvec::HnswQueryParams.new(ef: 500)
vq.output_fields = ["title", "year"]

schema_field = col.schema.get_field("embedding")
vq.set_vector(schema_field, [0.1, 0.2, 0.3, 0.4])

results = col.query(vq)
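
The effect of combining a filter with top-k can be sketched in plain Ruby. This is a model of the semantics only: it does not call the gem, and this README does not say whether zvec applies the filter before, during, or after the index search. Record, filtered_top_k, and the lambda standing in for the "year > 2020" filter string are all hypothetical.

```ruby
# Stand-in record type for this sketch.
Record = Struct.new(:pk, :year, :embedding)

# Cosine distance: 0 means the vectors point in the same direction.
def cosine_distance(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Keep only records that pass the scalar predicate, then return the
# top_k closest to the query vector, nearest first.
def filtered_top_k(records, query, top_k:, filter:)
  records.select(&filter)
         .min_by(top_k) { |r| cosine_distance(r.embedding, query) }
end

records = [
  Record.new("old",  1999, [1.0, 0.0]),
  Record.new("near", 2022, [0.9, 0.1]),
  Record.new("far",  2023, [0.0, 1.0])
]

hits = filtered_top_k(records, [1.0, 0.0],
                      top_k: 2, filter: ->(r) { r.year > 2020 })
p hits.map(&:pk) # => ["near", "far"]
```

The record "old" matches the query vector exactly but is excluded by the predicate, which is the behavior a filtered query is meant to give.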

Fetching by Primary Key

docs = col.fetch(["item1", "item2"])
docs.each do |pk, doc|
  h = doc.to_h(col.schema)
  puts "#{pk}: #{h['title']}"
end

Deleting Documents

statuses = col.delete(["item1", "item2"])
col.delete_by_filter("year < 2000")
col.flush

Error Handling

Zvec maps C++ status codes to a Ruby exception hierarchy under Zvec::Error:

begin
  col = Zvec::Collection.open("/nonexistent")
rescue Zvec::NotFoundError => e
  puts e.message
rescue Zvec::Error => e
  puts "Zvec error: #{e.message}"
end

Exception	Status Code
`Zvec::NotFoundError`	`NOT_FOUND`
`Zvec::AlreadyExistsError`	`ALREADY_EXISTS`
`Zvec::InvalidArgumentError`	`INVALID_ARGUMENT`
`Zvec::PermissionDeniedError`	`PERMISSION_DENIED`
`Zvec::FailedPreconditionError`	`FAILED_PRECONDITION`
`Zvec::ResourceExhaustedError`	`RESOURCE_EXHAUSTED`
`Zvec::UnavailableError`	`UNAVAILABLE`
`Zvec::InternalError`	`INTERNAL_ERROR`
`Zvec::NotSupportedError`	`NOT_SUPPORTED`
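
The mapping in the table follows a regular naming pattern, which the plain-Ruby sketch below reproduces. ZvecSketch is a stand-in module, not the gem's Zvec module, and raise_for is a hypothetical name. The gem's real mapping lives in its native binding and may be implemented quite differently.

```ruby
# Illustrative stand-in: ZvecSketch is NOT the gem's Zvec module.
module ZvecSketch
  class Error < StandardError; end

  # Status names taken from the table above.
  STATUS_NAMES = %w[
    NOT_FOUND ALREADY_EXISTS INVALID_ARGUMENT PERMISSION_DENIED
    FAILED_PRECONDITION RESOURCE_EXHAUSTED UNAVAILABLE INTERNAL_ERROR
    NOT_SUPPORTED
  ].freeze

  # Define one subclass per status: NOT_FOUND becomes NotFoundError,
  # INTERNAL_ERROR becomes InternalError.
  ERRORS = STATUS_NAMES.each_with_object({}) do |status, map|
    base = status.split("_").map(&:capitalize).join
    name = base.end_with?("Error") ? base : "#{base}Error"
    map[status] = const_set(name, Class.new(Error))
  end.freeze

  # Raise the matching exception; do nothing for "OK". Unknown status
  # names fall back to the base Error class.
  def self.raise_for(status, message)
    return if status == "OK"

    raise ERRORS.fetch(status, Error), message
  end
end

begin
  ZvecSketch.raise_for("NOT_FOUND", "no such collection")
rescue ZvecSketch::Error => e
  puts "#{e.class}: #{e.message}"
end
```

Because every class inherits from one base, a single rescue of the base class catches all of them, while a rescue of a specific subclass placed first handles just that case.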

Examples

Working examples are in the examples/ directory:

01_basic_usage.rb — Schema definition, insert, query, fetch, delete, and block-form collection management with hand-crafted vectors.
02_semantic_search.rb — Semantic search over markdown documents using real 384-dim text embeddings from the informers gem (all-MiniLM-L6-v2 model).

Run them with:

bundle exec ruby examples/01_basic_usage.rb
bundle exec ruby examples/02_semantic_search.rb

Development

After checking out the repo, initialize the zvec submodule and install dependencies:

git submodule update --init --recursive
bundle install

Build the native extension:

cd ext && cmake --preset macos-debug && cmake --build build/macos-debug && cd ..

Run the tests:

bundle exec rake test

Launch an interactive console:

bin/console

License

The gem is available as open source under the terms of the MIT License.
