Zvec
Note
This gem is an experimental collaboration with Claude, exploring what it takes to wrap a C++ vector database for Ruby. It won't replace SQLite3 + sqlite-vec for lightweight use cases, and it certainly won't replace PostgreSQL + pgvector for production workloads. But for a narrow niche — embedding and searching a small number of short documents (2-3 paragraphs each) without standing up a database server — it's an interesting option to have in the toolbox.
Embedded vector search for Ruby
Ruby bindings for Alibaba's zvec C++ vector database library. Store, index, and query high-dimensional vectors alongside scalar metadata — all from Ruby. The native extension uses Rice (v4.11) for C++/Ruby interop and CMake for building. The zvec C++ source is included as a git submodule.
Caution
Without the Homebrew formula, gem install zvec compiles the entire zvec C++ dependency tree from source (~10 minutes). Pre-building with Homebrew reduces this to ~10 seconds. The source build may not work on all platforms.
Prerequisites
- Ruby >= 3.2.0
- CMake >= 3.26
- A C++17 compiler (Clang or GCC)
- ICU4C: brew install icu4c@78 (macOS)
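One way to confirm the Ruby requirement before attempting the native build is to compare version strings with Gem::Version, which ships with RubyGems. The helper name meets_minimum? is made up for this sketch and is not part of zvec. The CMake and compiler checks are not covered.

```ruby
# Hypothetical helper (not part of zvec): compare dotted version strings
# numerically rather than lexically, so "3.10.0" counts as newer than "3.2.0".
def meets_minimum?(current, minimum)
  Gem::Version.new(current) >= Gem::Version.new(minimum)
end

MINIMUM_RUBY = "3.2.0" # taken from the prerequisites list

if meets_minimum?(RUBY_VERSION, MINIMUM_RUBY)
  puts "Ruby #{RUBY_VERSION} satisfies >= #{MINIMUM_RUBY}"
else
  warn "Ruby #{RUBY_VERSION} is older than #{MINIMUM_RUBY}"
end
```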
Installation
Recommended: Pre-build with Homebrew (fast)
Pre-building the zvec C++ library avoids a lengthy source compilation during gem install:
brew tap madbomber/zvec https://github.com/MadBomber/zvec-ruby.git
brew install madbomber/zvec/zvec
gem install zvec # ~10 seconds
Alternative: Build from source (slow)
Without Homebrew, the gem fetches and compiles the full C++ dependency tree (~10 minutes):
gem install zvec
Development (from source checkout)
git clone --recurse-submodules https://github.com/MadBomber/zvec-ruby.git
cd zvec-ruby
bundle install
cd ext && cmake --preset macos-release && cmake --build build/macos-release && cd ..
For debug builds, use the macos-debug preset instead.
Quick Start
require "zvec"
require "tmpdir"
# Define a schema with a primary key, metadata, and a vector field
pk = Zvec::FieldSchema.create("pk", Zvec::DataType::STRING)
title = Zvec::FieldSchema.create("title", Zvec::DataType::STRING)
embedding = Zvec::FieldSchema.create("embedding", Zvec::DataType::VECTOR_FP32,
  dimension: 4,
  index_params: Zvec::HnswIndexParams.new(Zvec::MetricType::COSINE))
schema = Zvec::CollectionSchema.create("demo", [pk, title, embedding])
Dir.mktmpdir("zvec") do |dir|
  col = Zvec::Collection.create_and_open(File.join(dir, "demo"), schema)
  # Build and insert a document
  doc = Zvec::Doc.new
  doc.pk = "doc1"
  doc.set_field("pk", Zvec::DataType::STRING, "doc1")
  doc.set_field("title", Zvec::DataType::STRING, "Hello Zvec")
  doc.set_field("embedding", Zvec::DataType::VECTOR_FP32, [0.1, 0.2, 0.3, 0.4])
  col.insert([doc])
  col.flush
  # Query by vector similarity
  results = col.query_vector("embedding", [0.1, 0.2, 0.3, 0.4], top_k: 1)
  results.each do |d|
    h = d.to_h(col.schema)
    puts "#{h['title']} (score: #{h['score']})"
  end
end
Usage
Schema Definition
Every collection requires a primary key field (STRING) and at least one vector field. Add scalar fields to store metadata alongside vectors.
pk = Zvec::FieldSchema.create("pk", Zvec::DataType::STRING)
name = Zvec::FieldSchema.create("name", Zvec::DataType::STRING)
year = Zvec::FieldSchema.create("year", Zvec::DataType::INT32)
vec = Zvec::FieldSchema.create("embedding", Zvec::DataType::VECTOR_FP32,
  dimension: 384,
  index_params: Zvec::HnswIndexParams.new(Zvec::MetricType::COSINE))
schema = Zvec::CollectionSchema.create("my_collection", [pk, name, year, vec])
Data Types
| Constant | Description |
|---|---|
| STRING | UTF-8 string |
| BOOL | Boolean |
| INT32, INT64 | Signed integers |
| UINT32, UINT64 | Unsigned integers |
| FLOAT, DOUBLE | Floating point |
| VECTOR_FP32 | Dense float32 vector |
| VECTOR_FP16, VECTOR_FP64 | Dense float16/float64 vectors |
| VECTOR_INT4, VECTOR_INT8, VECTOR_INT16 | Dense integer vectors |
| SPARSE_VECTOR_FP16, SPARSE_VECTOR_FP32 | Sparse vectors |
| ARRAY_* | Array variants of scalar types |
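Ruby's Float is a 64-bit double, so values written to a VECTOR_FP32 field cannot all be represented exactly once they are narrowed to 32 bits. The sketch below uses Array#pack from the standard library to simulate that narrowing. It does not call zvec, and how the binding itself converts values is an assumption. The practical takeaway is to compare vectors read back from a collection with a tolerance rather than with ==.

```ruby
# Simulate storing a Ruby Float (64-bit) as a 32-bit float and reading it back.
# "e" is the pack directive for a little-endian single-precision float.
def to_float32(value)
  [value].pack("e").unpack1("e")
end

# Tolerance-based comparison for vectors that went through float32 storage.
def vectors_close?(a, b, tolerance: 1e-6)
  a.length == b.length && a.zip(b).all? { |x, y| (x - y).abs <= tolerance }
end

original  = [0.1, 0.2, 0.3, 0.4]
roundtrip = original.map { |v| to_float32(v) }

puts original == roundtrip                # => false (0.1 is not exact in float32)
puts vectors_close?(original, roundtrip)  # => true
```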
Index Types
# HNSW (default, best for most use cases)
Zvec::HnswIndexParams.new(Zvec::MetricType::COSINE, m: 50, ef_construction: 500)
# Flat (exact search, no approximation)
Zvec::FlatIndexParams.new(Zvec::MetricType::L2)
# IVF (inverted file index, good for large datasets)
Zvec::IVFIndexParams.new(Zvec::MetricType::IP, n_list: 1024)
# Invert (scalar field index for filtered queries)
Zvec::InvertIndexParams.new
Metric Types
| Constant | Description |
|---|---|
| COSINE | Cosine distance (0 = identical) |
| L2 | Euclidean distance |
| IP | Inner product |
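The three metrics can be written out in plain Ruby to make the table concrete. This is a minimal sketch using the textbook definitions, with cosine distance taken as 1 minus cosine similarity so that 0 means identical direction, matching the table. The function names are illustrative and not zvec API. Whether zvec reports L2 as a squared or unsquared distance is not stated here, so the unsquared form is an assumption.

```ruby
# Textbook definitions; function names are illustrative, not zvec API.

def inner_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def l2_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def cosine_distance(a, b)
  norm_a = Math.sqrt(inner_product(a, a))
  norm_b = Math.sqrt(inner_product(b, b))
  1.0 - inner_product(a, b) / (norm_a * norm_b)
end

query      = [1.0, 0.0]
same_dir   = [2.0, 0.0] # same direction as query, twice the length
orthogonal = [0.0, 3.0] # at a right angle to query

puts cosine_distance(query, same_dir)   # => 0.0
puts cosine_distance(query, orthogonal) # => 1.0
puts l2_distance(query, same_dir)       # => 1.0
puts inner_product(query, same_dir)     # => 2.0
```

Cosine distance ignores vector length, while L2 and inner product do not, which is why same_dir scores 0.0 under cosine but not under L2.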
Collection Lifecycle
# Create a new collection on disk
col = Zvec::Collection.create_and_open("/path/to/collection", schema)
# Open an existing collection
col = Zvec::Collection.open("/path/to/collection")
# Block form — auto-flushes on exit
Zvec.open_collection("/path/to/collection") do |col|
  # work with col...
end
# Persist buffered writes to disk
col.flush
# Permanently delete from disk
col.destroy!
Collection Options
opts = Zvec::CollectionOptions.new
opts.read_only = true
opts.enable_mmap = true
opts.max_buffer_size = 4096
col = Zvec::Collection.open("/path/to/collection", options: opts)
Inserting Documents
doc = Zvec::Doc.new
doc.pk = "item1"
doc.set_field("pk", Zvec::DataType::STRING, "item1")
doc.set_field("title", Zvec::DataType::STRING, "Example")
doc.set_field("year", Zvec::DataType::INT32, 2024)
doc.set_field("embedding", Zvec::DataType::VECTOR_FP32, [0.1, 0.2, 0.3, 0.4])
statuses = col.insert([doc])
statuses.each { |s| puts s } # => "OK"
col.flush
The upsert and update methods work the same way:
col.upsert([doc]) # insert or replace
col.update([doc]) # update existing
Querying Vectors
Use query_vector for a convenient interface:
results = col.query_vector("embedding", [0.1, 0.2, 0.3, 0.4], top_k: 5)
results.each do |doc|
  h = doc.to_h(col.schema)
  puts "#{h['pk']}: #{h['title']} (score: #{h['score']})"
end
For more control, build a VectorQuery directly:
vq = Zvec::VectorQuery.new
vq.topk = 10
vq.field_name = "embedding"
vq.filter = "year > 2020"
vq.include_vector = false
vq.query_params = Zvec::HnswQueryParams.new(ef: 500)
vq.output_fields = ["title", "year"]
schema_field = col.schema.get_field("embedding")
vq.set_vector(schema_field, [0.1, 0.2, 0.3, 0.4])
results = col.query(vq)
Fetching by Primary Key
docs = col.fetch(["item1", "item2"])
docs.each do |pk, doc|
  h = doc.to_h(col.schema)
  puts "#{pk}: #{h['title']}"
end
Deleting Documents
statuses = col.delete(["item1", "item2"])
col.delete_by_filter("year < 2000")
col.flush
Error Handling
Zvec maps C++ status codes to a Ruby exception hierarchy under Zvec::Error:
begin
  col = Zvec::Collection.open("/nonexistent")
rescue Zvec::NotFoundError => e
  puts e.message
rescue Zvec::Error => e
  puts "Zvec error: #{e.message}"
end
| Exception | Status Code |
|---|---|
| Zvec::NotFoundError | NOT_FOUND |
| Zvec::AlreadyExistsError | ALREADY_EXISTS |
| Zvec::InvalidArgumentError | INVALID_ARGUMENT |
| Zvec::PermissionDeniedError | PERMISSION_DENIED |
| Zvec::FailedPreconditionError | FAILED_PRECONDITION |
| Zvec::ResourceExhaustedError | RESOURCE_EXHAUSTED |
| Zvec::UnavailableError | UNAVAILABLE |
| Zvec::InternalError | INTERNAL_ERROR |
| Zvec::NotSupportedError | NOT_SUPPORTED |
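Because the specific exceptions sit under Zvec::Error, the order of rescue clauses matters: Ruby tries them top to bottom, so the base class should come last or it will catch everything first. The sketch below shows this with a stand-in hierarchy so that it runs without the gem. The module DemoErrors and the method classify are invented for illustration, and only the parent/child relationship is taken from the description above.

```ruby
# Stand-in hierarchy for illustration only; the real classes live under Zvec.
module DemoErrors
  class Error < StandardError; end
  class NotFoundError < Error; end
  class InvalidArgumentError < Error; end
end

# Hypothetical helper: runs the block and reports which rescue clause matched.
def classify
  yield
  :ok
rescue DemoErrors::NotFoundError
  :not_found        # specific subclass is listed first
rescue DemoErrors::Error
  :other_zvec_error # base class last, catches the remaining subclasses
end

puts classify { raise DemoErrors::NotFoundError, "no such collection" }  # => not_found
puts classify { raise DemoErrors::InvalidArgumentError, "bad dimension" } # => other_zvec_error
puts classify { 42 }                                                      # => ok
```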
Examples
Working examples are in the examples/ directory:
- 01_basic_usage.rb: Schema definition, insert, query, fetch, delete, and block-form collection management with hand-crafted vectors.
- 02_semantic_search.rb: Semantic search over markdown documents using real 384-dim text embeddings from the informers gem (all-MiniLM-L6-v2 model).
Run them with:
bundle exec ruby examples/01_basic_usage.rb
bundle exec ruby examples/02_semantic_search.rb
Development
After checking out the repo, initialize the zvec submodule and install dependencies:
git submodule update --init --recursive
bundle install
Build the native extension:
cd ext && cmake --preset macos-debug && cmake --build build/macos-debug && cd ..
Run the tests:
bundle exec rake test
Launch an interactive console:
bin/console
License
The gem is available as open source under the terms of the MIT License.
