vecsearch

The project is in a healthy, maintained state
All-in-one simple vector search class for Ruby.
Dependencies

Runtime

  • >= 0
  • >= 0
Project Readme

Vecsearch

Vecsearch is an all-in-one semantic search library for Ruby. It computes embeddings with a 4-bit (Q4_1) quantization of gte-tiny, run through bert.cpp (a GGML implementation of BERT, called via FFI), and searches them with an in-process FAISS index.
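
Under the hood the flow is straightforward: each document is turned into a fixed-size embedding and added to a flat FAISS index, and a query is embedded the same way and matched against it. The sketch below illustrates that flow using the faiss gem directly; the embed_with_bert_cpp helper is a hypothetical stand-in for the bert.cpp FFI call (and assumes gte-tiny's 384-dimensional output), not Vecsearch's actual internals.

require 'faiss'

# Hypothetical stand-in for the bert.cpp FFI embedding call.
# Real embeddings come from the quantized gte-tiny checkpoint.
def embed_with_bert_cpp(text)
  Array.new(384) { rand } # placeholder 384-dimensional vector
end

docs  = ["hello", "behold, a non-sequitur", "how's it goin'"]
index = Faiss::IndexFlatL2.new(384)

# Index phase: embed every document and add the vectors to the flat index.
index.add(docs.map { |d| embed_with_bert_cpp(d) })

# Query phase: embed the query and take the 2 nearest neighbours.
_distances, ids = index.search([embed_with_bert_cpp("hey there")], 2)
puts ids.to_a.first.map { |i| docs[i] }.inspect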

Vecsearch bundles pre-built dynamic libraries for libbert and libggml, as well as a quantized model checkpoint for gte-tiny (total size: 14MB).

Currently only ARM64 macOS is supported, purely because I haven't bothered to build other dylibs yet. There is nothing difficult about this.

Usage

gem 'vecsearch'

Vecsearch.new('hello', 'goodbye').nearest('howdy') #=> 'hello'

require 'vecsearch'

vs = Vecsearch.new
vs << "hello"
vs << "behold, a non-sequitur"
vs << "how's it goin'"

puts(vs.query("hey there", 2).inspect)
# ["hello", "how's it goin'"]

Bugs

Haha. Yes.

Performance

All of these are haphazardly measured on my 2021 M1 MacBook Pro; a rough way to reproduce them is sketched after the list.

  • Embedding a 1-token document: 1.2ms
  • Embedding a 512-token document: 72ms
  • Adding a document to the database: negligible
  • Querying an empty database: negligible
  • Querying a database with 1000 entries: negligible (plus time to embed query)
  • Querying a database with 10000 entries: 300μs (plus time to embed query)
  • Querying a database with 100000 entries: 3.2ms (plus time to embed query)
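
If you want to sanity-check these numbers on your own machine, a rough harness like the one below (just Ruby's built-in Benchmark module and the public API shown above) is enough; the absolute figures will obviously vary with hardware.

require 'benchmark'
require 'vecsearch'

vs = Vecsearch.new
1_000.times { |i| vs << "document number #{i}" }

# Time one query against the 1000-entry index; at this size the FAISS
# lookup is negligible and embedding the query text dominates.
seconds = Benchmark.realtime { vs.query("hey there", 2) }
puts "query: #{(seconds * 1000).round(2)}ms"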

Limitations / TODO

  • Trying to embed a document over 512 tokens long segfaults.
  • I haven't got the mean-pooling part of gte-tiny working. It seems to work well enough without it, but we should do that (see the sketch after this list) and assert that our embeddings approximately match the canonical model's.
  • Batching looks unimplemented in bert.cpp; it would be nice for prefilling the index.
  • Add more builds for platforms other than darwin/arm64.
  • Probably add a way to fetch an unquantized model, maybe other models entirely?
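
For reference, mean pooling itself is just an average of the per-token vectors, skipping padding; a plain-Ruby sketch is below (the inputs are hypothetical, since bert.cpp's actual output layout may differ).

# token_embeddings: one embedding vector per input token (e.g. 384 floats each)
# attention_mask:   1 for real tokens, 0 for padding
def mean_pool(token_embeddings, attention_mask)
  dim    = token_embeddings.first.length
  summed = Array.new(dim, 0.0)
  count  = 0
  token_embeddings.zip(attention_mask) do |vec, keep|
    next if keep.zero?
    dim.times { |i| summed[i] += vec[i] }
    count += 1
  end
  summed.map { |x| x / count }
end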