Neighbor
Nearest neighbor search for Rails
Supports:
- Postgres (cube and pgvector)
- MariaDB 11.8
- MySQL 9 (searching requires HeatWave) - experimental
- SQLite (sqlite-vec) - experimental
Also available for Redis and S3 Vectors
Installation
Add this line to your application’s Gemfile:
gem "neighbor"
For Postgres
Neighbor supports two extensions: cube and pgvector. cube ships with Postgres, while pgvector supports more dimensions and approximate nearest neighbor search.
For cube, run:
rails generate neighbor:cube
rails db:migrate
For pgvector, install the extension and run:
rails generate neighbor:vector
rails db:migrate
For SQLite
Add this line to your application’s Gemfile:
gem "sqlite-vec"
And run:
rails generate neighbor:sqlite
Getting Started
Create a migration
class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    # cube
    add_column :items, :embedding, :cube
    # pgvector, MariaDB, and MySQL
    add_column :items, :embedding, :vector, limit: 3 # dimensions
    # sqlite-vec
    add_column :items, :embedding, :binary
  end
end
Add to your model
class Item < ApplicationRecord
  has_neighbors :embedding
end
Update the vectors
item.update(embedding: [1.0, 1.2, 0.5])
Get the nearest neighbors to a record
item.nearest_neighbors(:embedding, distance: "euclidean").first(5)
Get the nearest neighbors to a vector
Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean").first(5)
Records returned from nearest_neighbors will have a neighbor_distance attribute
nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
nearest_item.neighbor_distance
See the additional docs for:
- cube
- pgvector
- MariaDB
- MySQL
- sqlite-vec
Or check out some examples
cube
Distance
Supported values are:
- euclidean
- cosine
- taxicab
- chebyshev
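As a rough illustration of what these four metrics compute, here is a plain-Ruby sketch. The helper names are hypothetical and not part of Neighbor, which runs the actual computation in the database.

```ruby
# Plain-Ruby sketches of the four metrics listed above.
# These helper names are hypothetical; they are not Neighbor methods.

def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

def taxicab_distance(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

def chebyshev_distance(a, b)
  a.zip(b).map { |x, y| (x - y).abs }.max
end

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
puts euclidean_distance(a, b) # 5.0
puts taxicab_distance(a, b)   # 7.0
puts chebyshev_distance(a, b) # 4.0
```

Taxicab sums the per-dimension differences, while Chebyshev keeps only the largest one, so the two can rank the same neighbors differently.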
For cosine distance with cube, vectors must be normalized before being stored.
class Item < ApplicationRecord
  has_neighbors :embedding, normalize: true
end
For inner product with cube, see this example.
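One way to see why normalization matters for cosine distance: for unit-length vectors, squared Euclidean distance equals twice the cosine distance, so both rank neighbors in the same order. The sketch below checks that identity in plain Ruby. It shows the underlying math only and is not a description of Neighbor's internals; the helper names are hypothetical.

```ruby
# Sketch: for unit-length vectors, squared Euclidean distance equals
# 2 * cosine distance. Helper names are hypothetical, not Neighbor methods.

def normalize(v)
  norm = Math.sqrt(v.sum { |x| x * x })
  v.map { |x| x / norm }
end

def squared_euclidean(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norms = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / norms
end

a = normalize([1.0, 1.2, 0.5])
b = normalize([0.9, 1.3, 1.1])

# The two printed values agree up to floating-point rounding.
puts squared_euclidean(a, b)
puts 2 * cosine_distance(a, b)
```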
Dimensions
The cube type can have up to 100 dimensions by default. See the Postgres docs for how to increase this.
For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.
class Item < ApplicationRecord
  has_neighbors :embedding, dimensions: 3
end
pgvector
Distance
Supported values are:
- euclidean
- inner_product
- cosine
- taxicab
- hamming
- jaccard
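For the metrics in this list that do not appear in the cube list, a plain-Ruby sketch of what each one measures may help. The helper names are hypothetical, the bit vectors are written as strings of 0s and 1s, and the handling of two all-zero bit vectors is an assumption of this sketch.

```ruby
# Hypothetical plain-Ruby helpers, shown only to illustrate the metrics.

def inner_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Number of positions where two equal-length bit strings differ.
def hamming_distance(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# 1 - |intersection| / |union| over the positions set to "1".
def jaccard_distance(a, b)
  pairs = a.chars.zip(b.chars)
  intersection = pairs.count { |x, y| x == "1" && y == "1" }
  union = pairs.count { |x, y| x == "1" || y == "1" }
  # Assumption: two all-zero vectors are treated as distance 0.0 here.
  union.zero? ? 0.0 : 1.0 - intersection.to_f / union
end

puts inner_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]) # 32.0
puts hamming_distance("101", "111")                   # 1
puts jaccard_distance("101", "111")                   # 1 - 2/3
```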
Dimensions
The vector type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.
The halfvec type can have up to 16,000 dimensions, and half vectors with up to 4,000 dimensions can be indexed.
The bit type can have up to 83 million dimensions, and bit vectors with up to 64,000 dimensions can be indexed.
The sparsevec type can have up to 16,000 non-zero elements, and sparse vectors with up to 1,000 non-zero elements can be indexed.
Indexing
Add an approximate index to speed up queries. Create a migration with:
class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.1]
  def change
    add_index :items, :embedding, using: :hnsw, opclass: :vector_l2_ops
    # or
    add_index :items, :embedding, using: :ivfflat, opclass: :vector_l2_ops
  end
end
Use :vector_cosine_ops for cosine distance and :vector_ip_ops for inner product.
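The index operator class has to match the distance used at query time. A small lookup like the following can keep the two in sync; opclass_for is a hypothetical helper for illustration, not part of Neighbor, and it covers only the three operator classes named above.

```ruby
# Hypothetical helper (not part of Neighbor): map a distance name to the
# pgvector operator class to use when indexing a vector column.
def opclass_for(distance)
  opclasses = {
    "euclidean" => :vector_l2_ops,
    "inner_product" => :vector_ip_ops,
    "cosine" => :vector_cosine_ops
  }
  opclasses.fetch(distance) do
    raise ArgumentError, "no opclass known for distance: #{distance}"
  end
end

puts opclass_for("cosine") # vector_cosine_ops
```

In a migration, the result could be passed as the opclass option of add_index, in place of the literal symbol.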
Set the size of the dynamic candidate list with HNSW
Item.connection.execute("SET hnsw.ef_search = 100")
Or the number of probes with IVFFlat
Item.connection.execute("SET ivfflat.probes = 3")
Half-Precision Vectors
Use the halfvec type to store half-precision vectors
class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    add_column :items, :embedding, :halfvec, limit: 3 # dimensions
  end
end
Half-Precision Indexing
Index vectors at half precision for smaller indexes
class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.1]
  def change
    add_index :items, "(embedding::halfvec(3)) halfvec_l2_ops", using: :hnsw
  end
end
Get the nearest neighbors
Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean", precision: "half").first(5)
Binary Vectors
Use the bit type to store binary vectors
class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    add_column :items, :embedding, :bit, limit: 3 # dimensions
  end
end
Get the nearest neighbors by Hamming distance
Item.nearest_neighbors(:embedding, "101", distance: "hamming").first(5)
Binary Quantization
Use expression indexing for binary quantization
class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.1]
  def change
    add_index :items, "(binary_quantize(embedding)::bit(3)) bit_hamming_ops", using: :hnsw
  end
end
Sparse Vectors
Use the sparsevec type to store sparse vectors
class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    add_column :items, :embedding, :sparsevec, limit: 3 # dimensions
  end
end
Get the nearest neighbors
embedding = Neighbor::SparseVector.new({0 => 0.9, 1 => 1.3, 2 => 1.1}, 3)
Item.nearest_neighbors(:embedding, embedding, distance: "euclidean").first(5)
MariaDB
Distance
Supported values are:
- euclidean
- cosine
- hamming
Indexing
Vector columns must use null: false to add a vector index
class CreateItems < ActiveRecord::Migration[8.1]
  def change
    create_table :items do |t|
      t.vector :embedding, limit: 3, null: false
      t.index :embedding, type: :vector
    end
  end
end
Binary Vectors
Use the bigint type to store binary vectors
class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    add_column :items, :embedding, :bigint
  end
end
Note: Binary vectors can have up to 64 dimensions
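Because the column is an integer, each bit of the stored value can be read as one dimension, so an integer such as 5 corresponds to the bit pattern 101. The plain-Ruby sketch below shows that encoding and the Hamming distance between two such integers. The helper names are hypothetical, and the sketch assumes non-negative values; the real distance is computed by the database.

```ruby
# Sketch: pack up to 64 bits into an Integer and compare by Hamming distance.
# Helper names are hypothetical; values are assumed to be non-negative.

def bits_to_integer(bits)
  raise ArgumentError, "at most 64 dimensions" if bits.length > 64
  bits.to_i(2)
end

# XOR leaves a 1 wherever the two values differ; count those bits.
def hamming_distance(a, b)
  (a ^ b).to_s(2).count("1")
end

query = bits_to_integer("101")
puts query                          # 5
puts hamming_distance(query, 0b111) # 1
```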
Get the nearest neighbors by Hamming distance
Item.nearest_neighbors(:embedding, 5, distance: "hamming").first(5)
MySQL
Distance
Supported values are:
- euclidean
- cosine
- hamming
Note: The DISTANCE() function is only available on HeatWave
Binary Vectors
Use the binary type to store binary vectors
class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    add_column :items, :embedding, :binary
  end
end
Get the nearest neighbors by Hamming distance
Item.nearest_neighbors(:embedding, "\x05", distance: "hamming").first(5)
sqlite-vec
Distance
Supported values are:
- euclidean
- cosine
- taxicab
- hamming
Dimensions
For sqlite-vec, it’s a good idea to specify the number of dimensions to ensure all records have the same number.
class Item < ApplicationRecord
  has_neighbors :embedding, dimensions: 3
end
Virtual Tables
You can also use virtual tables
class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
  def change
    # Rails 8+
    create_virtual_table :items, :vec0, [
      "id integer PRIMARY KEY AUTOINCREMENT NOT NULL",
      "embedding float[3] distance_metric=L2"
    ]
    # Rails < 8
    execute <<~SQL
      CREATE VIRTUAL TABLE items USING vec0(
        id integer PRIMARY KEY AUTOINCREMENT NOT NULL,
        embedding float[3] distance_metric=L2
      )
    SQL
  end
end
Use distance_metric=cosine for cosine distance
You can optionally ignore any shadow tables that are created
ActiveRecord::SchemaDumper.ignore_tables += [
  "items_chunks", "items_rowids", "items_vector_chunks00"
]
Get the k nearest neighbors
Item.where("embedding MATCH ?", [1, 2, 3].to_s).where(k: 5).order(:distance)
Filter by primary key
Item.where(id: [2, 3]).where("embedding MATCH ?", [1, 2, 3].to_s).where(k: 5).order(:distance)
Int8 Vectors
Use the type option for int8 vectors
class Item < ApplicationRecord
  has_neighbors :embedding, dimensions: 3, type: :int8
end
Binary Vectors
Use the type option for binary vectors
class Item < ApplicationRecord
  has_neighbors :embedding, dimensions: 8, type: :bit
end
Get the nearest neighbors by Hamming distance
Item.nearest_neighbors(:embedding, "\x05", distance: "hamming").first(5)
Examples
- Embeddings with OpenAI
- Binary embeddings with Cohere
- Sentence embeddings with Informers
- Hybrid search with Informers
- Sparse search with Transformers.rb
- Recommendations with Disco
OpenAI Embeddings
Generate a model
rails generate model Document content:text embedding:vector{1536}
rails db:migrate
And add has_neighbors
class Document < ApplicationRecord
  has_neighbors :embedding
end
Create a method to call the embeddings API
def embed(input)
  url = "https://api.openai.com/v1/embeddings"
  headers = {
    "Authorization" => "Bearer #{ENV.fetch("OPENAI_API_KEY")}",
    "Content-Type" => "application/json"
  }
  data = {
    input: input,
    model: "text-embedding-3-small"
  }
  response = Net::HTTP.post(URI(url), data.to_json, headers).tap(&:value)
  JSON.parse(response.body)["data"].map { |v| v["embedding"] }
end
Pass your input
input = [
  "The dog is barking",
  "The cat is purring",
  "The bear is growling"
]
embeddings = embed(input)
Store the embeddings
documents = []
input.zip(embeddings) do |content, embedding|
  documents << {content: content, embedding: embedding}
end
Document.insert_all!(documents)
And get similar documents
document = Document.first
document.nearest_neighbors(:embedding, distance: "cosine").first(5).map(&:content)
See the complete code
Cohere Embeddings
Generate a model
rails generate model Document content:text embedding:bit{1536}
rails db:migrate
And add has_neighbors
class Document < ApplicationRecord
  has_neighbors :embedding
end
Create a method to call the embed API
def embed(input, input_type)
  url = "https://api.cohere.com/v2/embed"
  headers = {
    "Authorization" => "Bearer #{ENV.fetch("CO_API_KEY")}",
    "Content-Type" => "application/json"
  }
  data = {
    texts: input,
    model: "embed-v4.0",
    input_type: input_type,
    embedding_types: ["ubinary"]
  }
  response = Net::HTTP.post(URI(url), data.to_json, headers).tap(&:value)
  JSON.parse(response.body)["embeddings"]["ubinary"].map { |e| e.map { |v| v.chr.unpack1("B*") }.join }
end
Pass your input
input = [
  "The dog is barking",
  "The cat is purring",
  "The bear is growling"
]
embeddings = embed(input, "search_document")
Store the embeddings
documents = []
input.zip(embeddings) do |content, embedding|
  documents << {content: content, embedding: embedding}
end
Document.insert_all!(documents)
Embed the search query
query = "forest"
query_embedding = embed([query], "search_query")[0]
And search the documents
Document.nearest_neighbors(:embedding, query_embedding, distance: "hamming").first(5).map(&:content)
See the complete code
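The last line of embed above converts each embedding from an array of unsigned bytes into a string of 0s and 1s, which is the form the bit column accepts. A minimal sketch of that conversion on made-up sample bytes:

```ruby
# Sketch of the byte-to-bit-string conversion used in embed above.
# The sample bytes are made up for illustration.
bytes = [5, 255]

# Each unsigned byte becomes an 8-character string of 0s and 1s,
# most significant bit first.
bits = bytes.map { |v| v.chr.unpack1("B*") }.join

puts bits        # 0000010111111111
puts bits.length # 16
```

Each byte contributes eight bits, so a 1536-bit column corresponds to 192 bytes per embedding.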
Sentence Embeddings
You can generate embeddings locally with Informers.
Generate a model
rails generate model Document content:text embedding:vector{384}
rails db:migrate
And add has_neighbors
class Document < ApplicationRecord
  has_neighbors :embedding
end
Load a model
model = Informers.pipeline("embedding", "sentence-transformers/all-MiniLM-L6-v2")
Pass your input
input = [
  "The dog is barking",
  "The cat is purring",
  "The bear is growling"
]
embeddings = model.(input)
Store the embeddings
documents = []
input.zip(embeddings) do |content, embedding|
  documents << {content: content, embedding: embedding}
end
Document.insert_all!(documents)
And get similar documents
document = Document.first
document.nearest_neighbors(:embedding, distance: "cosine").first(5).map(&:content)
See the complete code
Hybrid Search
You can use Neighbor for hybrid search with Informers.
Generate a model
rails generate model Document content:text embedding:vector{768}
rails db:migrate
And add has_neighbors and a scope for keyword search
class Document < ApplicationRecord
  has_neighbors :embedding
  scope :search, ->(query) {
    where("to_tsvector(content) @@ plainto_tsquery(?)", query)
      .order(Arel.sql("ts_rank_cd(to_tsvector(content), plainto_tsquery(?)) DESC", query))
  }
end
Create some documents
Document.create!(content: "The dog is barking")
Document.create!(content: "The cat is purring")
Document.create!(content: "The bear is growling")
Generate an embedding for each document
embed = Informers.pipeline("embedding", "Snowflake/snowflake-arctic-embed-m-v1.5")
embed_options = {model_output: "sentence_embedding", pooling: "none"} # specific to embedding model
Document.find_each do |document|
  embedding = embed.(document.content, **embed_options)
  document.update!(embedding: embedding)
end
Perform keyword search
query = "growling bear"
keyword_results = Document.search(query).limit(20).load_async
And semantic search in parallel (the query prefix is specific to the embedding model)
query_prefix = "Represent this sentence for searching relevant passages: "
query_embedding = embed.(query_prefix + query, **embed_options)
semantic_results =
  Document.nearest_neighbors(:embedding, query_embedding, distance: "cosine").limit(20).load_async
To combine the results, use Reciprocal Rank Fusion (RRF)
Neighbor::Reranking.rrf(keyword_results, semantic_results).first(5)
Or a reranking model
rerank = Informers.pipeline("reranking", "mixedbread-ai/mxbai-rerank-xsmall-v1")
results = (keyword_results + semantic_results).uniq
rerank.(query, results.map(&:content)).first(5).map { |v| results[v[:doc_id]] }
See the complete code
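For intuition about the Reciprocal Rank Fusion step above, here is a plain-Ruby sketch of the general technique: each item scores the sum of 1 / (k + rank) over the lists it appears in. The method name and the k = 60 default are assumptions of this sketch and do not describe how Neighbor::Reranking.rrf is implemented.

```ruby
# Sketch of Reciprocal Rank Fusion over any number of ranked lists.
# The method name and k = 60 are assumptions, not Neighbor's implementation.
def reciprocal_rank_fusion(*result_lists, k: 60)
  scores = Hash.new(0.0)
  result_lists.each do |results|
    results.each_with_index do |item, index|
      scores[item] += 1.0 / (k + index + 1) # ranks start at 1
    end
  end
  # Highest combined score first.
  scores.sort_by { |_, score| -score }.map(&:first)
end

keyword_ids = [3, 1, 2]
semantic_ids = [3, 2, 4]
p reciprocal_rank_fusion(keyword_ids, semantic_ids) # [3, 2, 1, 4]
```

Items that appear in both lists accumulate two terms, which is why 2 outranks 1 here even though 1 ranks higher in the keyword list.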
Sparse Search
You can generate sparse embeddings locally with Transformers.rb.
Generate a model
rails generate model Document content:text embedding:sparsevec{30522}
rails db:migrate
And add has_neighbors
class Document < ApplicationRecord
  has_neighbors :embedding
end
Load a model to generate embeddings
class EmbeddingModel
  def initialize(model_id)
    @model = Transformers::AutoModelForMaskedLM.from_pretrained(model_id)
    @tokenizer = Transformers::AutoTokenizer.from_pretrained(model_id)
    @special_token_ids = @tokenizer.special_tokens_map.map { |_, token| @tokenizer.vocab[token] }
  end
  def embed(input)
    feature = @tokenizer.(input, padding: true, truncation: true, return_tensors: "pt", return_token_type_ids: false)
    output = @model.(**feature)[0]
    values = Torch.max(output * feature[:attention_mask].unsqueeze(-1), dim: 1)[0]
    values = Torch.log(1 + Torch.relu(values))
    values[0.., @special_token_ids] = 0
    values.to_a
  end
end
model = EmbeddingModel.new("opensearch-project/opensearch-neural-sparse-encoding-v1")
Pass your input
input = [
  "The dog is barking",
  "The cat is purring",
  "The bear is growling"
]
embeddings = model.embed(input)
Store the embeddings
documents = []
input.zip(embeddings) do |content, embedding|
  documents << {content: content, embedding: Neighbor::SparseVector.new(embedding)}
end
Document.insert_all!(documents)
Embed the search query
query = "forest"
query_embedding = model.embed([query])[0]
And search the documents
Document.nearest_neighbors(:embedding, Neighbor::SparseVector.new(query_embedding), distance: "inner_product").first(5).map(&:content)
See the complete code
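The embeddings in this example are mostly zeros, which is what makes a sparse representation worthwhile. As a conceptual sketch in plain Ruby, a dense array can be reduced to an index => value hash of its non-zero entries, and an inner product then only needs the stored entries. Both helpers are hypothetical and are not part of Neighbor.

```ruby
# Sketch: keep only the non-zero entries of a dense array as index => value.
# Both helpers are hypothetical, not Neighbor methods.
def dense_to_sparse(values)
  values.each_with_index.reject { |v, _| v.zero? }.to_h { |v, i| [i, v] }
end

# Inner product of two index => value hashes; missing indices count as zero.
def sparse_inner_product(a, b)
  a.sum { |i, v| v * b.fetch(i, 0.0) }
end

doc = dense_to_sparse([0.0, 1.5, 0.0, 0.0, 2.0])   # indices 1 and 4 are kept
query = dense_to_sparse([0.0, 2.0, 3.0, 0.0, 0.0]) # indices 1 and 2 are kept

puts doc.size                         # 2
puts sparse_inner_product(doc, query) # 3.0
```

The index => value shape matches the hash form of Neighbor::SparseVector.new shown in the pgvector section, which also takes the number of dimensions as a second argument.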
Disco Recommendations
You can use Neighbor for online item-based recommendations with Disco. We’ll use MovieLens data for this example.
Generate a model
rails generate model Movie name:string factors:cube
rails db:migrate
And add has_neighbors
class Movie < ApplicationRecord
  has_neighbors :factors, dimensions: 20, normalize: true
end
Fit the recommender
data = Disco.load_movielens
recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)
Store the item factors
movies = []
recommender.item_ids.each do |item_id|
  movies << {name: item_id, factors: recommender.item_factors(item_id)}
end
Movie.create!(movies)
And get similar movies
movie = Movie.find_by(name: "Star Wars (1977)")
movie.nearest_neighbors(:factors, distance: "cosine").first(5).map(&:name)
See the complete code for cube and pgvector
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/neighbor.git
cd neighbor
bundle install
# Postgres
createdb neighbor_test
bundle exec rake test:postgresql
# SQLite
bundle exec rake test:sqlite
# MariaDB
docker run -e MARIADB_ALLOW_EMPTY_ROOT_PASSWORD=1 -e MARIADB_DATABASE=neighbor_test -p 3307:3306 mariadb:11.8
bundle exec rake test:mariadb
# MySQL
docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 -e MYSQL_DATABASE=neighbor_test -p 3306:3306 mysql:9
bundle exec rake test:mysql