
Neighbor S3

Nearest neighbor search for Ruby and S3 Vectors

Installation

Add this line to your application’s Gemfile:

gem "neighbor-s3"

Create a vector bucket and set your AWS credentials in your environment:

AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
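
Before creating an index, it can help to confirm that both variables are actually set. The following is a minimal sketch in plain Ruby; `missing_aws_vars` and `REQUIRED_AWS_VARS` are hypothetical names, not part of neighbor-s3, and the check assumes credentials come from the environment rather than from another source such as a shared profile.

```ruby
# Hypothetical helper (not part of neighbor-s3): list the required AWS
# environment variables that are missing or blank.
REQUIRED_AWS_VARS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY].freeze

def missing_aws_vars(env = ENV)
  # A variable counts as missing when it is unset or only whitespace.
  REQUIRED_AWS_VARS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_aws_vars
warn "Missing AWS credentials: #{missing.join(", ")}" unless missing.empty?
```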

Getting Started

Create an index

index = Neighbor::S3::Index.new("items", bucket: "my-bucket", dimensions: 3, distance: "cosine")
index.create

Add vectors

index.add(1, [1, 1, 1])
index.add(2, [2, 2, 2])
index.add(3, [1, 1, 2])

Search for nearest neighbors to a vector

index.search([1, 1, 1], count: 5)
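
To build intuition for what a cosine search ranks, here is a brute-force sketch in plain Ruby over the three vectors added above. It is an illustration only: the real search runs inside S3 Vectors, and `cosine_distance`, `nearest`, and the `:distance` key are hypothetical names chosen for this example.

```ruby
# Illustration only: exact, brute-force ranking by cosine distance.
# S3 Vectors performs the real search; nothing here calls the gem.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

def nearest(vectors, query, count: 5)
  vectors
    .map { |id, vector| {id: id, distance: cosine_distance(query, vector)} }
    .sort_by { |result| result[:distance] }
    .first(count)
end

vectors = {1 => [1, 1, 1], 2 => [2, 2, 2], 3 => [1, 1, 2]}
results = nearest(vectors, [1, 1, 1], count: 5)
```

Vectors 1 and 2 point in the same direction as the query, so both have a cosine distance of approximately zero, and vector 3 ranks last.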

Search for nearest neighbors to a vector that is already in the index, by ID

index.search_id(1, count: 5)

IDs are treated as strings by default, but can also be treated as integers

Neighbor::S3::Index.new("items", id_type: "integer", ...)

Operations

Add or update a vector

index.add(id, vector)

Add or update multiple vectors

index.add_all([{id: 1, vector: [1, 2, 3]}, {id: 2, vector: [4, 5, 6]}])
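
For large collections, one option is to send records in fixed-size slices. The sketch below assumes nothing about whether the gem batches requests internally; `add_in_batches` and `RecordingIndex` are hypothetical, the batch size is arbitrary, and the only gem method relied on is `add_all`.

```ruby
# Hypothetical helper (not part of neighbor-s3): pass records to add_all
# in fixed-size slices. Works with any object that responds to add_all.
def add_in_batches(index, records, batch_size: 100)
  records.each_slice(batch_size) { |batch| index.add_all(batch) }
end

# Stand-in for a real index so the sketch runs without AWS access.
class RecordingIndex
  attr_reader :calls

  def initialize
    @calls = []
  end

  def add_all(records)
    @calls << records
  end
end

records = (1..5).map { |i| {id: i, vector: [i, i + 1, i + 2]} }
recorder = RecordingIndex.new
add_in_batches(recorder, records, batch_size: 2)
batch_sizes = recorder.calls.map(&:size) # => [2, 2, 1]
```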

Get a vector

index.find(id)

Get all vectors

index.find_in_batches do |batch|
  # ...
end

Remove a vector

index.remove(id)

Remove multiple vectors

index.remove_all(ids)

Metadata

Add a vector with metadata

index.add(id, vector, metadata: {category: "A"})

Add multiple vectors with metadata

index.add_all([
  {id: 1, vector: [1, 2, 3], metadata: {category: "A"}},
  {id: 2, vector: [4, 5, 6], metadata: {category: "B"}}
])

Get metadata with search results

index.search(vector, with_metadata: true)

Filter by metadata

index.search(vector, filter: {category: "A"})

Filters support the metadata filtering operators defined by S3 Vectors
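
Conceptually, the equality filter shown above restricts results to vectors whose metadata matches every key in the filter. The following plain-Ruby sketch models only that equality case on local data; `matches_filter?` is a hypothetical helper, and the actual filtering happens inside S3 Vectors, not in Ruby.

```ruby
# Illustration only: equality filtering on local data. Other operators
# are not modeled here.
def matches_filter?(metadata, filter)
  filter.all? { |key, value| metadata[key] == value }
end

records = [
  {id: 1, vector: [1, 2, 3], metadata: {category: "A"}},
  {id: 2, vector: [4, 5, 6], metadata: {category: "B"}}
]

candidates = records.select { |r| matches_filter?(r[:metadata], {category: "A"}) }
candidate_ids = candidates.map { |r| r[:id] } # => [1]
```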

Specify non-filterable metadata on index creation

Neighbor::S3::Index.new(name, non_filterable: ["category"], ...)

Example

You can use Neighbor S3 for online item-based recommendations with Disco. We’ll use MovieLens data for this example.

Create an index

index = Neighbor::S3::Index.new("movies", bucket: "my-bucket", dimensions: 20, distance: "cosine")
index.create

Fit the recommender

data = Disco.load_movielens
recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)

Store the item factors

index.add_all(recommender.item_ids.map { |v| {id: v, vector: recommender.item_factors(v)} })
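
The index above was declared with `dimensions: 20` to match `factors: 20`, so every stored vector should have 20 elements. A quick local check before uploading might look like the sketch below; `mismatched_ids` is a hypothetical helper, and plain arrays stand in for the recommender's item factors.

```ruby
# Hypothetical helper (not part of neighbor-s3 or Disco): return the IDs
# of records whose vector length differs from the index dimensions.
def mismatched_ids(records, dimensions)
  records.reject { |r| r[:vector].size == dimensions }.map { |r| r[:id] }
end

# Plain arrays used as stand-ins for real item factors.
records = [
  {id: "Movie A", vector: Array.new(20, 0.1)},
  {id: "Movie B", vector: Array.new(19, 0.1)}
]

bad_ids = mismatched_ids(records, 20) # => ["Movie B"]
```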

And get similar movies

index.search_id("Star Wars (1977)").map { |v| v[:id] }

See the complete code

Reference

Get index info

index.info

Check if an index exists

index.exists?

Drop an index

index.drop
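
These calls can be combined into a "create if missing" step at application start. The sketch below relies only on the `exists?` and `create` methods shown above; `ensure_index` and `InMemoryIndex` are hypothetical, and the check-then-create sequence does not guard against two processes racing each other.

```ruby
# Hypothetical helper (not part of neighbor-s3): create the index only
# when it does not exist yet. Returns true if it created the index.
def ensure_index(index)
  return false if index.exists?

  index.create
  true
end

# Stand-in for a real index so the sketch runs without AWS access.
class InMemoryIndex
  def initialize
    @created = false
  end

  def exists?
    @created
  end

  def create
    @created = true
  end
end

demo_index = InMemoryIndex.new
first_call = ensure_index(demo_index)  # => true, index was created
second_call = ensure_index(demo_index) # => false, index already existed
```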

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/neighbor-s3.git
cd neighbor-s3
bundle install
bundle exec rake test