Neighbor S3
Nearest neighbor search for Ruby and S3 Vectors
Installation
Add this line to your application’s Gemfile:
gem "neighbor-s3"
Create a vector bucket and set your AWS credentials in your environment:
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
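Before creating an index, it can help to confirm that both variables are visible to the Ruby process. Here's a minimal sketch: `missing_aws_credentials` is a hypothetical helper (not part of the gem), and it only checks the two environment variables named above, not other credential sources the AWS SDK may use.

```ruby
# Hypothetical helper (not part of neighbor-s3): returns the names of the
# credential variables that are unset or empty in the given environment.
def missing_aws_credentials(env = ENV)
  required = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY]
  required.select { |name| env[name].to_s.empty? }
end

missing = missing_aws_credentials
warn "Missing AWS credentials: #{missing.join(", ")}" unless missing.empty?
```

Passing a plain hash instead of `ENV` makes the helper easy to try without touching your real environment.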
Getting Started
Create an index
index = Neighbor::S3::Index.new("items", bucket: "my-bucket", dimensions: 3, distance: "cosine")
index.create
Add vectors
index.add(1, [1, 1, 1])
index.add(2, [2, 2, 2])
index.add(3, [1, 1, 2])
Search for nearest neighbors to a vector
index.search([1, 1, 1], count: 5)
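To build intuition for what a cosine search ranks, here's an exact brute-force version in plain Ruby over the three vectors added above. This is an illustration only: the real search runs in S3 Vectors and may use different, approximate techniques. `cosine_distance`, `brute_force_search`, and the `{id:, distance:}` result shape are assumptions for this sketch.

```ruby
# Illustration only: exact brute-force search in plain Ruby.
# This is not how neighbor-s3 or S3 Vectors computes results.

# Cosine distance = 1 - cosine similarity (undefined for zero vectors).
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Returns up to `count` entries, closest first.
# The {id:, distance:} shape is an assumption for this sketch.
def brute_force_search(vectors, query, count: 5)
  vectors
    .map { |id, vector| {id: id, distance: cosine_distance(query, vector)} }
    .sort_by { |result| result[:distance] }
    .first(count)
end

vectors = {1 => [1, 1, 1], 2 => [2, 2, 2], 3 => [1, 1, 2]}
results = brute_force_search(vectors, [1, 1, 1], count: 5)
```

Cosine distance ignores magnitude, so `[2, 2, 2]` is as close to the query as the exact match `[1, 1, 1]` (distance about 0), while `[1, 1, 2]` ranks last with a distance of roughly 0.057.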
Search for nearest neighbors to a vector that is already in the index, by its ID
index.search_id(1, count: 5)
IDs are treated as strings by default, but can also be treated as integers
Neighbor::S3::Index.new("items", id_type: "integer", ...)
Operations
Add or update a vector
index.add(id, vector)
Add or update multiple vectors
index.add_all([{id: 1, vector: [1, 2, 3]}, {id: 2, vector: [4, 5, 6]}])
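When loading many vectors, one option is to send them in fixed-size chunks so each `add_all` call stays small. Here's a minimal sketch under some assumptions: `RecordingIndex` is a hypothetical stand-in so the snippet runs without AWS, `add_in_chunks` is not part of the gem, and the chunk size of 500 is illustrative (the gem may already batch requests internally).

```ruby
# Hypothetical stand-in for Neighbor::S3::Index so the sketch runs offline.
class RecordingIndex
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Mirrors the add_all(items) call shown above; only records each batch.
  def add_all(items)
    @calls << items
  end
end

# Hypothetical helper: sends items to index.add_all in chunks.
def add_in_chunks(index, items, chunk_size: 500)
  items.each_slice(chunk_size) { |chunk| index.add_all(chunk) }
end

items = (1..1200).map { |id| {id: id, vector: [id, id + 1, id + 2]} }
index = RecordingIndex.new
add_in_chunks(index, items)
chunk_sizes = index.calls.map(&:size)
```

With a real index, you'd pass the `Neighbor::S3::Index` instance in place of `RecordingIndex.new`.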
Get a vector
index.find(id)
Get all vectors
index.find_in_batches do |batch|
  # ...
end
Remove a vector
index.remove(id)
Remove multiple vectors
index.remove_all(ids)
Metadata
Add a vector with metadata
index.add(id, vector, metadata: {category: "A"})
Add multiple vectors with metadata
index.add_all([
  {id: 1, vector: [1, 2, 3], metadata: {category: "A"}},
  {id: 2, vector: [4, 5, 6], metadata: {category: "B"}}
])
Get metadata with search results
index.search(vector, with_metadata: true)
Filter by metadata
index.search(vector, filter: {category: "A"})
Supports these operators
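To make the filter semantics more concrete, here's a minimal sketch of how an equality filter and a few operator filters could be evaluated against a metadata hash in plain Ruby. The operator names (`$eq`, `$ne`, `$in`) are assumed from the S3 Vectors filtering syntax, so check the operator list before relying on them. `matches_filter?` is a hypothetical helper; real filtering happens in the service, not in your Ruby process.

```ruby
# Hypothetical helper: checks a metadata hash against a filter hash.
# Handles plain equality plus the assumed operators $eq, $ne, and $in.
def matches_filter?(metadata, filter)
  filter.all? do |key, condition|
    value = metadata[key]
    if condition.is_a?(Hash)
      condition.all? do |operator, operand|
        case operator.to_s
        when "$eq" then value == operand
        when "$ne" then value != operand
        when "$in" then operand.include?(value)
        else raise ArgumentError, "unsupported operator: #{operator}"
        end
      end
    else
      # A bare value means equality, as in filter: {category: "A"}
      value == condition
    end
  end
end

items = [
  {id: 1, metadata: {category: "A"}},
  {id: 2, metadata: {category: "B"}}
]
filter = {category: {"$in" => ["A", "C"]}}
matching_ids = items
  .select { |item| matches_filter?(item[:metadata], filter) }
  .map { |item| item[:id] }
```

Here only the item with category "A" passes the `$in` filter, so `matching_ids` contains just its ID.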
Specify non-filterable metadata on index creation
Neighbor::S3::Index.new(name, non_filterable: ["category"], ...)
Example
You can use Neighbor S3 for online item-based recommendations with Disco. We’ll use MovieLens data for this example.
Create an index
index = Neighbor::S3::Index.new("movies", bucket: "my-bucket", dimensions: 20, distance: "cosine")
Fit the recommender
data = Disco.load_movielens
recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)
Store the item factors
index.add_all(recommender.item_ids.map { |v| {id: v, vector: recommender.item_factors(v)} })
And get similar movies
index.search_id("Star Wars (1977)").map { |v| v[:id] }
See the complete code
Reference
Get index info
index.info
Check if an index exists
index.exists?
Drop an index
index.drop
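The `exists?` and `create` calls can be combined so that setup code is safe to run more than once. Here's a minimal sketch: `InMemoryIndex` is a hypothetical stand-in so the snippet runs without AWS, `ensure_index` is not part of the gem, and raising on a duplicate `create` is an assumption about how a real index might behave.

```ruby
# Hypothetical stand-in for Neighbor::S3::Index so the sketch runs offline.
class InMemoryIndex
  def initialize
    @created = false
  end

  def exists?
    @created
  end

  # Assumption for this sketch: creating an existing index is an error.
  def create
    raise "index already exists" if @created
    @created = true
  end

  def drop
    @created = false
  end
end

# Hypothetical helper: creates the index only when it is missing.
def ensure_index(index)
  index.create unless index.exists?
  index
end

index = InMemoryIndex.new
ensure_index(index)
ensure_index(index) # second call does nothing because the index exists
```

The check and the create are separate calls, so two processes running this at the same time could still both attempt the create.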
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/neighbor-s3.git
cd neighbor-s3
bundle install
bundle exec rake test