Trove

🔥 Deploy machine learning models in Ruby (and Rails)

Works great with XGBoost, Torch.rb, fastText, and many other gems

Installation

Add this line to your application’s Gemfile:

gem "trove"

And run:

bundle install
trove init

And configure your storage in .trove.yml.

Storage

Amazon S3

Create a bucket and enable object versioning.

Next, set up your AWS credentials. You can use the AWS CLI:

pip install awscli
aws configure

Or environment variables:

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=...

IAM users need:

  • s3:GetObject and s3:GetObjectVersion to pull files
  • s3:PutObject to push files
  • s3:ListBucket and s3:ListBucketVersions to list files and versions
  • s3:DeleteObject and s3:DeleteObjectVersion to delete files

Here’s an example policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Trove",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:PutObject",
                "s3:ListBucket",
                "s3:ListBucketVersions",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/trove/*"
            ]
        }
    ]
}

If your production servers only need to pull files, only give them s3:GetObject and s3:GetObjectVersion permissions.
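As a rough sketch of that pull-only setup, the Ruby snippet below generates a reduced policy document. The helper name `pull_only_policy`, the `TrovePullOnly` Sid, and the `my-bucket`/`trove` values are illustrative assumptions that mirror the example above; they are not something Trove itself provides or requires.

```ruby
require "json"

# Hypothetical helper: builds a pull-only IAM policy as a Ruby hash.
# Bucket and prefix are placeholders copied from the example policy.
def pull_only_policy(bucket:, prefix:)
  {
    "Version" => "2012-10-17",
    "Statement" => [
      {
        "Sid" => "TrovePullOnly",
        "Effect" => "Allow",
        # Only the two permissions needed to pull files
        "Action" => ["s3:GetObject", "s3:GetObjectVersion"],
        # Both actions are object-level, so only the object ARN is listed
        "Resource" => ["arn:aws:s3:::#{bucket}/#{prefix}/*"]
      }
    ]
  }
end

puts JSON.pretty_generate(pull_only_policy(bucket: "my-bucket", prefix: "trove"))
```

The printed JSON can be pasted into the IAM console or saved to a file for the AWS CLI.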

How It Works

Git is great for code, but it’s not ideal for large files like models. Instead, we use an object store like Amazon S3 to store and version them.

Trove creates a trove directory for you to use as a workspace. Files in this directory are ignored by Git but can be pushed and pulled from the object store. By default, files are tracked in .trove.yml to make it easy to deploy specific versions with code changes.

Getting Started

Use the trove directory to save and load models.

# training code
model.save_model("trove/model.bin")

# prediction code
model = FastText.load_model("trove/model.bin")

When a model is ready, push it to the object store with:

trove push model.bin

And commit the changes to .trove.yml. The model is now ready to be deployed.

Deployment

We recommend pulling files during the build process.

  • Heroku and Dokku
  • Docker

Make sure your storage credentials are available in the build environment.

Heroku and Dokku

Add to your Rakefile:

Rake::Task["assets:precompile"].enhance do
  Trove.pull
end

This pulls files at the very end of asset precompilation. Check the build output for:

remote:        Pulling model.bin...
remote:        Asset precompilation completed (30.00s)

Docker

Add to your Dockerfile:

RUN bundle exec trove pull
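
Whichever build path is used, it can help to confirm at boot that the expected files actually landed in the trove directory. The following is a minimal sketch, not part of Trove: `missing_models` and `verify_models!` are hypothetical helper names, and the `trove` directory is assumed to be relative to the working directory.

```ruby
# Hypothetical helper: returns the model files absent from the workspace.
def missing_models(filenames, dir: "trove")
  filenames.reject { |name| File.exist?(File.join(dir, name)) }
end

# Hypothetical helper: raises early so a bad build fails at boot
# rather than at the first prediction.
def verify_models!(filenames, dir: "trove")
  missing = missing_models(filenames, dir: dir)
  return if missing.empty?

  raise "Missing model files in #{dir}/: #{missing.join(", ")}"
end
```

In a Rails app, a call such as `verify_models!(["model.bin"])` could live in an initializer.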

Commands

Push a file

trove push model.bin

Pull all files in .trove.yml

trove pull

Pull a specific file (uses the version in .trove.yml if present)

trove pull model.bin

Pull a specific version of a file

trove pull model.bin --version 123

Delete a file

trove delete model.bin

List files

trove list

List versions

trove versions model.bin

Ruby API

You can use the Ruby API in addition to the CLI.

Trove.push(filename)
Trove.pull
Trove.pull(filename)
Trove.pull(filename, version: version)
Trove.delete(filename)
Trove.list
Trove.versions(filename)

This makes it easy to perform operations from code, IRuby notebooks, and the Rails console.
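
As one possible use of the Ruby API, the sketch below pulls a model on demand before loading it. `ensure_model` is a hypothetical helper, not part of Trove. It relies only on the `Trove.pull(filename)` and `Trove.pull(filename, version: version)` calls listed above, and it assumes pulled files land in a `trove` directory relative to the working directory.

```ruby
# Hypothetical helper: returns the local path of a model, pulling it if needed.
# `store` defaults to Trove, so the gem must be loaded when the default is
# used; any object that responds to `pull` can be passed in for testing.
def ensure_model(filename, version: nil, store: Trove, dir: "trove")
  path = File.join(dir, filename)
  if version
    # An explicitly requested version is always fetched
    store.pull(filename, version: version)
  elsif !File.exist?(path)
    # Otherwise pull only when the file is not already present
    store.pull(filename)
  end
  path
end
```

For example, `FastText.load_model(ensure_model("model.bin"))` would load the model from the workspace, fetching it first if the file is missing.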

Automated Training

By default, Trove tracks files in .trove.yml to make it easy to deploy specific versions with code changes. However, this functionality is entirely optional. Disable it with:

vcs: false

This is useful if you want to automate training or build more complex workflows.
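
An automated training job could then be sketched roughly as follows. `retrain_and_push` is a hypothetical helper, not part of Trove. It uses only `Trove.push(filename)` from the Ruby API and assumes the workspace is a `trove` directory relative to the working directory.

```ruby
require "fileutils"

# Hypothetical helper: trains, saves into the workspace, and pushes in one step.
# The block receives the destination path and must write the model file there.
# `store` defaults to Trove, so the gem must be loaded when the default is used.
def retrain_and_push(filename, store: Trove, dir: "trove")
  FileUtils.mkdir_p(dir)
  path = File.join(dir, filename)
  yield path
  # Refuse to push when training produced no file
  raise "training did not write #{path}" unless File.exist?(path)

  store.push(filename)
  path
end
```

A scheduled job might call `retrain_and_push("model.bin") { |path| model.save_model(path) }`, where `model` is whatever object the training code produces.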

Non-Ruby

Trove can be used in non-Ruby projects as well.

gem install trove
trove init

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/trove.git
cd trove
bundle install

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=...
export S3_BUCKET=my-bucket

bundle exec rake test