Project

shameless

0.02
No commit activity in last 3 years
No release in over 3 years
Scalable distributed append-only data store
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

>= 0
>= 0

Runtime

~> 4.0
 Project Readme

Shameless

Build Status

Shameless is an implementation of a schemaless, distributed, append-only store built on top of MySQL and the Sequel gem. It was extracted from a battle-tested codebase of our main application at HotelTonight. Since it's using Sequel for database access, it could work on any database, e.g. postgres, although we've only used it with MySQL.

Background

Shameless was born out of our need to have highly scalable, distributed storage for hotel rates. Rates are a way hotels package their rooms, they typically include check-in and check-out date, room type, rate plan, net price, discount, extra services, etc. Our original solution of storing rates in a typical relational SQL table was reaching its limits due to write congestion, migration anxiety, and high maintenance.

Hotel rates change very frequently, so our solution needed to have consistent write latency. There are also multiple agents mutating various aspects of those rates, so we wanted something that would enable versioning. We also wanted to avoid having to create migrations whenever we were adding more data to rates.

Concept

The whole idea of Shameless is to split a regular SQL table into index tables and content tables. Index tables map the fields you want to query by to UUIDs, content tables map UUIDs to model contents (bodies). In addition, both index and content tables are sharded.

The body of the model is schema-less, you can store arbitrary data structures in it. Under the hood, the body is serialized using MessagePack and stored as a blob in a single database column (hence the need for index tables).

The process of querying for records can be described as:

  1. Query the index tables by index fields (e.g. hotel ID, check-in date, and length of stay), sharded by hotel ID, getting get back a list of UUIDs
  2. Query the content tables, sharded by UUID, for most recent version of model

Inserting a record is similar:

  1. Generate a UUID
  2. Serialize and write model content into appropriate shard of the content tables
  3. Insert a row (index fields + model UUID) to the appropriate shard of the index table

Inserting a new version of an existing record is even simpler:

  1. Increment version
  2. Serialize and write model content into appropriate shard of the content tables

Naturally, shameless hides all that complexity behind a straight-forward API.

Usage

Creating a store

The core object of shameless is a Store. Here's how you can set one up:

# config/initializers/rate_store.rb

RateStore = Shameless::Store.new(:rate_store) do |c|
  c.partition_urls = [ENV['RATE_STORE_DATABASE_URL_0'], ENV['RATE_STORE_DATABASE_URL_1']
  c.shards_count = 512 # total number of shards across all partitions
  c.connection_options = {max_connections: 10} # connection options passed to `Sequel.connect`
  c.database_extensions = [:newrelic_instrumentation]
  c.create_table_options = {engine: "InnoDB"} # passed to Sequel's `create_table`
end

The initializer argument (:rate_store) defines the namespace by which all tables will be prefixed, in this case rate_store_. If you pass nil, there will be no prefix.

Once you've got the Store configured, you can declare models.

Declaring models

Models specify the kinds of entities you want to persist in your store. Models are simple Ruby classes (even anonymous) that you attach to a Store using Store#attach(model), e.g.:

# app/models/rate.rb

class Rate
  RateStore.attach(self)

  # ...
end

By default, this will map to tables called rate_store_rate_[000000-000511] by lowercasing the class name. You can also provide the table namespace using a second argument, e.g.:

my_model = Class.new do
  RateStore.attach(self, :rates)
end

A model is useless without indices. Let's see how to define them.

Defining indices

Indices are a crucial component of shameless. They allow us to perform fast lookups for model UUIDs. Here's how you define an index:

class Rate
  RateStore.attach(self)

  index do
    integer :hotel_id
    string :room_type
    string :check_in_date # at the moment, only integer and string types are supported

    shard_on :hotel_id # required, values need to be numeric
  end
end

The default index is called a primary index, the corresponding tables would be called rate_store_rate_primary_index_[000000-000511]. You can add additional indices you'd like to query by:

class Rate
  RateStore.attach(self)

  index do
    # ..
  end

  index :secondary do
    integer :hotel_id
    string :gateway
    string :discount_type

    shard_on :hotel_id
  end
end

Defining cells

Model content is stored in blobs called "cells". You can think of cells as separate model columns that can store rich data structures and can change independently over time. The default cell is called "base" (that's what all model-level accessors delegate to), but you can declare additional cells using Model.cell:

class Rate
  RateStore.attach(self)

  index do
    # ..
  end

  cell :meta
end

Reading/writing

To write data to the model, use Model.put Model#save, Cell#save, Model#update, or Cell#update. Model.put will perform an "upsert", i.e. it will try to find an existing record with the given index fields, and insert a new version for that record's base cell if it finds one, or pick a new UUID, write the first version of the base cell, and write to all indices otherwise.

Here are some examples of how you can read and write data from/to a shameless store:

# Writing - all index fields are required, the rest is the schemaless content
rate = Rate.put(hotel_id: 1, room_type: '1 bed', check_in_date: Date.today, gateway: 'pegasus', discount_type: 'geo', net_price: 120.0)
rate[:net_price] # => 120.0 # access in the "base" cell

# Create a new version of the "base" cell
rate[:net_price] = 130.0
rate.save

# You can also access the "base" cell explicitly
rate.base[:net_price] = 140.0
rate.base.save

# Reading from/writing to a different cell is simple, too:
rate.meta[:hotel_enabled] = true
rate.meta.save

# You can also do that in one go using `Model#update` or `Cell#update`. This writes a new
# version of the cell, merging the hash passed in as parameter with existing values.
rate.update(tax_rate: 11.0, gateway: 'pegasus')
rate.body # => {net_price: 140.0, tax_rate: 11.0, gateway: 'pegasus'}

rate.meta.update(hotel_enabled: false)

To query, use Model.where (also using Sequel's virtual row blocks):

# Querying by primary index
rates = Rate.where(hotel_id: 1, room_type: '1 bed', check_in_date: Date.today)

# Querying by a named index
rates = Rate.secondary_index.where(hotel_id: 1, gateway: 'pegasus', discount_type: 'geo')
rates.first[:net_price] # => 130.0

# Query using Sequel's virtual row block (handy for inequality operators)
rates = Rate.where(hotel_id: 1, room_type: '1 bed') { check_in_date > Date.today }

To access a cell field that you're not sure has a value, you can use and Cell#fetch (Model#fetch delegates to the base cell) to get a value from a cell, or a default, e.g.:

rate[:net_price] = 130.0
rate.fetch(:net_price, 100) # => 130.0
rate.meta.fetch(:enabled, true) # => true

Cells are versioned, the current version is stored in a column called ref_key. The first version of a cell has a ref_key of zero. To access a previous version of a cell, use Cell#previous (Model#previous delegates to the base cell). Example:

# ...
rate[:net_price] # => 120.0
rate.ref_key # => 1
rate.update(net_price: 130.0)
rate.ref_key # => 2
rate.previous[:net_price] # => 120.0
rate.previous.ref_key # => 1
rate.previous.previous.ref_key # => 0
rate.previous.previous.previous # => nil

It could happen that another process may have updated a model/cell. To fetch the latest state, use Cell#reload (Model#reload reloads all cells), e.g.:

rate[:net_price] # => 120.0

# Another process updates the cell
Rate.where(hotel_id: rate[:hotel_id]).first.update(net_price: 130.0)

rate[:net_price] # => 120.0
rate.reload
rate[:net_price] # => 130.0

To check if a given cell exists, use Cell#present? (as you can suspect, Model#present? delegates to the base cell). You can also use Model#cells to iterate over all cells, e.g.:

rate.present? # => true
rate.meta.present? # => false

rate.cells.any?(&:present?) # => true

To see the cell's full state (body + metadata), use Cell#as_json (Model#as_json delegates to the base cell), e.g.:

rate.as_json # => {id: 123, uuid: "...", created_at: "...", column_name: "base", ref_key: 3,
  # body: {hotel_id: 1, check_in_date: "2017-01-03", room_type: "ROH", net_price: 130.0}}

Creating tables

To create all shards for all tables, across all partitions, run:

RateStore.create_tables!

This will create the underlying index tables, content tables, together with database indices for fast access.

Concurrent writes

Since writes to shameless aren't atomic, concurrency control needs to be moved to application code. We're using a setup where almost all our writes go through queues, one queue per shard. We're using Store#each_shard to match queue names to shards and to aggregate queue stats across all shards.

Using shameless as a data stream store (similar to Kafka)

Thanks to storing each write to shameless as a new record in the underlying database table, we're able to use our shameless store as a log for stream processing. For each shard, we have a worker that goes through all new entries in that shard and triggers various event processors to handle all kinds of asynchronous work. For that purpose, we're using Model.fetch_latest_cells(shard:, cursor:, limit:), and incrementing a cursor (stored in Redis) after each record has been processed successfully. We're using the cells' IDs as cursors, using Cell#id, which returns the underlying table's primary key value.

Utilities

Sometimes it may be useful to know where a given model will end up, based on its shardable value. For this, you can use Store#find_shard(shardable_value), e.g.: RateStore.find_shard(hotel.id) # => 196.

Installation

Add this line to your application's Gemfile:

gem 'shameless'

And then execute:

$ bundle

Or install it yourself as:

$ gem install shameless

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/hoteltonight/shameless.

License

The gem is available as open source under the terms of the MIT License.

Credits

Shameless was inspired by the following resources: