Handles mass data imports via temporary staging tables, supporting PostgreSQL and MySQL, with a clean DSL for table lifecycle management and built-in bulk insert capabilities.

🎭 StagingTable

The red carpet for your data before it hits the main stage.


Stop shoving data directly into your production tables like a savage. Give it a dressing room first!

StagingTable lets you bulk import data into a temporary "staging" table, validate/massage it, and then gracefully transfer it to your real tables using efficient SQL strategies.

It's like INSERT INTO ... SELECT but with a Ruby DSL that makes you smile.
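Conceptually, the transfer step boils down to a single statement like the sketch below. This is an illustrative string only: the table name `users_staging` and the column list are hypothetical, and the actual SQL is generated per adapter and per options.

```ruby
# Illustrative only: roughly the shape of the statement a transfer issues.
# "users_staging", the columns, and the conflict clause are hypothetical
# examples, not output generated by the gem.
transfer_sql = <<~SQL
  INSERT INTO users (name, email)
  SELECT name, email FROM users_staging
  ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name
SQL

puts transfer_sql
```

The DSL's job is to build, run, and clean up after statements like this one so you never write them by hand.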

🌟 Why?

Importing large datasets is hard.

  • Direct inserts are slow and bypass validations.
  • ActiveRecord is safe but slow for millions of records.
  • Raw SQL is fast but messy and hard to maintain.

StagingTable gives you the best of both worlds:

  1. 🚀 Speed: Bulk insert into a temp table (no index overhead yet).
  2. 🛡️ Safety: Validate or query the data before it touches your real table.
  3. 🧹 Cleanliness: Automatic cleanup of temp tables.
  4. 🔄 Power: Built-in support for UPSERT (INSERT ON CONFLICT) and duplicate handling.

📦 Installation

Add this line to your application's Gemfile:

gem 'staging_table'

And then execute:

bundle install

๐Ÿ› ๏ธ Usage

The "Happy Path" (Block Syntax)

The simplest way to use StagingTable. It handles the creation and cleanup of the temporary table automatically.

# 1. Create a staging table that mirrors the 'users' table
StagingTable.stage(User) do |staging|
  
  # 2. Bulk insert data (Hashes, AR Objects, or Relations)
  staging.insert([
    { name: 'John Doe', email: 'john@example.com' },
    { name: 'Jane Doe', email: 'jane@example.com' }
  ])
  
  # 3. The 'staging' object is a real ActiveRecord model!
  #    You can query it, validate it, or massage data.
  puts "Staged count: #{staging.count}"
  staging.where(email: nil).delete_all
  
  # 4. When the block exits, data is automatically transferred 
  #    to the 'users' table using a single SQL statement.
end

📥 Importing Data

The insert method is flexible. Feed it whatever you have:

StagingTable.stage(User) do |staging|
  # ๐ŸŽ Array of Hashes
  staging.insert([
    { name: 'John', email: 'john@example.com' },
    { name: 'Jane', email: 'jane@example.com' }
  ])

  # ๐ŸŠ Array of ActiveRecord objects
  staging.insert(User.where(active: true).to_a)

  # ๐Ÿ‡ ActiveRecord::Relation (Lazy loading)
  staging.insert(User.where(role: 'admin'))
end

For massive datasets, use insert_from_query to process in batches and keep memory usage low:

StagingTable.stage(User) do |staging|
  # Processes in batches of 1000 (configurable)
  staging.insert_from_query(User.where(needs_migration: true))
end

โš”๏ธ Handling Duplicates (Upsert)

Don't let duplicates crash your party. Configure the transfer strategy to handle conflicts gracefully.

StagingTable.stage(User, 
  transfer_strategy: :upsert,    # Default is :insert
  conflict_target: [:email],     # Column(s) to check for conflicts
  conflict_action: :update       # :update (overwrite) or :ignore (skip)
) do |staging|
  staging.insert(records)
end

📊 Transfer Results

Every transfer returns a TransferResult with detailed statistics:

result = StagingTable.stage(User) do |staging|
  staging.insert(records)
end

puts result.inserted  # => 450  (new records)
puts result.updated   # => 50   (updated via upsert)
puts result.skipped   # => 10   (ignored conflicts)
puts result.total     # => 510  (total processed)
puts result.success?  # => true (any inserts or updates?)

# Also available as a hash
result.to_h # => { inserted: 450, updated: 50, skipped: 10, total: 510 }
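The result arithmetic is simple. Here is a toy stand-in (a plain Ruby `Struct`, deliberately named `ToyResult` because it is not the gem's actual class) that illustrates the interface described above:

```ruby
# Toy stand-in for TransferResult -- illustrates the interface only,
# this is NOT the gem's implementation.
ToyResult = Struct.new(:inserted, :updated, :skipped, keyword_init: true) do
  # total is derived from the three counters, not stored separately
  def total
    inserted + updated + skipped
  end

  # success? is true when anything actually landed in the target table
  def success?
    inserted.positive? || updated.positive?
  end

  def to_h
    { inserted: inserted, updated: updated, skipped: skipped, total: total }
  end
end

r = ToyResult.new(inserted: 450, updated: 50, skipped: 10)
puts r.total    # => 510
puts r.success? # => true
```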

๐Ÿช Callbacks

Hook into the staging lifecycle to validate, transform, or log:

StagingTable.stage(User,
  before_insert: ->(session) {
    Rails.logger.info "Starting import..."
  },
  after_insert: ->(session, records) {
    Rails.logger.info "Staged #{records.count} records"
  },
  before_transfer: ->(session) {
    # Clean up invalid data before transfer
    session.where(email: nil).delete_all
    session.where(status: 'banned').delete_all
  },
  after_transfer: ->(session, result) {
    Rails.logger.info "Imported #{result.inserted} new, updated #{result.updated}"
  }
) do |staging|
  staging.insert(records)
end

📡 Instrumentation (ActiveSupport::Notifications)

Monitor and debug your imports in production with built-in instrumentation:

# Subscribe to transfer events
StagingTable::Instrumentation.subscribe(:transfer) do |event|
  Rails.logger.info "[StagingTable] Transfer to #{event.payload[:source_table]} " \
                    "completed in #{event.duration.round(2)}ms"
  StatsD.measure('staging_table.transfer.duration', event.duration)
  StatsD.increment('staging_table.transfer.inserted', event.payload[:result].inserted)
end

# Subscribe to all StagingTable events
StagingTable::Instrumentation.subscribe_all do |event|
  Rails.logger.debug "[StagingTable] #{event.name}: #{event.duration.round(2)}ms"
end

Available Events:

| Event | Payload | Description |
|-------|---------|-------------|
| `staging_table.stage` | `source_model`, `source_table`, `options`, `result` | Wraps the entire staging block |
| `staging_table.create_table` | `source_model`, `source_table`, `staging_table` | When the staging table is created |
| `staging_table.insert` | `source_model`, `source_table`, `staging_table`, `record_count`, `batch_size` | When records are inserted |
| `staging_table.transfer` | `source_model`, `source_table`, `staging_table`, `strategy`, `staged_count`, `result` | When data is transferred |
| `staging_table.drop_table` | `source_model`, `source_table`, `staging_table` | When the staging table is dropped |

You can also use standard ActiveSupport::Notifications directly:

ActiveSupport::Notifications.subscribe('staging_table.transfer') do |event|
  # Your monitoring code here
end

๐ŸŽ›๏ธ Manual Control

Need to keep the staging table alive across multiple background jobs? We got you.

Note: Temporary tables in PostgreSQL are session-specific. This only works if you stay in the same DB connection!

# Create the session
session = StagingTable::Session.new(User, excluded_columns: %w[created_at updated_at])
session.create_table

begin
  # Insert data in chunks
  session.insert(batch_1)
  session.insert(batch_2)
  
  # Run some sanity checks
  if session.where(status: 'banned').exists?
    raise "Whoa there! No banned users allowed."
  end
  
  # Commit to the real table
  result = session.transfer
  puts "Transferred #{result.total} records"
ensure
  # Always clean up your mess
  session.drop_table
end

โš™๏ธ Configuration

Set global defaults in an initializer (e.g., config/initializers/staging_table.rb):

StagingTable.configure do |config|
  config.default_batch_size = 2000
  config.default_transfer_strategy = :insert # or :upsert
end

๐Ÿ” Type Checking (RBS)

This gem ships with RBS type signatures for static type checking. The signatures are located in the sig/ directory and are validated in CI.

Using the Type Signatures

If you want to type-check your own code that uses StagingTable:

# Install RBS and the collection for dependencies
gem install rbs
rbs collection install

# Validate signatures
rbs -I sig validate

Using with Steep

For full type checking with Steep:

# Gemfile
gem 'steep', group: :development

# Steepfile
target :lib do
  signature "sig"
  check "lib"
  library "activerecord"
  library "activesupport"
end

# Then run the checker
bundle exec steep check

💾 Supported Databases

We speak your language.

| Database | Strategy |
|----------|----------|
| PostgreSQL | `CREATE TABLE ... (LIKE ... INCLUDING DEFAULTS)` + `INSERT ... ON CONFLICT` |
| MySQL | `CREATE TABLE ... LIKE ...` + `INSERT ... ON DUPLICATE KEY UPDATE` |
| SQLite | Copies structure from `sqlite_master` + `INSERT ... ON CONFLICT` |

๐Ÿค Contributing

Found a bug? Want to add support for Oracle? (Please don't, but if you must...)

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Running Tests

We support the big three. Set up your environment variables for PG/MySQL or just run SQLite tests out of the box.

# Run everything
bundle exec rake spec

# Pick your poison
bundle exec rake spec:postgresql
bundle exec rake spec:mysql
bundle exec rspec --tag sqlite

๐Ÿ™ Special Thanks

Special thanks to agustin-peluffo, who created the first implementation for a project!


Made with โค๏ธ by eagerworks