# 🎭 StagingTable
The red carpet for your data before it hits the main stage.
Stop shoving data directly into your production tables like a savage. Give it a dressing room first!
StagingTable lets you bulk import data into a temporary "staging" table, validate/massage it, and then gracefully transfer it to your real tables using efficient SQL strategies.
It's like `INSERT INTO ... SELECT` but with a Ruby DSL that makes you smile.
## 🤔 Why?
Importing large datasets is hard.
- Direct inserts bypass validations, and doing them row by row is slow.
- ActiveRecord is safe but slow for millions of records.
- Raw SQL is fast but messy and hard to maintain.
StagingTable gives you the best of both worlds:
- 🚀 Speed: Bulk insert into a temp table (no index overhead yet).
- 🛡️ Safety: Validate or query the data before it touches your real table.
- 🧹 Cleanliness: Automatic cleanup of temp tables.
- 💪 Power: Built-in support for UPSERT (`INSERT ... ON CONFLICT`) and duplicate handling.
## 📦 Installation
Add this line to your application's Gemfile:
```ruby
gem 'staging_table'
```

And then execute:
```bash
bundle install
```

## 🛠️ Usage
The "Happy Path" (Block Syntax)
The simplest way to use StagingTable. It handles the creation and cleanup of the temporary table automatically.
```ruby
# 1. Create a staging table that mirrors the 'users' table
StagingTable.stage(User) do |staging|
  # 2. Bulk insert data (Hashes, AR Objects, or Relations)
  staging.insert([
    { name: 'John Doe', email: 'john@example.com' },
    { name: 'Jane Doe', email: 'jane@example.com' }
  ])

  # 3. The 'staging' object is a real ActiveRecord model!
  #    You can query it, validate it, or massage data.
  puts "Staged count: #{staging.count}"
  staging.where(email: nil).delete_all

  # 4. When the block exits, data is automatically transferred
  #    to the 'users' table using a single SQL statement.
end
```
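That final transfer is one set-based statement rather than row-by-row writes. On PostgreSQL, with the default `:insert` strategy, it boils down to something like this (staging table name illustrative):

```sql
-- Roughly the statement issued on transfer (names illustrative)
INSERT INTO users (name, email)
SELECT name, email FROM staging_users_abc123;
```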
## 📥 Importing Data

The `insert` method is flexible. Feed it whatever you have:
```ruby
StagingTable.stage(User) do |staging|
  # Array of Hashes
  staging.insert([
    { name: 'John', email: 'john@example.com' },
    { name: 'Jane', email: 'jane@example.com' }
  ])

  # Array of ActiveRecord objects
  staging.insert(User.where(active: true).to_a)

  # ActiveRecord::Relation (lazy loading)
  staging.insert(User.where(role: 'admin'))
end
```

For massive datasets, use `insert_from_query` to process in batches and keep memory usage low:
```ruby
StagingTable.stage(User) do |staging|
  # Processes in batches of 1000 (configurable)
  staging.insert_from_query(User.where(needs_migration: true))
end
```
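The batch size comes from the global configuration (see Configuration below); a quick sketch of shrinking it for memory-constrained workers:

```ruby
# In config/initializers/staging_table.rb: smaller batches, lower peak memory
StagingTable.configure do |config|
  config.default_batch_size = 500
end
```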
## ⚔️ Handling Duplicates (Upsert)

Don't let duplicates crash your party. Configure the transfer strategy to handle conflicts gracefully.
```ruby
StagingTable.stage(User,
  transfer_strategy: :upsert,  # Default is :insert
  conflict_target: [:email],   # Column(s) to check for conflicts
  conflict_action: :update     # :update (overwrite) or :ignore (skip)
) do |staging|
  staging.insert(records)
end
```
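The same options cover "insert if missing" imports too; a sketch using `:ignore` so conflicting rows are skipped rather than overwritten:

```ruby
StagingTable.stage(User,
  transfer_strategy: :upsert,
  conflict_target: [:email],
  conflict_action: :ignore # keep the existing row, skip the staged one
) do |staging|
  staging.insert(records)
end
```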
## 📊 Transfer Results

Every transfer returns a `TransferResult` with detailed statistics:
```ruby
result = StagingTable.stage(User) do |staging|
  staging.insert(records)
end

puts result.inserted  # => 450 (new records)
puts result.updated   # => 50 (updated via upsert)
puts result.skipped   # => 10 (ignored conflicts)
puts result.total     # => 510 (total processed)
puts result.success?  # => true (any inserts or updates?)

# Also available as a hash
result.to_h # => { inserted: 450, updated: 50, skipped: 10, total: 510 }
```

## 🪝 Callbacks
Hook into the staging lifecycle to validate, transform, or log:
```ruby
StagingTable.stage(User,
  before_insert: ->(session) {
    Rails.logger.info "Starting import..."
  },
  after_insert: ->(session, records) {
    Rails.logger.info "Staged #{records.count} records"
  },
  before_transfer: ->(session) {
    # Clean up invalid data before transfer
    session.where(email: nil).delete_all
    session.where(status: 'banned').delete_all
  },
  after_transfer: ->(session, result) {
    Rails.logger.info "Imported #{result.inserted} new, updated #{result.updated}"
  }
) do |staging|
  staging.insert(records)
end
```

## 📡 Instrumentation (ActiveSupport::Notifications)
Monitor and debug your imports in production with built-in instrumentation:
```ruby
# Subscribe to transfer events
StagingTable::Instrumentation.subscribe(:transfer) do |event|
  Rails.logger.info "[StagingTable] Transfer to #{event.payload[:source_table]} " \
                    "completed in #{event.duration.round(2)}ms"
  StatsD.measure('staging_table.transfer.duration', event.duration)
  StatsD.increment('staging_table.transfer.inserted', event.payload[:result].inserted)
end

# Subscribe to all StagingTable events
StagingTable::Instrumentation.subscribe_all do |event|
  Rails.logger.debug "[StagingTable] #{event.name}: #{event.duration.round(2)}ms"
end
```

**Available Events:**
| Event | Payload | Description |
|---|---|---|
| `staging_table.stage` | `source_model`, `source_table`, `options`, `result` | Wraps the entire staging block |
| `staging_table.create_table` | `source_model`, `source_table`, `staging_table` | When the staging table is created |
| `staging_table.insert` | `source_model`, `source_table`, `staging_table`, `record_count`, `batch_size` | When records are inserted |
| `staging_table.transfer` | `source_model`, `source_table`, `staging_table`, `strategy`, `staged_count`, `result` | When data is transferred |
| `staging_table.drop_table` | `source_model`, `source_table`, `staging_table` | When the staging table is dropped |
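Assuming the `subscribe` helper shown above accepts any of these event keys, you could track insert progress the same way:

```ruby
StagingTable::Instrumentation.subscribe(:insert) do |event|
  Rails.logger.info "[StagingTable] Staged #{event.payload[:record_count]} rows " \
                    "into #{event.payload[:staging_table]}"
end
```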
You can also use standard `ActiveSupport::Notifications` directly:

```ruby
ActiveSupport::Notifications.subscribe('staging_table.transfer') do |event|
  # Your monitoring code here
end
```

## 🎛️ Manual Control
Need to keep the staging table alive across multiple background jobs? We got you.
**Note:** Temporary tables in PostgreSQL are session-specific. This only works if you stay on the same DB connection!
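One way to guarantee a single connection for the whole lifecycle is to check one out explicitly; a sketch, assuming `StagingTable::Session` runs its SQL on the calling thread's ActiveRecord connection:

```ruby
# Pin one connection for the duration of the block so the
# temporary table stays visible to every statement inside it.
ActiveRecord::Base.connection_pool.with_connection do
  session = StagingTable::Session.new(User)
  session.create_table
  begin
    session.insert(records)
    session.transfer
  ensure
    session.drop_table
  end
end
```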
```ruby
# Create the session
session = StagingTable::Session.new(User, excluded_columns: %w[created_at updated_at])
session.create_table

begin
  # Insert data in chunks
  session.insert(batch_1)
  session.insert(batch_2)

  # Run some sanity checks
  if session.where(status: 'banned').exists?
    raise "Whoa there! No banned users allowed."
  end

  # Commit to the real table
  result = session.transfer
  puts "Transferred #{result.total} records"
ensure
  # Always clean up your mess
  session.drop_table
end
```

## ⚙️ Configuration
Set global defaults in an initializer (e.g., `config/initializers/staging_table.rb`):

```ruby
StagingTable.configure do |config|
  config.default_batch_size = 2000
  config.default_transfer_strategy = :insert # or :upsert
end
```
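Per-call options take precedence over these defaults (the `transfer_strategy:` example above relies on this); a quick sketch:

```ruby
# Global default stays :insert, but this particular import upserts
StagingTable.stage(User, transfer_strategy: :upsert, conflict_target: [:email]) do |staging|
  staging.insert(records)
end
```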
## 🔍 Type Checking (RBS)

This gem ships with RBS type signatures for static type checking. The signatures live in the `sig/` directory and are validated in CI.
### Using the Type Signatures
If you want to type-check your own code that uses StagingTable:
```bash
# Install RBS and the collection for dependencies
gem install rbs
rbs collection install

# Validate signatures
rbs -I sig validate
```

### Using with Steep
For full type checking with Steep:
```ruby
# Gemfile
gem 'steep', group: :development
```

```ruby
# Steepfile
target :lib do
  signature "sig"

  check "lib"
  library "activerecord"
  library "activesupport"
end
```

```bash
bundle exec steep check
```

## 💾 Supported Databases
We speak your language.
| Database | Strategy |
|---|---|
| PostgreSQL | `CREATE TABLE ... (LIKE ... INCLUDING DEFAULTS)` + `INSERT ... ON CONFLICT` |
| MySQL | `CREATE TABLE ... LIKE ...` + `INSERT ... ON DUPLICATE KEY UPDATE` |
| SQLite | Copies structure from `sqlite_master` + `INSERT ... ON CONFLICT` |
## 🤝 Contributing
Found a bug? Want to add support for Oracle? (Please don't, but if you must...)
- Fork it
- Create your feature branch (`git checkout -b my-new-feature`)
- Commit your changes (`git commit -am 'Add some feature'`)
- Push to the branch (`git push origin my-new-feature`)
- Create a new Pull Request
### Running Tests
We support the big three. Set up your environment variables for PG/MySQL or just run SQLite tests out of the box.
```bash
# Run everything
bundle exec rake spec

# Pick your poison
bundle exec rake spec:postgresql
bundle exec rake spec:mysql
bundle exec rspec --tag sqlite
```

## 🙏 Special Thanks
Special thanks to agustin-peluffo, who created the first implementation for a project!
Made with ❤️ by eagerworks