Grainery
Database seed storage system for Rails applications. Extract database records and generate seed files organized by database with automatic dependency resolution. Like a grainery stores grain, this gem stores and organizes your database seeds.
Note: This gem was developed with assistance from Claude, Anthropic's AI assistant. Claude helped with code generation, documentation, and testing strategies throughout the development process.
⚠️ Development Status: This gem is in active development and does not yet have a comprehensive test suite. While the core functionality has been tested manually, automated tests are planned for future releases. Use with caution in production environments.
Features
- ✅ Automatic database detection
- ✅ Dependency-aware loading (topological sort)
- ✅ Multi-database support
- ✅ Database schema dumping for all related databases
- ✅ Configurable per project
- ✅ Preserves custom seeds
- ✅ One seed file per table
- ✅ Clean separation of concerns
- ✅ Supports SQL Server, MySQL, PostgreSQL
- ✅ Test database management tasks
- ✅ Rails 6.1 - 8.x support
Installation
Add this line to your application's Gemfile:
gem 'grainery', path: 'grainery'
And then execute:
bundle install
Usage
1. Initialize Configuration
rake grainery:init_config
This task:
- Auto-detects all databases and model base classes
- Auto-detects anonymizable fields in your database schema (email, phone, SSN, Greek documents, etc.)
- Creates config/grainery.yml with the detected configuration
2. Harvest Data
# Harvest with limit (100 records per table) + schema dump + anonymization
rake grainery:generate
# Harvest ALL records + schema dump + anonymization (use with caution)
rake grainery:generate_all
# Harvest data only (no schema dump) + anonymization
rake grainery:generate_data_only
# Harvest without anonymization (raw production data - use with extreme caution!)
rake grainery:generate_raw
Note: By default, sensitive fields are anonymized using Faker. Configure anonymization in config/grainery.yml.
3. Load Seeds
# Load seeds only (blocked in production)
rake grainery:load
# Load schemas + seeds (blocked in production)
rake grainery:load_with_schema
# Override production protection (use with extreme caution!)
GRAINERY_ALLOW_PRODUCTION=true rake grainery:load
Note: Loading tasks are blocked in production by default to prevent accidental data loss.
This loads:
- Database schemas (if using load_with_schema)
- Harvested seeds (in dependency order)
- Custom seeds from db/seeds.rb (last)
Directory Structure
db/
├── grainery/                 # Harvested seeds (auto-generated)
│   ├── load_order.txt        # Load order respecting dependencies
│   ├── primary/              # Primary database
│   │   ├── schema.rb         # Database schema dump
│   │   ├── users.rb
│   │   ├── posts.rb
│   │   └── comments.rb
│   ├── other/                # Other database
│   │   ├── schema.rb         # Database schema dump
│   │   └── projects.rb
│   └── banking/              # Banking database
│       ├── schema.rb         # Database schema dump
│       └── employees.rb
└── seeds.rb                  # Custom seeds (loaded last)
Configuration
config/grainery.yml:
# Path for harvested seed files
grainery_path: db/grainery
# Database connection mappings
database_connections:
  primary:
    connection: test
    adapter: sqlserver
    model_base_class: ApplicationRecord
  other:
    connection: other
    adapter: sqlserver
    model_base_class: OtherDB
  # ... other databases
# Lookup tables (harvest all records)
lookup_tables: []
# Field anonymization (column_name => faker_method)
# Set to empty hash {} to disable anonymization
anonymize_fields:
  email: email
  first_name: first_name
  last_name: last_name
  name: name
  phone: phone_number
  phone_number: phone_number
  address: address
  street_address: street_address
  city: city
  state: state
  zip: zip_code
  zip_code: zip_code
  postal_code: zip_code
  ssn: ssn
  credit_card: credit_card_number
  password: password
  token: token
  api_key: api_key
  secret: secret
  iban: iban
  vat_number: greek_vat
  afm: greek_vat
  amka: greek_amka
  social_security_number: greek_amka
  ssn_greek: greek_amka
  personal_number: greek_personal_number
  personal_id: greek_personal_number
  afm_extended: greek_personal_number
  ada: greek_ada
  diavgeia_id: greek_ada
  decision_number: greek_ada
  adam: greek_adam
  adam_number: greek_adam
  procurement_id: greek_adam
  date_of_birth: date_of_birth
  birth_date: date_of_birth
  dob: date_of_birth
  birthdate: date_of_birth
  identity_number: identity_number
  id_number: identity_number
  national_id: identity_number
Available Rake Tasks
Grainery Tasks
# Initialize configuration
rake grainery:init_config
# Harvest data (with limit) + schema dump + anonymization
rake grainery:generate
# Harvest ALL records + schema dump + anonymization
rake grainery:generate_all
# Harvest data only (no schema dump) + anonymization
rake grainery:generate_data_only
# Harvest without anonymization (raw production data)
rake grainery:generate_raw
# Load harvested + custom seeds
rake grainery:load
# Load schemas + seeds + custom seeds
rake grainery:load_with_schema
# Clean grainery directory
rake grainery:clean
Test Database Tasks
# Setup clean test database (schema only)
rake test:db:setup_for_grainery
# or: rake db:test:setup_for_grainery
# Seed test database with grainery data
rake test:db:seed_with_grainery
# Reset and seed (one command)
rake test:db:reset_with_grainery
# or: rake db:test:reset_with_grainery
# Clean test database (truncate all tables)
rake test:db:clean
# or: rake db:test:clean
# Show test database statistics
rake test:db:stats
# or: rake db:test:stats
Dependency Resolution
Grainery automatically:
- Analyzes belongs_to associations
- Builds a dependency graph
- Performs a topological sort
- Generates load_order.txt
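The steps above can be sketched with Ruby's standard TSort module. This is a minimal illustration, not Grainery's actual implementation: the DependencyGraph class and the hard-coded dependency hash are assumptions standing in for whatever the gem derives from belongs_to associations.

```ruby
require "tsort"

# Hypothetical graph wrapper: each table maps to the tables it references
# through belongs_to. Table names are illustrative only.
class DependencyGraph
  include TSort

  def initialize(dependencies)
    @dependencies = dependencies
  end

  # TSort calls this to enumerate every node in the graph.
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end

  # TSort calls this to enumerate the nodes a given node depends on.
  def tsort_each_child(node, &block)
    @dependencies.fetch(node, []).each(&block)
  end

  # Returns tables ordered so each one appears after its dependencies.
  def load_order
    tsort
  end
end

graph = DependencyGraph.new({
  "users"      => [],
  "categories" => [],
  "posts"      => ["users", "categories"],
  "comments"   => ["posts", "users"]
})

order = graph.load_order
puts order.join("\n")
```

Parent tables come out before the tables that reference them, which is the property a load order needs for foreign keys to resolve.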
Example Load Order
# PRIMARY Database
primary/users.rb
primary/categories.rb
primary/posts.rb
primary/comments.rb
# OTHER Database
other/departments.rb
other/projects.rb
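A load order in this shape is straightforward to consume. The sketch below assumes the file format is exactly what the example shows (one relative path per line, with "#" lines as section comments); parse_load_order is a hypothetical helper, not part of Grainery.

```ruby
# Hypothetical helper: turns load-order text into a list of seed file paths,
# skipping blank lines and "# ..." section comments.
def parse_load_order(text)
  text.each_line
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
end

sample = <<~ORDER
  # PRIMARY Database
  primary/users.rb
  primary/categories.rb
  primary/posts.rb
  primary/comments.rb
  # OTHER Database
  other/departments.rb
  other/projects.rb
ORDER

paths = parse_load_order(sample)
puts paths
```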
Lookup Tables
For small reference tables (statuses, types, categories), Grainery can harvest all records instead of a limited sample.
Add to config/grainery.yml:
lookup_tables:
  - invoice_statuses
  - user_roles
  - categories
File Formats
Schema File Format
Each database gets a schema dump:
# Schema dump for primary database
# Generated: 2025-10-01 10:30:00
# Adapter: postgresql
ActiveRecord::Schema.define do
  create_table "users", force: :cascade do |t|
    t.string "email", null: false
    t.string "name"
    t.boolean "active", default: true
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end
  add_index "users", ["email"], unique: true
end
Seed File Format
Each table gets its own seed file:
# Harvested from primary database: users
# Records: 100
# Generated: 2025-10-01 10:30:00
User.create!([
  {
    email: "user1@example.com",
    name: "John Doe",
    active: true
  },
  {
    email: "user2@example.com",
    name: "Jane Smith",
    active: true
  }
])
Custom Seeds
Your custom seed logic in db/seeds.rb is preserved and loaded last.
Example db/seeds.rb:
# Custom seed logic
puts "Creating admin user..."
User.find_or_create_by!(email: 'admin@example.com') do |user|
  user.name = 'Admin'
  user.role = 'admin'
end
puts "Setting up application defaults..."
Setting.create!(key: 'app_name', value: 'My App')
Use Cases
Development
# Harvest production-like data for development with schemas
rake grainery:generate
rake grainery:load_with_schema
Testing
# Create test fixtures with schemas
rake grainery:generate
# In test setup, load schemas and seeds
rake grainery:load_with_schema
Staging
# Harvest production data (anonymized) with schemas
rake grainery:generate_all
# Deploy to staging
# Load on staging server with full schema
rake grainery:load_with_schema
Cross-Database Migration
# Export from one database system
rake grainery:generate_all # Captures schema + data
# Import to another database system
rake grainery:load_with_schema # Recreates schema + loads data
Safety Features
- Production Environment Protection: Destructive tasks (load, load_with_schema, test:db:*) are blocked in production
  - Requires explicit GRAINERY_ALLOW_PRODUCTION=true environment variable to override
  - Includes 5-second countdown when override is used
- Separate Directories: Harvested seeds never touch db/seeds.rb
- Dependency Order: Foreign keys respected automatically
- Custom Preservation: Your db/seeds.rb always loads last
- Clean Command: rake grainery:clean removes only harvested files
- Optional Schema Loading: Schemas only load when explicitly requested
- Per-Database Schemas: Each database gets an isolated schema file
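The production protection described above amounts to an environment check plus a delay. The following is a sketch under stated assumptions, not the gem's code: production_allowed? and its parameters (including the injectable sleeper, which exists only so the countdown can be skipped in tests) are hypothetical names.

```ruby
require "stringio"

# Hypothetical guard: returns true when a destructive task may proceed.
# Outside production it always allows; in production it requires
# GRAINERY_ALLOW_PRODUCTION=true and counts down before continuing.
def production_allowed?(rails_env:, env: ENV, countdown: 5, out: $stdout,
                        sleeper: ->(seconds) { sleep(seconds) })
  return true unless rails_env == "production"
  return false unless env["GRAINERY_ALLOW_PRODUCTION"] == "true"

  countdown.downto(1) do |remaining|
    out.puts "Loading into production in #{remaining}..."
    sleeper.call(1)
  end
  true
end

# Demonstration with a no-op sleeper so the example returns immediately.
log = StringIO.new
blocked = production_allowed?(rails_env: "production", env: {}, out: log)
allowed = production_allowed?(
  rails_env: "production",
  env: { "GRAINERY_ALLOW_PRODUCTION" => "true" },
  out: log,
  sleeper: ->(_seconds) {}
)
puts "without override: #{blocked}"
puts "with override: #{allowed}"
```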
Production Safety Matrix
Safe Operations (Read-Only):
- ✅ rake grainery:generate - Harvests data, no modifications
- ✅ rake grainery:generate_all - Harvests all data, no modifications
- ✅ rake grainery:generate_data_only - Harvests data only, no modifications
- ✅ rake grainery:init_config - Creates config file only
- ✅ rake grainery:clean - Deletes harvested files only (not database data)
Destructive Operations (Blocked by Default):
- ❌ rake grainery:load - Inserts data into database
- ❌ rake grainery:load_with_schema - Modifies schema AND inserts data
- ❌ rake test:db:* - All test database operations
Recommendation:
- Harvesting in production is safe and useful for creating staging/development fixtures
- Loading in production should be tested thoroughly in staging first due to lack of automated test coverage
- Always review generated files before loading into any environment
Data Anonymization
✅ Built-in Anonymization: Grainery automatically anonymizes sensitive fields using the Faker gem during harvest.
Automatic Detection
When you run rake grainery:init_config, Grainery automatically:
- Scans all database tables and columns
- Detects fields that should be anonymized based on naming patterns
- Adds them to config/grainery.yml with appropriate anonymization methods
Detected patterns include: email, phone, address, ssn, password, token, Greek documents (afm, amka, ada, adam), dates of birth, and more.
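Name-based detection of this kind can be sketched as a lookup from column-name patterns to anonymization methods. The pattern table below is an assumption chosen for illustration; the gem's real pattern list and matching rules may differ.

```ruby
# Hypothetical pattern table: column-name regex => anonymization method.
# The first matching pattern wins.
PATTERNS = {
  /email/i                => "email",
  /phone/i                => "phone_number",
  /\Assn\z/i              => "ssn",
  /\A(afm|vat_number)\z/i => "greek_vat",
  /\Aamka\z/i             => "greek_amka",
  /birth|\Adob\z/i        => "date_of_birth"
}.freeze

# Returns a hash of column name => method for columns that match a pattern.
def detect_anonymizable(columns)
  columns.each_with_object({}) do |column, detected|
    _pattern, faker_method = PATTERNS.find { |pattern, _| column.match?(pattern) }
    detected[column] = faker_method if faker_method
  end
end

columns = %w[id email work_phone afm created_at birth_date]
detected = detect_anonymizable(columns)
detected.each { |column, faker_method| puts "#{column}: #{faker_method}" }
```

Columns with no matching pattern (id and created_at here) are simply left out of the result.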
How It Works
When harvesting, Grainery automatically replaces sensitive field values with fake data:
# Original production data:
{ email: "john.doe@company.com", name: "John Doe", phone: "555-1234" }
# Anonymized in seed files:
{ email: "jane_smith@example.org", name: "Sarah Johnson", phone: "555-987-6543" }
Configuration
The config/grainery.yml file is automatically populated with detected fields during initialization. You can customize it as needed:
anonymize_fields:
  # Global field configuration (applies to all tables)
  email: email # Uses Faker::Internet.email
  first_name: first_name # Uses Faker::Name.first_name
  last_name: last_name # Uses Faker::Name.last_name
  name: name # Uses Faker::Name.name
  phone: phone_number # Uses Faker::PhoneNumber.phone_number
  ssn: ssn # Uses Faker::IDNumber.valid
  # Table-specific configuration (when same field appears in multiple tables)
  users.address: address # Only anonymize address in users table
  companies.address: skip # Don't anonymize address in companies table
  # Database.table-specific configuration (most specific)
  primary.users.email: email # Only for users table in primary database
  other.contacts.email: email # Only for contacts table in other database
Scoping Priority:
- database.table.field (highest priority - most specific)
- table.field (medium priority - table-specific)
- field (lowest priority - global)
When a field name appears in multiple tables, Grainery automatically uses scoped names during detection.
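The priority order can be expressed as a most-specific-first key lookup. This is a sketch of the rule as documented, with a hypothetical resolve_anonymizer helper; it treats skip and "no entry" the same way (both return nil), which is a simplification.

```ruby
# Hypothetical resolver: checks database.table.field, then table.field,
# then the bare field name. Returns the configured method name, or nil when
# the field has no entry or is explicitly set to "skip".
def resolve_anonymizer(config, database:, table:, field:)
  candidates = [
    "#{database}.#{table}.#{field}",
    "#{table}.#{field}",
    field
  ]
  key = candidates.find { |candidate| config.key?(candidate) }
  return nil if key.nil?

  value = config[key]
  value == "skip" ? nil : value
end

config = {
  "address"             => "address",
  "companies.address"   => "skip",
  "primary.users.email" => "email"
}

users_address     = resolve_anonymizer(config, database: "primary", table: "users", field: "address")
companies_address = resolve_anonymizer(config, database: "primary", table: "companies", field: "address")
users_email       = resolve_anonymizer(config, database: "primary", table: "users", field: "email")
other_email       = resolve_anonymizer(config, database: "other", table: "users", field: "email")

puts users_address.inspect
puts companies_address.inspect
puts users_email.inspect
puts other_email.inspect
```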
Disabling Anonymization
Option 1: Disable completely
# Set to empty hash
anonymize_fields: {}
Option 2: Use raw generation task
rake grainery:generate_raw # Harvests without anonymization
Option 3: Skip specific fields
To keep real values for specific fields while anonymizing others, set them to skip:
anonymize_fields:
  email: email # Will be anonymized
  name: name # Will be anonymized
  company_name: skip # Will keep real value (not anonymized)
  department: skip # Will keep real value (not anonymized)
  phone: phone_number # Will be anonymized
This is useful when you need to preserve certain non-sensitive reference data while still protecting personal information.
Supported Faker Methods
Personal Information:
- email - Fake email addresses
- first_name, last_name, name - Fake names
- phone_number - Fake phone numbers
- address, street_address - Fake addresses
- city, state, zip_code, postal_code - Fake location data
- date_of_birth - Fake date of birth preserving approximate age (±2 years, minimum age 18 to preserve adulthood)
Financial & Identity:
- ssn - Fake social security numbers
- credit_card_number - Fake credit card numbers
- iban - Fake Greek IBAN (27 characters: GR + check digits + bank code + account number, auto-truncates to column size)
- greek_vat - Fake Greek VAT number (AFM - 9 digits, adjusts to column size)
- greek_amka - Fake Greek AMKA/Social Security Number (11 digits: DDMMYY + 5 digits, adjusts to column size)
- greek_personal_number - Fake Greek Personal Number (12 characters: 2 digits + letter + 9-digit AFM, e.g., "12A123456789", adjusts to column size)
- greek_ada - Fake Greek ADA/Diavgeia Decision Number (15 characters: 4 Greek letters + 2 digits + 4 Greek letters + dash + 1 digit + 2 Greek letters, e.g., "ΨΜΦΡ69ΟΤΝΡ-9ΤΟ", adjusts to column size)
- greek_adam - Fake Greek ADAM/Public Procurement Publicity identifier (14-15 characters: 2 digits + PROC or REQ + 9 digits, e.g., "24REQ187755230" or "23PROC456789012", adjusts to column size)
- identity_number - Fake identity number (alphanumeric format, adjusts to column size)
Security:
- password - Fake passwords (auto-truncates to column size)
- token - Random alphanumeric strings (defaults to 32 characters, adjusts to column size)
- api_key - Random alphanumeric strings (defaults to 40 characters, adjusts to column size)
- secret - Random alphanumeric strings (defaults to 64 characters, adjusts to column size)
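As an illustration of "defaults to N characters, adjusts to column size", the sketch below generates a random alphanumeric value no longer than the column allows. fake_token and its parameters are hypothetical names, and using SecureRandom here is an assumption rather than a statement about how Grainery generates these values.

```ruby
require "securerandom"

# Hypothetical generator: produces a random alphanumeric string of the default
# length, shortened when the column's size limit is smaller.
def fake_token(default_length: 32, column_limit: nil)
  length = column_limit ? [default_length, column_limit].min : default_length
  SecureRandom.alphanumeric(length)
end

token   = fake_token                                        # 32 characters
api_key = fake_token(default_length: 40)                    # 40 characters
short   = fake_token(default_length: 64, column_limit: 20)  # capped at 20

puts token.length
puts api_key.length
puts short.length
```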
Custom Field Mapping
Add your own field mappings to anonymize custom columns:
anonymize_fields:
  # Global mappings
  employee_id: ssn
  mobile: phone_number
  home_address: address
  work_email: email
  tax_id: ssn
  bank_account: iban
  tin: greek_vat
  social_insurance: greek_amka
  citizen_id: greek_personal_number
  passport_number: identity_number
  diavgeia_decision: greek_ada
  procurement_number: greek_adam
  birth_date: date_of_birth
  # Scoped examples for duplicate fields
  users.status: skip # Don't anonymize status in users
  orders.status: skip # Don't anonymize status in orders
  primary.employees.department: skip # Department in primary.employees
  other.staff.department: skip # Department in other.staff
  # Skip anonymization for non-sensitive fields
  company_name: skip
  department: skip
  job_title: skip
Important Notes
- Anonymization happens during harvest, not during load
- Generated seed files contain anonymized data
- Original production data is never modified
- Safe to commit anonymized seed files to version control
- Lookup tables are not anonymized (reference data)
- Anonymization can be disabled per-harvest using the generate_raw task
- Respects database constraints: Fake values are automatically truncated to match column size limits
- Type-aware: String fields respect their maximum length, numeric fields maintain their data type
- Selective anonymization: Use skip to preserve real values for specific fields while anonymizing others
- Scoped configuration: When the same field name appears in multiple tables, use table.field or database.table.field notation for table-specific or database-specific anonymization
Best Practices
- Use Limits: Start with rake grainery:generate (100 records)
- Review Load Order: Check db/grainery/load_order.txt
- Test Loading: Run rake grainery:load on a clean database first
- Commit Selectively: Consider .gitignore for large grainery files
- Custom Seeds Last: Keep application-specific logic in db/seeds.rb
Troubleshooting
Circular Dependencies
If you see "Circular dependency detected", check for:
- Self-referential associations
- Circular foreign keys
Solution: Temporarily remove optional: true or foreign_key: false
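To find which tables take part in a cycle, one option is to list the strongly connected components of the dependency graph. The sketch below uses Ruby's standard TSort module on a hand-written hash; dependency_cycles is a hypothetical helper and the table names are illustrative.

```ruby
require "tsort"

# Hypothetical helper: returns groups of tables that depend on each other,
# including a single table that references itself.
def dependency_cycles(dependencies)
  each_node  = lambda { |&block| dependencies.each_key(&block) }
  each_child = lambda { |node, &block| dependencies.fetch(node, []).each(&block) }

  TSort.strongly_connected_components(each_node, each_child).select do |component|
    self_referential = dependencies.fetch(component.first, []).include?(component.first)
    component.size > 1 || self_referential
  end
end

dependencies = {
  "users"     => [],
  "employees" => ["employees"],   # self-referential (e.g. a manager column)
  "orders"    => ["invoices"],    # orders and invoices reference each other
  "invoices"  => ["orders"]
}

cycles = dependency_cycles(dependencies)
cycles.each { |tables| puts "Cycle: #{tables.sort.join(', ')}" }
```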
Missing Records
If records fail to load:
- Check load_order.txt for correct ordering
- Verify foreign key constraints
- Review error messages in console output
Large Files
If seed files are too large:
# Use limit parameter
rake grainery:generate # 100 records per table (default)
Example Workflow
# 1. Initialize on first use
rake grainery:init_config
# 2. Harvest from production (with VPN/SSH tunnel)
RAILS_ENV=production rake grainery:generate
# 3. Review generated files
ls -la db/grainery/
# 4. Commit grainery files (optional)
git add db/grainery/
git commit -m "Add production seed data"
# 5. On another machine, pull and load
git pull
rake db:reset
rake grainery:load
# 6. Your custom seeds run automatically last
# db/seeds.rb is executed after all harvested seeds
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/mpantel/grainery.
License
The gem is available as open source under the terms of the MIT License.