SQLiteSweep
Query millions of SQLite databases across remote hosts via SSH, aggregating results in real-time. Designed for multi-tenant apps where each tenant has their own SQLite database.
Installation
gem install sqlitesweepOr add to your Gemfile:
gem "sqlitesweep"Usage
sqlitesweep \
-q 'SELECT count(*) FROM products WHERE price > 0' \
-a sum \
-s 'rails runner "Account.active.find_each { |a| puts a.db_uri }"' \
-c 16The -s (source) command should output one database URI per line. URIs can be:
- Local paths:
/data/tenants/acme.sqlite3 - File URIs:
file:///data/tenants/acme.sqlite3 - SSH URIs:
ssh://deploy@web1.example.com/data/tenants/acme.sqlite3
Results go to stdout, progress goes to stderr — so it's pipe-friendly:
sqlitesweep -q "SELECT count(*) FROM users" -a sum -s "cat db_uris.txt" > result.txtActions
| Action | Description |
|---|---|
sum |
Sum the first column across all databases (default) |
average / avg
|
Average the first column across all databases |
list |
Write all rows to a JSONL file, print the file path |
Options
| Flag | Long | Default | Description |
|---|---|---|---|
-q |
--query |
(required) | SQL query to execute on each database |
-a |
--action |
sum |
sum, average/avg, or list
|
-s |
--source |
(required) | Shell command that outputs database URIs |
-c |
--concurrency |
8 |
Max parallel query workers |
--max-ssh |
50 |
Max SSH master connections | |
--no-live |
false |
Disable live progress display | |
--batch-size |
4 |
Databases to query per SSH call | |
--ssh-timeout |
10 |
SSH connect timeout (seconds) | |
--query-timeout |
30 |
Per-query timeout (seconds) |
Architecture
┌──────────────────────────────────────────────────────────────────────────────┐
│ sqlitesweep │
└──────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────┐
│ Source Command (-s) │ Any shell command that outputs database
│ │ URIs to stdout, one per line.
│ rails runner "..." │ e.g. Rails runner, cat, curl, script...
└─────────────┬───────────────┘
│
│ streams URIs line by line
▼
┌─────────────────────────────┐
│ URI Router │ Parses each URI and routes it:
│ │
│ /local/path.sqlite3 ─────────▶ direct to Worker Pool
│ ssh://user@host/path ────────▶ to Host Batcher
└─────────────────────────────┘
│
┌────────┴────────┐
▼ ▼
┌──────────┐ ┌──────────────────────────────────────────────────────┐
│ Local │ │ Host Batcher │
│ Query │ │ │
│ │ │ Groups URIs by host. Flushes when a batch reaches │
│ Opens │ │ --batch-size (default 4) or after 200ms timeout. │
│ SQLite │ │ │
│ directly│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ via gem │ │ │ @host-1 │ │ @host-2 │ │ @host-3 │ ... │
│ │ │ │ db1,db2 │ │ db4 │ │ db5,db6 │ │
│ │ │ │ db3 │ │ │ │ db7,db8 │ │
│ │ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
└────┬─────┘ └───────┼─────────────┼─────────────┼────────────────┘
│ │ │ │
│ ▼ ▼ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ SSH Connection Manager │
│ │ │
│ │ One ControlMaster per host (reused across all │
│ │ queries to that host). Multiplexed via Unix │
│ │ socket — no repeated TCP/auth handshakes. │
│ │ │
│ │ Semaphore caps total masters at --max-ssh. │
│ │ BatchMode=yes — no password prompts. │
│ └──────────────────────┬──────────────────────────┘
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ Remote Query (one SSH call per batch) │
│ │ │
│ │ ssh user@host ' │
│ │ printf "%s\t" /path/db1; │
│ │ sqlite3 -json /path/db1 "SELECT ..."; echo; │
│ │ printf "%s\t" /path/db2; │
│ │ sqlite3 -json /path/db2 "SELECT ..."; echo; │
│ │ ' │
│ │ │
│ │ Multiple databases queried in a single SSH │
│ │ round-trip. Results parsed from tab-delimited │
│ │ output back into individual results. │
│ └──────────────────────┬──────────────────────────┘
│ │
▼ ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Worker Pool (-c, default 8 threads) │
│ │
│ Fixed thread pool runs queries in parallel. Back-pressure via │
│ caller-runs policy: when all threads are busy, the submitting │
│ thread executes the work itself instead of queuing unboundedly. │
└───────────────────────────────┬──────────────────────────────────────┘
│
│ results (thread-safe)
▼
┌──────────────────────────────────────────────────────────────────────┐
│ Aggregator │
│ │
│ Mutex-protected accumulator. Mode depends on --action: │
│ │
│ sum Running total of first column value from each database. │
│ average Running total / number of databases queried. │
│ list Streams rows to a JSONL temp file (bounded memory). │
└───────────────────────────────┬──────────────────────────────────────┘
│
┌─────────────────┴─────────────────┐
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────┐
│ Live Display (stderr) │ │ Final Result (stdout) │
│ │ │ │
│ Queried: 8421 │ │ sum/average ▶ number │
│ Errors: 2 │ │ list ▶ /tmp/file.jsonl │
│ Rate: 1204/s │ │ │
│ Elapsed: 7.0s │ │ Pipe-friendly: stderr │
│ Result: 421052 │ │ for progress, stdout │
│ │ │ for the answer. │
└──────────────────────────┘ └──────────────────────────┘
Requirements
- Ruby >= 4.0
-
sqlite3available on remote hosts (for SSH queries) - SSH agent or key-based auth configured (BatchMode is enforced — no password prompts)
Development
bundle install
bundle exec rake testRun the integration test:
bundle exec ruby test/integration/test_local_sweep.rbWatch the live progress display with a slow-drip demo:
ruby test/integration/harness/live_demo.rb
ruby test/integration/harness/live_demo.rb --count 50 --delay 0.5
ruby test/integration/harness/live_demo.rb --action listRun the benchmark:
ruby test/integration/harness/sweep_bench.rb --db-count 1000Docker-based SSH testing
docker compose -f test/integration/harness/docker-compose.yml up -d
ruby test/integration/harness/sweep_bench.rb --docker --db-count 500License
MIT
