PiiCipher
A Rails gem that enables searchable blind indexing for PII fields — powered by a Rust extension for performance.
PiiCipher handles the search layer of encrypted PII. It is designed to sit alongside Rails' built-in ActiveRecord::Encryption (encrypts :email), which handles the actual column encryption. Together they give you encrypted, searchable PII storage, a building block for GDPR compliance: the real value never touches the database as plaintext, and searching still works.
PiiCipher computes HMAC-SHA256 hashes of the plaintext value before it is encrypted, and stores those hashes in a separate column. Queries are rewritten to search the hashes — the ciphertext column is never scanned.
Two search modes are supported:
| Mode | Column type | Use case |
|---|---|---|
| Partial (default) | jsonb array | LIKE-style substring searches (e.g. searching "smi" matches "Smith") |
| Exact | string | Exact-match lookups (e.g. looking up a full SSN or email) |
How it works
Partial search — trigram blind indexing
For partial search, PiiCipher slides a window across the plaintext and HMAC-SHA256s each n-gram using your secret key. The window size defaults to 3 (trigrams) and is configurable per attribute with the gram_size: option:
"smith" → ["smi", "mit", "ith"] → [hmac("smi"), hmac("mit"), hmac("ith")]
By default values are downcased before hashing, so search is case-insensitive ("smi" matches "Smith"). Set case_sensitive: true to opt out.
These hashes are stored in a jsonb array column. Querying with where(email: "mit") generates the same hashes for the search term and uses a PostgreSQL @> (contains) check — no plaintext ever touches the database.
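The sliding-window scheme above can be sketched in plain Ruby with the standard openssl library. This is an illustration only, not PiiCipher's code (the gem does the hashing in Rust); SECRET_KEY and the method names are hypothetical, and keeping short values whole assumes the behavior described under Limitations.

```ruby
require "openssl"

# Hypothetical stand-in: PiiCipher reads the real key from ENV["PII_SECRET_KEY"].
SECRET_KEY = "example-secret-key"

# HMAC-SHA256 of one string, returned as a 64-character hex digest.
def hmac_hex(text)
  OpenSSL::HMAC.hexdigest("SHA256", SECRET_KEY, text)
end

# Slide a gram_size window across the value. Values shorter than the window
# are kept whole, matching the behavior listed under Limitations.
def ngrams(value, gram_size: 3, case_sensitive: false)
  text = case_sensitive ? value : value.downcase
  return [text] if text.length < gram_size

  text.chars.each_cons(gram_size).map(&:join)
end

# Blind index for one value: one HMAC per n-gram.
def ngram_hashes(value, **options)
  ngrams(value, **options).map { |gram| hmac_hex(gram) }
end

p ngrams("Smith")                       # => ["smi", "mit", "ith"]
p ngrams("Smith", case_sensitive: true) # => ["Smi", "mit", "ith"]

# Query side: hash the search term the same way, then check containment (like @>).
stored    = ngram_hashes("Smith")
contained = (ngram_hashes("mit") - stored).empty?
p contained # => true
```

Because the same key and the same normalization are applied on both the write and the query side, equal n-grams always produce equal hashes, which is what makes the containment check work without ever comparing plaintext.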
Partial search is approximate: @> matches when the stored array contains all of the search term's n-gram hashes, which is occasionally satisfied by values that don't actually contain the term as a contiguous substring. Treat it like a fast candidate filter; if you need exact substring semantics, re-filter the returned (decrypted) records in Ruby.
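The over-match case is easy to reproduce without a database. This sketch uses plain trigrams instead of HMACs for readability (hashing each gram deterministically preserves equality, so it does not change which arrays contain which), and an array-difference check stands in for PostgreSQL's @>. The method names and sample values are made up for illustration.

```ruby
# Plain trigrams (no HMAC) so the example is readable; the real index stores
# one HMAC per gram, which preserves equality and therefore containment.
def trigrams(text)
  text = text.downcase
  return [text] if text.length < 3

  text.chars.each_cons(3).map(&:join)
end

# Stand-in for `stored @> term`: every term gram must appear in the stored array.
def contains_all?(stored, term)
  (term - stored).empty?
end

values = ["abcd-xyz", "abc-bcd", "zzz"]
term   = "abcd" # trigrams: "abc", "bcd"

# Candidate filter, as the blind-index query would return it.
candidates = values.select { |v| contains_all?(trigrams(v), trigrams(term)) }
p candidates # => ["abcd-xyz", "abc-bcd"]  ("abc-bcd" is a false positive)

# Re-filter the decrypted values in Ruby for true substring semantics.
matches = candidates.select { |v| v.downcase.include?(term.downcase) }
p matches    # => ["abcd-xyz"]
```

"abc-bcd" contains both "abc" and "bcd", just not contiguously, so it passes the containment check; the Ruby re-filter removes it.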
Exact search — single blind index
For exact match, a single HMAC-SHA256 of the full value is stored in a regular string column. Querying generates the same hash and does a standard equality check.
Both hash functions live in a Rust extension (magnus bindings + the hmac and sha2 crates) and are called transparently from Ruby.
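For comparison, exact mode reduces to a single HMAC per value and an equality test. The pure-Ruby sketch below mirrors that idea with a toy in-memory "table"; it is not the Rust extension, the names are hypothetical, and downcasing assumes the default case_sensitive: false applies to exact indexes as well.

```ruby
require "openssl"

# Hypothetical stand-in for ENV["PII_SECRET_KEY"].
SECRET_KEY = "example-secret-key"

# Exact mode: one HMAC-SHA256 over the whole value. Downcasing assumes the
# default case_sensitive: false also applies to exact indexes.
def blind_index(value, case_sensitive: false)
  text = case_sensitive ? value : value.downcase
  OpenSSL::HMAC.hexdigest("SHA256", SECRET_KEY, text)
end

# A toy "table" holding only blind indexes, as the ssn_bidx column would.
rows = [
  { id: 1, ssn_bidx: blind_index("123-45-6789") },
  { id: 2, ssn_bidx: blind_index("987-65-4321") }
]

# Equality on the hash, like `WHERE ssn_bidx = ?`.
def lookup(rows, ssn)
  target = blind_index(ssn)
  rows.select { |row| row[:ssn_bidx] == target }.map { |row| row[:id] }
end

p lookup(rows, "123-45-6789") # => [1]
p lookup(rows, "123-45-678")  # => [] (no partial matching in exact mode)
```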
Column encryption (the full picture)
PiiCipher only generates the blind indexes — it does not encrypt the column itself. Column encryption is handled by Rails AR Encryption (encrypts). The two work at different layers and do not interfere:
user.save
├─ before_save (pii_cipher) → reads plaintext → writes hashes to email_bidx_array
└─ DB write (Rails AR Enc.) → encrypts plaintext → writes ciphertext to email column
Because Rails AR Encryption works at the DB serialization layer (not a callback), self.email always returns plaintext during before_save — pii_cipher always hashes the real value, never the ciphertext.
Requirements
- Ruby >= 3.1
- Rails / ActiveRecord >= 7.1 (Active Record Encryption ships in Rails 7.0+)
- PostgreSQL (partial search relies on the jsonb @> operator)
- Rust toolchain (only needed when building the gem from source)
Installation
Add to your Gemfile:
gem "pii_cipher"
Then run:
bundle install
Setup
1. Generate Rails AR Encryption keys
Run this once to generate the three keys Rails AR Encryption needs:
bin/rails db:encryption:init
Copy the output into your credentials file:
bin/rails credentials:edit
active_record_encryption:
  primary_key: <generated>
  deterministic_key: <generated>
  key_derivation_salt: <generated>
These keys encrypt and decrypt the column values. Keep them in your secrets manager: losing them means losing access to your data.
2. Set the PiiCipher secret key
PiiCipher reads the HMAC key from the PII_SECRET_KEY environment variable. Add it to your environment (e.g. via credentials, dotenv, or your secrets manager):
PII_SECRET_KEY=your-long-random-secret-here
Generate a secure random value with:
rails secret
Changing this key invalidates all existing blind indexes; you must re-save affected records to regenerate them.
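Why a key change breaks search can be seen in a few lines of plain Ruby: the same trigram hashed under two keys yields unrelated digests. SecureRandom.hex(64) produces a 128-character hex string, the same length as the output of rails secret; old_key is a hypothetical previous key.

```ruby
require "openssl"
require "securerandom"

# 64 random bytes as a 128-character hex string.
new_key = SecureRandom.hex(64)
old_key = "previous-secret" # hypothetical key the existing indexes were built with

# The same trigram hashed under each key.
old_hash = OpenSSL::HMAC.hexdigest("SHA256", old_key, "smi")
new_hash = OpenSSL::HMAC.hexdigest("SHA256", new_key, "smi")

# Different key => different hash, so indexes written under old_key are not
# found by queries hashed with new_key until the records are re-saved.
p old_hash == new_hash # => false
```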
3. Add blind index columns
For each encrypted attribute, add the corresponding blind index column in a migration.
Partial search (default — stores trigram hashes in a jsonb array):
class AddEmailBidxToUsers < ActiveRecord::Migration[8.1]
  def change
    add_column :users, :email_bidx_array, :jsonb
    add_index :users, :email_bidx_array, using: :gin
  end
end
Exact search (stores a single hash string):
class AddSsnBidxToUsers < ActiveRecord::Migration[8.1]
  def change
    add_column :users, :ssn_bidx, :string
    add_index :users, :ssn_bidx
  end
end
The GIN index on jsonb columns is strongly recommended for performance on partial searches.
4. Declare encrypted attributes in your model
Declare encrypts (Rails AR Encryption) first, then use_pii_cipher. Both must be present: encrypts protects the column, and use_pii_cipher makes it searchable.
class User < ApplicationRecord
  encrypts :email        # Rails: stores ciphertext in DB, decrypts on read
  use_pii_cipher :email  # pii_cipher: generates trigram blind indexes from plaintext
  encrypts :ssn
  use_pii_cipher :ssn, partial: false # exact-match blind index
end
Multiple attributes can be passed to use_pii_cipher in a single call:
encrypts :email, :phone_number
use_pii_cipher :email, :phone_number
Usage
Saving records
No changes to your existing create/update code. Everything happens automatically:
User.create!(email: "alice@example.com", ssn: "123-45-6789")
What happens under the hood:
- before_save (pii_cipher) reads "alice@example.com" as plaintext, generates trigram hashes, and writes them to email_bidx_array
- Rails AR Encryption encrypts "alice@example.com" and writes the ciphertext to the email column
What's in the database vs what Ruby sees
user = User.find(1)
# Ruby — always decrypted transparently by Rails
user.email
# => "alice@example.com"
# Raw database row — email column holds ciphertext, blind index holds hashes
# email => {"p":"Wd5LybiwJGPHYI...","h":{"iv":"XJul...","at":"Pk..."}}
# email_bidx_array => ["a3f2c1...", "9b4e7d...", ...]
Without the Active Record Encryption keys, nobody with direct database access can read the email. The blind index is just opaque hashes: without the PII_SECRET_KEY it does not reveal the original value, although identical n-grams produce identical hashes across rows.
Querying
Pass the plaintext value to where exactly as you normally would — PiiCipher intercepts encrypted columns and rewrites the query to search the blind index:
# Partial search — finds any user whose email contains "alice"
User.where(email: "alice")
# Exact search — finds the user with that exact SSN
User.where(ssn: "123-45-6789")
# Mix encrypted and plain columns freely
User.where(email: "alice", status: "active")
Rails decrypts the email of each returned record on the way out, so callers always receive plaintext. The interceptor only rewrites keys declared with use_pii_cipher; all other where conditions pass through to ActiveRecord unchanged.
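The interceptor's job can be illustrated with a framework-free sketch that maps a where-style hash to SQL fragments. It is not the gem's implementation, which works on ActiveRecord relations: PII_ATTRIBUTES, rewrite_conditions, and the [sql, bind] output shape are hypothetical, and downcasing assumes the default case_sensitive: false.

```ruby
require "openssl"
require "json"

SECRET_KEY = "example-secret-key" # hypothetical stand-in for ENV["PII_SECRET_KEY"]

# Hypothetical registry mirroring the use_pii_cipher declarations in the User model.
PII_ATTRIBUTES = {
  email: { partial: true, gram_size: 3 },
  ssn:   { partial: false }
}.freeze

def hmac_hex(text)
  OpenSSL::HMAC.hexdigest("SHA256", SECRET_KEY, text)
end

def ngrams(text, gram_size)
  return [text] if text.length < gram_size

  text.chars.each_cons(gram_size).map(&:join)
end

# Turn a where-style hash into [sql_fragment, bind_value] pairs. Declared
# attributes are redirected to their blind index column; others pass through.
def rewrite_conditions(conditions)
  conditions.map do |attr, value|
    options = PII_ATTRIBUTES[attr]
    if options.nil?
      ["#{attr} = ?", value]
    elsif options[:partial]
      hashes = ngrams(value.downcase, options[:gram_size]).map { |gram| hmac_hex(gram) }
      ["#{attr}_bidx_array @> ?::jsonb", JSON.generate(hashes)]
    else
      ["#{attr}_bidx = ?", hmac_hex(value.downcase)]
    end
  end
end

rewritten = rewrite_conditions({ email: "alice", status: "active" })
rewritten.each { |sql, _bind| puts sql }
# email_bidx_array @> ?::jsonb
# status = ?
```

The plaintext "alice" never appears in the generated SQL or its bind value; only the three trigram hashes do.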
Performance
Benchmarked on a local machine against PostgreSQL 18 with 100,000 rows. The comparison baseline is a plain (unencrypted) column with a standard index — the closest real-world alternative for each search type.
Writes
| Operation | Time (100k rows) |
|---|---|
| Plain insert | 1,221 ms |
| Encrypted insert | 2,861 ms (+134%) |
The overhead is not from the Rust hashing — that runs in microseconds. It comes from writing significantly more data per row: each record gains a jsonb array of 64-character HMAC hex strings (one per trigram) and a 64-character blind index string. Both the larger rows and the GIN index maintenance during insert contribute to the slower writes.
Reads
| Query type | Plain | Encrypted | Difference |
|---|---|---|---|
| Exact match (B-tree) | 0.121 ms | 0.095 ms | Within noise |
| Partial match (GIN) | 1.515 ms | 1.865 ms | +23% |
Exact match is effectively identical. Both paths hit a B-tree index; the lookup cost is the same regardless of what the key looks like.
Partial match is ~23% slower. The GIN index sizes end up comparable (see below), but PostgreSQL has to parse the jsonb array and evaluate the @> containment operator on each probe, which adds a small constant overhead that pg_trgm's native GIN operator doesn't pay.
Storage
| | Table total | Email index | Name GIN index |
|---|---|---|---|
| Plain | 21 MB | 5 MB | 7.2 MB |
| Encrypted | 89 MB | 12 MB | 7.0 MB |
The table is 4.2× larger. Every stored trigram hash is 64 characters regardless of what the original value looked like — a 5-character name still produces 3 trigrams × 64 chars = 192 bytes of blind index data. At large scale, this is the dominant cost to plan for.
The email B-tree index is 2.4× larger for the same reason (64-char hash vs ~25-char email). The name GIN index sizes are nearly identical — HMAC hashes repeat across rows the same way plain trigrams do (same input + same key = same hash), so the GIN posting lists compress similarly.
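The per-row arithmetic above generalizes: a value of n characters produces n - gram_size + 1 hashes of 64 hex characters each. A small helper (hypothetical names) makes it easy to estimate blind-index size for your own data; it counts hex characters only and ignores jsonb encoding and row overhead, so real usage is higher.

```ruby
# Hex characters of blind-index data for one value in partial mode:
# one 64-character HMAC-SHA256 hex digest per n-gram. Excludes jsonb
# encoding and row overhead, so on-disk usage is higher.
HEX_DIGEST_LENGTH = 64

def ngram_count(length, gram_size = 3)
  length < gram_size ? 1 : length - gram_size + 1
end

def blind_index_chars(value, gram_size = 3)
  ngram_count(value.length, gram_size) * HEX_DIGEST_LENGTH
end

p blind_index_chars("smith")             # => 192 (3 trigrams x 64)
p blind_index_chars("alice@example.com") # => 960 (15 trigrams x 64)
```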
What this means in practice
- Reads are fast. Sub-millisecond exact lookups and ~2ms partial searches hold up well even at this row count.
- Writes cost more. If your workload is write-heavy on PII fields, budget for the extra insert time.
- Storage is the main tradeoff. Plan for roughly 4× the table size, and up to ~2.4× the size of B-tree indexes on hashed columns, compared to an equivalent unencrypted schema.
You can reproduce these results yourself:
ruby -I lib benchmarks/run.rb
Configuration reference
use_pii_cipher(*attributes, partial: true, gram_size: 3, case_sensitive: false)
| Option | Type | Default | Description |
|---|---|---|---|
| partial | Boolean | true | true → n-gram array in column_bidx_array (e.g. email_bidx_array); false → single hash in column_bidx (e.g. ssn_bidx) |
| gram_size | Integer | 3 | Sliding-window size for partial search. Ignored when partial: false. Changing it invalidates existing indexes. |
| case_sensitive | Boolean | false | false downcases values before hashing (case-insensitive search). Stored indexes and queries must use the same setting; changing it invalidates existing indexes. |
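The invalidation notes in the table follow from how the hashes are built: a query hashed under different options produces digests the stored array does not contain. In this plain-Ruby sketch (hypothetical names, array difference standing in for @>), an index written with the defaults is probed with matching and mismatched options.

```ruby
require "openssl"

SECRET_KEY = "example-secret-key" # hypothetical stand-in for ENV["PII_SECRET_KEY"]

# Partial-mode hashes for a value under a given option set.
def hashes_for(value, gram_size: 3, case_sensitive: false)
  text  = case_sensitive ? value : value.downcase
  grams = text.length < gram_size ? [text] : text.chars.each_cons(gram_size).map(&:join)
  grams.map { |gram| OpenSSL::HMAC.hexdigest("SHA256", SECRET_KEY, gram) }
end

# True when every query hash is present in the stored array (like @>).
def matches?(stored, query)
  (query - stored).empty?
end

# Index written with the defaults (gram_size: 3, case_sensitive: false).
stored = hashes_for("Smith")

same_options = matches?(stored, hashes_for("Smi"))
flipped_case = matches?(stored, hashes_for("Smi", case_sensitive: true))
wider_window = matches?(stored, hashes_for("Smit", gram_size: 4))

p [same_options, flipped_case, wider_window] # => [true, false, false]
```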
Limitations & gotchas
- Query rewriting covers hash-form where. Model.where(email: "x"), scopes, and chained relations (Model.active.where(email: "x")) are all rewritten. Conditions that don't go through where(hash) are not rewritten, including where.not(...), raw string/array conditions (where("email = ?", x)), .or(...) branches, and find_by with string SQL. For those, build the blind index yourself with PiiCipher.generate_ngram_hashes / generate_blind_index.
- Partial search is approximate and may over-match (see "How it works"). Re-filter in Ruby if you need exact substring semantics.
- Search terms shorter than gram_size are hashed whole and only match values that were themselves shorter than gram_size. Prefer search terms at least gram_size characters long.
- PostgreSQL only for partial search: it relies on the jsonb @> containment operator.
- Key/option changes invalidate indexes. Changing PII_SECRET_KEY, gram_size, or case_sensitive means existing blind indexes no longer match; you must re-save affected records to regenerate them.
Development
After checking out the repo, run bin/setup to install dependencies (this also compiles the Rust extension). Then run the test suite:
bundle exec rake spec
The Ruby specs include a PostgreSQL-backed integration suite (it builds a temporary table and exercises real @> queries). Set the standard PG* env vars to point at a database, or skip those examples with bundle exec rspec --tag ~integration. The Rust extension also has its own unit tests, runnable from ext/pii_cipher with cargo test.
To open an interactive console with the gem loaded:
bin/console
To build and install the gem locally:
bundle exec rake install
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/selvachezhian/pii_cipher. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.
License
The gem is available as open source under the terms of the MIT License.