Deidentify
Deidentify is a gem designed to allow for easy removal of sensitive data.
It defines a DSL that lets you choose which fields should be deidentified. It then replaces the specified database columns with a variety of deidentified values.
Installation
Add this line to your application's Gemfile:
gem 'deidentify'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install deidentify
Usage
Include the Deidentify module in your chosen model and add the deidentification DSL.
class Person < ApplicationRecord
  include Deidentify
  deidentify :name, method: :replace, new_value: "deidentified"
  deidentify :age, method: :delete
end
Then simply call
person = Person.find(id)
person.deidentify!
This will deidentify the person according to your configuration.
Recursive Deidentification
This gem allows you to deidentify all data associated with a single object (most likely a single user). It does this by traversing associations to propagate the deidentify call.
class Person < ApplicationRecord
  include Deidentify
  belongs_to :organisation
  has_many :projects
  deidentify :name, method: :replace, new_value: "deidentified"
  deidentify_associations :organisation, :projects
end
Then calling
person = Person.find(id)
person.deidentify!
will deidentify the person, the organisation they belong to and their projects. It will use the deidentification configuration defined in each class to determine which fields to change.
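The propagation described above can be pictured with a small plain-Ruby sketch. This is not the gem's implementation: the Node class, its associated list and the visited set are all hypothetical stand-ins for ActiveRecord models and their associations, and whether the gem itself guards against cyclic associations is not stated here.

```ruby
require "set"

# Hypothetical stand-in for a model. The gem works on ActiveRecord
# associations, which this sketch does not use.
class Node
  attr_reader :name, :associated
  attr_accessor :deidentified

  def initialize(name)
    @name = name
    @associated = []
    @deidentified = false
  end

  # Deidentify this record, then propagate the call to associated records.
  # The visited set (an assumption of this sketch) stops cycles from
  # recursing forever.
  def deidentify!(visited = Set.new)
    return if visited.include?(self)

    visited << self
    self.deidentified = true
    associated.each { |record| record.deidentify!(visited) }
  end
end

person = Node.new("person")
organisation = Node.new("organisation")
project = Node.new("project")
person.associated.push(organisation, project)
organisation.associated << person # back-reference forms a cycle

person.deidentify!
puts [person, organisation, project].all?(&:deidentified) # true
```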
Callbacks
You can specify callbacks for the deidentify method.
class Person < ApplicationRecord
  include Deidentify
  deidentify :name, method: :replace, new_value: "deidentified"
  before_deidentify do
    delete_file_from_external_store
    send_deletion_request_to_third_party
  end
end
Deidentified At
This gem records when a record was deidentified using a deidentified_at timestamp. If your record has a deidentified_at column, it will be set when the record is deidentified. Nothing breaks if there is no deidentified_at on your record.
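One way to picture the "set it only if it exists" behaviour is the plain-Ruby sketch below. The mark_deidentified helper and the two Struct classes are hypothetical and only illustrate the idea; the gem's own check may be written differently.

```ruby
# Hypothetical helper: only touch the timestamp when the record exposes one.
def mark_deidentified(record, now: Time.now)
  record.deidentified_at = now if record.respond_to?(:deidentified_at=)
  record
end

# Stand-ins for a model with and without a deidentified_at column.
WithTimestamp    = Struct.new(:name, :deidentified_at)
WithoutTimestamp = Struct.new(:name)

stamped = mark_deidentified(WithTimestamp.new("x"))
plain   = mark_deidentified(WithoutTimestamp.new("x"))

puts stamped.deidentified_at.nil?        # false
puts plain.respond_to?(:deidentified_at) # false
```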
Deidentification Methods
Delete
This will delete the value in the field and replace it with nil.
deidentify :email, method: :delete
Replace
This will replace the value with the provided value.
deidentify :age, method: :replace, new_value: -1
There is a keep_nil option that determines whether nil values are replaced. By default it is set to true, which means nil will not be replaced with the new_value. Setting it to false means nil will be replaced with the new_value.
deidentify :age, method: :replace, new_value: -1, keep_nil: false
Hash
This will replace a string with a hashed version.
deidentify :name, method: :hash
There is a length option that will set the length of the hash.
deidentify :name, method: :hash, length: 20
NOTE: This uses the SHA256 algorithm to hash. Truncating the hash shouldn't reduce the security of the hashed value, but it will increase the chance of collisions.
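As a rough illustration of salted, truncated SHA256 hashing, here is a minimal sketch using Ruby's standard Digest library. The hashed_value helper is hypothetical, and prefixing the salt to the value is an assumption; the gem may combine the two differently.

```ruby
require "digest"

# Hypothetical helper, not the gem's own code: salt the value, hash it with
# SHA256, and optionally truncate the hex digest.
def hashed_value(value, salt: "", length: nil)
  return nil if value.nil?

  digest = Digest::SHA256.hexdigest("#{salt}#{value}") # 64 hex characters
  length ? digest[0, length] : digest
end

full  = hashed_value("Alice", salt: "s3cret")
short = hashed_value("Alice", salt: "s3cret", length: 20)

puts full.length             # 64
puts short.length            # 20
puts full.start_with?(short) # true: truncation keeps a prefix of the digest
```

Because the truncated value is only a prefix of the full digest, two different inputs are more likely to share it, which is the collision trade-off mentioned in the note.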
Hash Email
This will replace an email with a hashed version. This will hash the name and domain separately, creating a value of the format hash@hash.
deidentify :email, method: :hash_email
There is a length option that will set the maximum length of the hashed email. NOTE: this can produce emails shorter than the length provided.
deidentify :email, method: :hash_email, length: 20
NOTE: This also uses SHA256 (see hash).
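The hash@hash format, and why the result can come out shorter than the requested length, can be sketched as follows. The hashed_email helper is hypothetical, and splitting the length budget evenly between the two parts is an assumption made for this example only.

```ruby
require "digest"

# Hypothetical helper: hash the local part and the domain separately and
# join them with "@". Splitting the length budget evenly is an assumption.
def hashed_email(email, salt: "", length: nil)
  local, domain = email.split("@", 2)
  part_length = length ? (length - 1) / 2 : nil # reserve one character for "@"

  [local, domain].map do |part|
    digest = Digest::SHA256.hexdigest("#{salt}#{part}")
    part_length ? digest[0, part_length] : digest
  end.join("@")
end

a = hashed_email("alice@example.com", length: 20)
b = hashed_email("bob@example.com", length: 20)

puts a.length                               # 19, one shorter than the limit
puts a.split("@").last == b.split("@").last # true: same domain, same hash
```

Hashing the domain on its own means addresses from the same domain still share a hashed domain, so grouping by domain keeps working on deidentified data.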
Hash Url
This will replace a URL with a hashed version. This will hash the host, path, query and fragment strings separately, creating a value of the format https://host/path?query#fragment.
deidentify :url, method: :hash_url
There is a length option that will set the maximum length of the hashed URL. NOTE: this can produce URLs shorter than the length provided.
deidentify :url, method: :hash_url, length: 20
NOTE: This also uses SHA256 (see hash).
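A minimal sketch of hashing each URL component separately, using Ruby's standard URI and Digest libraries, might look like this. The short_hash and hashed_url helpers and the fixed part_length are hypothetical; the sketch assumes an absolute URL and does not reproduce how the gem spreads its length option across the components.

```ruby
require "digest"
require "uri"

# Hypothetical helper: a truncated SHA256 hex digest.
def short_hash(value, length)
  Digest::SHA256.hexdigest(value)[0, length]
end

# Hypothetical helper: hash host, path, query and fragment separately and
# rebuild a URL of the form https://host/path?query#fragment.
# Assumes an absolute URL, so the host is always present.
def hashed_url(url, part_length: 8)
  uri = URI.parse(url)
  path = uri.path.to_s.delete_prefix("/")

  hashed = "https://#{short_hash(uri.host, part_length)}"
  hashed += "/#{short_hash(path, part_length)}" unless path.empty?
  hashed += "?#{short_hash(uri.query, part_length)}" if uri.query
  hashed += "##{short_hash(uri.fragment, part_length)}" if uri.fragment
  hashed
end

puts hashed_url("https://example.com/users/42?tab=profile#bio")
puts hashed_url("https://example.com") # only the host is hashed
```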
Delocalize IP
This will replace an IP address with its network address by setting the bits outside the network mask to 0 (by default the mask is 24 bits for IPv4 and 48 bits for IPv6).
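The masking step can be reproduced with Ruby's standard IPAddr class, as in the sketch below. The delocalized_ip helper is hypothetical and only mirrors the defaults described above; it is not taken from the gem.

```ruby
require "ipaddr"

# Hypothetical helper: keep the first mask_length bits and zero the rest.
# Defaults follow the description above: 24 bits for IPv4, 48 for IPv6.
def delocalized_ip(ip, mask_length: nil)
  addr = IPAddr.new(ip)
  mask_length ||= addr.ipv4? ? 24 : 48
  addr.mask(mask_length).to_s
end

puts delocalized_ip("192.168.12.34")                  # 192.168.12.0
puts delocalized_ip("192.168.12.34", mask_length: 16) # 192.168.0.0
# For IPv6 the first 48 bits are kept and the remainder is zeroed.
puts delocalized_ip("2001:db8:85a3:8d3:1319:8a2e:370:7348")
```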
deidentify :ip, method: :delocalize_ip
The length of the mask can be provided as a parameter:
deidentify :ip, method: :delocalize_ip, mask_length: 16
Lambda
You can pass a custom lambda as the deidentification method.
deidentify :email, method: -> (person) { "deidentified@#{person.email.split("@").last}" }
Keep
You can opt to leave a value untouched.
deidentify :age, method: :keep
NOTE: You get the same behaviour by simply not specifying a deidentification method for a field.
Keep is designed so that it is possible to mark a field as not containing sensitive data. That makes it obvious which fields have been purposely not changed and which have been missed during development.
Secret Configuration
For the hashing deidentification methods you can configure this gem to take a secret which will be used to salt the hashed values.
Do this by creating the file config/initializers/deidentify.rb:
Deidentify.configure do |config|
  config.salt = "your secret value" # Replace with your own secret
end
Scope Configuration
It's possible to pass a scope into the configuration.
Deidentify.configure do |config|
  config.scope = ->(klass_or_association) { klass_or_association.where(deidentified_at: nil) }
end
This scope will limit what records will be deidentified.
So in this example it will not deidentify records that have already been marked as deidentified.
Generator
This gem comes with a generator that will generate a deidentification policy module for a model. By calling
$ rails generate deidentify:configure_for Person
you will generate a module in app/concerns/deidentify/ which will contain all columns of that model.
module Deidentify::PersonPolicy
  extend ActiveSupport::Concern
  include Deidentify
  included do
    deidentify :name, method: :keep
    deidentify :age, method: :keep
  end
end
NOTE: This will always default to keep, you will need to update to other methods manually.
It will also include this module in the model directly after the class declaration.
class Person < ApplicationRecord
  include Deidentify::PersonPolicy
  ...
end
Namespaces
This generator will also work with namespaces.
$ rails generate deidentify:configure_for Billing::Payment
This will generate the module in app/concerns/deidentify/billing/
module Deidentify::Billing::PaymentPolicy
  ...
end
And will add the module to the correct class
class Billing::Payment < ApplicationRecord
  include Deidentify::Billing::PaymentPolicy
  ...
end
Specifying the file path
You can specify a file path if your path doesn't match your namespace.
For example if you have a model Payment which is found in app/models/billing/payment.rb
$ rails generate deidentify:configure_for Payment --file_path billing/payment
NOTE: the path provided must be the portion after models
This will generate a module at app/concerns/deidentify/billing/
module Deidentify::Billing::PaymentPolicy
  ...
end
And will add the module into the model found at the path specified
class Payment < ApplicationRecord
  include Deidentify::Billing::PaymentPolicy
  ...
end
Contributing
Contributions are very welcome.
Please raise any problems you find as issues or create a pull request with a fix. Raise any new features as pull requests.
When contributing code please make sure that:
- The PR contains a detailed description of the feature or issue
- It is well tested
- All tests pass
- Rubocop reports no new warnings
License
This gem is available as open source under the terms of the MIT License.