Overview
We provide some utilities for validating data in a PostgreSQL data table. These utilities can be used as simple terminal commands and can be installed by:
gem install idata
idata comes along with the following commands:
- iload
- ivalidate
- ipatch
- ieval
- iexport
- imerge
- isanitize
Run a command with --help switch for the details
Prequisites:
- PostgreSQL 9.0 or above
- Ruby 2.0 or above
- An auto ID field is required for data table to be validated using ivalidate
Usage
Suppose we have an items table, and we want to validate its records against certain criteria like:
-
vendor_codemust not be null -
vendor_namemust not be null - The composite
[vendor_code, vendor_name]must be unique - One
vendor_codecorresponds to only ONEvendor_name(in other words, there must not be two items with differentvendor_namebut with the samevendor_code) and vice-versa -
vendor_codemust reference thecodecolumn in thevendorstable
Then the validation command could be:
ivalidate --host=localhost --user=postgres --database=mydb --table=items
--log-to=validation_errors \
--not-null="vendor_code" \
--not-null="vendor_name" \
--unique="vendor_code,vendor_name" \
--consistent-by="vendor_code|vendor_name" \
--consistent-by="vendor_name|vendor_code" \
--cross-reference="vendor_code|vendors.code"
Validation results for every single record are logged to an additional column named validation_errors
of the items table, as specified by the --log-to switch. As you can see, most common checks can be performed using the supported switches:
--not-null
--unique
--consistent-by
--cross-reference
Custom Validation
For more customized checks, we support some other switches.
The --match="field/pattern/" switch tells the program to check if value of a field matches the provided pattern (which is a regular expression).
For example:
# Check if item_id is a number:
ivalidate --match="item_id/[0-9]+/"
# Check if value of status is either 'A' or 'I' (any other value is not allowed)
ivalidate --match="status/^(A|I)$/"
In case you need even more customized validation other than the supported ones (match, unique, not-null, cross-reference...)
then the --query switch may come in handy. For example:
ivalidate --query="start_date >= string_to_date('01/02/2014') -- invalid date"
You can also use --rquery which is the reversed counterpart of --query
For example, the following two checks are equivalent, mark any record whose start_date < '01/02/2014' as "invalid date":
ivalidate --query="start_date >= string_to_date('01/02/2014') -- invalid date"
ivalidate --rquery="start_date < string_to_date('01/02/2014') -- invalid date"
Note: run ivalidate --help to see the full list of supported switches
Put it all together
You can put several ivalidate commands (for several data tables) in one single bash/sh file.
Besides ivalidate, we also support some other utilities to:
- Load data from text files to SQL tables
- Modify data tables
- Generate summary reports
See our sample.sh for a comprehensive example