pdscan_rb

There are a lot of open issues
No release in over a year
Ruby gem wrapper around the pdscan library. It only provides a simple way to call the pdscan library bundled with the gem. Linux only.

pdscan

Scan your data stores for unencrypted personal data (PII)

  • Last names (US)
  • Email addresses
  • IP addresses (IPv4)
  • Street addresses (US)
  • Phone numbers
  • Credit card numbers
  • Social Security numbers (US)
  • Dates of birth
  • Location data
  • OAuth tokens
  • MAC addresses

Uses data sampling and naming, and works with compressed files

💥 Zero runtime dependencies and minimal database load

Installation

Download the latest version:

You can also install it with Homebrew or Docker.

Data Stores

  • Elasticsearch
  • Files
  • MariaDB
  • MongoDB
  • MySQL
  • OpenSearch
  • Postgres
  • Redis
  • S3
  • SQLite
  • SQL Server

Elasticsearch

pdscan elasticsearch+http://user:pass@host:9200

For HTTPS, use elasticsearch+https://.

You can also specify indices.

pdscan elasticsearch+http://user:pass@host:9200/index1,index2

Wildcards are also supported.

pdscan "elasticsearch+http://user:pass@host:9200/index*"

Files

pdscan file://path/to/file.txt

You can also specify a directory.

pdscan file://path/to/directory

For absolute paths, use file:///.

pdscan file:///absolute/path/to/file.txt

For paths relative to your home directory on Mac and Linux, use:

pdscan file://$HOME/file.txt
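
The forms above differ only in the path that follows file://, which a short shell sketch can make explicit. The file_uri helper is hypothetical (it is not part of pdscan) and assumes the path contains no characters that need percent-encoding.

```shell
# Hypothetical helper, not part of pdscan: prefix a path with the file:// scheme.
file_uri() {
  printf 'file://%s\n' "$1"
}

file_uri "path/to/file.txt"            # prints file://path/to/file.txt
file_uri "/absolute/path/to/file.txt"  # prints file:///absolute/path/to/file.txt
file_uri "$HOME/file.txt"              # $HOME is absolute, so this also yields three slashes
```

The third slash in file:/// is simply the leading slash of the absolute path, which is why the $HOME form needs no extra slash.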

MariaDB

pdscan mariadb://user:pass@host:3306/dbname

MongoDB

pdscan mongodb://user:pass@host:27017/dbname

MySQL

pdscan mysql://user:pass@host:3306/dbname

OpenSearch

pdscan opensearch+http://user:pass@host:9200

For HTTPS, use opensearch+https://.

You can also specify indices.

pdscan opensearch+http://user:pass@host:9200/index1,index2

Wildcards are also supported.

pdscan "opensearch+http://user:pass@host:9200/index*"

Postgres

pdscan postgres://user:pass@host:5432/dbname

Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.

If your connection doesn’t use SSL, append to the URI:

?sslmode=disable
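
As a rough sketch of how the sslmode parameter attaches to the connection URI, the hypothetical with_sslmode helper below (not part of pdscan) appends it with ? or &, depending on whether the URI already has a query string.

```shell
# Hypothetical helper, not part of pdscan: append sslmode to a Postgres URI.
with_sslmode() {
  uri=$1
  mode=$2
  case "$uri" in
    *\?*) printf '%s&sslmode=%s\n' "$uri" "$mode" ;;  # a query string already exists
    *)    printf '%s?sslmode=%s\n' "$uri" "$mode" ;;  # start the query string
  esac
}

with_sslmode "postgres://user:pass@host:5432/dbname" verify-full
# prints postgres://user:pass@host:5432/dbname?sslmode=verify-full
with_sslmode "postgres://user:pass@host:5432/dbname" disable
# prints postgres://user:pass@host:5432/dbname?sslmode=disable
```

When passing the result to pdscan, keep it in quotes, since ? and & are special characters in most shells.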

For best sampling, enable the tsm_system_rows extension (ships with Postgres 9.5+).

CREATE EXTENSION tsm_system_rows;

Redis

pdscan redis://user:pass@host:6379/db

S3

pdscan s3://bucket/path/to/file.txt

Requires s3:GetObject permission

You can also specify a prefix by ending with a /.

pdscan s3://bucket/path/to/directory/

Requires s3:ListBucket and s3:GetObject permissions

SQLite

pdscan sqlite://path/to/dbname.sqlite3

Not available with prebuilt binaries

SQL Server

pdscan "sqlserver://user:pass@host:1433?database=dbname"

Options

Show the data found

pdscan --show-data

Show low confidence matches

pdscan --show-all

Change the sample size

pdscan --sample-size 50000

Specify the number of processes to use (defaults to 1)

pdscan --processes 4

Scan for only certain types of data

pdscan --only email,phone,location

Scan for all except certain types of data

pdscan --except ip,mac

Specify the minimum number of rows/documents/lines for a match (experimental)

pdscan --min-count 10

Specify a custom pattern (experimental)

pdscan --pattern "\d{16}"
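
Before running a scan, it can help to try a pattern against a few sample values. The sketch below uses grep as a stand-in, not pdscan itself: POSIX extended regular expressions have no \d, so [0-9] takes its place, and the sample values are made up for illustration. How pdscan applies the pattern may differ from grep.

```shell
# Made-up sample values, one per line.
samples='4242424242424242
1234-5678
order 12345678901234567'

# [0-9]{16} stands in for \d{16}; grep prints every line containing a match.
printf '%s\n' "$samples" | grep -E '[0-9]{16}'
```

With grep, the first and third lines match. The third has 17 digits, which shows that an unanchored pattern also matches inside longer digit runs.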

Output newline delimited JSON (experimental)

pdscan --format ndjson

Additional Installation Methods

Homebrew

With Homebrew, you can use:

brew install ankane/brew/pdscan

Docker

Get the Docker image with:

docker pull ankane/pdscan

And run it with:

docker run -ti ankane/pdscan <connection-uri>

For data stores on the host machine, use host.docker.internal as the hostname

docker run -ti ankane/pdscan "postgres://user@host.docker.internal:5432/dbname?sslmode=disable"

On Linux, this requires Docker 20.10+ and --add-host=host.docker.internal:host-gateway
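
On Linux, the --add-host flag goes before the image name. A minimal sketch of the combined command follows; the pdscan_docker wrapper and its DRY_RUN switch are hypothetical conveniences, not part of pdscan or Docker.

```shell
# Hypothetical wrapper: run pdscan in Docker with the host-gateway mapping.
# Set DRY_RUN=1 to print the command instead of running it.
pdscan_docker() {
  set -- docker run -ti --add-host=host.docker.internal:host-gateway ankane/pdscan "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

DRY_RUN=1 pdscan_docker "postgres://user@host.docker.internal:5432/dbname?sslmode=disable"
```

Without DRY_RUN set, the same call runs the container with the given connection URI.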

For files on the host machine, use:

docker run -ti -v /path/to/files:/data ankane/pdscan file:///data

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/pdscan.git
cd pdscan
make test