PdfScanner
PdfScanner is a Ruby gem for scanning PDF files for potentially dangerous or unwanted features, such as JavaScript, embedded files, forms, and more. It uses configurable security policies to analyze PDF files and can quarantine files that violate your policies.
Features
- Scans PDF files for scripts, attachments, forms, and other risky features
- Configurable security policies via YAML
- Supports encrypted PDFs (with password)
- Can quarantine files that violate policies
- Extensible and easy to integrate
Installation
Add this line to your application's Gemfile:
gem 'pdf_scanner'
And then execute:
bundle install
Or install it yourself as:
gem install pdf_scanner
Usage
Basic Example
require 'pdf_scanner'
scanner = PdfScanner::Scanner.new(
  target_file: '/path/to/file.pdf',
  config_file: '/path/to/pdfcop.conf.yml', # optional, uses default if omitted
  policy: 'standard',                      # optional, uses 'standard' if omitted
  dir: '/path/to/quarantine',              # optional, for quarantining
  passwd: 'password'                       # optional, for encrypted PDFs
)
result = scanner.scan
puts result.inspect
Parameters
- target_file (required): Path to the PDF file to scan.
- config_file (optional): Path to the YAML config file with security policies.
- policy (optional): Policy name to use (default: standard).
- dir (optional): Directory to move/quarantine files that violate policies.
- passwd (optional): Password for encrypted PDFs.
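Since only target_file is required, callers often need to pass just a subset of these options. The following is a minimal sketch of one way to assemble them; scanner_options is a hypothetical helper and not part of the gem, and the commented-out constructor call assumes Scanner.new accepts the keyword-style arguments shown in the basic example.

```ruby
# Hypothetical helper (not part of the gem): collects the options listed
# above and drops any optional value that was left as nil.
def scanner_options(target_file:, config_file: nil, policy: nil, dir: nil, passwd: nil)
  raise ArgumentError, 'target_file is required' if target_file.to_s.empty?

  {
    target_file: target_file,
    config_file: config_file,
    policy: policy,
    dir: dir,
    passwd: passwd
  }.compact
end

options = scanner_options(target_file: '/path/to/file.pdf', policy: 'standard')
p options

# With the gem installed, the hash could then be passed to the constructor
# (assumption: it takes the same keys as in the basic example):
#   scanner = PdfScanner::Scanner.new(**options)
```

Dropping nil values means omitted options fall back to whatever defaults the gem applies, rather than being passed explicitly as nil.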
Return Value
The scan method returns a hash with two keys:
- :rejected_policies - Array of hashes with :policy and :message for each rejected policy.
- :analysis_failure - Array of hashes with :error and :message for analysis failures.
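A caller will usually want to turn this hash into a pass/fail decision. The sketch below shows one way to do that; summarize_scan is a hypothetical helper, and the sample hash is hand-written to match the documented shape rather than taken from a real scan.

```ruby
# Hypothetical helper (not part of the gem): reduces a result hash of the
# documented shape to a verdict plus human-readable lines.
def summarize_scan(result)
  rejected = result.fetch(:rejected_policies, [])
  failures = result.fetch(:analysis_failure, [])

  lines = rejected.map { |r| "rejected by #{r[:policy]}: #{r[:message]}" }
  lines += failures.map { |f| "analysis failed (#{f[:error]}): #{f[:message]}" }

  # A file counts as clean only if nothing was rejected and nothing failed.
  { clean: rejected.empty? && failures.empty?, lines: lines }
end

# Sample data shaped like the documented return value, not real scanner output.
sample = {
  rejected_policies: [{ policy: 'standard', message: '[:allowJS]' }],
  analysis_failure: []
}

summary = summarize_scan(sample)
puts summary[:clean] ? 'PDF passed' : 'PDF flagged'
summary[:lines].each { |line| puts "  #{line}" }
```

Treating an analysis failure as "not clean" is a deliberate choice in this sketch: a file that could not be analyzed has not been shown to be safe.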
Example:
{
  rejected_policies: [
    { policy: "standard", message: "[:allowJS]" }
  ],
  analysis_failure: []
}
Configuration
Policies are defined in a YAML file (see lib/pdf_scanner/config/pdfcop.conf.yml for an example). Each policy is a set of boolean flags controlling which PDF features are allowed.
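To check which features a policy denies before scanning anything, the file can be read with Ruby's standard YAML library. This is only a sketch: the inline YAML stands in for a real config file, its layout (a top-level POLICY_STANDARD key mapping to boolean flags) is assumed from this README's example, and denied_features is a hypothetical helper.

```ruby
require 'yaml'

# Inline stand-in for a policy file. The layout is an assumption based on
# the README's example, not a copy of the shipped pdfcop.conf.yml.
POLICY_YAML = <<~YAML
  POLICY_STANDARD:
    allowParserErrors: false
    allowAttachments: false
    allowEncryption: false
    allowJS: false
    allowAcroForms: false
YAML

# Hypothetical helper: names of the flags that a policy sets to false.
def denied_features(config, policy_key)
  config.fetch(policy_key).select { |_flag, allowed| allowed == false }.keys
end

config = YAML.safe_load(POLICY_YAML)
puts denied_features(config, 'POLICY_STANDARD')
```

For a file on disk, YAML.safe_load(File.read(path)) would take the place of the inline string.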
Example policy section:
POLICY_STANDARD:
  allowParserErrors: false
  allowAttachments: false
  allowEncryption: false
  allowJS: false
  allowAcroForms: false
  # ... more options ...
Command-Line Usage
You can also use the provided bin/console for interactive testing:
bin/console
Development
After checking out the repo, install dependencies with:
bin/setup
For an interactive prompt, run:
bin/console
To install this gem onto your local machine:
bundle exec rake install
To release a new version, update the version number in version.rb, then run:
bundle exec rake release
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/shekhar-patil/pdf_scanner. Please adhere to the code of conduct.
License
The gem is available as open source under the terms of the MIT License.