QpdfRuby
Patch & polish PDFs so that PAC 2024 finally turns green.
QpdfRuby is a very small Ruby wrapper around the battle‑tested QPDF >= 12 C++ library. Right now the library focuses on only three specialised tasks that are needed when PDFs are printed from Chromium‑based browsers and subsequently audited with the PAC 2024 accessibility checker:
- Export the structure tree as XML – handy for debugging.
-
Mark vector path objects as
/Artifact
so that decorative lines, boxes, &c. are ignored by assistive technologies. -
Add missing
/BBox
entries to every/Figure
element (derived from the page’s graphic operators) so that screen readers know the physical extent of each image.
Together these tweaks eliminate the most common complaints PAC 2024 has about browser‑generated PDFs.
Features in Detail
Feature | Ruby API |
---|---|
Dump structure tree as XML | doc.show_structure |
Mark path objects ( re … S/s/f/F/B/b ) |
doc.mark_paths_as_artifacts |
Ensure /Figure elements have a layout BBox¹ |
doc.ensure_bbox |
¹Internally the gem parses each page’s content stream, maps image
/MCID
s to their transformation matrix, computes the bounding box
(courtesy of a little linear algebra) and finally writes the result into
the structure tree.
Installation
Requirements
- Ruby >= 3.1
- QPDF >= 12.0.0 (headers & libs)
macOS
brew install qpdf
bundle config set --local build.qpdf_ruby "--with-qpdf-dir=$(brew --prefix qpdf)"
Debian/Ubuntu
# on Debian 11/Ubuntu 20.04 you may need newer packages from testing
sudo apt-get update && sudo apt-get install -y libqpdf-dev qpdf
If apt
cannot provide QPDF ≥ 12 you can compile it yourself or pull the
package from testing/unstable – see the Dockerfile for a working
apt preferences
snippet.
Add the gem
bundle add qpdf_ruby
# …or without bundler:
# gem install qpdf_ruby -- --with-qpdf-include=/usr/local/include/qpdf --with-qpdf-lib=/usr/local/lib
Quick Start
require "qpdf_ruby"
pdf = QpdfRuby::Document.new("input.pdf")
# 1. tag decorative paths
pdf.mark_paths_as_artifacts
# 2. add BBox to every <Figure>
pdf.ensure_bbox
# 3. introspect structure tree (optional)
File.write("structure.xml", pdf.show_structure)
# 4. save 🎉
pdf.write("fixed.pdf")
Run PAC 2024 on fixed.pdf
– it should report far fewer (or zero!)
errors compared to the original browser output.
Development
git clone https://github.com/dieter-medium/qpdf_ruby.git
cd qpdf_ruby
bin/setup # install gem + test deps
autotest # guard & RSpec
- Bump version.rb →
bundle exec rake release
to push a new gem.
Testing with local QPDF builds
If you tinker with QPDF itself, point Bundler to your custom prefix:
bundle config set --local build.qpdf_ruby "--with-qpdf-include=$HOME/opt/qpdf/include --with-qpdf-lib=$HOME/opt/qpdf/lib"
Roadmap
TBD
Contributing
Bug reports & pull requests are welcome at https://github.com/dieter-medium/qpdf_ruby.
Code Style
- C++ 17, clang‑format enforced
- Ruby 3.2, rubocop default rules
License
MIT – see LICENSE.txt
for full
text.