Project: pikuri-pdf - The Ruby Toolbox

pikuri-pdf

0.0

The project is in a healthy, maintained state

pikuri-pdf plugs PDF-to-text extraction into pikuri-core's +Pikuri::Extractor+ registry. The bundled +Pikuri::Extractors::PDF+ extractor wraps the pure-Ruby pdf-reader gem and extracts lazily: a paged read (one window of the +read+ tool) parses only the pages that window needs, so reading the first page of a 500-page PDF never pays for parsing the other 499. The gem ships separately from pikuri-core so that the core's dependency tree stays minimal and auditable: pdf-reader and its transitive dependencies (Ascii85, afm, hashery, ruby-rc4, ttfunk) are installed only for hosts that opt into PDF support. Registration is explicit, via +Pikuri::Extractors::PDF.register+, so requiring the gem changes nothing by itself; the host script chooses which extractors to wire in. One registration extends the +read+ tool, +web_scrape+, and the pikuri-vectordb indexer simultaneously.
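The two ideas in the description, explicit registration and lazy paged extraction, can be illustrated with a small self-contained sketch. Everything in it is a hypothetical stand-in: the +Sketch+ module, +Registry+, +LazyPagedExtractor+, and +read_window+ are invented for illustration and are not pikuri-core's or pikuri-pdf's actual API. The only call the description itself confirms is +Pikuri::Extractors::PDF.register+.

```ruby
# Toy stand-ins for illustration only; none of these names are pikuri's real API.
module Sketch
  # Maps file extensions to extractor objects. Nothing is registered
  # until the host script explicitly asks for it.
  class Registry
    def initialize
      @extractors = {}
    end

    def register(ext, extractor)
      @extractors[ext] = extractor
    end

    def registered?(ext)
      @extractors.key?(ext)
    end

    def extractor_for(path)
      @extractors.fetch(File.extname(path).downcase) do
        raise ArgumentError, "no extractor registered for #{path}"
      end
    end
  end

  # Extracts text page by page. The page_loader block is called only for
  # pages inside the requested window, so untouched pages cost nothing.
  class LazyPagedExtractor
    attr_reader :parsed_pages

    def initialize(page_count, &page_loader)
      @page_count = page_count
      @page_loader = page_loader
      @parsed_pages = [] # records which pages were actually parsed
    end

    # first_page is 1-based; the window is clamped to the document length.
    def read_window(first_page:, count:)
      last_page = [first_page + count - 1, @page_count].min
      (first_page..last_page).map do |n|
        @parsed_pages << n
        @page_loader.call(n)
      end
    end
  end
end

registry = Sketch::Registry.new

# A pretend 500-page document; a real extractor would parse the page here.
extractor = Sketch::LazyPagedExtractor.new(500) { |n| "text of page #{n}" }

# The host opts in explicitly; before this line ".pdf" is unknown to the registry.
registry.register(".pdf", extractor)

# Reading a one-page window parses page 1 only.
window = registry.extractor_for("report.pdf").read_window(first_page: 1, count: 1)
```

In this sketch, a single +register+ call is what makes ".pdf" visible to every consumer that shares the registry, which mirrors the description's claim that one registration extends several tools at once.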
