Project
pikuri-pdf
pikuri-pdf plugs PDF → text extraction into pikuri-core's
+Pikuri::Extractor+ registry. The bundled +Pikuri::Extractors::PDF+
extractor wraps the pure-Ruby pdf-reader gem and extracts lazily:
paged reads (the +read+ tool's windows) parse only the pages the
window needs, so the first page of a 500-page PDF never pays for
the other 499.
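The lazy, windowed behaviour described above can be sketched in plain Ruby. This is an illustration only, not pikuri-pdf's implementation: +ToyDocument+, +page_text+, and +read_window+ are hypothetical names, and the "parse" is simulated by recording which page numbers were touched.

```ruby
# Hypothetical stand-in for a PDF: a page is "parsed" only when its text
# is requested, and every parse is recorded so the cost is observable.
class ToyDocument
  attr_reader :page_count, :parsed_pages

  def initialize(page_count)
    @page_count = page_count
    @parsed_pages = []
  end

  # Simulates the expensive per-page parse.
  def page_text(number)
    @parsed_pages << number
    "text of page #{number}"
  end
end

# Returns the text of the pages in the window (1-based, inclusive),
# clamped to the end of the document. Pages outside the window are
# never touched.
def read_window(document, first_page:, last_page:)
  last = [last_page, document.page_count].min
  (first_page..last).map { |number| document.page_text(number) }
end

doc = ToyDocument.new(500)
read_window(doc, first_page: 1, last_page: 1) # => ["text of page 1"]
doc.parsed_pages                              # => [1]
```

The window is applied to page numbers before any parsing happens, which is why only one of the 500 pages is ever visited in this sketch.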
It ships separately from pikuri-core so the core's dependency tree
stays minimal and auditable: pdf-reader and its transitive deps
(Ascii85, afm, hashery, ruby-rc4, ttfunk) ride along only for hosts
that opt into PDF support.
Registration is explicit — +Pikuri::Extractors::PDF.register+ — so
requiring the gem changes nothing by itself; the host script picks
which extractors it wires in. One registration extends the +read+
tool, +web_scrape+, and the pikuri-vectordb indexer simultaneously.
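The explicit-registration pattern can be illustrated with a self-contained toy. Everything here is hypothetical (+ToyRegistry+, +ToyPdfExtractor+, the MIME-type key, and the two consumer lambdas); it does not reproduce pikuri-core's registry API. In a real host script, the only call the text above confirms is +Pikuri::Extractors::PDF.register+.

```ruby
# Hypothetical registry shared by every consumer of extracted text.
module ToyRegistry
  @extractors = {}

  class << self
    def register(mime_type, extractor)
      @extractors[mime_type] = extractor
    end

    def lookup(mime_type)
      @extractors[mime_type]
    end
  end
end

# Defining the class has no side effects on the registry; only the
# explicit .register call wires it in.
class ToyPdfExtractor
  def self.register(registry = ToyRegistry)
    registry.register("application/pdf", new)
  end

  def extract(bytes)
    "extracted #{bytes.bytesize} bytes"
  end
end

# Two hypothetical consumers that consult the same registry.
read_tool = ->(mime, bytes) { ToyRegistry.lookup(mime)&.extract(bytes) }
indexer   = ->(mime, bytes) { ToyRegistry.lookup(mime)&.extract(bytes) }

read_tool.call("application/pdf", "%PDF") # => nil (nothing registered yet)

ToyPdfExtractor.register                  # the single opt-in step

read_tool.call("application/pdf", "%PDF") # => "extracted 4 bytes"
indexer.call("application/pdf", "%PDF")   # => "extracted 4 bytes"
```

Because both consumers look extractors up in one shared registry, a single registration is enough to extend all of them, which mirrors the behaviour the paragraph above describes.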
Development
Licenses
MIT
Dependencies
Runtime
~> 2.15
= 0.0.6