document converter and plain text extractor

Proselytism

Document converter, text and image extractor using OpenOffice headless server (JOD or PYOD converter), pdf_tools and net_pbm

Supported formats for document conversion: odt, doc, rtf, sxw, docx, txt, html, htm, wps, pdf

Note

This gem was originally written as a Rails 3.2 engine running on Ruby 1.8.7.

It is nevertheless framework agnostic and has been tested on Ubuntu and Mac OS X.

Installation

Install the required external libraries:

# aptitude install netpbm
# aptitude install xpdf
# aptitude install libreoffice

Add this line to your application's Gemfile:

gem 'proselytism'

Note: for Ruby 1.9, use the 1.9 branch:

gem 'proselytism', :git => "git://github.com/itkin/proselytism.git", :branch => "1.9"

And then execute:

$ bundle

Configuration

  • With a YAML config file:

rails g proselytism:config

As a Rails engine, Proselytism automatically loads /config/proselytism.yml (if the file exists) and sets its config params according to the current Rails environment.
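The per-environment lookup described above can be sketched in plain Ruby. This is only an illustration and not Proselytism's own loading code: it assumes the file is keyed by environment name, as is conventional for Rails YAML files, and `some_option` is a placeholder key rather than a documented setting. The `settings_for` helper is hypothetical.

```ruby
require 'yaml'

# Hypothetical file contents: the key names are placeholders,
# not options documented by the gem.
SAMPLE_CONFIG = <<-YAML
development:
  some_option: dev-value
production:
  some_option: prod-value
YAML

# Return the settings hash for the given environment,
# or an empty hash when the file has no section for it.
def settings_for(yaml_text, env)
  all_settings = YAML.load(yaml_text) || {}
  all_settings.fetch(env.to_s, {})
end

production_settings = settings_for(SAMPLE_CONFIG, :production)
puts production_settings["some_option"]
```

Returning an empty hash for an unknown environment mirrors the "if the file exists" behaviour: a missing section simply leaves the defaults untouched.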

  • With an initializer (optional for Rails App) :

You can override the configuration file params by adding a custom initializer to /config/initializers. By default Proselytism logs to a separate log file; to use the Rails logger instead:

#/config/initializers/proselytism.rb
Proselytism.config do |config|
  config.logger = Rails.logger
end

To generate a full config initializer:

rails g proselytism:initializer

Usage

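# Convert a document to another format
Proselytism.convert source_file_path, :to => :pdf do |converted_file_path|
  # use or copy the converted file here
end

# Extract the plain text of a document
Proselytism.extract_text source_file_path do |extracted_text|
  # use the extracted text here
end

# Extract the images of a document
Proselytism.extract_images source_file_path do |image_files_paths|
  # use or copy the image files here
end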

Proselytism creates its converted files in temporary folders.

  • If you pass a block to one of the methods above, the temporary folder is automatically deleted once the block returns, so use or copy the file content within the block
  • If you don't pass a block, the folder and its content remain on disk, so don't forget to remove them safely yourself

pdf_file_path = Proselytism.convert source_file_path, :to => :pdf
#my code
FileUtils.remove_entry_secure File.dirname(pdf_file_path)
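
When the block form is not used, the cleanup is easy to forget if the code in between raises. A minimal sketch of one way to guard against that, assuming a hypothetical helper named `with_temp_result` (it is not part of Proselytism) and a stub file standing in for the converter output:

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper, not provided by the gem: yields the file path,
# then removes its parent folder even if the block raises.
def with_temp_result(file_path)
  yield file_path
ensure
  dir = File.dirname(file_path)
  FileUtils.remove_entry_secure(dir) if File.directory?(dir)
end

# Stand-in for a converted file. In real code the path would come from
# Proselytism.convert(source_file_path, :to => :pdf).
tmp_dir = Dir.mktmpdir
fake_pdf = File.join(tmp_dir, "result.pdf")
File.open(fake_pdf, "w") { |f| f.write("%PDF-stub") }

# Copy the result somewhere durable before the temporary folder disappears.
kept_copy = File.join(Dir.tmpdir, "kept_copy_#{Process.pid}.pdf")
with_temp_result(fake_pdf) { |path| FileUtils.cp(path, kept_copy) }
```

The `ensure` clause is what makes this safer than the inline version: the folder is removed whether the block finishes normally or raises.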

Add your own converters

Add your own converter by extending Proselytism::Converters::Base

  • Your converter will be automatically selected and used according to the formats you declare with the from and to class methods
  • Add a perform method which
    • calls the execute method with your custom command
    • returns the converted file(s) path(s)

Proselytism::Converters::Base takes care of

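  • raising an error if the command execution fails
  • logging the command output

require 'shellwords'

class MyConverter < Proselytism::Converters::Base
  class Error < parent::Base::Error; end

  # Source formats this converter accepts
  from :ext1, :ext2
  # Target formats this converter produces
  to :ext3, :ext4

  def perform(origin, options={})
    destination = destination_file_path(origin, options)
    # Escape both paths so spaces or shell metacharacters do not break the command
    command = "mycommand #{Shellwords.escape(origin)} #{Shellwords.escape(destination)} 2>&1"
    execute command
    # Return the path of the converted file
    destination
  end
end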
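
To make the idea of automatic selection more concrete, here is a self-contained sketch of how a from/to declaration could drive converter lookup. It is an illustration of the general technique only, not Proselytism's actual implementation: `SketchBase`, `SketchRtfConverter`, `registered`, `handles?` and `converter_for` are all hypothetical names.

```ruby
# Illustration only: a tiny registry showing how from/to declarations
# could be used to pick a converter. Not the gem's real code.
class SketchBase
  @registered = []

  class << self
    attr_reader :registered

    # Record every subclass as soon as it is defined.
    def inherited(klass)
      SketchBase.registered << klass
      super
    end

    # Class-level DSL: declare the accepted source formats.
    def from(*exts)
      @from_formats = exts
    end

    # Class-level DSL: declare the formats that can be produced.
    def to(*exts)
      @to_formats = exts
    end

    # True when this class declared both the source and the target format.
    def handles?(source_ext, target_ext)
      (@from_formats || []).include?(source_ext) &&
        (@to_formats || []).include?(target_ext)
    end

    # First registered converter able to perform the conversion, or nil.
    def converter_for(source_ext, target_ext)
      SketchBase.registered.find { |klass| klass.handles?(source_ext, target_ext) }
    end
  end
end

class SketchRtfConverter < SketchBase
  from :rtf, :doc
  to :pdf, :txt
end

puts SketchBase.converter_for(:rtf, :pdf).inspect
```

With this kind of registry, declaring the formats in the class body is all a new converter needs to do to become eligible, which matches the behaviour the list above describes.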

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request