DSL tools for building programs that process S3 objects key by key using a threaded worker pool

s3-object-processor

This library helps with developing CLI programs that process all objects stored in a given S3 bucket, or only the objects named in a list file.

It uses a multi-threaded worker model so that objects can be processed in parallel.
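
The library's own worker implementation is not shown in this README. As a rough illustration of the general technique only, here is a minimal sketch of a threaded worker pool built on Ruby's standard Queue and Thread classes. The names process_keys and worker_count are hypothetical and are not part of the s3-object-processor API.

```ruby
require 'thread' # Queue lives here on older Rubies; harmless on newer ones

# Hypothetical helper (not part of s3-object-processor): runs the given block
# once per key on a fixed number of worker threads and returns the block
# results in completion order. Assumes Ruby 2.3+ for Queue#close.
def process_keys(keys, worker_count: 4, &block)
  queue = Queue.new
  keys.each { |key| queue << key }
  queue.close # once closed and drained, pop returns nil instead of blocking

  results = Queue.new # Queue is thread-safe, so workers can push concurrently

  workers = Array.new(worker_count) do
    Thread.new do
      # Each worker pulls keys until the queue is empty.
      while (key = queue.pop)
        results << block.call(key)
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

# Usage: compute the length of each key on two threads.
lengths = process_keys(%w[a/1.jpg b/2.jpg c/3.jpg], worker_count: 2) do |key|
  key.length
end
puts lengths.inspect
```

Because workers finish in no particular order, callers that care about ordering need to sort the results or return the key together with each result.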

Example usage

This example program calls one of two HTTP endpoints for each object whose key matches a regexp and some other criteria. In addition, it counts the total size of the handled objects and measures the time spent fetching S3 object data and the time spent on API calls; these totals are reported at the end of the run.

require 's3-object-processor/cli'
require 'httpclient'

S3ObjectProcessor::CLI.new do
	cli do
		option :endpoint,
			description: 'API endpoint URI for JPEG uploads',
			default: '/iss/v2/pictures'
		option :endpoint_as_is,
			description: 'API endpoint URI for as-is uploads',
			default: '/iss/v2/images'
		switch :as_is,
			description: 'upload images without conversion to JPEG'
		option :httpimagestore,
			description: 'URL to HTTP Image Store',
			default: 'http://localhost:3000'
	end

	cli_process do |settings|
	end

	report :input_object_size, 0 do
		report "total input object size [KiB]", "%d" do |value|
			(value.to_f / 1024).round
		end
	end
	report :s3_body_get_time, 0.0 do
		report "total S3 get body time [s]", "%.3f"
	end
	report :httpimagestore_time, 0.0 do
		report "total ISS request time [s]", "%.3f"
	end

	processor do |bucket, key, settings, log, reporter|
		unless key.to_s =~ %r{(^|.*?/)([0-9a-f]{16})(|/.*)\.(.{3,4})$}
			log.warn "skipping bad format: #{key}"
			reporter.report :skipped_key, key
			next
		end

		dir = $1
		hash = $2
		name = $3
		extension = $4

		if name =~ /-(search|original|search_thumb|brochure|brochure_thumb|admin|admin_thumb|treatment_thumb|staff_member_thumb|consultation|clinic_google_map_thumb)$/
			log.debug "skipping not original: #{key}"
			reporter.report :skipped_key, key
			next
		end

		log.debug "processing original dir: '#{dir}' hash: '#{hash}' name: '#{name}' extension: '#{extension}'"

		data = nil
		reporter.time :s3_body_get_time do
			data = key.data
		end
		fail "no data for key; key not found?!" unless data
		reporter.report :input_object_size, data.length

		if settings.noop
			reporter.report :noop_key, key
			next
		end

		reporter.time :httpimagestore_time do
			if settings.as_is
				response = HTTPClient.put(settings.httpimagestore + settings.endpoint_as_is + "/#{hash}.#{extension}", data)
			else
				response = HTTPClient.put(settings.httpimagestore + settings.endpoint + "/#{hash}.jpg", data)
			end
			fail "bad HTTP Image Store response: #{response.status}: #{response.body}" if response.status != 200
		end
		reporter.report :handled_key, key
	end
end
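
To see what the key pattern in the processor block captures, it can help to run it outside the library. The sketch below is standalone Ruby. The names parse_key, KEY_PATTERN and DERIVED_SUFFIX are hypothetical, the hash class is written as [0-9a-f] on the assumption that hashes are lowercase hexadecimal, and the suffix list is shortened for brevity.

```ruby
# Key pattern from the example: optional directory prefix, a 16-character
# hash, an optional "/name" part, and a 3 or 4 character extension.
# Assumption: hashes are lowercase hexadecimal, hence [0-9a-f].
KEY_PATTERN = %r{(^|.*?/)([0-9a-f]{16})(|/.*)\.(.{3,4})$}

# Shortened, illustrative version of the "not original" suffix list.
DERIVED_SUFFIX = /-(search|original|search_thumb)$/

# Hypothetical helper: returns the captured parts of a key, or nil if the
# key does not have the expected format.
def parse_key(key)
  match = KEY_PATTERN.match(key.to_s)
  return nil unless match

  {
    dir: match[1],
    hash: match[2],
    name: match[3],
    extension: match[4],
    derived: !DERIVED_SUFFIX.match(match[3]).nil?
  }
end

p parse_key('images/0123456789abcdef/photo-search.jpg')
p parse_key('0123456789abcdef.png')
p parse_key('readme.txt')
```

With this pattern, a key whose name part ends in one of the listed suffixes is flagged as derived, which corresponds to the "skipping not original" branch in the example program.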

Contributing to s3-object-processor

  • Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
  • Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it.
  • Fork the project.
  • Start a feature/bugfix branch.
  • Commit and push until you are happy with your contribution.
  • Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
  • Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate the change in its own commit so I can cherry-pick around it.

Copyright

Copyright (c) 2013 Jakub Pastuszek. See LICENSE.txt for further details.