No commit activity in last 3 years
No release in over 3 years
Get your data from Teradata AND GET TO THE CHOPPER!
 Dependencies

Runtime

jdbc-teradata
>= 0
 Project Readme

Teradata-extractor

Get your data from Teradata AND GET OUTTA THERE!

A beautifully thin wrapper around the jdbc-teradata driver that encapsulates the ugly Java bits and gives you back a nice Ruby enumerable thing. Because you want to get out of Java Territory as soon as you can.

The JDBC::Teradata adapter makes connecting to and querying Teradata pretty easy, but dealing with the results is still very Java-centric, since it returns a java.sql.ResultSet object (http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html). Rather than making you parse the result set and its metadata yourself, we will just give you an enumerable of hashes, or a CSV string.
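To make the idea concrete, here is a minimal sketch of turning a cursor-style result set into Ruby hashes. It is not the gem's actual code. `FakeResultSet` and `row_hashes` are hypothetical names, and the fake class is a pure-Ruby stand-in that only mimics the shape of java.sql.ResultSet (1-based column indexes, a `next` that advances the cursor) so the example runs without JRuby or a database. For brevity the column metadata lives on the fake itself rather than on a separate metadata object.

```ruby
# Hypothetical pure-Ruby stand-in for java.sql.ResultSet, so this sketch runs
# without JRuby or Teradata. Column metadata is folded into the same object.
class FakeResultSet
  def initialize(column_names, rows)
    @column_names = column_names
    @rows = rows
    @index = -1
  end

  # Advances the cursor; returns false once the rows run out.
  def next
    @index += 1
    @index < @rows.length
  end

  # JDBC-style column indexes are 1-based.
  def get_object(column_index)
    @rows[@index][column_index - 1]
  end

  def column_count
    @column_names.length
  end

  def get_column_name(column_index)
    @column_names[column_index - 1]
  end
end

# Returns an Enumerator that yields each row as a Hash keyed by column name.
def row_hashes(result_set)
  Enumerator.new do |yielder|
    columns = (1..result_set.column_count).map do |i|
      result_set.get_column_name(i).to_sym
    end
    while result_set.next
      values = (1..columns.length).map { |i| result_set.get_object(i) }
      yielder << columns.zip(values).to_h
    end
  end
end

result_set = FakeResultSet.new(%w[name id], [["Steve", 111], ["Jerry", 231]])
people = row_hashes(result_set).to_a
```

The caller never touches column indexes or cursor state; it just sees an Enumerator of hashes.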

JRuby only, dawg

Since connecting to Teradata from MRI Ruby is not really a thing yet, this gem wraps jdbc-teradata, which of course only runs on JRuby.

usage

#Gemfile
gem 'teradata-extractor'

#shell
bundle install

#ruby
extractor = TeradataExtractor::Query.new("server_name", "user", "password")

> #ruby Enumerator
> enum = extractor.enumerable("select Top 2 name, id, email_address, favorite_liquor from td.people_stuff")
=> #<Enumerator: #<JRuby::Generator::Threaded:...>
> enum.to_a
=> [{:name => "Steve", :id => 111, :email_address => "thestevemitchell@gmail.com", :favorite_liquor => "ALL"},
{:name => "Jerry", :id => 231, :email_address => "Jerry@jerrinson.com", :favorite_liquor => "none"}]
> #You get the idea...it's a Ruby Enumerable

> #ruby String in CSV format
> headers, rows = extractor.csv_string_io("select Top 2 name, id, email_address, favorite_liquor from td.people_stuff")
=> [[:name, :id, :email_address, :favorite_liquor],
 <Enumerator: #<JRuby::Generator::Threaded:...>]
> rows.class
=> Enumerator
> rows.next
=> "Steve,111,thestevemitchell@gmail.com,ALL\nJerry,231,Jerry@jerrinson.com,none\n"
> #rows.next returns MORE THAN ONE ROW in CSV format. See note on fetch_size

Note on fetch_size

Both #enumerable and #csv_string_io take an optional second parameter, fetch_size. When calling #enumerable, fetch_size is purely a performance concern: the enumerator returned will still yield only 1 row at a time when iterated using enum.next. Fetch size is an instruction to the Teradata ResultSet object that tells it how many rows to fetch from the database at a time.
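A rough sketch of why fetch size does not change what a row enumerator hands back. This is not the gem's code. `FakeCursor` and `row_enumerator` are hypothetical names, the cursor is a pure-Ruby stand-in for a database driver, and the sample rows are made up.

```ruby
# Hypothetical stand-in for a database cursor: hands back rows in batches of
# fetch_size, roughly the way a driver pulls rows from the server.
class FakeCursor
  attr_reader :round_trips

  def initialize(rows)
    @rows = rows
    @round_trips = 0
  end

  # Returns the next batch of up to fetch_size rows (empty when exhausted).
  def fetch(fetch_size)
    @round_trips += 1
    @rows.shift(fetch_size)
  end
end

# Wraps the cursor in an Enumerator that always yields a single row,
# no matter how many rows each fetch brings back.
def row_enumerator(cursor, fetch_size)
  Enumerator.new do |yielder|
    loop do
      batch = cursor.fetch(fetch_size)
      break if batch.empty?
      batch.each { |row| yielder << row }
    end
  end
end

# Made-up sample rows for illustration.
people = [
  { name: "Steve", id: 111 },
  { name: "Jerry", id: 231 },
  { name: "Pat", id: 342 }
]

cursor = FakeCursor.new(people.dup)
enum = row_enumerator(cursor, 2)
first_row = enum.next # a single Hash, even though 2 rows were fetched
all_rows = row_enumerator(FakeCursor.new(people.dup), 2).to_a
```

A larger fetch size means fewer round trips to the source, but the consumer still sees one row per `next`.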

When calling #csv_string_io, fetch_size is significant. For convenience, #csv_string_io bundles rows into groups, so each call to rows.next will yield a StringIO representing 1000 rows by default. If you like, you can pass a fetch_size of 1 to get a single row at a time. But if you're using something like https://github.com/theSteveMitchell/postgres_upsert, getting rows in a group is much more efficient and convenient. You can just...

extractor = TeradataExtractor::Query.new(server_name, user_name, password)
headers, enum = extractor.csv_string_io("select name, id, email_address, favorite_liquor from td.people_stuff")
enum.each do |csv_stringio|
  Person.pg_upsert(csv_stringio, {header: false, columns: headers})
end
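If you're curious what that grouping amounts to, here is a minimal sketch of bundling rows into CSV chunks using only Ruby's standard csv and stringio libraries. It is an illustration under assumptions, not the gem's implementation: `csv_chunks` is a hypothetical helper, and it takes an in-memory array of hashes instead of a live Teradata query.

```ruby
require "csv"
require "stringio"

# Groups rows into CSV chunks of up to fetch_size rows each and returns
# [headers, enumerator_of_StringIO]. rows is an Array of Hashes that all
# share the same keys.
def csv_chunks(rows, fetch_size = 1000)
  first = rows.first
  headers = first ? first.keys : []
  chunks = Enumerator.new do |yielder|
    rows.each_slice(fetch_size) do |slice|
      csv = slice.map { |row| CSV.generate_line(row.values_at(*headers)) }.join
      yielder << StringIO.new(csv)
    end
  end
  [headers, chunks]
end

people = [
  { name: "Steve", id: 111, email_address: "thestevemitchell@gmail.com", favorite_liquor: "ALL" },
  { name: "Jerry", id: 231, email_address: "Jerry@jerrinson.com", favorite_liquor: "none" }
]

headers, chunks = csv_chunks(people, 2)
first_chunk = chunks.next.read # both rows in one CSV string
single_row_chunks = csv_chunks(people, 1).last.map(&:read) # one row per chunk
```

With a chunk size of 2 both rows arrive in a single StringIO; with a chunk size of 1 each row gets its own.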

To-do's

  • Support more robust conversion from Java SQL datatypes to Ruby objects. Currently only Date and BigDecimal are handled explicitly; other data types like String and Integer are converted implicitly.

Note on Patches/Pull Requests

  • Fork the project
  • add your feature/fix to your fork (specs please)
  • submit a PR
  • Lay back and bask in the karma you've earned.
  • If you find an issue but can't fix it in a PR, please log an issue. I'll do my best.