Safra
A Ruby gem to extract data from APIs with minimal configuration.
This is currently in a working but messy state. Watch the Adding Tests to a Messy Ruby Library episode of the Code with Jason Meetup for a quick introduction, a demo, and Jason graciously pairing with me to figure out where we could start adding tests.
If you somehow found this gem and want to try it out, I can pair with you to walk you through setting everything up and running your first tap. Just send me an email at the address in the gemspec and we'll schedule it.
Installation
Add this to your Gemfile:
gem "safra"
Create the requests model:
rails generate safra:install
rails db:migrate
To get newer changes from the main branch, run:
bundle update safra
Usage
Create a tap:
rails generate safra:tap example
# => create app/extractors/example_tap.rb
# => create app/sql/example.sql
Taps inherit the initializer and perform methods from Safra::Tap. To run our example tap, we can simply call:
ExampleTap.new.perform
This will download all posts from jsonplaceholder and save them as requests. To clean up and analyze the response contents, check out dbt.
How it works
The perform method takes no arguments and runs until reached_end? returns true for a response it receives. It also retries requests whose responses don't pass validation. Here's a flowchart with the specifics. Please open a PR if you can untangle this Mermaid chart.
flowchart TD
A["ExampleTap.new.perform"] --> B["@current_value = first_value || 1"]
B --> C["request_for(@current_value)"]
C --> D{"validate(response)"}
D --true--> E(save request)
E --> L{"reached_end?(res)"}
L --false--> J["@current_value = next_value"]
L --true--> M[END]
J --> C
D --false or error--> H{"reached_end?(res)"}
H --false--> F{retries count > MAX_RETRIES}
F --true--> G[raise 'Maximum number of retries reached']
H --true----> I(END)
F --false--> C
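To make the flowchart easier to follow, here is a plain-Ruby sketch of the same loop that runs without Rails, Typhoeus, or a database. SketchTap, PagesTap, the in-memory @saved list, and the choice to reset the retry counter after each valid response are assumptions for illustration, not Safra's actual code.

```ruby
# Plain-Ruby sketch of the perform loop in the flowchart. NOT Safra's source:
# SketchTap, the in-memory @saved list, and where the retry counter is reset
# are assumptions so the example runs without Rails, Typhoeus, or a database.
class SketchTap
  MAX_RETRIES = 4

  def initialize
    @saved = []
  end

  # Subclasses may override this; nil falls back to 1, as in the flowchart.
  def first_value
    nil
  end

  def perform
    @current_value = first_value || 1
    retries = 0
    loop do
      response = request_for(@current_value)
      valid = begin
        validate(response)
      rescue StandardError
        false # an error raised by validate counts as a failed validation
      end

      if valid
        @saved << response # stands in for "save request"
        @last_response = response
        retries = 0
        break if reached_end?(response)
        @current_value = next_value
      else
        break if reached_end?(response)
        retries += 1
        raise "Maximum number of retries reached" if retries > MAX_RETRIES
      end
    end
    @saved
  end
end

# Hypothetical tap: pages 1-3 have items, page 4 is empty and marks the end.
class PagesTap < SketchTap
  def request_for(page)
    { page: page, items: page <= 3 ? ["item#{page}"] : [] }
  end

  def validate(response)
    response.is_a?(Hash)
  end

  def reached_end?(response)
    response[:items].empty?
  end

  def next_value
    @current_value + 1
  end
end

PagesTap.new.perform.map { |r| r[:page] } # => [1, 2, 3, 4]
```

Note that the last page is saved before reached_end? stops the loop, which matches the "save request" step coming before the reached_end? check in the chart.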
Working with an array parameter
When the values needed to build requests come from a list (like ids or product codes), you can pass an array to the initializer as the first argument, as in ProductsTap.new(skus).perform. This changes the behavior of the perform method: instead of calling request_for with values counting up from 1, it passes each item of the array one by one. This also makes defining a reached_end? method unnecessary, since perform ends once it has successfully saved a request for each array item.
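A plain-Ruby sketch of the array behavior, runnable without Rails or Typhoeus. ArraySketchTap, SkuTap, the in-memory @saved list, and the per-item retry counter are assumptions for illustration, not Safra's actual implementation.

```ruby
# Plain-Ruby sketch of the array-parameter behavior: each array item is passed
# to request_for in order, invalid responses are retried, and perform stops
# once every item has a saved response. ArraySketchTap and @saved are
# assumptions, not Safra's implementation.
class ArraySketchTap
  MAX_RETRIES = 4

  def initialize(parameter)
    @parameter = parameter
    @saved = []
  end

  def perform
    @parameter.each do |value|
      retries = 0
      loop do
        response = request_for(value)
        if validate(response)
          @saved << response # stands in for "save request"
          break
        end
        retries += 1
        raise "Maximum number of retries reached" if retries > MAX_RETRIES
      end
    end
    @saved
  end
end

# Hypothetical tap: the first request for "B2" fails validation; the retry succeeds.
class SkuTap < ArraySketchTap
  def request_for(sku)
    @attempts ||= Hash.new(0)
    @attempts[sku] += 1
    { sku: sku, ok: !(sku == "B2" && @attempts[sku] == 1) }
  end

  def validate(response)
    response[:ok]
  end
end

SkuTap.new(%w[A1 B2 C3]).perform.map { |r| r[:sku] } # => ["A1", "B2", "C3"]
```

No reached_end? is defined on SkuTap: iterating over the array is what ends the run.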
Handling multiple array items per request
Define a method like request_for_20_values to make better use of APIs that support retrieving multiple items per request, then build the request from the array it receives:
def request_for_20_values values
  # values is the array of items received for this request
  Typhoeus.get "imaginaryapi.com/products?ids=#{values.join ','}"
end
Working with a custom parameter
The first argument of the tap's initializer is saved to @parameter and can be any type.
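As an illustration, here is a hypothetical tap-like class whose @parameter is a Date, combined with the value that counts up from 1 (used here as a page number). DailyReportSketch, request_url, and the reports endpoint are made up for this sketch; in a real tap you would pass such a URL to Typhoeus.get inside request_for.

```ruby
require "date"

# Hypothetical tap-like class whose @parameter is a Date. The initializer only
# mimics the documented signature (first positional argument -> @parameter,
# auth: -> @auth); it is a stand-in, not Safra::Tap's actual code.
class DailyReportSketch
  def initialize(parameter = nil, auth: nil)
    @parameter = parameter
    @auth = auth
  end

  # value counts up from 1 (page numbers); @parameter adds the report date.
  # imaginaryapi.com is a placeholder host, as in the examples above.
  def request_url(value)
    "imaginaryapi.com/reports?date=#{@parameter.iso8601}&page=#{value}"
  end
end

DailyReportSketch.new(Date.new(2024, 1, 31)).request_url(1)
# => "imaginaryapi.com/reports?date=2024-01-31&page=1"
```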
Authentication
The named parameter auth: is saved to the instance variable @auth:
ExampleTap.new(auth: {seller_id: "23899283", access_token: "jf943u0923nd"}).perform
# Inside ExampleTap (app/extractors/example_tap.rb). The auth hash above has Symbol keys.
def request_for value
  Typhoeus.get "imaginaryapi.com/sellers/#{@auth[:seller_id]}/products?access_token=#{@auth[:access_token]}"
end
Options
Define these constants to configure retry behavior.
Defaults
MAX_RETRIES = 4
ON_MAX_RETRIES = :fail
ON_MAX_RETRIES accepts three possible options:
- :fail (default): raises 'Maximum number of retries reached'.
- :save_to_errors: saves the last invalid request to the requests table with the extractor_class column set to ExampleTap_errors.
- :skip_silently: don't use this one.
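A sketch of how the three options could be dispatched once retries run out. handle_max_retries and the errors array are hypothetical stand-ins (Safra writes to the requests table instead), and treating :skip_silently as "do nothing" is an assumption based only on its name.

```ruby
# Hypothetical dispatch on ON_MAX_RETRIES. The errors array stands in for the
# requests table; :skip_silently returning nil is an assumption from its name.
def handle_max_retries(option, response, tap_name, errors)
  case option
  when :fail
    raise "Maximum number of retries reached"
  when :save_to_errors
    # Mirrors the documented extractor_class value, e.g. "ExampleTap_errors".
    errors << { extractor_class: "#{tap_name}_errors", response: response }
  when :skip_silently
    nil # the failed value is dropped without any record
  else
    raise ArgumentError, "unknown ON_MAX_RETRIES option: #{option.inspect}"
  end
end

errors = []
handle_max_retries(:save_to_errors, { status: 500 }, "ExampleTap", errors)
errors.first[:extractor_class] # => "ExampleTap_errors"
```

This also shows why :skip_silently is risky: nothing is left behind to tell you a value was never fetched.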
Instance variables
- @parameter: the positional first argument to .new. In ExampleTap.new(Date.yesterday), @parameter would be Date.yesterday. Apart from the array case described in Working with an array parameter, the Safra::Tap class doesn't use it.
- @auth: the named argument auth: to .new. In ExampleTap.new(auth: {api_key: '328490284209'}), @auth would be {api_key: '328490284209'}. This is not used by the Safra::Tap class.
- @last_response: set automatically to the last valid response received from request_for.
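One way @last_response can be useful is cursor-based pagination, where next_value reads the cursor out of the previous response. This is a hypothetical sketch: the {"next_cursor": ...} response shape, CursorSketch, and FakeResponse are assumptions. Since request_for returns a Typhoeus response in the examples above, the sketch parses a JSON body; FakeResponse only imitates its body method.

```ruby
require "json"

# Stand-in for the Typhoeus response request_for returns; only #body is used.
FakeResponse = Struct.new(:body)

class CursorSketch
  # In a real tap Safra sets @last_response; this writer exists only for the demo.
  attr_writer :last_response

  # Hypothetical: the API puts the next page's cursor in "next_cursor".
  def next_value
    JSON.parse(@last_response.body)["next_cursor"]
  end

  # A null cursor means there are no more pages.
  def reached_end?(response)
    JSON.parse(response.body)["next_cursor"].nil?
  end
end

cursor_tap = CursorSketch.new
cursor_tap.last_response = FakeResponse.new('{"data":[1,2],"next_cursor":"abc123"}')
cursor_tap.next_value # => "abc123"
cursor_tap.reached_end?(FakeResponse.new('{"data":[],"next_cursor":null}')) # => true
```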