safra

Extract data from APIs with minimal configuration

Note

This gem is being renamed to Safra so it can be published on RubyGems.

Name change

  1. Replace gem "extractor", github: "felipedmesquita/extractor" with gem "safra"
  2. Done

Extractor

A Ruby gem to extract data from APIs with minimal configuration.

The gem is currently in a working but messy state. Watch the Adding Tests to a Messy Ruby Library episode of the Code with Jason Meetup for a quick introduction, a demo, and Jason graciously pairing with me to figure out where we could start adding tests.

If you somehow found this gem and want to try it out, I can pair with you to walk you through setting everything up and running your first tap. Just send an email to the address in the gemspec and we'll schedule it.

Installation

Add this to your Gemfile:

gem "extractor", github: "felipedmesquita/extractor"

Create the requests model:

rails generate extractor:install
rails db:migrate

To get newer changes from the main branch, run bundle update extractor.

Usage

Create a tap:

rails generate extractor:tap example
# => create  app/extractors/example_tap.rb
# => create  app/sql/example.sql

Taps inherit the initializer and perform methods from Extractor::Tap. To run our example tap, we can simply call:

ExampleTap.new.perform

This will download all posts from jsonplaceholder as requests. To clean up and analyze the response contents, check out dbt.

How it works

The perform method takes no arguments and runs until reached_end? returns true for a received response. It also retries when a response doesn't pass validation. Here's a flowchart with the specifics. Please open a PR if you can untangle this mermaid chart.

flowchart TD
    A["ExampleTap.new.perform"] --> B["@current_value = first_value || 1"]
    B --> C["request_for(@current_value)"]
    C --> D{"validate(response)"}
    D --true--> E(save request)
    E --> L{"reached_end?(res)"}
    L --false--> J["@current_value = next_value"]
    L --true--> M[END]
    J --> C
    D --false or error--> H{"reached_end?(res)"}
    H --false--> F{retries count > MAX_RETRIES}
    F --true--> G[raise 'Maximum number of retries reached']
    H --true----> I(END)
    F --false--> C
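The loop in the flowchart can be sketched in plain Ruby. This is a stand-alone illustration, not the gem's implementation: SketchTap and FakePostsTap are hypothetical stand-ins, responses are plain hashes instead of Typhoeus responses, saved requests go into an array instead of the requests table, and resetting the retry counter after a successful save is an assumption the chart does not confirm.

```ruby
# Hypothetical stand-in for Extractor::Tap that follows the flowchart.
class SketchTap
  MAX_RETRIES = 4

  attr_reader :saved

  def initialize
    @saved = [] # stands in for the requests table
  end

  def perform
    @current_value = first_value || 1
    retries = 0
    loop do
      response = request_for(@current_value)
      # "false or error" in the chart: an exception counts as a failed validation
      valid = begin
        validate(response)
      rescue StandardError
        false
      end

      if valid
        @saved << response # "save request"
        retries = 0        # assumption: the counter resets after a success
        break if reached_end?(response)
        @current_value = next_value
      else
        break if reached_end?(response)
        retries += 1
        raise 'Maximum number of retries reached' if retries > MAX_RETRIES
      end
    end
    self
  end

  def first_value
    nil
  end

  def next_value
    @current_value + 1
  end
end

# A fake paginated API: page 2 fails once, page 3 is empty.
class FakePostsTap < SketchTap
  PAGES = { 1 => %w[a b], 2 => %w[c], 3 => [] }.freeze

  def initialize
    super
    @attempts = Hash.new(0)
  end

  def request_for(page)
    @attempts[page] += 1
    return { status: 500, items: [] } if page == 2 && @attempts[page] == 1

    { status: 200, items: PAGES.fetch(page) }
  end

  def validate(response)
    response[:status] == 200
  end

  def reached_end?(response)
    response[:status] == 200 && response[:items].empty?
  end
end

tap = FakePostsTap.new.perform
p tap.saved.map { |r| r[:items] } # prints [["a", "b"], ["c"], []]
```

Note that reached_end? is also consulted for invalid responses, so in this sketch it checks the status before declaring the end; otherwise a failed request with an empty body would stop the run early.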

Working with an array parameter

When the values needed to build requests come from a list (such as ids or product codes), pass an array as the first argument to the initializer, as in ProductsTap.new(skus).perform. This changes the behavior of perform: instead of calling request_for with values counting up from 1, it passes each item of the array one by one. Defining a reached_end? method also becomes unnecessary, because perform ends once it has successfully saved a request for each array item.

Handling multiple array items per request

Define a method like request_for_20_values to better utilize APIs that support retrieving multiple items per request, then build the request using the array received:

def request_for_20_values values
  Typhoeus.get "imaginaryapi.com/products?ids=#{values.join ','}"
end
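To picture what each call receives, here is a minimal sketch of splitting an array into groups of at most 20 with each_slice. How the gem itself derives the batch size from the method name is not shown in this README, so the slicing below is an assumption, and the SKU values are made up for illustration.

```ruby
# Hypothetical input: 45 made-up SKUs.
skus = (1..45).map { |n| "SKU#{n}" }

# Groups of at most 20 values, one group per request.
batches = skus.each_slice(20).to_a

# The URL each group would produce, using the same interpolation as above.
urls = batches.map do |values|
  "imaginaryapi.com/products?ids=#{values.join(',')}"
end

p batches.map(&:size) # prints [20, 20, 5]
puts urls.last        # prints imaginaryapi.com/products?ids=SKU41,SKU42,SKU43,SKU44,SKU45
```

Under this assumption, 45 SKUs need three requests instead of 45.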

Working with a custom parameter

The first argument of the tap's initializer is saved to @parameter and can be any type.

Authentication

The named parameter auth: is saved to instance variable @auth:

ExampleTap.new(auth: {seller_id: "23899283", access_token: "jf943u0923nd"}).perform

def request_for value
  Typhoeus.get "imaginaryapi.com/sellers/#{@auth['seller_id']}/products?access_token=#{@auth['access_token']}"
end

Options

Define these constants to configure retry behavior.

Defaults

MAX_RETRIES = 4
ON_MAX_RETRIES = :fail

ON_MAX_RETRIES accepts three possible options:

  • :fail Default. Raises 'Maximum number of retries reached'
  • :save_to_errors Saves the last invalid request to the requests table with the extractor_class column set to ExampleTap_errors
  • :skip_silently Don't use this one.
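A rough sketch of how these constants could behave when overridden in a tap. SketchBase is a hypothetical stand-in for Extractor::Tap, the requests array stands in for the requests table, and the dispatch logic is an assumption based only on the option descriptions above.

```ruby
# Hypothetical base class holding the documented defaults.
class SketchBase
  MAX_RETRIES = 4
  ON_MAX_RETRIES = :fail

  # self.class:: looks the constant up on the subclass first, so a tap that
  # redefines MAX_RETRIES overrides the default set here.
  def max_retries
    self.class::MAX_RETRIES
  end

  # Called once retries are exhausted; `requests` is a plain array here.
  def on_max_retries(last_response, requests)
    case self.class::ON_MAX_RETRIES
    when :fail
      raise 'Maximum number of retries reached'
    when :save_to_errors
      requests << { extractor_class: "#{self.class.name}_errors", response: last_response }
    when :skip_silently
      nil
    end
  end
end

class ExampleTap < SketchBase
  MAX_RETRIES = 10
  ON_MAX_RETRIES = :save_to_errors
end

class StrictTap < SketchBase; end # keeps the defaults

requests = []
ExampleTap.new.on_max_retries({ status: 500 }, requests)
p requests.last[:extractor_class] # prints "ExampleTap_errors"
p ExampleTap.new.max_retries      # prints 10
p StrictTap.new.max_retries       # prints 4
```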

Instance variables

  • @parameter The positional first argument to .new. In ExampleTap.new(Date.yesterday) @parameter would be Date.yesterday. This is not used by the Extractor::Tap class.
  • @auth The named argument auth: to .new. In ExampleTap.new(auth: {api_key: '328490284209'}) @auth would be {api_key: '328490284209'}. This is not used by the Extractor::Tap class.
  • @last_response Is set automatically to the last valid response received from request_for.
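As an illustration of the first two variables, here is a minimal sketch of an initializer that stores them. ParamsSketchTap is a hypothetical class, and the signature is an assumption inferred from the calls shown in this README, not copied from Extractor::Tap.

```ruby
require 'date'

# Hypothetical stand-in: one optional positional argument plus auth: keyword.
class ParamsSketchTap
  attr_reader :parameter, :auth

  def initialize(parameter = nil, auth: nil)
    @parameter = parameter
    @auth = auth
  end
end

tap = ParamsSketchTap.new(Date.new(2024, 1, 31), auth: { api_key: '328490284209' })

p tap.parameter.to_s    # prints "2024-01-31"
p tap.auth[:api_key]    # prints "328490284209"
p tap.auth['api_key']   # prints nil
```

The last line shows that a plain Ruby Hash built with symbol keys does not answer to string keys. Whether the gem converts @auth before your methods read it is not stated here, so check which key style your tap actually receives.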