Project

dwh

0.0
No release in over 3 years
Provides a unified interface across data warehouses to connect, execute, and introspect. This is not an ORM but a fast integrationg solution. It is quite easy to add new database adapters. Supports popular cloud warehouses too.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Runtime

~> 3.3.5
~> 2.10.1
>= 0
 Project Readme

Ruby

DWH - Data Warehouse Adapter Library

A light weight library to connect, introspect, and query popular databases over a unified interface. This gem is intended for analtyical workloads. The library also provides database specific translations for common functions like date_trunc, date_add etc. The function tranlation is not comprehensive. But, it does provides good coverage for date handling, and some array handling as well.

Note

This is not an ORM nor will it cast types to ruby unless the underlying client does it out of the box. The goal here is to create an Architecture where new databases can be onboarded quickly.

Why do we need another database abstraction layer?

Libraries like Sequel are amazing and comprehensive. However, its broad coverage also makes it more laborious to add new databases. Especially, ones with only HTTP endpoints for Ruby. We seem to be having an explosion of databases recently and a light weight interface will allow us to integrate faster.

The adapter only has 5 core methods (6 including the connection method). A YAML settings controls how it interacts with a particular db. It is relatively fast to add a new db. See the Druid implementation as an example. And here is its corresponding YAML settings file.

Features

  • Unified Interface: Connect to multiple database types using the same API
  • SQL Function Translation: Automatically translates common SQL functions to database-specific syntax
  • Connection Pooling: Built-in connection pool management for high-performance applications
  • Rich Metadata: Extract table schemas, column information, and statistics

Supported Databases

  • Snowflake - High performance cloud warehouse
  • Trino (formerly Presto) - Distributed SQL query engine
  • AWS Athena - AWS big data warehouse
  • Apache Druid - Real-time analytics database
  • DuckDB - In-process analytical database
  • PostgreSQL - Full-featured RDBMS with advanced SQL support
  • MySQL - Popular open-source database
  • SQL Server - Microsoft's enterprise database

Integrations Coming Soon

  • Redshift - AWS data warehouse platform
  • ClickHouse - High performance analytical db
  • Databricks - Big data compute engine
  • MotherDuck - Hosted DuckDB service

Quick Start

Install it yourself as:

gem install dwh

Connect and Execute a Basic Query

require 'dwh'

# Connect to Druid
druid = DWH.create(:druid, {
  host: 'localhost',
  port: 8080,
  protocol: 'http'
})

# basic query execution
results = druid.execute("SELECT * FROM web_sales", format: :csv)

Core API

Standardized API across adapters:

connection
Creates a reusuable connection based on config hash passed in
tables(schema: nil, catalog: nil)
returns a list of tables from the default connection or from the specified schema and catalog
metadata(table_name, schema: nil, catalog: nil)
provides metadata about a table
stats(table_name, date_column: nil)
provides table row count and date range
execute(sql, format: :array, retries: 0)
runs a query and returns in given format
execute_stream(sql, io, stats: nil)
runs a query and streams it as csv into the given io

Tutorials and Guides

Testing

Certain databases have to be tested via docker. Those tests will try to launch docker compose services in test/support/compose*.yml

Run Unit Tests:

bundle exec rake test:unit

Run tests on RDBMS dbs:

bundle exec rake test:system:rdbms 

Run tests on druid:

bundle exec rake test:system:druid 

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt.

To install this gem onto your local machine, run bundle exec rake install.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/stratasite/dwh.

License

This project is available as open source under the terms of the MIT License.

Version

Current version: 0.1.0