Execute SQL statements against a Databricks SQL Warehouse, with synchronous and asynchronous (polling) execution, inline results, and external link downloads.

databricks_sql

Ruby gem for the Databricks SQL Statement Execution API with support for:

  • Personal Access Token (PAT) authentication
  • synchronous and asynchronous execution (polling)
  • format: JSON_ARRAY
  • disposition: INLINE
  • disposition: EXTERNAL_LINK with automatic file download and parsing
  • HTTP and SQL execution error handling

Installation

Add to your Gemfile:

gem "databricks_sql"

Or install directly:

gem install databricks_sql

Global Configuration (Recommended)

Configure connection settings once and reuse them across your application:

require "databricks_sql"

Databricks.configure do |config|
	config.host = "https://adb-1234567890123456.7.azuredatabricks.net"
	config.token = ENV.fetch("DATABRICKS_TOKEN")
	config.warehouse_id = ENV.fetch("DATABRICKS_WAREHOUSE_ID")
	config.timeout = 30
	config.open_timeout = 10
	config.external_link_require_https = true
	config.external_link_allowed_hosts = ["files.example.com", "s3.amazonaws.com"]
end

Security notes:

  • The API host must use HTTPS.
  • EXTERNAL_LINK URLs are HTTPS-only by default.
  • If external_link_allowed_hosts is set, downloads are allowed only from those domains.
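
The rules above can be pictured as a small URL check. This is a minimal sketch using only Ruby's standard library; external_link_allowed? is a hypothetical helper, not part of the gem, and the subdomain matching shown here is an assumption about how an allowed host might be compared.

```ruby
require "uri"

# Hypothetical helper (not part of the gem): returns true when a download URL
# passes the HTTPS and allowed-host rules described in the security notes.
def external_link_allowed?(url, require_https: true, allowed_hosts: [])
  uri = URI.parse(url)
  return false if uri.host.nil?
  return false if require_https && uri.scheme&.downcase != "https"
  return true if allowed_hosts.empty? # no allow-list configured

  host = uri.host.downcase
  # Assumption: an entry matches the exact host or any of its subdomains.
  allowed_hosts.any? do |allowed|
    allowed = allowed.downcase
    host == allowed || host.end_with?(".#{allowed}")
  end
rescue URI::InvalidURIError
  false
end
```

Matching on a leading dot (".s3.amazonaws.com") rather than a bare suffix avoids accepting look-alike hosts such as "evils3.amazonaws.com".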

Then initialize your client without passing credentials again:

client = DatabricksSql::Client.new

You can also configure through DatabricksSql.configure:

DatabricksSql.configure do |config|
	config.host = "https://adb-1234567890123456.7.azuredatabricks.net"
	config.token = ENV.fetch("DATABRICKS_TOKEN")
	config.warehouse_id = ENV.fetch("DATABRICKS_WAREHOUSE_ID")
end

If needed, override per client instance:

client = DatabricksSql::Client.new(
	host: "https://adb-1234567890123456.7.azuredatabricks.net",
	token: ENV.fetch("DATABRICKS_TOKEN"),
	warehouse_id: ENV.fetch("DATABRICKS_WAREHOUSE_ID")
)
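
For readers unfamiliar with the pattern, global configuration with per-instance overrides can be sketched as follows. SketchClient and everything inside it are hypothetical stand-ins written for illustration; the gem's actual internals may differ.

```ruby
# Hypothetical stand-in module; names are illustrative, not the gem's code.
module SketchClient
  Config = Struct.new(:host, :token, :warehouse_id, keyword_init: true)

  class << self
    # Lazily created global configuration object.
    def configuration
      @configuration ||= Config.new
    end

    # Yields the global configuration so callers can set fields in a block.
    def configure
      yield configuration
    end
  end

  class Client
    attr_reader :host, :token, :warehouse_id

    # Explicit keyword arguments win; omitted ones fall back to global values.
    def initialize(host: nil, token: nil, warehouse_id: nil)
      global = SketchClient.configuration
      @host = host || global.host
      @token = token || global.token
      @warehouse_id = warehouse_id || global.warehouse_id
    end
  end
end
```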

Synchronous Usage

execute_statement submits the query and waits for a terminal status (SUCCEEDED, FAILED, CANCELED, or CLOSED).

result = client.execute_statement(
	statement: "SELECT id, name FROM analytics.users LIMIT 5",
	format: "JSON_ARRAY",
	disposition: "INLINE"
)

puts result.status
puts result.columns.inspect
puts result.rows.inspect

SQL Context (catalog/schema)

result = client.execute_statement(
	statement: "SELECT current_catalog(), current_schema()",
	catalog: "main",
	schema: "analytics"
)
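
As a rough illustration of how catalog and schema might travel with the statement, the sketch below assembles a request body. build_statement_payload is a hypothetical helper; the field names follow the Databricks SQL Statement Execution API, but the defaults and the idea that the gem builds its payload this way are assumptions.

```ruby
require "json"

# Hypothetical helper: assembles a request body for a statement submission.
# Defaults for format and disposition are assumptions made for this sketch.
def build_statement_payload(statement:, warehouse_id:, catalog: nil, schema: nil,
                            format: "JSON_ARRAY", disposition: "INLINE")
  {
    statement: statement,
    warehouse_id: warehouse_id,
    catalog: catalog,
    schema: schema,
    format: format,
    disposition: disposition
  }.compact # drop catalog/schema when they were not supplied
end

payload = build_statement_payload(
  statement: "SELECT current_catalog(), current_schema()",
  warehouse_id: "abc123", # placeholder value
  catalog: "main",
  schema: "analytics"
)
puts JSON.generate(payload)
```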

Type Mapping with column_schema

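column_schema is optional. It maps column names to type symbols (such as :integer, :boolean, or :datetime), and values in those columns are coerced to the matching Ruby types.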

result = client.execute_statement(
	statement: "SELECT id, is_active, created_at FROM analytics.users LIMIT 2",
	column_schema: {
		"id" => :integer,
		"is_active" => :boolean,
		"created_at" => :datetime
	}
)

result.rows.each do |row|
	puts [row["id"].class, row["is_active"].class, row["created_at"].class].inspect
end
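
To make the idea of per-column coercion concrete, here is a minimal standalone sketch. COERCIONS and coerce_row are hypothetical names, and the sketch assumes values arrive as strings with ISO 8601 timestamps; the gem's own coercion rules may be different.

```ruby
require "time"

# Hypothetical coercion table keyed by the type symbols used above.
COERCIONS = {
  integer: ->(v) { Integer(v, 10) },
  boolean: ->(v) { v.to_s.downcase == "true" },
  datetime: ->(v) { Time.iso8601(v) } # assumes ISO 8601 timestamps
}.freeze

# Returns a new row hash with each declared column coerced to its Ruby type.
def coerce_row(row, column_schema)
  row.to_h do |name, value|
    coercer = COERCIONS[column_schema[name]]
    # NULLs and columns without a declared type pass through untouched.
    [name, value.nil? || coercer.nil? ? value : coercer.call(value)]
  end
end

schema = { "id" => :integer, "is_active" => :boolean, "created_at" => :datetime }
raw = { "id" => "42", "is_active" => "true", "created_at" => "2024-01-15T10:30:00Z" }
p coerce_row(raw, schema).transform_values(&:class)
```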

Asynchronous Usage (Polling)

1) Submit without blocking

submission = client.execute_statement_async(
	statement: "SELECT * FROM large_table",
	format: "JSON_ARRAY",
	disposition: "EXTERNAL_LINK",
	wait_timeout: "10s",
	on_wait_timeout: "CONTINUE"
)

statement_id = submission.fetch("statement_id")
puts "Statement ID: #{statement_id}"

2) Manual polling

loop do
	state = client.get_statement(statement_id: statement_id)
	puts "Current status: #{state["status"]}"
	break if %w[SUCCEEDED FAILED CANCELED CLOSED].include?(state["status"])
	sleep 1
end

3) Automatic polling with global timeout

result = client.wait_for_statement(
	statement_id: statement_id,
	disposition: "EXTERNAL_LINK",
	poll_interval: 1.0,
	max_wait: 120,
	cancel_on_timeout: true
)

puts result.rows.size
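
The polling behavior described in steps 2 and 3 can be sketched without the gem. poll_until_terminal is a hypothetical helper: it takes any callable that returns a status string, so the example runs against a fake status source rather than a real warehouse.

```ruby
TERMINAL_STATES = %w[SUCCEEDED FAILED CANCELED CLOSED].freeze

# Hypothetical helper: calls fetch_status until a terminal state is seen or
# max_wait seconds elapse. Returns the terminal status, or nil on timeout.
def poll_until_terminal(fetch_status, poll_interval: 1.0, max_wait: 120, on_timeout: nil)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + max_wait
  loop do
    status = fetch_status.call
    return status if TERMINAL_STATES.include?(status)

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      on_timeout&.call # e.g. request cancellation of the statement
      return nil
    end
    sleep poll_interval
  end
end

# Usage with a fake status source standing in for the API.
fake_statuses = %w[PENDING RUNNING SUCCEEDED]
final = poll_until_terminal(-> { fake_statuses.shift }, poll_interval: 0.01, max_wait: 5)
puts "Final status: #{final}"
```

A monotonic clock is used for the deadline so that system clock adjustments do not shorten or extend the wait.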

INLINE vs EXTERNAL_LINK

  • INLINE returns results directly in the API payload.
  • EXTERNAL_LINK extracts the download URL, downloads the file, and returns parsed content.

In EXTERNAL_LINK mode, JSON and CSV are parsed automatically.
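
As an illustration of the parsing step, the sketch below turns a downloaded JSON body into row hashes keyed by column name. rows_from_json_array is a hypothetical helper, and the sketch assumes the downloaded file is a JSON array of row arrays; the gem's parser may handle more cases.

```ruby
require "json"

# Hypothetical helper: pairs each row array with the column names so rows can
# be read as row["id"], row["name"], and so on.
def rows_from_json_array(body, columns)
  JSON.parse(body).map { |values| columns.zip(values).to_h }
end

body = '[["1","Ada"],["2","Grace"]]' # stand-in for downloaded file content
rows = rows_from_json_array(body, %w[id name])
p rows
```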

Error Handling

Main error classes:

  • DatabricksSql::AuthenticationError (401)
  • DatabricksSql::AuthorizationError (403)
  • DatabricksSql::NotFoundError (404)
  • DatabricksSql::RateLimitError (429)
  • DatabricksSql::ServerError (5xx)
  • DatabricksSql::TimeoutError
  • DatabricksSql::ConnectionError
  • DatabricksSql::ExecutionError (logical SQL execution failure)
  • DatabricksSql::ParseError

Example:

begin
	result = client.execute_statement(statement: "SELECT * FROM missing_table")
	p result.rows
rescue DatabricksSql::ExecutionError => e
	warn "SQL execution failed: #{e.message}"
rescue DatabricksSql::HTTPError => e
	warn "HTTP error #{e.status_code}: #{e.message}"
rescue DatabricksSql::Error => e
	warn "DatabricksSql error: #{e.message}"
end
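
The status codes listed above suggest a simple lookup from HTTP status to error class. The sketch below uses a hypothetical ErrorSketch module with stand-in classes; the inheritance from HTTPError shown here is an assumption, not something the list confirms.

```ruby
# Hypothetical stand-in hierarchy; class relationships are assumed.
module ErrorSketch
  class Error < StandardError; end

  class HTTPError < Error
    attr_reader :status_code

    def initialize(message, status_code:)
      super(message)
      @status_code = status_code
    end
  end

  class AuthenticationError < HTTPError; end
  class AuthorizationError < HTTPError; end
  class NotFoundError < HTTPError; end
  class RateLimitError < HTTPError; end
  class ServerError < HTTPError; end

  # Picks the most specific error class for an HTTP status code.
  def self.error_class_for(status)
    case status
    when 401 then AuthenticationError
    when 403 then AuthorizationError
    when 404 then NotFoundError
    when 429 then RateLimitError
    when 500..599 then ServerError
    else HTTPError
    end
  end
end
```

Under this assumption, rescuing the HTTP base class after the more specific classes catches any status that has no dedicated class.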

Development

bin/setup
bundle exec rubocop
bundle exec rspec

Install locally:

bundle exec rake install

License

MIT. See LICENSE.txt.