Active Record ADBC adapter

Description

Active Record ADBC adapter provides an ADBC adapter for Active Record.

This adapter is optimized for extracting and loading large data from/to DBs. The optimization is powered by Apache Arrow.

Install

`Gemfile`

Add rubygems-requirements-system plugin and activerecord-adbc-adapter gem to your Gemfile:

plugin "rubygems-requirements-system"

gem "activerecord-adbc-adapter"

This adapter requires ADBC libraries implemented in C++. The rubygems-requirements-system plugin installs it into your system automatically.

GitHub Actions

On GitHub Actions, you should not use the bundler-cache: true option. Because it only caches activerecord-adbc-adapter gem. It doesn't cache ADBC libraries installed into your system. You need to run bundle install after actions/setup-ruby something like the following:

- uses: ruby/setup-ruby@v1
  with:
    ruby-version: ruby
    # bundler-cache: true # No "bundler-cache: true"
- run: bundle install

ADBC driver

You also need to install ADBC drivers for systems you want to connect. For example, you need the ADBC driver for PostgreSQL when you want to connect to PostgreSQL.

See https://arrow.apache.org/adbc/current/driver/installation.html for the latest information.

Here is information for some drivers:

PostgreSQL:

# Debian/Ubuntu
sudo apt install libadbc-driver-postgresql-dev

# Red Hat Enterprise Linux variants
sudo dnf install adbc-driver-postgresql-devel

SQLite:

# Debian/Ubuntu
sudo apt install libadbc-driver-sqlite-dev

# Red Hat Enterprise Linux variants
sudo dnf install adbc-driver-sqlite-devel

DuckDB: You don't need to install additional packages for ADBC support. libduckdb.so includes ADBC support.

Usage

This adapter is optimized for extracting and loading large data from/to DBs. You should use the built-in adapters for normal CRUD use cases. You should use this adapter only for large data.

You can use the Active Record's multiple databases support feature.

`config/database.yml`

Here is a sample config/database.yml. The primary configuration is for the built-in PostgreSQL adapter. The adbc configuration is for this adapter.

default: &default
  primary:
    adapter: postgresql
    encoding: unicode
    pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
    url: <%= ENV.fetch("DATABASE_URL") { "postgresql:///my_app_#{Rails.env}" } %>
  adbc:
    adapter: adbc
    driver: adbc_driver_postgresql
    # "uri" not "url"!!!
    uri: <%= ENV.fetch("DATABASE_URL") { "postgresql:///my_app_#{Rails.env}" } %>
    # You need this to avoid migration.
    database_tasks: false
development:
  <<: *default
test:
  <<: *default
production:
  <<: *default

Note

Configuration parameters are different for each ADBC driver. For example, the PostgreSQL driver and [the SQLite driver(https://arrow.apache.org/adbc/current/driver/sqlite.html) use uri for connection information but the DuckDB driver uses entrypoint and path:

adbc:
  adapter: adbc
  driver: duckdb
  entrypoint: duckdb_adbc_init # You should not change this.
  path: <%= ENV.fetch("DATABASE_PATH") { Rails.root.join("db", "#{Rails.env}.duckdb") } %>
  # You need this to avoid migration.
  database_tasks: false

See the driver's documentation for available parameters.

Model

You need to create an abstract class for the ADBC connection:

# app/models/adbc_application_record.rb
class AdbcApplicationRecord < ActiveRecord::Base
  include ActiveRecordADBCAdapter::Ingest

  self.abstract_class = true
  connects_to database: { writing: :adbc, reading: :adbc }
end

You can create a model for the ADBC connection:

# app/models/adbc_event.rb
class AdbcEvent < AdbcApplicationRecord
  self.table_name = :events
end

You can use Event (that is derived from ApplicationRecord not AdbcApplicationRecord) model for normal use case and AdbcEvent for large data use case. For the large data use case, you should use Apache Arrow data as-is as much as possible for performance. You can get Apache Arrow data by #to_arrow:

AdbcEvent.all.to_arrow

You can use .ingest to load Apache Arrow data:

AdbcEvent.ingest(arrow_data)

You can copy large data efficiently by these method. Here is an sample code to copy events in the latest 30 days to DuckDB from PostgreSQL:

DuckDBAdbcEvent.ingest(PgAdbcEvent.where(created_at: ..(1.month.ago)).to_arrow)

Performance

Here is a benchmark result with 100 integer columns and 10000 rows load/dump with PostgreSQL. This adapter is faster than Active Record's built-in PostgreSQL adapter and raw SQL (INSERT INTO ... and SELECT * FROM ...).

Load:

                   user     system      total        real
SQL            0.003738   0.003141   0.006879 (  0.350767)
Active Record  2.423670   0.047003   2.470673 (  2.844029)
ADBC           0.010229   0.010960   0.021189 (  0.075787)

Dump:

                   user     system      total        real
SQL            0.110934   0.006219   0.117153 (  0.123201)
Active Record  0.271154   0.008948   0.280102 (  0.283621)
ADBC           0.026966   0.006945   0.033911 (  0.067801)

See benchmark/ for details.

License

The MIT license. See LICENSE.txt for details.