No release in over a year
A Rails generator to create dbt sources, models, and seeds from ActiveRecord schema and enums.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
 Dependencies

Runtime

~> 2.6
~> 3.0
 Project Readme

ActiveRecord::Dbt

Gem Version Maintainability Ruby CI

ActiveRecord::Dbt generates dbt files from the information of the database connected via ActiveRecord.

Currently, it can generate:

  • Sources: YAML files
  • Staging models: SQL and YAML files
  • Seeds: CSV and YAML files (based on ActiveRecord enum values)

Warning

dbt Version Compatibility

Generated files are optimized for dbt v1.10+. For older versions, the following manual changes are required:

dbt Version Required Changes
v1.7 or below Rename data_tests: to tests: (reference)
v1.9 or below Move properties from under config: to the top level (reference)

Table of Contents

  • Installation
  • Usage
    • Configuration
    • Generate dbt Source File
    • Generate dbt Staging Files
    • Generate dbt Seed Files
  • Contributing
  • License

Installation

To install activerecord-dbt, add this line to your application's Gemfile:

gem 'activerecord-dbt'

Since it is only used in the development environment, it is recommended to add it to the development group:

group :development do
  gem 'activerecord-dbt'
end

Then run:

$ bundle

Alternatively, you can install it manually by running:

$ gem install activerecord-dbt

Usage

Configuration

ActiveRecord::Dbt Configuration

Create an initializer file for dbt:

$ bin/rails generate active_record:dbt:initializer

This will generate the config/initializers/dbt.rb file.

Configuration Description
config_directory_path The path to the directory where configuration files are stored. The default is lib/dbt.
export_directory_path The path to the directory where files generated by bin/rails generate active_record:dbt:* are stored. The default is doc/dbt.
dwh_platform Specify the data warehouse platform to which dbt connects. The default is bigquery.
data_sync_delayed Indicates whether there is a data delay. If set to true, severity: warn is applied to the relationships test. The default is false.
logger The destination for log output. The default is Logger.new('./log/active_record_dbt.log').
used_dbt_package_names An array of dbt package names to use.
locale I18n locale. The default is I18n.locale.

List of platforms that can currently be set with dwh_platform.

Data Warehouse Platform Link
bigquery BigQuery enterprise data warehouse - Google Cloud
postgres PostgreSQL: The world's most advanced open source database
redshift Cloud Data Warehouse - Amazon Redshift - AWS
snowflake The Snowflake AI Data Cloud - Mobilize Data, Apps, and AI
spark Apache Spark™ - Unified Engine for large-scale data analytics

List of packages that can currently be set with used_dbt_package_names.

dbt Package Name Link
dbt-labs/dbt-utils dbt-labs/dbt-utils: Utility functions for dbt projects.
datnguye/dbterd datnguye/dbterd: Generate the ERD as a code from dbt artifacts

Example:

Adjust the settings according to your environment.

# frozen_string_literal: true

if Rails.env.development?
  require 'active_record/dbt'

  ActiveRecord::Dbt.configure do |c|
    c.config_directory_path = 'lib/dbt'
    c.export_directory_path = 'doc/dbt'
    c.data_sync_delayed = false
    c.used_dbt_package_names = [
      'dbt-labs/dbt_utils',
      'datnguye/dbterd'
    ]
  end
end

Create Configuration Files

Create configuration files for dbt:

$ bin/rails generate active_record:dbt:config

This will create the following files.

File Description
#{config_directory_path}/source_config.yml Used to generate #{export_directory_path}/models/sources/#{source_name}/src_#{source_name}.yml.
#{config_directory_path}/staging_model.sql.tt Used to generate #{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.sql.

Generate dbt Source File

dbt Source Configuration

In the #{config_directory_path}/source_config.yml file, describe the properties you want to set for the source. You can configure sources, table_overrides, defaults, and table_descriptions in this file.

The available properties for sources and table_overrides are detailed in Source properties | dbt Developer Hub.

sources

Set the configurations for sources.

Configuration Description
project_name Written to the beginning of the table or model description.
table_names in exclude Specify which table names you do not want output in sources.

Example:

sources:
  name: dummy
  description: |-
    Write a description of the 'dummy' source.
    You can write multiple lines.
  config:
    meta:
      project_name: dummy_project
      generated_by: activerecord-dbt
      exclude:
        table_names:
          - ar_internal_metadata
          - schema_migrations
table_overrides

Set all properties for tables except for name and description.

Example:

table_overrides:
  users:
    config:
      freshness:
        warn_after:
          count: 3
          period: day
        error_after:
          count: 5
          period: day
      loaded_at_field: created_at
    columns:
      created_at:
        data_tests:
          - not_null:
              where: 'id != 1'
defaults

Set the default value for the description(logical_name, description) of tables.

In the logical_name and description fields of table_descriptions, you can use {{ table_name }} to refer to the table name and {{ project_name }} to refer to the project name. In the description of table_descriptions.columns, you can use {{ table_name }} to refer to the table name and {{ column_name }} to refer to the column name.

Example:

defaults:
  table_descriptions:
    logical_name: Write the logical_name of the '{{ table_name }}' table in '{{ project_name }}'.
    columns:
      description: Write a description of the '{{ table_name }}.{{ column_name }}' column.

If nothing is set, it defaults to the following:

defaults:
  table_descriptions:
    logical_name: Write the logical_name of the '{{ table_name }}' table in '{{ project_name }}'.
    columns:
      description: Write a description of the '{{ table_name }}.{{ column_name }}' column.
table_descriptions

Set the name and description for tables.

Configuration Description
logical_name A title or one-line description to be output in the dbt description.
description A detailed description of logical_name to be output in the dbt description.

Example:

table_descriptions:
  ar_internal_metadata:
    logical_name: Internal Metadata
    description: |-
      By default Rails will store information about your Rails environment and schema
      in an internal table named `ar_internal_metadata`.
    columns:
      key: Key
      value: Value
      created_at: Created At
      updated_at: Updated At
  schema_migrations:
    logical_name: Schema Migrations
    description: |-
      Rails keeps track of which migrations have been committed to the database and
      stores them in a neighboring table in that same database called `schema_migrations`.
    columns:
      version: The version number of the migration.

Adjust the settings according to your environment.

sources:
  name: dummy
  description: |-
    Write a description of the 'dummy' source.
    You can write multiple lines.
  config:
    meta:
      project_name: dummy_project
      generated_by: activerecord-dbt
      exclude:
        table_names:
          - hoges

table_overrides:
  users:
    config:
      freshness:
        warn_after:
          count: 3
          period: day
        error_after:
          count: 5
          period: day
      loaded_at_field: created_at
    columns:
      created_at:
        data_tests:
          - not_null:
              where: 'id != 1'

defaults:
  table_descriptions:
    logical_name: Write the logical_name of the '{{ table_name }}' table in '{{ project_name }}'.
    columns:
      description: Write a description of the '{{ table_name }}.{{ column_name }}' column.
  seed_descriptions:
    enum:
      description: "{{ project_name }} {{ table_logical_name }} {{ column_description }} enum"

table_descriptions:
  ar_internal_metadata:
    logical_name: Internal Metadata
    description: |-
      By default Rails will store information about your Rails environment and schema
      in an internal table named `ar_internal_metadata`.
    columns:
      key: Key
      value: Value
      created_at: Created At
      updated_at: Updated At
  schema_migrations:
    logical_name: Schema Migrations
    description: |-
      Rails keeps track of which migrations have been committed to the database and
      stores them in a neighboring table in that same database called `schema_migrations`.
    columns:
      version: The version number of the migration.

Generate #{export_directory_path}/models/sources/#{source_name}/src_#{source_name}.yml

Generate a source file for dbt:

$ bin/rails generate active_record:dbt:source

Generate #{export_directory_path}/models/sources/#{source_name}/src_#{source_name}.yml.

Example:

Note

The output will be as shown below. It is recommended to indent the YAML file with a tool of your choice.

---
version: 2
sources:
- name: dummy
  description: |-
    Write a description of the 'dummy' source.
    You can write multiple lines.
  config:
    meta:
      project_name: dummy_project
      generated_by: activerecord-dbt
      exclude:
        table_names:
        - hoges
  tables:
  - name: ar_internal_metadata
    description: |-
      # dummy_project Internal Metadata
      By default Rails will store information about your Rails environment and schema
      in an internal table named `ar_internal_metadata`.
    columns:
    - name: key
      description: Key
      data_type: string
      data_tests:
      - unique
      - not_null
    - name: value
      description: Value
      data_type: string
    - name: created_at
      description: Created At
      data_type: datetime
      data_tests:
      - not_null
    - name: updated_at
      description: Updated At
      data_type: datetime
      data_tests:
      - not_null
  - name: companies
    description: Write the logical_name of the 'companies' table in 'dummy_project'.
    columns:
    - name: id
      description: id
      data_type: int64
      data_tests:
      - unique
      - not_null
    - name: name
      description: Write a description of the 'companies.name' column.
      data_type: string
      data_tests:
      - not_null
    - name: establishment_date
      description: Write a description of the 'companies.establishment_date' column.
      data_type: string
    - name: average_age
      description: Write a description of the 'companies.average_age' column.
      data_type: float64
    - name: published
      description: Write a description of the 'companies.published' column.
      data_type: bool
      data_tests:
      - not_null
      - accepted_values:
          arguments:
            values:
            - true
            - false
            quote: false
    - name: created_at
      description: Created At
      data_type: datetime
      data_tests:
      - not_null
    - name: updated_at
      description: Updated At
      data_type: datetime
      data_tests:
      - not_null
  - name: posts
    description: dummy_project Post
    columns:
    - name: id
      description: ID
      data_type: int64
      data_tests:
      - unique
      - not_null
    - name: user_id
      description: User
      data_type: int64
      data_tests:
      - not_null
      - relationships:
          arguments:
            to: source('dummy', 'users')
            field: id
            meta:
              relationship_type: many-to-one
    - name: title
      description: Title
      data_type: string
    - name: content
      description: Content
      data_type: string
    - name: created_at
      description: Post Created At
      data_type: datetime
      data_tests:
      - not_null
    - name: updated_at
      description: Post Updated At
      data_type: datetime
      data_tests:
      - not_null
    - name: status
      description: Status
      data_type: int64
      data_tests:
      - accepted_values:
          arguments:
            values:
            - 0
            - 1
            - 2
            quote: false
  - name: posts_tags
    description: Write the logical_name of the 'posts_tags' table in 'dummy_project'.
    data_tests:
    - dbt_utils.unique_combination_of_columns:
        arguments:
          combination_of_columns:
          - post_id
          - tag_id
    columns:
    - name: post_id
      description: post_id
      data_type: int64
      data_tests:
      - not_null
      - relationships:
          arguments:
            to: source('dummy', 'posts')
            field: id
            meta:
              relationship_type: many-to-one
              active_record_dbt_error:
                class: NameError
                message: |-
                  uninitialized constant PostsTag

                        Object.const_get(camel_cased_word)
                              ^^^^^^^^^^
    - name: tag_id
      description: tag_id
      data_type: int64
      data_tests:
      - not_null
      - relationships:
          arguments:
            to: source('dummy', 'tags')
            field: id
            meta:
              relationship_type: many-to-one
              active_record_dbt_error:
                class: NameError
                message: |-
                  uninitialized constant PostsTag

                        Object.const_get(camel_cased_word)
                              ^^^^^^^^^^
  - name: profiles
    description: Write the logical_name of the 'profiles' table in 'dummy_project'.
    columns:
    - name: id
      description: id
      data_type: int64
      data_tests:
      - unique
      - not_null
    - name: user_id
      description: user_id
      data_type: int64
      data_tests:
      - unique
      - not_null
      - relationships:
          arguments:
            to: source('dummy', 'users')
            field: id
            meta:
              relationship_type: one-to-one
    - name: first_name
      description: Write a description of the 'profiles.first_name' column.
      data_type: string
      data_tests:
      - not_null
    - name: last_name
      description: Write a description of the 'profiles.last_name' column.
      data_type: string
      data_tests:
      - not_null
    - name: created_at
      description: Created At
      data_type: datetime
      data_tests:
      - not_null
    - name: updated_at
      description: Updated At
      data_type: datetime
      data_tests:
      - not_null
  - name: relationships
    description: Write the logical_name of the 'relationships' table in 'dummy_project'.
    data_tests:
    - dbt_utils.unique_combination_of_columns:
        arguments:
          combination_of_columns:
          - follower_id
          - followed_id
    columns:
    - name: id
      description: id
      data_type: int64
      data_tests:
      - unique
      - not_null
    - name: follower_id
      description: follower_id
      data_type: int64
      data_tests:
      - not_null
      - relationships:
          arguments:
            to: source('dummy', 'users')
            field: id
            meta:
              relationship_type: many-to-one
    - name: followed_id
      description: followed_id
      data_type: int64
      data_tests:
      - not_null
      - relationships:
          arguments:
            to: source('dummy', 'users')
            field: id
            meta:
              relationship_type: many-to-one
    - name: created_at
      description: Created At
      data_type: datetime
      data_tests:
      - not_null
    - name: updated_at
      description: Updated At
      data_type: datetime
      data_tests:
      - not_null
  - name: schema_migrations
    description: |-
      # dummy_project Schema Migrations
      Rails keeps track of which migrations have been committed to the database and
      stores them in a neighboring table in that same database called `schema_migrations`.
    columns:
    - name: version
      description: The version number of the migration.
      data_type: string
      data_tests:
      - unique
      - not_null
  - name: tags
    description: Write the logical_name of the 'tags' table in 'dummy_project'.
    columns:
    - name: id
      description: id
      data_type: int64
      data_tests:
      - unique
      - not_null
    - name: name
      description: Write a description of the 'tags.name' column.
      data_type: string
      data_tests:
      - unique
      - not_null
    - name: created_at
      description: Created At
      data_type: datetime
      data_tests:
      - not_null
    - name: updated_at
      description: Updated At
      data_type: datetime
      data_tests:
      - not_null
  - name: user_tags
    description: Write the logical_name of the 'user_tags' table in 'dummy_project'.
    data_tests:
    - dbt_utils.unique_combination_of_columns:
        arguments:
          combination_of_columns:
          - user_id
          - tag_id
    columns:
    - name: id
      description: id
      data_type: int64
      data_tests:
      - unique
      - not_null
    - name: user_id
      description: user_id
      data_type: int64
      data_tests:
      - not_null
      - relationships:
          arguments:
            to: source('dummy', 'users')
            field: id
            meta:
              relationship_type: many-to-one
    - name: tag_id
      description: tag_id
      data_type: int64
      data_tests:
      - not_null
      - relationships:
          arguments:
            to: source('dummy', 'tags')
            field: id
            meta:
              relationship_type: many-to-one
    - name: created_at
      description: Created At
      data_type: datetime
      data_tests:
      - not_null
    - name: updated_at
      description: Updated At
      data_type: datetime
      data_tests:
      - not_null
  - name: users
    description: dummy_project User
    config:
      freshness:
        warn_after:
          count: 3
          period: day
        error_after:
          count: 5
          period: day
      loaded_at_field: created_at
    columns:
    - name: id
      description: ID
      data_type: int64
      data_tests:
      - unique
      - not_null
    - name: created_at
      description: User Created At
      data_type: datetime
      data_tests:
      - not_null:
          where: id != 1
    - name: updated_at
      description: User Updated At
      data_type: datetime
      data_tests:
      - not_null
    - name: company_id
      description: company_id
      data_type: int64
      data_tests:
      - relationships:
          arguments:
            to: source('dummy', 'companies')
            field: id
            meta:
              relationship_type: many-to-one

Generate dbt Staging Files

dbt Staging Configuration

In the #{config_directory_path}/staging_model.sql.tt file, write the SQL template for the staging model you want to create. You can use sql.source_name, sql.table_name, sql.select_column_names, sql.primary_key_eql_id?, and sql.rename_primary_id within this file.

Example:

with

source as (

    select * from {{ source('<%= sql.source_name %>', '<%= sql.table_name %>') }}

),

renamed as (

    select

        <%- sql.select_column_names.each_with_index do |(column_type, columns), column_type_index| -%>
          -- <%= column_type %>
          <%- columns.each_with_index do |column, column_index| -%>
          <%- is_rename_primary_id = sql.primary_key_eql_id? && sql.primary_key?(column.name) -%>
          <%- is_last_column = column_type_index == sql.select_column_names.size - 1 && column_index == columns.size - 1 -%>
          <%= is_rename_primary_id ? "id as #{sql.rename_primary_id}" : column.name %><% unless is_last_column -%>,<%- end %>
          <%- if column_type_index != sql.select_column_names.size - 1 && column_index == columns.size - 1 -%>

          <%- end -%>
          <%- end -%>
        <%- end -%>

    from source

)

select * from renamed

Different Pattern:

#standardSQL

with source as (
  select
    <%- if sql.primary_key_eql_id? -%>
    id as <%= sql.rename_primary_id %>
    , * except(id)
    <%- else -%>
    *
    <%- end -%>
  from {{ source('<%= sql.source_name %>', '<%= sql.table_name %>') }}
)

, final as (
  select
  <%- sql.select_column_names.each_with_index do |(column_type, columns), column_type_index| -%>
    -- <%= column_type %>
    <%- columns.each_with_index do |column, column_index| -%>
    <% unless column_type_index == 0 && column_index == 0 -%>, <%- end %><%= (sql.primary_key_eql_id? && sql.primary_key?(column.name) ? sql.rename_primary_id : column.name) %>
    <%- if column_type_index != sql.select_column_names.size - 1 && column_index == columns.size - 1 -%>

    <%- end -%>
    <%- end -%>
  <%- end -%>
  from source
)

select
  *
from final

Generate dbt Staging Files

Generate staging model files for dbt:

$ bin/rails generate active_record:dbt:staging_model TABLE_NAME

Generate staging model files for dbt that reference the specified TABLE_NAME.

File Description
#{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.sql Staging model file for dbt.
#{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.yml Staging model documentation file for dbt.

Example:

$ bin/rails generate active_record:dbt:staging_model profiles
Generate #{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.sql

Example:

with

source as (

    select * from {{ source('dummy', 'profiles') }}

),

renamed as (

    select

          -- ids
          id as profile_id,
          user_id,

          -- strings
          first_name,
          last_name,

          -- datetimes
          created_at,
          updated_at

    from source

)

select * from renamed

Different Pattern:

#standardSQL

with source as (
  select
    id as profile_id
    , * except(id)
  from {{ source('dummy', 'profiles') }}
)

, final as (
  select
    -- ids
    profile_id
    , user_id

    -- strings
    , first_name
    , last_name

    -- datetimes
    , created_at
    , updated_at
  from source
)

select
  *
from final
Generate #{export_directory_path}/models/staging/#{source_name}/stg_#{source_name}__#{table_name}.yml

Example:

Note

The output will be as shown below. It is recommended to indent the YAML file with a tool of your choice.

---
version: 2
models:
- name: stg_dummy__profiles
  description: Write the logical_name of the 'profiles' table in 'dummy_project'.
  columns:
  - name: profile_id
    description: profile_id
    data_type: int64
    data_tests:
    - unique
    - not_null
    - relationships:
        arguments:
          to: source('dummy', 'profiles')
          field: id
          meta:
            relationship_type: one-to-one
  - name: user_id
    description: user_id
    data_type: int64
    data_tests:
    - unique
    - not_null
    - relationships:
        arguments:
          to: source('dummy', 'users')
          field: id
          meta:
            relationship_type: one-to-one
  - name: first_name
    description: Write a description of the 'profiles.first_name' column.
    data_type: string
    data_tests:
    - not_null
  - name: last_name
    description: Write a description of the 'profiles.last_name' column.
    data_type: string
    data_tests:
    - not_null
  - name: created_at
    description: Created At
    data_type: datetime
    data_tests:
    - not_null
  - name: updated_at
    description: Updated At
    data_type: datetime
    data_tests:
    - not_null

Generate dbt Seed Files

dbt Seed Configuration

In the #{config_directory_path}/source_config.yml file, describe the properties you want to set for the seed enum. You can configure defaults in this file.

defaults

Set the default value for the description of the seeds enum.

In the description of seed_descriptions.enum, you can refer to the project name with {{ project_name }}, the table logical table name with {{ table_logical_name }}, and the column description with {{ column_description }}.

Example:

defaults:
  seed_descriptions:
    enum:
      description: "{{ project_name }} {{ table_logical_name }} {{ column_description }} enum"

If nothing is set, it defaults to the following:

defaults:
  seed_descriptions:
    enum:
      description: "{{ project_name }} {{ table_logical_name }} {{ column_description }} enum"

Generate dbt Seed Enum Files

Generate seed enum files for dbt:

$ bin/rails generate active_record:dbt:enum TABLE_NAME ENUM_COLUMN_NAME

Generate seed enum files for dbt from the specified TABLE_NAME and ENUM_COLUMN_NAME.

File Description
#{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.csv Seed enum file for dbt.
#{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.yml Seed enum documentation file for dbt.

Example:

$ bin/rails generate active_record:dbt:enum posts status
Generate #{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.csv

Example:

status_before_type_of_cast,status_key,status_en,status_ja
0,draft,Draft,下書き
1,published,Published,公開
2,deleted,Deleted,削除
Generate #{export_directory_path}/seeds/#{source_name}/seed_#{source_name}__#{table_name_singularize}_enum_#{enum_pluralized}.yml

Example:

Note

The output will be as shown below. It is recommended to indent the YAML file with a tool of your choice.

---
version: 2
seeds:
- name: seed_dummy__post_enum_statuses
  description: dummy_project Post Status enum
  config:
    column_types:
      status_before_type_of_cast: int64
      status_key: string
      status_en: string
      status_ja: string
  columns:
  - name: status_before_type_of_cast
    description: Status(before_type_of_cast)
    data_type: int64
    data_tests:
    - unique
    - not_null
  - name: status_key
    description: Status(key)
    data_type: string
    data_tests:
    - unique
    - not_null
  - name: status_en
    description: Status(en)
    data_type: string
    data_tests:
    - unique
    - not_null
  - name: status_ja
    description: Status(ja)
    data_type: string
    data_tests:
    - unique
    - not_null

Contributing

Contribution directions go here.

License

The gem is available as open source under the terms of the MIT License.