Avro::Builder
Avro::Builder provides a Ruby DSL to create Apache Avro Schemas.
This DSL was created because:
- The Avro IDL is not supported in Ruby.
- The Avro IDL can only be used to define Protocols.
- Schemas can be extracted as JSON from an IDL Protocol but support for imports is still limited.
Additional background on why we developed avro-builder is provided
here.
Features
- The syntax is designed for ease-of-use.
- Definitions can be imported by name. This includes auto-loading from a configured set of paths, which allows definitions to be split across files and even reused between projects.
- Record definitions can inherit from other record definitions.
- Schema Store to load files written in the DSL and return Avro::Schema objects.
Limitations
- Only Avro Schemas, not Protocols, are supported.
- See Issues for functionality that has yet to be implemented.
Installation
Add this line to your application's Gemfile:
gem 'avro-builder'
And then execute:
$ bundle
Or install it yourself as:
$ gem install avro-builder
Railtie
When included in a Rails project, #{Rails.root}/avro/dsl is configured as a
load path for the DSL.
A rake task is also defined for generating Avro JSON schemas from the DSL.
Usage
To use Avro::Builder, define a schema:
namespace 'com.example'
fixed :password, 8
enum :user_type, :ADMIN, :REGULAR
record :user do
  required :id, :long
  required :user_name, :string
  required :type, :user_type, default: :REGULAR
  required :pw, :password
  optional :full_name, :string
  required :nicknames, :array, items: :string
  required :permissions, :map, values: :bytes
end
The schema definition may be passed as a string or a block to
Avro::Builder.build.
This generates the following Avro JSON schema:
{
"type": "record",
"name": "user",
"namespace": "com.example",
"fields": [
{
"name": "id",
"type": "long"
},
{
"name": "user_name",
"type": "string"
},
{
"name": "type",
"type": {
"name": "user_type",
"type": "enum",
"symbols": [
"ADMIN",
"REGULAR"
],
"namespace": "com.example"
},
"default": "REGULAR"
},
{
"name": "pw",
"type": {
"name": "password",
"type": "fixed",
"size": 8,
"namespace": "com.example"
}
},
{
"name": "full_name",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "nicknames",
"type": {
"type": "array",
"items": "string"
}
},
{
"name": "permissions",
"type": {
"type": "map",
"values": "bytes"
}
}
]
}
Required and Optional
Fields for a record are specified as required or optional. Optional fields are
implemented as a union in Avro, where null is the first type in the union and
the field has a default value of null.
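As a rough illustration of the rule above, the following plain-Ruby sketch builds the field structures that a required and an optional declaration correspond to. It does not call avro-builder; `required_field` and `optional_field` are hypothetical helpers written only for this example.

```ruby
require 'json'

# Hypothetical helper (not part of avro-builder): a required field keeps
# its type as-is and gets no implicit default.
def required_field(name, type)
  { 'name' => name.to_s, 'type' => type.to_s }
end

# Hypothetical helper (not part of avro-builder): an optional field becomes
# a union with "null" listed first and a default value of null.
def optional_field(name, type)
  { 'name' => name.to_s, 'type' => ['null', type.to_s], 'default' => nil }
end

puts JSON.pretty_generate(required_field(:id, :long))
puts JSON.pretty_generate(optional_field(:full_name, :string))
```

Listing null first matters because Avro requires a union field's default value to match the first type in the union.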
Named Types
fixed and enum fields may be specified inline as part of a record
or as standalone named types.
# Either syntax is supported for specifying the size
fixed :f, 4
fixed :g, size: 8
# Either syntax is supported for specifying symbols
enum :e, :X, :Y, :Z
enum :d, symbols: [:A, :B]
# defaults can be set for enums with Ruby Avro v1.10.0
enum :c, symbols: [:A, :B], default: :A
record :my_record_with_named do
  required :f_ref, :f
  required :fixed_inline, :fixed, size: 9
  required :e_ref, :e
  required :enum_inline, :enum, symbols: [:P, :Q]
end
Complex Types
Arrays, maps, and unions can each be embedded within another complex type using methods that match the type name:
record :complex_types do
  required :array_of_unions, :array, items: union(:int, :string)
  required :array_or_map, :union, types: [array(:int), map(:int)]
end
Methods may also be used for complex types instead of separately specifying the type name and options:
record :complex_types do
  required :array_of_unions, array(union(:int, :string))
  required :array_or_map, union(array(:int), map(:int))
end
For more on unions see below.
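To make the nesting concrete, here is a small plain-Ruby sketch of the Avro JSON structures that the two field types above correspond to. The `array`, `map`, and `union` methods below are hypothetical stand-ins defined for this example only; they are not avro-builder's implementation.

```ruby
require 'json'

# Hypothetical stand-ins for the DSL methods. Each returns the Avro JSON
# structure for its type as a plain Ruby object.
def array(items)
  { 'type' => 'array', 'items' => items }
end

def map(values)
  { 'type' => 'map', 'values' => values }
end

# In Avro JSON a union is simply an array of its member types.
def union(*types)
  types
end

puts JSON.generate(array(union('int', 'string')))
# prints {"type":"array","items":["int","string"]}
puts JSON.generate(union(array('int'), map('int')))
# prints [{"type":"array","items":"int"},{"type":"map","values":"int"}]
```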
Nested Records
Nested records may be created by referring to the name of the previously
defined record or using the field type :record.
record :sub_rec do
  required :i, :int
end
record :top_rec do
  required :sub, :sub_rec
end
Defining a subrecord inline:
record :my_rec do
  required :nested, :record do
    required :s, :string
  end
end
Nested record types defined without an explicit name are given a generated
name based on the name of the field and record that they are nested within.
In the example above, the nested record type would have the generated name
__my_rec_nested_record:
{
"type": "record",
"name": "my_rec",
"fields": [
{
"name": "nested",
"type": {
"type": "record",
"name": "__my_rec_nested_record",
"fields": [
{
"name": "s",
"type": "string"
}
]
}
}
]
}
Unions
A union may be specified within a record using required and optional with
the :union type:
record :my_record_with_unions do
  required :req_union, :union, types: [:string, :int]
  optional :opt_union, :union, types: [:float, :long]
end
For an optional union, null is automatically added as the first type for
the union and the field defaults to null.
Unions may also be defined using the union method instead of specifying the
:union type and member types separately:
record :my_record_with_unions do
  required :req_union, union(:string, :int)
  optional :opt_union, union(:float, :long)
end
Logical Types
The DSL supports setting a logical type on any type except a union. The Avro
spec lists the logical types
that are currently defined. Note: avro-builder is more permissive and any logical type can
be specified on a type.
A logical type can be specified for a field using the logical_type attribute:
record :with_timestamp do
  required :created_at, :long, logical_type: 'timestamp-micros'
end
Primitive types with a logical type can also be embedded within complex types
using either the generic type method:
record :with_date_array do
  required :date_array, :array, items: type(:int, logical_type: 'date')
end
Or using a primitive type specific method:
record :with_date_array do
  required :date_array, :array, items: int(logical_type: 'date')
end
Decimal Logical Types
The decimal logical type, for bytes and fixed types, is currently the only logical type that requires additional
attributes. For decimals, precision must be specified and scale may optionally be specified. avro-builder
supports both of these attributes for bytes and fixed decimals. See the Avro
spec for more details.
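For reference, the Avro spec represents a decimal as a bytes or fixed type with logicalType, precision, and an optional scale that defaults to 0 and may not exceed precision. The plain-Ruby sketch below builds that JSON structure for a bytes decimal and enforces those constraints. `decimal_bytes` is a hypothetical helper for illustration only and is not part of avro-builder.

```ruby
require 'json'

# Hypothetical helper (not part of avro-builder): builds the Avro JSON for
# a bytes type annotated with the decimal logical type.
def decimal_bytes(precision:, scale: nil)
  unless precision.is_a?(Integer) && precision.positive?
    raise ArgumentError, 'precision must be a positive integer'
  end
  if scale && !(scale.is_a?(Integer) && scale.between?(0, precision))
    raise ArgumentError, 'scale must be an integer between 0 and precision'
  end

  schema = { 'type' => 'bytes', 'logicalType' => 'decimal', 'precision' => precision }
  schema['scale'] = scale if scale # a missing scale is treated as 0 by Avro
  schema
end

puts JSON.generate(decimal_bytes(precision: 9, scale: 2))
# prints {"type":"bytes","logicalType":"decimal","precision":9,"scale":2}
```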
Abstract Types
Types can be declared as abstract in the DSL. Declaring a type as abstract prevents the rake task from generating an Avro JSON schema for the type.
A type can be declared as abstract using either an option or a method in the DSL when defining the type:
record :unique_id, abstract: true do
  required :uuid, :fixed, size: 38
end
enum :status do
  symbols %w(valid invalid)
  abstract true
end
Type Macros
avro-builder allows type macros to be defined that expand to types that
cannot normally be named in Avro schemas. These macro names are not retained
in generated schemas but allow definitions to be reused across DSL files:
type_macro :timestamp, long(logical_type: 'timestamp-millis')
record :user do
  required :created_at, :timestamp
  required :updated_at, :timestamp
end
Type macros inherit the namespace from the context where they are defined or an explicit namespace option may be specified:
type_macro :timestamp, long(logical_type: 'timestamp-millis'),
           namespace: 'com.my_company'
Type macros are always marked as abstract and do not generate an Avro JSON schema file when using the rake task.
Auto-loading and Imports
Specify paths to search for definitions:
Avro::Builder.add_load_path('/path/to/dsl/files')
Undefined references are automatically loaded from a file with the same name.
The load paths are searched for a .rb file with a matching name.
Files may also be explicitly imported using import <filename>.
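The lookup described above can be pictured with the following plain-Ruby sketch. It is not avro-builder's implementation: `find_definition_file` is a hypothetical helper, and both the recursive search of subdirectories and the first-match-wins ordering are assumptions made for this example.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not avro-builder's implementation): returns the first
# .rb file under any load path whose base name matches the reference name,
# or nil when no load path contains one.
def find_definition_file(name, load_paths)
  load_paths.each do |path|
    # Assumption: subdirectories are searched too. min makes the pick deterministic.
    match = Dir.glob(File.join(path, '**', "#{name}.rb")).min
    return match if match
  end
  nil
end

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'com', 'example'))
  FileUtils.touch(File.join(dir, 'com', 'example', 'user_type.rb'))

  puts find_definition_file(:user_type, [dir]) # path ending in com/example/user_type.rb
  puts find_definition_file(:missing, [dir]).inspect # nil
end
```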
Extends
A previously defined record may be referenced in the definition of another
record using extends <record_name>. This adds all of the fields from
the referenced record to the current record. The current record may override
fields in the record that it extends.
record :original do
  required :first, :string
  required :second, :int
end
record :extended do
  extends :original
  optional :first, :string
end
Additionally, a namespace can be passed to extends when it is needed to remove ambiguity:
namespace 'com.newbie'
record :original, namespace: 'com.og' do
  required :first, :string
  required :second, :int
end
record :original do
  required :first, :string
  required :second, :int
end
record :extended do
  extends :original, namespace: 'com.og'
  optional :first, :string
end
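One way to picture how extends combines fields is the plain-Ruby sketch below. It is not avro-builder's implementation: `merge_fields` is a hypothetical helper, fields are modeled as simple hashes, and keeping an overridden field in its inherited position is an assumption of this example.

```ruby
# Hypothetical sketch (not avro-builder's implementation): a field declared
# in the extending record replaces the inherited field with the same name,
# and all other inherited fields are carried over unchanged.
def merge_fields(inherited, own)
  merged = inherited.to_h { |field| [field[:name], field] }
  own.each { |field| merged[field[:name]] = field }
  merged.values
end

original = [
  { name: :first, type: 'string' },
  { name: :second, type: 'int' }
]
# Mirrors "optional :first, :string" in the extended record above.
extended_own = [
  { name: :first, type: %w(null string), default: nil }
]

merge_fields(original, extended_own).each { |field| puts field.inspect }
```

With these inputs the result has two fields: the optional version of first from the extending record, and second inherited from the original.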
Schema Store
The Avro::Builder::SchemaStore can be used to load DSL files and return cached
Avro::Schema objects. This schema store can be used as the schema store for
avromatic
to generate models directly from schemas defined using the DSL.
The schema store must be initialized with the path where DSL files are located:
schema_store = Avro::Builder::SchemaStore.new(path: '/path/to/dsl/files')
schema_store.find('schema_name', 'my_namespace')
#=> Avro::Schema (for file at '/path/to/dsl/files/my_namespace/schema_name.rb')
To configure Avromatic to use this schema store and its Messaging API:
Avromatic.configure do |config|
  config.schema_store = Avro::Builder::SchemaStore.new(path: 'avro/dsl')
  config.registry_url = 'https://builder:avro@avro-schema-registry.salsify.com'
  config.build_messaging!
end
Avro Generate Rake Task
There is a rake task that can be used to generate Avro schemas from all DSL files.
A rake task is automatically defined via a Railtie for Rails projects that uses
#{Rails.root}/avro/dsl as the root for Avro DSL files.
Custom rake tasks can also be defined:
require 'avro/builder/rake/avro_generate_task'
Avro::Builder::Rake::AvroGenerateTask.new(name: :custom_gen,
                                          dependencies: [:load_app]) do |task|
  task.filetype = 'avsc' # default option
  task.root = '/path/to/dsl/files'
  task.load_paths << '/additional/dsl/files'
end
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Issues and pull requests are welcome on GitHub at https://github.com/salsify/avro-builder.
License
The gem is available as open source under the terms of the MIT License.