sorbet-baml
Ruby-idiomatic conversion from Sorbet types to BAML (Boundary AI Markup Language) for efficient LLM prompting.
What is this?
This gem provides a clean, Ruby-idiomatic API to convert your Sorbet type definitions (T::Struct, T::Enum) into BAML's concise format. BAML uses approximately 60% fewer tokens than JSON Schema while maintaining complete type information, making your LLM interactions more efficient and cost-effective.
Why?
When working with LLMs, token efficiency directly impacts:
- Cost: Fewer tokens = lower API costs
- Performance: Smaller prompts = faster responses
- Context: More room for actual content vs. type definitions
BAML provides the perfect balance: concise, readable, and LLM-friendly.
Example: Autonomous Research Workflow
# Complex LLM workflow types for autonomous research
class ComplexityLevel < T::Enum
enums do
# Basic analysis requiring straightforward research
Basic = new('basic')
# Advanced analysis requiring deep domain expertise
Advanced = new('advanced')
end
end
class TaskDecomposition < T::Struct
# The main research topic being investigated
const :research_topic, String
# Target complexity level for the decomposition
const :complexity_level, ComplexityLevel
# Autonomously generated list of research subtasks
const :subtasks, T::Array[String]
# Strategic priority rankings for each subtask
const :priority_order, T::Array[Integer]
end
# Ruby-idiomatic conversion with dependencies
TaskDecomposition.to_baml
# =>
# enum ComplexityLevel {
# "basic" @description("Basic analysis requiring straightforward research")
# "advanced" @description("Advanced analysis requiring deep domain expertise")
# }
#
# class TaskDecomposition {
# research_topic string @description("The main research topic being investigated")
# complexity_level ComplexityLevel @description("Target complexity level for the decomposition")
# subtasks string[] @description("Autonomously generated list of research subtasks")
# priority_order int[] @description("Strategic priority rankings for each subtask")
# }
Installation
Add to your Gemfile:
gem 'sorbet-baml'
Or install directly:
gem install sorbet-baml
Usage
require 'sorbet-baml'
# π― Ruby-idiomatic API for complex LLM workflows
class ConfidenceLevel < T::Enum
enums do
# Low confidence, requires further verification
Low = new('low')
# High confidence, strongly supported by multiple sources
High = new('high')
end
end
class ResearchFindings < T::Struct
# Detailed findings and analysis results
const :findings, String
# Key actionable insights extracted
const :key_insights, T::Array[String]
# Assessment of evidence quality and reliability
const :evidence_quality, ConfidenceLevel
# Confidence score for the findings (1-10 scale)
const :confidence_score, Integer
end
class ResearchSynthesis < T::Struct
# High-level executive summary of all findings
const :executive_summary, String
# Primary conclusions drawn from the research
const :key_conclusions, T::Array[String]
# Collection of research findings
const :findings_collection, T::Array[ResearchFindings]
end
# Convert with smart defaults (dependencies + descriptions included!)
ResearchSynthesis.to_baml
# π Smart defaults include dependencies and descriptions automatically
# =>
# enum ConfidenceLevel {
# "low" @description("Low confidence, requires further verification")
# "high" @description("High confidence, strongly supported by multiple sources")
# }
#
# class ResearchFindings {
# findings string @description("Detailed findings and analysis results")
# key_insights string[] @description("Key actionable insights extracted")
# evidence_quality ConfidenceLevel @description("Assessment of evidence quality and reliability")
# confidence_score int @description("Confidence score for the findings (1-10 scale)")
# }
#
# class ResearchSynthesis {
# executive_summary string @description("High-level executive summary of all findings")
# key_conclusions string[] @description("Primary conclusions drawn from the research")
# findings_collection ResearchFindings[] @description("Collection of research findings")
# }
# π― Disable features if needed
ResearchSynthesis.to_baml(include_descriptions: false)
ResearchSynthesis.to_baml(include_dependencies: false)
# π Customize formatting (smart defaults still apply)
ResearchSynthesis.to_baml(indent_size: 4)
# Legacy API (no smart defaults, for backwards compatibility)
SorbetBaml.from_struct(ResearchSynthesis)
SorbetBaml.from_structs([ResearchSynthesis, ResearchFindings])
π― Field Descriptions for LLM Context
Add crucial context to your BAML types by documenting fields - essential for autonomous agents and complex workflows. The gem supports two ways to provide field descriptions:
Method 1: Comments (Traditional)
class ResearchTask < T::Struct
# Clear description of the research objective
const :objective, String
# Strategic priority ranking (1-5 scale)
const :priority, Integer
end
Method 2: Description Parameter (New in v0.2.0+)
class ResearchTask < T::Struct
const :objective, String, description: "Clear description of the research objective"
const :priority, Integer, description: "Strategic priority ranking (1-5 scale)"
end
Both methods produce identical BAML output with @description
annotations. The description:
parameter provides a more explicit, Sorbet-native way to document fields.
Complete Example
class TaskType < T::Enum
enums do
# Literature review and information gathering
Research = new('research')
# Combining multiple sources into coherent insights
Synthesis = new('synthesis')
# Evaluating options or making recommendations
Evaluation = new('evaluation')
end
end
class ResearchSubtask < T::Struct
# Clear description of the specific research objective
const :objective, String
# Type of research task to be performed
const :task_type, TaskType
# Strategic priority ranking for task sequencing (1-5 scale)
const :priority, Integer
# Estimated effort required in hours
const :estimated_hours, Integer
# Suggested agent capabilities needed for optimal execution
const :required_capabilities, T::Array[String]
end
# Generate BAML (descriptions included by default!)
ResearchSubtask.to_baml
# =>
# enum TaskType {
# "research" @description("Literature review and information gathering")
# "synthesis" @description("Combining multiple sources into coherent insights")
# "evaluation" @description("Evaluating options or making recommendations")
# }
#
# class ResearchSubtask {
# objective string @description("Clear description of the specific research objective")
# task_type TaskType @description("Type of research task to be performed")
# priority int @description("Strategic priority ranking for task sequencing (1-5 scale)")
# estimated_hours int @description("Estimated effort required in hours")
# required_capabilities string[] @description("Suggested agent capabilities needed for optimal execution")
# }
Why descriptions matter: LLMs use field descriptions to understand context and generate more accurate, meaningful data. This is crucial for complex domains where field names alone aren't sufficient.
π οΈ Tool Type Definitions
Generate BAML tool specifications for agentic workflows, function calling, and structured LLM interactions:
T::Struct-based Tools
class ReplyTool < T::Struct
# The response message to send back to the user
const :response, String
end
class SearchTool < T::Struct
# Using description: parameter (both methods work)
const :query, String, description: "The search query to execute"
const :limit, T.nilable(Integer), description: "Maximum number of results to return"
end
# Generate BAML tool definitions
ReplyTool.to_baml_tool
# =>
# class ReplyTool {
# response string @description("The response message to send back to the user")
# }
SearchTool.to_baml_tool
# =>
# class SearchTool {
# query string @description("The search query to execute")
# limit int? @description("Maximum number of results to return")
# }
# Module API also available
SorbetBaml.from_tool(ReplyTool)
DSPy-style Tools (Optional)
When dspy.rb
is available, automatically convert DSPy tools with rich metadata:
class CalculatorTool < DSPy::Tools::Base
extend T::Sig
tool_name 'calculator'
tool_description 'Performs basic arithmetic operations'
sig { params(operation: String, num1: Float, num2: Float).returns(T.any(Float, String)) }
def call(operation:, num1:, num2:)
case operation.downcase
when 'add' then num1 + num2
when 'subtract' then num1 - num2
when 'multiply' then num1 * num2
when 'divide'
return "Error: Cannot divide by zero" if num2 == 0
num1 / num2
else
"Error: Unknown operation '#{operation}'. Use add, subtract, multiply, or divide"
end
end
end
# Automatic extraction of tool metadata and parameter types
CalculatorTool.to_baml
# =>
# // Performs basic arithmetic operations
# class calculator {
# operation string @description("Parameter operation")
# num1 float @description("Parameter num1")
# num2 float @description("Parameter num2")
# }
# Optional parameters handled correctly
class SearchTool < DSPy::Tools::Base
extend T::Sig
tool_name 'search'
tool_description 'Search for information'
sig { params(query: String, limit: T.nilable(Integer)).returns(T::Array[String]) }
def call(query:, limit: nil)
# Implementation...
end
end
SearchTool.to_baml
# =>
# // Search for information
# class search {
# query string @description("Parameter query")
# limit int? @description("Parameter limit (optional)")
# }
# Module API also available
SorbetBaml.from_dspy_tool(CalculatorTool)
Tool Features:
- β T::Struct tools: Convert any struct to BAML tool definition
- β DSPy integration: Automatic extraction from DSPy::Tools::Base classes
- β Parameter types: Full Sorbet type support (string, int, float, arrays, maps, etc.)
- β
Optional parameters: Automatically detect and mark with
?
- β
Descriptions: Extract from comments or
description:
parameter (T::Struct) or automatic generation (DSPy) - β Tool metadata: Names, descriptions, and parameter documentation
- β
Ruby-idiomatic:
.to_baml_tool()
and.to_baml()
methods
π― Complete Type Support
β Fully Supported
Basic Types
-
String
βstring
-
Integer
βint
-
Float
βfloat
-
T::Boolean
βbool
-
Symbol
βstring
-
Date/DateTime/Time
βstring
Complex Types
-
T.nilable(T)
βT?
(optional types) -
T::Array[T]
βT[]
(arrays) -
T::Hash[K,V]
βmap<K,V>
(hash maps) -
T.any(T1, T2)
βT1 | T2
(union types) -
T.nilable(T.any(T1, T2))
β(T1 | T2)?
(optional unions) -
T::Array[T.any(T1, T2)]
β(T1 | T2)[]
(union arrays)
Structured Types
-
T::Struct
βclass Name { ... }
(classes with fields) -
T::Enum
βenum Name { "value1" "value2" }
(enums) - Nested structs with proper reference handling
- Automatic dependency resolution with topological sorting
π Advanced Features
-
Ruby-idiomatic API: Every T::Struct and T::Enum gets
.to_baml
method - Tool definitions: Generate BAML tool specs for function calling and agentic workflows
- DSPy integration: Automatic tool conversion from DSPy::Tools::Base classes
- Smart defaults: Field descriptions and dependencies included automatically
-
Field descriptions: Extracts from comments or
description:
parameter for LLM context - Dependency management: Automatically includes all referenced types
- Proper ordering: Dependencies are sorted topologically (no forward references needed)
- Circular reference handling: Won't get stuck in infinite loops
- Customizable formatting: Control indentation and other output options
- Type-safe: Full Sorbet type checking throughout
Type Mapping Reference
Sorbet Type | BAML Output | Example |
---|---|---|
String |
string |
name string |
Integer |
int |
age int |
Float |
float |
price float |
T::Boolean |
bool |
active bool |
T.nilable(String) |
string? |
email string? |
T::Array[String] |
string[] |
tags string[] |
T::Hash[String, Integer] |
map<string, int> |
counts map<string, int> |
T.any(String, Integer) |
string | int |
value string | int |
T.nilable(T.any(String, Integer)) |
(string | int)? |
optional_value (string | int)? |
T::Array[T.any(String, Integer)] |
(string | int)[] |
mixed_array (string | int)[] |
MyStruct |
MyStruct |
user MyStruct |
MyEnum |
MyEnum |
status MyEnum |
π Production Ready
This gem has reached feature completeness for core BAML conversion needs. The Ruby-idiomatic API is stable and thoroughly tested with 80 test cases covering all type combinations, tool definitions, and edge cases.
π Quality Metrics
- β Comprehensive Test Coverage - All features thoroughly tested
- β Full Sorbet Type Safety - Zero type errors throughout codebase
- β 80 Test Cases - Covering basic types, complex combinations, tool definitions, and edge cases
- β TDD Development - All features built test-first
- β Field Descriptions - Automatic comment extraction for LLM context
- β Tool Definitions - BAML tool specifications for function calling and agentic workflows
- β DSPy Integration - Automatic tool conversion from DSPy::Tools::Base classes
- β Smart Defaults - Dependencies and descriptions included by default
- β Zero Breaking Changes - Maintains backward compatibility
β Complete Feature Set
- β
Ruby-idiomatic API: Every T::Struct and T::Enum gets
.to_baml
method - β Tool definitions: Generate BAML tool specifications from T::Struct classes
- β DSPy integration: Automatic tool conversion from DSPy::Tools::Base classes
- β Smart defaults: Field descriptions and dependencies included automatically
- β
Field descriptions: Extract documentation from comments or
description:
parameter for LLM context - β Dependency management: Automatically includes all referenced types
- β Proper ordering: Dependencies are sorted topologically
- β Type safety: Full Sorbet type checking throughout
πΊοΈ Future Enhancements (Optional)
- DSPy-independent tool API: Tools shouldn't require DSPy, just follow the same API pattern
- Type aliases:
T.type_alias { String }
βtype Alias = string
- Custom naming: Convert between snake_case β camelCase
- CLI tool:
sorbet-baml convert MyStruct
command - Validation: Verify generated BAML syntax
- Self-referential types:
Employee
withmanager: T.nilable(Employee)
π Version History
- v0.0.1 - Initial implementation with basic type support
- v0.1.0 - Complete type system + Ruby-idiomatic API + field descriptions + smart defaults
- v0.2.0 - Description parameter support and enhanced field extraction
- v0.3.0 - Tool type definitions + DSPy integration + 80+ test cases + comprehensive documentation
π Real-World Usage: Autonomous Research Agents
Perfect for agentic workflows, deep research systems, and complex LLM applications:
# Define your autonomous research workflow types
class TaskDecomposition < T::Struct
# Your complex research schema...
end
# Generate BAML for LLM agents
prompt = <<~PROMPT
You are an autonomous research agent. Analyze this topic and decompose it into strategic subtasks.
Schema for your output:
#{TaskDecomposition.to_baml}
Topic: "Impact of AI on healthcare delivery systems"
Provide a comprehensive task decomposition in JSON format.
PROMPT
# Use with OpenAI, Anthropic, or any LLM provider
response = llm_client.chat(prompt)
result = TaskDecomposition.from_json(response.content)
π Integration Examples
With OpenAI structured outputs:
User.to_baml(include_dependencies: true)
# Use the generated BAML in your function calling schemas
With prompt engineering:
# More efficient than JSON Schema
schema = User.to_baml(include_dependencies: true)
prompt = "Generate data matching: #{schema}"
With documentation generation:
# Auto-generate API docs
api_types = [User, Order, Product].map(&:to_baml).join("\n\n")
π Token Efficiency: BAML vs JSON Schema
Here's a real-world comparison using a complex agentic workflow from production DSPy.rb usage:
Complex T::Struct Types (Production Agentic Workflow)
# Real autonomous research workflow from production DSPy.rb usage
class ComplexityLevel < T::Enum
enums do
# Basic analysis requiring straightforward research
Basic = new('basic')
# Intermediate analysis requiring synthesis of multiple sources
Intermediate = new('intermediate')
# Advanced analysis requiring deep domain expertise and complex reasoning
Advanced = new('advanced')
end
end
class TaskDecomposition < T::Struct
# The main research topic being investigated
const :research_topic, String
# Additional context or constraints for the research
const :context, String
# Target complexity level for the decomposition
const :complexity_level, ComplexityLevel
# Autonomously generated list of research subtasks
const :subtasks, T::Array[String]
# Type classification for each task (analysis, synthesis, investigation, etc.)
const :task_types, T::Array[String]
# Strategic priority rankings (1-5 scale) for each subtask
const :priority_order, T::Array[Integer]
# Effort estimates in hours for each subtask
const :estimated_effort, T::Array[Integer]
# Task dependency relationships for optimal sequencing
const :dependencies, T::Array[String]
# Suggested agent types/skills needed for each task
const :agent_requirements, T::Array[String]
end
class ResearchExecution < T::Struct
# The specific research subtask to execute
const :subtask, String
# Accumulated context from previous research steps
const :context, String
# Any specific constraints or focus areas for this research
const :constraints, String
# Detailed research findings and analysis
const :findings, String
# Key actionable insights extracted from the research
const :key_insights, T::Array[String]
# Confidence in findings quality (1-10 scale)
const :confidence_level, Integer
# Assessment of evidence quality and reliability
const :evidence_quality, String
# Recommended next steps based on these findings
const :next_steps, T::Array[String]
# Identified gaps in knowledge or areas needing further research
const :knowledge_gaps, T::Array[String]
end
π BAML Output (Ruby-idiomatic with descriptions)
[ComplexityLevel, TaskDecomposition, ResearchExecution].map(&:to_baml).join("\n\n")
enum ComplexityLevel {
"basic" @description("Basic analysis requiring straightforward research")
"intermediate" @description("Intermediate analysis requiring synthesis of multiple sources")
"advanced" @description("Advanced analysis requiring deep domain expertise and complex reasoning")
}
class TaskDecomposition {
research_topic string @description("The main research topic being investigated")
context string @description("Additional context or constraints for the research")
complexity_level ComplexityLevel @description("Target complexity level for the decomposition")
subtasks string[] @description("Autonomously generated list of research subtasks")
task_types string[] @description("Type classification for each task (analysis, synthesis, investigation, etc.)")
priority_order int[] @description("Strategic priority rankings (1-5 scale) for each subtask")
estimated_effort int[] @description("Effort estimates in hours for each subtask")
dependencies string[] @description("Task dependency relationships for optimal sequencing")
agent_requirements string[] @description("Suggested agent types/skills needed for each task")
}
class ResearchExecution {
subtask string @description("The specific research subtask to execute")
context string @description("Accumulated context from previous research steps")
constraints string @description("Any specific constraints or focus areas for this research")
findings string @description("Detailed research findings and analysis")
key_insights string[] @description("Key actionable insights extracted from the research")
confidence_level int @description("Confidence in findings quality (1-10 scale)")
evidence_quality string @description("Assessment of evidence quality and reliability")
next_steps string[] @description("Recommended next steps based on these findings")
knowledge_gaps string[] @description("Identified gaps in knowledge or areas needing further research")
}
BAML Token Count: ~320 tokens
π JSON Schema Equivalent
{
"ComplexityLevel": {
"type": "string",
"enum": ["basic", "intermediate", "advanced"],
"description": "Complexity level enumeration"
},
"TaskDecomposition": {
"type": "object",
"properties": {
"topic": {"type": "string"},
"context": {"type": "string"},
"complexity_level": {"$ref": "#/definitions/ComplexityLevel"},
"subtasks": {
"type": "array",
"items": {"type": "string"}
},
"task_types": {
"type": "array",
"items": {"type": "string"}
},
"priority_order": {
"type": "array",
"items": {"type": "integer"}
},
"estimated_effort": {
"type": "array",
"items": {"type": "integer"}
},
"dependencies": {
"type": "array",
"items": {"type": "string"}
},
"agent_requirements": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["topic", "context", "complexity_level", "subtasks", "task_types", "priority_order", "estimated_effort", "dependencies", "agent_requirements"],
"additionalProperties": false
},
"ResearchExecution": {
"type": "object",
"properties": {
"subtask": {"type": "string"},
"context": {"type": "string"},
"constraints": {"type": "string"},
"findings": {"type": "string"},
"key_insights": {
"type": "array",
"items": {"type": "string"}
},
"confidence_level": {"type": "integer"},
"evidence_quality": {"type": "string"},
"next_steps": {
"type": "array",
"items": {"type": "string"}
},
"knowledge_gaps": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["subtask", "context", "constraints", "findings", "key_insights", "confidence_level", "evidence_quality", "next_steps", "knowledge_gaps"],
"additionalProperties": false
}
}
JSON Schema Token Count: ~680 tokens
π― Results: 53% Token Reduction (with descriptions)
Format | Tokens | Reduction |
---|---|---|
JSON Schema | ~680 | baseline |
BAML | ~320 | π₯ 53% fewer |
Without descriptions:
Format | Tokens | Reduction |
---|---|---|
JSON Schema | ~450 | baseline |
BAML | ~180 | π₯ 60% fewer |
Real Impact:
- Cost Savings: 53-60% reduction in prompt tokens = significant LLM API cost savings
- Performance: Smaller prompts = faster LLM response times
- Context Efficiency: More room for actual content vs. type definitions
- LLM Understanding: Descriptions provide crucial context for autonomous agents
- Readability: BAML is human-readable and maintainable
This example represents actual agentic workflows from production DSPy.rb applications using complex nested types, enums, and arrays - exactly the scenarios where token efficiency and LLM understanding matter most.
Credits
This project was inspired by sorbet-schema
which provides excellent Sorbet type introspection capabilities. While sorbet-schema focuses on serialization/deserialization, sorbet-baml focuses on generating efficient type definitions for LLM consumption.
Contributing
Bug reports and pull requests are welcome at https://github.com/vicentereig/sorbet-baml
License
MIT License. See LICENSE.txt for details.