0.0
The project is in a healthy, maintained state
A Ruby gem that converts T::Struct and T::Enum to BAML (Boundary AI Markup Language) type definitions. BAML uses 60% fewer tokens than JSON Schema while maintaining type safety.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Runtime

 Project Readme

sorbet-baml

Gem Version Total Downloads License Sorbet Compatible

Ruby-idiomatic conversion from Sorbet types to BAML (Boundary AI Markup Language) for efficient LLM prompting.

What is this?

This gem provides a clean, Ruby-idiomatic API to convert your Sorbet type definitions (T::Struct, T::Enum) into BAML's concise format. BAML uses approximately 60% fewer tokens than JSON Schema while maintaining complete type information, making your LLM interactions more efficient and cost-effective.

Why?

When working with LLMs, token efficiency directly impacts:

  • Cost: Fewer tokens = lower API costs
  • Performance: Smaller prompts = faster responses
  • Context: More room for actual content vs. type definitions

BAML provides the perfect balance: concise, readable, and LLM-friendly.

Example: Autonomous Research Workflow

# Complex LLM workflow types for autonomous research
class ComplexityLevel < T::Enum
  enums do
    # Basic analysis requiring straightforward research
    Basic = new('basic')
    # Advanced analysis requiring deep domain expertise
    Advanced = new('advanced')
  end
end

class TaskDecomposition < T::Struct
  # The main research topic being investigated
  const :research_topic, String
  # Target complexity level for the decomposition
  const :complexity_level, ComplexityLevel
  # Autonomously generated list of research subtasks
  const :subtasks, T::Array[String]
  # Strategic priority rankings for each subtask
  const :priority_order, T::Array[Integer]
end

# Ruby-idiomatic conversion with dependencies
TaskDecomposition.to_baml
# =>
# enum ComplexityLevel {
#   "basic" @description("Basic analysis requiring straightforward research")
#   "advanced" @description("Advanced analysis requiring deep domain expertise")
# }
# 
# class TaskDecomposition {
#   research_topic string @description("The main research topic being investigated")
#   complexity_level ComplexityLevel @description("Target complexity level for the decomposition")
#   subtasks string[] @description("Autonomously generated list of research subtasks")
#   priority_order int[] @description("Strategic priority rankings for each subtask")
# }

Installation

Add to your Gemfile:

gem 'sorbet-baml'

Or install directly:

gem install sorbet-baml

Usage

require 'sorbet-baml'

# 🎯 Ruby-idiomatic API for complex LLM workflows

class ConfidenceLevel < T::Enum
  enums do
    # Low confidence, requires further verification
    Low = new('low')
    # High confidence, strongly supported by multiple sources
    High = new('high')
  end
end

class ResearchFindings < T::Struct
  # Detailed findings and analysis results
  const :findings, String
  # Key actionable insights extracted
  const :key_insights, T::Array[String]
  # Assessment of evidence quality and reliability
  const :evidence_quality, ConfidenceLevel
  # Confidence score for the findings (1-10 scale)
  const :confidence_score, Integer
end

class ResearchSynthesis < T::Struct
  # High-level executive summary of all findings
  const :executive_summary, String
  # Primary conclusions drawn from the research
  const :key_conclusions, T::Array[String]
  # Collection of research findings
  const :findings_collection, T::Array[ResearchFindings]
end

# Convert with smart defaults (dependencies + descriptions included!)
ResearchSynthesis.to_baml

# πŸš€ Smart defaults include dependencies and descriptions automatically
# =>
# enum ConfidenceLevel {
#   "low" @description("Low confidence, requires further verification")
#   "high" @description("High confidence, strongly supported by multiple sources")
# }
# 
# class ResearchFindings {
#   findings string @description("Detailed findings and analysis results")
#   key_insights string[] @description("Key actionable insights extracted")
#   evidence_quality ConfidenceLevel @description("Assessment of evidence quality and reliability")
#   confidence_score int @description("Confidence score for the findings (1-10 scale)")
# }
# 
# class ResearchSynthesis {
#   executive_summary string @description("High-level executive summary of all findings")
#   key_conclusions string[] @description("Primary conclusions drawn from the research")
#   findings_collection ResearchFindings[] @description("Collection of research findings")
# }

# 🎯 Disable features if needed  
ResearchSynthesis.to_baml(include_descriptions: false)
ResearchSynthesis.to_baml(include_dependencies: false)

# πŸš€ Customize formatting (smart defaults still apply)
ResearchSynthesis.to_baml(indent_size: 4)

# Legacy API (no smart defaults, for backwards compatibility)
SorbetBaml.from_struct(ResearchSynthesis)
SorbetBaml.from_structs([ResearchSynthesis, ResearchFindings])

🎯 Field Descriptions for LLM Context

Add crucial context to your BAML types by documenting fields - essential for autonomous agents and complex workflows. The gem supports two ways to provide field descriptions:

Method 1: Comments (Traditional)

class ResearchTask < T::Struct
  # Clear description of the research objective
  const :objective, String
  # Strategic priority ranking (1-5 scale)
  const :priority, Integer
end

Method 2: Description Parameter (New in v0.2.0+)

class ResearchTask < T::Struct
  const :objective, String, description: "Clear description of the research objective"
  const :priority, Integer, description: "Strategic priority ranking (1-5 scale)"
end

Both methods produce identical BAML output with @description annotations. The description: parameter provides a more explicit, Sorbet-native way to document fields.

Complete Example

class TaskType < T::Enum
  enums do
    # Literature review and information gathering
    Research = new('research')
    # Combining multiple sources into coherent insights
    Synthesis = new('synthesis')
    # Evaluating options or making recommendations
    Evaluation = new('evaluation')
  end
end

class ResearchSubtask < T::Struct
  # Clear description of the specific research objective
  const :objective, String
  
  # Type of research task to be performed
  const :task_type, TaskType
  
  # Strategic priority ranking for task sequencing (1-5 scale)
  const :priority, Integer
  
  # Estimated effort required in hours
  const :estimated_hours, Integer
  
  # Suggested agent capabilities needed for optimal execution
  const :required_capabilities, T::Array[String]
end

# Generate BAML (descriptions included by default!)
ResearchSubtask.to_baml
# =>
# enum TaskType {
#   "research" @description("Literature review and information gathering")
#   "synthesis" @description("Combining multiple sources into coherent insights")
#   "evaluation" @description("Evaluating options or making recommendations")
# }
# 
# class ResearchSubtask {
#   objective string @description("Clear description of the specific research objective")
#   task_type TaskType @description("Type of research task to be performed")
#   priority int @description("Strategic priority ranking for task sequencing (1-5 scale)")
#   estimated_hours int @description("Estimated effort required in hours")
#   required_capabilities string[] @description("Suggested agent capabilities needed for optimal execution")
# }

Why descriptions matter: LLMs use field descriptions to understand context and generate more accurate, meaningful data. This is crucial for complex domains where field names alone aren't sufficient.

πŸ› οΈ Tool Type Definitions

Generate BAML tool specifications for agentic workflows, function calling, and structured LLM interactions:

T::Struct-based Tools

class ReplyTool < T::Struct
  # The response message to send back to the user
  const :response, String
end

class SearchTool < T::Struct
  # Using description: parameter (both methods work)
  const :query, String, description: "The search query to execute"
  const :limit, T.nilable(Integer), description: "Maximum number of results to return"
end

# Generate BAML tool definitions
ReplyTool.to_baml_tool
# =>
# class ReplyTool {
#   response string @description("The response message to send back to the user")
# }

SearchTool.to_baml_tool
# =>
# class SearchTool {
#   query string @description("The search query to execute")
#   limit int? @description("Maximum number of results to return")
# }

# Module API also available
SorbetBaml.from_tool(ReplyTool)

DSPy-style Tools (Optional)

When dspy.rb is available, automatically convert DSPy tools with rich metadata:

class CalculatorTool < DSPy::Tools::Base
  extend T::Sig
  
  tool_name 'calculator'
  tool_description 'Performs basic arithmetic operations'

  sig { params(operation: String, num1: Float, num2: Float).returns(T.any(Float, String)) }
  def call(operation:, num1:, num2:)
    case operation.downcase
    when 'add' then num1 + num2
    when 'subtract' then num1 - num2
    when 'multiply' then num1 * num2
    when 'divide'
      return "Error: Cannot divide by zero" if num2 == 0
      num1 / num2
    else
      "Error: Unknown operation '#{operation}'. Use add, subtract, multiply, or divide"
    end
  end
end

# Automatic extraction of tool metadata and parameter types
CalculatorTool.to_baml
# =>
# // Performs basic arithmetic operations
# class calculator {
#   operation string @description("Parameter operation")
#   num1 float @description("Parameter num1")
#   num2 float @description("Parameter num2")
# }

# Optional parameters handled correctly
class SearchTool < DSPy::Tools::Base
  extend T::Sig
  
  tool_name 'search'
  tool_description 'Search for information'

  sig { params(query: String, limit: T.nilable(Integer)).returns(T::Array[String]) }
  def call(query:, limit: nil)
    # Implementation...
  end
end

SearchTool.to_baml
# =>
# // Search for information
# class search {
#   query string @description("Parameter query")
#   limit int? @description("Parameter limit (optional)")
# }

# Module API also available
SorbetBaml.from_dspy_tool(CalculatorTool)

Tool Features:

  • βœ… T::Struct tools: Convert any struct to BAML tool definition
  • βœ… DSPy integration: Automatic extraction from DSPy::Tools::Base classes
  • βœ… Parameter types: Full Sorbet type support (string, int, float, arrays, maps, etc.)
  • βœ… Optional parameters: Automatically detect and mark with ?
  • βœ… Descriptions: Extract from comments or description: parameter (T::Struct) or automatic generation (DSPy)
  • βœ… Tool metadata: Names, descriptions, and parameter documentation
  • βœ… Ruby-idiomatic: .to_baml_tool() and .to_baml() methods

🎯 Complete Type Support

βœ… Fully Supported

Basic Types

  • String β†’ string
  • Integer β†’ int
  • Float β†’ float
  • T::Boolean β†’ bool
  • Symbol β†’ string
  • Date/DateTime/Time β†’ string

Complex Types

  • T.nilable(T) β†’ T? (optional types)
  • T::Array[T] β†’ T[] (arrays)
  • T::Hash[K,V] β†’ map<K,V> (hash maps)
  • T.any(T1, T2) β†’ T1 | T2 (union types)
  • T.nilable(T.any(T1, T2)) β†’ (T1 | T2)? (optional unions)
  • T::Array[T.any(T1, T2)] β†’ (T1 | T2)[] (union arrays)

Structured Types

  • T::Struct β†’ class Name { ... } (classes with fields)
  • T::Enum β†’ enum Name { "value1" "value2" } (enums)
  • Nested structs with proper reference handling
  • Automatic dependency resolution with topological sorting

πŸš€ Advanced Features

  • Ruby-idiomatic API: Every T::Struct and T::Enum gets .to_baml method
  • Tool definitions: Generate BAML tool specs for function calling and agentic workflows
  • DSPy integration: Automatic tool conversion from DSPy::Tools::Base classes
  • Smart defaults: Field descriptions and dependencies included automatically
  • Field descriptions: Extracts from comments or description: parameter for LLM context
  • Dependency management: Automatically includes all referenced types
  • Proper ordering: Dependencies are sorted topologically (no forward references needed)
  • Circular reference handling: Won't get stuck in infinite loops
  • Customizable formatting: Control indentation and other output options
  • Type-safe: Full Sorbet type checking throughout

Type Mapping Reference

Sorbet Type BAML Output Example
String string name string
Integer int age int
Float float price float
T::Boolean bool active bool
T.nilable(String) string? email string?
T::Array[String] string[] tags string[]
T::Hash[String, Integer] map<string, int> counts map<string, int>
T.any(String, Integer) string | int value string | int
T.nilable(T.any(String, Integer)) (string | int)? optional_value (string | int)?
T::Array[T.any(String, Integer)] (string | int)[] mixed_array (string | int)[]
MyStruct MyStruct user MyStruct
MyEnum MyEnum status MyEnum

🏁 Production Ready

This gem has reached feature completeness for core BAML conversion needs. The Ruby-idiomatic API is stable and thoroughly tested with 80 test cases covering all type combinations, tool definitions, and edge cases.

πŸ“Š Quality Metrics

  • βœ… Comprehensive Test Coverage - All features thoroughly tested
  • βœ… Full Sorbet Type Safety - Zero type errors throughout codebase
  • βœ… 80 Test Cases - Covering basic types, complex combinations, tool definitions, and edge cases
  • βœ… TDD Development - All features built test-first
  • βœ… Field Descriptions - Automatic comment extraction for LLM context
  • βœ… Tool Definitions - BAML tool specifications for function calling and agentic workflows
  • βœ… DSPy Integration - Automatic tool conversion from DSPy::Tools::Base classes
  • βœ… Smart Defaults - Dependencies and descriptions included by default
  • βœ… Zero Breaking Changes - Maintains backward compatibility

βœ… Complete Feature Set

  • βœ… Ruby-idiomatic API: Every T::Struct and T::Enum gets .to_baml method
  • βœ… Tool definitions: Generate BAML tool specifications from T::Struct classes
  • βœ… DSPy integration: Automatic tool conversion from DSPy::Tools::Base classes
  • βœ… Smart defaults: Field descriptions and dependencies included automatically
  • βœ… Field descriptions: Extract documentation from comments or description: parameter for LLM context
  • βœ… Dependency management: Automatically includes all referenced types
  • βœ… Proper ordering: Dependencies are sorted topologically
  • βœ… Type safety: Full Sorbet type checking throughout

πŸ—ΊοΈ Future Enhancements (Optional)

  • DSPy-independent tool API: Tools shouldn't require DSPy, just follow the same API pattern
  • Type aliases: T.type_alias { String } β†’ type Alias = string
  • Custom naming: Convert between snake_case ↔ camelCase
  • CLI tool: sorbet-baml convert MyStruct command
  • Validation: Verify generated BAML syntax
  • Self-referential types: Employee with manager: T.nilable(Employee)

πŸ“ˆ Version History

  • v0.0.1 - Initial implementation with basic type support
  • v0.1.0 - Complete type system + Ruby-idiomatic API + field descriptions + smart defaults
  • v0.2.0 - Description parameter support and enhanced field extraction
  • v0.3.0 - Tool type definitions + DSPy integration + 80+ test cases + comprehensive documentation

🌟 Real-World Usage: Autonomous Research Agents

Perfect for agentic workflows, deep research systems, and complex LLM applications:

# Define your autonomous research workflow types
class TaskDecomposition < T::Struct
  # Your complex research schema...
end

# Generate BAML for LLM agents
prompt = <<~PROMPT
  You are an autonomous research agent. Analyze this topic and decompose it into strategic subtasks.
  
  Schema for your output:
  #{TaskDecomposition.to_baml}
  
  Topic: "Impact of AI on healthcare delivery systems"
  
  Provide a comprehensive task decomposition in JSON format.
PROMPT

# Use with OpenAI, Anthropic, or any LLM provider
response = llm_client.chat(prompt)
result = TaskDecomposition.from_json(response.content)

πŸ”— Integration Examples

With OpenAI structured outputs:

User.to_baml(include_dependencies: true)
# Use the generated BAML in your function calling schemas

With prompt engineering:

# More efficient than JSON Schema
schema = User.to_baml(include_dependencies: true)
prompt = "Generate data matching: #{schema}"

With documentation generation:

# Auto-generate API docs
api_types = [User, Order, Product].map(&:to_baml).join("\n\n")

πŸ† Token Efficiency: BAML vs JSON Schema

Here's a real-world comparison using a complex agentic workflow from production DSPy.rb usage:

Complex T::Struct Types (Production Agentic Workflow)

# Real autonomous research workflow from production DSPy.rb usage
class ComplexityLevel < T::Enum
  enums do
    # Basic analysis requiring straightforward research
    Basic = new('basic')
    # Intermediate analysis requiring synthesis of multiple sources
    Intermediate = new('intermediate') 
    # Advanced analysis requiring deep domain expertise and complex reasoning
    Advanced = new('advanced')
  end
end

class TaskDecomposition < T::Struct
  # The main research topic being investigated
  const :research_topic, String
  # Additional context or constraints for the research
  const :context, String
  # Target complexity level for the decomposition
  const :complexity_level, ComplexityLevel
  # Autonomously generated list of research subtasks
  const :subtasks, T::Array[String]
  # Type classification for each task (analysis, synthesis, investigation, etc.)
  const :task_types, T::Array[String]
  # Strategic priority rankings (1-5 scale) for each subtask
  const :priority_order, T::Array[Integer]
  # Effort estimates in hours for each subtask
  const :estimated_effort, T::Array[Integer]
  # Task dependency relationships for optimal sequencing
  const :dependencies, T::Array[String]
  # Suggested agent types/skills needed for each task
  const :agent_requirements, T::Array[String]
end

class ResearchExecution < T::Struct
  # The specific research subtask to execute
  const :subtask, String
  # Accumulated context from previous research steps
  const :context, String
  # Any specific constraints or focus areas for this research
  const :constraints, String
  # Detailed research findings and analysis
  const :findings, String
  # Key actionable insights extracted from the research
  const :key_insights, T::Array[String]
  # Confidence in findings quality (1-10 scale)
  const :confidence_level, Integer
  # Assessment of evidence quality and reliability
  const :evidence_quality, String
  # Recommended next steps based on these findings
  const :next_steps, T::Array[String]
  # Identified gaps in knowledge or areas needing further research
  const :knowledge_gaps, T::Array[String]
end

πŸ“Š BAML Output (Ruby-idiomatic with descriptions)

[ComplexityLevel, TaskDecomposition, ResearchExecution].map(&:to_baml).join("\n\n")
enum ComplexityLevel {
  "basic" @description("Basic analysis requiring straightforward research")
  "intermediate" @description("Intermediate analysis requiring synthesis of multiple sources")
  "advanced" @description("Advanced analysis requiring deep domain expertise and complex reasoning")
}

class TaskDecomposition {
  research_topic string @description("The main research topic being investigated")
  context string @description("Additional context or constraints for the research")
  complexity_level ComplexityLevel @description("Target complexity level for the decomposition")
  subtasks string[] @description("Autonomously generated list of research subtasks")
  task_types string[] @description("Type classification for each task (analysis, synthesis, investigation, etc.)")
  priority_order int[] @description("Strategic priority rankings (1-5 scale) for each subtask")
  estimated_effort int[] @description("Effort estimates in hours for each subtask")
  dependencies string[] @description("Task dependency relationships for optimal sequencing")
  agent_requirements string[] @description("Suggested agent types/skills needed for each task")
}

class ResearchExecution {
  subtask string @description("The specific research subtask to execute")
  context string @description("Accumulated context from previous research steps")
  constraints string @description("Any specific constraints or focus areas for this research")
  findings string @description("Detailed research findings and analysis")
  key_insights string[] @description("Key actionable insights extracted from the research")
  confidence_level int @description("Confidence in findings quality (1-10 scale)")
  evidence_quality string @description("Assessment of evidence quality and reliability")
  next_steps string[] @description("Recommended next steps based on these findings")
  knowledge_gaps string[] @description("Identified gaps in knowledge or areas needing further research")
}

BAML Token Count: ~320 tokens

πŸ“Š JSON Schema Equivalent

{
  "ComplexityLevel": {
    "type": "string",
    "enum": ["basic", "intermediate", "advanced"],
    "description": "Complexity level enumeration"
  },
  "TaskDecomposition": {
    "type": "object",
    "properties": {
      "topic": {"type": "string"},
      "context": {"type": "string"},
      "complexity_level": {"$ref": "#/definitions/ComplexityLevel"},
      "subtasks": {
        "type": "array",
        "items": {"type": "string"}
      },
      "task_types": {
        "type": "array", 
        "items": {"type": "string"}
      },
      "priority_order": {
        "type": "array",
        "items": {"type": "integer"}
      },
      "estimated_effort": {
        "type": "array",
        "items": {"type": "integer"}
      },
      "dependencies": {
        "type": "array",
        "items": {"type": "string"}
      },
      "agent_requirements": {
        "type": "array",
        "items": {"type": "string"}
      }
    },
    "required": ["topic", "context", "complexity_level", "subtasks", "task_types", "priority_order", "estimated_effort", "dependencies", "agent_requirements"],
    "additionalProperties": false
  },
  "ResearchExecution": {
    "type": "object",
    "properties": {
      "subtask": {"type": "string"},
      "context": {"type": "string"}, 
      "constraints": {"type": "string"},
      "findings": {"type": "string"},
      "key_insights": {
        "type": "array",
        "items": {"type": "string"}
      },
      "confidence_level": {"type": "integer"},
      "evidence_quality": {"type": "string"},
      "next_steps": {
        "type": "array",
        "items": {"type": "string"}
      },
      "knowledge_gaps": {
        "type": "array", 
        "items": {"type": "string"}
      }
    },
    "required": ["subtask", "context", "constraints", "findings", "key_insights", "confidence_level", "evidence_quality", "next_steps", "knowledge_gaps"],
    "additionalProperties": false
  }
}

JSON Schema Token Count: ~680 tokens

🎯 Results: 53% Token Reduction (with descriptions)

Format Tokens Reduction
JSON Schema ~680 baseline
BAML ~320 πŸ”₯ 53% fewer

Without descriptions:

Format Tokens Reduction
JSON Schema ~450 baseline
BAML ~180 πŸ”₯ 60% fewer

Real Impact:

  • Cost Savings: 53-60% reduction in prompt tokens = significant LLM API cost savings
  • Performance: Smaller prompts = faster LLM response times
  • Context Efficiency: More room for actual content vs. type definitions
  • LLM Understanding: Descriptions provide crucial context for autonomous agents
  • Readability: BAML is human-readable and maintainable

This example represents actual agentic workflows from production DSPy.rb applications using complex nested types, enums, and arrays - exactly the scenarios where token efficiency and LLM understanding matter most.

Credits

This project was inspired by sorbet-schema which provides excellent Sorbet type introspection capabilities. While sorbet-schema focuses on serialization/deserialization, sorbet-baml focuses on generating efficient type definitions for LLM consumption.

Contributing

Bug reports and pull requests are welcome at https://github.com/vicentereig/sorbet-baml

License

MIT License. See LICENSE.txt for details.