RubyLLM Responses API

A RubyLLM provider that implements OpenAI's Responses API, providing access to built-in tools (web search, code interpreter, file search, shell, apply patch), stateful conversations, server-side compaction, the Containers API, background mode, WebSocket transport, and MCP support.

Installation

Add this line to your Gemfile:

gem 'ruby_llm-responses_api'

Then run bundle install.

Quick Start

require 'ruby_llm-responses_api'

RubyLLM.configure do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
end

chat = RubyLLM.chat(model: 'gpt-4o-mini', provider: :openai_responses)
response = chat.ask("Hello!")
puts response.content

All standard RubyLLM features work as expected (streaming, tools, vision, structured output).
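For example, streaming uses RubyLLM's standard block form, exactly as with any other provider:

chat = RubyLLM.chat(model: 'gpt-4o-mini', provider: :openai_responses)

# Chunks arrive incrementally; print each one as it streams in
chat.ask("Write a haiku about Ruby") { |chunk| print chunk.content }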

Stateful Conversations

Conversations automatically chain via previous_response_id:

chat = RubyLLM.chat(model: 'gpt-4o-mini', provider: :openai_responses)
chat.ask("My name is Alice.")
chat.ask("What's my name?")  # => "Your name is Alice."

Rails Persistence

For conversations that survive app restarts, add a migration:

class AddResponseIdToMessages < ActiveRecord::Migration[7.0]
  def change
    add_column :messages, :response_id, :string
  end
end

Then use normally (the Chat model here is assumed to be set up with RubyLLM's acts_as_chat Rails integration):

# Day 1
chat = Chat.create!(model_id: 'gpt-4o-mini', provider: :openai_responses)
chat.ask("My name is Alice.")

# Day 2 (after restart)
chat = Chat.find(1)
chat.ask("What's my name?")  # => "Alice"

Built-in Tools

The Responses API provides built-in tools that don't require custom implementation. Pass them as hashes via with_params, or use the BuiltInTools helper module.

Web Search

chat.with_params(tools: [{ type: 'web_search_preview' }])
chat.ask("Latest news about Ruby 3.4?")

# Or with helper
tool = RubyLLM::ResponsesAPI::BuiltInTools.web_search(search_context_size: 'high')
chat.with_params(tools: [tool])

Code Interpreter

Execute Python code in a sandbox:

chat.with_params(tools: [{ type: 'code_interpreter' }])
chat.ask("Calculate the first 20 Fibonacci numbers and plot them")

File Search

Search through uploaded files (requires vector store setup):

chat.with_params(tools: [{ type: 'file_search', vector_store_ids: ['vs_abc123'] }])
chat.ask("What does the documentation say about authentication?")

Shell

Execute commands in hosted containers or local terminal environments. Requires GPT-5 family models.

# Auto-provisioned container (default)
chat = RubyLLM.chat(model: 'gpt-5.2', provider: :openai_responses)
chat.with_params(tools: [{ type: 'shell', environment: { type: 'container_auto' } }])
chat.ask("List all Python files in the project")

# Using helper
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell
chat.with_params(tools: [tool])

# Reuse an existing container
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(container_id: 'cntr_abc123')

# With networking (allow specific domains)
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(
  network_policy: {
    type: 'allowlist',
    allowed_domains: ['pypi.org', 'github.com'],
    domain_secrets: [
      { domain: 'github.com', name: 'GITHUB_TOKEN', value: ENV['GITHUB_TOKEN'] }
    ]
  }
)

# With memory limit
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(memory_limit: '4g')

# Local execution (you handle running commands yourself)
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(environment_type: 'local')

Apply Patch

Structured diff-based file editing. Requires GPT-5 family models.

chat = RubyLLM.chat(model: 'gpt-5.2', provider: :openai_responses)
chat.with_params(tools: [{ type: 'apply_patch' }])
chat.ask("Add error handling to the User#save method")

# Using helper
tool = RubyLLM::ResponsesAPI::BuiltInTools.apply_patch
chat.with_params(tools: [tool])

Image Generation

chat.with_params(tools: [{ type: 'image_generation' }])
chat.ask("Generate an image of a sunset over mountains")

MCP (Model Context Protocol)

tool = RubyLLM::ResponsesAPI::BuiltInTools.mcp(
  server_label: 'github',
  server_url: 'https://api.github.com/mcp',
  require_approval: 'never'
)
chat.with_params(tools: [tool])

Combining Tools

chat.with_params(tools: [
  { type: 'web_search_preview' },
  { type: 'code_interpreter' },
  { type: 'shell', environment: { type: 'container_auto' } }
])
chat.ask("Research the latest sorting algorithms and benchmark them")

Server-Side Compaction

For multi-hour agent runs, enable server-side compaction to automatically compress conversation context when it exceeds a token threshold:

chat = RubyLLM.chat(model: 'gpt-4o', provider: :openai_responses)

# Pass directly
chat.with_params(
  context_management: [{ type: 'compaction', compact_threshold: 200_000 }]
)

# Or use the helper
chat.with_params(
  **RubyLLM::ResponsesAPI::Compaction.compaction_params(compact_threshold: 150_000)
)

# Now run a long agent loop without worrying about context limits
loop do
  response = chat.ask(next_prompt)
  break if done?(response)
end

When the token count crosses the threshold, the server automatically compacts the conversation. The compacted state is carried forward transparently via previous_response_id.

Containers API

Manage persistent execution environments for the shell tool and code interpreter:

chat = RubyLLM.chat(model: 'gpt-5.2', provider: :openai_responses)
provider = chat.instance_variable_get(:@provider)

# Create a container
container = provider.create_container(
  name: 'my-analysis-env',
  expires_after: { anchor: 'last_active_at', minutes: 60 },
  memory_limit: '4g'
)

# Use it with the shell tool
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(container_id: container['id'])
chat.with_params(tools: [tool])
chat.ask("Install pandas and run my analysis script")

# List files created in the container
files = provider.list_container_files(container['id'])

# Retrieve a specific file
content = provider.retrieve_container_file_content(container['id'], file_id)

# Clean up
provider.delete_container(container['id'])
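Because containers persist until they expire or are deleted, it can help to guarantee cleanup even when a run fails. A sketch using only the provider methods shown above (the container name is arbitrary):

container = provider.create_container(name: 'ephemeral-env')

begin
  tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(container_id: container['id'])
  chat.with_params(tools: [tool])
  chat.ask("Run the analysis")
ensure
  # Delete the container even if the request above raises
  provider.delete_container(container['id'])
end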

Background Mode

For long-running tasks:

chat = RubyLLM.chat(model: 'gpt-4o', provider: :openai_responses)
chat.with_params(background: true)
response = chat.ask("Analyze this large dataset...")

# Poll for completion
provider = chat.instance_variable_get(:@provider)
result = provider.poll_response(response.response_id, interval: 2.0) do |status|
  puts "Status: #{status['status']}"
end
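Since poll_response blocks until the response completes, you can move the wait off the main thread if you have other work to do. A minimal sketch using plain Ruby threads and only the poll_response call shown above:

poller = Thread.new do
  provider.poll_response(response.response_id, interval: 5.0)
end

# ... do other work while the response runs in the background ...

result = poller.value  # blocks until polling completes, re-raises any error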

Parsing Built-in Tool Results

When the API returns results from built-in tools, use the parsers to extract structured data:

# Access raw response output (available via response.raw)
output = response.raw.body['output']

# Parse results by tool type
web_results    = RubyLLM::ResponsesAPI::BuiltInTools.parse_web_search_results(output)
code_results   = RubyLLM::ResponsesAPI::BuiltInTools.parse_code_interpreter_results(output)
file_results   = RubyLLM::ResponsesAPI::BuiltInTools.parse_file_search_results(output)
shell_results  = RubyLLM::ResponsesAPI::BuiltInTools.parse_shell_call_results(output)
patch_results  = RubyLLM::ResponsesAPI::BuiltInTools.parse_apply_patch_results(output)
image_results  = RubyLLM::ResponsesAPI::BuiltInTools.parse_image_generation_results(output)
citations      = RubyLLM::ResponsesAPI::BuiltInTools.extract_citations(message_content)
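The exact shape of each result set varies by tool; when in doubt, pretty-print one to inspect its structure:

# Inspect whatever structure the parsers returned
pp web_results
pp citations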

WebSocket Mode

For agentic workflows with many tool-call round trips, WebSocket mode reduces latency by maintaining a single persistent connection instead of opening a new HTTP request for every turn.

Requires the websocket-client-simple gem:

gem 'websocket-client-simple'

Usage

Add transport: :websocket to your params; the standard chat.ask API works as-is:

chat = RubyLLM.chat(model: 'gpt-4o', provider: :openai_responses)
chat.with_params(transport: :websocket)

chat.ask("Hello!")
chat.ask("What's 2+2?")  # reuses the same WebSocket connection

Streaming works the same way:

chat.ask("Tell me a story") { |chunk| print chunk.content }

Direct WebSocket access

For advanced use cases (raw Responses API format, warmup, explicit connection management):

ws = RubyLLM::ResponsesAPI::WebSocket.new(api_key: ENV['OPENAI_API_KEY'])
ws.connect

ws.create_response(
  model: 'gpt-4o',
  input: [{ type: 'message', role: 'user', content: 'Hello!' }]
) { |chunk| print chunk.content }

# Pre-cache model weights
ws.warmup(model: 'gpt-4o')

ws.disconnect
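When managing the connection yourself, an ensure block guarantees the socket is closed even if a request raises. A sketch using only the methods shown above:

ws = RubyLLM::ResponsesAPI::WebSocket.new(api_key: ENV['OPENAI_API_KEY'])

begin
  ws.connect
  ws.create_response(
    model: 'gpt-4o',
    input: [{ type: 'message', role: 'user', content: 'Ping' }]
  ) { |chunk| print chunk.content }
ensure
  # Close the socket even if create_response raises
  ws.disconnect
end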

Why Use the Responses API?

  • Built-in tools - Web search, code execution, file search, shell, apply patch without custom implementation
  • Stateful conversations - OpenAI stores context server-side via previous_response_id
  • Simpler multi-turn - No need to send full message history on each request
  • Server-side compaction - Run multi-hour agent sessions without hitting context limits
  • Containers - Persistent execution environments with networking and file management
  • WebSocket mode - Lower-latency persistent connections for agentic tool-call loops

License

MIT