Ollama provider integration for the LegionIO LLM routing framework.

lex-llm-ollama

LegionIO LLM provider extension for Ollama.

This gem lives under Legion::Extensions::Llm::Ollama and depends on lex-llm >= 0.4.3 for shared provider-neutral routing, response normalization, fleet envelopes, responder execution, transport, and registry primitives. It does not carry a runtime legion-llm dependency; legion-llm owns higher-level routing and discovers this provider through normal extension loading.

Load it with require 'legion/extensions/llm/ollama'.

What It Provides

Feature          Endpoint          Provider Method
Chat completion  POST /api/chat    Inherited from Lex-llm base provider
Streaming chat   POST /api/chat    stream_response
List models      GET /api/tags     list_models
Running models   GET /api/ps       list_running_models
Model details    POST /api/show    show_model, fetch_model_detail
Pull models      POST /api/pull    pull_model
Embeddings       POST /api/embed   Inherited from Lex-llm base provider
Readiness check  GET /api/version  readiness(live: false)

All responses pass through the shared Lex-llm normalization layer: Message, Chunk, Embedding, and Model::Info.

File Index

lib/
  legion/extensions/llm/ollama.rb              # Extension entry point, instance discovery, default settings
  legion/extensions/llm/ollama/provider.rb     # Provider — chat, stream, embed, models, offerings
  legion/extensions/llm/ollama/version.rb      # VERSION constant
  legion/extensions/llm/ollama/actors/
    discovery_refresh.rb                       # Periodic model discovery actor (Every, 30min default)
    fleet_worker.rb                            # Fleet request subscription actor (Subscription)
  legion/extensions/llm/ollama/runners/
    fleet_worker.rb                            # Fleet request execution runner (delegates to lex-llm)

Architecture

Legion::Extensions::Llm::Ollama
├── Provider                          # Ollama provider implementation
│   ├── Capabilities                  # Capability predicates (chat, streaming, vision, functions, embeddings)
│   ├── #render_payload               # Build Ollama chat payload from messages, tools, schema
│   ├── #stream_response              # NDJSON streaming via Faraday on_data
│   ├── #discover_offerings           # Build ModelOffering array from live/cached models
│   ├── #fetch_model_detail           # Call /api/show, extract context_window + capabilities
│   ├── #render_embedding_payload     # Build Ollama embedding payload
│   └── (inherited from lex-llm)      # Chat, embedding, connection, registry helpers
├── Actor::DiscoveryRefresh           # Every actor; refreshes model list, repopulates auto rules
├── Actor::FleetWorker                # Subscription actor; gates on respond_to_requests
└── Runners::FleetWorker              # Module function; delegates to ProviderResponder.call

Shared from lex-llm:
├── Fleet::ProviderResponder          # Fleet request execution harness
├── RegistryPublisher                 # Publishes readiness + model events to llm.registry
├── RegistryEventBuilder              # Builds registry event payloads
├── AutoRegistration                  # Self-registers discovered instances
└── CredentialSources                 # Socket probing + setting lookup for instance discovery

Key Classes

Legion::Extensions::Llm::Ollama (module)

  • default_settings — Returns the full settings schema via Lex-llm.provider_settings.
  • provider_class — Returns Provider.
  • discover_instances — Probes 127.0.0.1:11434 socket + reads configured instances from settings.
  • normalize_instance_config(config) — Normalizes endpoint/api_base/ollama_api_base aliases to base_url.
  • registry_publisher — Lazily instantiated RegistryPublisher for the :ollama family.
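
As a rough illustration of the alias handling described for normalize_instance_config, the sketch below folds the four endpoint keys into base_url. It is standalone Ruby, not the gem's code: the ENDPOINT_ALIASES constant, the precedence order among aliases, the trailing-slash trimming, and the fallback to the default address are all assumptions.

```ruby
# Hypothetical sketch of endpoint alias normalization (not the gem's source).
ENDPOINT_ALIASES = %i[base_url endpoint api_base ollama_api_base].freeze
DEFAULT_BASE_URL = 'http://127.0.0.1:11434'

def normalize_instance_config(config)
  config = config.transform_keys(&:to_sym)

  # Assumed precedence: the first alias in ENDPOINT_ALIASES with a value wins.
  found = ENDPOINT_ALIASES.find { |key| config[key] && !config[key].to_s.empty? }
  base_url = found ? config[found].to_s : DEFAULT_BASE_URL

  # Drop every alias key and keep a single canonical base_url.
  config.reject { |key, _| ENDPOINT_ALIASES.include?(key) }
        .merge(base_url: base_url.chomp('/'))
end

normalized = normalize_instance_config(
  'endpoint' => 'http://ollama-lab:11434/',
  'default_model' => 'qwen3.5:latest'
)
puts normalized.inspect
```

Collapsing the aliases once, at discovery time, means downstream code only ever needs to read base_url.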

Provider

Extends Legion::Extensions::Llm::Provider. Implements the Ollama-specific contract:

Method                                               Purpose
api_base                                             Resolves base URL from resolve_base_url, settings, or default 127.0.0.1:11434
completion_url                                       /api/chat
stream_url                                           /api/chat
models_url                                           /api/tags
running_models_url                                   /api/ps
show_model_url                                       /api/show
embedding_url                                        /api/embed
pull_url                                             /api/pull
version_url                                          /api/version
list_running_models                                  GET /api/ps, returns array of running model hashes
readiness(live:)                                     Checks Ollama version endpoint; publishes readiness event when live: true
list_models                                          GET /api/tags, parses and publishes model events via registry
show_model(model)                                    POST /api/show, returns raw model detail hash
fetch_model_detail(model)                            Wraps show_model; extracts context_window and capabilities
pull_model(model, stream:)                           POST /api/pull to download a model
discover_offerings(live:)                            Builds ModelOffering array from live or cached models
render_payload(...)                                  Converts Legion messages/tools to an Ollama chat request payload
stream_response(conn, payload)                       Posts with Faraday on_data handler for NDJSON streaming
parse_completion_response(resp)                      Normalizes Ollama chat response to Legion::Extensions::Llm::Message
build_chunk(data)                                    Normalizes a stream NDJSON line to Legion::Extensions::Llm::Chunk
render_embedding_payload(text, model:, dimensions:)  Builds embedding request body
parse_embedding_response(resp, ...)                  Normalizes embedding response to Legion::Extensions::Llm::Embedding
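
Streamed chunks arrive as raw bytes that need not end on a line boundary, so an NDJSON consumer has to buffer partial lines before parsing them. The class below is a minimal standalone sketch of that buffering step. NdjsonBuffer and feed are hypothetical names, and the gem's stream_response and build_chunk may be structured differently.

```ruby
require 'json'

# Hypothetical helper: accumulates raw stream chunks and yields one parsed
# Hash per complete NDJSON line.
class NdjsonBuffer
  def initialize
    @pending = +''
  end

  def feed(chunk)
    @pending << chunk
    # Consume every complete line; keep any trailing partial line buffered.
    while (newline = @pending.index("\n"))
      line = @pending.slice!(0..newline).strip
      next if line.empty?

      yield JSON.parse(line)
    end
  end
end

buffer = NdjsonBuffer.new
parts = ['{"message":{"content":"Hel', 'lo"},"done":false}' + "\n"]
parts.each do |part|
  buffer.feed(part) { |data| print data.dig('message', 'content') }
end
puts
```

A handler of this shape could be called from inside an on_data callback, with each yielded Hash passed to whatever builds the normalized chunk.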

Capabilities (module inside Provider)

Module functions providing capability predicates used during offering construction:

Method              Always Returns
chat?(model)        true
streaming?(model)   true
vision?(model)      true
functions?(model)   true
embeddings?(model)  true

CONTEXT_WINDOWS (constant)

Static fallback map keyed by model name prefix (e.g., 'qwen3' => 128_000). Used when /api/show is unavailable to infer context window. Covers qwen, llama, gemma, mistral, deepseek, phi, command-r, codellama, and embedding families.
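
A prefix-keyed fallback of this kind could be looked up roughly as follows. This is a sketch under assumptions: only the 'qwen3' => 128_000 entry comes from this README, the 'qwen' entry is a placeholder value added to show overlapping prefixes, and the longest-prefix-wins rule and the fallback_context_window name are hypothetical.

```ruby
# Sketch of a prefix-keyed context window fallback (not the gem's table).
CONTEXT_WINDOWS = {
  'qwen3' => 128_000, # entry quoted in the README
  'qwen' => 32_768    # placeholder value, for illustration only
}.freeze

# Returns the window for the longest matching prefix, or default if none match.
def fallback_context_window(model_name, table: CONTEXT_WINDOWS, default: nil)
  prefix = table.keys
                .select { |key| model_name.start_with?(key) }
                .max_by(&:length)
  prefix ? table[prefix] : default
end

puts fallback_context_window('qwen3.5:latest').inspect
puts fallback_context_window('some-unknown-model').inspect
```

Preferring the longest prefix keeps a specific entry such as 'qwen3' from being shadowed by a broader one such as 'qwen'.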

Actor::DiscoveryRefresh

An Every actor that runs every 30 minutes (configurable via settings[:extensions][:llm][:ollama][:discovery_interval]). On each tick:

  1. Calls Legion::LLM::Discovery.refresh_discovered_models!(provider: :ollama)
  2. Repopulates auto routing rules if Legion::LLM::Router is available
  3. Invalidates the offerings cache if Legion::LLM::Inventory is available
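
The three steps above can be sketched as one guarded tick. Only refresh_discovered_models!(provider:) is taken from this README. The optional_const helper, the populate_auto_rules! and invalidate_offerings_cache! method names, and the guard around step 1 (added so the sketch runs without legion-llm loaded) are assumptions.

```ruby
# Resolve a constant path such as 'Legion::LLM::Router', or return nil
# when any segment is not loaded.
def optional_const(path)
  path.split('::').reduce(Object) do |scope, name|
    return nil unless scope.const_defined?(name, false)

    scope.const_get(name, false)
  end
end

# Runs one refresh tick and returns the list of steps that actually ran.
def discovery_tick(provider: :ollama)
  completed = []

  # Step 1: method name from the README; the guard is a sketch convenience.
  if (discovery = optional_const('Legion::LLM::Discovery'))
    discovery.refresh_discovered_models!(provider: provider)
    completed << :refresh
  end

  # Step 2: hypothetical method name, run only when the router is available.
  router = optional_const('Legion::LLM::Router')
  if router.respond_to?(:populate_auto_rules!)
    router.populate_auto_rules!
    completed << :auto_rules
  end

  # Step 3: hypothetical method name, run only when the inventory is available.
  inventory = optional_const('Legion::LLM::Inventory')
  if inventory.respond_to?(:invalidate_offerings_cache!)
    inventory.invalidate_offerings_cache!
    completed << :cache_invalidated
  end

  completed
end

puts discovery_tick.inspect
```

Guarding each optional collaborator lets the provider extension load and tick without a hard dependency on legion-llm.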

Actor::FleetWorker

A Subscription actor that starts only when at least one instance has fleet.respond_to_requests: true. Routes messages to the fleet worker runner.
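
The start condition can be pictured as a predicate over the configured instances. The sketch below assumes symbol-keyed settings shaped like the defaults in this README; fleet_responder_enabled? is a hypothetical name, and the real actor may check additional flags.

```ruby
# Hypothetical predicate: true when any instance opts in to answering
# fleet requests via fleet.respond_to_requests.
def fleet_responder_enabled?(instances)
  instances.any? do |_name, config|
    config.dig(:fleet, :respond_to_requests) == true
  end
end

instances = {
  default: { fleet: { enabled: false, respond_to_requests: false } },
  local: { fleet: { enabled: true, respond_to_requests: true } }
}
puts fleet_responder_enabled?(instances)
```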

Runners::FleetWorker

A module with handle_fleet_request(payload, delivery:, properties:). Delegates to Legion::Extensions::Llm::Fleet::ProviderResponder.call with the Ollama provider family, provider class, and instance discovery lambda.

Defaults

Legion::Extensions::Llm::Ollama.default_settings
# {
#   enabled: true,
#   provider_family: :ollama,
#   instances: {
#     default: {
#       endpoint: 'http://127.0.0.1:11434',
#       default_model: 'qwen3.5:latest',
#       tier: :local,
#       transport: :http,
#       credentials: {},
#       usage: { inference: true, embedding: true, image: false },
#       limits: { concurrency: 1 },
#       fleet: {
#         enabled: false,
#         respond_to_requests: false,
#         capabilities: %i[chat stream_chat embed],
#         lanes: [],
#         concurrency: 1,
#         queue_suffix: nil
#       }
#     }
#   }
# }

Configuration

Instance Discovery

discover_instances auto-detects a local instance when the socket at 127.0.0.1:11434 is reachable. Additional instances can be defined in settings using any of the recognized endpoint aliases (base_url, endpoint, api_base, ollama_api_base); the extension normalizes all to base_url.

extensions:
  llm:
    ollama:
      discovery_interval: 1800          # DiscoveryRefresh actor interval (seconds)
      instances:
        lab:
          base_url: http://ollama-lab:11434
          default_model: qwen3.5:latest

Fleet Responder

Provider instances can opt in to consuming Legion LLM fleet requests. The fleet actor only starts when at least one instance enables respond_to_requests, and the runner delegates execution to the shared lex-llm responder helper.

extensions:
  llm:
    ollama:
      instances:
        local:
          fleet:
            enabled: true
            respond_to_requests: true
            capabilities:
              - chat
              - stream_chat
              - embed

Ollama API Surface

Legion Method   Ollama Route  HTTP Verb
Chat            /api/chat     POST
Stream chat     /api/chat     POST
List models     /api/tags     GET
Running models  /api/ps       GET
Model details   /api/show     POST
Pull model      /api/pull     POST
Embeddings      /api/embed    POST
Readiness       /api/version  GET
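
For readers who want to see these routes as plain HTTP, the sketch below builds (but does not send) requests with Ruby's standard Net::HTTP classes. The gem itself goes through Faraday and the lex-llm base provider, so this is a stand-in: ROUTES and build_request are hypothetical names, and the chat body is a minimal example of Ollama's model/messages/stream request fields.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Route table mirroring the API surface listed above.
ROUTES = {
  chat: [:post, '/api/chat'],
  list_models: [:get, '/api/tags'],
  running_models: [:get, '/api/ps'],
  show_model: [:post, '/api/show'],
  pull_model: [:post, '/api/pull'],
  embed: [:post, '/api/embed'],
  readiness: [:get, '/api/version']
}.freeze

# Builds an unsent Net::HTTP request for one of the operations in ROUTES.
def build_request(operation, base_url: 'http://127.0.0.1:11434', body: nil)
  verb, path = ROUTES.fetch(operation)
  uri = URI.join(base_url, path)
  request = verb == :post ? Net::HTTP::Post.new(uri) : Net::HTTP::Get.new(uri)
  if body
    request['Content-Type'] = 'application/json'
    request.body = JSON.generate(body)
  end
  request
end

chat = build_request(
  :chat,
  body: { model: 'llama3', messages: [{ role: 'user', content: 'Hello' }], stream: false }
)
puts "#{chat.method} #{chat.path}"
```

Sending the request is left out on purpose; Net::HTTP.start(uri.host, uri.port) { |http| http.request(chat) } would do so against a running Ollama instance.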

Error Handling

Every rescue block uses handle_exception from Legion::Logging::Helper with explicit level, handled:, and operation: parameters. Connection failures during discover_offerings produce a warn-level log and return an empty array (never raise).

Usage

require 'legion/extensions/llm/ollama'

# Access the module
Legion::Extensions::Llm::Ollama.discover_instances
Legion::Extensions::Llm::Ollama.default_settings

# Create a provider instance (usually done by lex-llm routing)
provider = Legion::Extensions::Llm::Ollama::Provider.new(config:)

# Discover offerings
provider.discover_offerings(live: true)

# Chat
result = provider.chat(messages: [...], model: 'llama3', temperature: 0.7)

# Stream chat
provider.stream_chat(messages: [...], model: 'llama3') do |chunk|
  print chunk.content
end

# Embeddings
embeddings = provider.embed(text: "Hello world", model: 'nomic-embed-text')

Dependencies

Gem                Minimum Version  Purpose
lex-llm            >= 0.4.3         Base provider contract, routing, fleet responder, registry, credential sources
legion-transport   >= 1.4.14        Faraday connection management
legion-json        (not stated)     JSON serialization (Legion::JSON)
legion-logging     (not stated)     Structured logging (Legion::Logging::Helper)
legion-settings    (not stated)     Configuration access
legion-extensions  (not stated)     Extension framework (Core, Actors::Every, Actors::Subscription)

Development

cd lex-llm-ollama
bundle install

# Run specs
bundle exec rspec

# Lint (auto-correct)
bundle exec rubocop -A

Spec count: 52 examples across 7 spec files.

License

MIT