Azure AI Foundry and Azure OpenAI hosted provider integration for LegionIO LLM routing.

Dependencies

Runtime

  • >= 1.2.1
  • lex-llm >= 0.1.5

lex-llm-azure-foundry

LegionIO LLM provider extension for Azure AI Foundry Models and Azure OpenAI hosted deployments.

This gem lives under Legion::Extensions::Llm::AzureFoundry and depends on lex-llm >= 0.1.5 for shared provider-neutral routing, fleet, model-offering, readiness, canonical-alias, and schema primitives.

Load it with require 'legion/extensions/llm/azure_foundry'.

What It Provides

  • Legion::Extensions::Llm::Provider registration as :azure_foundry
  • Azure AI Foundry model inference chat completions through POST /models/chat/completions?api-version=...
  • Azure AI Foundry model inference embeddings through POST /models/embeddings?api-version=...
  • Azure AI Foundry model info health check through GET /models/info?api-version=... when live: true
  • Azure OpenAI v1-compatible endpoint support through /openai/v1/chat/completions and /openai/v1/embeddings
  • Deployment-name-preserving routing offerings for hosted Azure deployments
  • Explicit model_family and canonical_model_alias metadata for deployments whose base model cannot be proven from Azure metadata
  • Offline-first discovery from configured deployments
  • Shared OpenAI-compatible request and response mapping via Legion::Extensions::Llm::Provider::OpenAICompatible
  • Conservative token-counting metadata when no portable Azure token-counting REST endpoint is configured

API Contract

The implementation follows Microsoft Learn REST documentation for Azure AI Foundry Models:

  • Azure AI Foundry model inference endpoints use deployment names as the request model.
  • The model inference endpoint supports chat completions and embeddings.
  • The documented model-info endpoint is used only for explicit live health checks.
  • Azure deployment metadata is not assumed to reliably prove base model family or version, so routing metadata should be configured explicitly.
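
For orientation, here is a sketch of that chat-completions wire shape using plain Net::HTTP. It is illustrative only: the endpoint, deployment name, and api-version are example values, the api-key header is one of the two auth options shown in the defaults below, and the gem issues these requests for you.

require "json"
require "net/http"
require "uri"

endpoint = ENV.fetch("AZURE_FOUNDRY_ENDPOINT") # e.g. https://<resource>.services.ai.azure.com
uri = URI("#{endpoint}/models/chat/completions?api-version=2024-05-01-preview")

request = Net::HTTP::Post.new(uri)
request["Content-Type"] = "application/json"
request["api-key"] = ENV.fetch("AZURE_INFERENCE_CREDENTIAL") # or Authorization: Bearer <Entra token>
request.body = {
  model: "gpt-4o-prod", # the deployment name is sent as the request model
  messages: [{ role: "user", content: "Hello" }]
}.to_json

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
puts JSON.parse(response.body).dig("choices", 0, "message", "content")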

Defaults

Legion::Extensions::Llm::AzureFoundry.default_settings
# {
#   provider_family: :azure_foundry,
#   discovery: { enabled: true, live: false },
#   instances: {
#     default: {
#       endpoint: "https://<resource>.services.ai.azure.com",
#       api_version: "2024-05-01-preview",
#       surface: :model_inference,
#       tier: :frontier,
#       transport: :http,
#       credentials: {
#         api_key: "env://AZURE_INFERENCE_CREDENTIAL",
#         bearer_token: "env://AZURE_FOUNDRY_BEARER_TOKEN",
#         entra_scope: "https://cognitiveservices.azure.com/.default"
#       },
#       deployments: [],
#       usage: { inference: true, embedding: true, token_counting: false },
#       limits: { concurrency: 4 }
#     }
#   }
# }
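
Because default_settings is shown above as a plain Hash, a one-off override can be sketched with ordinary Hash merging. The Configuration block below is the supported path; this only illustrates the settings shape, and the endpoint value is an example.

defaults = Legion::Extensions::Llm::AzureFoundry.default_settings
overridden = defaults.merge(
  instances: {
    default: defaults[:instances][:default].merge(
      endpoint: "https://my-resource.services.ai.azure.com"
    )
  }
)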

Configuration

Legion::Extensions::Llm.configure do |config|
  config.azure_foundry_endpoint = ENV.fetch("AZURE_FOUNDRY_ENDPOINT")
  config.azure_foundry_api_key = ENV["AZURE_INFERENCE_CREDENTIAL"]
  config.azure_foundry_bearer_token = ENV["AZURE_FOUNDRY_BEARER_TOKEN"]
  config.azure_foundry_api_version = "2024-05-01-preview"
  config.azure_foundry_surface = :model_inference
  config.azure_foundry_deployments = [
    {
      deployment: "gpt-4o-prod",
      model_family: :openai,
      canonical_model_alias: "gpt-4o",
      usage_type: :inference
    },
    {
      deployment: "mistral-large-prod",
      model_family: :mistral,
      canonical_model_alias: "mistral-large",
      usage_type: :inference
    },
    {
      deployment: "embedding-prod",
      model_family: :openai,
      canonical_model_alias: "text-embedding-3-small",
      usage_type: :embedding
    }
  ]
end

Use config.azure_foundry_surface = :openai_v1 when the target endpoint should be treated as the OpenAI v1-compatible Azure route. The provider appends /openai/v1 when the configured endpoint does not already include it.
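
That normalization rule can be pictured with a small hypothetical helper (not this gem's internals):

def openai_v1_base(endpoint)
  endpoint.end_with?("/openai/v1") ? endpoint : "#{endpoint.chomp('/')}/openai/v1"
end

openai_v1_base("https://res.services.ai.azure.com")           # => ".../openai/v1" appended
openai_v1_base("https://res.services.ai.azure.com/openai/v1") # => unchanged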

Provider Methods

provider = Legion::Extensions::Llm::AzureFoundry.provider_class.new(Legion::Extensions::Llm.config)

messages = [{ role: "user", content: "Hello" }] # OpenAI-compatible message shape

provider.discover_offerings(live: false)
provider.offering_for(model: "gpt-4o-prod", model_family: :openai, canonical_model_alias: "gpt-4o")
provider.health(live: false)
provider.chat(messages, model: "gpt-4o-prod")
provider.stream(messages, model: "gpt-4o-prod") { |chunk| puts chunk.content }
provider.embed(["hello"], model: "embedding-prod")
provider.count_tokens(messages, model: "gpt-4o-prod")

discover_offerings(live: false) never calls Azure. It maps configured deployments into Legion::Extensions::Llm::Routing::ModelOffering values with provider_family: :azure_foundry.
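
For example, listing what offline discovery produced (the attribute readers here assume ModelOffering exposes the fields named in this README):

provider.discover_offerings(live: false).each do |offering|
  puts format("%s -> %s (%s)", offering.model, offering.canonical_model_alias, offering.model_family)
end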

health(live: true) calls the documented model-info endpoint for the configured model-inference surface. Keep live: false for startup paths and tests that must not require Azure.

count_tokens returns a structured unsupported result by default because the Microsoft REST contract used here does not define a portable token-counting endpoint across Azure AI Foundry deployments.

Routing Metadata

Azure deployments are aliases. A deployment name can hide provider, model, and version details, so this extension preserves the deployment name as model and treats canonical_model_alias and model_family as routing metadata.

Supported model_family values are intentionally open-ended symbols, including:

  • :openai
  • :mistral
  • :meta
  • :xai
  • :anthropic
  • :microsoft

When model_family or canonical_model_alias is missing, offerings include requires_explicit_model_metadata: true.
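
A hedged sketch of acting on that flag, assuming it surfaces as a boolean reader on each offering:

ready, needs_metadata = provider.discover_offerings(live: false)
                                .partition { |o| !o.requires_explicit_model_metadata }
needs_metadata.each do |offering|
  warn "configure model_family and canonical_model_alias for #{offering.model}"
end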