emb

A text embeddings server that speaks the Redis protocol and runs models with ONNX Runtime.

redis-cli EMB minilm "hello world"
→ \x7c\x8e\x80\xbd...   (384 float32s × 4 bytes)

Install

curl -fsSL https://github.com/elcuervo/emb/raw/main/install.sh | sh

Installs to /usr/local/bin. Set EMB_INSTALL_DIR to change the target:

curl -fsSL https://github.com/elcuervo/emb/raw/main/install.sh | EMB_INSTALL_DIR=~/.local/bin sh

Platforms: macOS (Apple Silicon), Linux (amd64, arm64).

Quick start

# Auto-downloads a model from HuggingFace and starts the server
emb -model-repo Xenova/all-MiniLM-L6-v2

# In another terminal:
redis-cli EMB minilm "hello world"
# → \x7c\x8e\x80\xbd...   (384 float32s × 4 bytes)

Features

Redis protocol: any Redis client works (redis-cli, redis-py, redis-rb, etc.)
ONNX Runtime: fast CPU/GPU inference via CGo bindings
HuggingFace integration: auto-download models and auto-detect dim, max_length, output tensor, pooling strategy from ONNX graph + config.json
Multi-model queries: EMB.MULTI calls different models in one command (MGET-style partial failures)

Quick start

One-liner (no config file)

# Auto-downloads a model from HuggingFace and starts the server
emb -model-repo Xenova/all-MiniLM-L6-v2

# With password authentication
emb -model-repo Xenova/all-MiniLM-L6-v2 -password "hunter2"

# In another terminal:
redis-cli EMB model "hello world"

Two models inline

emb \
  -model minilm -model-onnx ./models/minilm/model.onnx -model-tokenizer ./models/minilm/tokenizer.json \
  -model bge   -model-repo Xenova/bge-small-en-v1.5

redis-cli EMB.MULTI minilm "hello" bge "world"

Local development (with config file)

# Download a model from HuggingFace
just download-model

# Start the server
just dev

# In another terminal:
redis-cli EMB minilm "hello world"

Commands

Command	Description
`EMB <model> <text> [text...]`	Embed one or more texts. Single text → bulk string, multiple → array of bulk strings
`EMB.MODELS`	List loaded models with dimensions and status
`EMB.INFO <model>`	Model details: dim, workers, requests served, avg latency
`EMB.STATS`	Server statistics: uptime, total requests, per-model breakdown
`EMB.MULTI <model> <text> [<model> <text>...]`	Embed texts across different models in one call
`EMB.READY`	Health check: `+OK` (ready), `-ERR loading` (loading), `-ERR draining` (shutting down)
`EMB.HELP`	Command reference
`AUTH <password>`	Authenticate the connection (required if `password` is set in config)
`PING`	PONG
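
The two reply shapes of EMB (one bulk string for a single text, an array of bulk strings for several) can be handled by one small client-side helper. The following is a minimal sketch using only the Python standard library. The names decode_embedding and decode_reply are hypothetical, and it assumes your Redis client hands back bulk strings as bytes and arrays as lists:

```python
import struct


def decode_embedding(raw: bytes) -> list[float]:
    """Unpack raw little-endian float32 bytes into a list of floats."""
    if len(raw) % 4 != 0:
        raise ValueError(f"expected a multiple of 4 bytes, got {len(raw)}")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def decode_reply(reply):
    """Decode an EMB reply: bytes for one text, a list of bytes for several."""
    if isinstance(reply, (bytes, bytearray)):
        return decode_embedding(bytes(reply))
    return [decode_embedding(bytes(item)) for item in reply]


# Stand-in for a server reply: two 3-dimensional vectors packed as float32.
fake_reply = [struct.pack("<3f", 1.0, 0.5, -2.0), struct.pack("<3f", 0.0, 0.25, 4.0)]
vectors = decode_reply(fake_reply)
```

With a real client, pass the return value of the EMB call to decode_reply instead of fake_reply.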

EMB.READY for health checks

EMB.READY returns +OK when the server is ready to serve traffic, or -ERR with a reason (loading, draining, no models). The server speaks the Redis protocol rather than HTTP, so use it from a load balancer or monitoring check that can send a Redis command over TCP:

redis-cli EMB.READY
→ OK

The emb gem exposes Emb.ready? (boolean) and Emb.ready (reason string):

Emb.ready?
# => true

Emb.ready
# => "ready"
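
For monitoring scripts that have no Redis client available, the same check can be done over a plain TCP socket. This is a sketch, not part of emb: the function names are hypothetical, it assumes no password or TLS is configured (otherwise AUTH must be sent first), and it assumes the short reply arrives in a single read:

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_ready(reply: bytes) -> tuple[bool, str]:
    """Map a RESP simple string or error line to (ready, reason)."""
    line = reply.split(b"\r\n", 1)[0].decode()
    if line.startswith("+"):
        return True, line[1:]
    if line.startswith("-"):
        # Drop the generic "ERR " prefix so only the reason remains.
        return False, line[1:].removeprefix("ERR ")
    raise ValueError(f"unexpected reply: {line!r}")


def check_ready(host: str = "127.0.0.1", port: int = 6379, timeout: float = 1.0) -> tuple[bool, str]:
    """Send EMB.READY over a plain TCP connection and parse the reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(encode_command("EMB.READY"))
            return parse_ready(sock.recv(256))
    except OSError as exc:
        return False, f"unreachable: {exc}"
```

Calling check_ready() returns a (ready, reason) pair, so a probe can exit non-zero whenever the first element is False.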

EMB.MULTI example

redis-cli EMB.MULTI minilm "hello" siglip2 "a photo of a cat"
1) \x7c\x8e\x80\xbd...   (minilm, 384 floats)
2) \x4a\x9f\x31\xc2...   (siglip2, 768 floats)
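
Because the reply is positional, a client has to pair each entry with the model it asked for. The sketch below shows one way to do that in Python. The helper names are hypothetical, and it assumes a failed entry arrives as nil (None) or an error object, in the way MGET returns nil for a missing key. The exact representation of a partial failure is not specified here, so check it against your client:

```python
import struct


def build_multi_args(pairs: list[tuple[str, str]]) -> list[str]:
    """Flatten (model, text) pairs into EMB.MULTI arguments."""
    args = ["EMB.MULTI"]
    for model, text in pairs:
        args += [model, text]
    return args


def decode_multi(pairs, reply):
    """Return (results, failed): vectors by position, and the models that failed."""
    results, failed = [], []
    for (model, _text), item in zip(pairs, reply):
        if isinstance(item, (bytes, bytearray)):
            results.append(list(struct.unpack(f"<{len(item) // 4}f", item)))
        else:
            # Assumption: a failed entry is nil (None) or an error object.
            results.append(None)
            failed.append(model)
    return results, failed


pairs = [("minilm", "hello"), ("siglip2", "a photo of a cat")]
args = build_multi_args(pairs)
# Stand-in reply: the first model succeeded, the second failed.
results, failed = decode_multi(pairs, [struct.pack("<2f", 0.5, -1.0), None])
```

With redis-py, the arguments would be sent as execute_command(*args) and the return value passed to decode_multi.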

Configuration

listen: ":6379"

# password: "hunter2"
# tls_cert: /etc/emb/cert.pem
# tls_key:  /etc/emb/key.pem

models:
  minilm:
    onnx: ./models/minilm/model.onnx

  siglip2:
    onnx: ./models/siglip2/text_model.onnx
    tokenizer: ./models/siglip2/tokenizer.json
    output_tensor: pooler_output
    pooling: none
    normalize: true
    dim: 768

  # Auto-download from HuggingFace
  e5:
    model_repo: intfloat/e5-small-v2
    pooling: none
    normalize: false

Model options

Field	Default	Description
`onnx`	—	Path to ONNX model file
`tokenizer`	`<model-dir>/tokenizer.json`	Path to HuggingFace tokenizer JSON
`model_repo`	—	HuggingFace repo (auto-downloads ONNX + tokenizer)
`dim`	auto-detected	Embedding dimension
`max_length`	auto-detected (or 512)	Max token sequence length
`pooling`	auto-detected	`mean` (3D output) or `none` (2D pre-pooled)
`normalize`	`false`	L2-normalize the output
`output_tensor`	auto-detected	ONNX output tensor name
`preload`	`false`	Load model at startup instead of on first request
`pad_output`	`false`	Pad sequences to `max_length` with trailing zeros (compatibility with legacy implementations that don't pass attention mask)
`workers`	auto-tuned	Number of worker goroutines
`batching`	`{timeout: 1, max_batch: 32}`	Smart batching settings (set `timeout: 0` to disable)
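
To make the pooling and normalize options concrete, here is a pure-Python sketch of what mean pooling over a 3D output and L2 normalization compute for a single sequence. It illustrates the arithmetic only and is not the server's implementation. The function names are hypothetical, and masking padded positions with the attention mask is the usual convention, assumed here:

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1 (one sequence of a 3D output)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / max(count, 1) for value in total]


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


# One sequence: 3 token positions x 2 dims; the last position is padding.
tokens = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)  # [3.0, 4.0]
unit = l2_normalize(pooled)       # approximately [0.6, 0.8]
```

A model whose output is already 2D (one vector per input) needs no pooling step, which corresponds to pooling: none.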

Clients

The response is raw little-endian float32 bytes. Any Redis client works.

Ruby:

require "redis_client"

redis = RedisClient.new(port: 6379)
raw = redis.call("EMB", "minilm", "hello world")
emb = raw.unpack("e*")

Or use the emb gem:

require "emb"

Emb[:minilm]["hello world"]
# => [0.0123, -0.0456, 0.0789, ...]

Python:

import struct
import redis

r = redis.Redis(port=6379)
raw = r.execute_command("EMB", "minilm", "hello world")
emb = list(struct.unpack(f"<{len(raw)//4}f", raw))

Go:

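// raw is the []byte reply from your Redis client.
// binary.Read fills only the slice's existing length, so size it first.
vec := make([]float32, len(raw)/4)
err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, vec)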

Ruby Gems

Ruby gems for emb:

emb: Client library with connection pooling, proxy, and multi-model support. Auto-decodes float32 responses. See the gem's README.
emb-server: Precompiled server binary. Install and run emb directly. See the gem's README.

Development

Commands

just format          # Format all Go code
just lint            # Run linters
just test            # Run tests
just bench           # Run Go benchmarks
just bench-all       # Run redis-benchmark suite (see BENCHMARK.md)
just build           # Build the emb binary
just dev             # Build and run the server
just download-model  # Download a model from HuggingFace

Nix

A flake.nix is provided for reproducible development shells:

nix develop

This provides Go, ONNX Runtime, golangci-lint, just, and all CGo configuration.

Docker

# Run with a model mounted:
docker run -v ./models:/models elcuervo/emb \
  -config /models/config.yaml