emb
A simple yet powerful text embeddings generator.
redis-cli EMB minilm "hello world"
→ \x7c\x8e\x80\xbd... (384 float32s × 4 bytes)
Install
curl -fsSL https://github.com/elcuervo/emb/raw/main/install.sh | sh
Installs to /usr/local/bin. Set EMB_INSTALL_DIR to change the target:
curl -fsSL https://github.com/elcuervo/emb/raw/main/install.sh | EMB_INSTALL_DIR=~/.local/bin sh
Platforms: macOS (Apple Silicon), Linux (amd64, arm64).
Quick start
# Auto-downloads a model from HuggingFace and starts the server
emb -model-repo Xenova/all-MiniLM-L6-v2
# In another terminal:
redis-cli EMB minilm "hello world"
# → \x7c\x8e\x80\xbd... (384 float32s × 4 bytes)
Features
- Redis protocol: any Redis client works (redis-cli, redis-py, redis-rb, etc.)
- ONNX Runtime: fast CPU/GPU inference via CGo bindings
- HuggingFace integration: auto-download models and auto-detect dim, max_length, output tensor, and pooling strategy from the ONNX graph and config.json
- Multi-model queries: EMB.MULTI calls different models in one command (MGET-style partial failures)
Quick start
One-liner (no config file)
# Auto-downloads a model from HuggingFace and starts the server
emb -model-repo Xenova/all-MiniLM-L6-v2
# With password authentication
emb -model-repo Xenova/all-MiniLM-L6-v2 -password "hunter2"
# In another terminal:
redis-cli EMB model "hello world"
Two models inline
emb \
-model minilm -model-onnx ./models/minilm/model.onnx -model-tokenizer ./models/minilm/tokenizer.json \
-model bge -model-repo Xenova/bge-small-en-v1.5
redis-cli EMB.MULTI minilm "hello" bge "world"
Local development (with config file)
# Download a model from HuggingFace
just download-model
# Start the server
just dev
# In another terminal:
redis-cli EMB minilm "hello world"
Commands
| Command | Description |
|---|---|
| `EMB <model> <text> [text...]` | Embed one or more texts. Single text → bulk string, multiple → array of bulk strings |
| `EMB.MODELS` | List loaded models with dimensions and status |
| `EMB.INFO <model>` | Model details: dim, workers, requests served, avg latency |
| `EMB.STATS` | Server statistics: uptime, total requests, per-model breakdown |
| `EMB.MULTI <model> <text> [<model> <text>...]` | Embed texts across different models in one call |
| `EMB.READY` | Health check: +OK (ready), -ERR loading (loading), -ERR draining (shutting down) |
| `EMB.HELP` | Command reference |
| `AUTH <password>` | Authenticate the connection (required if password is set in config) |
| `PING` | `PONG` |
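The two reply shapes of EMB listed in the table (one bulk string for a single text, an array of bulk strings for several) can be handled by one small helper. The following is a sketch in Python, assuming the client hands back `bytes` for a bulk string and a `list` of `bytes` for an array, as redis-py does with its default settings. `decode_vector` and `decode_emb_reply` are hypothetical names, not part of any emb client.

```python
import struct


def decode_vector(raw: bytes) -> list[float]:
    """Decode one bulk string of little-endian float32 values."""
    if len(raw) % 4 != 0:
        raise ValueError(f"expected a multiple of 4 bytes, got {len(raw)}")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def decode_emb_reply(reply):
    """Return one vector for a single-text reply, a list of vectors otherwise."""
    if isinstance(reply, (bytes, bytearray)):
        return decode_vector(bytes(reply))
    return [decode_vector(bytes(item)) for item in reply]


# Stand-in replies built locally, so the sketch runs without a server.
single = struct.pack("<3f", 0.5, -1.0, 2.0)
many = [struct.pack("<2f", 1.0, 0.25), struct.pack("<2f", -0.5, 4.0)]

print(decode_emb_reply(single))  # [0.5, -1.0, 2.0]
print(decode_emb_reply(many))    # [[1.0, 0.25], [-0.5, 4.0]]
```

The length check catches truncated replies early; a 384-dimensional model should always yield 1536 bytes per text.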
EMB.READY for health checks
EMB.READY returns +OK when the server is ready to serve traffic, or -ERR with a reason (loading, draining, no models). Use this in your load balancer's HTTP/TCP health check or monitoring system:
redis-cli EMB.READY
→ OK
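For scripts that must wait for the server before sending traffic, a readiness poll could look roughly like the sketch below. It assumes only what is stated above: EMB.READY answers +OK when ready and -ERR with a reason otherwise. `wait_until_ready` and `fake_probe` are hypothetical names, and the probe is passed in as a callable so the sketch runs without a server.

```python
import time


def wait_until_ready(probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Call probe() until it returns without raising, or attempts run out.

    probe is any zero-argument callable that sends EMB.READY and raises
    when the server answers with -ERR (most Redis clients raise on error
    replies). Returns (True, None) on success, (False, last_reason) otherwise.
    """
    reason = None
    for attempt in range(attempts):
        try:
            probe()
            return True, None
        except Exception as exc:  # e.g. "loading" or "draining"
            reason = str(exc)
        if attempt < attempts - 1:
            sleep(delay)
    return False, reason


# Fake probe for illustration: fails twice with "loading", then succeeds.
replies = iter([RuntimeError("loading"), RuntimeError("loading"), "OK"])


def fake_probe():
    reply = next(replies)
    if isinstance(reply, Exception):
        raise reply
    return reply


print(wait_until_ready(fake_probe, attempts=5, delay=0, sleep=lambda _: None))
# (True, None)
```

With redis-py, the probe could be something like `lambda: r.execute_command("EMB.READY")`, where `r` is an existing client. Connection failures also raise, so they count as "not ready".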
The emb gem exposes Emb.ready? (boolean) and Emb.ready (reason string):
Emb.ready?
# => true
Emb.ready
# => "ready"
EMB.MULTI example
redis-cli EMB.MULTI minilm "hello" siglip2 "a photo of a cat"
1) \x7c\x8e\x80\xbd... (minilm, 384 floats)
2) \x4a\x9f\x31\xc2... (siglip2, 768 floats)
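On the client side, each entry of an EMB.MULTI reply has to be matched back to the model that produced it. The sketch below shows one way to do that in Python. It rests on two assumptions that the text above does not confirm: entries arrive in request order, and a failed entry comes back as nil (`None`), in the manner of MGET. `multi_args` and `pair_multi_reply` are hypothetical helper names.

```python
import struct


def multi_args(requests):
    """Flatten [(model, text), ...] into the EMB.MULTI argument list."""
    args = ["EMB.MULTI"]
    for model, text in requests:
        args += [model, text]
    return args


def pair_multi_reply(requests, reply):
    """Match reply entries to the (model, text) pairs that produced them.

    Assumption: entries arrive in request order and a failed entry is None.
    """
    if len(requests) != len(reply):
        raise ValueError("reply length does not match the number of requests")
    results = []
    for (model, text), raw in zip(requests, reply):
        vector = None
        if raw is not None:
            vector = list(struct.unpack(f"<{len(raw) // 4}f", raw))
        results.append({"model": model, "text": text, "vector": vector})
    return results


requests = [("minilm", "hello"), ("siglip2", "a photo of a cat")]
# Stand-in reply: the first model succeeded, the second failed.
fake_reply = [struct.pack("<2f", 0.5, -0.25), None]

print(multi_args(requests))
# ['EMB.MULTI', 'minilm', 'hello', 'siglip2', 'a photo of a cat']
for row in pair_multi_reply(requests, fake_reply):
    print(row["model"], row["vector"])
# minilm [0.5, -0.25]
# siglip2 None
```

Keeping the failed entry as `None` rather than dropping it preserves the positional link between request and result.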
Configuration
listen: ":6379"
# password: "hunter2"
# tls_cert: /etc/emb/cert.pem
# tls_key: /etc/emb/key.pem
models:
  minilm:
    onnx: ./models/minilm/model.onnx
  siglip2:
    onnx: ./models/siglip2/text_model.onnx
    tokenizer: ./models/siglip2/tokenizer.json
    output_tensor: pooler_output
    pooling: none
    normalize: true
    dim: 768
  # Auto-download from HuggingFace
  e5:
    model_repo: intfloat/e5-small-v2
    pooling: none
    normalize: false
Model options
| Field | Default | Description |
|---|---|---|
| `onnx` | — | Path to ONNX model file |
| `tokenizer` | `<model-dir>/tokenizer.json` | Path to HuggingFace tokenizer JSON |
| `model_repo` | — | HuggingFace repo (auto-downloads ONNX + tokenizer) |
| `dim` | auto-detected | Embedding dimension |
| `max_length` | auto-detected (or 512) | Max token sequence length |
| `pooling` | auto-detected | `mean` (3D output) or `none` (2D pre-pooled) |
| `normalize` | `false` | L2-normalize the output |
| `output_tensor` | auto-detected | ONNX output tensor name |
| `preload` | `false` | Load model at startup instead of on first request |
| `pad_output` | `false` | Pad sequences to max_length with trailing zeros (compatibility with legacy implementations that don't pass attention mask) |
| `workers` | auto-tuned | Number of worker goroutines |
| `batching` | `{timeout: 1, max_batch: 32}` | Smart batching settings (set `timeout: 0` to disable) |
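To make the `pooling` and `normalize` options concrete, here is a sketch of mask-weighted mean pooling followed by L2 normalization, in plain Python. It illustrates the general technique these option names conventionally refer to and is not taken from emb's source. `mean_pool` and `l2_normalize` are hypothetical names.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [value / count for value in total]


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(value * value for value in vector))
    if norm == 0.0:
        return list(vector)
    return [value / norm for value in vector]


# Three token vectors; the last one is padding and must not affect the mean.
tokens = [[3.0, 0.0], [3.0, 8.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
print(pooled)                # [3.0, 4.0]
print(l2_normalize(pooled))  # [0.6, 0.8]
```

A model whose output is already pooled (the 2D case in the table) would skip `mean_pool` and go straight to the optional normalization step.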
Clients
The response is raw little-endian float32 bytes. Any Redis client works.
Ruby:
require "redis_client"
redis = RedisClient.new(port: 6379)
raw = redis.call("EMB", "minilm", "hello world")
emb = raw.unpack("e*")
Or use the emb gem:
require "emb"
Emb[:minilm]["hello world"]
# => [0.0123, -0.0456, 0.0789, ...]
Python:
import struct
import redis

r = redis.Redis(port=6379)
raw = r.execute_command("EMB", "minilm", "hello world")
emb = list(struct.unpack(f"<{len(raw)//4}f", raw))
Go:
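// raw holds the reply bytes. binary.Read only fills a slice that is already sized.
vec := make([]float32, len(raw)/4)
err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, vec)
Ruby Gems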
Ruby gems for emb:
- emb: client library with connection pooling, proxy, and multi-model support. Auto-decodes float32 responses. See its README.
- emb-server: precompiled server binary. Install and run emb directly. See its README.
Development
Commands
just format # Format all Go code
just lint # Run linters
just test # Run tests
just bench # Run Go benchmarks
just bench-all # Run redis-benchmark suite (see BENCHMARK.md)
just build # Build the emb binary
just dev # Build and run the server
just download-model # Download a model from HuggingFace
Nix
A flake.nix is provided for reproducible development shells:
nix develop
This provides Go, ONNX Runtime, golangci-lint, just, and all CGo configuration.
Docker
# Run with a model mounted:
docker run -v ./models:/models elcuervo/emb \
-config /models/config.yaml