
SigLIP2-rb

Ruby implementation of Google's SigLIP2 (Sigmoid Loss for Language Image Pre-Training 2) for creating text and image embeddings. Uses ONNX models from HuggingFace onnx-community.

What is this for?

SigLIP2 creates numerical representations (embeddings) of images and text in the same vector space. This means you can directly compare text with images using cosine similarity.

SigLIP2 is multilingual: it understands text in many languages out of the box, so you can search images using queries in English, Polish, German, Japanese, and so on.
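
For example, the same image can be scored against queries in different languages using the similarity call described under Usage below (a quick sketch; actual scores depend on the model you load):

model = Siglip2::Model.new

model.similarity("a photo of a cat", "cat.jpg")
model.similarity("zdjęcie kota", "cat.jpg")    # the same query in Polish
model.similarity("猫の写真", "cat.jpg")          # the same query in Japanese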

Common use cases:

  • Image search - find images matching a text query without manual tagging
  • Content moderation - detect unwanted content by comparing against text descriptions ("violence", "nudity", etc.)
  • Image clustering - group similar images by comparing their embeddings
  • Duplicate detection - find near-duplicate images in large collections
  • Auto-tagging - assign labels to images by finding best matching text descriptions

Installation

Add this line to your application's Gemfile:

gem 'siglip2'

Or install directly:

gem install siglip2

Requirements

  • Ruby >= 3.0.0
  • ImageMagick (for image processing)

Usage

Ruby API

require 'siglip2'

# Create model with default settings (base-patch16-224)
model = Siglip2::Model.new

# Or specify a different model and quantization
model = Siglip2::Model.new(
  model_name: "large-patch16-256",
  quantization: "int8"
)

# Encode text
text_embedding = model.encode_text("a photo of a cat")

# Encode image
image_embedding = model.encode_image("cat.jpg")

# Calculate similarity
score = model.similarity("a photo of a cat", "cat.jpg")
puts "Similarity: #{score}"

# Batch similarity
texts = ["a cat", "a dog", "a car"]
images = ["image1.jpg", "image2.jpg"]
scores = model.batch_similarity(texts, images)

CLI Tools

Embed text

siglip2-embed-text "a photo of a cat"
siglip2-embed-text -m large-patch16-256 "a photo of a cat"
siglip2-embed-text -q int8 "a photo of a cat"
siglip2-embed-text -f csv "a photo of a cat"

Embed image

siglip2-embed-image cat.jpg
siglip2-embed-image -m large-patch16-256 cat.jpg
siglip2-embed-image -q int8 cat.jpg

Calculate similarity

siglip2-similarity "a photo of a cat" cat.jpg

List available models

siglip2-embed-text -l

List quantization options

siglip2-embed-text -L

Use Case Examples

Image Search

model = Siglip2::Model.new

# Pre-compute embeddings for all images (store in database)
image_embeddings = images.map { |path| [path, model.encode_image(path)] }

# Search by text query
query_embedding = model.encode_text("sunset over mountains")
results = image_embeddings
  .map { |path, emb| [path, dot_product(query_embedding, emb)] }
  .sort_by { |_, score| -score }
  .first(10)
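
The use-case snippets on this page call a dot_product helper that is not part of the gem's API; you supply it yourself. A minimal sketch, assuming encode_text and encode_image return plain Ruby arrays of floats:

def dot_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

If the embeddings are L2-normalized (typical for SigLIP-family models, though not verified here), the dot product of two embeddings equals their cosine similarity.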

Content Moderation

model = Siglip2::Model.new

# Define unwanted content categories
categories = ["violence", "gore", "nudity", "drugs"]
category_embeddings = categories.map { |c| model.encode_text(c) }

# Check uploaded image
image_emb = model.encode_image(uploaded_file)
scores = category_embeddings.map { |ce| dot_product(image_emb, ce) }

if scores.max > 0.25  # threshold
  flag_for_review(uploaded_file)
end
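
To record which category triggered the flag, pair each category with its score (a small extension of the snippet above; flag_for_review remains a hypothetical handler):

top_category, top_score = categories.zip(scores).max_by { |_, score| score }
flag_for_review(uploaded_file) if top_score > 0.25  # top_category names the best match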

Auto-tagging

model = Siglip2::Model.new

tags = ["cat", "dog", "car", "landscape", "portrait", "food"]
tag_embeddings = tags.map { |t| [t, model.encode_text("a photo of #{t}")] }

image_emb = model.encode_image("photo.jpg")
matched_tags = tag_embeddings
  .map { |tag, emb| [tag, dot_product(image_emb, emb)] }
  .select { |_, score| score > 0.2 }
  .map(&:first)
# => ["cat"]

Available Models

Model                  Image Size  Description
base-patch16-224       224x224     Default, smallest
base-patch16-256       256x256
base-patch16-384       384x384
base-patch16-512       512x512
base-patch32-256       256x256     Larger patch size
base-patch16-naflex    224x224     Flexible resolution
large-patch16-256      256x256     Larger model
large-patch16-384      384x384
large-patch16-512      512x512
giant-opt-patch16-256  256x256     Optimized giant
giant-opt-patch16-384  384x384
so400m-patch14-224     224x224     400M parameters
so400m-patch14-384     384x384
so400m-patch16-256     256x256
so400m-patch16-384     384x384
so400m-patch16-512     512x512

Quantization Options

Option  Description
fp32    Full precision (default)
fp16    Half precision
int8    8-bit integer
uint8   Unsigned 8-bit integer
q4      4-bit quantization
q4f16   4-bit with fp16
bnb4    BitsAndBytes 4-bit

Model Storage

Models are automatically downloaded on first use and stored in ~/.siglip2_models/. You can change this location:

Siglip2.models_dir = "/path/to/models"

License

MIT License