Exa Ruby
The most complete Ruby client for the Exa.ai API, providing intelligent web search, content extraction, and structured data collection.
⭐ Star this repo if you find it useful!
Features
- Search API: Neural search, deep search, and content extraction
- Contents API: Fetch full page contents with livecrawl support
- Find Similar: Discover semantically similar pages
- Answer API: LLM-powered question answering with citations
- Research API: Async research tasks with structured output
- Websets API: Build and manage structured web data collections
- Monitors: Automated scheduled searches and content refresh
- Imports: Upload CSV data into Websets
- Webhooks & Events: Real-time notifications for Websets activity
- SSE Streaming: Real-time token streaming for Answer and Research APIs
- Request Logging: Pretty HTTP logging with timing and status indicators
- Rate Limiting: Client-side token bucket rate limiting
- Response Caching: Memory and Redis caching to reduce API costs
- Instrumentation: OpenTelemetry distributed tracing
- Parallel Requests: Concurrent batch operations
- Rails Integration: Railtie, generator, ActiveJob, ActionCable
- Sorbet Types: Optional T::Struct type definitions
- Beautiful CLI: Colorful command-line interface
- n8n/Zapier Integration: Webhook signature verification utilities
- Automatic Retries: Built-in retry logic for transient failures
- Type Documentation: Comprehensive YARD documentation
Installation
Add this line to your application's Gemfile:
gem 'exa'
And then execute:
bundle install
Or install it yourself as:
gem install exa
Quick Start
Configuration
Configure the client with your API key:
require 'exa'
# Option 1: Environment variable (recommended)
# Set EXA_API_KEY environment variable
# Option 2: Global configuration
Exa.configure do |config|
config.api_key = 'your-api-key'
config.timeout = 60 # Request timeout in seconds
config.max_retries = 3 # Retry attempts for transient failures
end
# Option 3: Direct client initialization
client = Exa::Client.new(api_key: 'your-api-key')
Basic Search
# Simple search
results = Exa.search("Latest developments in LLMs")
# Access results
results.each do |result|
puts "#{result.title}: #{result.url}"
end
# Search with content extraction
results = Exa.search(
"AI research papers",
text: true,
num_results: 20
)
results.each do |result|
puts result.title
puts result.text[0..500] if result.text
end
Deep Search
For comprehensive results with query expansion:
results = client.search(
"Machine learning startups",
type: :deep,
additional_queries: ["AI companies", "ML ventures"],
category: :company,
include_domains: ["linkedin.com", "crunchbase.com"],
start_published_date: "2024-01-01T00:00:00.000Z",
num_results: 50
)
Get Contents
Fetch full page contents from URLs:
contents = client.get_contents(
["https://arxiv.org/abs/2307.06435"],
text: true,
summary: true,
livecrawl: :preferred
)
contents.results.each do |page|
puts page.title
puts page.summary
end
# Check for failures
unless contents.all_success?
contents.failed_statuses.each do |status|
puts "Failed to fetch #{status.id}: #{status.error_tag}"
end
end
Find Similar Links
Discover pages similar to a given URL:
similar = client.find_similar(
"https://arxiv.org/abs/2307.06435",
num_results: 20,
include_domains: ["arxiv.org", "paperswithcode.com"],
text: true
)
similar.each do |result|
puts "#{result.title}: #{result.url}"
end
Answer API
Get LLM-powered answers to questions with citations from web sources:
# Simple question
response = client.answer("What is the latest valuation of SpaceX?")
puts response.answer # => "$350 billion."
# Access citations
response.citations.each do |citation|
puts "Source: #{citation.title}"
puts "URL: #{citation.url}"
end
# With search options
response = client.answer(
"What are the latest AI safety developments?",
text: true,
num_results: 10,
start_published_date: "2024-01-01T00:00:00.000Z"
)
Research API
Create async research tasks for in-depth web research:
# Create a research task
task = client.create_research(
instructions: "Summarize the latest developments in AI safety research",
model: "exa-research" # or "exa-research-fast", "exa-research-pro"
)
puts "Task ID: #{task.research_id}"
puts "Status: #{task.status}" # => "pending" or "running"
# Poll for results
loop do
task = client.get_research(task.research_id)
case task.status
when "completed"
puts task.output
break
when "running"
puts "Progress: #{task.operations_completed}/#{task.operations_total}"
sleep 5
when "failed"
puts "Error: #{task.error_message}"
break
end
end
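The poll loop above is easy to get wrong in production (no timeout, no terminal-state handling). It can be wrapped in a small generic helper; this is a sketch, not part of the gem — `fetch` stands in for a call like `client.get_research(task.research_id)`, and the status strings follow the example above:

```ruby
# Poll until a task reaches a terminal state, with an overall timeout.
# `fetch` is any callable returning an object that responds to #status;
# here it is a stand-in for -> { client.get_research(task.research_id) }.
def poll_until_done(fetch, interval: 5, timeout: 300)
  deadline = Time.now + timeout
  loop do
    task = fetch.call
    return task if %w[completed failed canceled].include?(task.status)
    raise "timed out waiting for research task" if Time.now > deadline
    sleep interval
  end
end
```

Returning the task (rather than just its output) lets the caller distinguish "completed" from "failed" after the helper returns.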
# With structured output schema
schema = {
type: "object",
properties: {
companies: { type: "array", items: { type: "string" } },
summary: { type: "string" }
}
}
task = client.create_research(
instructions: "Find the top 5 AI startups in 2024",
model: "exa-research-pro",
output_schema: schema
)
# List all research tasks
response = client.list_research(limit: 10)
response.data.each { |t| puts "#{t.research_id}: #{t.status}" }
# Cancel a running task
client.cancel_research(task.research_id)
Websets API
Websets allow you to build structured collections of web data with automated search, verification, and enrichment.
Create a Webset
webset = client.create_webset(
search: {
query: "AI startups founded in 2024",
count: 100,
entity: { type: "company" },
criteria: [
{ description: "Company must be focused on artificial intelligence" },
{ description: "Founded in 2024" }
]
},
enrichments: [
{ description: "Company's total funding amount", format: "number" },
{ description: "Number of employees", format: "number" },
{
description: "Primary industry vertical",
format: "enum",
options: [
{ label: "Healthcare" },
{ label: "Finance" },
{ label: "Enterprise" },
{ label: "Consumer" },
{ label: "Other" }
]
}
],
external_id: "my-ai-startups-2024"
)
puts "Created Webset: #{webset.id}"
puts "Status: #{webset.status}"
Monitor Webset Progress
webset = client.get_webset(webset.id)
webset.searches.each do |search|
puts "Search: #{search.query}"
puts "Found: #{search.found_count}"
puts "Completion: #{search.completion_percentage}%"
end
List Webset Items
response = client.list_webset_items(webset.id, limit: 50)
response.data.each do |item|
puts "Item: #{item.url}"
puts "Type: #{item.type}"
# Check criteria evaluations
item.evaluations.each do |eval|
status = eval.satisfied? ? "✓" : "✗"
puts " #{status} #{eval.criterion}"
end
# Access enrichment results
item.enrichments.each do |enrichment|
puts " #{enrichment.format}: #{enrichment.result}"
end
end
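The cursor-based pagination used by list_webset_items can be wrapped in a generic Enumerator so callers iterate every item lazily without managing cursors by hand. A sketch, not part of the gem — `list_page` stands in for `->(cursor) { client.list_webset_items(webset.id, cursor: cursor) }`, and the response shape (data, has_more?, next_cursor) follows this README:

```ruby
# Lazily enumerate every item across cursor-paginated responses.
# `list_page` is any callable taking a cursor (nil for the first page)
# and returning a response that responds to #data, #has_more?
# and #next_cursor.
def each_item(list_page)
  Enumerator.new do |yielder|
    cursor = nil
    loop do
      page = list_page.call(cursor)
      page.data.each { |item| yielder << item }
      break unless page.has_more?
      cursor = page.next_cursor
    end
  end
end
```

Because it returns an Enumerator, you can chain `lazy`, `first(n)`, or `each_slice` without fetching more pages than needed.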
# Paginate through all items
while response.has_more?
response = client.list_webset_items(webset.id, cursor: response.next_cursor)
# Process items...
end
Add Searches to Existing Webset
search = client.create_webset_search(
webset.id,
query: "AI healthcare startups",
count: 50,
entity: { type: "company" },
criteria: [{ description: "Must be in healthcare industry" }]
)
puts "Search ID: #{search.id}"
Add Enrichments
enrichment = client.create_webset_enrichment(
webset.id,
description: "CEO or founder name",
format: "text"
)
Delete Resources
# Delete an item
client.delete_webset_item(webset.id, item.id)
# Delete an enrichment
client.delete_webset_enrichment(webset.id, enrichment.id)
# Cancel an in-progress search
client.cancel_webset_search(webset.id, search.id)
# Delete entire webset
client.delete_webset(webset.id)
Search Options Reference
| Option | Type | Description |
|---|---|---|
| type | Symbol | :neural, :auto, :fast, :deep |
| category | Symbol | :people, :company, :research_paper, :news, :pdf, :github, :tweet, :personal_site, :financial_report |
| num_results | Integer | Number of results (max 100) |
| include_domains | Array | Domains to include |
| exclude_domains | Array | Domains to exclude |
| start_crawl_date | String/Time | Results crawled after this date |
| end_crawl_date | String/Time | Results crawled before this date |
| start_published_date | String/Time | Results published after this date |
| end_published_date | String/Time | Results published before this date |
| include_text | Array | Keywords that must be present |
| exclude_text | Array | Keywords to exclude |
| country | String | Two-letter ISO country code |
| text | Boolean/Hash | Return text content |
| highlights | Boolean/Hash | Return highlights |
| summary | Boolean/Hash | Return AI summary |
| context | Boolean/Integer | Return context string for LLM |
| moderation | Boolean | Filter unsafe content |
| livecrawl | Symbol | :never, :fallback, :preferred, :always |
Error Handling
The gem provides specific error classes for different failure modes:
begin
results = client.search("query")
rescue Exa::AuthenticationError => e
puts "Invalid API key: #{e.message}"
rescue Exa::RateLimitError => e
puts "Rate limited. Retry after: #{e.retry_after} seconds"
rescue Exa::InvalidRequestError => e
puts "Invalid request: #{e.message}"
puts "Validation errors: #{e.validation_errors}" if e.validation_errors
rescue Exa::NotFoundError => e
puts "Resource not found: #{e.message}"
rescue Exa::ServerError => e
puts "Server error: #{e.message}"
rescue Exa::TimeoutError => e
puts "Request timed out: #{e.message}"
rescue Exa::Error => e
puts "General error: #{e.message}"
end
Advanced Configuration
Exa.configure do |config|
config.api_key = ENV['EXA_API_KEY']
config.base_url = 'https://api.exa.ai'
config.websets_base_url = 'https://api.exa.ai/websets/v0'
config.timeout = 120
config.max_retries = 5
config.retry_delay = 1.0
config.max_retry_delay = 60.0
config.retry_statuses = [429, 500, 502, 503, 504]
end
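The retry_delay / max_retry_delay pair suggests capped exponential backoff between retry attempts. The gem's exact schedule isn't documented here, so treat this as an illustration of the usual computation rather than the gem's internals:

```ruby
# Capped exponential backoff: the delay doubles on each attempt
# (attempt numbering starts at 0) and never exceeds max_delay.
def backoff_delay(attempt, base: 1.0, max_delay: 60.0)
  [base * (2**attempt), max_delay].min
end
```

Production retry loops typically also add random jitter to the computed delay so many clients retrying at once don't synchronize.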
# Enable logging
Exa.logger = Logger.new(STDOUT)
Response Objects
SearchResponse
response = client.search("query")
response.request_id # Unique request ID
response.results # Array of SearchResult
response.context # Combined context string
response.cost # CostInfo object
response.total_cost # Cost in dollars
# Enumerable
response.each { |r| puts r.title }
response.first
response.count
SearchResult
result.id # Unique identifier
result.title # Page title
result.url # Page URL
result.published_date # Publication date (Time)
result.author # Author name
result.text # Full text content
result.highlights # Highlighted snippets
result.summary # AI-generated summary
result.subpages # Related subpages
Webset
webset.id # Unique identifier
webset.status # idle, pending, running, paused
webset.title # Webset title
webset.searches # Array of WebsetSearch
webset.items # Array of WebsetItem
webset.enrichments # Array of WebsetEnrichment
webset.created_at # Creation timestamp
Monitors
Create automated schedules to keep Websets updated:
# Create a monitor to search daily
monitor = client.create_monitor(
webset_id: "webset_abc123",
cadence: { cron: "0 9 * * *", timezone: "America/New_York" },
behavior: {
type: "search",
config: {
count: 50,
query: "AI news today",
entity: { type: "article" },
behavior: "append"
}
}
)
# List monitors
client.list_monitors(webset_id: "webset_abc123")
# Update monitor
client.update_monitor(monitor.id, status: "disabled")
# Delete monitor
client.delete_monitor(monitor.id)
Imports
Upload CSV data into Websets:
# Create an import
import = client.create_import(
size: 1024,
count: 100,
format: "csv",
entity: { type: "company" },
title: "Q4 Leads",
csv: { identifier: 1 }
)
# Upload file to the returned URL
puts "Upload to: #{import.upload_url}"
puts "Valid until: #{import.upload_valid_until}"
# Check import status
import = client.get_import(import.id)
puts "Status: #{import.status}"
Webhooks & Events
Subscribe to real-time notifications:
# Create a webhook
webhook = client.create_webhook(
url: "https://example.com/webhooks/exa",
events: ["webset.item.created", "webset.item.enriched"]
)
puts "Secret (save this!): #{webhook.secret}"
# List events
events = client.list_events(
types: ["webset.item.created"],
limit: 50
)
events.data.each { |e| puts "#{e.type} at #{e.created_at}" }
n8n & Zapier Integration
Verify webhook signatures in your integrations:
# In your webhook receiver (Rails, Sinatra, etc.)
raw_body = request.raw_post
signature = request.headers["X-Exa-Signature"]
if Exa::Utils::WebhookHandler.verify_signature(
raw_body,
signature,
secret: ENV["EXA_WEBHOOK_SECRET"]
)
event = Exa::Utils::WebhookHandler.parse_event(raw_body)
case event.type
when "webset.item.created"
# Handle new item
when "webset.item.enriched"
# Handle enriched item
end
head :ok
else
head :unauthorized
end
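If you cannot use the bundled helper (for example in a non-Ruby edge function you later port), signature checks of this kind are typically an HMAC-SHA256 of the raw request body, compared in constant time. The exact scheme Exa uses (header format, hex vs. base64 encoding, timestamp binding) may differ, so treat this as a generic sketch and consult WebhookHandler for the authoritative implementation:

```ruby
require "openssl"

# Generic HMAC-SHA256 webhook check: recompute the hex digest of the
# raw body with the shared secret and compare in constant time.
# The signature encoding here is an assumption, not Exa's documented scheme.
def valid_hmac_signature?(raw_body, signature, secret)
  return false if signature.nil?
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, raw_body)
  OpenSSL.secure_compare(expected, signature)
end
```

Always verify against the raw, unparsed body — re-serializing parsed JSON can change key order or whitespace and invalidate the digest.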
# One-liner with automatic error handling
event = Exa::Utils::WebhookHandler.construct_event(
raw_body,
request.headers,
secret: ENV["EXA_WEBHOOK_SECRET"]
)
Command-Line Interface
The gem includes a beautiful CLI:
# Search
exa search "latest AI research"
exa search "ML startups" -n 20 --type deep
# Answer questions
exa answer "What is the valuation of SpaceX?"
# Find similar pages
exa similar "https://arxiv.org/abs/2307.06435"
# Research
exa research "Summarize AI safety developments in 2024" --wait
# Manage Websets
exa websets list
exa websets get ws_abc123
exa websets items ws_abc123
# Output as JSON
exa search "AI news" --json
# Show version
exa version
SSE Streaming
Stream tokens in real-time for Answer and Research APIs:
# Stream an answer with real-time token output
Exa::Utils::SSEClient.stream_answer(
api_key: ENV["EXA_API_KEY"],
query: "What is quantum computing?"
) do |event|
case event[:type]
when :token
print event[:data] # Print each token as it arrives
when :citation
puts "\nSource: #{event[:data][:url]}"
when :done
puts "\n\nComplete!"
when :error
puts "Error: #{event[:data]}"
end
end
# Stream research progress
Exa::Utils::SSEClient.stream_research(
api_key: ENV["EXA_API_KEY"],
instructions: "Research latest AI developments"
) do |event|
case event[:type]
when :progress
puts "Progress: #{event[:data][:percent]}%"
when :output
puts event[:data]
end
end
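When you need the complete answer rather than incremental output, the same event handler can accumulate into a buffer. A small sketch over the { type:, data: } event shape used above — in real code you would build the events inside the block passed to stream_answer:

```ruby
# Accumulate streamed events into the final answer text plus source URLs.
# `events` is any enumerable of { type:, data: } hashes as shown above.
def collect_answer(events)
  answer  = +""
  sources = []
  events.each do |event|
    case event[:type]
    when :token    then answer << event[:data]
    when :citation then sources << event[:data][:url]
    end
  end
  [answer, sources]
end
```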
# Instance-based streaming
streamer = Exa::Utils::SSEClient.new(api_key: ENV["EXA_API_KEY"])
streamer.answer("What is GPT-4?") { |e| print e[:data] if e[:type] == :token }
Sorbet Type Definitions
Optional static type checking with Sorbet:
# Install sorbet-runtime for type definitions
# gem install sorbet-runtime
require 'exa'
# Types are available when sorbet-runtime is installed
params = Exa::Types::SearchParams.new(
query: "AI research",
type: "neural",
num_results: 10,
text: true
)
# Type definitions for all API responses
# Exa::Types::SearchResultData
# Exa::Types::AnswerResponseData
# Exa::Types::ResearchTaskData
# Exa::Types::WebsetData
# Exa::Types::MonitorData
# Exa::Types::ImportData
# Exa::Types::WebhookData
# Exa::Types::EventData
Requirements
- Ruby >= 3.1
- faraday >= 2.0
- faraday-retry >= 2.0
- thor >= 1.0
- concurrent-ruby >= 1.2
- sorbet-runtime >= 0.5 (optional, for type definitions)
- opentelemetry-sdk (optional, for instrumentation)
- redis (optional, for distributed caching)
Verification Status
The current release (v1.3.0) does not yet include a comprehensive unit test suite; verification relies on live integration scripts in the verification/ directory. Running these scripts (verification/test_search_real.rb and verification/test_websets_lifecycle.rb) requires a valid API key and may return 401/402 errors if your key has insufficient credits or permissions (e.g. free-tier limits); such failures reflect account restrictions rather than defects in the gem.
Development
After checking out the repo, run bundle install to install dependencies.
License
The gem is available as open source under the terms of the MIT License.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/tigel-agm/exaonruby.