Exa Ruby
The most complete Ruby client for the Exa.ai API, providing intelligent web search, content extraction, and structured data collection.
⭐ Star this repo if you find it useful!
Features
- Search API: Neural search, deep search, and content extraction
- Contents API: Fetch full page contents with livecrawl support
- Find Similar: Discover semantically similar pages
- Answer API: LLM-powered question answering with citations
- Research API: Async research tasks with structured output
- Websets API: Build and manage structured web data collections
- Monitors: Automated scheduled searches and content refresh
- Imports: Upload CSV data into Websets
- Webhooks & Events: Real-time notifications for Websets activity
- SSE Streaming: Real-time token streaming for Answer and Research APIs
- Request Logging: Pretty HTTP logging with timing and status indicators
- Rate Limiting: Client-side token bucket rate limiting
- Response Caching: Memory and Redis caching to reduce API costs
- Instrumentation: OpenTelemetry distributed tracing
- Parallel Requests: Concurrent batch operations
- Rails Integration: Railtie, generator, ActiveJob, ActionCable
- Sorbet Types: Optional T::Struct type definitions
- Beautiful CLI: Colorful command-line interface
- n8n/Zapier Integration: Webhook signature verification utilities
- Automatic Retries: Built-in retry logic for transient failures
- Type Documentation: Comprehensive YARD documentation
Installation
Add this line to your application's Gemfile:
gem 'exa'
And then execute:
bundle install
Or install it yourself as:
gem install exa
Quick Start
Configuration
Configure the client with your API key:
require 'exa'
# Option 1: Environment variable (recommended)
# Set EXA_API_KEY environment variable
# Option 2: Global configuration
Exa.configure do |config|
config.api_key = 'your-api-key'
config.timeout = 60 # Request timeout in seconds
config.max_retries = 3 # Retry attempts for transient failures
end
# Option 3: Direct client initialization
client = Exa::Client.new(api_key: 'your-api-key')
Basic Search
# Simple search
results = Exa.search("Latest developments in LLMs")
# Access results
results.each do |result|
puts "#{result.title}: #{result.url}"
end
# Search with content extraction
results = Exa.search(
"AI research papers",
text: true,
num_results: 20
)
results.each do |result|
puts result.title
puts result.text[0..500] if result.text
end
Deep Search
For comprehensive results with query expansion:
results = client.search(
"Machine learning startups",
type: :deep,
additional_queries: ["AI companies", "ML ventures"],
category: :company,
include_domains: ["linkedin.com", "crunchbase.com"],
start_published_date: "2024-01-01T00:00:00.000Z",
num_results: 50
)
Get Contents
Fetch full page contents from URLs:
contents = client.get_contents(
["https://arxiv.org/abs/2307.06435"],
text: true,
summary: true,
livecrawl: :preferred
)
contents.results.each do |page|
puts page.title
puts page.summary
end
# Check for failures
unless contents.all_success?
contents.failed_statuses.each do |status|
puts "Failed to fetch #{status.id}: #{status.error_tag}"
end
end
Find Similar Links
Discover pages similar to a given URL:
similar = client.find_similar(
"https://arxiv.org/abs/2307.06435",
num_results: 20,
include_domains: ["arxiv.org", "paperswithcode.com"],
text: true
)
similar.each do |result|
puts "#{result.title}: #{result.url}"
end
Answer API
Get LLM-powered answers to questions with citations from web sources:
# Simple question
response = client.answer("What is the latest valuation of SpaceX?")
puts response.answer # => "$350 billion."
# Access citations
response.citations.each do |citation|
puts "Source: #{citation.title}"
puts "URL: #{citation.url}"
end
# With search options
response = client.answer(
"What are the latest AI safety developments?",
text: true,
num_results: 10,
start_published_date: "2024-01-01T00:00:00.000Z"
)
Research API
Create async research tasks for in-depth web research:
# Create a research task
task = client.create_research(
instructions: "Summarize the latest developments in AI safety research",
model: "exa-research" # or "exa-research-fast", "exa-research-pro"
)
puts "Task ID: #{task.research_id}"
puts "Status: #{task.status}" # => "pending" or "running"
# Poll for results
loop do
task = client.get_research(task.research_id)
case task.status
when "completed"
puts task.output
break
when "running"
puts "Progress: #{task.operations_completed}/#{task.operations_total}"
sleep 5
when "failed"
puts "Error: #{task.error_message}"
break
end
end
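The poll loop above is easy to get wrong in production (no timeout, no terminal-state handling). It can be wrapped in a small generic helper; this is a sketch, not part of the gem — `fetch` stands in for a call like `client.get_research(task.research_id)`, and the status strings follow the example above:

```ruby
# Poll until a task reaches a terminal state, with an overall timeout.
# `fetch` is any callable returning an object that responds to #status;
# here it is a stand-in for -> { client.get_research(task.research_id) }.
def poll_until_done(fetch, interval: 5, timeout: 300)
  deadline = Time.now + timeout
  loop do
    task = fetch.call
    return task if %w[completed failed canceled].include?(task.status)
    raise "timed out waiting for research task" if Time.now > deadline
    sleep interval
  end
end
```

Returning the task (rather than just its output) lets the caller distinguish "completed" from "failed" after the helper returns.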
# With structured output schema
schema = {
type: "object",
properties: {
companies: { type: "array", items: { type: "string" } },
summary: { type: "string" }
}
}
task = client.create_research(
instructions: "Find the top 5 AI startups in 2024",
model: "exa-research-pro",
output_schema: schema
)
# List all research tasks
response = client.list_research(limit: 10)
response.data.each { |t| puts "#{t.research_id}: #{t.status}" }
# Cancel a running task
client.cancel_research(task.research_id)
Websets API
Websets allow you to build structured collections of web data with automated search, verification, and enrichment.
Create a Webset
webset = client.create_webset(
search: {
query: "AI startups founded in 2024",
count: 100,
entity: { type: "company" },
criteria: [
{ description: "Company must be focused on artificial intelligence" },
{ description: "Founded in 2024" }
]
},
enrichments: [
{ description: "Company's total funding amount", format: "number" },
{ description: "Number of employees", format: "number" },
{
description: "Primary industry vertical",
format: "enum",
options: [
{ label: "Healthcare" },
{ label: "Finance" },
{ label: "Enterprise" },
{ label: "Consumer" },
{ label: "Other" }
]
}
],
external_id: "my-ai-startups-2024"
)
puts "Created Webset: #{webset.id}"
puts "Status: #{webset.status}"
Monitor Webset Progress
webset = client.get_webset(webset.id)
webset.searches.each do |search|
puts "Search: #{search.query}"
puts "Found: #{search.found_count}"
puts "Completion: #{search.completion_percentage}%"
end
List Webset Items
response = client.list_webset_items(webset.id, limit: 50)
response.data.each do |item|
puts "Item: #{item.url}"
puts "Type: #{item.type}"
# Check criteria evaluations
item.evaluations.each do |eval|
status = eval.satisfied? ? "✓" : "✗"
puts " #{status} #{eval.criterion}"
end
# Access enrichment results
item.enrichments.each do |enrichment|
puts " #{enrichment.format}: #{enrichment.result}"
end
end
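The cursor-based pagination used by list_webset_items can be wrapped in a generic Enumerator so callers iterate every item lazily without managing cursors by hand. A sketch, not part of the gem — `list_page` stands in for `->(cursor) { client.list_webset_items(webset.id, cursor: cursor) }`, and the response shape (data, has_more?, next_cursor) follows this README:

```ruby
# Lazily enumerate every item across cursor-paginated responses.
# `list_page` is any callable taking a cursor (nil for the first page)
# and returning a response that responds to #data, #has_more?
# and #next_cursor.
def each_item(list_page)
  Enumerator.new do |yielder|
    cursor = nil
    loop do
      page = list_page.call(cursor)
      page.data.each { |item| yielder << item }
      break unless page.has_more?
      cursor = page.next_cursor
    end
  end
end
```

Because it returns an Enumerator, you can chain `lazy`, `first(n)`, or `each_slice` without fetching more pages than needed.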
# Paginate through all items
while response.has_more?
response = client.list_webset_items(webset.id, cursor: response.next_cursor)
# Process items...
end
Add Searches to Existing Webset
search = client.create_webset_search(
webset.id,
query: "AI healthcare startups",
count: 50,
entity: { type: "company" },
criteria: [{ description: "Must be in healthcare industry" }]
)
puts "Search ID: #{search.id}"
Add Enrichments
enrichment = client.create_webset_enrichment(
webset.id,
description: "CEO or founder name",
format: "text"
)
Delete Resources
# Delete an item
client.delete_webset_item(webset.id, item.id)
# Delete an enrichment
client.delete_webset_enrichment(webset.id, enrichment.id)
# Cancel an in-progress search
client.cancel_webset_search(webset.id, search.id)
# Delete entire webset
client.delete_webset(webset.id)
Search Options Reference
| Option | Type | Description |
|---|---|---|
| type | Symbol | :neural, :auto, :fast, :deep |
| category | Symbol | :people, :company, :research_paper, :news, :pdf, :github, :tweet, :personal_site, :financial_report |
| num_results | Integer | Number of results (max 100) |
| include_domains | Array | Domains to include |
| exclude_domains | Array | Domains to exclude |
| start_crawl_date | String/Time | Results crawled after this date |
| end_crawl_date | String/Time | Results crawled before this date |
| start_published_date | String/Time | Results published after this date |
| end_published_date | String/Time | Results published before this date |
| include_text | Array | Keywords that must be present |
| exclude_text | Array | Keywords to exclude |
| country | String | Two-letter ISO country code |
| text | Boolean/Hash | Return text content |
| highlights | Boolean/Hash | Return highlights |
| summary | Boolean/Hash | Return AI summary |
| context | Boolean/Integer | Return context string for LLM |
| moderation | Boolean | Filter unsafe content |
| livecrawl | Symbol | :never, :fallback, :preferred, :always |
Error Handling
The gem provides specific error classes for different failure modes:
begin
results = client.search("query")
rescue Exa::AuthenticationError => e
puts "Invalid API key: #{e.message}"
rescue Exa::RateLimitError => e
puts "Rate limited. Retry after: #{e.retry_after} seconds"
rescue Exa::InvalidRequestError => e
puts "Invalid request: #{e.message}"
puts "Validation errors: #{e.validation_errors}" if e.validation_errors
rescue Exa::NotFoundError => e
puts "Resource not found: #{e.message}"
rescue Exa::ServerError => e
puts "Server error: #{e.message}"
rescue Exa::TimeoutError => e
puts "Request timed out: #{e.message}"
rescue Exa::Error => e
puts "General error: #{e.message}"
end
Advanced Configuration
Exa.configure do |config|
config.api_key = ENV['EXA_API_KEY']
config.base_url = 'https://api.exa.ai'
config.websets_base_url = 'https://api.exa.ai/websets/v0'
config.timeout = 120
config.max_retries = 5
config.retry_delay = 1.0
config.max_retry_delay = 60.0
config.retry_statuses = [429, 500, 502, 503, 504]
end
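The retry_delay / max_retry_delay pair suggests capped exponential backoff between retry attempts. The gem's exact schedule isn't documented here, so treat this as an illustration of the usual computation rather than the gem's internals:

```ruby
# Capped exponential backoff: the delay doubles on each attempt
# (attempt numbering starts at 0) and never exceeds max_delay.
def backoff_delay(attempt, base: 1.0, max_delay: 60.0)
  [base * (2**attempt), max_delay].min
end
```

Production retry loops typically also add random jitter to the computed delay so many clients retrying at once don't synchronize.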
# Enable logging
Exa.logger = Logger.new(STDOUT)
Response Objects
SearchResponse
response = client.search("query")
response.request_id # Unique request ID
response.results # Array of SearchResult
response.context # Combined context string
response.cost # CostInfo object
response.total_cost # Cost in dollars
# Enumerable
response.each { |r| puts r.title }
response.first
response.count
SearchResult
result.id # Unique identifier
result.title # Page title
result.url # Page URL
result.published_date # Publication date (Time)
result.author # Author name
result.text # Full text content
result.highlights # Highlighted snippets
result.summary # AI-generated summary
result.subpages # Related subpages
Webset
webset.id # Unique identifier
webset.status # idle, pending, running, paused
webset.title # Webset title
webset.searches # Array of WebsetSearch
webset.items # Array of WebsetItem
webset.enrichments # Array of WebsetEnrichment
webset.created_at # Creation timestamp
Monitors
Create automated schedules to keep Websets updated:
# Create a monitor to search daily
monitor = client.create_monitor(
webset_id: "webset_abc123",
cadence: { cron: "0 9 * * *", timezone: "America/New_York" },
behavior: {
type: "search",
config: {
count: 50,
query: "AI news today",
entity: { type: "article" },
behavior: "append"
}
}
)
# List monitors
client.list_monitors(webset_id: "webset_abc123")
# Update monitor
client.update_monitor(monitor.id, status: "disabled")
# Delete monitor
client.delete_monitor(monitor.id)
Imports
Upload CSV data into Websets:
# Create an import
import = client.create_import(
size: 1024,
count: 100,
format: "csv",
entity: { type: "company" },
title: "Q4 Leads",
csv: { identifier: 1 }
)
# Upload file to the returned URL
puts "Upload to: #{import.upload_url}"
puts "Valid until: #{import.upload_valid_until}"
# Check import status
import = client.get_import(import.id)
puts "Status: #{import.status}"
Webhooks & Events
Subscribe to real-time notifications:
# Create a webhook
webhook = client.create_webhook(
url: "https://example.com/webhooks/exa",
events: ["webset.item.created", "webset.item.enriched"]
)
puts "Secret (save this!): #{webhook.secret}"
# List events
events = client.list_events(
types: ["webset.item.created"],
limit: 50
)
events.data.each { |e| puts "#{e.type} at #{e.created_at}" }
n8n & Zapier Integration
Verify webhook signatures in your integrations:
# In your webhook receiver (Rails, Sinatra, etc.)
raw_body = request.raw_post
signature = request.headers["X-Exa-Signature"]
if Exa::Utils::WebhookHandler.verify_signature(
raw_body,
signature,
secret: ENV["EXA_WEBHOOK_SECRET"]
)
event = Exa::Utils::WebhookHandler.parse_event(raw_body)
case event.type
when "webset.item.created"
# Handle new item
when "webset.item.enriched"
# Handle enriched item
end
head :ok
else
head :unauthorized
end
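If you cannot use the bundled helper (for example in a non-Ruby edge function you later port), signature checks of this kind are typically an HMAC-SHA256 of the raw request body, compared in constant time. The exact scheme Exa uses (header format, hex vs. base64 encoding, timestamp binding) may differ, so treat this as a generic sketch and consult WebhookHandler for the authoritative implementation:

```ruby
require "openssl"

# Generic HMAC-SHA256 webhook check: recompute the hex digest of the
# raw body with the shared secret and compare in constant time.
# The signature encoding here is an assumption, not Exa's documented scheme.
def valid_hmac_signature?(raw_body, signature, secret)
  return false if signature.nil?
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, raw_body)
  OpenSSL.secure_compare(expected, signature)
end
```

Always verify against the raw, unparsed body — re-serializing parsed JSON can change key order or whitespace and invalidate the digest.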
# One-liner with automatic error handling
event = Exa::Utils::WebhookHandler.construct_event(
raw_body,
request.headers,
secret: ENV["EXA_WEBHOOK_SECRET"]
)
Command-Line Interface
The gem includes a beautiful CLI:
# Search
exa search "latest AI research"
exa search "ML startups" -n 20 --type deep
# Answer questions
exa answer "What is the valuation of SpaceX?"
# Find similar pages
exa similar "https://arxiv.org/abs/2307.06435"
# Research
exa research "Summarize AI safety developments in 2024" --wait
# Manage Websets
exa websets list
exa websets get ws_abc123
exa websets items ws_abc123
# Output as JSON
exa search "AI news" --json
# Show version
exa version
SSE Streaming
Stream tokens in real-time for Answer and Research APIs:
# Stream an answer with real-time token output
Exa::Utils::SSEClient.stream_answer(
api_key: ENV["EXA_API_KEY"],
query: "What is quantum computing?"
) do |event|
case event[:type]
when :token
print event[:data] # Print each token as it arrives
when :citation
puts "\nSource: #{event[:data][:url]}"
when :done
puts "\n\nComplete!"
when :error
puts "Error: #{event[:data]}"
end
end
# Stream research progress
Exa::Utils::SSEClient.stream_research(
api_key: ENV["EXA_API_KEY"],
instructions: "Research latest AI developments"
) do |event|
case event[:type]
when :progress
puts "Progress: #{event[:data][:percent]}%"
when :output
puts event[:data]
end
end
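When you need the complete answer rather than incremental output, the same event handler can accumulate into a buffer. A small sketch over the { type:, data: } event shape used above — in real code you would build the events inside the block passed to stream_answer:

```ruby
# Accumulate streamed events into the final answer text plus source URLs.
# `events` is any enumerable of { type:, data: } hashes as shown above.
def collect_answer(events)
  answer  = +""
  sources = []
  events.each do |event|
    case event[:type]
    when :token    then answer << event[:data]
    when :citation then sources << event[:data][:url]
    end
  end
  [answer, sources]
end
```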
# Instance-based streaming
streamer = Exa::Utils::SSEClient.new(api_key: ENV["EXA_API_KEY"])
streamer.answer("What is GPT-4?") { |e| print e[:data] if e[:type] == :token }
Sorbet Type Definitions
Optional static type checking with Sorbet:
# Install sorbet-runtime for type definitions
# gem install sorbet-runtime
require 'exa'
# Types are available when sorbet-runtime is installed
params = Exa::Types::SearchParams.new(
query: "AI research",
type: "neural",
num_results: 10,
text: true
)
# Type definitions for all API responses
# Exa::Types::SearchResultData
# Exa::Types::AnswerResponseData
# Exa::Types::ResearchTaskData
# Exa::Types::WebsetData
# Exa::Types::MonitorData
# Exa::Types::ImportData
# Exa::Types::WebhookData
# Exa::Types::EventData
Requirements
- Ruby >= 3.1
- faraday >= 2.0
- faraday-retry >= 2.0
- thor >= 1.0
- concurrent-ruby >= 1.2
- sorbet-runtime >= 0.5 (optional, for type definitions)
- opentelemetry-sdk (optional, for instrumentation)
- redis (optional, for distributed caching)
Verification Status
The current release (v1.3.0) does not yet include a comprehensive unit test suite; verification relies on live integration scripts in the verification/ directory. Running these scripts (verification/test_search_real.rb and verification/test_websets_lifecycle.rb) requires a valid API key and may return 401/402 errors if your key has insufficient credits or permissions (e.g. free-tier limits); such failures reflect account restrictions rather than defects in the gem.
Development
After checking out the repo, run bundle install to install dependencies.
License
The gem is available as open source under the terms of the MIT License.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/tigel-agm/exaonruby.