Project: exaonruby
Version: 0.0 (no release in over 3 years)

A production-ready Ruby gem wrapper for the Exa.ai Search and Websets APIs. Features neural search, LLM-powered answers, async research tasks, Websets management (monitors, imports, webhooks), SSE streaming, request logging, rate limiting, response caching, OpenTelemetry instrumentation, parallel requests, Rails integration, Sorbet types, and a beautiful CLI. Includes n8n/Zapier webhook signature verification utilities.
 Project Readme

Exa Ruby


The most complete Ruby client for the Exa.ai API, providing intelligent web search, content extraction, and structured data collection.

Star this repo if you find it useful!

Features

  • Search API: Neural search, deep search, and content extraction
  • Contents API: Fetch full page contents with livecrawl support
  • Find Similar: Discover semantically similar pages
  • Answer API: LLM-powered question answering with citations
  • Research API: Async research tasks with structured output
  • Websets API: Build and manage structured web data collections
  • Monitors: Automated scheduled searches and content refresh
  • Imports: Upload CSV data into Websets
  • Webhooks & Events: Real-time notifications for Websets activity
  • SSE Streaming: Real-time token streaming for Answer and Research APIs
  • Request Logging: Pretty HTTP logging with timing and status indicators
  • Rate Limiting: Client-side token bucket rate limiting
  • Response Caching: Memory and Redis caching to reduce API costs
  • Instrumentation: OpenTelemetry distributed tracing
  • Parallel Requests: Concurrent batch operations
  • Rails Integration: Railtie, generator, ActiveJob, ActionCable
  • Sorbet Types: Optional T::Struct type definitions
  • Beautiful CLI: Colorful command-line interface
  • n8n/Zapier Integration: Webhook signature verification utilities
  • Automatic Retries: Built-in retry logic for transient failures
  • Type Documentation: Comprehensive YARD documentation
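
The client-side rate limiting listed above is described as a token bucket; as a rough illustration of that algorithm (a standalone sketch, not the gem's internal class — all names here are made up):

```ruby
# Minimal token bucket: `capacity` tokens, refilled at `rate` tokens/sec.
# A request may proceed only when a whole token is available.
class TokenBucket
  def initialize(capacity:, rate:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @capacity = capacity
    @rate     = rate
    @clock    = clock
    @tokens   = capacity.to_f
    @last     = clock.call
  end

  # Consumes a token and returns true if one is available, else false.
  def try_acquire
    refill
    return false if @tokens < 1.0
    @tokens -= 1.0
    true
  end

  private

  # Credit tokens for the time elapsed since the last call, up to capacity.
  def refill
    now = @clock.call
    @tokens = [@tokens + (now - @last) * @rate, @capacity].min
    @last = now
  end
end
```

With a capacity of 2 and a rate of 1 token/sec, two immediate acquisitions succeed, a third fails, and one more succeeds after a second of simulated time.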

Installation

Add this line to your application's Gemfile:

gem 'exa'

And then execute:

bundle install

Or install it yourself as:

gem install exa

Quick Start

Configuration

Configure the client with your API key:

require 'exa'

# Option 1: Environment variable (recommended)
# Set EXA_API_KEY environment variable

# Option 2: Global configuration
Exa.configure do |config|
  config.api_key = 'your-api-key'
  config.timeout = 60          # Request timeout in seconds
  config.max_retries = 3       # Retry attempts for transient failures
end

# Option 3: Direct client initialization
client = Exa::Client.new(api_key: 'your-api-key')

Basic Search

# Simple search
results = Exa.search("Latest developments in LLMs")

# Access results
results.each do |result|
  puts "#{result.title}: #{result.url}"
end

# Search with content extraction
results = Exa.search(
  "AI research papers",
  text: true,
  num_results: 20
)

results.each do |result|
  puts result.title
  puts result.text[0..500] if result.text
end

Deep Search

For comprehensive results with query expansion:

results = client.search(
  "Machine learning startups",
  type: :deep,
  additional_queries: ["AI companies", "ML ventures"],
  category: :company,
  include_domains: ["linkedin.com", "crunchbase.com"],
  start_published_date: "2024-01-01T00:00:00.000Z",
  num_results: 50
)

Get Contents

Fetch full page contents from URLs:

contents = client.get_contents(
  ["https://arxiv.org/abs/2307.06435"],
  text: true,
  summary: true,
  livecrawl: :preferred
)

contents.results.each do |page|
  puts page.title
  puts page.summary
end

# Check for failures
unless contents.all_success?
  contents.failed_statuses.each do |status|
    puts "Failed to fetch #{status.id}: #{status.error_tag}"
  end
end

Find Similar Links

Discover pages similar to a given URL:

similar = client.find_similar(
  "https://arxiv.org/abs/2307.06435",
  num_results: 20,
  include_domains: ["arxiv.org", "paperswithcode.com"],
  text: true
)

similar.each do |result|
  puts "#{result.title}: #{result.url}"
end

Answer API

Get LLM-powered answers to questions with citations from web sources:

# Simple question
response = client.answer("What is the latest valuation of SpaceX?")
puts response.answer  # => "$350 billion."

# Access citations
response.citations.each do |citation|
  puts "Source: #{citation.title}"
  puts "URL: #{citation.url}"
end

# With search options
response = client.answer(
  "What are the latest AI safety developments?",
  text: true,
  num_results: 10,
  start_published_date: "2024-01-01T00:00:00.000Z"
)

Research API

Create async research tasks for in-depth web research:

# Create a research task
task = client.create_research(
  instructions: "Summarize the latest developments in AI safety research",
  model: "exa-research"  # or "exa-research-fast", "exa-research-pro"
)

puts "Task ID: #{task.research_id}"
puts "Status: #{task.status}"  # => "pending" or "running"

# Poll for results
loop do
  task = client.get_research(task.research_id)

  case task.status
  when "completed"
    puts task.output
    break
  when "failed"
    puts "Error: #{task.error_message}"
    break
  when "running"
    puts "Progress: #{task.operations_completed}/#{task.operations_total}"
    sleep 5
  else
    sleep 5  # "pending" — not started yet; wait before polling again
  end
end

# With structured output schema
schema = {
  type: "object",
  properties: {
    companies: { type: "array", items: { type: "string" } },
    summary: { type: "string" }
  }
}

task = client.create_research(
  instructions: "Find the top 5 AI startups in 2024",
  model: "exa-research-pro",
  output_schema: schema
)

# List all research tasks
response = client.list_research(limit: 10)
response.data.each { |t| puts "#{t.research_id}: #{t.status}" }

# Cancel a running task
client.cancel_research(task.research_id)

Websets API

Websets allow you to build structured collections of web data with automated search, verification, and enrichment.

Create a Webset

webset = client.create_webset(
  search: {
    query: "AI startups founded in 2024",
    count: 100,
    entity: { type: "company" },
    criteria: [
      { description: "Company must be focused on artificial intelligence" },
      { description: "Founded in 2024" }
    ]
  },
  enrichments: [
    { description: "Company's total funding amount", format: "number" },
    { description: "Number of employees", format: "number" },
    { 
      description: "Primary industry vertical",
      format: "enum",
      options: [
        { label: "Healthcare" },
        { label: "Finance" },
        { label: "Enterprise" },
        { label: "Consumer" },
        { label: "Other" }
      ]
    }
  ],
  external_id: "my-ai-startups-2024"
)

puts "Created Webset: #{webset.id}"
puts "Status: #{webset.status}"

Monitor Webset Progress

webset = client.get_webset(webset.id)

webset.searches.each do |search|
  puts "Search: #{search.query}"
  puts "Found: #{search.found_count}"
  puts "Completion: #{search.completion_percentage}%"
end

List Webset Items

response = client.list_webset_items(webset.id, limit: 50)

response.data.each do |item|
  puts "Item: #{item.url}"
  puts "Type: #{item.type}"
  
  # Check criteria evaluations
  item.evaluations.each do |eval|
    status = eval.satisfied? ? "✓" : "✗"
    puts "  #{status} #{eval.criterion}"
  end
  
  # Access enrichment results
  item.enrichments.each do |enrichment|
    puts "  #{enrichment.format}: #{enrichment.result}"
  end
end

# Paginate through all items
while response.has_more?
  response = client.list_webset_items(webset.id, cursor: response.next_cursor)
  # Process items...
end
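
The cursor loop above can be folded into a small helper that yields every item across pages, relying only on the documented pagination interface (data, has_more?, next_cursor); the helper name is illustrative:

```ruby
# Walk all pages of a webset, yielding each item in order.
# Without a block, returns a lazy Enumerator over the same items.
def each_webset_item(client, webset_id, limit: 50)
  return enum_for(__method__, client, webset_id, limit: limit) unless block_given?

  cursor = nil
  loop do
    page = client.list_webset_items(webset_id, limit: limit, cursor: cursor)
    page.data.each { |item| yield item }
    break unless page.has_more?
    cursor = page.next_cursor
  end
end
```

For example, `each_webset_item(client, webset.id).to_a` collects every item regardless of page size.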

Add Searches to Existing Webset

search = client.create_webset_search(
  webset.id,
  query: "AI healthcare startups",
  count: 50,
  entity: { type: "company" },
  criteria: [{ description: "Must be in healthcare industry" }]
)

puts "Search ID: #{search.id}"

Add Enrichments

enrichment = client.create_webset_enrichment(
  webset.id,
  description: "CEO or founder name",
  format: "text"
)

Delete Resources

# Delete an item
client.delete_webset_item(webset.id, item.id)

# Delete an enrichment
client.delete_webset_enrichment(webset.id, enrichment.id)

# Cancel an in-progress search
client.cancel_webset_search(webset.id, search.id)

# Delete entire webset
client.delete_webset(webset.id)

Search Options Reference

Option                 Type             Description
---------------------  ---------------  -----------
type                   Symbol           :neural, :auto, :fast, :deep
category               Symbol           :people, :company, :research_paper, :news, :pdf, :github, :tweet, :personal_site, :financial_report
num_results            Integer          Number of results (max 100)
include_domains        Array            Domains to include
exclude_domains        Array            Domains to exclude
start_crawl_date       String/Time      Results crawled after this date
end_crawl_date         String/Time      Results crawled before this date
start_published_date   String/Time      Results published after this date
end_published_date     String/Time      Results published before this date
include_text           Array            Keywords that must be present
exclude_text           Array            Keywords to exclude
country                String           Two-letter ISO country code
text                   Boolean/Hash     Return text content
highlights             Boolean/Hash     Return highlights
summary                Boolean/Hash     Return AI summary
context                Boolean/Integer  Return context string for LLM
moderation             Boolean          Filter unsafe content
livecrawl              Symbol           :never, :fallback, :preferred, :always

Error Handling

The gem provides specific error classes for different failure modes:

begin
  results = client.search("query")
rescue Exa::AuthenticationError => e
  puts "Invalid API key: #{e.message}"
rescue Exa::RateLimitError => e
  puts "Rate limited. Retry after: #{e.retry_after} seconds"
rescue Exa::InvalidRequestError => e
  puts "Invalid request: #{e.message}"
  puts "Validation errors: #{e.validation_errors}" if e.validation_errors
rescue Exa::NotFoundError => e
  puts "Resource not found: #{e.message}"
rescue Exa::ServerError => e
  puts "Server error: #{e.message}"
rescue Exa::TimeoutError => e
  puts "Request timed out: #{e.message}"
rescue Exa::Error => e
  puts "General error: #{e.message}"
end

Advanced Configuration

Exa.configure do |config|
  config.api_key = ENV['EXA_API_KEY']
  config.base_url = 'https://api.exa.ai'
  config.websets_base_url = 'https://api.exa.ai/websets/v0'
  config.timeout = 120
  config.max_retries = 5
  config.retry_delay = 1.0
  config.max_retry_delay = 60.0
  config.retry_statuses = [429, 500, 502, 503, 504]
end

# Enable logging
Exa.logger = Logger.new(STDOUT)
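
Assuming the retry settings above feed a standard exponential backoff (delay doubling from retry_delay up to max_retry_delay), the resulting schedule looks like this — a sketch of the pattern, not necessarily the gem's exact algorithm:

```ruby
# Exponential backoff: delay doubles each attempt, capped at max_delay.
# `attempt` is zero-based (0 = first retry).
def backoff_delay(attempt, base: 1.0, max_delay: 60.0)
  [base * (2**attempt), max_delay].min
end

# Delays for the first five retries with retry_delay = 1.0:
(0...5).map { |i| backoff_delay(i) }
# => [1.0, 2.0, 4.0, 8.0, 16.0]
```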

Response Objects

SearchResponse

response = client.search("query")

response.request_id        # Unique request ID
response.results           # Array of SearchResult
response.context           # Combined context string
response.cost              # CostInfo object
response.total_cost        # Cost in dollars

# Enumerable
response.each { |r| puts r.title }
response.first
response.count

SearchResult

result.id                  # Unique identifier
result.title               # Page title
result.url                 # Page URL
result.published_date      # Publication date (Time)
result.author              # Author name
result.text                # Full text content
result.highlights          # Highlighted snippets
result.summary             # AI-generated summary
result.subpages            # Related subpages

Webset

webset.id                  # Unique identifier
webset.status              # idle, pending, running, paused
webset.title               # Webset title
webset.searches            # Array of WebsetSearch
webset.items               # Array of WebsetItem
webset.enrichments         # Array of WebsetEnrichment
webset.created_at          # Creation timestamp

Monitors

Create automated schedules to keep Websets updated:

# Create a monitor to search daily
monitor = client.create_monitor(
  webset_id: "webset_abc123",
  cadence: { cron: "0 9 * * *", timezone: "America/New_York" },
  behavior: {
    type: "search",
    config: {
      count: 50,
      query: "AI news today",
      entity: { type: "article" },
      behavior: "append"
    }
  }
)

# List monitors
client.list_monitors(webset_id: "webset_abc123")

# Update monitor
client.update_monitor(monitor.id, status: "disabled")

# Delete monitor
client.delete_monitor(monitor.id)

Imports

Upload CSV data into Websets:

# Create an import
import = client.create_import(
  size: 1024,
  count: 100,
  format: "csv",
  entity: { type: "company" },
  title: "Q4 Leads",
  csv: { identifier: 1 }
)

# Upload file to the returned URL
puts "Upload to: #{import.upload_url}"
puts "Valid until: #{import.upload_valid_until}"

# Check import status
import = client.get_import(import.id)
puts "Status: #{import.status}"

Webhooks & Events

Subscribe to real-time notifications:

# Create a webhook
webhook = client.create_webhook(
  url: "https://example.com/webhooks/exa",
  events: ["webset.item.created", "webset.item.enriched"]
)
puts "Secret (save this!): #{webhook.secret}"

# List events
events = client.list_events(
  types: ["webset.item.created"],
  limit: 50
)

events.data.each { |e| puts "#{e.type} at #{e.created_at}" }

n8n & Zapier Integration

Verify webhook signatures in your integrations:

# In your webhook receiver (Rails, Sinatra, etc.)
raw_body = request.raw_post
signature = request.headers["X-Exa-Signature"]

if Exa::Utils::WebhookHandler.verify_signature(
  raw_body, 
  signature, 
  secret: ENV["EXA_WEBHOOK_SECRET"]
)
  event = Exa::Utils::WebhookHandler.parse_event(raw_body)
  
  case event.type
  when "webset.item.created"
    # Handle new item
  when "webset.item.enriched"
    # Handle enriched item
  end
  
  head :ok
else
  head :unauthorized
end

# One-liner with automatic error handling
event = Exa::Utils::WebhookHandler.construct_event(
  raw_body,
  request.headers,
  secret: ENV["EXA_WEBHOOK_SECRET"]
)
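
If you need to verify signatures without the gem's helper, webhook schemes like this are typically an HMAC-SHA256 of the raw request body compared in constant time. A generic sketch (the gem's exact header format and digest scheme may differ):

```ruby
require "openssl"

# Recompute the HMAC-SHA256 of the raw body with the shared secret and
# compare it to the signature header without leaking timing information.
def valid_signature?(raw_body, signature, secret:)
  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, raw_body)
  OpenSSL.secure_compare(expected, signature.to_s)
end
```

Always compare against the raw, unparsed body — re-serializing parsed JSON can change byte order and break the digest.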

Command-Line Interface

The gem includes a beautiful CLI:

# Search
exa search "latest AI research"
exa search "ML startups" -n 20 --type deep

# Answer questions
exa answer "What is the valuation of SpaceX?"

# Find similar pages
exa similar "https://arxiv.org/abs/2307.06435"

# Research
exa research "Summarize AI safety developments in 2024" --wait

# Manage Websets
exa websets list
exa websets get ws_abc123
exa websets items ws_abc123

# Output as JSON
exa search "AI news" --json

# Show version
exa version

SSE Streaming

Stream tokens in real-time for Answer and Research APIs:

# Stream an answer with real-time token output
Exa::Utils::SSEClient.stream_answer(
  api_key: ENV["EXA_API_KEY"],
  query: "What is quantum computing?"
) do |event|
  case event[:type]
  when :token
    print event[:data]  # Print each token as it arrives
  when :citation
    puts "\nSource: #{event[:data][:url]}"
  when :done
    puts "\n\nComplete!"
  when :error
    puts "Error: #{event[:data]}"
  end
end

# Stream research progress
Exa::Utils::SSEClient.stream_research(
  api_key: ENV["EXA_API_KEY"],
  instructions: "Research latest AI developments"
) do |event|
  case event[:type]
  when :progress
    puts "Progress: #{event[:data][:percent]}%"
  when :output
    puts event[:data]
  end
end

# Instance-based streaming
streamer = Exa::Utils::SSEClient.new(api_key: ENV["EXA_API_KEY"])
streamer.answer("What is GPT-4?") { |e| print e[:data] if e[:type] == :token }
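
Server-Sent Events itself is a simple line-oriented text protocol: `field: value` lines, with a blank line terminating each event. A generic parser sketch, independent of the gem's SSEClient (simplified relative to the full SSE spec):

```ruby
# Parse an SSE stream into [event_type, data] pairs. Fields accumulate
# until a blank line closes the event; multiple data: lines join with
# newlines, and the type defaults to "message" when no event: line is given.
def parse_sse(text)
  events = []
  type, data = nil, []
  text.each_line do |line|
    line = line.chomp
    if line.empty?
      events << [type || "message", data.join("\n")] unless data.empty?
      type, data = nil, []
    elsif line.start_with?("event:")
      type = line.delete_prefix("event:").strip
    elsif line.start_with?("data:")
      data << line.delete_prefix("data:").strip
    end
  end
  events
end
```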

Sorbet Type Definitions

Optional static type checking with Sorbet:

# Install sorbet-runtime for type definitions
# gem install sorbet-runtime

require 'exa'

# Types are available when sorbet-runtime is installed
params = Exa::Types::SearchParams.new(
  query: "AI research",
  type: "neural",
  num_results: 10,
  text: true
)

# Type definitions for all API responses
# Exa::Types::SearchResultData
# Exa::Types::AnswerResponseData
# Exa::Types::ResearchTaskData
# Exa::Types::WebsetData
# Exa::Types::MonitorData
# Exa::Types::ImportData
# Exa::Types::WebhookData
# Exa::Types::EventData

Requirements

  • Ruby >= 3.1
  • faraday >= 2.0
  • faraday-retry >= 2.0
  • thor >= 1.0
  • concurrent-ruby >= 1.2
  • sorbet-runtime >= 0.5 (optional, for type definitions)
  • opentelemetry-sdk (optional, for instrumentation)
  • redis (optional, for distributed caching)

Verification Status

The current release (v1.3.0) does not yet ship comprehensive unit tests; verification relies on live integration scripts in the verification/ directory. Running these scripts (verification/test_search_real.rb and verification/test_websets_lifecycle.rb) requires a valid API key, and they may return 401/402 errors if your key lacks sufficient credits or permissions (e.g. free-tier limits); per the author, the underlying gem logic has been verified as correct against these scripts.

Development

After checking out the repo, run bundle install to install dependencies.

License

The gem is available as open source under the terms of the MIT License.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/tigel-agm/exaonruby.