Full-featured Ruby client for the Firehose API — tap management, rules CRUD (create, read, update, delete), SSE streaming with auto-reconnect and offset tracking.
 Dependencies

Development

~> 3.12
~> 3.18

Runtime

~> 2.0
 Project Readme

firehose-rb

Ruby client for the Firehose real-time web monitoring API. Define rules, stream matching pages as they're discovered, and build content pipelines on top of the live web.

Installation

gem "firehose-rb", "~> 0.1"

Then run bundle install.

Configuration

Firehose.configure do |c|
  c.management_key = ENV["FIREHOSE_MANAGEMENT_KEY"]  # fhm_...
  c.tap_token      = ENV["FIREHOSE_TAP_TOKEN"]       # fh_...
  c.base_url       = "https://api.firehose.dev"      # default
  c.timeout        = 300                              # SSE timeout in seconds
end
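
The comments in the block above suggest that management keys start with fhm_ and tap tokens with fh_. As a minimal sketch under that assumption, the environment can be checked before it is handed to Firehose.configure. The helper name firehose_env_problems and the constant EXPECTED_PREFIXES are hypothetical and not part of firehose-rb.

```ruby
# Hypothetical helper, not part of firehose-rb: reports missing or oddly
# prefixed credentials. The prefixes are assumed from the "fhm_..." and
# "fh_..." comments in the configuration example, not from API docs.
EXPECTED_PREFIXES = {
  "FIREHOSE_MANAGEMENT_KEY" => "fhm_",
  "FIREHOSE_TAP_TOKEN"      => "fh_"
}.freeze

# Returns an array of human-readable problems (empty when all looks fine).
def firehose_env_problems(env = ENV)
  EXPECTED_PREFIXES.filter_map do |name, prefix|
    value = env[name].to_s
    if value.empty?
      "#{name} is not set"
    elsif !value.start_with?(prefix)
      "#{name} does not start with #{prefix}"
    end
  end
end
```

Calling firehose_env_problems at boot and aborting when the result is non-empty gives a clearer failure than an authentication error on the first request.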

Usage

Rules

Rules tell Firehose what to watch for. They use Lucene query syntax.

client = Firehose.client

# Create a rule
rule = client.create_rule(
  value: '"AI agent" AND language:"en" AND recent:7d',
  tag: "ai-agent",
  quality: true
)

# List all rules
rules = client.list_rules

# Delete a rule
client.delete_rule(rule.id)
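
Rule values are plain strings, so they can be assembled programmatically. The sketch below builds the same value as the create_rule example from a phrase and two filters. The helper build_rule_value is hypothetical, the language and recent fields are taken only from that example, and backslash-escaping of quotes follows general Lucene convention, which Firehose may or may not honor.

```ruby
# Hypothetical helper, not part of firehose-rb: assembles a rule value
# from a quoted phrase plus optional field filters joined with AND.
def build_rule_value(phrase, language: nil, recent: nil)
  # Escape embedded quotes and backslashes (assumed Lucene-style escaping).
  escaped = phrase.gsub(/["\\]/) { |ch| "\\#{ch}" }
  clauses = [%("#{escaped}")]
  clauses << %(language:"#{language}") if language
  clauses << "recent:#{recent}" if recent
  clauses.join(" AND ")
end

build_rule_value("AI agent", language: "en", recent: "7d")
# => "\"AI agent\" AND language:\"en\" AND recent:7d"
```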

Streaming

Connect to the SSE stream and process matching pages in real time.

client = Firehose.client

# Persist offsets so you can resume after restart
client.on_offset { |offset| save_offset(offset) }

# Stream events (auto-reconnects with exponential backoff)
client.stream(since: "1h") do |event|
  event.id                    # String — unique event ID
  event.document.url          # String — page URL
  event.document.title        # String — page title
  event.document.markdown     # String — full page content as markdown
  event.document.categories   # Array  — page categories
  event.document.types        # Array  — page types (article, blog, etc.)
  event.document.language     # String — detected language
  event.document.publish_time # Time   — when the page was published
  event.matched_rule          # String — which rule tag matched
  event.matched_at            # Time   — when the match occurred
end

# Stop streaming gracefully
client.stop_stream
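
The streaming example calls save_offset without defining it. One possible implementation is a small file-backed store, sketched below. FileOffsetStore is a hypothetical class, not part of firehose-rb, and the README does not show how a saved offset is passed back to the client on restart, so that step is left out.

```ruby
require "fileutils"

# Hypothetical file-backed store for stream offsets; not part of firehose-rb.
class FileOffsetStore
  def initialize(path)
    @path = path
  end

  # Write to a temp file and rename it into place, so a crash mid-write
  # does not leave a truncated offset behind.
  def save(offset)
    FileUtils.mkdir_p(File.dirname(@path))
    tmp = "#{@path}.tmp"
    File.write(tmp, offset.to_s)
    File.rename(tmp, @path)
  end

  # Returns the last saved offset as a String, or nil if none was saved.
  def last_offset
    return nil unless File.exist?(@path)

    value = File.read(@path).strip
    value.empty? ? nil : value
  end
end
```

With a store created as store = FileOffsetStore.new("tmp/firehose.offset"), the callback becomes client.on_offset { |offset| store.save(offset) }.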

Resilience

  • Auto-reconnect with exponential backoff (1s, 2s, 4s, ... max 30s)
  • Last-Event-ID header sent on reconnect for automatic resume
  • on_offset callback for persisting stream position
  • Authentication errors (401/403) are not retried
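
The backoff schedule listed above can be sketched as a capped doubling delay. This illustrates the pattern only and is not the gem's actual code. The names backoff_delay and with_reconnect, the max_attempts limit, and the injectable sleeper are all assumptions made for the example.

```ruby
BASE_DELAY = 1   # seconds before the first retry
MAX_DELAY  = 30  # cap on any single wait

# Delay before retry number `attempt` (0-based): 1, 2, 4, ... capped at 30.
def backoff_delay(attempt)
  [BASE_DELAY * (2**attempt), MAX_DELAY].min
end

# Runs the block, retrying on errors with backoff. Errors whose class is
# listed in `fatal` (for example authentication errors) are re-raised at once.
def with_reconnect(max_attempts:, fatal: [], sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield
  rescue StandardError => e
    raise if fatal.any? { |klass| e.is_a?(klass) }
    raise if attempt + 1 >= max_attempts

    sleeper.call(backoff_delay(attempt))
    attempt += 1
    retry
  end
end

(0..6).map { |attempt| backoff_delay(attempt) }
# => [1, 2, 4, 8, 16, 30, 30]
```

Passing the sleeper as a lambda keeps the loop testable without real waits.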

Data Structures

| Struct | Fields |
| --- | --- |
| Firehose::Rule | id, value, tag, quality, nsfw |
| Firehose::Event | id, document, matched_rule, matched_at |
| Firehose::Document | url, title, markdown, categories, types, language, publish_time |

Errors

| Error | Cause |
| --- | --- |
| Firehose::AuthenticationError | Invalid management_key or tap_token |
| Firehose::RateLimitError | Too many requests (429) |
| Firehose::ConnectionError | Network or HTTP errors |
| Firehose::TimeoutError | Stream or request timeout |
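
Application code usually needs to decide what to do with each of these errors. The sketch below maps the class names above to a handling policy. It is keyed by name so that it runs without the gem loaded. Only the rule that authentication errors are not retried comes from this README. The other policies, and the names ERROR_POLICY and policy_for, are assumptions.

```ruby
# Hypothetical policy table, not part of firehose-rb. Keys are the error
# class names documented above; values are illustrative choices.
ERROR_POLICY = {
  "Firehose::AuthenticationError" => :fix_credentials, # never retried
  "Firehose::RateLimitError"      => :wait_and_retry,  # assumed policy
  "Firehose::ConnectionError"     => :retry,           # assumed policy
  "Firehose::TimeoutError"        => :retry            # assumed policy
}.freeze

# Returns the policy symbol for an exception, or :reraise if it is unknown.
def policy_for(error)
  ERROR_POLICY.fetch(error.class.name, :reraise)
end
```

A rescue block can then branch on policy_for(e) instead of listing every class.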

Requirements

Used by

Built for InventList — a home for indie builders that turns the live web into weekly signals for makers and their agents.

License

MIT