AgentFerrum
Browser automation for AI agents, in Ruby. Powered by Ferrum.
AgentFerrum wraps Chrome headless (via CDP) with an AI-optimized extraction layer. Instead of dumping raw HTML into your agent's context, it produces a compact snapshot: a markdown rendering of the visible page content + an accessibility tree of interactive elements with clickable refs. Typical reduction: 50-80% fewer tokens compared to raw HTML (the heavier the page, the bigger the savings).
Inspired by agent-browser (Vercel) for the accessibility tree + refs concept, Crucible for stealth profiles and download management, and FerrumMCP for Ruby/Ferrum patterns.
Why AgentFerrum?
Most browser automation tools return the full DOM or raw HTML. An LLM agent processing a typical web page receives thousands of tokens of noise (scripts, styles, hidden elements, data attributes). AgentFerrum solves this with a hybrid snapshot:
# Shopping Cart # What your agent sees
URL: https://shop.example.com/cart
## Interactive Elements # Clickable refs
@e1: [link] "Home" href="/"
@e2: [link] "Products" href="/products"
@e3: [textbox] "Search" value=""
@e4: [button] "Remove"
@e5: [button] "Checkout"
## Page Content # Clean markdown
# Your Cart
| Product | Qty | Price |
|---------|-----|-------|
| Widget | 2 | $20 |
Total: **$20.00**
Your agent reads a compact snapshot instead of the full DOM. It clicks @e5 to checkout. Done.
Benchmark
Real-world token reduction measured on live pages (February 2026):
| Site | Raw HTML | Snapshot | Reduction |
|---|---|---|---|
| Hacker News | ~8,600 tokens | ~4,500 tokens | 47% |
| Wikipedia (Ruby article) | ~140,000 tokens | ~31,000 tokens | 78% |
| GitHub (repo page) | ~265,000 tokens | ~22,000 tokens | 92% |
The heavier the page (scripts, styles, data attributes, hidden elements), the bigger the savings. Simple content-focused pages like HN see ~50% reduction. Rich web apps like GitHub or StackOverflow see 90%+.
Features
- Hybrid snapshots -- Accessibility tree (interactive elements with refs) + markdown (visible content), combined into a single compact output
-
Ref-based actions -- Click, fill, select, hover via
@e1,@e2... refs from the snapshot. No CSS selectors needed - Visibility filtering -- JS-based filtering removes hidden elements, then Nokogiri strips scripts, styles, and noise attributes
- Markdown conversion -- HTML to clean markdown via ReverseMarkdown, with whitespace compaction
-
Stealth mode -- Three profiles (
:minimal,:moderate,:maximum) ported from puppeteer-extra-plugin-stealth - Download management -- Set download path, wait for completion with timeout
- Smart waiting -- Poll for CSS/XPath/text/block conditions with configurable timeout and interval
- Auto-retry -- Node actions retry automatically on transient errors (element moving, coordinates not found)
- AI-friendly errors -- Error messages tell the agent what to do next ("Call browser.snapshot to refresh refs")
-
Standalone CLI --
agent_ferrum start/snapshot/click/stop— each command is a separate invocation, ideal for AI agents chaining shell commands. Zero external dependencies
Installation
Add to your Gemfile:
gem "agent_ferrum"Then:
bundle installRequirements: Ruby 3.4+, Chrome/Chromium installed.
Quick Start
1. Navigate and snapshot
require "agent_ferrum"
browser = AgentFerrum::Browser.new
browser.navigate("https://example.com")
snap = browser.snapshot
puts snap.to_s
# => Compact snapshot with interactive elements + markdown content2. Interact via refs
snap = browser.snapshot
# Click a button by ref
browser.click("@e3")
# Fill a text field by ref
browser.fill("@e2", "search query")
# Or use CSS/XPath selectors
browser.click("button.submit")
browser.click("//a[@href='/about']")
browser.fill({css: "input[name='email']"}, "user@example.com")3. Wait for content
# Wait for an element
browser.wait_for(css: ".results", timeout: 10)
# Wait for text to appear
browser.wait_for(text: "Search complete")
# Wait for a custom condition
browser.wait_for { |b| b.evaluate("document.readyState") == "complete" }4. Extract content
# Full snapshot (accessibility tree + markdown)
snap = browser.snapshot
puts snap.to_s # Combined output for AI
puts snap.markdown # Just the markdown content
puts snap.accessibility_tree # Just the interactive elements
puts snap.estimated_tokens # Approximate token count
# Quick markdown only
puts browser.page_markdown5. Clean up
browser.quitConfiguration
AgentFerrum.configure do |c|
c.headless = true # Run headless (default: true)
c.timeout = 30 # Default timeout in seconds
c.poll_interval = 0.1 # Polling interval for wait_for
c.viewport = [1920, 1080] # Browser viewport size
c.stealth = :off # Stealth profile: :off, :minimal, :moderate, :maximum
c.download_path = nil # Directory for downloads
c.browser_path = nil # Custom Chrome/Chromium path
c.user_agent = nil # Custom user agent
c.locale = nil # Browser locale (e.g., "fr-FR")
c.timezone = nil # Timezone override (e.g., "Europe/Paris")
endOr pass options directly:
browser = AgentFerrum::Browser.new(headless: false, timeout: 60, stealth: :maximum)Stealth Mode
Three profiles of increasing evasion, ported from puppeteer-extra-plugin-stealth:
| Profile | Scripts | What it does |
|---|---|---|
:minimal |
1 | Removes navigator.webdriver flag |
:moderate |
4 | + Vendor/platform spoofing, Chrome runtime, user-agent cleanup |
:maximum |
7 | + Navigator plugins, WebGL vendor masking, iframe fixes |
# Enable at initialization
browser = AgentFerrum::Browser.new(stealth: :maximum)
# Or switch dynamically
browser.stealth(:moderate)
browser.navigate("https://bot-detection-site.com")Downloads
browser.download_path = "/tmp/downloads"
browser.click("@e5") # Click a download link
filepath = browser.wait_for_download(timeout: 30)
puts filepath # => "/tmp/downloads/report.pdf"
# Or wait for a specific filename
filepath = browser.wait_for_download(filename: "report.pdf", timeout: 60)Snapshot Format
The snapshot output is designed for AI consumption. Here's the structure:
# Page Title
URL: https://example.com/page
## Interactive Elements
@e1: [button] "Submit"
@e2: [textbox] "Email" value=""
@e3: [link] "Home" href="/"
@e4: [checkbox] "Remember me" checked=true
@e5: [combobox] "Country"
@e6: [link] "Sign up" href="/register"
## Page Content
# Welcome
Please fill in your details below.
| Field | Required |
|-------|----------|
| Email | Yes |
| Name | No |
[Terms of Service](/tos)
Supported interactive roles: button, link, textbox, checkbox, radio, combobox, menuitem, tab, slider, spinbutton, searchbox, switch, option, listbox, menu, menubar.
Element properties included when present: value, disabled, required, checked, selected, readonly.
API Reference
Navigation
| Method | Description |
|---|---|
navigate(url) |
Navigate to URL |
back |
Go back |
forward |
Go forward |
refresh |
Reload page |
current_url |
Current page URL |
title |
Page title |
Content Extraction
| Method | Returns | Description |
|---|---|---|
snapshot |
Snapshot |
Full hybrid snapshot (accessibility tree + markdown) |
page_markdown |
String |
Markdown of visible content only |
accessibility_tree |
AccessibilityTree |
Interactive elements with refs |
Actions
| Method | Description |
|---|---|
click(target) |
Click element (ref, CSS, XPath, or Hash) |
fill(target, value) |
Fill text field |
select(target, value) |
Select dropdown option |
hover(target) |
Hover over element |
type_text(text) |
Type text via keyboard |
Target resolution: "@e1" (ref) > {css: ".btn"} / {xpath: "//a"} (Hash) > "//*[@id='x']" (XPath string starting with /) > "button.submit" (CSS string).
Waiting
| Method | Description |
|---|---|
wait_for(css:) |
Wait for CSS selector |
wait_for(xpath:) |
Wait for XPath |
wait_for(text:) |
Wait for text content |
wait_for { block } |
Wait for block to return truthy |
wait_for_navigation |
Wait for page idle |
Utilities
| Method | Description |
|---|---|
evaluate(js) |
Execute JavaScript, return result |
screenshot(path:, full:) |
Take screenshot |
quit |
Close browser |
AI Agent Integration Example
Here's how an AI agent loop might use AgentFerrum:
browser = AgentFerrum::Browser.new(stealth: :moderate)
# Agent navigates
browser.navigate("https://shop.example.com")
# Agent gets compact page representation
snap = browser.snapshot
# => Send snap.to_s to LLM (compact snapshot)
# LLM decides to search for a product
browser.fill("@e3", "wireless headphones")
browser.click("@e4") # Search button
# Agent refreshes snapshot after action
browser.wait_for(css: ".results")
snap = browser.snapshot
# => Updated snapshot with search results
# LLM picks a product
browser.click("@e7") # Product link
# Continue the loop...
browser.quitCLI
AgentFerrum ships with a standalone CLI where each command is a separate invocation. A background daemon holds the browser process, communicating over a Unix domain socket. Zero external dependencies -- only Ruby stdlib (UNIXSocket, JSON).
CLI invocation Daemon (background)
┌──────────────┐ ┌───────────────────┐
│ agent_ferrum │ Unix socket │ AgentFerrum:: │
│ snapshot │ ──────────────> │ Browser instance │
│ │ <────────────── │ (Chrome headless) │
└──────────────┘ JSON response └───────────────────┘
Quick start
agent_ferrum start https://example.com # Launch browser daemon + navigate
agent_ferrum snapshot # Get accessibility tree + markdown
agent_ferrum click @e1 # Click an element by ref
agent_ferrum screenshot /tmp/page.png # Take a screenshot
agent_ferrum stop # Stop daemon and browserAll commands
Session:
start [URL] [options] Start the browser daemon
--headed Visible browser (default: headless)
--stealth PROFILE off / minimal / moderate / maximum
--user-agent UA Custom user agent
--viewport WxH Viewport size (default: 1920x1080)
--timeout N Timeout in seconds (default: 30)
--browser-path PATH Path to Chrome binary
stop Stop the browser daemon
status Show browser status
Navigation:
navigate URL Navigate to URL (alias: go)
back / forward / refresh
Content:
snapshot Full snapshot (alias: snap)
tree Accessibility tree only
markdown Page markdown only (alias: md)
url Current URL
title Page title
Actions:
click TARGET Click element (ref @e1 or CSS selector)
fill TARGET VALUE Fill an input field
select TARGET VALUE Select a dropdown option
hover TARGET Hover over element
type TEXT Type text via keyboard
Utilities:
screenshot [PATH] Take a screenshot
eval JS Evaluate JavaScript
stealth PROFILE Change stealth profile
wait SELECTOR [TIMEOUT] Wait for element (CSS or XPath)
Example: AI agent loop
agent_ferrum start https://shop.example.com
# AI reads the page
SNAP=$(agent_ferrum snapshot)
# AI decides to search
agent_ferrum fill @e3 "wireless headphones"
agent_ferrum click @e4
agent_ferrum wait ".results"
# AI reads results
agent_ferrum snapshot
# Done
agent_ferrum stopDevelopment
git clone https://github.com/Alqemist-labs/agent_ferrum
cd agent_ferrum
bundle install
# Run unit tests
bundle exec rake test
# Run integration tests (requires Chrome)
bundle exec rake integrationContributing
Bug reports and pull requests are welcome on GitHub.
- Fork it
- Create your feature branch (
git checkout -b feature/my-feature) - Commit your changes
- Push to the branch
- Create a Pull Request
License
The gem is available as open source under the terms of the MIT License.
See Also
- Ferrum -- The Chrome headless Ruby library this gem is built on
- agent-browser -- Vercel's CLI for AI browser automation (accessibility tree + refs concept)
- Crucible -- Ruby MCP server for browser automation with stealth mode
- FerrumMCP -- MCP server for Ferrum with AI agent integration
- RubyLLM::Tribunal -- LLM evaluation framework for Ruby