Headless Browser Tool

A headless browser control tool that provides an MCP (Model Context Protocol) server with tools to control a headless browser using Capybara and Selenium. Features multi-session support, session persistence, and both HTTP and stdio communication modes.

Features

Headless Chrome browser automation - Full browser control via Selenium WebDriver
MCP server with 40+ browser control tools - Comprehensive API for browser interactions
Multi-session support - Isolated browser sessions for each client
Session persistence - Sessions survive server restarts with cookies and state preservation
Two server modes - HTTP server mode and stdio mode for different integration patterns
Smart screenshot tools - With annotations, highlighting, and visual diff capabilities
AI-assisted tools - Auto-narration and intelligent page analysis
Comprehensive logging - Separate log files for stdio mode to avoid protocol interference
Structured responses - All tools return rich, structured data instead of simple strings
Smart element selectors - Tools returning multiple elements include selectors for each

Installation

Add this line to your application's Gemfile:

gem 'headless_browser_tool'

And then execute:

bundle install

Or install it yourself as:

gem install headless_browser_tool

Prerequisites

You need to have Chrome/Chromium browser installed on your system. The gem will use Chrome in headless mode by default.

Usage

Command Line Interface

The hbt command provides three main commands:

`hbt start` - Start HTTP Server Mode

Starts the MCP server as an HTTP server with SSE (Server-Sent Events) support:

hbt start [OPTIONS]

Options:

--port PORT - Port for the MCP server (default: 4567)
--headless / --no-headless - Run browser in headless mode (default: true)
--single-session - Use single shared browser session instead of multi-session mode
--session-id SESSION_ID - Enable session persistence for single session mode (requires --single-session)
--show-headers - Show HTTP request headers for debugging session issues

Examples:

# Start with default settings (multi-session, headless, port 4567)
hbt start

# Start in non-headless mode for debugging
hbt start --no-headless

# Start in single session mode (legacy compatibility)
hbt start --single-session

# Start in single session mode with persistence
hbt start --single-session --session-id my-app-session

# Start with request header logging
hbt start --show-headers

`hbt stdio` - Start Stdio Server Mode

Starts the MCP server in stdio mode for direct integration with tools that spawn subprocesses:

hbt stdio [OPTIONS]

Options:

--headless / --no-headless - Run browser in headless mode (default: true)

Notes:

Always runs in single-session mode
Logs to .hbt/logs/PID.log instead of stdout to avoid interfering with MCP protocol
Ideal for editor integrations and tools that communicate via stdin/stdout
Supports optional session persistence via HBT_SESSION_ID environment variable

Session Persistence in Stdio Mode:

You can enable session persistence by setting the HBT_SESSION_ID environment variable:

# First run - creates and saves session
HBT_SESSION_ID=my-editor-session hbt stdio

# Later run - restores previous session state
HBT_SESSION_ID=my-editor-session hbt stdio

When HBT_SESSION_ID is set:

Session state is saved to .hbt/sessions/{session_id}.json on exit
On startup, if the session file exists, it restores:
- Current URL
- Cookies
- localStorage
- sessionStorage
- Window size

This is useful for editor integrations that want to maintain browser state across multiple tool invocations.

Examples:

# Start in stdio mode (headless by default, no persistence)
hbt stdio

# Start with session persistence
HBT_SESSION_ID=vscode-session hbt stdio

# Start in stdio mode with visible browser
hbt stdio --no-headless

`hbt version` - Display Version

Shows the current version of HeadlessBrowserTool:

hbt version

Session Management

Multi-Session Mode (Default for HTTP Server)

In multi-session mode, each client connection gets its own isolated browser session with:

Separate cookies and localStorage - Complete isolation between sessions
Independent navigation history - Each session maintains its own browser state
Session persistence - Sessions are saved to .hbt/sessions/ and restored on restart
Automatic cleanup - Idle sessions are closed after 30 minutes
LRU eviction - When at capacity (10 sessions), least recently used sessions are closed

Session Identification in Multi-Session Mode:

For HTTP server mode, sessions require an X-Session-ID header:

# Connect with session ID "alice"
curl -H "X-Session-ID: alice" -H "Accept: text/event-stream" http://localhost:4567/

# Different session ID gets different browser
curl -H "X-Session-ID: bob" -H "Accept: text/event-stream" http://localhost:4567/

# Without X-Session-ID header, connection is rejected
curl -H "Accept: text/event-stream" http://localhost:4567/
# Returns: 400 Bad Request - X-Session-ID header is required

Session ID Requirements:

Must be provided via X-Session-ID header
Can only contain alphanumeric characters, underscores, and hyphens
Maximum length: 64 characters
Invalid formats are rejected with 400 error

Single Session Mode

Use --single-session flag for legacy mode where all clients share one browser:

hbt start --single-session

Session Persistence in Single Session Mode:

You can enable session persistence with the --session-id flag:

# First run - creates and saves session
hbt start --single-session --session-id my-app

# Server restart - restores previous session
hbt start --single-session --session-id my-app

When --session-id is provided:

Session state is saved to .hbt/sessions/{session_id}.json on shutdown
On startup, if the session file exists, it restores browser state
All clients share this single persistent session
Compatible with stdio mode session files

This is useful for:

Development servers that need to maintain login state
Testing environments where you want consistent browser state
Applications that don't need multi-user isolation

Note: The --session-id flag can only be used with --single-session. In multi-session mode, session IDs are provided by clients via headers.

Session Management Endpoints

View active sessions:

curl http://localhost:4567/sessions | jq

Response:

{
  "active_sessions": ["alice", "bob"],
  "session_count": 2,
  "session_data": {
    "alice": {
      "created_at": "2024-01-20T10:00:00Z",
      "last_activity": "2024-01-20T10:05:00Z",
      "idle_time": 300.5
    }
  }
}

Close a specific session:

curl -X DELETE http://localhost:4567/sessions/alice

Directory Structure

HeadlessBrowserTool creates a .hbt/ directory with:

.hbt/
├── .gitignore      # Contains "*" to ignore all contents
├── screenshots/    # Screenshot storage
├── sessions/       # Session persistence files
└── logs/          # Log files (stdio mode only)
    └── PID.log    # Process-specific log file

MCP API

The server implements the Model Context Protocol (MCP) and responds to JSON-RPC requests.

Using with MCP Clients

For HTTP mode with proper MCP clients:

# Start server
hbt start

# MCP client should:
# 1. Connect with X-Session-ID header
# 2. Use SSE endpoint for streaming: http://localhost:4567/mcp/sse
# 3. Send commands via JSON-RPC

For stdio mode:

# MCP client spawns the process directly
hbt stdio
# Communication happens via stdin/stdout

Available Browser Tools

All tools are available through the MCP protocol. Here's a complete reference:

Navigation Tools

Tool	Description	Parameters	Returns
`visit`	Navigate to a URL	`url` (required)	`{url, current_url, title, status}`
`refresh`	Reload the current page	None	`{url, title, changed, status}`
`go_back`	Navigate back in browser history	None	`{navigation: {from, to, title, navigated}, status}`
`go_forward`	Navigate forward in browser history	None	`{navigation: {from, to, title, navigated}, status}`

Element Interaction Tools

Tool	Description	Parameters	Returns
`click`	Click an element	`selector` (required)	`{selector, element, navigation, status}`
`right_click`	Right-click an element	`selector` (required)	`{selector, element, status}`
`double_click`	Double-click an element	`selector` (required)	`{selector, element, status}`
`hover`	Hover mouse over element	`selector` (required)	`{selector, element, status}`
`drag`	Drag element to target	`source_selector`, `target_selector` (required)	`{source_selector, target_selector, source, target, status}`

Element Finding Tools

Tool	Description	Parameters	Key Returns
`find_element`	Find single element	`selector` (required)	Element details with attributes
`find_all`	Find all matching elements	`selector` (required)	`{elements: [{selector, tag_name, text, visible, attributes}]}`
`find_elements_containing_text`	Find elements with text	`text` (required), `exact_match`, `case_sensitive`, `visible_only`	`{elements: [{selector, xpath, tag, text, clickable}]}`
`get_text`	Get element text	`selector` (required)	Text content string
`get_attribute`	Get element attribute	`selector`, `attribute` (required)	Attribute value
`get_value`	Get input value	`selector` (required)	Input value
`is_visible`	Check element visibility	`selector` (required)	Boolean
`has_element`	Check element exists	`selector` (required), `wait`	Boolean
`has_text`	Check text exists	`text` (required), `wait`	Boolean

Form Interaction Tools

Tool	Description	Parameters	Key Returns
`fill_in`	Fill input field	`field`, `value` (required)	`{field, value, field_info, status}`
`select`	Select dropdown option	`value`, `dropdown_selector` (required)	`{selected_value, selected_text, options: [{selector, value, text}]}`
`check`	Check checkbox	`checkbox_selector` (required)	`{selector, was_checked, is_checked, element, status}`
`uncheck`	Uncheck checkbox	`checkbox_selector` (required)	`{selector, was_checked, is_checked, element, status}`
`choose`	Select radio button	`radio_button_selector` (required)	`{selector, radio, group: [{selector, value, checked}], status}`
`attach_file`	Upload file	`file_field_selector`, `file_path` (required)	`{field_selector, file_path, file_name, file_size, field, status}`
`click_button`	Click button	`button_text_or_selector` (required)	`{button, element, navigation, status}`
`click_link`	Click link	`link_text_or_selector` (required)	`{link, element, navigation, status}`

Page Information Tools

Tool	Description	Returns
`get_current_url`	Get current URL	Full URL string
`get_current_path`	Get current path	Path without domain
`get_page_title`	Get page title	Title string
`get_page_source`	Get HTML source	Full HTML
`get_page_context`	Get page analysis	Structured page data

Search Tools

Tool	Description	Parameters
`search_page`	Search visible content	`query` (required), `case_sensitive`, `regex`, `context_lines`, `highlight`
`search_source`	Search HTML source	`query` (required), `case_sensitive`, `regex`, `context_lines`, `show_line_numbers`

JavaScript Execution Tools

Tool	Description	Parameters	Returns
`execute_script`	Run JavaScript	`javascript_code` (required)	`{javascript_code, execution_time, timestamp, status}`
`evaluate_script`	Run JS and return result	`javascript_code` (required)	Script return value

Screenshot and Capture Tools

Tool	Description	Parameters	Key Returns
`screenshot`	Take screenshot	`filename`, `highlight_selectors`, `annotate`, `full_page`	`{file_path, filename, file_size, timestamp, url, title}`
`save_page`	Save HTML to file	`file_path` (required)	`{file_path, file_size, timestamp, url, title, status}`

Window Management Tools

Tool	Description	Parameters	Key Returns
`switch_to_window`	Switch to window/tab	`window_handle` (required)	`{window_handle, previous_window, current_url, title, total_windows}`
`open_new_window`	Open new window/tab	None	`{window_handle, total_windows, previous_windows, current_window}`
`close_window`	Close window/tab	`window_handle` (required)	`{closed_window, was_current, remaining_windows, current_window}`
`get_window_handles`	Get all window handles	None	`{current_window, windows: [{handle, index, is_current}], total_windows}`
`maximize_window`	Maximize window	None	`{size_before: {width, height}, size_after: {width, height}, status}`
`resize_window`	Resize window	`width`, `height` (required)	`{requested_size, size_before, size_after, status}`

Session Management Tools

Tool	Description	Returns
`get_session_info`	Get session information	Session details

Smart Tools (experimental)

Tool	Description	Parameters
`auto_narrate`	Generate page description	`focus_on`
`get_narration_history`	Get narration history	None
`visual_diff`	Compare screenshots	`before_path`, `after_path` (required)

Tool Response Structure

All tools now return structured data instead of simple strings. This makes it easier to:

Extract specific information from responses
Check operation success/failure
Access element properties and metadata
Navigate to specific elements using returned selectors

Example responses:

// visit tool response
{
  "url": "https://example.com",
  "current_url": "https://example.com/",
  "title": "Example Domain",
  "status": "success"
}

// find_all tool response with selectors
{
  "selector": ".item",
  "count": 3,
  "elements": [
    {
      "index": 0,
      "selector": ".item:nth-of-type(1)",
      "tag_name": "div",
      "text": "Item 1",
      "visible": true,
      "attributes": {"class": "item active"}
    },
    // ... more elements
  ]
}

// select tool response with option selectors
{
  "dropdown_selector": "#country",
  "selected_value": "US",
  "selected_text": "United States",
  "options": [
    {
      "selector": "#country option:nth-of-type(1)",
      "value": "US",
      "text": "United States",
      "selected": true
    },
    // ... more options
  ],
  "status": "selected"
}

Example Tool Calls

Here are examples using curl with the HTTP server:

# Navigate to a URL
curl -X POST http://localhost:4567/ \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: alice" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", 
       "params": {"name": "visit", "arguments": {"url": "https://example.com"}}}'

# Take an annotated screenshot
curl -X POST http://localhost:4567/ \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: alice" \
  -d '{"jsonrpc": "2.0", "id": 2, "method": "tools/call",
       "params": {"name": "screenshot", 
                  "arguments": {"filename": "example", 
                              "highlight_selectors": [".error", ".warning"],
                              "annotate": true,
                              "full_page": true}}}'

# Search page content with highlighting
curl -X POST http://localhost:4567/ \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: alice" \
  -d '{"jsonrpc": "2.0", "id": 3, "method": "tools/call",
       "params": {"name": "search_page",
                  "arguments": {"query": "error|warning",
                              "regex": true,
                              "highlight": true}}}'

Environment Variables

HBT_SINGLE_SESSION=true - Force single session mode in HTTP server
HBT_SHOW_HEADERS=true - Enable request header logging in HTTP server
HBT_SESSION_ID=<session_name> - Enable session persistence in stdio mode

Logging

HTTP mode: Logs to stdout
Stdio mode: Logs to .hbt/logs/PID.log to avoid interfering with MCP protocol

Tool calls are logged with format:

INFO -- HBT: CALL: ToolName [] {args} -> result
ERROR -- HBT: ERROR: ToolName [] {args} -> error_message

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt.

To install this gem onto your local machine, run bundle exec rake install.

Running Tests and Linting

# Run tests
rake test

# Run linter
rake rubocop

# Run linter with auto-fix
rake rubocop -A

# Run both tests and linter (default task)
rake

Recent Improvements

Version 0.1.0

Structured tool responses - All tools now return rich JSON objects instead of simple strings
Element selectors in arrays - Tools returning multiple elements include unique selectors for each
Session persistence - Both stdio and single-session HTTP modes support persistent sessions
Strict session management - Multi-session mode requires X-Session-ID header (no auto-creation)
Improved logging - Fixed stdio mode logging to properly write to .hbt/logs/PID.log
DRY refactoring - Extracted common functionality into SessionPersistence and DirectorySetup modules
Better error handling - Tools return structured error information
Enhanced tool responses:
- Navigation tools return before/after URLs and navigation status
- Form tools return element state before/after interaction
- Window tools return comprehensive window state information
- Screenshot tool returns file metadata
- All element-finding tools return complete element information

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/parruda/headless_browser_tool.