Headless Browser Tool

HeadlessBrowserTool provides an MCP (Model Context Protocol) server with tools to control a headless browser using Capybara and Selenium. It supports multiple isolated sessions, session persistence, and both HTTP and stdio communication modes.

Features

  • Headless Chrome browser automation - Full browser control via Selenium WebDriver
  • MCP server with 40+ browser control tools - Comprehensive API for browser interactions
  • Multi-session support - Isolated browser sessions for each client
  • Session persistence - Sessions survive server restarts with cookies and state preservation
  • Two server modes - HTTP server mode and stdio mode for different integration patterns
  • Smart screenshot tools - With annotations, highlighting, and visual diff capabilities
  • AI-assisted tools - Auto-narration and intelligent page analysis
  • Comprehensive logging - Separate log files for stdio mode to avoid protocol interference
  • Structured responses - All tools return rich, structured data instead of simple strings
  • Smart element selectors - Tools returning multiple elements include selectors for each

Installation

Add this line to your application's Gemfile:

gem 'headless_browser_tool'

And then execute:

bundle install

Or install it yourself as:

gem install headless_browser_tool

Prerequisites

Chrome or Chromium must be installed on your system. The gem runs Chrome in headless mode by default.

Usage

Command Line Interface

The hbt command provides three subcommands:

hbt start - Start HTTP Server Mode

Starts the MCP server as an HTTP server with SSE (Server-Sent Events) support:

hbt start [OPTIONS]

Options:

  • --port PORT - Port for the MCP server (default: 4567)
  • --headless / --no-headless - Run browser in headless mode (default: true)
  • --single-session - Use single shared browser session instead of multi-session mode
  • --session-id SESSION_ID - Enable session persistence for single session mode (requires --single-session)
  • --show-headers - Show HTTP request headers for debugging session issues

Examples:

# Start with default settings (multi-session, headless, port 4567)
hbt start

# Start in non-headless mode for debugging
hbt start --no-headless

# Start in single session mode (legacy compatibility)
hbt start --single-session

# Start in single session mode with persistence
hbt start --single-session --session-id my-app-session

# Start with request header logging
hbt start --show-headers

hbt stdio - Start Stdio Server Mode

Starts the MCP server in stdio mode for direct integration with tools that spawn subprocesses:

hbt stdio [OPTIONS]

Options:

  • --headless / --no-headless - Run browser in headless mode (default: true)

Notes:

  • Always runs in single-session mode
  • Logs to .hbt/logs/PID.log instead of stdout to avoid interfering with the MCP protocol
  • Ideal for editor integrations and tools that communicate via stdin/stdout
  • Supports optional session persistence via HBT_SESSION_ID environment variable

Session Persistence in Stdio Mode:

You can enable session persistence by setting the HBT_SESSION_ID environment variable:

# First run - creates and saves session
HBT_SESSION_ID=my-editor-session hbt stdio

# Later run - restores previous session state
HBT_SESSION_ID=my-editor-session hbt stdio

When HBT_SESSION_ID is set:

  • Session state is saved to .hbt/sessions/{session_id}.json on exit
  • On startup, if the session file exists, it restores:
    • Current URL
    • Cookies
    • localStorage
    • sessionStorage
    • Window size

This is useful for editor integrations that want to maintain browser state across multiple tool invocations.
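The save and restore cycle described above can be sketched roughly as follows. This is a minimal illustration, not the gem's implementation: the helper names (session_file, save_session, load_session) and the JSON key names are assumptions, and only the .hbt/sessions/{session_id}.json location comes from this README.

```ruby
require "json"
require "fileutils"

# Hypothetical helpers; the gem's own persistence code may differ.
# Path of the state file for a session, following the README's layout.
def session_file(session_id, base_dir: ".hbt/sessions")
  File.join(base_dir, "#{session_id}.json")
end

# Writes the given state hash as JSON, creating the directory if needed.
def save_session(session_id, state, base_dir: ".hbt/sessions")
  FileUtils.mkdir_p(base_dir)
  File.write(session_file(session_id, base_dir: base_dir), JSON.pretty_generate(state))
end

# Returns the saved state hash, or nil when no session file exists yet.
def load_session(session_id, base_dir: ".hbt/sessions")
  path = session_file(session_id, base_dir: base_dir)
  return nil unless File.exist?(path)

  JSON.parse(File.read(path))
end
```

On a first run load_session returns nil, so a caller would start with a fresh browser; on later runs the returned hash could be used to re-apply the URL, cookies, storage, and window size.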

Examples:

# Start in stdio mode (headless by default, no persistence)
hbt stdio

# Start with session persistence
HBT_SESSION_ID=vscode-session hbt stdio

# Start in stdio mode with visible browser
hbt stdio --no-headless

hbt version - Display Version

Shows the current version of HeadlessBrowserTool:

hbt version

Session Management

Multi-Session Mode (Default for HTTP Server)

In multi-session mode, each client connection gets its own isolated browser session with:

  • Separate cookies and localStorage - Complete isolation between sessions
  • Independent navigation history - Each session maintains its own browser state
  • Session persistence - Sessions are saved to .hbt/sessions/ and restored on restart
  • Automatic cleanup - Idle sessions are closed after 30 minutes
  • LRU eviction - When at capacity (10 sessions), least recently used sessions are closed
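To make the idle cleanup and LRU eviction rules above concrete, here is a small sketch of a session pool. The class and method names are hypothetical and this is not the gem's code; only the limits (10 sessions, 30 minutes) are taken from the list above.

```ruby
# Illustrative sketch only; SessionPool is a hypothetical name.
class SessionPool
  def initialize(capacity: 10, idle_timeout: 30 * 60)
    @capacity = capacity
    @idle_timeout = idle_timeout # seconds
    @last_used = {}              # session_id => Time of last activity
  end

  # Records activity for a session. If the session is new and the pool is
  # full, the least recently used session is evicted first.
  # Returns the evicted session ID, or nil when nothing was evicted.
  def touch(session_id, now: Time.now)
    evicted = nil
    if !@last_used.key?(session_id) && @last_used.size >= @capacity
      evicted, = @last_used.min_by { |_, used_at| used_at }
      @last_used.delete(evicted)
    end
    @last_used[session_id] = now
    evicted
  end

  # Removes sessions idle for longer than the timeout and returns their IDs.
  def cleanup_idle(now: Time.now)
    expired = @last_used.select { |_, used_at| now - used_at > @idle_timeout }.keys
    expired.each { |id| @last_used.delete(id) }
    expired
  end

  def session_ids
    @last_used.keys
  end
end
```

A real implementation would also close the underlying browser when a session is evicted or expires; that part is omitted here.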

Session Identification in Multi-Session Mode:

For HTTP server mode, sessions require an X-Session-ID header:

# Connect with session ID "alice"
curl -H "X-Session-ID: alice" -H "Accept: text/event-stream" http://localhost:4567/

# Different session ID gets different browser
curl -H "X-Session-ID: bob" -H "Accept: text/event-stream" http://localhost:4567/

# Without X-Session-ID header, connection is rejected
curl -H "Accept: text/event-stream" http://localhost:4567/
# Returns: 400 Bad Request - X-Session-ID header is required

Session ID Requirements:

  • Must be provided via X-Session-ID header
  • Can only contain alphanumeric characters, underscores, and hyphens
  • Maximum length: 64 characters
  • Invalid formats are rejected with 400 error
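A client can check an ID against these rules before connecting. The sketch below is a hypothetical helper, not part of the gem; it assumes "alphanumeric" means ASCII letters and digits and that an empty ID is invalid.

```ruby
# Hypothetical client-side check; the server's own validation may differ.
# Allows ASCII letters, digits, underscores, and hyphens, 1 to 64 characters.
SESSION_ID_PATTERN = /\A[A-Za-z0-9_-]{1,64}\z/

def valid_session_id?(id)
  id.is_a?(String) && SESSION_ID_PATTERN.match?(id)
end

valid_session_id?("alice")      # => true
valid_session_id?("bob smith")  # => false (contains a space)
valid_session_id?("a" * 65)     # => false (too long)
```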

Single Session Mode

Use the --single-session flag for legacy mode, in which all clients share one browser:

hbt start --single-session

Session Persistence in Single Session Mode:

You can enable session persistence with the --session-id flag:

# First run - creates and saves session
hbt start --single-session --session-id my-app

# Server restart - restores previous session
hbt start --single-session --session-id my-app

When --session-id is provided:

  • Session state is saved to .hbt/sessions/{session_id}.json on shutdown
  • On startup, if the session file exists, it restores browser state
  • All clients share this single persistent session
  • Compatible with stdio mode session files

This is useful for:

  • Development servers that need to maintain login state
  • Testing environments where you want consistent browser state
  • Applications that don't need multi-user isolation

Note: The --session-id flag can only be used with --single-session. In multi-session mode, session IDs are provided by clients via headers.

Session Management Endpoints

View active sessions:

curl http://localhost:4567/sessions | jq

Response:

{
  "active_sessions": ["alice", "bob"],
  "session_count": 2,
  "session_data": {
    "alice": {
      "created_at": "2024-01-20T10:00:00Z",
      "last_activity": "2024-01-20T10:05:00Z",
      "idle_time": 300.5
    }
  }
}
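A response of this shape is easy to post-process. As a rough sketch, the hypothetical helper below lists sessions that have been idle longer than a threshold; it assumes idle_time is expressed in seconds, which matches the sample timestamps but is not stated explicitly here.

```ruby
require "json"

# Hypothetical helper: returns the IDs of sessions whose "idle_time"
# exceeds threshold_seconds, given the raw /sessions response body.
def idle_sessions(response_body, threshold_seconds)
  data = JSON.parse(response_body)
  data.fetch("session_data", {})
      .select { |_, info| info["idle_time"].to_f > threshold_seconds }
      .keys
end
```

The returned IDs could then be passed to the DELETE endpoint to close those sessions early.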

Close a specific session:

curl -X DELETE http://localhost:4567/sessions/alice

Directory Structure

HeadlessBrowserTool creates a .hbt/ directory with:

.hbt/
├── .gitignore      # Contains "*" to ignore all contents
├── screenshots/    # Screenshot storage
├── sessions/       # Session persistence files
└── logs/           # Log files (stdio mode only)
    └── PID.log     # Process-specific log file
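For illustration, a layout like this could be created with a few lines of Ruby. The function name is hypothetical and this is only a sketch of the idea, not the gem's DirectorySetup module.

```ruby
require "fileutils"

# Hypothetical sketch: creates the .hbt/ layout shown above under root
# and returns the path of the .hbt directory.
def setup_hbt_directory(root = Dir.pwd)
  base = File.join(root, ".hbt")
  %w[screenshots sessions logs].each do |subdir|
    FileUtils.mkdir_p(File.join(base, subdir))
  end

  # A .gitignore containing "*" keeps everything in .hbt/ out of version control.
  gitignore = File.join(base, ".gitignore")
  File.write(gitignore, "*\n") unless File.exist?(gitignore)
  base
end
```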

MCP API

The server implements the Model Context Protocol (MCP) and responds to JSON-RPC requests.

Using with MCP Clients

For HTTP mode with proper MCP clients:

# Start server
hbt start

# MCP client should:
# 1. Connect with X-Session-ID header
# 2. Use SSE endpoint for streaming: http://localhost:4567/mcp/sse
# 3. Send commands via JSON-RPC
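For clients written in Ruby, the request side of these steps might look like the sketch below. It mirrors the request shape used by the curl commands in the Example Tool Calls section (POST to /, an X-Session-ID header, and a tools/call JSON-RPC method). The helper names build_tool_call and call_tool are hypothetical, and the sketch does not cover the SSE stream.

```ruby
require "json"
require "net/http"
require "uri"

# Builds (but does not send) a tools/call request for the given tool.
def build_tool_call(endpoint, session_id, id, tool_name, arguments = {})
  request = Net::HTTP::Post.new(URI(endpoint))
  request["Content-Type"] = "application/json"
  request["X-Session-ID"] = session_id
  request.body = JSON.generate(
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: tool_name, arguments: arguments }
  )
  request
end

# Sends the request to a running server and returns the raw response body.
def call_tool(endpoint, session_id, id, tool_name, arguments = {})
  uri = URI(endpoint)
  request = build_tool_call(endpoint, session_id, id, tool_name, arguments)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request).body }
end
```

With a server started by hbt start, call_tool("http://localhost:4567/", "alice", 1, "visit", { url: "https://example.com" }) would issue the same request as the first curl example.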

For stdio mode:

# MCP client spawns the process directly
hbt stdio
# Communication happens via stdin/stdout
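A client that spawns the process itself could be sketched as follows. This assumes the server follows the MCP stdio transport convention of one JSON-RPC message per line; that framing comes from the MCP specification and is an assumption about this gem rather than something this README states. The helper names are hypothetical.

```ruby
require "json"
require "open3"

# Serializes one JSON-RPC message as a single newline-terminated line.
# JSON.generate escapes newlines inside strings, so the output stays on one line.
def frame_message(message)
  JSON.generate(message) + "\n"
end

# Spawns the server, sends one message, and returns the first response
# line parsed as JSON (or nil if the process produced no output).
def stdio_request(message, command: ["hbt", "stdio"])
  Open3.popen3(*command) do |stdin, stdout, _stderr, _wait_thread|
    stdin.write(frame_message(message))
    stdin.flush
    line = stdout.gets
    line && JSON.parse(line)
  end
end
```

A real client would keep the process running and start with the MCP initialization handshake before calling tools; this sketch only shows the framing and process plumbing.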

Available Browser Tools

All tools are available through the MCP protocol. Here's a complete reference:

Navigation Tools

| Tool | Description | Parameters | Returns |
| --- | --- | --- | --- |
| visit | Navigate to a URL | url (required) | {url, current_url, title, status} |
| refresh | Reload the current page | None | {url, title, changed, status} |
| go_back | Navigate back in browser history | None | {navigation: {from, to, title, navigated}, status} |
| go_forward | Navigate forward in browser history | None | {navigation: {from, to, title, navigated}, status} |

Element Interaction Tools

| Tool | Description | Parameters | Returns |
| --- | --- | --- | --- |
| click | Click an element | selector (required) | {selector, element, navigation, status} |
| right_click | Right-click an element | selector (required) | {selector, element, status} |
| double_click | Double-click an element | selector (required) | {selector, element, status} |
| hover | Hover mouse over element | selector (required) | {selector, element, status} |
| drag | Drag element to target | source_selector, target_selector (required) | {source_selector, target_selector, source, target, status} |

Element Finding Tools

| Tool | Description | Parameters | Key Returns |
| --- | --- | --- | --- |
| find_element | Find single element | selector (required) | Element details with attributes |
| find_all | Find all matching elements | selector (required) | {elements: [{selector, tag_name, text, visible, attributes}]} |
| find_elements_containing_text | Find elements with text | text (required), exact_match, case_sensitive, visible_only | {elements: [{selector, xpath, tag, text, clickable}]} |
| get_text | Get element text | selector (required) | Text content string |
| get_attribute | Get element attribute | selector, attribute (required) | Attribute value |
| get_value | Get input value | selector (required) | Input value |
| is_visible | Check element visibility | selector (required) | Boolean |
| has_element | Check element exists | selector (required), wait | Boolean |
| has_text | Check text exists | text (required), wait | Boolean |

Form Interaction Tools

| Tool | Description | Parameters | Key Returns |
| --- | --- | --- | --- |
| fill_in | Fill input field | field, value (required) | {field, value, field_info, status} |
| select | Select dropdown option | value, dropdown_selector (required) | {selected_value, selected_text, options: [{selector, value, text}]} |
| check | Check checkbox | checkbox_selector (required) | {selector, was_checked, is_checked, element, status} |
| uncheck | Uncheck checkbox | checkbox_selector (required) | {selector, was_checked, is_checked, element, status} |
| choose | Select radio button | radio_button_selector (required) | {selector, radio, group: [{selector, value, checked}], status} |
| attach_file | Upload file | file_field_selector, file_path (required) | {field_selector, file_path, file_name, file_size, field, status} |
| click_button | Click button | button_text_or_selector (required) | {button, element, navigation, status} |
| click_link | Click link | link_text_or_selector (required) | {link, element, navigation, status} |

Page Information Tools

| Tool | Description | Returns |
| --- | --- | --- |
| get_current_url | Get current URL | Full URL string |
| get_current_path | Get current path | Path without domain |
| get_page_title | Get page title | Title string |
| get_page_source | Get HTML source | Full HTML |
| get_page_context | Get page analysis | Structured page data |

Search Tools

| Tool | Description | Parameters |
| --- | --- | --- |
| search_page | Search visible content | query (required), case_sensitive, regex, context_lines, highlight |
| search_source | Search HTML source | query (required), case_sensitive, regex, context_lines, show_line_numbers |

JavaScript Execution Tools

| Tool | Description | Parameters | Returns |
| --- | --- | --- | --- |
| execute_script | Run JavaScript | javascript_code (required) | {javascript_code, execution_time, timestamp, status} |
| evaluate_script | Run JS and return result | javascript_code (required) | Script return value |

Screenshot and Capture Tools

| Tool | Description | Parameters | Key Returns |
| --- | --- | --- | --- |
| screenshot | Take screenshot | filename, highlight_selectors, annotate, full_page | {file_path, filename, file_size, timestamp, url, title} |
| save_page | Save HTML to file | file_path (required) | {file_path, file_size, timestamp, url, title, status} |

Window Management Tools

| Tool | Description | Parameters | Key Returns |
| --- | --- | --- | --- |
| switch_to_window | Switch to window/tab | window_handle (required) | {window_handle, previous_window, current_url, title, total_windows} |
| open_new_window | Open new window/tab | None | {window_handle, total_windows, previous_windows, current_window} |
| close_window | Close window/tab | window_handle (required) | {closed_window, was_current, remaining_windows, current_window} |
| get_window_handles | Get all window handles | None | {current_window, windows: [{handle, index, is_current}], total_windows} |
| maximize_window | Maximize window | None | {size_before: {width, height}, size_after: {width, height}, status} |
| resize_window | Resize window | width, height (required) | {requested_size, size_before, size_after, status} |

Session Management Tools

| Tool | Description | Returns |
| --- | --- | --- |
| get_session_info | Get session information | Session details |

Smart Tools (experimental)

| Tool | Description | Parameters |
| --- | --- | --- |
| auto_narrate | Generate page description | focus_on |
| get_narration_history | Get narration history | None |
| visual_diff | Compare screenshots | before_path, after_path (required) |

Tool Response Structure

All tools now return structured data instead of simple strings. This makes it easier to:

  • Extract specific information from responses
  • Check operation success/failure
  • Access element properties and metadata
  • Navigate to specific elements using returned selectors

Example responses:

// visit tool response
{
  "url": "https://example.com",
  "current_url": "https://example.com/",
  "title": "Example Domain",
  "status": "success"
}

// find_all tool response with selectors
{
  "selector": ".item",
  "count": 3,
  "elements": [
    {
      "index": 0,
      "selector": ".item:nth-of-type(1)",
      "tag_name": "div",
      "text": "Item 1",
      "visible": true,
      "attributes": {"class": "item active"}
    },
    // ... more elements
  ]
}

// select tool response with option selectors
{
  "dropdown_selector": "#country",
  "selected_value": "US",
  "selected_text": "United States",
  "options": [
    {
      "selector": "#country option:nth-of-type(1)",
      "value": "US",
      "text": "United States",
      "selected": true
    },
    // ... more options
  ],
  "status": "selected"
}
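Because each element carries its own selector, a client can feed results from one tool straight into another. The hypothetical helper below, a sketch that assumes the find_all response has already been parsed into a Ruby hash, collects the selectors of visible elements so they could be passed to tools such as click.

```ruby
require "json"

# Hypothetical helper: returns the selectors of visible elements from a
# parsed find_all-style response.
def visible_selectors(find_all_response)
  find_all_response.fetch("elements", [])
                   .select { |element| element["visible"] }
                   .map { |element| element["selector"] }
end

response = JSON.parse(<<~JSON)
  {
    "selector": ".item",
    "count": 2,
    "elements": [
      {"index": 0, "selector": ".item:nth-of-type(1)", "visible": true},
      {"index": 1, "selector": ".item:nth-of-type(2)", "visible": false}
    ]
  }
JSON

visible_selectors(response) # => [".item:nth-of-type(1)"]
```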

Example Tool Calls

Here are examples using curl with the HTTP server:

# Navigate to a URL
curl -X POST http://localhost:4567/ \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: alice" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", 
       "params": {"name": "visit", "arguments": {"url": "https://example.com"}}}'

# Take an annotated screenshot
curl -X POST http://localhost:4567/ \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: alice" \
  -d '{"jsonrpc": "2.0", "id": 2, "method": "tools/call",
       "params": {"name": "screenshot", 
                  "arguments": {"filename": "example", 
                              "highlight_selectors": [".error", ".warning"],
                              "annotate": true,
                              "full_page": true}}}'

# Search page content with highlighting
curl -X POST http://localhost:4567/ \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: alice" \
  -d '{"jsonrpc": "2.0", "id": 3, "method": "tools/call",
       "params": {"name": "search_page",
                  "arguments": {"query": "error|warning",
                              "regex": true,
                              "highlight": true}}}'

Environment Variables

  • HBT_SINGLE_SESSION=true - Force single session mode in HTTP server
  • HBT_SHOW_HEADERS=true - Enable request header logging in HTTP server
  • HBT_SESSION_ID=<session_name> - Enable session persistence in stdio mode

Logging

  • HTTP mode: Logs to stdout
  • Stdio mode: Logs to .hbt/logs/PID.log to avoid interfering with the MCP protocol

Tool calls are logged with format:

INFO -- HBT: CALL: ToolName [] {args} -> result
ERROR -- HBT: ERROR: ToolName [] {args} -> error_message
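Lines in this format can be picked apart with a regular expression. The sketch below is a hypothetical parser based only on the two templates above: it assumes the arguments never contain " -> " and treats the bracketed field as an opaque tag, since its meaning is not documented here.

```ruby
# Hypothetical parser for the tool-call log lines shown above.
LOG_LINE = /(?<level>INFO|ERROR) -- HBT: (?<kind>CALL|ERROR): (?<tool>\S+) \[(?<tag>[^\]]*)\] (?<args>.*?) -> (?<result>.*)\z/

# Returns a hash of the captured fields, or nil when the line does not match.
def parse_tool_log(line)
  match = LOG_LINE.match(line.chomp)
  match && match.named_captures
end
```

The pattern is not anchored at the start of the line, so any timestamp prefix added by the logger is ignored.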

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt.

To install this gem onto your local machine, run bundle exec rake install.

Running Tests and Linting

# Run tests
rake test

# Run linter
rake rubocop

# Run linter with auto-fix
bundle exec rubocop -A

# Run both tests and linter (default task)
rake

Recent Improvements

Version 0.1.0

  • Structured tool responses - All tools now return rich JSON objects instead of simple strings
  • Element selectors in arrays - Tools returning multiple elements include unique selectors for each
  • Session persistence - Both stdio and single-session HTTP modes support persistent sessions
  • Strict session management - Multi-session mode requires X-Session-ID header (no auto-creation)
  • Improved logging - Fixed stdio mode logging to properly write to .hbt/logs/PID.log
  • DRY refactoring - Extracted common functionality into SessionPersistence and DirectorySetup modules
  • Better error handling - Tools return structured error information
  • Enhanced tool responses:
    • Navigation tools return before/after URLs and navigation status
    • Form tools return element state before/after interaction
    • Window tools return comprehensive window state information
    • Screenshot tool returns file metadata
    • All element-finding tools return complete element information

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/parruda/headless_browser_tool.