Flaky
Track, rank, and reproduce flaky CI test failures in Rails projects.
Flaky fetches test results from your CI provider, stores failures in a local SQLite database, ranks tests by flakiness, and helps reproduce failures under simulated CI conditions.
Installation
Add to your Gemfile:
# From GitHub
gem 'flaky', github: 'Flytedesk/flaky', group: [:development, :test]
# Or from a local path during development
gem 'flaky', path: '../flaky', group: [:development, :test]Then bundle install.
Configuration
Create an initializer (e.g. config/initializers/flaky.rb):
if defined?(Flaky)
Flaky.configure do |c|
c.provider = :semaphore # or :github_actions
c.project = "my-project" # CI project name
c.branch = "main" # branch to track
end
endPrerequisites by provider
Semaphore: Install and authenticate the sem CLI.
GitHub Actions: Install and authenticate the gh CLI.
Rake Tasks
rake flaky:fetch[age]
Fetch recent CI results and store failures in the local database.
rake flaky:fetch # last 24 hours (default)
rake flaky:fetch[168h] # last 7 days
rake flaky:fetch[2160h] # last 90 daysFor each workflow on the configured branch, fetches all test job logs, parses RSpec output for failures and random seeds, and inserts new records into tmp/flaky.db.
rake flaky:rank[since]
Rank flaky tests by failure frequency and suggest the next one to investigate.
rake flaky:rank # last 30 days (default)
rake flaky:rank[7] # last 7 daysOutput:
Flaky tests on main (last 30 days, 42 CI runs):
Fails Location Last Failure
------------------------------------------------------------------------------------------
5 ...spec/system/inventory_search_modal_spec.rb:83 2026-04-12 09:15:22
> Next to investigate: packs/.../inventory_search_modal_spec.rb:83
Inventory search modal filters by enrollment
Seeds: 6432, 51203, 8891
rake flaky:history[spec_location]
Show the full failure timeline for a specific test, including every seed and CI job it failed in.
rake flaky:history[inventory_search_modal_spec.rb:83]
rake flaky:history[inventory_search_modal_spec.rb] # all failures in this filerake flaky:stress[spec,iterations,seed,ci]
Run a test repeatedly to reproduce a flaky failure or prove a fix is stable.
# 20 iterations with random seeds
rake flaky:stress[path/to/spec.rb:83]
# 50 iterations with a specific seed and CI simulation
rake flaky:stress[path/to/spec.rb:83,50,6432,true]Arguments:
-
spec(required) -- spec file path, optionally with line number -
iterations-- number of runs (default: 20) -
seed-- RSpec random seed; omit for random each run -
ci--trueto enable CI environment simulation (default: false)
Results are recorded to the database and shown in rake flaky:report.
rake flaky:report
Summary dashboard showing overall flaky test health.
rake flaky:reportOutput:
=== Flaky Test Report (main) ===
CI Runs tracked: 42
Failed runs: 8 (19.0%)
Total test failures: 14
Unique flaky specs: 6
Last fetch: 2026-04-14 20:39:57
7-day trend: 3 failures (prior 7 days: 5)
v Trending better
Top 5 flaky tests:
--------------------------------------------------------------------------------
1. packs/.../inventory_search_modal_spec.rb:83 (5x)
Inventory search modal filters by enrollment
Recent stress runs:
--------------------------------------------------------------------------------
packs/.../inventory_search_modal_spec.rb:83 -- 18/20 passed (10.0% failure rate) [CI sim]
CI Simulation
When ci=true is passed to rake flaky:stress, the gem simulates CI environment constraints:
-
Rack middleware latency -- adds 30ms delay per HTTP request (approximates the difference between a Mac and an f1-standard-2 CI machine). Configurable via
FLAKY_LATENCY_MSenv var. -
Reduced Puma threads -- the host app should conditionally reduce Capybara's Puma threads when
FLAKY_CI_SIMULATE=1is set:
# spec/support/capybara_drivers.rb (or equivalent)
max_threads = ENV["FLAKY_CI_SIMULATE"] ? 2 : 8
Capybara.server = :puma, { Silent: true, Threads: "1:#{max_threads}" }The middleware is auto-inserted by the Railtie in test environment when FLAKY_CI_SIMULATE=1.
Database
Failures are stored in SQLite at tmp/flaky.db (auto-created on first use). The schema is managed internally and migrated automatically.
Tables:
-
ci_runs-- one row per CI workflow on the tracked branch -
job_results-- one row per test job (unit tests, system tests, etc.) -
test_failures-- one row per individual test failure with spec file, line, description, and seed -
stress_runs-- one row per stress test session
The database is local and should be gitignored (typically already is via tmp/).
Custom Providers
To add a CI provider, implement the three-method interface and register it:
class Flaky::Providers::CircleCI < Flaky::Providers::Base
def fetch_workflows(age: "24h")
# Return [{ id:, pipeline_id:, branch:, created_at: }, ...]
end
def fetch_jobs(pipeline_id:)
# Return [{ id:, name:, block_name:, result: }, ...]
end
def fetch_log(job_id:)
# Return raw log string
end
end
Flaky.register_provider(:circleci, Flaky::Providers::CircleCI)The log parser is CI-agnostic -- it extracts failures, seeds, and counts from standard RSpec output. Your provider just needs to return the raw log text.
Typical Workflow
# 1. Fetch recent CI data
rake flaky:fetch[168h]
# 2. See what's flaky
rake flaky:rank
# 3. Investigate the top offender
rake flaky:history[the_flaky_spec.rb:42]
# 4. Try to reproduce it locally with CI simulation
rake flaky:stress[the_flaky_spec.rb:42,30,6432,true]
# 5. Fix the test, then prove the fix holds
rake flaky:stress[the_flaky_spec.rb:42,50,,true]
# 6. Check overall health
rake flaky:reportLicense
MIT