Project

rperf

0.0
The project is in a healthy, maintained state
A safepoint-based sampling performance profiler that uses actual time deltas as weights to correct safepoint bias. Outputs JSON, pprof, collapsed stacks, or text report.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
 Dependencies

Development

 Project Readme

rperf logo

rperf

Know where your Ruby spends its time — accurately.
A sampling profiler that corrects safepoint bias using real time deltas.

Gem Version Ruby >= 3.4.0 Manual MIT License

Built-in flamegraph viewer  ·  CPU mode & wall mode (GVL + GC tracking)

Web site, Online manual, GitHub repository

See It in Action

$ rperf exec ruby fib.rb

 Performance stats for 'ruby fib.rb':

         2,023.3 ms   user
             4.3 ms   sys
         2,001.8 ms   real

         2,000.3 ms 100.0%  [Rperf] CPU execution
             3.0 ms         [Ruby ] GC time (4 count: 2 minor, 2 major)
          48,741            [Ruby ] allocated objects
          27,034            [Ruby ] freed objects
               1            [Ruby ] detected threads
              16 MB         [OS   ] peak memory (maxrss)
           5,784            [OS   ] page faults (5,783 minor, 1 major)
              22            [OS   ] context switches (13 voluntary, 9 involuntary)
               0 MB         [OS   ] disk I/O (0 MB read, 0 MB write)

 Flat:
         1,998.4 ms  99.9%  Object#fibonacci (fib.rb)
             1.9 ms   0.1%  Module#method_added (<C method>)

 Cumulative:
         2,000.3 ms 100.0%  <main> (fib.rb)
         1,998.4 ms  99.9%  Object#fibonacci (fib.rb)
             1.9 ms   0.1%  Module#method_added (<C method>)

  1999 samples / 1999 triggers, 0.1% profiler overhead

Quick Start

# Performance summary (wall mode, prints to stderr)
rperf stat ruby app.rb

# Record a profile to file
rperf record ruby app.rb                     # → rperf.json.gz (cpu mode, default)
rperf record -m wall ruby server.rb          # wall mode

# View results in browser
rperf report                                 # open rperf.json.gz in viewer
rperf report --top profile.json.gz           # print top functions to terminal

# Compare two profiles (requires Go)
rperf diff before.json.gz after.json.gz      # open diff in browser

On rperf report, you can see the profile result like this page: rprof viewer

Ruby API

require "rperf"

# Block form — profiles and saves to file
Rperf.start(output: "profile.json.gz", frequency: 500, mode: :cpu) do
  # code to profile
end

# Manual start/stop
Rperf.start(frequency: 1000, mode: :wall)
# ...
data = Rperf.stop
Rperf.save("profile.json.gz", data)

In-browser Viewer

# config.ru
require "rperf/viewer"
require "rperf/rack"

Rperf.start(mode: :wall, defer: true)
use Rperf::Viewer           # visit /rperf/ for flamegraph UI
use Rperf::RackMiddleware   # labels each request
run MyApp

# Snapshot every 60 minutes
Thread.new { loop { sleep 3600; Rperf::Viewer.instance&.take_snapshot! } }

Note: Rperf::Viewer has no built-in authentication. In production, restrict access with your framework's auth mechanisms (e.g., route constraints in Rails). See the manual for examples.

Environment Variables

Profile without code changes (e.g., Rails):

RPERF_ENABLED=1 RPERF_MODE=wall ruby app.rb    # → rperf.json.gz
rperf report                                    # open in viewer

Run rperf help for full documentation, or see the online manual.

Subcommands

Inspired by Linux perf — familiar subcommand interface for profiling workflows.

Command Description
rperf record Profile a command and save to file (default: .json.gz)
rperf stat Profile a command and print summary to stderr
rperf exec Profile a command and print full report to stderr
rperf report Open viewer for .json.gz; wraps go tool pprof for .pb.gz (requires Go)
rperf diff Compare two profiles (requires Go)
rperf help Show full reference documentation

How It Works

The Challenge: Safepoint Sampling

Most Ruby profilers (e.g., stackprof) use signal handlers to capture stack traces at the exact moment the timer fires. rperf takes a different approach — it samples at safepoints (VM checkpoints), which is safer (no async-signal-safety concerns, reliable access to VM state) but means the sample timing can be delayed. Without correction, this delay would skew the results.

The Fix: Weight = Real Time

rperf uses actual elapsed time as sample weights — so delayed samples carry proportionally more weight, and the profile matches reality:

Timer (signal or thread)         VM thread (postponed job)
────────────────────────         ────────────────────────
  every 1/frequency sec:          at next safepoint:
    rb_postponed_job_trigger()  →   rperf_sample_job()
                                      time_now = read_clock()
                                      weight = time_now - prev_time
                                      record(backtrace, weight)

On Linux, the timer uses timer_create + signal delivery to a dedicated worker thread. On other platforms, a dedicated pthread with nanosleep is used.

If a safepoint is delayed, the sample carries proportionally more weight. The total weight equals the total time, accurately distributed across call stacks.

Modes

Mode Clock What it measures
cpu (default) CLOCK_THREAD_CPUTIME_ID CPU time consumed (excludes sleep/I/O)
wall CLOCK_MONOTONIC Real elapsed time (includes everything)

Use cpu to find what consumes CPU. Use wall to find what makes things slow (I/O, GVL contention, GC).

GVL and GC Labels

rperf hooks GVL and GC events to attribute non-CPU time. These are recorded as labels on samples rather than synthetic stack frames:

Label (key=value) Mode Meaning
%GVL=blocked wall only Off-GVL time (I/O, sleep, C extension releasing GVL)
%GVL=wait wall only Waiting to reacquire the GVL (contention)
%GC=mark cpu and wall Time in GC mark phase (wall time)
%GC=sweep cpu and wall Time in GC sweep phase (wall time)

Why rperf?

  • Accurate despite safepoints — Safepoint sampling is safer (no async-signal-safety issues), but normally inaccurate. rperf compensates with real time-delta weights, so profiles faithfully reflect where time is actually spent.
  • See the whole picture (wall mode) — GVL contention, off-GVL I/O, GC marking/sweeping — all attributed to the call stacks responsible, via sample labels.
  • Built-in viewer — Flamegraph, Top, Tags tabs with interactive tag filtering. No external tools needed to analyze profiles.
  • Low overhead — Signal-based timer on Linux (no extra thread). ~1–5 us per sample.
  • Zero code changes — Profile any Ruby program via CLI or environment variables. Drop-in for Rails, too.
  • perf-like CLIrecord, stat, report, diff — if you know Linux perf, you already know rperf.
  • Multi-process — automatically profiles forked/spawned Ruby child processes (e.g., Unicorn/Puma workers). Use --no-inherit to disable.

Limitations

  • Method-level only — no line-level granularity.
  • Ruby >= 3.4.0 — uses recent VM internals (postponed jobs, thread event hooks).
  • POSIX only — Linux, macOS. No Windows.

Output Formats

Format Extension Viewer
JSON (default) .json.gz rperf report (built-in viewer), Rperf.load, any JSON tool
pprof .pb.gz go tool pprof (requires Go), speedscope
collapsed .collapsed FlameGraph, speedscope
text .txt any text viewer

Format is auto-detected from extension, or set explicitly with --format.

License

MIT