ABProf

ABProf attempts to use simple A/B test statistical logic and apply it to the question, "which of these two programs is faster?"

Most commonly, you profile by running a program a certain number of times ("okay, burn it into cache for 100 iterations, then run it 5000 times and divide the total time by 5000"). Then, you make changes to your program and do the same thing again to compare.

Real statisticians inform us that there are a few problems with that approach :-)

We use Welch's t-test on a set of measured runtimes to determine how likely it is that the two programs really differ in speed. Once the P value is low enough, we give our current estimate of which is faster and by how much.
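
For intuition, here's a minimal Ruby sketch of the statistic such a test computes from two arrays of measured runtimes. This is illustrative only, not ABProf's actual implementation:

def welch_t(a, b)
  mean = ->(xs) { xs.sum(0.0) / xs.size }
  variance = ->(xs) {
    m = mean.call(xs)
    xs.sum(0.0) { |x| (x - m)**2 } / (xs.size - 1)   # sample variance
  }
  va = variance.call(a) / a.size
  vb = variance.call(b) / b.size
  t = (mean.call(a) - mean.call(b)) / Math.sqrt(va + vb)   # Welch's t statistic
  # Welch-Satterthwaite approximation of the degrees of freedom
  df = (va + vb)**2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
  [t, df]
end

The t statistic and degrees of freedom are then looked up against the t distribution to get the P value that drives the stopping decision.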

Want a nice "getting started" introduction? Here are the original blog posts on using ABProf.

For more about the subtleties of profiling in general, may I recommend Matthew Gaudet's wonderful talk "Ruby 3x3: How Are We Going to Measure 3x?" from RubyKaigi 2016?

Installation

Add this line to your application's Gemfile:

gem 'abprof'

And then execute:

$ bundle

Or install it yourself as:

$ gem install abprof

Usage

Quick Start - Run Two Programs

The simplest way to use ABProf is the "abcompare" command. Give it two commands, and let it run them for you and measure the results. If a command contains spaces, put it in quotes -- standard shell quoting.

$ abcompare "cd ../vanilla_ruby && ./tool/runruby.rb ../optcarrot/bin/optcarrot --benchmark ../optcarrot/examples/Lan_Master.nes >> /dev/null" \
  "cd ../alt_ruby && ./tool/runruby.rb ../optcarrot/bin/optcarrot --benchmark ../optcarrot/examples/Lan_Master.nes >> /dev/null"

This uses basic default settings (10 iterations of burn-in before measuring, a P value of 0.05, etc.), which you can change on the command line. Running this way is simple and straightforward, but it takes a little longer to converge since it pays the start-a-process tax every time it takes a measurement.

Run "abcompare --help" if you want to see what command-line options you can supply. For more control in the results, see below.

The abcompare command is identical to abprof except that it uses a raw command, not harness code. See below for details.
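
For instance, assuming abcompare accepts the same flags as the abprof invocations shown later in this README (bench_old.rb and bench_new.rb are hypothetical benchmark scripts):

$ abcompare --burnin=20 --max-trials=30 --pvalue 0.01 \
  "ruby bench_old.rb" "ruby bench_new.rb"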

Quick Start - Test Harness

Loading and running a program is slow, and it adds a lot of variable overhead. That can make it hard to sample the specific operations that you want to measure. ABProf prefers to just do the operations you want without restarting the worker processes constantly. That takes a bit of harness code to do well.

In Ruby, there's an ABProf library you can use which will take care of that interface. That's the easiest way to use it, especially since you're running a benchmark anyway and would need some structure around your code.

To profile a Ruby snippet very simply, do this:

require "abprof"

ABProf::ABWorker.iteration do
  # Code to measure goes here
  sleep 0.1
end

ABProf::ABWorker.start

With two such files, you can compare their speed.

Under the hood, ABProf's harness uses a simple communication protocol over STDIN and STDOUT to allow the controlling process to tell the workers to run iterations. Mostly that's great, but it means you'll need to make sure your worker processes aren't using STDIN for anything else.

See the examples directory for more. For instance:

abprof examples/sleep.rb examples/sleep.rb

If abprof is just in the source directory and not installed as a gem, prefix the command with RUBYLIB="lib" so Ruby can find the library.
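
For example:

RUBYLIB="lib" abprof examples/sleep.rb examples/sleep.rb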

Quick Start - Benchmark DSL

Want to make a benchmark reproducible? Want better accuracy? ABProf has a DSL (Domain-Specific Language) that can help here.

Here's a simple example:

require "abprof/benchmark_dsl"

ABProf.compare do
  warmup 10
  max_trials 5
  min_trials 3
  p_value 0.01
  iters_per_trial 2
  bare true

  report do
    10_000.times {}
  end

  report do
    sleep 0.1
  end

end

Note that "warmup" is a synonym for "burnin" here -- iterations done before ABProf starts measuring and comparing. The "report" blocks are run for the sample. You can also have a "report_command", which takes a string as an argument and uses that to take a measurement.

A Digression - Bare and Harness

"Harness" refers to ABProf's internal testing protocol, used to allow multiple processes to communicate. A "harness process" or "harness worker" means a second process that is used to take measurements, and can do so repeatedly without having to restart the process.

A "bare process" means one where the work is run directly. Either a new process is spawned for each measurement (slow, inaccurate) or a block is run in the same Ruby process (potential for inadvertent cross-talk.)

In general, for a "harness" process you'll need to put together a .rb file similar to examples/sleep.rb or examples/for_loop_10k.rb.

You can use the DSL above for either bare or harness processes ("bare true" or "bare false") without a problem. But if you tell it to use a harness, the process in question should be reading ABProf commands from STDIN and writing responses to STDOUT in ABProf protocol, normally by using the Ruby Test Harness library.

Don't Cross the Streams

Harness-enabled tests expect to run forever, fielding requests for work.

Non-harness-enabled tests don't know how to do harness stuff.

If you run things the wrong way (abcompare with a harness script, or abprof with a command that has no harness), you'll get either an immediate crash or a run that burns in forever without finishing, depending on which mistake you made.

Normally you'll handle this by just passing your command line directly to abcompare rather than packaging it up into a separate Ruby script.

Comparing Rubies

I'm AppFolio's Ruby fellow, so I'm writing this to compare two different locally-built Ruby implementations for speed. The easiest way to do that is to build them in multiple directories, then build a wrapper that uses that directory to run the program in question.

You can see examples such as examples/alt_ruby.rb and examples/vanilla_ruby.rb and so on in the examples directory of this gem.

Those examples use a benchmark called "optcarrot" which can be quite slow. So you'll need to decide whether to do a quick, rough check with a few iterations or a more in-depth check which runs many times for high certainty.

Here's a slow, very conservative check:

abprof --burnin=10 --max-trials=50 --min-trials=50 --iters-per-trial=5 examples/vanilla_ruby.rb examples/inline_ruby_1800.rb

Note that since the minimum and maximum trials are both 50, it won't stop early at a particular certainty (P value.) It will just run for 50 trials of 5 iterations each. It takes a while, but gives a pretty good estimate of how fast one is compared to the other.

Here's a quicker, rougher check:

abprof --burnin=5 --max-trials=10 --iters-per-trial=1 examples/vanilla_ruby.rb examples/inline_ruby_1800.rb

It may stop after only a few trials if the difference in speed is big enough. By default, it uses a P value of 0.05, which is (very roughly) a one in twenty chance of a false result.

If you want a very low chance of a false positive, consider adjusting the P value downward, to more like 0.001 (0.1% chance) or 0.00001 (0.001% chance.) This may require a lot of time to run, especially if the two programs are of very similar speed, or have a lot of variability in the test results.

abprof --burnin=5 --max-trials=50 --pvalue 0.001 --iters-per-trial=1 examples/sleep.rb examples/for_loop_10k.rb

How Many Times Faster?

ABProf will try to give you an estimate of how much faster one option is than the other. Be careful taking it at face value -- if you do a series of trials and coincidentally get a really different-looking run, that may give you an unexpected P value and an unexpected number of times faster.

In other words, those false positives will tend to happen together, not independently. If you want to actually check how much faster one is than the other in a less-biased way, set the number of trials and/or iterations very high, or manually run both yourself some large number of times, rather than letting it converge to a P value and then taking the result from the output.

See the first example under "Comparing Rubies" for one way to do this. Setting the min and max trials equal is good practice for this to reduce bias.

Does This Just Take Forever?

It's easy to accidentally specify a very large number of iterations per trial, or total trials, or otherwise make testing a slow program take forever. Right now, you'll pretty much just need to notice that it's happening and drop the iters-per-trial, the min-trials, or the P value. When in doubt, try to start with just a very quick, rough test.

Of course, if your test is really slow, or you're trying to detect a very small difference, it can just take a really long time. Like A/B testing, this method has its pitfalls.

More Control of Sampling

Would you like to explicitly return the value(s) to compare? You can replace the "iteration" block above with "iteration_with_return_value" to return a measurement of your choice. That allows you to do setup or teardown inside the block that isn't necessarily counted in the total time. You can also use a custom counter or timer rather than Ruby's Time.now, which is the default for ABProf.
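
Here's a sketch of that pattern, with a placeholder sort workload -- the setup runs inside the block but outside the timed region:

require "abprof"

ABProf::ABWorker.iteration_with_return_value do
  # Setup that should not count toward the measurement
  data = Array.new(50_000) { rand }

  t0 = Time.now
  data.sort             # the operation we actually want to time
  Time.now - t0         # the return value is this iteration's measurement
end

ABProf::ABWorker.start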

If you return a higher-is-better value like a counter rather than a lower-is-better value like a time, you'll find that ABProf keeps telling you which process has the lower values -- which may be the slower one, not the faster. ABProf can tell which one gets lower numbers, but it doesn't know whether lower means better or worse.

That's why the console output shows the word "faster?" with a question mark. It knows it's giving you lower. It hopes that means faster.

More Samples Per Trial

Would you like to control how the N iterations (default 10) per trial get run? Want to do setup or teardown before or after them as a group, not individually?

Replace the "iteration" block above with "n_iterations_with_return_value". Your block will take a single parameter N for the number of iterations - run the code that many times and return either a single measured speed or time, or an array of speeds or times, which will be your samples.

Note: this technique has some subtleties. Avoid using it to rapidly collect many, many samples of a very small performance difference: transient conditions like background processes can badly skew the results when many t-test samples are collected in a short time. You're much better off running the same operation many times and returning the cumulative value in those cases, or otherwise controlling for transient conditions that drift over time.

In those cases, either set the iters-per-trial very low (likely to 1) so that both processes are getting the benefit/penalty from transient background conditions, or set the number of iterations per trial very high so that each trial takes several seconds or longer, to allow transient conditions to pass.

ABProf also runs the two processes' iterations in a random order by default, starting from one process or the other based on a per-trial random number. This helps a little, but only a little. If you don't want ABProf to do that for some reason, turn on the static_order option to get simple "process1 then process2" order for every trial.
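
If you do turn it on, it might look like this in the DSL -- I'm assuming here that static_order is set the same way as the other DSL options shown earlier:

ABProf.compare do
  warmup 10
  static_order true   # fixed "process 1 then process 2" order every trial

  report do
    sleep 0.01
  end

  report do
    sleep 0.02
  end
end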

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Credit Where Credit Is Due

I feel like I maybe saw this idea (use A/B test math for a profiler) somewhere else before, but I can't tell if I really did or if I misunderstood or hallucinated it. Either way, why isn't this a standard approach that's built into most profiling tools?

After I started implementation I found out that optcarrot, used by the Ruby core team for profiling, is already using this technique (!) -- I am using it slightly differently, but I'm clearly not the first to think of using a statistics test to verify which of two programs is faster.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/appfolio/abprof. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.