
LLMBench

A standalone Ruby gem for benchmarking and comparing the performance of different Large Language Model providers and APIs.

Features

  • Support for both OpenAI and Anthropic-compatible API formats
  • Parallel execution across multiple models and providers
  • Continuous tracking with CSV export functionality
  • No external dependencies; uses only the Ruby standard library

Installation

Using Ruby (Recommended)

Important: This is a standalone executable gem, not a library for use in other applications. Install it system-wide:

gem install llm_bench

Do not add this gem to your application's Gemfile; it is designed to be used only as a command-line tool.

Using Docker

If you don't have Ruby installed or prefer containerized environments, you can use the Docker image:

# Build the Docker image
docker build -t llm_bench .

# Or use the pre-built image
docker pull vitobotta/llm-bench:v3

The Docker image includes everything needed to run llm_bench without installing Ruby locally.
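
To quickly verify the image, you can print the version; this assumes the pre-built image uses the llm_bench executable as its entrypoint, just as the locally built image does in the usage examples below:

docker run vitobotta/llm-bench:v3 --version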

Usage

Configuration

Create a configuration file named models.yaml in your current directory, or specify a custom path with the --config argument:

prompt: "Explain the concept of machine learning in simple terms in exactly 300 words..."

providers:
  - name: "openai"
    base_url: "https://api.openai.com/v1"
    api_key: "your-api-key-here"
    models:
      - nickname: "gpt-4"
        id: "gpt-4"
        api_format: "openai"

  - name: "anthropic"
    base_url: "https://api.anthropic.com"
    api_key: "your-api-key-here"
    models:
      - nickname: "claude"
        id: "claude-3-sonnet-20240229"
        api_format: "anthropic"
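
Because the gem targets OpenAI- and Anthropic-compatible API formats rather than those two vendors specifically, you should be able to point base_url at any compatible endpoint. A hypothetical sketch (the local server, model id, and nickname below are illustrative, not real defaults):

  - name: "local"
    base_url: "http://localhost:8080/v1"
    api_key: "not-needed"   # placeholder; many local OpenAI-compatible servers ignore the key
    models:
      - nickname: "local-llama"
        id: "llama-3-8b-instruct"
        api_format: "openai"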

Commands

Benchmark a single model:

llm_bench --config ./my-config.yaml --provider openai --model gpt-4

Benchmark all configured models:

llm_bench --all

Benchmark all models with custom config:

llm_bench --config ./my-config.yaml --all

Enable continuous tracking:

llm_bench --config ./my-config.yaml --all --track

Enable continuous tracking with custom interval (default is 600 seconds):

llm_bench --config ./my-config.yaml --all --track --interval-in-seconds 300

Enable continuous tracking with custom output file:

llm_bench --config ./my-config.yaml --all --track --output-file ./results/benchmark_results.csv

Print full responses:

llm_bench --config ./my-config.yaml --provider openai --model gpt-4 --print-result

Show version information:

llm_bench --version

Note: If no --config argument is provided, llm_bench looks for models.yaml in the current directory and displays an error if no configuration file is found. When using --track, you can optionally specify --interval-in-seconds to control how often benchmark cycles run (default: 600 seconds) and --output-file to set the CSV output path (default: llm_benchmark_results_TIMESTAMP.csv in the current directory).

Docker Usage

When using Docker, you need to mount your configuration file and any output directories:

# Benchmark a single model with Docker
docker run -v $(pwd)/my-config.yaml:/data/models.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --provider openai --model gpt-4

# Benchmark all models with Docker
docker run -v $(pwd)/models.yaml:/data/models.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --all

# Enable continuous tracking with Docker
docker run -v $(pwd)/models.yaml:/data/models.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --all --track

# Enable continuous tracking with custom interval (5 minutes) using Docker
docker run -v $(pwd)/models.yaml:/data/models.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --all --track --interval-in-seconds 300

# Enable continuous tracking with custom output file using Docker
docker run -v $(pwd)/models.yaml:/data/models.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --all --track --output-file /data/results/custom_benchmark.csv

The Docker container uses /data as its working directory. Mount your config file to /data/models.yaml (or pass the mounted path via the --config argument), and mount any directories where you want to save output files.
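
For example, if you mount the config under a different name, pass the mounted path explicitly (the file names here are illustrative):

docker run -v $(pwd)/my-config.yaml:/data/custom.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --config /data/custom.yaml --all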

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.
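
In shell form:

bin/setup      # install development dependencies
bin/console    # open an interactive prompt (IRB) to experiment with the code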

To build and install the gem locally:

gem build llm_bench.gemspec
gem install ./llm_bench-0.1.0.gem

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/vitobotta/llm-bench.

License

The gem is available as open source under the terms of the MIT License.