Kino
Kino is a high-performance Ractor web server for Ruby 4.0+.
Ruby threads cannot run Ruby code in parallel, so production setups fork a process per core and pay for each copy in memory. Kino runs your code on every core in one small process. A Rust (tokio + hyper) front-end owns the network, parallel Ractors run your Rack 3 app, and a threaded fallback mode runs everything else, Rails included.
- Fast. On a real 8-core server, every Kino mode is 1.5-2× ahead of a Puma fork cluster on I/O-light endpoints. Ractor mode also wins on pure CPU, 30%+. Benchmarks below.
- A fraction of the memory. One process instead of a fork per core: about 15× less memory than the Puma cluster under the same load, and 8× less when serving the Rails hello-world.
- Parallel without forking. Ractor mode runs CPU work more than 5× faster than Kino's own GVL-bound threaded mode, in the same small process.
- Production plumbing included. Graceful drain, crash supervision and respawn, bounded queues with 503 backpressure, request timeouts, TLS (rustls), live stats, async access and app logging.
- Tells you why. kino --check lists exactly what blocks your app from ractor mode, finding by finding, so you do not have to decode Ractor::IsolationError yourself.
- Puma-shaped. The same workers × threads topology, a familiar config DSL, a kino CLI. If you can run Puma, you can run Kino.
N.B.: Ractors are officially experimental in Ruby 4.0, and so is this server. The threaded mode is solid. Still, Kino aims to be the best way to experiment with Ractors today—and the best Ractor server when they become stable.
Table of Contents
- Why
- Benchmarks
- Install
- Usage
- Config file and CLI
- kino --check
- Request timeouts
- Stats
- Logging
- Timer waits
- Rack 3 compliance
- Rails
Why
The GVL allows only one Ruby thread to run at a time. To use all cores,
Ruby servers fork processes, and every fork costs a full copy of the
app. Ractors do not have this limit: each one has its own lock, so one
process can run Ruby in parallel. What was missing is a server that
dispatches requests to them. Ruby 4.0 reworked Ractors (Ractor::Port,
shareable_proc, less lock contention) and made this worth building.
Why a Ractor server has to be built this way, and which Rust parts make Ractors fast here: doc/why-kino.md. The full design notes live in doc/architecture.md.
Benchmarks
Measured on a real server: AWS c7a.2xlarge (8-core AMD EPYC 9R14, 16 GB, Amazon Linux 2023). This is a realistic app-server size. The same Ractor-shareable app runs on every server, Ruby 4.0.5 with YJIT, every server at its defaults: Puma forks 8 workers × 3 threads, Kino stays in one process (8 workers; 1 thread each in ractor modes, 3 in threaded). Numbers are req/s by wrk (8-second windows, 64 connections, same host). Methodology and the analysis behind every column: doc/benchmarks.md.
| endpoint | Kino :ractor | + lanes | :ractor, workers 32² | Kino :threaded | Puma (cluster) |
|---|---|---|---|---|---|
| /plaintext | 229,565 | 244,340 | 156,118 | 217,619 | 118,190 |
| /10k | 179,119 | 188,258 | 134,457 | 157,147 | 105,588 |
| /cpu (fib) | 76,922¹ | 73,136 | 62,406 | 13,499 | 58,337 |
| /io (5 ms) | 1,548 | 1,548 | 5,935 | 4,715 | 4,687 |
| /io_native | 1,570 | 1,571 | 6,289 | 4,717 | 4,695 |
Memory on the same box, RSS after sustained load:
| serving | Kino (one process) | Puma cluster (8 workers) |
|---|---|---|
| bench app, :ractor | 80 MB | 1,256 MB |
| bench app, :threaded | 151 MB³ | 1,256 MB |
| Rails hello-world | 97 MB | 797 MB |
"+ lanes" is the experimental per-worker-queue dispatcher (lanes true).
It posts the fastest plaintext/10k of any configuration here. Details:
doc/benchmarks.md.
¹ Stock settings, no tuning. Ractor mode beats the fork cluster on pure
CPU by 32% (25% with lanes). Threaded mode shows the GVL ceiling that
every single-process Ruby server hits. The old CPU-tuning recipe
(threads 1 plus tokio_threads 1) is retired: threads 1 is now the
default, and tokio_threads 1 costs 12% of throughput on real hardware;
see doc/benchmarks.md.
² Wait-bound throughput is slots ÷ wait, and the default columns bring
8 single-thread workers against the cluster's 24 threads. Kino's slots
are threads, not processes, so when your app waits a lot, raise workers.
The workers 32 column is that tuning: +27% over the cluster on /io
(+34% via Kino.sleep on /io_native) while still ahead of it on pure CPU,
all in one small process. The cost is the CPU-light rows (32 ractors
oversubscribe 8 cores); pick the topology your app's wait profile
needs. See doc/benchmarks.md.
³ With MALLOC_ARENA_MAX=2 (the standard Ruby deployment setting;
Heroku's default). Without it, 24 threads churning 10 KB responses
through one glibc heap balloon to ~600 MB—an arena-fragmentation
footgun, not a leak, and ractor mode sidesteps it. See
doc/benchmarks.md.
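Footnote ²'s rule of thumb is easy to check by hand. The sketch below computes the ceiling for the 5 ms /io endpoint from each column's slot count. wait_bound_ceiling is a hypothetical helper, and the model ignores CPU time, queueing and network overhead. The measured numbers in the table (1,548, 4,715, 4,687 and 5,935 req/s) sit below these ceilings.

```ruby
# Back-of-envelope ceiling for wait-bound endpoints (footnote ²):
# throughput ≈ concurrent slots ÷ seconds each request spends waiting.
def wait_bound_ceiling(slots, wait_seconds)
  slots / wait_seconds
end

WAIT = 0.005 # the /io endpoint waits 5 ms per request

{
  "Kino :ractor (8 workers × 1 thread)" => 8,
  "Kino :threaded (8 workers × 3 threads)" => 24,
  "Puma cluster (8 workers × 3 threads)" => 24,
  "Kino :ractor, workers 32" => 32,
}.each do |label, slots|
  printf("%-40s ~%6.0f req/s ceiling\n", label, wait_bound_ceiling(slots, WAIT))
end
```

Tripling the slots roughly triples the ceiling, which is why raising workers is the lever for apps that mostly wait.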
A common first idea is to keep your current server and wrap the app in a ractor pool. We measured that too (same box; the analysis is in the doc):
| endpoint | Kino :ractor (8×3) | Puma + ractor wrapper | Falcon + ractor wrapper |
|---|---|---|---|
| /plaintext | 199,032 | 19,532 | 100,342 |
| /cpu (fib) | 68,238 | 17,323 | 48,561 |
| /io (5 ms) | 4,531 | 1,452 | 1,544 |
In short: ractor mode beats fork-level CPU parallelism (5.7× Kino's own GVL-bound threaded mode, +32% over the cluster) in one process, at about 1/16th of the cluster's memory. Every Kino mode is 1.5-2.1× ahead of the cluster on I/O-light endpoints. The macOS numbers (secondary; everything there hits the loopback ceiling) and the YJIT × Ractors gotcha are in doc/benchmarks.md.
Reproduce: bench/run.sh [seconds] [concurrency] for the main table,
bench/studies.sh for the follow-ups (CPU recipe, topology, scaling,
logging, memory).
Install
You need Ruby >= 4.0. Add Kino to your application's bundle:
```sh
bundle add kino   # or: gem install kino (outside a bundle)
```
Or put it in the Gemfile yourself:
```ruby
gem "kino", "~> 0.1"
```
Then generate a config and serve:
```sh
bundle exec kino --init   # writes kino.rb; every directive documented in place
bundle exec kino          # picks up config.ru + kino.rb, serves on :9292
```
(After a standalone gem install, the kino command works without bundle exec.)
No Rust compiler needed: released versions ship precompiled native gems for Linux (x86_64/aarch64, glibc and musl) and macOS (arm64). On other platforms the gem compiles at install time; that needs a Rust toolchain, plus clang/libclang on Linux.
Usage
```ruby
require "kino"

# Ractor mode needs a Ractor-shareable app: capture nothing, freeze config.
app = Ractor.shareable_proc do |env|
  [200, { "content-type" => "text/plain" }, ["Hello from #{Ractor.current}"]]
end

Kino::Server.run(app, port: 9292) # traps INT/TERM; Ctrl-C drains gracefully
```
Or embedded, with everything spelled out:
```ruby
require "etc" # for Etc.nprocessors

server = Kino::Server.new(app,
  bind: "127.0.0.1",
  port: 9292,                 # 0 = ephemeral; read back via server.port
  workers: Etc.nprocessors,   # ractors (parallelism)
  threads: 1,                 # per worker; ractor default 1, threaded default 3
  mode: :auto,                # :auto | :ractor | :threaded
  queue_depth: 1024,          # bounded queue; overflow → 503
  queue_timeout: 5.0,         # seconds before 503 on a full queue
  request_timeout: nil,       # seconds before a slow response becomes a 504 (nil = off)
  shutdown_timeout: 30,       # drain deadline
  tls: { cert: "cert.pem", key: "key.pem" }, # file paths or inline PEM
)
server.start
server.shutdown # graceful: drain → deadline → abort stragglers
```
Modes
- :ractor: workers Ractors × threads Threads each. The app must be Ractor.shareable? (frozen middleware, shareable_proc endpoints). Forcing :ractor with an unshareable app raises Kino::UnshareableAppError. A crashed ractor returns 500 to its in-flight requests right away, then respawns.
- :threaded: the same machinery on workers × threads plain Threads. Runs any Rack app, including Rails, today. Parallel for I/O, serialized by the GVL for CPU.
- :auto (default): :ractor when the app is shareable, otherwise a warning and :threaded. One caveat: a class used as a Rack app always counts as "shareable" (classes are), even if calling it touches unshareable state. Force :threaded for those.
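Core Ruby's Ractor.shareable? shows both the :auto rule and its class caveat. This is a minimal sketch under stated assumptions: pick_mode, GreetingApp and CachingApp are hypothetical names, pick_mode only mirrors the rule as described (it is not Kino's implementation), and Ractor.make_shareable stands in for "freeze config at boot".

```ruby
# Hypothetical helper mirroring the :auto rule above; Kino's own check may differ.
def pick_mode(app)
  Ractor.shareable?(app) ? :ractor : :threaded
end

class GreetingApp
  def initialize(config)
    @config = config
  end

  def call(_env)
    [200, { "content-type" => "text/plain" }, [@config[:greeting]]]
  end
end

# Deep-freezes the instance and its config Hash, so any ractor may share it.
frozen_app = Ractor.make_shareable(GreetingApp.new({ greeting: "hello" }))

# Same class, but the instance and its Hash stay mutable: not shareable.
mutable_app = GreetingApp.new({ greeting: "hello" })

# The caveat: a class is always shareable, even when its class-level state is not.
class CachingApp
  @cache = {} # unshareable Hash held in a class-level instance variable

  def self.call(_env)
    # Touching @cache from a non-main ractor raises Ractor::IsolationError.
    @cache[:hits] = @cache.fetch(:hits, 0) + 1
    [200, { "content-type" => "text/plain" }, ["ok"]]
  end
end

p pick_mode(frozen_app)  # :ractor
p pick_mode(mutable_app) # :threaded
p pick_mode(CachingApp)  # :ractor, yet it fails on its first request in a worker ractor
```

The last line is why forcing :threaded is the advice for class-style apps with class-level state.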
Config file and CLI
Settings can live in a Puma-style Ruby DSL file. Precedence: explicit kwargs and CLI flags > config file > defaults.
```ruby
# kino.rb
port 9292
workers 8
threads 1
mode :ractor
```
```sh
kino --init    # write a fully commented sample kino.rb
kino           # config.ru + kino.rb, port 9292
kino --check   # explain whether the app can run in :ractor mode
kino -C config/kino.rb -p 3000 -w 4 -m ractor my_app.ru
```
The generated sample documents every directive, including the Rails settings and the performance notes.
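The precedence rule (flags over file over defaults) can be modeled with a tiny Puma-style DSL. This is a sketch, not Kino's loader: ConfigDSL, DEFAULTS and resolve_config are hypothetical names, only four directives are modeled, and the flags are assumed to be parsed into a Hash already.

```ruby
require "etc"

# Hypothetical loader: evaluates a Puma-style DSL into a settings Hash, then
# merges defaults < config file < CLI flags. Not Kino's own loader.
class ConfigDSL
  DIRECTIVES = %i[port workers threads mode].freeze

  attr_reader :settings

  def initialize
    @settings = {}
  end

  # Each directive becomes a method, so `port 9292` calls port(9292).
  DIRECTIVES.each do |name|
    define_method(name) { |value| @settings[name] = value }
  end
end

DEFAULTS = { port: 9292, workers: Etc.nprocessors, mode: :auto }.freeze

def resolve_config(file_source, cli_flags)
  dsl = ConfigDSL.new
  dsl.instance_eval(file_source, "kino.rb") # a real loader would File.read the file
  DEFAULTS.merge(dsl.settings).merge(cli_flags) # later merges win
end

KINO_RB = <<~RUBY
  port 9292
  workers 8
  threads 1
  mode :ractor
RUBY

# Models `kino -C kino.rb -p 3000 -w 4 -m ractor`: the flags override the file.
config = resolve_config(KINO_RB, { port: 3000, workers: 4, mode: :ractor })
p config[:port], config[:threads] # 3000 from the flags, 1 from the file
```

Evaluating the file with instance_eval is what lets plain method calls like workers 8 read as directives.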
kino --check
When an app cannot run in :ractor mode, Kino can tell you why, instead
of leaving you with a bare Ractor::IsolationError. The check changes
nothing (it does not freeze your objects) and names each blocker:
captured variables with the place they were defined, instance variables
by path, and the class-level instance variable trap that catches
class-style apps:
```text
$ kino --check
check: app is NOT Ractor-shareable
- app (Proc at app.rb:12)—captures `cache` = {} (Hash) (unshareable)
- app (HelloApp).@instance—class-level ivar holds #<HelloApp…>—classes
pass Ractor.shareable?, but reading this from a worker ractor raises
Ractor::IsolationError on the first request
hints: freeze config at boot; build endpoints with Ractor.shareable_proc;
keep per-worker resources in Ractor.store_if_absent; or run mode :threaded.
```
Exit status is 0/1, so it works in CI. The programmatic form is
Kino::Check.report(app).
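To show what reporting "instance variables by path" involves, here is a much-simplified, hypothetical finder built on Ractor.shareable?. It is not Kino::Check. It follows instance variables only (not Hash or Array contents, not captured Proc variables), and it reports only unfrozen objects. unshareable_paths, Router and App are names made up for this sketch.

```ruby
# Hypothetical, much-simplified blocker finder; NOT Kino::Check.
# Walks instance variables and reports each unfrozen object by its path.
def unshareable_paths(obj, path = "app", seen = {}.compare_by_identity)
  return [] if Ractor.shareable?(obj) || seen.key?(obj)

  seen[obj] = true # guards against reference cycles
  findings = []
  findings << "#{path} (#{obj.class}) is not frozen" unless obj.frozen?
  obj.instance_variables.each do |ivar|
    child = obj.instance_variable_get(ivar)
    findings.concat(unshareable_paths(child, "#{path}.#{ivar}", seen))
  end
  findings
end

class Router
  def initialize
    @routes = { "/" => :home } # mutable Hash: a blocker
    @name = "router".freeze    # frozen String: shareable, not reported
    freeze
  end
end

class App
  def initialize
    @router = Router.new
    @cache = {} # mutable Hash: a blocker
    freeze
  end
end

puts unshareable_paths(App.new)
```

Both the App and Router objects are frozen, yet the app is still unshareable because of two Hashes it reaches. This is the kind of finding a path makes easy to act on.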
Request timeouts
request_timeout: seconds (or request_timeout 30 in kino.rb) limits
how long the app may take to produce a response. Past the deadline the
client gets an immediate 504 while the handler keeps running; its
late response is dropped without harm. Off by default. The handler is
deliberately not killed, because interrupting arbitrary Ruby mid-flight
is unsafe. A stuck handler still occupies its worker slot until it
returns, so set the deadline above your slowest legitimate endpoint and
watch stats[:timeouts].
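The policy (answer 504 at the deadline, let the handler finish, discard its result) can be modeled with plain Threads. This is a minimal sketch that assumes one thread per request purely for illustration. Kino enforces the deadline in its native layer and reuses worker slots, and call_with_deadline is a hypothetical name.

```ruby
# Hypothetical call_with_deadline: a thread-per-request model of the policy
# above. Kino enforces the deadline natively and reuses worker slots instead.
def call_with_deadline(app, env, timeout)
  response = nil
  handler = Thread.new { response = app.call(env) }
  # Thread#join(limit) returns nil if the thread is still running at the limit.
  return response if handler.join(timeout)

  # Deadline passed: answer 504 now. The handler is deliberately not killed;
  # whatever it returns later is never read.
  [504, { "content-type" => "text/plain" }, ["Gateway Timeout\n"]]
end

slow_app = lambda do |_env|
  sleep 0.2 # stands in for a slow query
  [200, { "content-type" => "text/plain" }, ["late\n"]]
end

status, = call_with_deadline(slow_app, {}, 0.05)
puts status # 504
```

In the sketch the slow thread keeps running after the 504, which mirrors the README's warning: a stuck handler still consumes a slot until it returns.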
Stats
server.stats returns a live snapshot: the configuration plus counters
from the native layer (one relaxed atomic per request, no measurable
cost):
```ruby
server.stats
# => {mode: :ractor, lanes: false, workers: 8, threads: 1, batch: 1,
#     respawns: 0, queued: 0, in_flight: 2, served: 1041, rejected: 0,
#     timeouts: 0}
#    plus lane_depths: [...] when lane dispatch is on
```
From the outside, kill -USR1 <pid> prints the same snapshot as one line (pair it with pidfile to find the pid):
```text
Kino stats: mode=:ractor lanes=false workers=8 threads=1 batch=1 respawns=0 queued=0 in_flight=2 served=1041 rejected=0 timeouts=0
```
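If you scrape that line from logs, a small parser turns it back into a Hash. This is a sketch: parse_kino_stats is a hypothetical helper, it assumes the space-separated key=value format in the sample above, and it leaves any other value shape (such as lane_depths) as a String.

```ruby
# Hypothetical helper: parses the one-line USR1 snapshot into a Hash.
# Assumes the key=value format shown above; unknown shapes stay Strings.
def parse_kino_stats(line)
  line.delete_prefix("Kino stats:").split.to_h do |pair|
    key, value = pair.split("=", 2)
    parsed =
      case value
      when /\A\d+\z/       then value.to_i
      when "true", "false" then value == "true"
      when /\A:\w+\z/      then value.delete_prefix(":").to_sym
      else value
      end
    [key.to_sym, parsed]
  end
end

SAMPLE = "Kino stats: mode=:ractor lanes=false workers=8 threads=1 batch=1 " \
         "respawns=0 queued=0 in_flight=2 served=1041 rejected=0 timeouts=0"

stats = parse_kino_stats(SAMPLE)
busy = stats[:in_flight].fdiv(stats[:workers] * stats[:threads]) # share of slots in use
```

A rising rejected or timeouts count between two snapshots is the signal worth alerting on; the slot ratio shows how close the server is to queueing.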
Logging
With one log line per request, Kino::Logger sustained 2.4× the
throughput of a shared ::Logger (149k vs 63k req/s on the benchmark
box). There are two native pieces. Both write through a lock-free
channel to a Rust flusher thread, so request threads never take a log
mutex and never make a write syscall:
- Access log (log_requests true): one line per request to stdout, including the 503s that never reach your app. Recommended in development; cheap enough for production. On color terminals the lines are tinted by status class: 2xx green, 3xx yellow, 4xx maroon, 5xx bright red.

  ```text
  127.0.0.1 [Tue, 10 Jun 2026 13:39:56 GMT] "GET / HTTP/1.1" 200 0.1ms
  ```
- Kino::Logger: a ::Logger over the same async sink, for your app's own logging (Kino::Logger.new("log/production.log"), or no argument for stdout). The raw IO-like device is Kino::Logger::Device, for integrations that want bytes without ::Logger formatting. The device is frozen and Ractor-shareable, so one device serves every worker.
Kino::Logger in a Rails app: it is a real ::Logger subclass, so
it fits anywhere Rails expects a logger:
```ruby
# config/environments/production.rb, simplest forms:
config.logger = Kino::Logger.new                       # stdout
config.logger = Kino::Logger.new("log/production.log") # file

# both file and stdout:
config.logger = ActiveSupport::BroadcastLogger.new(
  Kino::Logger.new("log/production.log"), Kino::Logger.new
)

# tagged logging wraps it like any ::Logger:
config.logger = ActiveSupport::TaggedLogging.new(Kino::Logger.new)
```
From a plain Rack app, give middleware the logger, or hand Rack::CommonLogger the raw device (it just calls write):
```ruby
# config.ru
use Rack::CommonLogger, Kino::Logger::Device.new # access-style app log
run MyApp
```
(If you only want request lines, prefer Kino's own log_requests true. It is free for your Ruby threads, and it also sees the 503s that never reach Rack.)
Graceful shutdown drains both logs fully. A hard crash can lose the tail of the buffer, and when you log faster than the disk can take (over 100k lines/s), the sink drops lines instead of blocking request threads. These trade-offs are measured in doc/benchmarks.md.
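The flow behind these trade-offs (request threads hand lines to a buffer, one background thread does all the writing, and a full buffer drops lines instead of blocking) can be modeled in pure Ruby. This is a hypothetical sketch, AsyncDevice, and not Kino's sink: Ruby's Thread::SizedQueue uses a mutex internally, so unlike Kino's Rust channel it is not lock-free. It only illustrates the flow and the drop-on-overflow choice.

```ruby
require "logger"
require "stringio"

# Hypothetical AsyncDevice: a pure-Ruby model of the sink described above.
# Request threads only enqueue; one background thread performs every write.
# Unlike Kino's Rust channel this is NOT lock-free (SizedQueue uses a mutex).
class AsyncDevice
  def initialize(io, capacity: 10_000)
    @io = io
    @queue = Thread::SizedQueue.new(capacity)
    @flusher = Thread.new do
      # pop returns nil once the queue is closed and fully drained
      while (line = @queue.pop)
        @io.write(line)
      end
    end
  end

  # Never blocks the caller: if the buffer is full, the line is dropped.
  def write(line)
    @queue.push(line, true)
    line.bytesize
  rescue ThreadError # raised by a non-blocking push on a full queue
    0
  end

  # Graceful shutdown: flush everything already queued, then stop the flusher.
  def close
    @queue.close
    @flusher.join
  end
end

out = StringIO.new
device = AsyncDevice.new(out)
logger = Logger.new(device) # ::Logger accepts any object with write and close
logger.info("request handled")
device.close
```

Closing drains the queue before the flusher exits, which matches the graceful-shutdown behaviour above; a process killed before close loses whatever is still queued.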
Timer waits
Kino.sleep(seconds) is a high-resolution sleep on the OS clock with
the GVL released. MRI's own sleep wakes up late inside non-main
ractors (details and numbers in doc/benchmarks.md).
Use Kino.sleep for explicit timer waits in handlers. Ordinary blocking
I/O does not need it.
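To see timer lateness on your own machine, a monotonic-clock measurement is enough. This is a sketch: sleep_lateness is a hypothetical helper, and it measures whatever thread or ractor calls it. Run directly as below, it only measures the main thread, which is not where the text says MRI's sleep runs late.

```ruby
# Average overshoot of a timer wait, measured on the monotonic clock.
# sleeper: any callable that takes seconds (Kernel#sleep by default).
def sleep_lateness(seconds, sleeper: ->(s) { sleep(s) }, samples: 20)
  overshoots = Array.new(samples) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleeper.call(seconds)
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started - seconds
  end
  overshoots.sum / samples
end

avg = sleep_lateness(0.005)
puts format("sleep(0.005) returned %.3f ms late on average", avg * 1000)
```

Under Kino you could pass sleeper: ->(s) { Kino.sleep(s) } and call the helper from a handler running in a worker ractor to compare the two.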
Rack 3 compliance
The spec suite runs every test app under Rack::Lint over real sockets:
streaming request bodies (forward-only rack.input), enumerable and
callable (full-duplex stream) response bodies, lowercase and multi-value
headers, HEAD/204 semantics. Full hijack is left out on purpose; it is
optional in Rack 3.
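For reference, a callable (streaming) response body under the Rack 3 SPEC looks like this: the server calls body.call(stream) with an IO-like stream instead of iterating body.each. Header names are lowercase, and a header value may be an Array of Strings for multiple values. This is a minimal sketch. StreamingApp is a hypothetical name, and the test drives the body with a StringIO in place of a real socket.

```ruby
# Minimal Rack 3 app with a callable (streaming) body, per the Rack 3 SPEC:
# the server calls body.call(stream) instead of iterating body.each.
class StreamingApp
  def call(_env)
    body = proc do |stream|
      3.times { |i| stream.write("chunk #{i}\n") }
    ensure
      stream.close # the body owns the stream and must close it
    end

    headers = {
      "content-type" => "text/plain",  # Rack 3: header names are lowercase
      "set-cookie" => ["a=1", "b=2"],  # Rack 3: multi-value header as an Array
    }
    [200, headers, body]
  end
end
```

Because the body closes the stream itself, the server does not need to know when the response ends; this is what enables full-duplex streaming.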
Rails
Rails (edge) runs on Kino today in :threaded mode; see
examples/rails-hello. Ractor-mode Rails is blocked upstream. The exact
blockers, the Ruby::Box findings, and what would unlock it are written
up in doc/rails-on-ractors.md. The example
ships a probe script that re-tests against whatever Rails you bundle.
Development
```sh
bin/setup
bundle exec rake                                  # compile, Rust tests, specs, RBS, lint
RB_SYS_CARGO_PROFILE=dev bundle exec rake compile # fast dev rebuilds
```
Assisted by
Claude Code (Mythos, Opus).
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/yaroslav/kino.
License
The gem is available as open source under the terms of the MIT License.