ChronoMachines
The temporal manipulation engine that rewrites the rules of retry!
A sophisticated Ruby implementation of exponential backoff and retry mechanisms, built for temporal precision in distributed systems where time itself is your greatest ally.
Quick Start
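# In your Gemfile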
gem 'chrono_machines'
class PaymentService
include ChronoMachines::DSL
chrono_policy :stripe_payment, max_attempts: 5, base_delay: 0.1, multiplier: 2
def charge(amount)
with_chrono_policy(:stripe_payment) do
Stripe::Charge.create(amount: amount)
end
end
end
# Or use it directly
result = ChronoMachines.retry(max_attempts: 3) do
perform_risky_operation
end
A Message from the Time Lords
So your microservices are failing faster than your deployment pipeline can recover, and you're stuck in an infinite loop of "let's just add more retries"?
Welcome to the temporal wasteland—where every millisecond matters, exponential backoff is law, and jitter isn't just a feeling you get when watching your error rates spike.
Still here? Excellent. Because in the fabric of spacetime, nobody can hear your servers screaming about cascading failures. It's all just timing and patience.
The Pattern Time Forgot
Built for Ruby 3.2+ with fiber-aware sleep and full jitter implementation, because when you're manipulating time itself, precision matters.
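In practice, "fiber-aware" means that on Ruby 3.x, Kernel#sleep yields to a fiber scheduler (such as the async gem's) when one is installed, so a backoff delay suspends only the current task instead of blocking the whole thread. A minimal sketch of the idea - illustrative only, not the gem's internals, and backoff_sleep is a made-up name:
def backoff_sleep(seconds)
  # With a fiber scheduler installed (e.g. via the async gem), sleep
  # suspends only this fiber; without one, it blocks the thread as usual.
  sleep(seconds)
end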
Features
- Temporal Precision - Full jitter exponential backoff with microsecond accuracy
- Advanced Retry Logic - Configurable retryable exceptions and intelligent failure handling
- Rich Instrumentation - Success, retry, and failure callbacks with contextual data
- Fallback Mechanisms - Execute alternative logic when all retries are exhausted
- Declarative DSL - Clean policy definitions with builder patterns
- Fiber-Safe Operations - Async-aware sleep handling for modern Ruby
- Custom Exceptions - MaxRetriesExceededError with original exception context
- Policy Management - Named retry policies with inheritance and overrides
- Robust Error Handling - Interrupt-preserving sleep with graceful degradation
The Temporal Manifesto
You Think: "I'll just add retry and call it resilience!"
Reality: You're creating a time paradox that crashes your entire fleet.
When your payment service fails, you don't want to hammer it into submission. You want to approach it like a time traveler—carefully, with exponential patience, and a healthy respect for the butterfly effect.
Core Usage Patterns
Direct Retry with Options
# Simple retry with exponential backoff
result = ChronoMachines.retry(max_attempts: 3, base_delay: 0.1) do
fetch_external_data
end
# Advanced configuration
result = ChronoMachines.retry(
max_attempts: 5,
base_delay: 0.2,
multiplier: 3,
max_delay: 30,
retryable_exceptions: [Net::OpenTimeout, HTTPError], # HTTPError: stand-in for your app's error class
on_retry: ->(exception:, attempt:, next_delay:) {
Rails.logger.warn "Retry #{attempt}: #{exception.message}, waiting #{next_delay}s"
},
on_failure: ->(exception:, attempts:) {
Metrics.increment('api.retry.exhausted', tags: ["attempts:#{attempts}"])
}
) do
external_api_call
end
Policy-Based Configuration
# Configure global policies
ChronoMachines.configure do |config|
config.define_policy(:aggressive, max_attempts: 10, base_delay: 0.01, multiplier: 1.5)
config.define_policy(:conservative, max_attempts: 3, base_delay: 1.0, multiplier: 2)
config.define_policy(:database, max_attempts: 5, retryable_exceptions: [ActiveRecord::ConnectionNotEstablished])
end
# Use named policies
result = ChronoMachines.retry(:database) do
User.find(user_id)
end
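Because a policy is ultimately just a hash of options, inheritance and overrides can be expressed with plain hash merging (a sketch; the :reporting policy names are hypothetical):
# Layer one policy on top of another via Hash#merge (illustrative)
base_options = { max_attempts: 5, base_delay: 0.1, multiplier: 2 }
ChronoMachines.configure do |config|
  config.define_policy(:reporting, **base_options)
  config.define_policy(:reporting_slow, **base_options.merge(base_delay: 1.0))
end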
DSL Integration
class ApiClient
include ChronoMachines::DSL
# Define policies at class level
chrono_policy :standard_api, max_attempts: 5, base_delay: 0.1, multiplier: 2
chrono_policy :critical_api, max_attempts: 10, base_delay: 0.05, max_delay: 5
def fetch_user_data(id)
with_chrono_policy(:standard_api) do
api_request("/users/#{id}")
end
end
def emergency_shutdown
# Use inline options for one-off scenarios
with_chrono_policy(max_attempts: 1, base_delay: 0) do
shutdown_api_call
end
end
end
Advanced Temporal Mechanics
Callback Instrumentation
# Monitor retry patterns
policy_options = {
max_attempts: 5,
on_success: ->(result:, attempts:) {
Metrics.histogram('operation.attempts', attempts)
Rails.logger.info "Operation succeeded after #{attempts} attempts"
},
on_retry: ->(exception:, attempt:, next_delay:) {
Metrics.increment('operation.retry', tags: ["attempt:#{attempt}"])
Honeybadger.notify(exception, context: { attempt: attempt, next_delay: next_delay })
},
on_failure: ->(exception:, attempts:) {
Metrics.increment('operation.failure', tags: ["final_attempts:#{attempts}"])
PagerDuty.trigger("Operation failed after #{attempts} attempts: #{exception.message}")
}
}
ChronoMachines.retry(policy_options) do
critical_operation
end
Exception Handling
begin
ChronoMachines.retry(max_attempts: 3) do
risky_operation
end
rescue ChronoMachines::MaxRetriesExceededError => e
# Access the original exception and retry context
Rails.logger.error "Failed after #{e.attempts} attempts: #{e.original_exception.message}"
# The original exception is preserved
case e.original_exception
when Net::OpenTimeout
handle_timeout_failure
when HTTPError # stand-in for your app's error class
handle_http_failure
end
end
Fallback Mechanisms
# Execute fallback logic when retries are exhausted
ChronoMachines.retry(
max_attempts: 3,
on_failure: ->(exception:, attempts:) {
# on_failure runs before MaxRetriesExceededError is raised - it never swallows the error
Rails.cache.write("fallback_data_#{user_id}", cached_response, expires_in: 5.minutes)
SlackNotifier.notify("API down, serving cached data for user #{user_id}")
}
) do
fetch_fresh_user_data
end
The Science of Temporal Jitter
ChronoMachines implements full jitter exponential backoff:
# Instead of predictable delays that create thundering herds:
# Attempt 1: 100ms
# Attempt 2: 200ms
# Attempt 3: 400ms
# ChronoMachines uses full jitter:
# Attempt 1: random(0, 100ms)
# Attempt 2: random(0, 200ms)
# Attempt 3: random(0, 400ms)
This prevents the "thundering herd" problem where multiple clients retry simultaneously, overwhelming recovering services.
Configuration Reference
Policy Options
| Option | Default | Description |
|---|---|---|
| max_attempts | 3 | Maximum number of retry attempts |
| base_delay | 0.1 | Initial delay in seconds |
| multiplier | 2 | Exponential backoff multiplier |
| max_delay | 10 | Maximum delay cap in seconds |
| retryable_exceptions | [StandardError] | Array of exception classes to retry |
| on_success | nil | Success callback: (result:, attempts:) |
| on_retry | nil | Retry callback: (exception:, attempt:, next_delay:) |
| on_failure | nil | Failure callback: (exception:, attempts:) |
DSL Methods
| Method | Scope | Description |
|---|---|---|
| chrono_policy(name, options) | Class | Define a named retry policy |
| with_chrono_policy(policy_or_options, &block) | Instance | Execute a block with a retry policy |
Real-World Examples
Database Connection Resilience
class DatabaseService
include ChronoMachines::DSL
chrono_policy :db_connection,
max_attempts: 5,
base_delay: 0.1,
retryable_exceptions: [
ActiveRecord::ConnectionTimeoutError,
ActiveRecord::ConnectionNotEstablished,
PG::ConnectionBad
],
on_retry: ->(exception:, attempt:, next_delay:) {
Rails.logger.warn "DB retry #{attempt}: #{exception.class}"
}
def find_user(id)
with_chrono_policy(:db_connection) do
User.find(id)
end
end
end
HTTP API Integration
class WeatherService
include ChronoMachines::DSL
chrono_policy :weather_api,
max_attempts: 4,
base_delay: 0.2,
max_delay: 10,
retryable_exceptions: [HTTP::TimeoutError, HTTP::ConnectionError], # http gem exceptions
on_failure: ->(exception:, attempts:) {
# Serve stale data when API is completely down
Rails.cache.write("weather_service_down", true, expires_in: 5.minutes)
}
def current_weather(location)
with_chrono_policy(:weather_api) do
response = HTTP.timeout(connect: 2, read: 5)
.get("https://api.weather.com/#{location}")
JSON.parse(response.body)
end
rescue ChronoMachines::MaxRetriesExceededError
# Return cached data if available, otherwise a placeholder
# (cache.read avoids caching the placeholder itself)
Rails.cache.read("weather_#{location}") ||
{ temperature: "Unknown", status: "Service Unavailable" }
end
end
Background Job Retry Logic
class EmailDeliveryJob
include ChronoMachines::DSL
chrono_policy :email_delivery,
max_attempts: 8,
base_delay: 1,
multiplier: 1.5,
max_delay: 300, # 5 minutes max
retryable_exceptions: [Net::SMTPServerBusy, Net::ReadTimeout]
def perform(email_data)
with_chrono_policy(:email_delivery) do
EmailService.deliver(email_data)
end
rescue ChronoMachines::MaxRetriesExceededError => e
# Move to the dead letter queue after all retries are exhausted
# (rescued here, rather than in on_failure, so email_data stays in scope)
DeadLetterQueue.push(email_data, reason: e.original_exception.message)
end
end
Testing Strategies
Mocking Time and Retries
require "minitest/autorun"
require "mocha/minitest"
class PaymentServiceTest < Minitest::Test
def setup
@service = PaymentService.new
end
def test_retries_payment_on_timeout
charge_response = { id: "ch_123", amount: 100 }
Stripe::Charge.expects(:create)
.twice
.raises(Net::OpenTimeout)
.then.returns(charge_response)
# Mock sleep to avoid test delays
ChronoMachines::Executor.any_instance.expects(:robust_sleep).at_least_once
result = @service.charge(100)
assert_equal charge_response, result
end
def test_respects_max_attempts
Stripe::Charge.expects(:create)
.raises(Net::OpenTimeout).times(5)
assert_raises(ChronoMachines::MaxRetriesExceededError) do
@service.charge(100)
end
end
def test_preserves_original_exception
original_error = Net::OpenTimeout.new("Connection timed out")
Stripe::Charge.expects(:create).raises(original_error).times(5)
begin
@service.charge(100)
flunk "Expected MaxRetriesExceededError to be raised"
rescue ChronoMachines::MaxRetriesExceededError => e
assert_equal 5, e.attempts
assert_equal original_error, e.original_exception
assert_equal "Connection timed out", e.original_exception.message
end
end
end
Testing Callbacks
class CallbackTest < Minitest::Test
def test_calls_retry_callback_with_correct_context
retry_calls = []
call_count = 0
result = ChronoMachines.retry(
max_attempts: 3,
base_delay: 0.001, # Short delay for tests
on_retry: ->(exception:, attempt:, next_delay:) {
retry_calls << {
attempt: attempt,
delay: next_delay,
exception_message: exception.message
}
}
) do
call_count += 1
raise "Fail" if call_count < 2
"Success"
end
assert_equal "Success", result
assert_equal 1, retry_calls.length
assert_equal 1, retry_calls.first[:attempt]
assert retry_calls.first[:delay] >= 0 # full jitter may legitimately pick a near-zero delay
assert_equal "Fail", retry_calls.first[:exception_message]
end
def test_calls_success_callback
success_called = false
result_captured = nil
attempts_captured = nil
result = ChronoMachines.retry(
on_success: ->(result:, attempts:) {
success_called = true
result_captured = result
attempts_captured = attempts
}
) do
"Operation succeeded"
end
assert success_called
assert_equal "Operation succeeded", result_captured
assert_equal 1, attempts_captured
end
def test_calls_failure_callback
failure_called = false
exception_captured = nil
assert_raises(ChronoMachines::MaxRetriesExceededError) do
ChronoMachines.retry(
max_attempts: 2,
on_failure: ->(exception:, attempts:) {
failure_called = true
exception_captured = exception
}
) do
raise "Always fails"
end
end
assert failure_called
assert_equal "Always fails", exception_captured.message
end
end
Testing DSL Integration
class DSLTestExample < Minitest::Test
class TestService
include ChronoMachines::DSL
chrono_policy :test_policy, max_attempts: 2, base_delay: 0.001
def risky_operation
with_chrono_policy(:test_policy) do
# Simulated operation
yield if block_given?
end
end
end
def test_dsl_policy_definition
service = TestService.new
call_count = 0
result = service.risky_operation do
call_count += 1
raise "Fail" if call_count < 2
"Success"
end
assert_equal "Success", result
assert_equal 2, call_count
end
def test_dsl_with_inline_options
service = TestService.new
assert_raises(ChronoMachines::MaxRetriesExceededError) do
service.with_chrono_policy(max_attempts: 1) do
raise "Always fails"
end
end
end
end
TestHelper for Library Authors
ChronoMachines provides a test helper module for library authors who want to integrate ChronoMachines testing utilities into their own test suites.
Setup
require 'chrono_machines/test_helper'
class MyLibraryTest < Minitest::Test
include ChronoMachines::TestHelper
def setup
super # Important: calls ChronoMachines config reset
# Your setup code here
end
end
Features
Configuration Reset: The TestHelper automatically resets ChronoMachines configuration before each test, ensuring test isolation.
Custom Assertions: Provides specialized assertions for testing delay ranges:
def test_delay_calculation
executor = ChronoMachines::Executor.new(base_delay: 0.1, multiplier: 2)
delay = executor.send(:calculate_delay, 1)
# Assert delay is within expected jitter range
assert_cm_delay_range(delay, 0.0, 0.1, "First attempt delay out of range")
end
Available Assertions:
- assert_cm_delay_range(delay, min, max, message = nil) - Assert that a delay falls within the expected range
Integration Example
# In your gem's test_helper.rb
require 'minitest/autorun'
require 'chrono_machines/test_helper'
class TestBase < Minitest::Test
include ChronoMachines::TestHelper
def setup
super
# Reset any additional state
end
end
# In your specific tests
class RetryServiceTest < TestBase
def test_retry_with_custom_policy
# ChronoMachines config is automatically reset
# You can safely define test-specific policies
ChronoMachines.configure do |config|
config.define_policy(:test_policy, max_attempts: 2)
end
result = ChronoMachines.retry(:test_policy) do
"success"
end
assert_equal "success", result
end
end
Why ChronoMachines?
Built for Modern Ruby
- Ruby 3.2+ Support: Fiber-aware sleep handling
- Clean Architecture: Separation of concerns with configurable policies
- Rich Instrumentation: Comprehensive callback system for monitoring
- Battle-Tested: Full jitter implementation prevents thundering herds
Time-Tested Patterns
- Exponential Backoff: Industry-standard retry timing
- Circuit Breaking: Fail-fast when upstream is down
- Fallback Support: Graceful degradation strategies
- Exception Preservation: Original errors aren't lost in retry logic
A Word from the Time Corps Engineering Division
The Temporal Commentary Engine activates:
"Time isn't linear—especially when your payment processor is having 'a moment.'
The universe doesn't care about your startup's burn rate or your post on X about 'building in public.' It cares about one immutable law:
Does your system handle failure gracefully across the fourth dimension?
If not, welcome to the Time Corps. We have exponential backoff.
Remember: The goal isn't to prevent temporal anomalies—it's to fail fast, fail smart, and retry with mathematical precision.
As I always say when debugging production: 'Time heals all wounds, but jitter prevents thundering herds.'"
— Temporal Commentary Engine, Log Entry ∞
Contributing to the Timeline
- Fork it (like it's 2005, but with better temporal mechanics)
- Create your feature branch (git checkout -b feature/quantum-retries)
- Commit your changes (git commit -am 'Add temporal stabilization')
- Push to the branch (git push origin feature/quantum-retries)
- Create a new Pull Request (and wait for the Time Lords to review)
License
MIT License. See LICENSE file for details.
Acknowledgments
- The Ruby community - For building a language worth retrying for
- Every timeout that ever taught us patience - You made us stronger
- The Time Corps - For maintaining temporal stability
- The universe - For being deterministically random
Author
Built with time and coffee by temporal engineers fighting entropy one retry at a time.
Remember: In the fabric of spacetime, nobody can hear your API timeout. But they can feel your exponential backoff working as intended.