ElevenlabsClient

A comprehensive Ruby client library for the ElevenLabs API, supporting voice synthesis, dubbing, dialogue generation, sound effects, AI music composition, voice transformation, speech transcription, audio isolation, and advanced audio processing features.

See Architecture Documentation for details.

Features

🎙️ Core Audio Features

Text-to-Speech - Convert text to natural-sounding speech with timestamps
Speech-to-Speech - Transform audio from one voice to another (Voice Changer)
Speech-to-Text - Transcribe audio and video files with advanced features
Text-to-Dialogue - Multi-speaker conversations and dialogue generation
Voice Design - Create custom voices from text descriptions
Voice Management - Create, edit, and manage individual voices
Audio Isolation - Remove background noise from audio files
Forced Alignment - Get precise timing information for audio transcripts

🎬 Content Creation

Dubbing - Create dubbed versions of audio/video content
Sound Generation - AI-generated sound effects and ambient audio
Music Generation - AI-powered music composition and streaming
Audio Native - Create embeddable audio players for websites

🤖 Agents Platform (Conversational AI)

Agents - Create and manage AI conversational agents
Conversations - Handle real-time conversations and chat interactions
Knowledge Base - Upload and manage documents for agent knowledge
Tools - Define and manage tools that agents can use
Tests - Create and run tests for agent performance
Outbound Calling - Make automated phone calls with agents
Batch Calling - Execute large-scale calling campaigns
Phone Numbers - Manage phone numbers for voice agents
Widgets - Create embeddable chat widgets for websites
LLM Usage - Monitor and analyze language model usage
MCP Servers - Manage Model Context Protocol servers

📊 Admin & Management APIs

History - Manage and analyze your generated audio history
Usage - Monitor character usage and analytics
User - Access account information and subscription details
Voice Library - Browse and manage community shared voices
Models - List available models and their capabilities
Samples - Delete voice samples for content moderation
Service Accounts - Monitor service accounts and API keys
Webhooks - Monitor workspace webhooks and their health
Workspace Management - Manage workspace groups, invites, members, and resources
Pronunciation Dictionaries - Custom pronunciation rules

🔧 Technical Features

WebSocket Streaming - Real-time audio streaming with low latency
Multiple Output Formats - Support for various audio formats
Flexible Configuration - Environment-based and programmatic configuration
Comprehensive Error Handling - Detailed error messages and status codes
Well-tested - Extensive test coverage with integration tests

Installation

Add this line to your application's Gemfile:

gem 'elevenlabs_client'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install elevenlabs_client

Quick Start

Configuration

Rails Applications (Recommended)

Create config/initializers/elevenlabs_client.rb:

ElevenlabsClient::Settings.configure do |config|
  config.properties = {
    elevenlabs_base_uri: ENV["ELEVENLABS_BASE_URL"],
    elevenlabs_api_key: ENV["ELEVENLABS_API_KEY"]
  }
end

Set your environment variables:

export ELEVENLABS_API_KEY="your_api_key_here"
export ELEVENLABS_BASE_URL="https://api.elevenlabs.io"  # Optional, defaults to official API
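
If the API key is missing, the problem may otherwise only surface at the first request. Below is a minimal sketch of a fail-fast check that could run before the initializer above. `fetch_elevenlabs_env` is a hypothetical helper, not part of the gem; the default URL is the one documented in this README.

```ruby
# Hypothetical helper (not part of the gem): read ElevenLabs settings from an
# environment-like hash and fail fast when the API key is missing.
def fetch_elevenlabs_env(env = ENV)
  api_key = env["ELEVENLABS_API_KEY"].to_s.strip
  raise KeyError, "ELEVENLABS_API_KEY is not set" if api_key.empty?

  # Fall back to the official API when no base URL is provided.
  base_url = env["ELEVENLABS_BASE_URL"].to_s.strip
  base_url = "https://api.elevenlabs.io" if base_url.empty?

  { elevenlabs_api_key: api_key, elevenlabs_base_uri: base_url }
end
```

The returned hash uses the same keys as `config.properties` above, so it could be assigned there directly.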

Direct Configuration

# Global configuration (recommended)
ElevenlabsClient.configure do |config|
  config.api_key = "your_api_key_here"
  config.base_url = "https://api.elevenlabs.io"
  config.timeout = 30
  config.retry_count = 3
end

# Use globally configured client
client = ElevenlabsClient.client

# Or pass directly to client instance
client = ElevenlabsClient.new(
  api_key: "your_api_key_here",
  base_url: "https://api.elevenlabs.io",
  timeout: 60
)

# Legacy Settings support (still works)
ElevenlabsClient.configure do |config|
  config.properties = {
    elevenlabs_base_uri: "https://api.elevenlabs.io",
    elevenlabs_api_key: "your_api_key_here"
  }
end

Basic Usage

# Initialize client (uses configured settings)
client = ElevenlabsClient.new

# Text-to-Speech
audio_data = client.text_to_speech.convert("21m00Tcm4TlvDq8ikWAM", "Hello, world!")
File.open("hello.mp3", "wb") { |f| f.write(audio_data) }

# Dubbing
# File.open returns the block's value, so the result stays available afterwards
result = File.open("video.mp4", "rb") do |file|
  client.dubs.create(
    file_io: file,
    filename: "video.mp4",
    target_languages: ["es", "fr", "de"]
  )
end

# Dialogue Generation
dialogue = [
  { text: "Hello, how are you?", voice_id: "voice_1" },
  { text: "I'm doing great, thanks!", voice_id: "voice_2" }
]
audio_data = client.text_to_dialogue.convert(dialogue)

# Sound Generation
audio_data = client.sound_generation.generate("Ocean waves crashing on rocks")

# Voice Design
design_result = client.text_to_voice.design("Warm, professional female voice")
generated_voice_id = design_result["previews"].first["generated_voice_id"]

# Stream the voice preview
client.text_to_voice.stream_preview(generated_voice_id) do |chunk|
  puts "Received preview chunk: #{chunk.bytesize} bytes"
end

voice_result = client.text_to_voice.create(
  "Professional Voice",
  "Warm, professional female voice",
  generated_voice_id
)

# List Available Models
models = client.models.list
# token_cost_factor is a relative cost, so the minimum is the cheapest model
cheapest_model = models["models"].min_by { |m| m["token_cost_factor"] }
puts "Cheapest model: #{cheapest_model['name']}"

# Voice Management
voices = client.voices.list
puts "Total voices: #{voices['voices'].length}"

# Create custom voice from audio samples
File.open("sample1.mp3", "rb") do |sample|
  voice = client.voices.create("My Voice", [sample], description: "Custom narrator voice")
  puts "Created voice: #{voice['voice_id']}"
end

# Admin APIs - Account Management
user_info = client.user.get_user
puts "Account: #{user_info['subscription']['tier']} (#{user_info['subscription']['status']})"
puts "Usage: #{user_info['subscription']['character_count']} / #{user_info['subscription']['character_limit']}"

# Usage Analytics
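seven_days = 7 * 24 * 60 * 60 # in seconds; 7.days would require ActiveSupport
usage_stats = client.usage.get_character_stats(
  start_unix: (Time.now - seven_days).to_i * 1000, # Unix time in milliseconds
  end_unix: Time.now.to_i * 1000,
  breakdown_type: "voice"
)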
# A breakdown may key the series per voice rather than under "All", so sum every series
puts "7-day usage: #{usage_stats['usage'].values.flatten.sum} characters"

# History Management
history = client.history.list(page_size: 10)
puts "Recent history: #{history['history'].length} items"

# Voice Library
voices = client.voice_library.get_shared_voices(category: "professional", page_size: 5)
puts "Professional voices available: #{voices['voices'].length}"

# Admin Samples Management
client.samples.delete_sample(voice_id: "voice_id", sample_id: "sample_id")
puts "Sample deleted successfully"

# Service Accounts Monitoring
accounts = client.service_accounts.get_service_accounts
puts "Service accounts: #{accounts['service-accounts'].length}"

# Webhooks Management
webhooks = client.webhooks.list_webhooks(include_usages: true)
puts "Active webhooks: #{webhooks['webhooks'].length}"

# Music Generation
music_data = client.music.compose(
  prompt: "Upbeat electronic dance track with synthesizers",
  music_length_ms: 30000
)
File.open("generated_music.mp3", "wb") { |f| f.write(music_data) }

# Speech-to-Speech (Voice Changer)
File.open("input_audio.mp3", "rb") do |audio_file|
  converted_audio = client.speech_to_speech.convert(
    "target_voice_id", 
    audio_file, 
    "input_audio.mp3",
    remove_background_noise: true
  )
  File.open("converted_audio.mp3", "wb") { |f| f.write(converted_audio) }
end

# Speech-to-Text Transcription
File.open("audio.mp3", "rb") do |audio_file|
  transcription = client.speech_to_text.create(
    "scribe_v1",
    file: audio_file,
    filename: "audio.mp3",
    diarize: true,
    timestamps_granularity: "word"
  )
  puts "Transcribed: #{transcription['text']}"
  
  # Get the transcript later
  transcript = client.speech_to_text.get_transcript(transcription['transcription_id'])
  
  # Delete when no longer needed
  client.speech_to_text.delete_transcript(transcription['transcription_id'])
end

# Audio Isolation (Background Noise Removal)
File.open("noisy_audio.mp3", "rb") do |audio_file|
  clean_audio = client.audio_isolation.isolate(audio_file, "noisy_audio.mp3")
  File.open("clean_audio.mp3", "wb") { |f| f.write(clean_audio) }
end

# Audio Native (Embeddable Player)
File.open("article.html", "rb") do |html_file|
  project = client.audio_native.create(
    "My Article",
    file: html_file,
    filename: "article.html",
    voice_id: "voice_id",
    auto_convert: true
  )
  puts "Player HTML: #{project['html_snippet']}"
end

# Forced Alignment
File.open("speech.wav", "rb") do |audio_file|
  alignment = client.forced_alignment.create(
    audio_file,
    "speech.wav",
    "Hello world, this is a test transcript"
  )
  
  alignment['words'].each do |word|
    puts "#{word['text']}: #{word['start']}s - #{word['end']}s"
  end
end

# Streaming Text-to-Speech
client.text_to_speech_stream.stream("voice_id", "Streaming text") do |chunk|
  # Process audio chunk in real-time
  puts "Received #{chunk.bytesize} bytes"
end
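
The streaming call above hands audio chunks to a block. Here is a minimal sketch of writing those chunks to disk as they arrive. `save_stream` is a hypothetical helper, not part of the gem; it assumes only that the streaming method yields binary strings one at a time, as shown above.

```ruby
# Hypothetical helper (not part of the gem): write streamed chunks to a file as
# they arrive and return the total number of bytes written.
def save_stream(path)
  total = 0
  File.open(path, "wb") do |file|
    # Hand the caller a proc that can be passed to the streaming method as its block.
    yield proc { |chunk| total += file.write(chunk) }
  end
  total
end

# Assumed usage with the client from this README:
#   save_stream("speech.mp3") do |sink|
#     client.text_to_speech_stream.stream("voice_id", "Streaming text", &sink)
#   end
```

Writing inside the block keeps memory use flat for long audio, since no chunk is held after it has been written.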

API Documentation

Core APIs

🎙️ Core Audio APIs

Text-to-Speech API - Convert text to natural speech
Text-to-Speech Streaming API - Real-time audio streaming
Text-to-Speech with Timestamps - Speech synthesis with precise timing
Speech-to-Speech API - Transform audio from one voice to another
Speech-to-Text API - Transcribe audio and video files
Text-to-Dialogue API - Multi-speaker conversations
Text-to-Dialogue Streaming - Real-time dialogue generation
Voice Design API - Design and create custom voices from text descriptions
Voice Management API - Manage individual voices (CRUD operations)
Audio Isolation API - Remove background noise from audio
Forced Alignment API - Get precise timing information for transcripts

🎬 Content Creation APIs

Dubbing API - Create dubbed versions of audio/video content
Sound Generation API - AI-generated sound effects and ambient audio
Music Generation API - AI-powered music composition and streaming
Audio Native API - Create embeddable audio players for websites

🤖 Agents Platform APIs (Conversational AI)

Agents Platform Overview - Complete conversational AI platform
Agents API - Create and manage AI conversational agents
Conversations API - Handle real-time conversations and chat interactions
Knowledge Base API - Upload and manage documents for agent knowledge
Tools API - Define and manage tools that agents can use
Tests API - Create and run tests for agent performance
Test Invocations API - Execute and monitor test runs
Outbound Calling API - Make automated phone calls with agents
Batch Calling API - Execute large-scale calling campaigns
Phone Numbers API - Manage phone numbers for voice agents
Widgets API - Create embeddable chat widgets for websites
LLM Usage API - Monitor and analyze language model usage
MCP Servers API - Manage Model Context Protocol servers
Workspace API - Manage agent platform workspace settings

📊 Admin & Management APIs

Admin APIs Overview - Complete administrative functionality
User Management - Account information and subscription details
Usage Analytics - Character usage monitoring and analytics
History Management - Generated audio history management
Voice Library - Community voice browsing and management
Models API - List available models and capabilities
Samples Management - Delete voice samples for content moderation
Service Accounts - Monitor and manage service accounts
Service Account API Keys - Manage API keys for service accounts
Webhooks Management - Monitor workspace webhooks and their health
Workspace Webhooks - Configure and manage workspace-level webhooks
Workspace Groups - Manage user groups and members
Workspace Invites - Invite users and revoke invitations
Workspace Members - Update member attributes and roles
Workspace Resources - Share/unshare resources across the workspace
Pronunciation Dictionaries - Create, manage and download pronunciation dictionaries

🔧 Advanced Features

WebSocket Streaming - Real-time audio streaming with WebSockets

Available Endpoints

Endpoint	Description	Documentation
`client.dubs.*`	Audio/video dubbing	DUBBING.md
`client.text_to_speech.*`	Text-to-speech conversion	TEXT_TO_SPEECH.md
`client.text_to_speech_stream.*`	Streaming TTS	TEXT_TO_SPEECH_STREAMING.md
`client.text_to_dialogue.*`	Dialogue generation	TEXT_TO_DIALOGUE.md
`client.sound_generation.*`	Sound effect generation	SOUND_GENERATION.md
`client.music.*`	AI music composition and streaming	MUSIC.md
`client.text_to_voice.*`	Voice design and creation	TEXT_TO_VOICE.md
`client.voices.*`	Voice management (CRUD)	VOICES.md
`client.speech_to_speech.*`	Voice changer and audio transformation	SPEECH_TO_SPEECH.md
`client.speech_to_text.*`	Audio/video transcription	SPEECH_TO_TEXT.md
`client.audio_isolation.*`	Background noise removal	AUDIO_ISOLATION.md
`client.audio_native.*`	Embeddable audio players	AUDIO_NATIVE.md
`client.forced_alignment.*`	Audio-text timing alignment	FORCED_ALIGNMENT.md
`client.user.*`	User account and subscription information	USER.md
`client.usage.*`	Character usage analytics and monitoring	USAGE.md
`client.history.*`	Generated audio history management	HISTORY.md
`client.voice_library.*`	Community voice browsing and management	VOICE_LIBRARY.md
`client.models.*`	Model information and capabilities	MODELS.md
`client.workspace_groups.*`	Workspace user groups management	WORKSPACE_GROUPS.md
`client.workspace_invites.*`	Workspace invites management	WORKSPACE_INVITES.md
`client.workspace_members.*`	Workspace member management	WORKSPACE_MEMBERS.md
`client.workspace_resources.*`	Workspace resource sharing	WORKSPACE_RESOURCES.md
`client.pronunciation_dictionaries.*`	Manage pronunciation dictionaries	PRONUNCIATION_DICTIONARIES.md
`client.samples.*`	Voice sample deletion and content moderation	SAMPLES.md
`client.service_accounts.*`	Service account monitoring and management	SERVICE_ACCOUNTS.md
`client.webhooks.*`	Workspace webhook monitoring and health analysis	WEBHOOKS.md

Configuration Options

Configuration Precedence

Explicit parameters (highest priority)
Settings.properties (configured via initializer)
Environment variables (lowest priority)
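
The order above can be read as "the first source that provides a value wins". A minimal sketch of that rule follows; `resolve_setting` is a hypothetical function written for illustration and is not the gem's actual implementation.

```ruby
# Hypothetical sketch of the precedence order described above.
def resolve_setting(explicit:, settings:, env:)
  # First non-nil, non-empty value wins: explicit > Settings.properties > ENV.
  [explicit, settings, env].find { |value| !value.nil? && !value.to_s.empty? }
end

api_key = resolve_setting(
  explicit: nil,                    # nothing passed to the client directly
  settings: "key_from_initializer", # value from Settings.properties
  env: "key_from_env"               # value from the environment
)
puts api_key # prints "key_from_initializer"
```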

Environment Variables

ELEVENLABS_API_KEY - Your ElevenLabs API key (required)
ELEVENLABS_BASE_URL - API base URL (optional, defaults to https://api.elevenlabs.io)

Custom Environment Variable Names

client = ElevenlabsClient.new(
  api_key_env: "CUSTOM_API_KEY_VAR",
  base_url_env: "CUSTOM_BASE_URL_VAR"
)

Error Handling

The client provides specific exception types for different error conditions:

begin
  result = client.text_to_speech.convert(voice_id, text)
rescue ElevenlabsClient::AuthenticationError
  puts "Invalid API key"
rescue ElevenlabsClient::RateLimitError
  puts "Rate limit exceeded"
rescue ElevenlabsClient::ValidationError => e
  puts "Invalid parameters: #{e.message}"
rescue ElevenlabsClient::APIError => e
  puts "API error: #{e.message}"
end

Exception Types

AuthenticationError - Invalid API key or authentication failure
RateLimitError - Rate limit exceeded
ValidationError - Invalid request parameters
NotFoundError - Resource not found (e.g., voice ID, transcript ID)
BadRequestError - Bad request with invalid parameters
UnprocessableEntityError - Request cannot be processed (e.g., invalid file format)
APIError - General API errors
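
Rate-limit errors are often transient, so callers may want to retry them rather than fail immediately. Below is a minimal sketch of a retry wrapper with exponential backoff. `with_retries` is a hypothetical helper, not part of the gem, and the attempt count and delays are illustrative defaults.

```ruby
# Hypothetical helper (not part of the gem): retry a block when one of the given
# error classes is raised, sleeping with exponential backoff between attempts.
def with_retries(on:, attempts: 3, base_delay: 1.0)
  tries = 0
  begin
    tries += 1
    yield
  rescue *on
    # Give up and re-raise the original error once the attempts are used up.
    raise if tries >= attempts

    sleep(base_delay * (2**(tries - 1))) # 1x, 2x, 4x, ... the base delay
    retry
  end
end

# Assumed usage with the error class listed above:
#   audio = with_retries(on: [ElevenlabsClient::RateLimitError]) do
#     client.text_to_speech.convert(voice_id, text)
#   end
```

Errors that are not listed in `on`, such as authentication or validation failures, propagate immediately, since retrying them would not help.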

Rails Integration

The gem is designed to work seamlessly with Rails applications. See the examples directory for complete controller implementations and the Rails initializer example for configuration setup:

Core Controllers
- DubsController - Complete dubbing workflow
- TextToSpeechController - TTS with error handling
- StreamingAudioController - Real-time streaming
- TextToDialogueController - Dialogue generation
- SoundGenerationController - Sound effects
- MusicController - AI music composition and streaming
- TextToVoiceController - Voice design and creation
- VoicesController - Voice management (CRUD operations)
- SpeechToSpeechController - Voice changer and audio transformation
- SpeechToTextController - Audio/video transcription with advanced features
- AudioIsolationController - Background noise removal and audio cleanup
- AudioNativeController - Embeddable audio players for websites
- ForcedAlignmentController - Audio-text timing alignment and subtitle generation

Admin Controllers - Complete administrative functionality:
- Admin::HistoryController - Generated audio history management and analytics
- Admin::UsageController - Character usage monitoring and analytics
- Admin::UserController - User account and subscription management
- Admin::VoiceLibraryController - Community voice browsing and management
- Admin::ModelsController - Model information and selection guidance
- Admin::SamplesController - Voice sample deletion and content moderation
- Admin::ServiceAccountsController - Service account monitoring and analytics
- Admin::ServiceAccountApiKeysController - API key management for service accounts
- Admin::WebhooksController - Workspace webhook monitoring and health analysis
- Admin::WorkspaceWebhooksController - Workspace-level webhook configuration
- Admin::WorkspaceGroupsController - User group management and permissions
- Admin::WorkspaceInvitesController - Workspace invitation management
- Admin::WorkspaceMembersController - Workspace member management and roles
- Admin::WorkspaceResourcesController - Resource sharing and permissions
- Admin::PronunciationDictionariesController - Custom pronunciation management

Agents Platform Controllers - Conversational AI functionality:
- AgentsPlatform::AgentsController - AI agent creation and management
- AgentsPlatform::ConversationsController - Real-time conversation handling
- AgentsPlatform::KnowledgeBaseController - Document and knowledge management
- AgentsPlatform::ToolsController - Agent tool management and configuration
- AgentsPlatform::TestsController - Agent testing and validation
- AgentsPlatform::TestInvocationsController - Test execution and monitoring
- AgentsPlatform::OutboundCallingController - Automated phone call management
- AgentsPlatform::BatchCallingController - Large-scale calling campaigns
- AgentsPlatform::PhoneNumbersController - Phone number management for agents
- AgentsPlatform::WidgetsController - Embeddable chat widget management
- AgentsPlatform::LlmUsageController - Language model usage analytics
- AgentsPlatform::McpServersController - Model Context Protocol server management
- AgentsPlatform::WorkspaceController - Agent platform workspace settings

Development

After checking out the repo, run:

bin/setup          # Install dependencies
bundle exec rspec  # Run tests

Available Rake Tasks

# Testing
rake spec                    # Run all tests (default)
rake test:unit              # Run unit tests only
rake test:integration       # Run integration tests only

# Security
rake dev:security           # Run security checks
rake dev:audit              # Run bundler-audit

# Development
rake dev:test               # Run all tests
rake dev:coverage           # Run tests with coverage
rake release:prepare        # Run full CI suite locally

Continuous Integration

This gem uses GitHub Actions for CI/CD with the following checks:

Tests: Runs on Ruby 3.0, 3.1, 3.2, and 3.3
Security: bundler-audit for dependency vulnerability scanning
Build: Verifies gem can be built and installed

All checks must pass before merging pull requests.

To install this gem onto your local machine:

bundle exec rake install

To release a new version:

Update the version number in version.rb
Update CHANGELOG.md
Run bundle exec rake release:prepare to verify tests and security checks pass
Run bundle exec rake release

Testing

The gem includes comprehensive test coverage with RSpec:

# Run all tests
bundle exec rspec

# Run specific test files or directories
bundle exec rspec spec/elevenlabs_client/endpoints/
bundle exec rspec spec/elevenlabs_client/client
bundle exec rspec spec/integration/

# Run with documentation format
bundle exec rspec --format documentation

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yourusername/elevenlabs_client.

Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create a new Pull Request

License

The gem is available as open source under the terms of the MIT License.

Changelog

See CHANGELOG.md for a detailed list of changes and version history.

Support

📖 Documentation: API Documentation
🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions

Made with ❤️ for the Ruby community