Project

elevenlabs

0.0
The project is in a healthy, maintained state
This gem provides a convenient Ruby interface to the ElevenLabs TTS, Voice Cloning, Voice Design, Voice dialogues, TTS Streaming, Music Generation and Streaming endpoints.

Elevenlabs Ruby Gem

License: MIT

A Ruby client for the ElevenLabs Text-to-Speech API.
This gem provides an easy-to-use interface for:

  • Listing available voices
  • Fetching details about a voice
  • Creating a custom voice (with uploaded sample files)
  • Editing an existing voice
  • Deleting a voice
  • Converting text to speech and retrieving the generated audio
  • Designing a voice based on a text description
  • Streaming text-to-speech audio
  • Generating music from a prompt
  • Generating sound effects from a prompt

All requests are handled via Faraday.


Table of Contents

  • Features
  • Installation
  • Usage
    • Basic Example
    • Rails Integration
      • Store API Key in Rails Credentials
      • Rails Initializer
      • Controller Example
  • Endpoints
  • Error Handling
  • Development
  • Contributing
  • License

Features

  • Simple and intuitive API client for ElevenLabs.
  • Multipart file uploads for training custom voices.
  • Voice design via text prompts to generate voice previews.
  • Automatic authentication via API key configuration.
  • Error handling with custom exceptions.
  • Rails integration support (including credentials storage).

Installation

Add the gem to your Gemfile:

gem "elevenlabs"

Then run:

bundle install

Or install it directly using:

gem install elevenlabs

Usage

Basic Example (Standalone Ruby)

require "elevenlabs"

# 1. Configure the gem globally (Optional)
Elevenlabs.configure do |config|
  config.api_key = "YOUR_API_KEY"
end

# 2. Initialize a client (will use configured API key)
client = Elevenlabs::Client.new

# 3. List available voices
voices = client.list_voices
puts voices # JSON response with voices

# 4. Convert text to speech
voice_id = "YOUR_VOICE_ID"
text = "Hello from Elevenlabs!"
audio_data = client.text_to_speech(voice_id, text)

# 5. Save the audio file
File.open("output.mp3", "wb") { |f| f.write(audio_data) }
puts "Audio file saved to output.mp3"

# 6. Design a voice with a text prompt
response = client.design_voice(
  "A deep, resonant male voice with a British accent, suitable for storytelling",
  output_format: "mp3_44100_192",
  model_id: "eleven_multilingual_ttv_v2",
  text: "In a land far away, where the mountains meet the sky, a great adventure began. Brave heroes embarked on a quest to find the lost artifact, facing challenges and forging bonds that would last a lifetime. Their journey took them through enchanted forests, across raging rivers, and into the heart of ancient ruins.",
  auto_generate_text: false,
  loudness: 0.5,
  seed: 12345,
  guidance_scale: 5.0,
  stream_previews: false
)

# 7. Save voice preview audio
require "base64"
response["previews"].each_with_index do |preview, index|
  audio_data = Base64.decode64(preview["audio_base_64"])
  File.open("preview_#{index}.mp3", "wb") { |f| f.write(audio_data) }
  puts "Saved preview #{index + 1} to preview_#{index}.mp3"
end
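
The preview-saving step above can be wrapped in a small helper. This is a minimal sketch, assuming the response hash has the shape used in this README (a "previews" array whose entries carry "audio_base_64"). The save_previews helper and the sample response are hypothetical and not part of the gem. It decodes with String#unpack1("m"), which is what Base64.decode64 does internally, so no extra require is needed for decoding.

```ruby
require "tmpdir"

# Hypothetical helper (not part of the gem): decode each preview in a
# design_voice-style response and write it into dir. Returns the written paths.
def save_previews(response, dir:)
  response.fetch("previews").each_with_index.map do |preview, index|
    path = File.join(dir, "preview_#{index}.mp3")
    # unpack1("m") performs the same lenient Base64 decoding as Base64.decode64
    File.binwrite(path, preview.fetch("audio_base_64").unpack1("m"))
    path
  end
end

# Hand-built stand-in for a real response, so the sketch runs offline.
sample_response = {
  "previews" => [
    { "generated_voice_id" => "demo-id", "audio_base_64" => ["not real audio"].pack("m0") }
  ]
}

Dir.mktmpdir do |dir|
  paths = save_previews(sample_response, dir: dir)
  puts "Saved #{paths.size} preview(s): #{paths.map { |p| File.basename(p) }.join(', ')}"
end
```

Using fetch instead of [] makes a missing key fail loudly with a KeyError rather than a confusing NoMethodError on nil.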

Note: You can override the globally configured API key for an individual client:

client = Elevenlabs::Client.new(api_key: "DIFFERENT_API_KEY")
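
Hard-coded keys are easy to leak through version control. The following is a minimal sketch of reading the key from the environment instead. The ELEVENLABS_API_KEY variable name and the fetch_api_key helper are assumptions made for illustration; the gem itself is not known to read this variable.

```ruby
# Hypothetical helper: look up the API key in an environment-like hash.
# ELEVENLABS_API_KEY is an assumed variable name, not one the gem defines.
def fetch_api_key(env = ENV, name: "ELEVENLABS_API_KEY")
  key = env[name].to_s.strip
  raise KeyError, "#{name} is not set" if key.empty?
  key
end

# Demonstration with a plain hash standing in for ENV.
puts fetch_api_key({ "ELEVENLABS_API_KEY" => " demo-key " })
```

The returned string can then be passed as Elevenlabs::Client.new(api_key: fetch_api_key) or assigned to config.api_key in the configure block.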

Rails Integration

Store API Key in Rails Credentials

  1. Open your encrypted credentials:
EDITOR=vim rails credentials:edit
  2. Add the ElevenLabs API key:
eleven_labs:
  api_key: YOUR_SECURE_KEY
  3. Save and exit. Rails will securely encrypt your API key.

Rails Initializer

Create an initializer file: config/initializers/elevenlabs.rb

# config/initializers/elevenlabs.rb
require "elevenlabs"

Rails.application.config.to_prepare do
  Elevenlabs.configure do |config|
    config.api_key = Rails.application.credentials.dig(:eleven_labs, :api_key)
  end
end

Now you can simply call:

client = Elevenlabs::Client.new

without manually providing an API key.

Controller Example

class AudioController < ApplicationController
  def generate
    client = Elevenlabs::Client.new
    voice_id = params[:voice_id]
    text = params[:text]

    begin
      audio_data = client.text_to_speech(voice_id, text)
      send_data audio_data, type: "audio/mpeg", disposition: "attachment", filename: "output.mp3"
    rescue Elevenlabs::APIError => e
      render json: { error: e.message }, status: :bad_request
    end
  end
end
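
The controller above passes params straight to the API. A sketch of validating them first is shown here in plain Ruby so that it runs outside Rails. The validate_tts_params helper and the 5,000-character cap are illustrative assumptions, not limits documented by the gem.

```ruby
# Hypothetical validator: returns a list of error messages (empty when valid).
# max_chars is an assumed, illustrative limit; check the real limit for your plan.
def validate_tts_params(voice_id:, text:, max_chars: 5_000)
  errors = []
  errors << "voice_id is required" if voice_id.to_s.strip.empty?
  errors << "text is required" if text.to_s.strip.empty?
  errors << "text exceeds #{max_chars} characters" if text.to_s.length > max_chars
  errors
end

p validate_tts_params(voice_id: "", text: "Hello")
p validate_tts_params(voice_id: "VOICE_ID", text: "Hello")
```

In the controller, a non-empty result could be rendered as JSON with status :unprocessable_entity before client.text_to_speech is ever called.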

Endpoints

  1. List Voices
client.list_voices
# => { "voices" => [...] }

  2. List Models
client.list_models
# => [...]
  3. Get Voice Details
client.get_voice("VOICE_ID")
# => { "voice_id" => "...", "name" => "...", ... }
  4. Create a Custom Voice
sample_files = [File.open("sample1.mp3", "rb")]
client.create_voice("Custom Voice", sample_files, description: "My custom AI voice")
# => JSON response with new voice details
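
File.open without a block leaves the handle open until it is garbage collected. The sketch below opens the samples, yields them, and always closes them afterwards. The with_sample_files helper is hypothetical, and the commented call assumes create_voice accepts open File objects as in the example above.

```ruby
require "tmpdir"

# Hypothetical helper: open each path in binary mode, yield the File objects,
# and close them afterwards even if the block raises.
def with_sample_files(paths)
  files = []
  paths.each { |path| files << File.open(path, "rb") }
  yield files
ensure
  files.each(&:close)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample1.mp3")
  File.binwrite(path, "placeholder bytes")

  with_sample_files([path]) do |files|
    # With a real client this is where the upload would go, for example:
    # client.create_voice("Custom Voice", files, description: "My custom AI voice")
    puts "Opened #{files.size} file(s)"
  end
end
```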
  5. Check if a Voice is Banned
sample_files = [File.open("trump.mp3", "rb")]
client.create_voice("Donald Trump", sample_files, description: "My Trump Voice")
# => {"voice_id"=>"<RETURNED_VOICE_ID>", "requires_verification"=>false}
trump = "<RETURNED_VOICE_ID>"
client.banned?(trump)
# => true
  6. Edit a Voice
client.edit_voice("VOICE_ID", name: "Updated Voice Name")
# => JSON response with updated details
  7. Delete a Voice
client.delete_voice("VOICE_ID")
# => JSON response acknowledging deletion
  8. Convert Text to Speech
audio_data = client.text_to_speech("VOICE_ID", "Hello world!")
File.open("output.mp3", "wb") { |f| f.write(audio_data) }
  9. Stream Text to Speech

To play the stream from a terminal, first install SoX, which provides the play command:

# macOS
brew install sox
# Linux (Debian/Ubuntu)
sudo apt install sox

Then pipe the streamed chunks into play:

IO.popen("play -t mp3 -", "wb") do |audio_pipe| # Notice "wb" (write binary)
  client.text_to_speech_stream("VOICE_ID", "Some text to stream back in chunks") do |chunk|
    audio_pipe.write(chunk.b) # Ensure chunk is written as binary
  end
end
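
If SoX is not available, the same chunk callback can write to a file instead. This is a minimal sketch, assuming text_to_speech_stream yields binary chunks to its block as shown above. The write_chunks helper and the fake source are hypothetical, so the example runs without network access.

```ruby
require "tmpdir"

# Hypothetical helper: chunk_source is any callable that yields chunks to the
# block it is given. Writes each chunk to path and returns the total bytes written.
def write_chunks(path, chunk_source)
  total = 0
  File.open(path, "wb") do |file|
    chunk_source.call { |chunk| total += file.write(chunk.b) }
  end
  total
end

# Stand-in for a streaming call; with a real client this could be:
# source = ->(&on_chunk) { client.text_to_speech_stream("VOICE_ID", "Some text", &on_chunk) }
fake_source = ->(&on_chunk) { ["ID3", "\x00\x01", "tail"].each(&on_chunk) }

Dir.mktmpdir do |dir|
  bytes = write_chunks(File.join(dir, "stream.mp3"), fake_source)
  puts "Wrote #{bytes} bytes"
end
```

Passing the source as a callable keeps the file handling separate from the API call, which also makes the writer easy to test.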
  10. Create a Voice from a Design

Once you’ve generated a voice design using client.design_voice, you can turn it into a permanent voice in your account by passing its generated_voice_id to client.create_from_generated_voice.

# Step 1: Design a voice (returns previews + generated_voice_id)

design_response = client.design_voice(
  "A warm, friendly female voice with a slight Australian accent",
  model_id: "eleven_multilingual_ttv_v2",
  text: "Welcome to our podcast, where every story is an adventure, taking you on a journey through fascinating worlds, inspiring voices, and unforgettable moments.",
  auto_generate_text: false
)

# Three previews are returned; this example uses the first one to create a voice.
generated_voice_id = design_response["previews"].first["generated_voice_id"]

# Step 2: Create the permanent voice
create_response = client.create_from_generated_voice(
  "Friendly Aussie",
  "A warm, friendly Australian-accented voice for podcasts",
  generated_voice_id
)

voice_id = create_response["voice_id"] # This is the ID you can use for TTS

# Step 3: Use the new voice for TTS
audio_data = client.text_to_speech(voice_id, "This is my new permanent designed voice.")
File.open("friendly_aussie.mp3", "wb") { |f| f.write(audio_data) }

Important notes:

  • Always store the voice_id returned by create_from_generated_voice. This is the permanent identifier for TTS.
  • Designed voices cannot be used for TTS until they are created in your account.
  • If the voice is not immediately available for TTS, wait a few seconds or check its status via client.get_voice(voice_id) until it’s "active".
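
The wait-and-check advice can be sketched as a small polling loop. The wait_until helper is hypothetical, and the "status" key in the usage comment is an assumption about the get_voice response that this README does not confirm.

```ruby
# Hypothetical helper: call the block every interval seconds until it returns
# a truthy value or timeout seconds have elapsed. Returns that value, or nil.
def wait_until(timeout: 30, interval: 2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# With a real client (the "status" key is an assumption):
# wait_until { client.get_voice(voice_id)["status"] == "active" }

# Offline demonstration: the condition becomes true on the third check.
checks = 0
ready = wait_until(timeout: 5, interval: 0.01) { (checks += 1) >= 3 }
puts "ready=#{ready} after #{checks} checks"
```

A monotonic clock is used for the deadline so that system clock adjustments cannot shorten or extend the wait.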

  11. Create a multi-speaker dialogue
inputs = [
  { text: "It smells like updog in here", voice_id: "TX3LPaxmHKxFdv7VOQHJ" },
  { text: "What's updog?", voice_id: "RILOU7YmBhvwJGDGjNmP" },
  { text: "Not much, you?", voice_id: "TX3LPaxmHKxFdv7VOQHJ" }
]

audio_data = client.text_to_dialogue(inputs)
File.open("what's updog.mp3", "wb") { |f| f.write(audio_data) }
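
For longer scripts, repeating voice IDs by hand gets error-prone. The sketch below builds the same inputs array from speaker/line pairs and a speaker-to-voice map. The build_dialogue_inputs helper and the speaker names are hypothetical; the hash shape mirrors the example above.

```ruby
# Hypothetical helper: turn [speaker, line] pairs into the array of
# { text:, voice_id: } hashes that text_to_dialogue is shown receiving above.
def build_dialogue_inputs(script, voices)
  script.map do |speaker, line|
    { text: line, voice_id: voices.fetch(speaker) } # KeyError on an unknown speaker
  end
end

voices = { alice: "TX3LPaxmHKxFdv7VOQHJ", bob: "RILOU7YmBhvwJGDGjNmP" }
script = [
  [:alice, "It smells like updog in here"],
  [:bob,   "What's updog?"],
  [:alice, "Not much, you?"]
]

inputs = build_dialogue_inputs(script, voices)
puts inputs.length
# With a real client: audio_data = client.text_to_dialogue(inputs)
```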
  12. Generate Music from prompt
audio = client.compose_music(prompt: "Lo-fi hip hop beat", music_length_ms: 30000)
File.binwrite("lofi.mp3", audio)
  13. Stream Music Generated from prompt
File.open("epic_stream.mp3", "wb") do |f|
  client.compose_music_stream(prompt: "Epic orchestral build", music_length_ms: 60000) do |chunk|
    f.write(chunk)
  end
end
  14. Generate Music with Detailed Metadata (metadata + audio) from prompt
result = client.compose_music_detailed(prompt: "Jazz piano trio", music_length_ms: 20000)
puts result # raw multipart data (needs parsing)
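
The comment above says the raw multipart data needs parsing. The following is a rough sketch of one way to do that, under two assumptions that this README does not confirm: the body is standard MIME multipart with CRLF line endings, and you know the boundary string (normally carried in the response's Content-Type header). The parse_multipart helper is hypothetical and not part of the gem.

```ruby
# Hypothetical helper: split a MIME multipart body into parts.
# Assumes CRLF line endings and a known boundary string.
def parse_multipart(body, boundary)
  segments = body.b.split("--#{boundary}".b)
  segments.drop(1).filter_map do |segment| # drop(1) skips the preamble
    next if segment.start_with?("--") # closing delimiter, end of message
    head, content = segment.delete_prefix("\r\n").split("\r\n\r\n", 2)
    next if content.nil?
    headers = head.split("\r\n").to_h do |line|
      name, value = line.split(":", 2)
      [name.strip.downcase, value.to_s.strip]
    end
    { headers: headers, body: content.delete_suffix("\r\n") }
  end
end

# Hand-built body standing in for a real response.
boundary = "demo-boundary"
raw = "--#{boundary}\r\nContent-Type: application/json\r\n\r\n{\"title\":\"demo\"}\r\n" \
      "--#{boundary}\r\nContent-Type: audio/mpeg\r\n\r\nAUDIO-BYTES\r\n--#{boundary}--\r\n"

parse_multipart(raw, boundary).each do |part|
  puts "#{part[:headers]['content-type']}: #{part[:body].bytesize} bytes"
end
```

The JSON part could then go through JSON.parse and the audio part through File.binwrite. For production use, a tested multipart library is a safer choice than this hand-rolled splitter.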
  15. Create a music composition plan from prompt
plan = client.create_music_plan(prompt: "Upbeat pop song with verse and chorus", music_length_ms: 60000)
puts plan[:sections]
  16. Create sound effects from a prompt

Basic Usage: Simple Prompt

Generate a sound effect with only a text prompt, using the default settings (output_format: "mp3_44100_128", duration_seconds: nil (auto-detected), prompt_influence: 0.3).

audio_data = client.sound_generation("Futuristic laser blast in a space battle")

# Save the audio to a file
File.open("laser_blast.mp3", "wb") { |f| f.write(audio_data) }

Advanced Usage: Custom Duration, Influence, and Format

Specify duration_seconds, prompt_influence, and output_format for precise control over the sound effect.

# Generate a roaring dragon sound with specific settings
audio_data = client.sound_generation(
  "Roaring dragon in a fantasy cave",
  duration_seconds: 3.0,
  prompt_influence: 0.7, # Higher influence for closer adherence to the prompt
  output_format: "mp3_22050_32"
)

# Save the audio to a file
File.open("dragon_roar.mp3", "wb") { |f| f.write(audio_data) }

Looping Sound Effect

Create a looping sound effect for continuous playback, such as background ambiance in a video game.

# Generate a looping ambient sound for a haunted forest
audio_data = client.sound_generation(
  "Eerie wind and distant owl hoots in a haunted forest",
  loop: true,
  duration_seconds: 10.0,
  prompt_influence: 0.5,
  output_format: "mp3_22050_32"
)
# Save the audio to a file
File.open("haunted_forest_loop.mp3", "wb") { |f| f.write(audio_data) }

For more details, see the ElevenLabs Sound Generation API documentation.
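
When generating several effects in one run, each needs its own output file. The sketch below derives a file name from the prompt. The filename_for helper and its 40-character cap are illustrative assumptions, and the commented line assumes sound_generation returns the audio bytes as in the examples above.

```ruby
# Hypothetical helper: derive a safe file name from a prompt.
# Non-alphanumeric runs become underscores; the slug is capped at max characters.
def filename_for(prompt, ext: "mp3", max: 40)
  slug = prompt.downcase.gsub(/[^a-z0-9]+/, "_")[0, max].gsub(/\A_+|_+\z/, "")
  slug = "sound" if slug.empty?
  "#{slug}.#{ext}"
end

prompts = [
  "Futuristic laser blast in a space battle",
  "Roaring dragon in a fantasy cave"
]

prompts.each do |prompt|
  # With a real client:
  # File.binwrite(filename_for(prompt), client.sound_generation(prompt))
  puts filename_for(prompt)
end
```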


Error Handling

When the API returns an error, the gem raises specific exceptions:

Exception                             Meaning
Elevenlabs::BadRequestError           Invalid request parameters
Elevenlabs::AuthenticationError       Invalid API key
Elevenlabs::NotFoundError             Resource (voice) not found
Elevenlabs::UnprocessableEntityError  Unprocessable entity (e.g., invalid input format)
Elevenlabs::APIError                  General API failure

Example:

begin
  client.design_voice("Short description") # Too short, will raise error
rescue Elevenlabs::UnprocessableEntityError => e
  puts "Validation error: #{e.message}"
rescue Elevenlabs::AuthenticationError => e
  puts "Invalid API key: #{e.message}"
rescue Elevenlabs::NotFoundError => e
  puts "Voice not found: #{e.message}"
rescue Elevenlabs::APIError => e
  puts "General error: #{e.message}"
end
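
Transient failures such as network hiccups are often worth retrying. The sketch below is a generic retry wrapper with exponential backoff. The with_retries helper and DemoTransientError are hypothetical and exist only so the example runs without the gem; which of the gem's exceptions are actually transient is not stated in this README.

```ruby
# Hypothetical helper: retry the block when it raises one of the given
# error classes, doubling the delay after each failed attempt.
def with_retries(on:, attempts: 3, base_delay: 0.5)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    raise if tries >= attempts # out of attempts: re-raise the last error
    sleep(base_delay * (2**(tries - 1)))
    retry
  end
end

# Stand-in error so the sketch runs without the gem. With the gem loaded, one
# option would be on: [Elevenlabs::APIError] with a client call inside the block.
class DemoTransientError < StandardError; end

result = with_retries(on: [DemoTransientError], base_delay: 0.01) do |attempt|
  raise DemoTransientError, "temporary failure" if attempt < 3
  "succeeded on attempt #{attempt}"
end
puts result
```

Retrying only helps with failures that can change between attempts. An invalid API key or a rejected input will fail the same way every time, so those errors are better handled by the specific rescue clauses shown above.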

Development

Clone this repository:

git clone https://github.com/your-username/elevenlabs.git
cd elevenlabs

Install dependencies:

bundle install

Build the gem:

gem build elevenlabs.gemspec

Install the gem locally:

gem install ./elevenlabs-0.0.8.gem

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-new-feature)
  3. Commit your changes (git commit -am 'Add new feature')
  4. Push to your branch (git push origin feature/my-new-feature)
  5. Create a Pull Request describing your changes

For bug reports, please open an issue with details.


License

This project is licensed under the MIT License. See the LICENSE file for details.

⭐ Thank you for using the Elevenlabs Ruby Gem!
If you have any questions or suggestions, feel free to open an issue or submit a Pull Request!