Project

voice_id

No commit activity in last 3 years
No release in over 3 years
Exposes easy-to-use methods for working with Microsoft's Speaker Recognition API and identifying people by their voice. Useful for identifying your users by speech.
 Dependencies

Development

~> 0.4
~> 3.4

Runtime

~> 2.0
 Project Readme

VoiceId

Wrapper around Microsoft Cognitive Services - Speaker Recognition API

Installation

Sign up for Microsoft Cognitive Services and get a Speaker Recognition API key: https://www.microsoft.com/cognitive-services

$ gem install voice_id

Examples

  #create a new profile
  identification = VoiceId::Identification.new("MS_speaker_recognition_api_key")
  profile        = identification.create_profile
  # => { "identificationProfileId" => "49a46324-fc4b-4387-aa06-090cfbf0214f" }

  # create a new enrollment for that profile
  profile_id    = profile["identificationProfileId"]
  path_to_audio = '/path/to/some/audio_file.wav'
  short_audio   = true
  operation_url = identification.create_enrollment(profile_id , short_audio, path_to_audio)
  # => "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B5"

  # check the status of operation
  operation_id = identification.get_operation_id(operation_url)
  # => "EF217D0C-9085-45D7-AAE0-2B36471B89B5"

  identification.get_operation_status(operation_id)
  # note below that only 13.6 seconds of usable audio were accepted, so we need to
  # submit more enrollments for this profile until we reach the 30-second minimum
  # => 
  #   {
  #    "status" => "succeeded", 
  #    "createdDateTime" => "2016-09-23T01:34:44.226642Z",
  #    "lastActionDateTime" => "2016-09-23T01:34:44.4795299Z",
  #    "processingResult" => {
  #      "enrollmentStatus" => "Enrolling", 
  #      "remainingEnrollmentSpeechTime" => 16.4, 
  #      "speechTime" => 13.6, 
  #      "enrollmentSpeechTime"=>13.6
  #    }
  #   }

  # identify a speaker
  profile_ids                  = ["49a46324-fc4b-4387-aa06-090cfbf0214f", "49a36324-fc4b-4387-aa06-091cfbf0216b"] # plus any other enrolled profile ids
  path_to_test_audio           = '/path/to/some/audio_file.wav'
  short_audio                  = true
  identification_operation_url = identification.identify_speaker(profile_ids, short_audio, path_to_test_audio)
  # => "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B6"
  identification_operation_id = identification.get_operation_id(identification_operation_url)
  # => "EF217D0C-9085-45D7-AAE0-2B36471B89B6"
  identification.get_operation_status(identification_operation_id)
  # => 
  # {
  #  "status" => "succeeded", 
  #  "createdDateTime" => "2016-09-23T02:01:54.6498703Z",
  #  "lastActionDateTime" => "2016-09-23T02:01:56.054633Z",
  #  "processingResult" => {
  #    "identifiedProfileId" => "49a46324-fc4b-4387-aa06-090cfbf0214f", 
  #    "confidence"=>"High"
  #   }
  # }

APIs

Provides methods for two APIs: Identification and Verification. All audio samples sent to either API must be in the following format:

Container      WAV
Encoding       PCM
Sample rate    16 kHz
Sample format  16 bit
Channels       Mono
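
Because every sample must match this format, it can help to check a file locally before uploading it. The following is a minimal sketch and not part of voice_id: `wav_format` and `acceptable_for_speaker_api?` are hypothetical helpers, and the code assumes a canonical WAV header whose `fmt ` chunk starts at byte 12.

```ruby
# Parses the start of a RIFF/WAVE header and returns the fields that matter
# for the format requirements (hypothetical helper, not part of voice_id).
def wav_format(header)
  riff, _size, wave = header.unpack("a4Va4")
  raise ArgumentError, "not a RIFF/WAVE file" unless riff == "RIFF" && wave == "WAVE"

  # Assumption: the "fmt " chunk directly follows the 12-byte RIFF header.
  chunk_id, _chunk_size, audio_format, channels, sample_rate, _byte_rate, _block_align, bits =
    header.byteslice(12, 24).unpack("a4VvvVVvv")
  raise ArgumentError, "fmt chunk not found at byte 12" unless chunk_id == "fmt "

  # Audio format code 1 means uncompressed PCM.
  { pcm: audio_format == 1, channels: channels, sample_rate: sample_rate, bits_per_sample: bits }
end

# True when the header describes PCM, 16 kHz, 16 bit, mono audio.
def acceptable_for_speaker_api?(header)
  format = wav_format(header)
  format[:pcm] && format[:channels] == 1 &&
    format[:sample_rate] == 16_000 && format[:bits_per_sample] == 16
end
```

For a file on disk, pass the first 44 bytes, for example `acceptable_for_speaker_api?(File.binread(path, 44))`.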

Identification API

Identify a person from a list of people. This is a text-independent API: the speaker can say anything. Before a speaker can be identified, their profile must be enrolled with at least 30 seconds of recognizable audio.

identification = VoiceId::Identification.new("MS_speaker_recognition_api_key")

create_profile

Each person needs a unique profile. This method creates a new one.

  profile = identification.create_profile
  # => { "identificationProfileId" => "49a36324-fc4b-4387-aa06-090cfbf0064f" }

create_enrollment(profile_id, short_audio, audio_file_path)

An enrollment associates audio samples with a profile (this trains the service). The Identification API requires a minimum of 30 seconds of recognizable speech, which can be accumulated across multiple enrollments. This method creates a new enrollment for a profile.

  profile_id    = "1234567890"
  path_to_audio = '/path/to/some/audio_file.wav'
  short_audio   = true # true - set minimum duration to 1 sec (5 sec by default per enrollment)
  identification.create_enrollment(profile_id, short_audio, path_to_audio)
  # returns an operation url that you can use to check the status of the enrollment
  # => "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B5"
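
Because the 30 seconds can be spread over several enrollments, a caller usually needs to know whether a profile still needs more audio. A minimal sketch, assuming the status hash returned by #get_operation_status carries `remainingEnrollmentSpeechTime` under `processingResult`, as in the example output in the Examples section; `more_audio_needed?` is a hypothetical helper, not part of the gem.

```ruby
# Returns true when the enrollment status reports speech time still missing.
# Hypothetical helper: it only inspects the hash, it makes no API calls.
def more_audio_needed?(status)
  remaining = status.dig("processingResult", "remainingEnrollmentSpeechTime")
  !remaining.nil? && remaining > 0
end

status = {
  "status" => "succeeded",
  "processingResult" => {
    "enrollmentStatus" => "Enrolling",
    "remainingEnrollmentSpeechTime" => 16.4,
    "speechTime" => 13.6,
    "enrollmentSpeechTime" => 13.6
  }
}
puts "submit another enrollment" if more_audio_needed?(status)
```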

get_operation_id(operation_status_url)

Certain endpoints take time to process, so they return a URL that you can use to check on the status of the operation. Use this method to parse the operation id out of that URL, then pass the id to #get_operation_status.

  operation_status_url = identification.create_enrollment(profile_id, short_audio, path_to_audio)
  # => "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B5"
  identification_operation_id = identification.get_operation_id(operation_status_url)
  # => "EF217D0C-9085-45D7-AAE0-2B36471B89B5"
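
Judging by the example values, the operation id is simply the last path segment of the operation URL. The sketch below shows that relationship using only Ruby's standard library; `operation_id_from` is a hypothetical name, and this is not necessarily how the gem implements #get_operation_id.

```ruby
require "uri"

# Hypothetical stand-in for #get_operation_id: returns the final path segment.
def operation_id_from(operation_url)
  URI.parse(operation_url).path.split("/").last
end

url = "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B5"
puts operation_id_from(url)
```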

get_operation_status(operation_id)

Check on the status of an operation by passing in the operation id (use #get_operation_id to get the id).

  identification.get_operation_status(identification_operation_id)
  # => 
  # {
  #  "status" => "succeeded", 
  #  "createdDateTime" => "2016-09-23T02:01:54.6498703Z",
  #  "lastActionDateTime" => "2016-09-23T02:01:56.054633Z",
  #  "processingResult" => {
  #    "identifiedProfileId" => "49a59333-ur9d-4387-wd06-880cfby0215f", 
  #    "confidence"=>"High"
  #   }
  # }
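
Since operations complete asynchronously, callers typically poll until the status settles. The following is a rough sketch rather than a feature of the gem: `wait_for_operation` is a hypothetical helper, and the in-progress status values `"notstarted"` and `"running"` are an assumption about the service (the examples here only show `"succeeded"`).

```ruby
# Polls an operation until it leaves its in-progress states or we give up.
# `client` is any object that responds to get_operation_status(operation_id),
# such as a VoiceId::Identification instance.
def wait_for_operation(client, operation_id, attempts: 10, delay: 1.0)
  attempts.times do
    status = client.get_operation_status(operation_id)
    # Assumption: these two strings mark an operation that is not finished yet.
    return status unless %w[notstarted running].include?(status["status"])

    sleep(delay)
  end
  raise "operation #{operation_id} still pending after #{attempts} checks"
end
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object, without touching the network.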

delete_profile(profile_id)

Delete a particular profile from the service.

  profile_id = "1234567890"
  identification.delete_profile(profile_id)
  # => true

get_all_profiles

Returns a list of all the profiles for this account.

    identification.get_all_profiles
    # => 
    # [
    #   {
    #     "identificationProfileId" => "111f427c-3791-468f-b709-fcef7660fff9",
    #     "locale" => "en-US",
    #     "enrollmentSpeechTime" => 0.0,
    #     "remainingEnrollmentSpeechTime" => 0.0,
    #     "createdDateTime" => "2015-04-23T18:25:43.511Z",
    #     "lastActionDateTime" => "2015-04-23T18:25:43.511Z",
    #     "enrollmentStatus" => "Enrolled" //[Enrolled | Enrolling | Training]
    #   }, ...
    # ]
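
One use for this list is building the array of ids that #identify_speaker expects. A small sketch, assuming the hash keys shown above; `enrolled_profile_ids` is a hypothetical helper, not part of the gem.

```ruby
# Keeps only fully enrolled profiles and returns their ids.
def enrolled_profile_ids(profiles)
  profiles
    .select { |profile| profile["enrollmentStatus"] == "Enrolled" }
    .map { |profile| profile["identificationProfileId"] }
end

profiles = [
  { "identificationProfileId" => "111f427c-3791-468f-b709-fcef7660fff9", "enrollmentStatus" => "Enrolled" },
  { "identificationProfileId" => "49a46324-fc4b-4387-aa06-090cfbf0214f", "enrollmentStatus" => "Enrolling" }
]
p enrolled_profile_ids(profiles)
```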

get_profile(profile_id)

Returns a profile's details.

  profile_id = "1234567890"
  identification.get_profile(profile_id)
  # =>
  #   {
  #     "identificationProfileId" => "111f427c-3791-468f-b709-fcef7660fff9",
  #     "locale" => "en-US",
  #     "enrollmentSpeechTime" => 0.0,
  #     "remainingEnrollmentSpeechTime" => 0.0,
  #     "createdDateTime" => "2015-04-23T18:25:43.511Z",
  #     "lastActionDateTime" => "2015-04-23T18:25:43.511Z",
  #     "enrollmentStatus" => "Enrolled" //[Enrolled | Enrolling | Training]
  #   }

reset_all_enrollments_for_profile(profile_id)

Resets all the enrollments for a particular profile.

  profile_id = "1234567890"
  identification.reset_all_enrollments_for_profile(profile_id)
  # => true

identify_speaker(profile_ids, short_audio, audio_file_path)

Identify a speaker by calling this method with an array of enrolled profile_ids. Set short_audio to true to waive the default 5-second minimum for the speech sample. The audio to be analyzed should ideally be 30 seconds long, with a maximum of 5 minutes.

  profile_ids        = ["49a46324-fc4b-4387-aa06-090cfbf0214f", "49a36324-fc4b-4387-aa06-091cfbf0216b"] # plus any other enrolled profile ids
  path_to_test_audio = '/path/to/some/audio_file.wav'
  short_audio        = true
  operation_url      = identification.identify_speaker(profile_ids, short_audio, path_to_test_audio)
  # => "https://api.projectoxford.ai/spid/v1.0/operations/EF217D0C-9085-45D7-AAE0-2B36471B89B6"
  identification_operation_id = identification.get_operation_id(operation_url)
  # => "EF217D0C-9085-45D7-AAE0-2B36471B89B6"
  identification.get_operation_status(identification_operation_id)
  # => 
  # {
  #  "status" => "succeeded", 
  #  "createdDateTime" => "2016-09-23T02:01:54.6498703Z",
  #  "lastActionDateTime" => "2016-09-23T02:01:56.054633Z",
  #  "processingResult" => {
  #    "identifiedProfileId" => "49a46324-fc4b-4387-aa06-090cfbf0214f", 
  #    "confidence"=>"High"
  #   }
  # }
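
The result still has to be mapped back to one of your own users. A minimal sketch, assuming the response shape shown above: `identified_user` and the `users_by_profile_id` lookup are hypothetical, and which confidence levels to accept is left to the caller (only `"High"` appears in these examples).

```ruby
# Returns the application user for a finished identification, or nil when the
# operation did not succeed or the confidence is not in the accepted list.
def identified_user(status, users_by_profile_id, accepted_confidence: ["High"])
  return nil unless status["status"] == "succeeded"

  result = status.fetch("processingResult", {})
  return nil unless accepted_confidence.include?(result["confidence"])

  users_by_profile_id[result["identifiedProfileId"]]
end

# Hypothetical mapping from profile ids to application users.
users = { "49a46324-fc4b-4387-aa06-090cfbf0214f" => "alice" }
status = {
  "status" => "succeeded",
  "processingResult" => {
    "identifiedProfileId" => "49a46324-fc4b-4387-aa06-090cfbf0214f",
    "confidence" => "High"
  }
}
puts identified_user(status, users)
```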

Verification API

Verify that a person is who they say they are. This is a text-dependent API: before a speaker can be verified, their profile must be enrolled with three audio samples of phrases from an API-provided list.

verification = VoiceId::Verification.new("MS_speaker_recognition_api_key")

list_all_verification_phrases

Get a list of accepted scripts to use when sending your audio sample.

  verification.list_all_verification_phrases
  # =>
  #  [
  #   {"phrase" => "i am going to make him an offer he cannot refuse"}, 
  #   {"phrase" => "houston we have had a problem"}, 
  #   {"phrase" => "my voice is my passport verify me"}, 
  #   {"phrase" => "apple juice tastes funny after toothpaste"}, 
  #   {"phrase" => "you can get in without your password"}, 
  #   {"phrase" => "you can activate security system now"}, 
  #   {"phrase" => "my voice is stronger than passwords"}, 
  #   {"phrase" => "my password is not your business"}, 
  #   {"phrase" => "my name is unknown to you"}, 
  #   {"phrase" => "be yourself everyone else is already taken"}
  #  ]

create_profile

Same as Identification API

create_enrollment(profile_id, audio_file_path)

Requires 3 enrollments. Pick 3 of the acceptable phrases from #list_all_verification_phrases and enroll them.

  verification.create_enrollment("49a46324-fc4b-4387-aa06-090cfbf0214f", '/path/to/audio/make_him_an_offer.wav')
  # =>
  #   {
  #     "enrollmentStatus" => "Enrolling",
  #     "enrollmentsCount" => 1,
  #     "remainingEnrollments" => 2,
  #     "phrase" => "i am going to make him an offer he cannot refuse"
  #   }
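
Since three enrollments are required, the calls can be wrapped in a loop that stops once the service reports none remaining. A sketch under the assumption that each response includes `remainingEnrollments` as shown above; `enroll_phrases` is a hypothetical helper, and `verification` is any object that responds to create_enrollment(profile_id, audio_file_path).

```ruby
# Submits recordings one by one until no enrollments remain.
# Returns the final response, or nil if the recordings ran out first.
def enroll_phrases(verification, profile_id, audio_paths)
  audio_paths.each do |path|
    response = verification.create_enrollment(profile_id, path)
    return response if response["remainingEnrollments"] == 0
  end
  nil
end
```

A nil return value signals that more recordings are needed before the profile can be used with #verify_speaker.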

delete_profile(profile_id)

Same as Identification API

get_all_profiles

Same as Identification API

get_profile(profile_id)

Same as Identification API

reset_all_enrollments_for_profile(profile_id)

Same as Identification API

verify_speaker(profile_id, audio_file_path)

The profile must already be enrolled with 3 of the accepted phrases (#list_all_verification_phrases). Once enrollment is complete, a recording of one of the accepted phrases can be checked against the enrolled profile.

  verification.verify_speaker("86935587-b631-4cc7-a59t-8e580d71522g", "/path/to/audio/offer_converted.wav")
  # =>
  #   {
  #    "result" => "Accept", 
  #    "confidence" => "High", 
  #    "phrase" => "i am going to make him an offer he cannot refuse"
  #   }
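
An application will usually reduce this response to a yes/no decision. A minimal sketch, assuming the response shape shown above; `voice_verified?` is a hypothetical helper, and the accepted confidence levels are a policy choice left to the caller (only `"High"` appears in this example).

```ruby
# True only when the service accepted the speaker with sufficient confidence.
def voice_verified?(response, accepted_confidence: ["High"])
  response["result"] == "Accept" && accepted_confidence.include?(response["confidence"])
end

response = {
  "result" => "Accept",
  "confidence" => "High",
  "phrase" => "i am going to make him an offer he cannot refuse"
}
puts "access granted" if voice_verified?(response)
```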