Adding Text to Speech to Your IVR Using SaaS (Google Cloud Functions)

I’ve been on a text-to-speech and speech-to-text kick lately. My last post talked about using AWS S3 and Amazon Transcribe to convert your audio files to text, and in previous articles I’ve covered how to create temporary prompts using Polly so you can build out your contact center call flows. Now we’re going to expand the use case so that a traditional on-premises call center can leverage the cloud to provide dynamic prompts. My use case is simple: I want my UCCX call center to dynamically play a string of text back to the caller without having to use a traditional TTS service.

First, this is not new in any way, and other people have solved it in different ways. This Cisco DevNet GitHub repo provides a method to use voicerss.org to generate TTS for UCCX; however, that approach requires loading a JAR file to do Base64 decoding. Then there’s this Cisco Live presentation from 2019 by the awesome Paul Tindall, who used a Connector server to do something similar. To be fair, the Connector server allowed for a ton more functionality than what I’m looking for.

Cisco Live Presentation

Second, I wanted this functionality to be as easy to use as possible. While on-premises call center software keeps getting better, there are still obstacles to consuming cloud-based services: teams may lack the know-how to leverage new features, and some legacy versions can’t be upgraded. I wanted the solution to have as few moving parts as possible. That means no custom Java and no additional servers to stand up.

The solution I came up with leverages Google Cloud (GCP), specifically Cloud Functions. The same functionality can be achieved using AWS Lambda or Azure’s equivalent. At a high level, you pass your text string to an HTTP endpoint and get back a WAV file in a format the IVR can play; the function converts Google’s MP3 output to 8 kHz μ-law.

Flow Diagram

The URL would look something like this:

https://us-central1-myFunction.cloudfunctions.net/synthesize_text_to_wav?text=American%20cookies%20are%20too%20big
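Spaces and punctuation in the prompt text have to be percent-encoded, which is why the example shows %20. If you want to generate or test these URLs from a script, here is a minimal standard-library sketch. FUNCTION_URL reuses the placeholder host from the example above, and build_tts_url and download_wav are hypothetical helper names, not part of any Cisco or Google API.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint copied from the example URL; substitute your own function's URL.
FUNCTION_URL = "https://us-central1-myFunction.cloudfunctions.net/synthesize_text_to_wav"


def build_tts_url(text: str, base_url: str = FUNCTION_URL) -> str:
    """Percent-encode the prompt text (spaces become %20) and append it as ?text=."""
    query = urllib.parse.urlencode({"text": text}, quote_via=urllib.parse.quote)
    return f"{base_url}?{query}"


def download_wav(text: str, out_path: str, timeout: float = 30.0) -> None:
    """Fetch the synthesized WAV for `text` and save it to `out_path`."""
    with urllib.request.urlopen(build_tts_url(text), timeout=timeout) as resp:
        with open(out_path, "wb") as out:
            out.write(resp.read())


if __name__ == "__main__":
    print(build_tts_url("American cookies are too big"))
```

urlencode defaults to quote_plus, which turns spaces into +, so quote_via=urllib.parse.quote is what produces the %20 form shown in the example URL.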

The Good Things About This

  • Pay-as-you-go pricing for TTS. Based on the pricing calculator, a few hours of TTS a month would run under $2.00/month.
  • Scales on demand. Whether you’re handling 1 call or 100 calls, Cloud Functions starts additional instances as needed, within your project’s instance limits and quotas.
  • Easy to use.

The Bad Things About This

  • There is a delay between making the request and getting the WAV file back; I’ve seen it take as long as 7 seconds. I would only use this in a very targeted manner and make sure it doesn’t affect the caller experience too drastically.
  • Requires your on-premises IVR to have internet access. For many businesses, this is a big no-no.
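To put numbers on that delay before a caller hears it, you can time a handful of requests from a machine on the same network as the IVR. A minimal sketch: measure_latency is a hypothetical helper, and fetch stands in for whatever function downloads the WAV (for example, a urllib.request call against your function URL).

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(fetch: Callable[[str], bytes], text: str, runs: int = 5) -> Dict[str, float]:
    """Call fetch(text) `runs` times and summarize the round-trip times in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(text)  # e.g. an HTTP GET against the Cloud Function
        durations.append(time.perf_counter() - start)
    return {
        "first": durations[0],  # often the slowest if the function had to cold start
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

The first call after the function has been idle is typically slower because Cloud Functions has to start a new instance, so look at the median as well as the worst case.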

Some initial testing with UCCX is showing positive results. I’m going to investigate whether there’s a way to speed up the processing so the request and response stay under 3 seconds, and to add the ability to set the language, voice, and even SSML via arguments. If you want to build this yourself, here’s the code for the function.

import os

from flask import send_file  # Cloud Functions passes the HTTP request as a Flask request
from google.cloud import texttospeech
from pydub import AudioSegment  # pydub relies on ffmpeg for MP3 decoding and WAV export


def synthesize_text_to_wav(request):
    """Synthesizes speech from the ?text= query parameter and returns an 8 kHz mu-law WAV."""
    text = request.args.get('text')
    if not text:
        return 'Missing required "text" query parameter.', 400

    client = texttospeech.TextToSpeechClient()
    input_text = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Standard-C",
        ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        request={"input": input_text, "voice": voice, "audio_config": audio_config}
    )

    # /tmp is the only writable directory in Cloud Functions.
    src_file_path = '/tmp/output.mp3'
    dst_file_path = '/tmp/output.wav'

    # Make sure the directory exists.
    os.makedirs(os.path.dirname(src_file_path), exist_ok=True)

    # The response's audio_content is binary MP3 data.
    with open(src_file_path, "wb") as out:
        out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

    # Convert the MP3 to 8 kHz mu-law WAV for telephony playback.
    AudioSegment.from_mp3(src_file_path).export(
        dst_file_path, format="wav", codec="pcm_mulaw", parameters=["-ar", "8000"]
    )
    return send_file(dst_file_path, mimetype="audio/wav")
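Before pointing UCCX at the endpoint, it’s worth confirming the file really is telephony audio. Python’s wave module only reads PCM WAV files, so here is a minimal standard-library sketch that walks the RIFF chunks and reads the fmt chunk directly. It assumes the target is 8 kHz, 8-bit, mono mu-law (WAV format tag 7), which is what the pydub export above is meant to produce; read_wav_format and is_telephony_mulaw are hypothetical names.

```python
import struct

WAVE_FORMAT_MULAW = 7  # format tag for G.711 mu-law in a WAV fmt chunk


def read_wav_format(data: bytes) -> dict:
    """Return the basic fmt-chunk fields from the bytes of a RIFF/WAVE file."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    offset = 12
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4]
        (chunk_size,) = struct.unpack("<I", data[offset + 4:offset + 8])
        if chunk_id == b"fmt ":
            # The first 16 bytes of the fmt chunk are common to PCM and mu-law files.
            fmt_tag, channels, sample_rate, _byte_rate, _block_align, bits = struct.unpack(
                "<HHIIHH", data[offset + 8:offset + 24]
            )
            return {
                "format_tag": fmt_tag,
                "channels": channels,
                "sample_rate": sample_rate,
                "bits_per_sample": bits,
            }
        # Skip this chunk; chunk bodies are padded to an even length.
        offset += 8 + chunk_size + (chunk_size % 2)
    raise ValueError("no fmt chunk found")


def is_telephony_mulaw(data: bytes) -> bool:
    """True if the file is 8 kHz, 8-bit, mono mu-law."""
    fmt = read_wav_format(data)
    return (
        fmt["format_tag"] == WAVE_FORMAT_MULAW
        and fmt["sample_rate"] == 8000
        and fmt["channels"] == 1
        and fmt["bits_per_sample"] == 8
    )
```

For example, save a response to disk and call is_telephony_mulaw on the bytes you read back from it.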

Be awesome!

~david

Transcribe Your Audio Files To Migrate to Amazon Connect

This is an update to an earlier post on the same topic, now with updated code.

As we work our way off Cisco UCCE and onto Amazon Connect, we find ourselves needing to transcribe thousands of prompts. I wanted to revisit this code to make sure it still works. If you want to use it and are starting from scratch, here are the steps you need to take:

– Install Visual Studio Code.
– Install Python 3.9.
– Create the folder where you will keep your project.
– Create a virtual environment.
– Activate your virtual environment.
– pip install python-dotenv boto3 pandas
– *Remove the profile_name or update it.
– Update the .env file with the region you’ll be using.

You can find the full source code here.

The script works like this: it creates an S3 bucket, grabs the first file, checks whether the file is already in S3 and uploads it if not, creates a transcription job, waits for the transcription to complete, grabs the results, and writes a CSV. I’ve tried to catch as many potential errors as possible, but I’m sure some are still lingering. Expect the transcription to take around 1 minute per file, assuming normal IVR prompts.
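The "waits for the transcription to complete" step is a polling loop. Here is a minimal, generic sketch with a timeout so a stuck job can’t hang the script. get_status is an assumption standing in for a call that returns the job’s TranscriptionJobStatus string, and wait_for_job is a hypothetical helper.

```python
import time
from typing import Callable

# Amazon Transcribe job statuses that mean "stop polling".
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_job(get_status: Callable[[], str], poll_seconds: float = 10.0,
                 timeout_seconds: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it returns COMPLETED or FAILED, or raise TimeoutError."""
    waited = 0.0  # counts requested sleep time only, not API call time
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With boto3 you might pass lambda: transcribe.get_transcription_job(TranscriptionJobName=job_name)['TranscriptionJob']['TranscriptionJobStatus'] as get_status. The sleep parameter is injectable so the loop can be tested without waiting.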

AWS Transcribe

*I have many AWS profiles, which might not be the case for others. If you only have a single profile, change the line session = boto3.session.Session(profile_name='MyProfile') to session = boto3.session.Session()

I hope this helps others.

~david

Using AWS Transcribe to get IVR prompt verbiage

I’m a stickler for documentation and an even bigger stickler for good documentation. Documentation allows the work you’ve produced to live on beyond you and helps others get up to speed quickly. It feels like documentation is one of those things everyone says they do, but few really follow through on. There’s nothing hard about it, but it’s something you need to work on as you go through your project. Don’t leave documentation to the end; it will show. So DO IT!

I’ve recently started a new project to migrate an IVR over to CVP. To my pleasant surprise, the customer had a call flow, prompts, a web service definition document, and test data. A dream come true! As I started development, I noticed that the verbiage in the prompts didn’t match the call flow, and given my “sticklerness” I wanted to update the call flow so it matches the prompt verbiage 100%.

I’m always looking for excuses to play around with Python, so that’s what I used. I hacked together the script below, which does the following:

  • Creates an AWS S3 bucket.
  • Uploads prompts from a specific directory to the bucket.
  • Creates a job in AWS Transcribe to transcribe the prompts.
  • Waits for the job to be completed.
  • Creates a CSV named prompts.csv.
  • Deletes the transcription jobs.
  • Deletes the bucket.

The only things you will need to change to match what you’re doing are the following:

local_directory = 'Spanish/'
file_extension = '.wav'
media_format = 'wav'
language_code = 'es-US'
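Because file_extension and media_format describe the same thing, it’s easy to update one and forget the other. Here is a minimal preflight sketch to run before creating any AWS resources: it counts matching prompts the same way the script’s os.walk loop does and checks that media_format matches the extension. preflight is a hypothetical helper, and treating media_format as the extension without the dot is an assumption that holds for wav, mp3, and flac.

```python
import os


def preflight(local_directory: str, file_extension: str, media_format: str) -> int:
    """Return how many prompt files will be processed; raise if the settings disagree."""
    if not os.path.isdir(local_directory):
        raise FileNotFoundError(f"{local_directory!r} is not a directory")
    # Assumption: media_format equals the extension without its leading dot.
    if file_extension.lstrip(".").lower() != media_format.lower():
        raise ValueError(f"media_format {media_format!r} does not match {file_extension!r}")
    count = 0
    # Walk subfolders too, mirroring the script's os.walk loop.
    for _root, _dirs, files in os.walk(local_directory):
        count += sum(1 for name in files if name.endswith(file_extension))
    if count == 0:
        raise ValueError(f"no *{file_extension} files found under {local_directory!r}")
    return count
```

For example, preflight('Spanish/', '.wav', 'wav') returns the number of .wav prompts the script would pick up.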

The complete code is below. Be careful with the formatting; it might be best to copy it from this snippet:

<pre>import json
import os
import re
import sys
import time
import urllib.request
import uuid

import boto3
import pandas
from botocore.exceptions import ClientError

local_directory = 'French/'
file_extension = '.wav'
media_format = 'wav'
language_code = 'fr-CA'

def create_unique_bucket_name(bucket_prefix):
    # The generated bucket name must be between 3 and 63 chars long
    return ''.join([bucket_prefix, str(uuid.uuid4())])

def create_bucket(bucket_prefix, s3_connection):
    session = boto3.session.Session()
    current_region = session.region_name
    bucket_name = create_unique_bucket_name(bucket_prefix)
    if current_region and current_region != 'us-east-1':
        # Outside us-east-1, S3 requires the region as a LocationConstraint
        bucket_response = s3_connection.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={'LocationConstraint': current_region},
        )
    else:
        bucket_response = s3_connection.create_bucket(Bucket=bucket_name)
    return bucket_name, bucket_response

def delete_all_objects(bucket_name):
    res = []
    bucket = s3Resource.Bucket(bucket_name)
    for obj_version in bucket.object_versions.all():
        res.append({'Key': obj_version.object_key,
                    'VersionId': obj_version.id})
    # DeleteObjects accepts at most 1,000 keys per request, so delete in batches
    # (no call is made if the bucket is already empty)
    for i in range(0, len(res), 1000):
        bucket.delete_objects(Delete={'Objects': res[i:i + 1000]})

s3Client = boto3.client('s3')
s3Resource = boto3.resource('s3')
transcribe = boto3.client('transcribe')
rows = []  # one {"Prompt Name": ..., "Verbiage": ...} dict per prompt

# Create bucket
bucket_name, first_response = create_bucket(
    bucket_prefix='transcription-',
    s3_connection=s3Client)

print("Bucket created %s" % bucket_name)

print("Checking bucket.")
good_to_go = False
for bucket in s3Resource.buckets.all():
    if bucket.name == bucket_name:
        print("Bucket ready.")
        good_to_go = True

if not good_to_go:
    print("Error with bucket.")
    sys.exit(1)

# enumerate local files recursively
for root, dirs, files in os.walk(local_directory):
    for filename in files:
        if filename.endswith(file_extension):
            # construct the full local path
            local_path = os.path.join(root, filename)
            print("Local path: %s" % local_path)
            # path relative to local_directory, used as the prompt name
            relative_path = os.path.relpath(local_path, local_directory)
            print("File name: %s" % relative_path)
            s3_path = local_path
            print("Searching for %s in bucket %s" % (s3_path, bucket_name))
            try:
                s3Client.head_object(Bucket=bucket_name, Key=s3_path)
                print("Path found on bucket. Skipping %s..." % s3_path)
            except ClientError:
                # head_object raises ClientError (404) when the key doesn't exist yet
                print("Uploading %s..." % s3_path)
                s3Client.upload_file(local_path, bucket_name, s3_path)
                # Job names may only contain letters, digits, '.', '_' and '-',
                # so replace path separators and spaces from the file path
                job_name = re.sub(r'[^0-9a-zA-Z._-]', '-', relative_path)
                job_uri = "s3://%s/%s" % (bucket_name, s3_path)
                transcribe.start_transcription_job(
                    TranscriptionJobName=job_name,
                    Media={'MediaFileUri': job_uri},
                    MediaFormat=media_format,
                    LanguageCode=language_code
                )
                while True:
                    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
                    job_status = status['TranscriptionJob']['TranscriptionJobStatus']
                    if job_status in ['COMPLETED', 'FAILED']:
                        break
                    print('Transcription ' + job_status)
                    time.sleep(25)
                print('Transcription ' + job_status)
                if job_status == 'COMPLETED':
                    response = urllib.request.urlopen(status['TranscriptionJob']['Transcript']['TranscriptFileUri'])
                    data = json.loads(response.read())
                    text = data['results']['transcripts'][0]['transcript']
                else:
                    # A failed job has no transcript; record the reason instead
                    text = 'FAILED: %s' % status['TranscriptionJob'].get('FailureReason', 'unknown')
                print("%s, %s " % (relative_path, text))
                rows.append({"Prompt Name": relative_path, "Verbiage": text})
                print("Deleting transcription job.")
                transcribe.delete_transcription_job(TranscriptionJobName=job_name)

# Create csv (DataFrame.append was removed in pandas 2.0, so build the frame from the rows)
print("Writing CSV")
pandas.DataFrame(rows, columns=["Prompt Name", "Verbiage"]).to_csv('prompts.csv', index=False)

# Empty bucket
print("Emptying bucket.")
delete_all_objects(bucket_name)

# Delete empty bucket
s3Resource.Bucket(bucket_name).delete()
print("Bucket deleted.")</pre>
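Once prompts.csv exists, the remaining step is lining the transcriptions up against the call flow. Here is a minimal sketch using difflib from the standard library. It assumes you have the call-flow wording keyed by prompt name (call_flow is a hypothetical dict), and the 0.9 similarity threshold and the normalization are arbitrary choices, not part of the script. Transcriptions rarely match punctuation or casing exactly, so both sides are normalized before comparing.

```python
import csv
import difflib
import re
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def find_mismatches(transcribed: Dict[str, str], call_flow: Dict[str, str],
                    threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Return (prompt name, similarity) for prompts whose wording differs noticeably."""
    mismatches = []
    for name, expected in call_flow.items():
        actual = transcribed.get(name, "")  # a missing prompt scores 0.0
        ratio = difflib.SequenceMatcher(None, normalize(expected), normalize(actual)).ratio()
        if ratio < threshold:
            mismatches.append((name, round(ratio, 2)))
    return mismatches


def load_prompts_csv(path: str = "prompts.csv") -> Dict[str, str]:
    """Read the script's output (columns: Prompt Name, Verbiage) into a dict."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["Prompt Name"]: row["Verbiage"] for row in csv.DictReader(f)}
```

For example, find_mismatches(load_prompts_csv(), call_flow) lists the prompts whose call-flow text needs a second look.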

I hope this helps someone out there create better documentation.

~david