Speech Synthesis component (Azure Cloud Speech)

Description

Performs text-to-speech synthesis and playback to the current caller.
The Azure Speech cloud service is used over a streamed REST HTTP channel.

For the Microsoft Azure account registration procedure and service pricing, see https://learn.microsoft.com/en-us/azure/ai-services/speech-service/.
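The sketch below shows what a text-to-speech request to the public Azure Speech REST endpoint can look like. The endpoint pattern and header names follow the Azure documentation; the function name, the User-Agent value, and the default output format are assumptions made for illustration, and the component's actual request may differ.

```python
from typing import Dict, Tuple


def build_tts_request(
    region: str,
    subscription_key: str,
    ssml: str,
    output_format: str = "riff-8khz-16bit-mono-pcm",  # assumed default
) -> Tuple[str, Dict[str, str], bytes]:
    """Return the URL, headers and body of a TTS POST request (nothing is sent)."""
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "tts-azure-sketch",  # hypothetical client name
    }
    return url, headers, ssml.encode("utf-8")
```

The returned values can be passed to any HTTP client, for example `urllib.request.Request(url, data=body, headers=headers, method="POST")`.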

The connection is configured in the domain settings, in the 'azure_cloud' field.

The component can play back the result or simply save it to an audio file.

In playback mode, speech starts playing once 1.5 seconds of audio data have accumulated in the data buffer.
Depending on whether caching is configured on the synthesis service and whether the data for the synthesized text is already available, obtaining the first 1.5 seconds of audio may take 300 to 1500 ms.
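As a rough illustration of the 1.5-second playback threshold, the sketch below converts a duration into a byte count. The audio format (8 kHz, 16-bit, mono PCM) and both function names are assumptions; the source does not state which format the component buffers.

```python
PLAYBACK_START_SEC = 1.5  # threshold taken from the description above


def bytes_for_duration(
    seconds: float,
    sample_rate_hz: int = 8000,  # assumed sample rate
    bytes_per_sample: int = 2,   # assumed 16-bit samples
    channels: int = 1,           # assumed mono
) -> int:
    """Number of raw PCM bytes that hold the given duration of audio."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


def can_start_playback(buffered_bytes: int) -> bool:
    """True once the buffer holds at least 1.5 seconds of audio."""
    return buffered_bytes >= bytes_for_duration(PLAYBACK_START_SEC)
```

Under these assumptions the threshold is 24000 bytes of audio data.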

Table 1. System Characteristics

Index: 230
Short title: tts_azure
Types of scenarios: IVR
Starter module: r_sip_ivr_script_component_tts_sber
Mode: Asynchronous
Icon: 230
Branching pattern: Branching, interrupting

Properties

Table 2. Properties
Specification | Description

Title: Account Azure Speech
Code: accountKey
Visibility: no
Default: default

Specifies the account that defines the connection points to the Microsoft Azure Speech service.
The list includes the 'default' value, which uses the root 'speech' fields of the 'settings.azure_cloud' object.
The list also contains the keys of the 'settings.azure_cloud.accounts' object; each key refers to an object with its own separately configured access parameters.
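The account selection described above can be sketched as follows. The layout of the settings object is an assumption: the field names 'key' and 'region' are hypothetical, and the sketch treats every root-level field other than 'accounts' as belonging to the 'default' account.

```python
from typing import Any, Dict, List


def list_account_keys(azure_cloud: Dict[str, Any]) -> List[str]:
    """Values offered for accountKey: 'default' plus the keys of 'accounts'."""
    return ["default"] + sorted(azure_cloud.get("accounts", {}))


def resolve_account(azure_cloud: Dict[str, Any], account_key: str = "default") -> Dict[str, Any]:
    """Return the access parameters for the selected account."""
    if account_key == "default":
        # Assumption: root-level fields hold the default connection parameters.
        return {k: v for k, v in azure_cloud.items() if k != "accounts"}
    accounts = azure_cloud.get("accounts", {})
    if account_key not in accounts:
        raise KeyError(f"unknown Azure Speech account: {account_key}")
    return accounts[account_key]
```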

Title: Mode
Code: mode
Visibility: no
Default: 'Play'

Component operation mode after a response is received from the TTS Azure Speech service.
Possible options:

  • Play (play, 0) - Plays the synthesized speech to the subscriber and then deletes the file.

  • Generate File (file, 1) - Saves the audio file to a local temporary directory and returns the file path to a variable.

Title: Content Type
Code: contentType
Visibility: no
Default: SSML

The format of data transmitted in the request body (content_type).
Possible options:

  • SSML (ssml, 1) - SSML markup with text.

Title: Text
Code: text
Visibility: no
Default: — 

SSML markup with text.
In addition to the text, the markup specifies the voice model used.

Example:

<speak version='1.0' xml:lang='en-US'>
  <voice xml:lang='en-US' xml:gender='Male' name='en-US-ChristopherNeural'>
     I'm excited to try text to speech!
  </voice>
</speak>
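When the text comes from a script variable, characters such as '&' and '<' have to be escaped before they are placed into the markup. A minimal sketch of building the SSML string, with a hypothetical helper name and the voice from the example above as an assumed default:

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-ChristopherNeural", lang: str = "en-US") -> str:
    """Wrap plain text in SSML, escaping XML special characters."""
    return (
        f"<speak version='1.0' xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

The resulting string is what would be placed in the 'text' property.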

Title: Caching
Code: useCache
Visibility: no
Default: no

Speech synthesis results caching mode.

Caching speeds up the output of results when the same texts are requested from the synthesis service repeatedly, and it reduces the load on the synthesis service.
If the synthesized data changes regularly, caching is counterproductive, because a procedure for periodically clearing the cache would have to be defined and configured.

On the first request, when the value is not yet in the cache, the file is generated by the service and saved to the folder ':GlobalShare/domains/DOMAIN/cache/sber_tts/…'.
On later requests, if the file is found in the cache, the TTS Azure Speech service is not called and the file is copied from the cache.

The unique file name is derived from the synthesis parameters: text, language, voice.
Files are not deleted from the cache automatically. The procedure for deleting obsolete data must be configured separately. The criterion can be, for example, the time of the last access to the file.
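One common way to map synthesis parameters to a unique file name is to hash them. The sketch below is an assumption for illustration only: the hash function, the separator, and the file extension are not taken from the component, whose actual naming scheme is not documented here.

```python
import hashlib


def cache_file_name(text: str, language: str, voice: str, extension: str = "wav") -> str:
    """Derive a stable file name from the synthesis parameters."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    payload = "\x00".join((text, language, voice)).encode("utf-8")
    return f"{hashlib.sha256(payload).hexdigest()}.{extension}"
```

Identical parameters always produce the same name, so a repeated request can be served from the cache without calling the service.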

Possible options:

  • Disable (0) - Caching is not used. Each request is sent to the TTS Azure Speech service for synthesis.

  • Enable (1) - Caching is in use.
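Because cached files are not removed automatically, a separate cleanup job is needed. A minimal sketch of such a job, using the last access time as the criterion; the function name and parameters are hypothetical, and on file systems mounted with access-time updates disabled the modification time may be a better criterion.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def purge_stale_cache(cache_dir: str, max_idle_seconds: float, now: Optional[float] = None) -> List[str]:
    """Delete files not accessed within max_idle_seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(cache_dir).iterdir():
        # st_atime is the time of the last access to the file.
        if path.is_file() and now - path.stat().st_atime > max_idle_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Such a function could be run periodically, for example from a scheduler, against the domain cache folder.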

Title: Break by DTMF
Code: checkDTMF
Visibility: no
Default: 'None'

DTMF detector switch. Enabling it reveals the settings for storing received characters and for interrupting the operation.

Title: Buffer for DTMF
Code: dtmfBuffer
Visibility: yes
Default: — 

Variable to store received DTMF characters.

Title: Clear buffer DTMF
Code: clearDtmfBuffer
Visibility: yes
Default: 'Yes'

Switch that clears the DTMF buffer before the component starts.

Title: Number of characters
Code: maxSymbolCount
Visibility: yes
Default: — 

Limit on the number of characters that can be entered.
When the specified number of DTMF characters has been received during component execution, the playback operation ends.

Title: Interrupt Symbols
Code: interruptSymbols
Visibility: yes
Default: — 

A string containing sequences of interrupt characters separated by commas.
When a character sequence matching one of the specified interrupt sequences is detected at the end of the DTMF buffer, the playback operation is completed.
For example, *, 7, 123, 9395.
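The two stop conditions, the character limit and the interrupt sequences, can be sketched as a single check. The function names are hypothetical, and trimming the spaces around the commas is an assumption.

```python
from typing import List, Optional


def parse_interrupt_symbols(raw: str) -> List[str]:
    """Split a comma-separated string such as '*, 7, 123' into sequences."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def should_stop_playback(
    dtmf_buffer: str,
    received_count: int,
    interrupt_symbols: str,
    max_symbol_count: Optional[int] = None,
) -> bool:
    """True when the character limit is reached or the buffer ends with an interrupt sequence."""
    # received_count is the number of characters received during this component run.
    if max_symbol_count is not None and received_count >= max_symbol_count:
        return True
    return any(dtmf_buffer.endswith(seq) for seq in parse_interrupt_symbols(interrupt_symbols))
```

With the interrupt string '*, 7, 123, 9395', a buffer of '45123' stops playback because it ends with '123', while '4512' does not.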

Title: Response timeout, s
Code: responseTimeoutSec
Visibility: no
Default: — 

Time to wait for a response from the TTS Azure Speech service after a request is sent.
When the timeout expires, control is passed to the next component on the Time branch.

Title: Response code to variable
Code: varHttpCode
Visibility: no
Default: — 

Variable to store the HTTP response code of the synthesis service.

Title: File path to a variable
Code: varFile
Visibility: yes
Default: — 

Variable to store the path to the synthesized speech audio file in the local temporary directory.
To keep the file long-term, the script must move it to a permanent storage location.

Title: Transition
Code: transfer
Visibility: no
Default: — 

The component to which control is passed if the operation is successfully completed.

Title: Transition, Time
Code: transferTimeout
Visibility: no
Default: — 

The component to which control is passed when the time to wait for an HTTP response from the service has expired.

Title: Transition, Error
Code: transferError
Visibility: no
Default: — 

The component to which control is passed if an error occurs.
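The relationship between the request outcome and the three transitions can be sketched as below. The function is hypothetical; in particular, routing non-2xx HTTP codes to the error branch is an assumption, since the source only says that this branch is taken if an error occurs.

```python
import socket
from typing import Callable, Optional, Tuple


def select_transition(send_request: Callable[[], int]) -> Tuple[str, Optional[int]]:
    """Run the request and return (transition property code, HTTP code or None)."""
    try:
        status = send_request()  # expected to return the HTTP status code
    except (TimeoutError, socket.timeout):
        return "transferTimeout", None  # no response within responseTimeoutSec
    except Exception:
        return "transferError", None
    if 200 <= status < 300:
        return "transfer", status
    return "transferError", status  # assumption: non-2xx counts as an error
```

The second element of the result corresponds to the value that would be stored in the variable named by 'varHttpCode'.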