Speech Recognition Component (Sber SaluteSpeech)

Description

Recognizes the caller's speech and converts it to text.
Uses the on-premise Sber SaluteSpeech service over gRPC.

Connection points are configured in the domain settings, in the 'sber_salute' field.

Can pre-play an audio file whose playback is interrupted by the caller's voice.

Can stop recording on silence after a spoken phrase. Can either cut off quickly after a single phrase, relying on the recognition service, or be configured to wait for multiple phrases.

Table 1. System Characteristics

Index: 223
Short title: asr_yandex
Types of scenarios: IVR
Starter module: r_sip_ivr_script_component_asr_sber
Mode: Asynchronous
Icon: 223
Branching pattern: Branching, interrupting

Properties

Table 2. Properties
Specification | Description

Title: Sber Salute account
Code: accountKey
Visibility: no
Default: default

Specifies the account that defines the connection points to the Sber SaluteSpeech service.
The list includes the 'default' value, which uses the root-level fields of the 'settings.sber_salute' object.
It also lists the keys of the 'settings.sber_salute.accounts' object; each key holds an object with its own separately configured access parameters.
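
The account lookup described above can be sketched as follows. This is only an illustration in Python, not the platform's own code: the function name resolve_account and the 'host' and 'port' fields are hypothetical, and only 'sber_salute', 'accounts' and 'default' come from the text.

```python
from typing import Any, Dict


def resolve_account(settings: Dict[str, Any], account_key: str = "default") -> Dict[str, Any]:
    """Return the connection parameters for the selected account (hypothetical helper)."""
    sber = settings["sber_salute"]
    if account_key == "default":
        # 'default' uses the root-level fields, without the nested accounts map.
        return {k: v for k, v in sber.items() if k != "accounts"}
    accounts = sber.get("accounts", {})
    if account_key not in accounts:
        raise KeyError(f"unknown sber_salute account: {account_key!r}")
    return accounts[account_key]


# Hypothetical domain settings; 'host' and 'port' are invented field names.
settings = {
    "sber_salute": {
        "host": "asr-main.example.local",
        "port": 8080,
        "accounts": {
            "backup": {"host": "asr-backup.example.local", "port": 8080},
        },
    }
}

default_params = resolve_account(settings)
backup_params = resolve_account(settings, "backup")
```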

Title: Set of Grammars
Code: model
Visibility: no
Default: callcenter

Sber SaluteSpeech recognition service parameter: grammar set name.

Title: Language
Code: lang
Visibility: no
Default: ru-RU

Sber SaluteSpeech recognition service parameter: recognition language.
Possible options:

  • ru-RU (0) – Russian language

  • en-US (1) – English language

  • kk-KZ (2) – Kazakh language

Title: Profanity filter
Code: profanityFilter
Visibility: no
Default: 'Disable'

Sber SaluteSpeech recognition service parameter: profanity filter switch.

Title: Multiple sentences
Code: multiUtterance
Visibility: no
Default: 'Off'

Sber SaluteSpeech recognition service parameter: wait for multiple sentences.

If off, the response arrives quickly, as soon as the end of the first sentence is detected.
If on, all results are collected and concatenated. The end of recognition is determined by the 'Recording timeout, s' and 'Silence interval, s' parameters.

Title: Maximum sentence length, s
Code: maxSpeechTimeoutSec
Visibility: no
Default: 20

Sber SaluteSpeech recognition service parameter: maximum sentence length.

Title: In Cyrillic
Code: forceCyrillic
Visibility: no
Default: 'Off'

Sber SaluteSpeech recognition service parameter: force conversion of the result to Cyrillic.

Title: Recording timeout, s
Code: recordTimeoutSec
Visibility: no
Default: 30

Maximum recording time, in seconds, counted from the end of pre-playback.

Title: Break by DTMF
Code: checkDTMF
Visibility: no
Default: 'none'

DTMF detector switch. When enabled, it reveals the settings for saving characters and for the interrupt modes.

Title: Buffer for DTMF
Code: dtmfBuffer
Visibility: yes
Default: — 

Variable to store received DTMF characters.

Title: Clear DTMF buffer
Code: clearDtmfBuffer
Visibility: yes
Default: 'Yes'

Switch for clearing the DTMF buffer beforehand.

Title: Number of characters
Code: maxSymbolCount
Visibility: yes
Default: — 

Argument that limits the number of characters that can be entered.
When the specified number of DTMF characters has been received while the component is running, recording stops automatically and the last portion of voice data is sent to the recognition service.

Title: Interrupt Symbols
Code: interruptSymbols
Visibility: yes
Default: — 

A string containing sequences of interrupt characters separated by commas.
When a character sequence matching one of the specified interrupt sequences is detected at the end of the DTMF buffer, the recording is automatically terminated and the last portion of data is sent to the recognition service.
For example, *, 7, 123, 9395.
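
The two DTMF stop conditions ('Number of characters' and 'Interrupt Symbols') can be sketched as follows. This is a minimal illustration in Python, assuming the DTMF buffer is a plain string of received characters; the helper names parse_interrupt_symbols and should_stop are hypothetical.

```python
from typing import List, Optional


def parse_interrupt_symbols(value: str) -> List[str]:
    """Split a comma-separated string such as '*, 7, 123, 9395' into sequences."""
    return [part.strip() for part in value.split(",") if part.strip()]


def should_stop(
    dtmf_buffer: str,
    interrupt_symbols: str = "",
    max_symbol_count: Optional[int] = None,
) -> bool:
    """Return True when recording should end because of DTMF input."""
    # Condition 1: the configured number of characters has been received.
    if max_symbol_count is not None and len(dtmf_buffer) >= max_symbol_count:
        return True
    # Condition 2: the end of the buffer matches one of the interrupt sequences.
    return any(dtmf_buffer.endswith(seq) for seq in parse_interrupt_symbols(interrupt_symbols))


sequences = parse_interrupt_symbols("*, 7, 123, 9395")
stop_on_sequence = should_stop("45123", "*, 7, 123, 9395")
keep_recording = should_stop("1234", "*, 7, 123, 9395")
stop_on_count = should_stop("12", "", max_symbol_count=2)
```

Matching only at the end of the buffer means a sequence such as 123 fires on the input 45123 but not on 1234.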

Title: Interrupt when silence is detected
Code: abortOnSilence
Visibility: no
Default: 'Yes'

Voice activity detector (VAD) switch. When enabled, recording ends automatically and the last portion of voice data is sent to the recognition service.
Recording stops once voice has been present for at least 300 ms and has then been absent for the specified interval.

Title: Silence interval, s
Code: silenceTimeoutSec
Visibility: yes
Default: 2

Silence interval after which the voice activity detector (VAD) automatically stops recording and sends the last portion of voice data to the recognition service.
Applies when 'Interrupt when silence is detected' is enabled.

Title: VAD threshold, dB
Code: vadThreshold
Visibility: yes
Default: 30

Argument containing the VAD threshold, in decibels. Voice is detected when the signal level rises above the threshold.
Any noise below the threshold is treated as silence.
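
The silence-based stop criterion (at least 300 ms of voice, followed by silence for the configured interval) can be sketched as below. This is an illustration only, not the detector's actual algorithm. It assumes the audio arrives as a sequence of per-frame levels in decibels, uses a hypothetical frame length of 20 ms, and counts voice time cumulatively, which the text does not specify.

```python
from typing import Iterable, Optional

MIN_VOICE_MS = 300  # minimum voice duration stated in the text


def silence_stop_time_ms(
    levels_db: Iterable[float],
    frame_ms: int = 20,                # hypothetical frame length
    vad_threshold_db: float = 30.0,    # 'VAD threshold, dB' default
    silence_timeout_sec: float = 2.0,  # 'Silence interval, s' default
) -> Optional[int]:
    """Return the elapsed time in ms at which recording would stop, or None."""
    voice_ms = 0
    silence_ms = 0
    elapsed_ms = 0
    for level in levels_db:
        elapsed_ms += frame_ms
        if level > vad_threshold_db:
            voice_ms += frame_ms   # assumption: voice time accumulates
            silence_ms = 0         # any voice frame restarts the silence count
        else:
            silence_ms += frame_ms
        if voice_ms >= MIN_VOICE_MS and silence_ms >= silence_timeout_sec * 1000:
            return elapsed_ms
    return None


# 300 ms of voice at 40 dB, then 2 s of noise at 10 dB.
stop_at = silence_stop_time_ms([40.0] * 15 + [10.0] * 100)
# Only 200 ms of voice: the 300 ms minimum is never reached.
too_short = silence_stop_time_ms([40.0] * 10 + [10.0] * 100)
```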

Title: Response timeout, s
Code: responseTimeoutSec
Visibility: no
Default: 5

Timeout for waiting for a response from the recognition service after the last portion of voice data has been sent to it.
When the timeout expires, control is passed to the next component on the Time branch.

Title: Result to variable
Code: varText
Visibility: no
Default: — 

Variable to save the text result of recognition.

Title: Normalized result to variable
Code: varNormText
Visibility: no
Default: — 

Variable to save the normalized text result of the recognition.

Title: Response code to variable
Code: varCode
Visibility: no
Default: — 

Variable to store the response code of the recognition service (the code is emulated in the style of HTTP status codes: 200, 408, 500).
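
A script that inspects the stored code might branch on it as sketched below. In HTTP, 200 means success, 408 a request timeout and 500 a server error. Pairing these codes with the transition properties 'transfer', 'transferTimeout' and 'transferError' is an assumption made for illustration, as is treating any other code as an error.

```python
# Assumed pairing of emulated codes with transition property codes.
BRANCH_BY_CODE = {
    200: "transfer",         # recognition succeeded
    408: "transferTimeout",  # response timeout expired
    500: "transferError",    # recognition failed
}


def next_branch(code: int) -> str:
    """Pick a transition for a response code; unknown codes go to the error branch."""
    return BRANCH_BY_CODE.get(code, "transferError")


ok_branch = next_branch(200)
timeout_branch = next_branch(408)
```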

Title: Response body to variable
Code: varBody
Visibility: no
Default: — 

Variable to store the full content of the recognition service response.
If there is one sentence, the value is an object; if there are multiple sentences, it is an array of objects.
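
Because the body can be either a single object or an array of objects, code that consumes it usually normalizes it to a list first. The sketch below is an illustration in Python: the helper name as_result_list is hypothetical, the 'text' field in the sample data is invented, and accepting a JSON string as input is an assumption about how the variable is stored.

```python
import json
from typing import Any, Dict, List, Union

Body = Union[str, Dict[str, Any], List[Dict[str, Any]]]


def as_result_list(body: Body) -> List[Dict[str, Any]]:
    """Normalize the response body to a list of result objects."""
    if isinstance(body, str):
        # Assumption: the variable may hold the body serialized as JSON.
        body = json.loads(body)
    if isinstance(body, dict):
        return [body]  # one sentence: wrap the single object
    return list(body)  # multiple sentences: already an array


single = as_result_list({"text": "hello"})
several = as_result_list('[{"text": "hello"}, {"text": "operator please"}]')
```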

Title: Save Record File
Code: saveRec
Visibility: no
Default: 'None'

Switch for saving the recording file that is sent to the recognition service.

Title: File path to variable
Code: varRecordPath
Visibility: yes
Default: — 

Variable to store the path to the recording file.
The file is placed in the script's temporary directory and is deleted when the script completes.
To keep the file long-term, the script must additionally move it to a permanent storage location.

The recording is made on the server with the mg role that serves the current call and is then transferred to the server with the ivr role that serves the current scenario. The transfer always takes place within the site.
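
Moving the recording out of the temporary directory before the script ends can be sketched as below. This shows the idea in general-purpose Python rather than in the platform's own scripting components. The function name keep_recording and all paths are hypothetical, and the demo creates a stand-in file in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def keep_recording(record_path: str, archive_dir: str) -> Path:
    """Move a recording into a permanent directory and return its new path."""
    target_dir = Path(archive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(record_path).name
    shutil.move(record_path, str(target))
    return target


# Demo with stand-in directories; real paths would come from the path variable.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "script_tmp" / "rec.wav"
    source.parent.mkdir()
    source.write_bytes(b"RIFF")  # placeholder content, not a real recording
    moved = keep_recording(str(source), str(Path(tmp) / "archive"))
    moved_exists = moved.exists()
    source_gone = not source.exists()
```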

Title: Pre-play
Code: prePlayFile
Visibility: no
Default: — 

Audio file that is pre-played to the subscriber. The voice detector is active during playback.
If no voice is detected from the subscriber (taking the VAD noise threshold into account), no data is sent to the recognition service.

The file can be specified in one of two modes:

  • a static file attached to the script (uploaded from the Script Editor application or via the API);

  • a path built from arguments, which must include one of the file categories.

Title: Transition
Code: transfer
Visibility: no
Default: — 

Component to which control is passed in case of successful completion of the operation.

Title: Transition, Time
Code: transferTimeout
Visibility: no
Default: — 

Component to which control is passed when the timeout for a response from the recognition service expires.

Title: Transition, Error
Code: transferError
Visibility: no
Default: —