November 21, 2019

Zero To Tested In Sixty Minutes – How To Get Started With Automated Testing for Voice

TL;DR: Today, 9 out of 10 enterprises recognize the importance of voice for their customers and are investing in developing and releasing voice apps within the next 12-24 months. Yet over 40% of skills are used once and abandoned, and most have a rating below 3.5 stars. Automated testing for voice helps ensure a positive voice user experience that earns higher ratings. Get started here.

Are you facing challenges with testing and maintaining the quality of your voice apps?

Let me show you simple and practical ways to get started with automated testing for voice.

No matter the stage of development you’re in, we have a solution that will meet your needs and ensure the quality of your voice apps.

This article can also help you to create a POC to demonstrate the benefits and cost savings associated with test automation for Alexa Skills and Google Actions.

Let’s begin!

Best Practices for Getting Started

Here are some tips to help you achieve your goals:

  • Select a small but representative set of use cases to test your voice apps: Start small and grow. First, make sure you understand how to perform automated testing for your voice applications. To that end, we'll give a brief introduction to each of our tools, so you can create a first, simple set of tests and learn how to execute them.
  • Define the type of test to perform: This will depend on which stage you are at in your voice app’s development cycle. Once you understand our tools, you will be able to choose the most appropriate one for your needs. If you have already developed a voice app or are in the POC stage, we recommend you jump directly to End-to-end testing and Usability Performance Testing (UPT).
  • Use Bespoken to start creating and executing test scripts: We’ll get to this below, but remember that testing is easy to set up and the benefits are significant, including cost savings, faster and cheaper releases, and fewer bad reviews.
  • Compile results and compare with manual testing: Once you have executed your automated tests with Bespoken, you can immediately compare their benefits against manual test execution.

Preparation: Get and Install Bespoken CLI

As a prerequisite to start testing your voice app, you need to have Node.js installed on your computer. If you haven’t already, go here and install the appropriate version for your system. We recommend choosing the LTS version (with the installer).
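To confirm the installation, you can check the Node.js version from your command prompt:

$ node --version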

Now it’s time to get and install the Bespoken CLI. To do that, open your command prompt and type:

$ npm install -g bespoken-tools

Note: If you are on macOS and the command fails, it’s probably because you need to run it with sudo, like this:

$ sudo npm install -g bespoken-tools

Unit Testing your Voice Apps

You need to do unit testing to ensure that the individual pieces of your code are working correctly. To that end, you can write unit test scripts to verify each intent and each major piece of functionality. While it’s never too late to start Unit Testing, we recommend implementing it as early in your Voice Development Lifecycle as possible.

Folder Structure and Filename Conventions

Before you start creating unit test scripts, it’s important to know how to organize the files and results. This is our recommendation:

  • Create a test folder under the root of your voice app project.
  • Create a unit folder under your test directory to store your unit test script files and the testing.json configuration file.

Here’s an example:
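A typical layout looks like the tree below (the project and file names are illustrative; the handler is assumed to live in index.js at the project root, as in the testing.json example later in this guide):

my-voice-app/
├── index.js
├── package.json
└── test/
    └── unit/
        ├── index.test.yml
        └── testing.json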

As for naming conventions, we suggest following this structure:

[file-name].test.yml

file-name can be the main module name (e.g. index), or you can define it based on the type of test you are performing (simple, advanced, 1stRelease, etc.).
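For example, following that convention you might end up with files like these (purely illustrative):

index.test.yml
simple.test.yml
1stRelease.test.yml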

A simple unit test script

Let’s start with a very simple yet complete unit test script:

---
configuration:
  locale: en-US
---
- test: Launching and testing intent with slot values
- LaunchRequest: Welcome to Pet Match
- PetMatchIntent size=small pet=dog:
  - prompt:
    - Would you prefer a dog to hang out with kids or to protect you?
    - Are you looking for more of a family dog or a guard dog?
- AMAZON.StopIntent: Bye

This test script is composed of two YAML documents, each one starting with three dashes (---). The first one represents a configuration section that will apply to the entire test script. In this example, we are only defining the locale for the test.

The second YAML document is a test case, defined by the first line after the three dashes with the reserved word test and a brief description of it. After the test line we have a sequence of interactions, each one starting with a dash and a blank space (in YAML, spaces and tabs are important; learn more about the syntax here).

A typical interaction within a test script file is composed of two parts separated by a colon (:). The first part is an utterance. Since our unit tests are bypassing the voice platforms and just executing locally against your code, you can just use the name of the intent you want to invoke (e.g. LaunchRequest); during End-to-end Testing, which actually uses those voice platforms, you’d use the actual utterance instead (e.g. open my skill).
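For instance, the same launch interaction could be written both ways; a quick sketch, assuming the skill’s invocation name is pet match:

# Unit test: invoke the request/intent by name, bypassing the voice platform
- LaunchRequest: Welcome to Pet Match

# End-to-end test: send the actual spoken utterance instead
- open pet match: Welcome to Pet Match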

After the colon, we have the expected result, which is what the voice app should respond with. The expected result can be simple, like just a phrase (Welcome to Pet Match), or it can be more complex like a prompt with multiple valid responses or a card object.
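As a rough sketch, an interaction that checks parts of a card could look like this (the property paths follow the Alexa response JSON; the title value here is hypothetical):

- PetMatchIntent size=small pet=dog:
  - prompt: "*"
  - response.card.title: Pet Match
  - response.card.content: "*"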

How do the tests work? Easy! We send the utterances to the voice app; when we get the actual response, we compare it to the expected result. If there is a match, the test passes; otherwise, it fails.

Notice that it is also possible to invoke an intent providing the slot values it requires, making it a one-shot utterance.

Now, let’s take a look at a more complex example:

---
configuration:
  locale: en-US
---
- test: Using succinct syntax to invoke the intent with slots.
- tags: FirstUse
- LaunchRequest:
  - response.outputSpeech.ssml: /.*how many people are playing today.*/i
  - response.shouldEndSession: false
  - sessionAttributes.STATE: _START_MODE
- GetPlayerNumber Number=1: please tell us your name
- GetContestantName PlayerName=jordi: "let's start the game: jordi"
- GetContestantPrice Number=149:
  - response.outputSpeech.ssml: /.*you said 149*/i
  - sessionAttributes.STATE: _GAME_ROUND
  - sessionAttributes.players[0].name: jordi
- AMAZON.StopIntent:
  - response.outputSpeech.ssml:
    - Hope to see you soon
    - See you around
    - Nice playing with you

In this example, the tags reserved word allows you to define terms that will be evaluated at execution time, including or excluding test cases as defined in the testing.json configuration file (see the next section for more details).

As you can see, it is possible to use regular expressions (e.g. /.*you said 149*/i) and wildcards (*) in the expected response. Additionally, you can evaluate any element in the response payload, like the outputSpeech or even the sessionAttributes elements.

The Unit testing.json Configuration File

This file is used to define the configuration options for unit testing your voice apps. It’s typically kept in the test/unit folder of your project. The file looks like this:

{
  "handler": "../../index.js",
  "trace": "false",
  "jest": {
    "silent": true,
    "collectCoverageFrom": [
      "index.js"
    ]
  },
  "include": ["FirstUse"],
  "exclude": ["broken"]
}

Some things to highlight:

  • The handler key is used to set the location of the main module of your voice app.
  • The trace and silent keys are used for debugging purposes. If silent is set to true, console messages won’t be displayed while the tests run.
  • The include and exclude elements are used to define which test cases to execute at run time, based on the tags added to the test scripts. You can also override these properties when executing the test scripts, as shown below:
$ bst test --include FirstUse,ReturningUser --exclude broken

Read here to learn all the configuration options for the testing.json file.

Running test scripts

To execute a unit test script, just open a terminal and navigate to the folder where you have created it, then use the test command like this:

$ bst test

You will see a summary of the executed test cases in your terminal. Results shown in red mean a test case failed; based on that, we can either fix our test or fix the code.

The summary at the bottom tells us about the success of the tests, as well as basic code coverage info.

To see more detailed code coverage info, we can go to coverage/lcov-report/index.html. The coverage directory will be located in the same directory where your testing.json file is located.

It is important to mention that the environment variable UNIT_TEST is automatically set when running Unit Tests. This can be used to craft unit tests that run more predictably, like this:

sessionAttributes.guessNumber = Math.floor(Math.random() * 100);
// For testing purposes, force a number to be picked if the UNIT_TEST environment variable is set
if (process.env.UNIT_TEST) {
    sessionAttributes.guessNumber = 50;
}
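A corresponding unit test can then assert on the fixed value; a minimal sketch, assuming a guess-the-number skill that stores guessNumber in its session attributes on launch:

---
configuration:
  locale: en-US
---
- test: Guess number is fixed to 50 when UNIT_TEST is set
- LaunchRequest:
  - sessionAttributes.guessNumber: 50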

Advanced Topics

  • To make it easy to test DynamoDB connections locally, we have created a mock for it. For more information about how to use it, read here. We have mocked the Address API too; read here for more information.
  • See this example to know how to unit test multi-locale voice apps.
  • Check this simple script to know how to enable Continuous Integration with Travis.

End-to-end testing your voice apps

End-to-end tests focus on testing:

  • The voice app as a whole (from Alexa/Google through infrastructure to voice app).
  • Utterance resolution, i.e. speech recognition accuracy.
  • The interaction models.

You need to do E2E testing because it’s critical to ensure your voice app behaves as expected once deployed on the assistant. Most voice apps work with other services and use different pieces of technology, and so testing only your code (i.e. just doing Unit Testing) is no guarantee you are free from errors. Since users have shown their tolerance for errors in voice apps is far lower than with GUI apps, the best way you can increase retention and engagement is to catch these errors before your users do.

Just like with unit testing, we can create simple End-to-end test scripts to perform this type of functional testing.

Setup: Create a Virtual Device

First things first. To start testing your voice app you have to create a Bespoken Virtual Device, which works like a real physical device (such as an Amazon Echo) but exists only as software. Follow the instructions described here to get your Virtual Device Token. You will later use this token to execute your test scripts.

Folder Structure and Filename Conventions

Similar to what we did for unit testing, we suggest creating a separate folder to store your End-to-end test scripts. Keep it apart from the unit test folder, since this type of test is often executed by a different team (QA). See our suggestions here.
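For example, the layout might look like this (the folder and file names are just a suggestion; the .e2e.yml suffix matches the testMatch pattern used in the sample configuration later in this section):

my-voice-app/
├── index.js
└── test/
    ├── unit/
    │   ├── index.test.yml
    │   └── testing.json
    └── e2e/
        ├── index.e2e.yml
        └── testing.json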

An End-to-end Test Script

The following test script is a complete example that will help us explain some of the most important features.

---
configuration:
  locale: en-US
  voiceId: Joanna
---
- test: Launch request, no further interaction
- tags: NLUTest
- <speak>open <phoneme alphabet="ipa" ph="kwɪk lɪst">quick list</phoneme></speak>: "*"
---
- test.only: Launch request followed by a sequence of interactions
- open INVOCATION_NAME
- what is on my list:
  - prompt:
    - you have the following items on your list *
    - here is your* list
- goodbye:
  - prompt:
    - talk to you soon
    - bye bye happy shopping
    - merry christmas to you soon
    - /^$/ # This regex is equivalent to an empty response or prompt = ""
  • The first YAML document configures the execution of the test. In this case, we are just defining the locale and the Amazon Polly voice ID to use. In End-to-end testing, we convert the utterances you provide to speech using Amazon Polly or Google Text-to-Speech, capture the actual audio from the voice platform, convert it back to text, and compare it with the expected results defined in the test script. Read here to learn about the extra configuration settings that can be added to the testing.json.
  • The second YAML document is a test case. It contains a tag which can be later used to exclude or include test cases during run time. In this example, we are also sending as an utterance an SSML expression using a phonetic dictionary. This is very helpful when you have an utterance that may be difficult for the voice platform to pronounce by default. Alternatively, we could have just included:
open quick list: "*"
  • The third test case is using the instruction .only, which means that it will run only this case when the entire script is executed. We are also using a find/replace term, INVOCATION_NAME, to invoke the app; this parameter is defined in the testing.json file below. Finally, notice how it is also possible to use wildcards and regular expressions in the expected results part of the tests.

The End-to-end testing.json configuration file

The configuration file for End-to-end testing is similar to the one we have seen in the previous section for unit testing. There are many available properties that we can use; let’s look at an example:

{
  "type": "e2e",
  "findReplace": {
    "INVOCATION_NAME": "quick list",
    "INVOCATION_NAME_DEV": "quick list development"
  },
  "homophones": {
    "is": ["as", "does", "it's"],
    "two": ["to", "2"],
    "contestant": ["contested"]
  },
  "trace": false,
  "jest": {
    "reporters": [
      "default",
      [
        "./node_modules/jest-html-reporter",
        {
          "includeFailureMsg": true,
          "pageTitle": "Bespoken Test Report"
        }
      ]
    ],
    "silent": false,
    "testMatch": ["**/test/*.yml", "**/tests/*.yml", "**/*.e2e.yml"]
  },
  "virtualDeviceToken": {
    "alexa": {
      "en-US": "alexa-xxxx-xxxx-xxxx-xxxx-xxxx"
    },
    "google": {
      "en-US": "google-xxxx-xxxx-xxxx-xxxx-xxxx"
    }
  }
}
  • The findReplace key is used to set parameters in the test scripts. As we saw in the End-to-end example, we can use it to define parameters like INVOCATION_NAME that are later replaced with the actual text.
  • Use homophones to fix issues with the speech-to-text during test execution. Remember, the speech-to-text is performed on the audio coming FROM Alexa. Homophones are defined in the homophones section of the testing.json file. The left side (e.g. contestant) will be used when any of the values on the right side (e.g. "contested") are returned in the transcript. Read more about homophones here.
  • The virtualDeviceToken element is used to add the virtual devices we are going to use in our tests. You will need one virtual device for each locale (e.g. en-US, fr-FR, etc.) and each platform (e.g. Alexa or Google) that you want to test.

Advanced Topics

  • See this example to learn how to End-to-end test multi-locale voice apps.
  • Check this other example where just one test script is used to End-to-end test an app deployed on both Alexa and Google.

More testing tools

  • Usability Performance Testing (UPT): This tool allows you to send hundreds or thousands of utterances to your voice app to verify how the ASR and NLU are working. The first run of UPT is used to get a score (% of successfully understood utterances/slots) that will serve as a baseline for future executions. After each execution, we can help you analyze the results and make suggestions about how to improve your interaction model. We have seen improvements of more than 80% in the speech recognition before launching! Check out this case study to learn more.
  • Monitoring: Once you are sure your app understood your users, and have launched with confidence, it is important to keep an eye on it. Your code might be unbreakable, but what if your voice service releases an update that makes your voice app stop performing as expected? How would you know? We have created monitoring for exactly these types of scenarios, giving you peace of mind by alerting you when your voice app stops working. Be the first to know and avoid bad reviews. Read here to learn how to get started.
  • IVR Testing: Do you have an IVR with deep levels of interactions? Are you lost within the functional branches and uncountable paths your users can take? We can help to automate tests for your IVR. Define all the interactions you want and run them hundreds of times if you wish. Contact us for a demo.

I hope this guide has been helpful to you. If you need more help, don’t hesitate to contact us for a demo to show you how easy, quick, and affordable it is to start testing your voice portfolio with our tools.
