April 14, 2020 in Blog

🌟 Create delightful and flawless voice experiences with Bespoken

How to increase user retention and engagement by catching errors before users do
TL;DR Publish your voice experiences with confidence. Learn how to ensure the quality of your creations in a simple, accessible and cost-effective way. Test automation is within your reach! Read on to learn how to take advantage of its benefits.

First-class content and delightful functionality are not enough to make a voice application succeed. Engaged users return only if their experience has been satisfactory: their needs were met and they ran into no errors while using the product. The only way to ensure that is by testing.

We know that relying only on unit tests is not advisable, even with code coverage above 90%. However good we are at programming and at analyzing every possible scenario, we cannot be certain that the components or services we depend on are always working properly. End-to-End (E2E) testing is necessary.

And when we say E2E testing for voice experiences we mean:

  • The voice app as a whole (from Alexa/Google through infrastructure to voice app)
  • The speech recognition
  • The interaction model and the natural language understanding

At Bespoken we have focused on minimizing the complexity of E2E testing for voice. And as a result of our efforts, we offer simple and powerful tools that allow complex use-cases to be verified with straightforward test scripts. In addition to the ease of creation and maintenance, one of the major advantages of using test scripts is that their execution can be automated quite simply, reducing manual work and optimizing the use of resources.

However, all this has not been enough for us, and we wanted to go a step further by building plug-and-play solutions to bring you test automation at the click of a button. Sounds great, right?

In this article we’ll explore our comprehensive, recommended approach to voice testing. It’s the best way to ensure highly engaged and happy users!

How It Works

Our solution leverages a state-of-the-art approach that encompasses:

  • Scheduling
  • Monitoring
  • Reporting
  • Notifications

It is ready to go on day one yet still allows for extensive customization. Getting started is as easy as creating a test script using our simple YAML-based syntax. It looks like this:

The test can be executed from a CI environment like GitLab, or your console. The test output is shown in the console, and a report is created under the test_output/report folder. The report looks like this:

We also leverage an amazing tool called DataDog for real-time visual and historical reporting as well as notifications. DataDog provides incredible flexibility and allows for data to be sliced and diced as needed. Here is our standard reporting dashboard:

How To Get Started

Take a look at our End-to-End Automation Solution here:

We provide directions there on how to get set up. The quick highlights are:

  • Create a Bespoken account
  • Make a copy of our sample project
  • Add in your tests (as shown here)
  • Run it!

It is that easy. And if you have questions, just send an email to us at

Going Deeper

Custom Scheduling

One of the big benefits of using industrial-strength DevOps tools like GitLab is that they let you set any schedule you like for running tests. With the GitLab scheduler, you can:

  • Run a subset of tests every hour
  • Run a full suite of tests every day
  • Run in-depth speech recognition tests every week

These schedules can be set and configured in a matter of minutes. See how to do it here.
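
As a rough sketch of the three schedules above: GitLab pipeline schedules accept standard cron expressions and can pass variables to the pipeline, so a small Node.js helper could choose which tests to run. The suite names, folder paths, and the TEST_SUITE variable are assumptions made for illustration; they do not come from our sample project.

```javascript
// Standard five-field cron expressions: minute hour day-of-month month day-of-week.
// Suite names and folder paths are hypothetical.
const schedules = {
  smoke: { cron: "0 * * * *", path: "test/smoke" },   // every hour, on the hour
  full: { cron: "0 2 * * *", path: "test" },          // every day at 02:00
  speech: { cron: "0 3 * * 0", path: "test/speech" }, // every Sunday at 03:00
};

// Pick the test folder from a variable that the scheduled pipeline would set.
// TEST_SUITE is an assumed variable name; default to the hourly smoke suite.
function selectSuite(env) {
  const name = env.TEST_SUITE || "smoke";
  if (!Object.prototype.hasOwnProperty.call(schedules, name)) {
    throw new Error(`Unknown test suite: ${name}`);
  }
  return schedules[name].path;
}

// Example: a daily schedule that sets TEST_SUITE=full selects the whole test folder.
console.log(selectSuite({ TEST_SUITE: "full" })); // prints "test"
```

Each GitLab schedule would then use one of the cron expressions and set the variable to the matching suite name.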

Custom Filter

We use a filter in our sample project. A filter is a module that does extra processing at certain stages of the test execution. In this case, we use the filter to capture metrics and send them to DataDog when a test ends (onTestEnd).

Filters are referenced in the configuration file like this:

  "filter": "node_modules/bespoken-datadog-plugin/index.js"

Notice that the reference points to the node_modules folder, since the filter is installed as a dependency (see the package.json file):

"dependencies": {
  "bespoken-datadog-plugin": "^1.1.0"
}

Take a look at the plugin here, and feel free to extend and customize it to meet your test reporting needs.
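
If you want to write your own filter, the following is a minimal sketch of what one could look like. Only the onTestEnd stage name comes from the description above. The shape of the arguments (a test with a description, and a list of results with a passed flag), the metric names, and the sendMetric helper are assumptions made for illustration, not the plugin's actual code.

```javascript
// Hypothetical stand-in for a metrics client. A real filter would send
// these values to a reporting service instead of keeping them in memory.
const sentMetrics = [];
function sendMetric(name, value, tags) {
  sentMetrics.push({ name, value, tags });
}

// Sketch of a filter module. The argument shapes are assumed:
//   test        -> { description: string }
//   testResults -> [{ passed: boolean }, ...]
const filter = {
  onTestEnd(test, testResults) {
    const results = Array.isArray(testResults) ? testResults : [];
    // Count the interactions that explicitly failed.
    const failed = results.filter((r) => r.passed === false).length;
    const tags = [`test:${test.description}`];

    sendMetric("voice.test.interactions", results.length, tags);
    sendMetric("voice.test.failures", failed, tags);
    // 1 when every interaction passed, 0 otherwise.
    sendMetric("voice.test.success", failed === 0 ? 1 : 0, tags);
  },
};

module.exports = filter;
```

Keeping the metric-sending logic behind a single helper makes it straightforward to swap in another reporting backend later.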

Leveraging Multi-Locale Testing

See this sample project to learn how to test a skill that supports German, English, and Spanish with a single set of tests. The sample project includes unit and E2E tests and is part of the tutorial series we have done with Dabble Lab. Check the video here.

Leveraging Multi-Platform Testing

We love to simplify the challenges of voice testing. That is why you can also use a single set of tests to check both versions of your app, for Alexa and Google Assistant, and potentially for other platforms as well. See how to do it in this sample project.

Speech Recognition Testing

Besides running functional and regression tests with our E2E test scripts, we highly recommend testing and tuning the speech recognition before deploying your voice experiences. 

Speech recognition and NLU testing help you identify which utterances are not being understood correctly and, just as importantly, verify that your invocation name is free from issues.

These kinds of tests leverage a variety of phrasings and accents. You can even simulate challenging sound conditions such as crowded places, busy streets, or public transportation stations.

Bespoken Test Robot

What if your testing needs go beyond smart speakers? You might have deployed your apps on platforms like Siri or Facebook Portal, or you might want to test IoT devices with display and voice capabilities. How can we perform extensive testing without resorting to tedious manual testing? Simple: leave it to our test robots.

Bespoken’s test robots are the world’s best diagnostic tools for voice that perform E2E testing using analog output and input. This way we are not just testing the software, but the hardware as well. Our Test Robot ensures your device delivers flawless and incredible Voice Experiences to your customers. Contact us to get started.

Bespoken – Just 30 minutes can save 300 hours

Difficult to believe? Perhaps, but it just happens to be true! We’ve saved our customers more than 100,000 hours of manual testing in the last year alone. And feel free to put us to the test. Reach out to our team – we can tell you all about our solution, then get you set up with a free demo that proves everything we describe above. Or just get cranking with our sample project:

Yours in testing and tuning,

