Alexa Skill Automation – Testing, Integration, and Delivery

Developers love automated testing and deployment, but until recently, they have not been possible for voice apps. Luckily, with Bespoken’s suite of tools, first-class automation and testing are now achievable! We took one of our homegrown skills, used our new Virtual Device SDK for end-to-end tests along with our unit-testing tools, and turned it into a showcase of best practices for Alexa skill testing and automation.

We brought together:

  • ASK CLI – for Lambda deployment and skill updates
  • Virtual Alexa – for skill unit tests
  • Virtual Device SDK – for end-to-end testing using the Alexa Voice Service (AVS)
  • Circle CI – for continuous integration and deployment
  • Codecov – for code coverage tracking and reporting

This gives us a full-featured automation platform – one that ensures our skill is always working. And it’s a complete level of assurance – thanks to Virtual Alexa and the Virtual Device SDK, we can ensure that things are working both at the code level and as a whole system. The net result is a fun little skill that is a serious showcase for automation. We’ll go through it, piece by piece, starting with our Alexa skill automation for unit testing.

Unit Testing

For unit-testing Alexa skills, we use our Virtual Alexa library. Virtual Alexa emulates the behavior of Alexa and generates JSON request payloads as if they were coming from it. Combined with traditional unit testing tools such as Mocha or Jest, it’s a great way to ensure code quality. Here is a sample test:
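
(The version below is a sketch using Mocha, Chai, and the virtual-alexa npm package – the handler path, interaction model path, and expected wording are illustrative assumptions rather than the skill’s actual values.)

    const assert = require("chai").assert;
    const va = require("virtual-alexa");

    describe("Launch", function () {
        it("launches the skill and prompts the user", async function () {
            // Point Virtual Alexa at our handler and interaction model
            // (these paths are assumptions for illustration)
            const alexa = va.VirtualAlexa.Builder()
                .handler("index.handler")
                .interactionModelFile("./models/en-US.json")
                .create();

            // Virtual Alexa generates the launch-request JSON, invokes the
            // handler locally, and resolves with the skill's JSON response
            const result = await alexa.launch();
            assert.include(result.response.outputSpeech.ssml, "Welcome");
        });
    });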

(For a more complete writeup on unit-testing and this skill, take a look here)

Continuous Integration (CI)

Now that our unit tests are in place, our next step is setting up continuous integration to run them whenever we make changes. There are a lot of great tools for this – we prefer CircleCI, but Travis, Jenkins, CodeShip, etc. are also great choices.

For running our unit tests, we need a circle.yml file with a section like this in it:
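
(A sketch in the classic circle.yml 1.0 format – the Node version shown is an assumption.)

    machine:
      node:
        version: 8.9.4

    dependencies:
      pre:
        - npm install -g ask-cli

    test:
      override:
        - npm run test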

The last line, npm run test, is the key one.

After doing some setup (such as establishing which Node version to run, and installing the ASK CLI), this actually runs the tests. Our project is set up so that every push triggers them.

Here is what our dashboard in Circle looks like for our last few runs:

All green, which is great – feel free to take a look for yourself.

Code Coverage

With our unit tests in place, the next piece is code coverage. For this, we use Codecov, which is another tool that is free for open-source projects (same as CircleCI). It is easy to work with and provides nice graphs and visualizations of what’s happening with your unit tests over time.
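
Wiring it up is straightforward: generate a coverage report during the test run, then post it to Codecov. Here is a sketch of the extra circle.yml steps, assuming a Jest-based test script and the Codecov bash uploader:

    test:
      override:
        - npm run test -- --coverage
      post:
        - bash <(curl -s https://codecov.io/bash)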

Check out their interactive sunburst graph – it’s a fun way to explore unit test coverage.

End-To-End Testing

End-to-end testing is also known as integration testing, but we use the term end-to-end to distinguish it from the “typical” unit testing done via a CI system. In our case, we are deploying our code to a dev environment every time we commit to master – more on that in a moment. But before we do that deployment, we want to make sure our system as a whole is working. To that end, we use our Virtual Device SDK.

What is the difference between our Virtual Device and Virtual Alexa libraries? Great question – the essential one is that Virtual Alexa just emulates Alexa – it mimics its behavior. The Virtual Device SDK is Alexa. It actually uses the Alexa Voice Service (AVS) to send real audio to Alexa, and in turn to our skill. Both are testing our skill but in different ways.

Virtual Alexa is for:

  • Running unit tests against code, with minimal dependencies
  • Measuring depth of testing and code coverage
  • Ensuring code is working properly

The Virtual Device SDK, on the other hand, is best for:

  • Ensuring the interaction model is configured properly (remember, it is using the real Alexa Voice Service)
  • Ensuring infrastructure (such as Dynamo and S3) is all in place and working correctly
  • Ensuring there are no speech recognition issues

We will expand on the last point in future posts, but suffice it to say, most Alexa developers have run into the situation where they designed an interaction model that looked great on paper but did not survive first contact with the “enemy”: real users speaking to real devices.

The Virtual Device SDK can help tease these issues out. And since it’s part of our Alexa skill automation process, it will ensure that, as intents are added, as the code is enhanced, and as Alexa’s machine learning evolves, everything is still working perfectly. Awesome, right?

So, enough background – let’s look at an actual integration test:
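
(As above, this is a sketch using Mocha, Chai, and the virtual-device-sdk npm package – the token variable, timeout, and utterance are illustrative assumptions.)

    const assert = require("chai").assert;
    const vd = require("virtual-device-sdk");

    describe("End-to-end", function () {
        // Real AVS round-trips are slow, so allow a generous timeout
        this.timeout(20000);

        it("starts a round of the game", async function () {
            // The token ties the virtual device to an Alexa account;
            // we keep it in an environment variable, not in source
            const device = new vd.VirtualDevice(process.env.VIRTUAL_DEVICE_TOKEN);

            // This sends real audio through the Alexa Voice Service and
            // returns a speech-to-text transcript of Alexa's audio reply
            const result = await device.message("open guess the price");
            assert.include(result.transcript, "your product is");
        });
    });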

It looks pretty similar to our unit test. Not surprising – they are using the same entry point, an utterance, to test with. But as just explained, under the covers, it is quite different. Also note that the expectations on our tests are a bit simpler, such as this line:

    assert.include(result.transcript, "your product is");

Our tests here are narrower, because what Alexa gives us back is audio. Our Virtual Device SDK performs speech-to-text on it, which is pretty accurate. But it’s not perfect – so we write our tests with enough specificity to know things are working, but not so much that they are vulnerable to quirks in the speech-to-text.

Now, let’s bring it all together.

Continuous Deployment

In CircleCI, we set up our continuous deployment to run our end-to-end tests whenever commits are made to master. Then we deploy.
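
In circle.yml 1.0 terms, that can look like this sketch (the npm script and file names are assumptions):

    deployment:
      development:
        branch: master
        commands:
          - npm run e2e
          - ./deploy.sh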

Our deployment is done via a shell script, which uses the ASK CLI. Our shell script:

  • Sets up the AWS credentials (from environment variables securely set in Circle)
  • Sets up the ASK credentials (again, from secure environment variables)
  • Packages the Lambda code into a zip file
  • Uploads it using the ASK CLI

We use a shell script because the shell has so many handy file-manipulation tools. It allows us to do all the steps above succinctly and easily.
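
Here is a sketch of such a script. The file layout and environment variable names are assumptions, and it leans on the v1 ASK CLI’s ask deploy with a lambda target to do the zip-and-upload in one step:

    #!/usr/bin/env bash
    set -e

    # AWS credentials, from environment variables securely set in Circle
    mkdir -p ~/.aws
    cat > ~/.aws/credentials <<EOF
    [default]
    aws_access_key_id = $AWS_ACCESS_KEY_ID
    aws_secret_access_key = $AWS_SECRET_ACCESS_KEY
    EOF

    # ASK CLI credentials, again from secure environment variables
    mkdir -p ~/.ask
    echo "$ASK_CLI_CONFIG" > ~/.ask/cli_config

    # Install production dependencies so they ship with the code
    (cd lambda && npm install --production)

    # Package the Lambda code and upload it via the ASK CLI
    ask deploy --target lambda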

With that in place, our deployment is set to run automatically whenever pull requests are merged. So we know that when updates are made, a new development version will be delivered to our testers to work with right away. Everything is in sync, and we have a smooth, highly-assured build pipeline.

And what about production? We do not auto-deploy to production – a manual step is required. But it’s a simple one – just tag a release with a name like “prod-*” and it will be pushed to production. In this way, we use a manual trigger to kick off our automated workflow.
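
In circle.yml, that is just the other half of the deployment section – a sketch:

    deployment:
      production:
        tag: /prod-.*/
        commands:
          - ./deploy.sh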

Summary

We’ve gone through a lot here – unit testing, CI, CD, and end-to-end testing.

We hope it has all been helpful. We will be expanding on these different points in future posts – we know it is a lot to take in all at once.

Feel free to use this project as a template for creating your own highly-automated, highly-tested Alexa skill pipeline. And if you would like to go into more depth, as well as talk to the author (John Kelvie), sign up for one of our webinars as we do a deep dive on skill testing and automation – you can register here for a webinar the week of December 11th or 17th.