April 4, 2024

Notes From The Field – How Much Testing Is Enough?

TL;DR: Software testing is a balancing act – automate efficiently without over-testing. Weighing factors like test type and team size, I propose two practical approaches: comprehensive flow coverage, or starting small and expanding gradually.

Building on my last post, a common topic I discuss with customers is – how much should I test? When I was getting started with Bespoken, my default answer was: “As much as possible!”

But as our business has matured, and I have had the chance to work with many more customers, I have a slightly more nuanced answer:

  • Test
  • Not too much

Typically, when customers come to us, we find they are doing all of their testing manually. Functional and regression testing are inconsistent in scope and frequency, and often poorly documented. This is not the right approach: automated testing saves time, reduces errors, and improves overall product quality and user satisfaction. All (very) good things, right?

But there can be too much of a good thing. Part of the problem lies with teams that want to figure out how to automate everything before automating anything – this sort of analysis paralysis does no one any favors.

If that trap is avoided, the next challenge, once initial testing and monitoring are in place, is: “Where do I stop?” Here, my answer is nuanced – it depends on a few factors:

  • The type of testing (functional, regression, integration, etc.) – more on that in a later post
  • The stability of the system (generally more stable means more coverage)
  • The size of the team working in QA and DevOps
  • The size of the system being tested (both in terms of code/flow paths as well as integrations and third-party dependencies)

So here are a couple of heuristics/practical goals you might start with, either of which is a great basis for an automated testing regimen:

100% flow coverage

One nice thing about conversational AI systems such as chatbots and IVRs is that there are often not that many code paths, and writing tests to cover all of them is feasible. That does not mean writing tests for every utterance in the system – it means writing a test for each unique path through a flow. So, focus on covering all the INTENTS, not all the utterances or slot values.
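To make that concrete, here is a minimal sketch of intent-level coverage as a pytest suite. Everything in it – the BotClient stub, the intents, the canned replies – is a hypothetical stand-in for your own bot and flow map, not Bespoken’s tooling:

    # A minimal sketch of intent-level flow coverage with pytest.
    # BotClient, the intents, and the canned replies are hypothetical
    # stand-ins for your own bot and flow map (not Bespoken's API).
    import pytest

    # One representative utterance per intent: the goal is one test per
    # unique flow path, not one per possible utterance or slot value.
    INTENT_FLOWS = [
        ("CheckBalance", "what's my balance", "your balance is"),
        ("TransferFunds", "move money to savings", "how much would you like"),
        ("AgentHandoff", "let me talk to a person", "connecting you to an agent"),
    ]

    class BotClient:
        """Placeholder client; replace with however you reach your bot or IVR."""

        _canned = {
            "what's my balance": "Your balance is $100.",
            "move money to savings": "Sure – how much would you like to move?",
            "let me talk to a person": "One moment, connecting you to an agent.",
        }

        def send(self, utterance: str) -> str:
            return self._canned.get(utterance, "Sorry, I didn't catch that.")

    @pytest.fixture
    def bot():
        return BotClient()

    @pytest.mark.parametrize("intent,utterance,expected", INTENT_FLOWS)
    def test_intent_flow(bot, intent, utterance, expected):
        reply = bot.send(utterance)
        assert expected in reply.lower(), f"{intent} flow did not respond as expected"

Note that the table of flows has exactly one entry per intent – adding a new intent to the bot means adding one line here, which keeps coverage honest as the system grows.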

This is a good approach for well-staffed teams working with systems of a manageable size. What’s more, automated test generation – either from reverse-engineering flows or from transcribing call logs, both part of our offering at Bespoken – can help jump-start this approach.

~20% flow coverage, add more tests as issues are uncovered

A second approach, perhaps not as satisfying to QA purists but one that brings many practical benefits, is to aim for around 20% test coverage initially (or, even easier, 20 tests) and then build from there based on where errors are found. That means when a bug is reported, write an automated test that reproduces it, then run the test again to verify it is fixed. This is a time-honored approach to automated testing, and it operates on a simple principle: when I fix something, I want to make sure it stays fixed, forever. It is a readily achievable, satisfying goal to aim for – see the sketch below.
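Here is what such a bug-driven regression test might look like, continuing the hypothetical test module from the sketch above. The bug number and dialog are invented purely for illustration:

    # A bug-driven regression test, reusing the hypothetical `bot`
    # fixture from the sketch above. Bug number and dialog are invented.

    def test_bug_142_no_reprompt_on_valid_utterance(bot):
        """Bug #142 (hypothetical): the bot re-prompted after a valid request.

        Written when the bug was reported, this test fails until the fix
        lands, then runs in every build so the fix stays fixed - forever.
        """
        reply = bot.send("what's my balance")
        assert "didn't catch that" not in reply.lower()
        assert "your balance is" in reply.lower()

Over time, the suite grows exactly where the system has actually broken, which is often a better guide to risk than any upfront coverage plan.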

Regardless of the approach your team takes, I encourage everyone to think through upfront what their objectives are for QA and test automation and come up with a thoughtful, practical process and architecture. It’s the key to ensuring quality software – in the short and long term.

All the best,

John Kelvie

CEO and Co-founder, Bespoken
