
Automated Testing,
Training and Monitoring
for Chatbots

Building chatbots is challenging – quality, accuracy, and reliability are all critical to delivering high levels of customer satisfaction.

And though AI has improved by leaps and bounds in recent years, it still requires constant attention to work well. That is where Bespoken can help out.

We provide a full-cycle program for managing AI/NLU-based systems. These stages are run cyclically, iteratively, and continuously:

  • Testing: Real-world and artificially generated text and speech interactions, across accents, dialects, and background environments.
  • Analysis: Measure the performance of the system, identify problem areas, and suggest revisions.
  • Tuning: Tune ASR and NLU models to improve performance – adjust model parameters, add training data, and fix configuration issues.
  • Monitoring: Constantly track what is happening with the live system.
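As a rough illustration of how these stages feed one another, here is a minimal sketch of one pass through the cycle. All names and the accuracy target are hypothetical, chosen only for this example – they are not our actual pipeline:

```python
# Sketch of one test -> analyze -> tune -> monitor iteration.
# "system" is any callable mapping an utterance to an intent name.

def run_cycle(system, test_suite, accuracy_target=0.95):
    """Run the test suite, measure accuracy, and flag whether tuning is needed."""
    results = [case["expected"] == system(case["utterance"]) for case in test_suite]
    accuracy = sum(results) / len(results)      # analyze: measure performance
    needs_tuning = accuracy < accuracy_target   # tune only when below target
    return {"accuracy": accuracy, "needs_tuning": needs_tuning}

# A toy "system" that maps utterances to intents, plus a tiny test suite.
toy_system = {"play some jazz": "PlayMusic", "what's the weather": "GetWeather"}.get
suite = [
    {"utterance": "play some jazz", "expected": "PlayMusic"},
    {"utterance": "what's the weather", "expected": "GetWeather"},
    {"utterance": "turn off the lights", "expected": "LightsOff"},  # fails here
]
report = run_cycle(toy_system, suite)
```

In a real deployment the report would feed the tuning stage, and the same suite would keep running on a schedule as the monitoring stage.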

Crowd-Sourced User Testing

We assist our customers with initial utterance gathering using our Device Service in conjunction with crowd-sourced testing providers such as Amazon Mechanical Turk and Applause. Our team will gather input from real users to assist with:

  • Functional testing: Ensure the application works correctly with real users.
  • Utterance gathering: Acquire a complete picture of what real users will say and how they will say it.
  • Usability evaluation: Validate the design and UX of your application via objective and subjective feedback from actual users.

The output of our crowd-sourced testing is then used as the basis for creating a comprehensive automated testing regimen.
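To make that step concrete, here is a minimal sketch of how raw crowd-sourced utterances might be turned into automated test cases. The field names and test-case shape are illustrative assumptions, not our actual data format:

```python
# Sketch: turning crowd-sourced utterances into an automated test regimen.

def build_test_cases(raw_utterances):
    """Normalize and de-duplicate crowd-sourced utterances, then pair each
    unique phrasing with the intent the worker was asked to express."""
    seen = set()
    cases = []
    for record in raw_utterances:
        text = " ".join(record["text"].lower().split())  # normalize case/whitespace
        if text not in seen:                             # drop duplicate phrasings
            seen.add(text)
            cases.append({"input": text, "expected_intent": record["intent"]})
    return cases

raw = [
    {"text": "Play  some Jazz", "intent": "PlayMusic"},
    {"text": "play some jazz", "intent": "PlayMusic"},   # duplicate after normalizing
    {"text": "what's the weather", "intent": "GetWeather"},
]
cases = build_test_cases(raw)
```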

Automated Testing

Our automated testing assists across several key concerns for chat-based systems:

  • Functional Testing: We ensure the system works correctly and is bug-free, automatically and repeatably.
  • Monitoring: We run tests on a routine basis to ensure everything is working well in your system. If there are issues, we let you know right away.
  • Accuracy Testing: We measure performance to ensure that users are consistently understood across every utterance. When they are not, we make specific recommendations.
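One standard way to score whether speech was understood is word error rate (WER), which compares a reference transcript to what the recognizer heard. A minimal sketch, for illustration only:

```python
# Sketch: word error rate (WER), a common accuracy metric for speech
# recognition. Per-utterance scores point at where the ASR model struggles.

def wer(reference, hypothesis):
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("turn on the lights", "turn off the lights")` scores one substitution across four words, i.e. 0.25.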

All of this is driven off our unified API for conversation:
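As a sketch of what "unified" means here – one send/receive shape for every platform – consider the following. All names are hypothetical, not our actual API:

```python
# Sketch of a unified conversation interface: every platform or channel is
# driven through the same say/reply shape.

class Conversation:
    """One scripted exchange against a chat backend, channel-agnostic."""

    def __init__(self, backend):
        self.backend = backend    # any callable: utterance -> reply text
        self.transcript = []

    def say(self, utterance):
        reply = self.backend(utterance)
        self.transcript.append((utterance, reply))
        return reply

def echo_backend(utterance):
    # Stand-in for a real channel (voice assistant, IVR, web chat, ...).
    return f"You said: {utterance}"

convo = Conversation(echo_backend)
reply = convo.say("hello")
```

Because the test script only depends on `say`, the same script can run unchanged against any channel that supplies a backend.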

We support testing Conversational AI via the following platforms and channels:

And that is not a complete list of the platforms we support. If you don’t see your preferred one listed, just reach out – we probably have you covered, and if not, it’s a small effort for us to extend our Device Service to support your platform or channel of choice.
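The reason adding a channel is a small effort is that each platform only needs a thin adapter behind a common interface. A minimal sketch of that pattern, with hypothetical names:

```python
# Sketch: a channel registry where each platform supplies one small adapter.

CHANNELS = {}

def register_channel(name):
    """Decorator that adds an adapter class to the channel registry."""
    def wrap(cls):
        CHANNELS[name] = cls
        return cls
    return wrap

@register_channel("sms")
class SmsAdapter:
    def send(self, text):
        return f"[sms] {text}"

@register_channel("web-chat")
class WebChatAdapter:
    def send(self, text):
        return f"[web-chat] {text}"

def send_everywhere(text):
    """Drive every registered channel with the same message."""
    return {name: cls().send(text) for name, cls in CHANNELS.items()}
```

Supporting a new channel then amounts to writing one more adapter class and registering it.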


Ongoing Reporting
and Alerting

Beyond the individual results for each test run, we also provide reporting on what is happening over time. Take a look here:

What’s more, you can set up highly granular alerting to notify you and your team when particular events occur. This is particularly useful for distinguishing between critical events that demand an “all-hands-on-deck” response and more routine bugs or minor, temporary outages.
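As an illustration of that kind of routing, here is a minimal sketch of severity rules. The rule shapes, event fields, and thresholds are all invented for this example:

```python
# Sketch: granular alerting via ordered (predicate, action) rules.
# Critical failures page the on-call team; routine issues get quieter handling.

RULES = [
    # Checked in order; the first matching rule wins.
    (lambda e: e["type"] == "outage" and e["duration_min"] >= 5, "page-oncall"),
    (lambda e: e["type"] == "outage", "notify-team"),
    (lambda e: e["type"] == "test-failure", "ticket"),
]

def route(event):
    """Return the action for an event, or 'log-only' if no rule matches."""
    for predicate, action in RULES:
        if predicate(event):
            return action
    return "log-only"
```

A 12-minute outage would page on-call, a 1-minute blip would only notify the team, and anything unrecognized would simply be logged.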

Getting Started

It’s easy to get started – you can check out our sample project here:
Or just email us at and we will be happy to set you up with a guided trial.