No, the title is not a reference to a new rock band (or an old one – though it perhaps would have been a good alternative name for The Kinks). Instead, most are familiar with the infinite monkey theorem – that a limitless group of monkeys typing away in perpetuity would eventually replicate Shakespeare’s work.
Thinking about the future of bots and bot UX, the theorem comes to mind, though with a twist: we can see the monkeys as a complement to Shakespeare rather than a substitute for him. Shakespeare is the central, guiding AI, the genius, while third-party developers and systems are the humble, hard-working *ahem* monkeys. Together, they can craft experiences that surpass what either could achieve alone.
A Better Programming Model But Is It A Better Bot UX?
Earlier, I wrote about why the Alexa programming model is better. The case rests on what the model asks of programmers: creating Alexa skills (or Actions on Google, Slack bots, etc.) means working with a simpler API, one that requires no on-device installation, is wholly contained within a single request/response payload, and has far less functional complexity to wrestle with. Compared to mobile apps, this means almost infinitely fewer configuration issues, radically easier upgrades, and far fewer API-related bugs, among other benefits.
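To make the single request/response point concrete, here is a minimal sketch of an Alexa skill as an AWS Lambda handler, written without any SDK at all; the intent name and the speech text are invented for this example.

```typescript
// A minimal sketch of an Alexa skill as an AWS Lambda handler.
// The entire interaction is one JSON request in, one JSON response out.
// "HelloIntent" and the speech text below are illustrative, not from the article.

interface AlexaRequest {
  version: string;
  request: {
    type: string;              // "LaunchRequest", "IntentRequest", etc.
    intent?: { name: string }; // present on an IntentRequest
  };
}

interface AlexaResponse {
  version: string;
  response: {
    outputSpeech: { type: "PlainText"; text: string };
    shouldEndSession: boolean;
  };
}

export const handler = async (event: AlexaRequest): Promise<AlexaResponse> => {
  // No device install, no SDK required: inspect the request, return a response.
  const intentName =
    event.request.type === "IntentRequest"
      ? event.request.intent?.name
      : "LaunchRequest";

  const text =
    intentName === "HelloIntent" ? "Hello from a very simple skill!" : "Welcome!";

  return {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text },
      shouldEndSession: true,
    },
  };
};
```

The entire "app" is this one function: no installation step, no device state to manage, just JSON in and JSON out.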
Underpinning all of this is the principle that there is simply less for the programmer to do. But that rests on a big assumption: that a simpler model can still deliver real value. What does any of this matter if it does not create great user experiences? And who cares that there are fewer bugs if an app barely does anything?
A key aspect needs more attention: how AI will knit these simple programming experiences together into rich and delightful user experiences. Luckily, we are already seeing this happen, and we can only expect it to improve.
How AI Helps
There are several aspects of artificial intelligence that are brought to bear in creating great bot experiences.
Speech To Text
Speech-to-text is the bread and butter of voice-based bot platforms. It has improved radically over the last few decades, to the point where it is now at parity with human transcribers.
And it has improved in other, perhaps less well-known but equally important, ways. The Amazon Echo is hugely innovative with its far-field recognition, the ability to recognize speech from across the room. The array of microphones built into the speaker takes it from what might have been an interesting parlor trick to an essential user interface.
Similarly, wake word recognition is critical. Properly recognizing when the user says “Alexa” or “OK Google”, while discarding the cases where the user does NOT, is key to a positive user experience. It has traditionally been a surprisingly difficult problem to solve (it is the reason you cannot pick just any word or phrase to address Alexa or Google Assistant). The Echo nailed this out of the gate, and Amazon has since made it remarkably easy for other device manufacturers to take advantage of its innovation.
Text To Intent
A key aspect of any bot platform is its ability to translate what the user says into what the user meant. Creating meaning from free-form text is the heart of natural language processing.
The current platforms are still in their infancy here. For example, Alexa has built-in intents like “Help” that are triggered not only by saying “Help” but also by “Help Me” or “Help Now”. By and large, though, the programmer needs to explicitly specify which phrases map to which intents, a mapping referred to as the interaction model. Actions on Google takes a similar approach, though Google also provides the API.AI platform, which is more advanced at correctly guessing what the user meant and offers user-friendly tools that make it easy to guide and correct.
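For illustration, here is roughly what that explicit mapping looks like in an Alexa interaction model, which is normally authored as JSON in the developer console; the invocation name, custom intent, and sample phrases below are made up for this sketch.

```typescript
// Sketch of an Alexa interaction model: the programmer enumerates, by hand,
// the phrases that should resolve to each custom intent. Names are illustrative.
const interactionModel = {
  interactionModel: {
    languageModel: {
      invocationName: "personal chef", // hypothetical skill name
      intents: [
        // Built-in intent: Alexa already knows "help", "help me", "help now"
        { name: "AMAZON.HelpIntent", samples: [] },
        // Custom intent: every trigger phrase must be spelled out explicitly
        {
          name: "GetRecipeIntent",
          samples: [
            "give me a recipe",
            "what can i cook tonight",
            "i want recipes",
          ],
        },
      ],
    },
  },
};
```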
But clearly, this is an area where machine learning can be applied to excellent effect, and where we expect rapid advances. As the systems see more and more user utterances, they will learn which intents those utterances should map to. The release of Amazon’s built-in intent library is a great example of how this is progressing.
Intent To Action
Perhaps the greatest limitation one currently encounters with Alexa and other bot platforms is having to explicitly direct requests to a specific “fulfiller” (a skill in Amazon’s parlance, an Action on Google Home, a bot on Facebook Messenger, etc.). For the user, this is often a distraction – finding the particular bot one wants to interact with is a tedious discovery process. It is also a frustration for developers, as they struggle to make users aware of their offerings.
How does this get solved? Google is already showing the way. As their documentation explains, they provide the ability to fulfill user requests via third-party code without the user having to specify which third party to use:
In some cases, some of an intent’s query patterns can trigger your action, even if users don’t use your invocation name. For example, if the user asks “I want recipes”, the Google Assistant may respond with “OK, for that, try saying let me talk to Personal Chef”. While these hints offer a powerful discovery tool for users, don’t be overly general as these patterns are only triggered if the Google Assistant doesn’t know how to handle the user’s query. The Google Assistant will match queries similar to the ones you specify, so an exhaustive list is not recommended. Provide a few examples as hints to the Google Assistant for the kind of action phrases you support. As a result of this expansion, users can say more phrases than just the examples you give in the action package. Of course, these query patterns are not unique to your agent and therefore the Google Assistant will determine which agent (or agents) to suggest for users. This determination is based on what’s best for the user.
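To ground this, here is a sketch of how such query patterns were declared in an Actions SDK action package of that era; the action name, conversation name, and fulfillment URL are all invented for illustration.

```typescript
// Sketch of the query-pattern mechanism described above, based on the
// Actions SDK "action package" format of the time. All names are illustrative.
const actionPackage = {
  actions: [
    {
      name: "GET_RECIPE",
      fulfillment: { conversationName: "personal_chef" },
      intent: {
        name: "GET_RECIPE",
        trigger: {
          // A few representative phrases, not an exhaustive list: the
          // Google Assistant expands these to cover similar user queries.
          queryPatterns: ["I want recipes", "find me a recipe"],
        },
      },
    },
  ],
  conversations: {
    personal_chef: {
      name: "personal_chef",
      url: "https://example.com/fulfillment", // hypothetical webhook
    },
  },
};
```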
And, of course, Google, Amazon and others already have their own fulfillment – they are using it, to varying degrees of success, when one asks simple questions like “How’s the weather today?”
The Impact On Bot Programming
We expect each of these areas to improve rapidly over the coming months and years. And to bring it back to our initial question – this progress will enable programmers to keep doing less and less.
Less code per skill – big “monolithic” skills can become “micro-skills” that are tethered together by the AI (see the sketch after this list).
Less code/configuration to translate between text input and intents.
Less effort to market their skills – legions of skills can be brought to bear without any foreknowledge by the user, with algorithms such as the one Google describes above determining which skill is most relevant and useful to a particular request. Of course, this likely means skill and action SEO will become an important consideration. Something we can all look forward to, to be sure 😉
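To make the micro-skill idea concrete, here is a hypothetical sketch in which the central AI (our Shakespeare) resolves an utterance to an intent and routes it to a tiny, single-purpose handler (a monkey). Every name in it is invented.

```typescript
// Hypothetical micro-skills: each handler does one small thing well, and the
// platform's AI decides which one should fulfill a given request.
type MicroSkill = (utterance: string) => string;

const microSkills: Record<string, MicroSkill> = {
  GetRecipeIntent: () => "How about a simple pasta tonight?",
  WeatherIntent: () => "It is sunny and 72 degrees.",
};

// Stand-in for the central AI: in reality the platform maps the raw utterance
// to an intent and picks the most relevant fulfiller on the user's behalf.
function dispatch(intentName: string, utterance: string): string {
  const skill = microSkills[intentName];
  return skill ? skill(utterance) : "Sorry, I can't help with that yet.";
}

console.log(dispatch("GetRecipeIntent", "I want recipes"));
// -> "How about a simple pasta tonight?"
```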
The Holy Grail – Effortless Interwoven Actions
The collaboration between AI and third-party services presents tantalizing possibilities. Others have expressed this better than I ever could – one specific vision comes, of all places, from a patent application by the Samsung Viv team.
In a dynamically evolving cognitive architecture system based on third-party developers, the full functionality is not known in advance and is not designed by any one developer of the system. While some use cases are actively intended by developers of the system, many other use cases are fulfilled by the system itself in response to novel user requests. In essence, the system effectively writes a program to solve an end user request. The system is continually taught by the world via third-party developers, the system knows more than it is taught, and the system learns autonomously every day by evaluating system behavior and observing usage patterns.
(Thank you to Max Mansoubi for bringing this to my attention!)
We at Bespoken, as active voice and bot developers, welcome this brave new world. We are proud to be useful cogs in this system, fueling its intelligence and pushing it to greater heights. After all, as one great rocker said, we’re all just apemen anyway.
The Plug
Our goal at Bespoken is to make Alexa development as easy and accessible as simian-ly possible, so add your questions and/or comments below. And stay updated on our work through GitHub.