DevicesEvery human-machine spoken interaction begins and ends on a device. The recent boom in this technology has put devices of all shapes, sizes, and prices within our reach. From smart speakers like Google Home to glasses with which you can interact with Alexa, everywhere we look the Internet of Things is getting connected to voice.
ASRAutomatic Speech Recognition is the process that converts what has been said into text. This process is also known as STT (Speech-To-Text). ASR is one of the pillars of voice interactions since once the user’s request has been converted into text it is possible for a computer to determine what the user has requested, and therefore, to execute the correct computer program that satisfies the user’s need.
NLUThe role of Natural Language Understanding is to determine the user’s intention when issuing an oral request. If ASR is the literal transcript of what was said, NLU finds the meaning and purpose of the request made. One of the most essential tasks NLU performs is to understand the context of what has been said to discover the correct intention requested by the speaker. Let’s look at this sample interaction:
- What is the weather forecast for Barcelona on Friday?
- Moderate showers are expected in the afternoon.
- Then, book a flight at 11.
Cloud ProcessingHere lies the code that handles the user’s requests. When the Dialog Manager / NLU have identified the user’s intent, the related code is executed, and a response is generated for the user. To minimize response times it is very common to store and execute the code of voice apps in the cloud. Amazon Alexa encourages skills developers to use AWS Lambda, and Google Actions code is usually stored in Google Functions. Though these platforms offer great performance as well as flexible pricing (developers only pay for usage), they are not mandatory. Voice app code can also be exposed as web services from within your servers. You can rapidly accelerate your development cycle by running your voice app code locally, avoiding time-consuming deployments to the cloud while you are coding and debugging/testing easily with Bespoken CLI. Read here to learn how to get started.
TTSWhen the voice app code has generated a response, usually in text, it should be transmitted to the user in audio. The technology in charge of doing this transformation is called Text-To-Speech or Speech Synthesis. Since user requests can be made in multiple languages, the Google Assistant or Amazon Alexa platforms have different voices that allow reply back in the same language and possibly with the same accent to the user.
Where in the Voice First World is the AI?The NLU, and in general the Natural Language Processing (or NLP, a discipline that encompasses the NLU and incorporates others such as the NLG – Natural Language Generation), is a type of AI. These components use AI techniques such as Machine Learning or Neural Networks. Machine Learning is used to perform a job based on a series of examples. This is why it is said that Amazon Alexa or Google Assistant are always learning; they use all the utterances the users issue as input to their learning activities. Likewise, neural networks allow systems to perform activities such as skill discovery or contextual slot carryover, that is, to identify and obtain parameters of our conversations within the context of a conversation.
IoT and Voice First
All the components reviewed here can be packaged in small kits, and incorporated into almost any IoT device or experience. Miniaturization of hardware and cloud computing allow these things to be always connected to the Internet, delivering on the promise of IoT with user-friendly voice interaction. Through voice interaction technology the Internet of Things opens up to a world of new use cases and experiences, enabling smart devices to understand human needs and deliver what users want, regardless of location or context.
According to Gartner Research, there will be an estimated 20.4 Billion IoT devices deployed by the year 2020 and voice will be the primary mode of interaction of this world of connected ‘everything’. If you’re working in the area of connected products, your voice strategy can’t wait any longer – and without testing automation, your team will struggle to meet ship dates and deliver optimal voice experiences that meet users’ needs and deliver delightful brand experiences.
Bespokenis here to make sense of this new voice first world, and our world-leading testing automation solutions and services ensure your voice-first applications ship on time and deliver delightful user experiences and the outstanding app ratings your team deserves. Contact us today to learn more – we’re here to help!
(*) Icons attributions (all from the Noun Project):
- Amazon Echo by Ben Davis
- Cloud processing by Cezary Lopacinski
- wifi by Peter Borges
- Smart Car by Trevor Dsouza
- Smart Home by Manop Leklai
- Smartwatch by DTDesign
- Smart Speaker by mgpjm
- starfleet by Harpal Singh
- Robot vacuum cleaner by Gan Khoon Lay
- spray sprinkler by ProSymbols
- screen by Dinosoft Labs
- Amazon Dot by Nick Bluth