Did you know Alexa skills are not limited to Alexa’s voice? Your skill can respond to users with pre-recorded audio, a nice way to add some variety and personality to the listener’s experience.
By using a voice other than Alexa, skills can take on a distinct persona, one consistent with your brand or something altogether new and creative. It’s an awesome feature we use all the time! Here’s how we’ve been playing audio with Alexa.
The Audio Experience
To experience how produced audio sounds, check out the We Study Billionaires or Rise Above skills.
In We Study Billionaires, podcast host Preston Pysh is the voice that introduces listeners to the skill and explains how to use it. It’s a great way for the podcasters to extend their relationship with their audience.
Using SSML
So, how do you create great content like this? It relies on SSML (Speech Synthesis Markup Language). SSML offers a variety of useful capabilities, including Speechcons and control over pronunciation and pauses in text-to-speech output. Here is what SSML that plays custom audio looks like:
<speak> <audio src="https://example.com/audio/MyAudio.mp3" /> </speak>
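Because SSML is XML, a URL containing characters such as `&` (common in query strings) has to be escaped before it goes into the `src` attribute. Below is a minimal JavaScript sketch of a helper that wraps one audio URL in SSML. The names `escapeXml` and `audioSsml` are hypothetical, and the sketch assumes the URL already meets Alexa’s hosting requirements.

```javascript
// Escape the five XML special characters so the URL is safe inside an attribute.
// "&" must be replaced first so the other entities are not double-escaped.
function escapeXml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&apos;");
}

// Hypothetical helper: wrap a single pre-recorded clip in a <speak> element.
function audioSsml(url) {
    return `<speak> <audio src="${escapeXml(url)}" /> </speak>`;
}

console.log(audioSsml("https://example.com/audio/MyAudio.mp3"));
// <speak> <audio src="https://example.com/audio/MyAudio.mp3" /> </speak>
```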
The full response payload including the SSML looks like this:
{
    "version": "1.0",
    "response": {
        "shouldEndSession": false,
        "outputSpeech": {
            "type": "SSML",
            "ssml": "<speak> <audio src=\"https://example.com/audio/MyAudio.mp3\"></audio> </speak>"
        }
    }
}
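In a Node.js skill you would normally build the response payload above as an object and let `JSON.stringify` produce the escaped quotes inside the `ssml` string. The sketch below does that. `buildAudioResponse` is a hypothetical name, and the envelope mirrors only the fields shown above; a real response may carry others.

```javascript
// Escape XML special characters so the URL can sit inside the src attribute.
const escapeXml = (text) =>
    text.replace(/[&<>"']/g, (ch) => ({
        "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;",
    })[ch]);

// Hypothetical helper that returns the response envelope for one audio clip.
function buildAudioResponse(audioUrl, shouldEndSession = false) {
    return {
        version: "1.0",
        response: {
            shouldEndSession: shouldEndSession,
            outputSpeech: {
                type: "SSML",
                ssml: `<speak> <audio src="${escapeXml(audioUrl)}"></audio> </speak>`,
            },
        },
    };
}

// JSON.stringify produces the wire format, escaping the quotes inside "ssml".
console.log(JSON.stringify(buildAudioResponse("https://example.com/audio/MyAudio.mp3"), null, 2));
```

Building an object rather than concatenating a JSON string avoids hand-escaping the inner quotes, which is an easy place to introduce a malformed payload.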
Encoding Audio
So SSML is pretty easy, right? The catch with produced audio is making it available to Alexa. Keep these requirements in mind:
- The audio must be publicly accessible via HTTPS
- The audio cannot be longer than 90 seconds
- The audio must be encoded as MP3 at 16 kHz and 48 kbps
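To catch problems before publishing, you could check a clip’s properties against the requirements listed above. This JavaScript sketch assumes you already have the clip’s metadata in a plain object (for example, read with a tool such as ffprobe); the object shape and the `checkAlexaAudio` name are hypothetical, and the limits are copied from the list above. It cannot confirm that the file is publicly reachable, which needs a real HTTPS request.

```javascript
// Limits as listed in this post.
const ALEXA_AUDIO_LIMITS = {
    maxDurationSeconds: 90,
    sampleRateHz: 16000,
    bitRateKbps: 48,
};

// Hypothetical metadata shape: { url, codec, durationSeconds, sampleRateHz, bitRateKbps }.
// Returns a list of human-readable problems; an empty list means the clip passes.
function checkAlexaAudio(meta, limits = ALEXA_AUDIO_LIMITS) {
    const problems = [];
    let protocol = null;
    try {
        protocol = new URL(meta.url).protocol;
    } catch (err) {
        problems.push(`invalid URL: ${meta.url}`);
    }
    if (protocol !== null && protocol !== "https:") {
        problems.push("audio must be served over HTTPS");
    }
    if (meta.durationSeconds > limits.maxDurationSeconds) {
        problems.push(`clip is ${meta.durationSeconds}s, limit is ${limits.maxDurationSeconds}s`);
    }
    if (meta.codec !== "mp3") {
        problems.push(`codec is ${meta.codec}, expected mp3`);
    }
    if (meta.sampleRateHz !== limits.sampleRateHz) {
        problems.push(`sample rate is ${meta.sampleRateHz} Hz, expected ${limits.sampleRateHz} Hz`);
    }
    if (meta.bitRateKbps !== limits.bitRateKbps) {
        problems.push(`bit rate is ${meta.bitRateKbps} kbps, expected ${limits.bitRateKbps} kbps`);
    }
    return problems;
}

console.log(checkAlexaAudio({
    url: "https://example.com/audio/MyAudio.mp3",
    codec: "mp3",
    durationSeconds: 12,
    sampleRateHz: 16000,
    bitRateKbps: 48,
}));
// []
```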
You can find the full audio requirements in Amazon’s SSML reference for the audio tag.
How do you make sure your file meets all of these requirements? A popular tool like ffmpeg can handle the encoding. Here is an example command:
ffmpeg -y -i input.mp3 -ar 16000 -ab 48k -codec:a libmp3lame -ac 1 output.mp3
Hat tip to StackOverflow for this!
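If you encode clips from a Node.js build script, you can run the same ffmpeg command from code. This sketch assumes ffmpeg is installed and on the PATH; `alexaFfmpegArgs` and `encodeForAlexa` are hypothetical names. Passing an argument array to `execFile` avoids shell quoting problems with file names that contain spaces.

```javascript
const { execFile } = require("child_process");

// Same arguments as the ffmpeg command above:
// -y overwrite output, -ar 16000 sample rate, -ab 48k bit rate (older alias of -b:a),
// -codec:a libmp3lame MP3 encoder, -ac 1 mono.
function alexaFfmpegArgs(input, output) {
    return ["-y", "-i", input, "-ar", "16000", "-ab", "48k",
            "-codec:a", "libmp3lame", "-ac", "1", output];
}

// Hypothetical wrapper: run ffmpeg and resolve with the output path when it succeeds.
function encodeForAlexa(input, output) {
    return new Promise((resolve, reject) => {
        execFile("ffmpeg", alexaFfmpegArgs(input, output), (error, stdout, stderr) => {
            if (error) {
                reject(new Error(`ffmpeg failed: ${stderr}`));
                return;
            }
            resolve(output);
        });
    });
}

// Usage:
// encodeForAlexa("input.mp3", "output.mp3")
//     .then((path) => console.log(`encoded ${path}`))
//     .catch(console.error);
```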
Audacity is also a popular and user-friendly encoding option – just follow these steps.
Alternatively, you can use our BSTEncode (*) tool from JavaScript. To use it, simply call:
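// encoder is a configured BSTEncode instance (setup not shown here)
encoder.encodeFileAndPublishAs("input.mp3", "Introduction.mp3", function (error, encodedURL) {
    if (error) { return console.error(error); }
    console.log(encodedURL); // URL of the encoded file published to S3
});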
The encoder automatically formats the audio file properly and uploads it to S3 for use in your skill. Take a look at our streamer for a full example. Because BSTEncode encodes audio on the fly, you can add and change a skill’s audio as needed.
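The BSTEncode call uses a Node-style callback. If the rest of your skill uses async/await, a small wrapper can help. This sketch relies only on the `encodeFileAndPublishAs(source, targetName, callback)` shape shown above, with the callback receiving `(error, encodedURL)`; the wrapper name `encodeAndPublish` is hypothetical, and constructing the encoder is not covered.

```javascript
// Hypothetical wrapper: turn the callback API shown above into a Promise.
// `encoder` is any object exposing encodeFileAndPublishAs(source, name, callback).
function encodeAndPublish(encoder, sourceFile, publishedName) {
    return new Promise((resolve, reject) => {
        encoder.encodeFileAndPublishAs(sourceFile, publishedName, (error, encodedURL) => {
            if (error) {
                reject(error);
            } else {
                resolve(encodedURL);
            }
        });
    });
}

// Usage inside an async handler, feeding the published URL straight into SSML:
// const url = await encodeAndPublish(encoder, "input.mp3", "Introduction.mp3");
// const ssml = `<speak> <audio src="${url}" /> </speak>`;
```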
We hope that is useful! We think produced audio sounds awesome and is a great addition to any skill. It also works well alongside the AudioPlayer interface, which handles playback of long-form audio such as music and podcasts.
Our goal at Bespoken is to make Alexa development as easy as possible, so if you have questions or comments, talk to us on Gitter, and stay up to date through GitHub.
(*) Please note – our Virtual Alexa project has replaced the BSTAlexa classes. You can read more on Virtual Alexa here.